[jira] [Updated] (HIVE-17754) InputJobInfo in Pig UDFContext is heavyweight, and causes OOMs in Tez AMs

2017-10-09 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17754:

Status: Patch Available  (was: Open)

> InputJobInfo in Pig UDFContext is heavyweight, and causes OOMs in Tez AMs
> -
>
> Key: HIVE-17754
> URL: https://issues.apache.org/jira/browse/HIVE-17754
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17754.1.patch
>
>
> HIVE-9845 dealt with reducing the size of HCat split-info, to improve 
> job-launch times for Pig/HCat jobs.
> For large Pig queries that scan a large number of Hive partitions, it was 
> found that the Pig {{UDFContext}} stored full-fat HCat {{InputJobInfo}} 
> objects, thus blowing out the Pig Tez AM. Since this information is already 
> stored in the {{HCatSplit}}, the serialization of {{InputJobInfo}} can be 
> spared.
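
A minimal sketch of the idea above: keep only a slim, {{Serializable}} summary in the per-UDF properties bag (the way {{HCatLoader}} stashes state in Pig's {{UDFContext}}), and let the splits carry the partition detail. The class and property key below are invented for illustration; this is not the attached patch.

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;
import java.util.Properties;

// Hypothetical sketch: serialize a slim summary, not the full
// partition-bearing InputJobInfo, into the UDF's Properties bag.
public class SlimJobInfoSketch {

  static class SlimJobInfo implements Serializable {
    final String dbName;
    final String tableName;
    // Deliberately no per-partition metadata: each HCatSplit
    // already carries what its task needs.
    SlimJobInfo(String dbName, String tableName) {
      this.dbName = dbName;
      this.tableName = tableName;
    }
  }

  static void store(Properties udfProps, SlimJobInfo info) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(info);
    }
    // "hcat.slim.job.info" is an invented key for this illustration.
    udfProps.setProperty("hcat.slim.job.info",
        Base64.getEncoder().encodeToString(bytes.toByteArray()));
  }

  public static void main(String[] args) throws IOException {
    Properties props = new Properties();
    store(props, new SlimJobInfo("default", "web_logs"));
    System.out.println(props.getProperty("hcat.slim.job.info"));
  }
}
{code}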





[jira] [Updated] (HIVE-17754) InputJobInfo in Pig UDFContext is heavyweight, and causes OOMs in Tez AMs

2017-10-09 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17754:

Attachment: HIVE-17754.1.patch

This fix depends on HIVE-11548. The attached patch contains both the fix for 
HIVE-11548 and HIVE-17754. Submitting for tests...

> InputJobInfo in Pig UDFContext is heavyweight, and causes OOMs in Tez AMs
> -
>
> Key: HIVE-17754
> URL: https://issues.apache.org/jira/browse/HIVE-17754
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17754.1.patch
>
>
> HIVE-9845 dealt with reducing the size of HCat split-info, to improve 
> job-launch times for Pig/HCat jobs.
> For large Pig queries that scan a large number of Hive partitions, it was 
> found that the Pig {{UDFContext}} stored full-fat HCat {{InputJobInfo}} 
> objects, thus blowing out the Pig Tez AM. Since this information is already 
> stored in the {{HCatSplit}}, the serialization of {{InputJobInfo}} can be 
> spared.





[jira] [Comment Edited] (HIVE-17754) InputJobInfo in Pig UDFContext is heavyweight, and causes OOMs in Tez AMs

2017-10-10 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198244#comment-16198244
 ] 

Mithun Radhakrishnan edited comment on HIVE-17754 at 10/10/17 4:59 PM:
---

This fix depends on HIVE-11548. The attached patch contains both the fix for 
HIVE-11548 and the one for HIVE-17754. Submitting for tests...


was (Author: mithun):
This fix depends on HIVE-11548. The attached patch contains both the fix for 
HIVE-11548 and HIVE-17754. Submitting for tests...

> InputJobInfo in Pig UDFContext is heavyweight, and causes OOMs in Tez AMs
> -
>
> Key: HIVE-17754
> URL: https://issues.apache.org/jira/browse/HIVE-17754
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17754.1.patch
>
>
> HIVE-9845 dealt with reducing the size of HCat split-info, to improve 
> job-launch times for Pig/HCat jobs.
> For large Pig queries that scan a large number of Hive partitions, it was 
> found that the Pig {{UDFContext}} stored full-fat HCat {{InputJobInfo}} 
> objects, thus blowing out the Pig Tez AM. Since this information is already 
> stored in the {{HCatSplit}}, the serialization of {{InputJobInfo}} can be 
> spared.





[jira] [Updated] (HIVE-11548) HCatLoader should support predicate pushdown.

2017-10-10 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-11548:

Attachment: HIVE-11548.7.patch

> HCatLoader should support predicate pushdown.
> -
>
> Key: HIVE-11548
> URL: https://issues.apache.org/jira/browse/HIVE-11548
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-11548.1.patch, HIVE-11548.2.patch, 
> HIVE-11548.3.patch, HIVE-11548.4.patch, HIVE-11548.5.patch, 
> HIVE-11548.6-branch-2.2.patch, HIVE-11548.6-branch-2.patch, 
> HIVE-11548.6.patch, HIVE-11548.7.patch
>
>
> When one uses {{HCatInputFormat}}/{{HCatLoader}} to read from file-formats 
> that support predicate pushdown (such as ORC, with 
> {{hive.optimize.index.filter=true}}), one sees that the predicates aren't 
> actually pushed down into the storage layer.
> The forthcoming patch should allow for filter-pushdown, if any of the 
> partitions being scanned with {{HCatLoader}} support the functionality. The 
> patch should technically allow the same for users of {{HCatInputFormat}}, but 
> I don't currently have a neat interface to build a compound 
> predicate-expression. Will add this separately, if required.
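
For reference, the artifact being pushed down is a {{SearchArgument}}, built with Hive's public builder API. A minimal sketch (column names invented; this is not the patch itself) of the filter an ORC reader can use to skip row-groups once pushdown is in effect:

{code:java}
import org.apache.hadoop.hive.ql.io.sarg.PredicateLeaf;
import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
import org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory;

public class SargBuilderSketch {
  public static void main(String[] args) {
    // Encodes "quantity < 1000 AND region = 'US'".
    SearchArgument sarg = SearchArgumentFactory.newBuilder()
        .startAnd()
          .lessThan("quantity", PredicateLeaf.Type.LONG, 1000L)
          .equals("region", PredicateLeaf.Type.STRING, "US")
        .end()
        .build();
    System.out.println(sarg);
  }
}
{code}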





[jira] [Updated] (HIVE-11548) HCatLoader should support predicate pushdown.

2017-10-10 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-11548:

Attachment: HIVE-11548.7-branch-2.patch

> HCatLoader should support predicate pushdown.
> -
>
> Key: HIVE-11548
> URL: https://issues.apache.org/jira/browse/HIVE-11548
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-11548.1.patch, HIVE-11548.2.patch, 
> HIVE-11548.3.patch, HIVE-11548.4.patch, HIVE-11548.5.patch, 
> HIVE-11548.6-branch-2.2.patch, HIVE-11548.6-branch-2.patch, 
> HIVE-11548.6.patch, HIVE-11548.7-branch-2.patch, HIVE-11548.7.patch
>
>
> When one uses {{HCatInputFormat}}/{{HCatLoader}} to read from file-formats 
> that support predicate pushdown (such as ORC, with 
> {{hive.optimize.index.filter=true}}), one sees that the predicates aren't 
> actually pushed down into the storage layer.
> The forthcoming patch should allow for filter-pushdown, if any of the 
> partitions being scanned with {{HCatLoader}} support the functionality. The 
> patch should technically allow the same for users of {{HCatInputFormat}}, but 
> I don't currently have a neat interface to build a compound 
> predicate-expression. Will add this separately, if required.





[jira] [Updated] (HIVE-11548) HCatLoader should support predicate pushdown.

2017-10-10 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-11548:

Attachment: HIVE-11548.7-branch-2.2.patch

> HCatLoader should support predicate pushdown.
> -
>
> Key: HIVE-11548
> URL: https://issues.apache.org/jira/browse/HIVE-11548
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-11548.1.patch, HIVE-11548.2.patch, 
> HIVE-11548.3.patch, HIVE-11548.4.patch, HIVE-11548.5.patch, 
> HIVE-11548.6-branch-2.2.patch, HIVE-11548.6-branch-2.patch, 
> HIVE-11548.6.patch, HIVE-11548.7-branch-2.2.patch, 
> HIVE-11548.7-branch-2.patch, HIVE-11548.7.patch
>
>
> When one uses {{HCatInputFormat}}/{{HCatLoader}} to read from file-formats 
> that support predicate pushdown (such as ORC, with 
> {{hive.optimize.index.filter=true}}), one sees that the predicates aren't 
> actually pushed down into the storage layer.
> The forthcoming patch should allow for filter-pushdown, if any of the 
> partitions being scanned with {{HCatLoader}} support the functionality. The 
> patch should technically allow the same for users of {{HCatInputFormat}}, but 
> I don't currently have a neat interface to build a compound 
> predicate-expression. Will add this separately, if required.





[jira] [Updated] (HIVE-17754) InputJobInfo in Pig UDFContext is heavyweight, and causes OOMs in Tez AMs

2017-10-10 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17754:

Attachment: HIVE-17754.2.patch

Addressing {{Test*HCatLoader}} test failures.

> InputJobInfo in Pig UDFContext is heavyweight, and causes OOMs in Tez AMs
> -
>
> Key: HIVE-17754
> URL: https://issues.apache.org/jira/browse/HIVE-17754
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17754.1.patch, HIVE-17754.2.patch
>
>
> HIVE-9845 dealt with reducing the size of HCat split-info, to improve 
> job-launch times for Pig/HCat jobs.
> For large Pig queries that scan a large number of Hive partitions, it was 
> found that the Pig {{UDFContext}} stored full-fat HCat {{InputJobInfo}} 
> objects, thus blowing out the Pig Tez AM. Since this information is already 
> stored in the {{HCatSplit}}, the serialization of {{InputJobInfo}} can be 
> spared.





[jira] [Assigned] (HIVE-17763) HCatLoader should fetch delegation tokens for partitions on remote HDFS

2017-10-10 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17763:
---


> HCatLoader should fetch delegation tokens for partitions on remote HDFS
> ---
>
> Key: HIVE-17763
> URL: https://issues.apache.org/jira/browse/HIVE-17763
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Security
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>
> The Hive metastore might store partition-info for data stored on a remote 
> HDFS (i.e. different from what's defined by {{fs.default.name}}). 
> {{HCatLoader}} should automatically fetch delegation-tokens for all remote 
> HDFSes that participate in an HCat-based query.
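
Hadoop already provides the standard mechanism for this. A sketch (not the patch itself) of fetching a delegation token for every distinct filesystem behind the partition locations:

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.security.TokenCache;

public class RemoteHdfsTokensSketch {
  // Obtains one delegation token per distinct NameNode backing the given
  // partition paths, and stashes them in the job's credentials.
  public static void fetchTokens(Job job, Path[] partitionLocations)
      throws IOException {
    Configuration conf = job.getConfiguration();
    TokenCache.obtainTokensForNamenodes(
        job.getCredentials(), partitionLocations, conf);
  }
}
{code}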





[jira] [Updated] (HIVE-17763) HCatLoader should fetch delegation tokens for partitions on remote HDFS

2017-10-10 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17763:

Description: The Hive metastore might store partition-info for data stored 
on a remote HDFS (i.e. different from what's defined by {{fs.default.name}}). 
{{HCatLoader}} should automatically fetch delegation-tokens for all remote 
HDFSes that participate in an HCat-based query. (Note to self: YHIVE-661)  
(was: The Hive metastore might store partition-info for data stored on a remote 
HDFS (i.e. different from what's defined by {{fs.default.name}}). {{HCatLoader}} 
should automatically fetch delegation-tokens for all remote HDFSes that 
participate in an HCat-based query.)

> HCatLoader should fetch delegation tokens for partitions on remote HDFS
> ---
>
> Key: HIVE-17763
> URL: https://issues.apache.org/jira/browse/HIVE-17763
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Security
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>
> The Hive metastore might store partition-info for data stored on a remote 
> HDFS (i.e. different from what's defined by {{fs.default.name}}). 
> {{HCatLoader}} should automatically fetch delegation-tokens for all remote 
> HDFSes that participate in an HCat-based query. (Note to self: YHIVE-661)





[jira] [Updated] (HIVE-17754) InputJobInfo in Pig UDFContext is heavyweight, and causes OOMs in Tez AMs

2017-10-10 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17754:

Attachment: HIVE-17754.2-branch-2.patch

> InputJobInfo in Pig UDFContext is heavyweight, and causes OOMs in Tez AMs
> -
>
> Key: HIVE-17754
> URL: https://issues.apache.org/jira/browse/HIVE-17754
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17754.1.patch, HIVE-17754.2-branch-2.patch, 
> HIVE-17754.2.patch
>
>
> HIVE-9845 dealt with reducing the size of HCat split-info, to improve 
> job-launch times for Pig/HCat jobs.
> For large Pig queries that scan a large number of Hive partitions, it was 
> found that the Pig {{UDFContext}} stored full-fat HCat {{InputJobInfo}} 
> objects, thus blowing out the Pig Tez AM. Since this information is already 
> stored in the {{HCatSplit}}, the serialization of {{InputJobInfo}} can be 
> spared.





[jira] [Updated] (HIVE-17763) HCatLoader should fetch delegation tokens for partitions on remote HDFS

2017-10-10 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17763:

Attachment: HIVE-17763.1.patch

This fix depends on HIVE-11548 and HIVE-17754. This patch has the fix for all 3 
JIRAs. Submitting to run tests.

> HCatLoader should fetch delegation tokens for partitions on remote HDFS
> ---
>
> Key: HIVE-17763
> URL: https://issues.apache.org/jira/browse/HIVE-17763
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Security
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17763.1.patch
>
>
> The Hive metastore might store partition-info for data stored on a remote 
> HDFS (i.e. different from what's defined by {{fs.default.name}}). 
> {{HCatLoader}} should automatically fetch delegation-tokens for all remote 
> HDFSes that participate in an HCat-based query. (Note to self: YHIVE-661)





[jira] [Updated] (HIVE-17763) HCatLoader should fetch delegation tokens for partitions on remote HDFS

2017-10-10 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17763:

Status: Patch Available  (was: Open)

> HCatLoader should fetch delegation tokens for partitions on remote HDFS
> ---
>
> Key: HIVE-17763
> URL: https://issues.apache.org/jira/browse/HIVE-17763
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Security
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17763.1.patch
>
>
> The Hive metastore might store partition-info for data stored on a remote 
> HDFS (i.e. different from what's defined by {{fs.default.name}}). 
> {{HCatLoader}} should automatically fetch delegation-tokens for all remote 
> HDFSes that participate in an HCat-based query. (Note to self: YHIVE-661)





[jira] [Updated] (HIVE-17754) HCat InputJobInfo in Pig UDFContext is heavyweight, and causes OOMs in Tez AMs

2017-10-10 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17754:

Summary: HCat InputJobInfo in Pig UDFContext is heavyweight, and causes 
OOMs in Tez AMs  (was: InputJobInfo in Pig UDFContext is heavyweight, and 
causes OOMs in Tez AMs)

> HCat InputJobInfo in Pig UDFContext is heavyweight, and causes OOMs in Tez AMs
> --
>
> Key: HIVE-17754
> URL: https://issues.apache.org/jira/browse/HIVE-17754
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17754.1.patch, HIVE-17754.2-branch-2.patch, 
> HIVE-17754.2.patch
>
>
> HIVE-9845 dealt with reducing the size of HCat split-info, to improve 
> job-launch times for Pig/HCat jobs.
> For large Pig queries that scan a large number of Hive partitions, it was 
> found that the Pig {{UDFContext}} stored full-fat HCat {{InputJobInfo}} 
> objects, thus blowing out the Pig Tez AM. Since this information is already 
> stored in the {{HCatSplit}}, the serialization of {{InputJobInfo}} can be 
> spared.





[jira] [Assigned] (HIVE-17770) HCatalog documentation for Pig type-mapping incorrect for "bag" types

2017-10-10 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17770:
---


> HCatalog documentation for Pig type-mapping incorrect for "bag" types
> -
>
> Key: HIVE-17770
> URL: https://issues.apache.org/jira/browse/HIVE-17770
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation, HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>Priority: Minor
>
> Raising on behalf of [~cdrome], to track a change in documentation.
> The [HCatalog LoadStore type-mapping 
> documentation|https://cwiki.apache.org/confluence/display/Hive/HCatalog+LoadStore#HCatalogLoadStore-ComplexTypes]
>  mentions the following:
> ||Hive Type||Pig Type||
> |map (key type should be string)|map|
> |*List<_any type_>*|bag|
> |struct|tuple|
> We should change {{List<_any type_>}} to {{ARRAY<_any type_>}}, as per the 
> description of Hive's complex types, in [the language 
> manual|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-ComplexTypes].





[jira] [Resolved] (HIVE-17770) HCatalog documentation for Pig type-mapping incorrect for "bag" types

2017-10-10 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan resolved HIVE-17770.
-
Resolution: Fixed

Corrected the cwiki-documentation, and left a note in the change-log, pointing 
to this JIRA. Thanks for pointing this error out, [~cdrome]!

> HCatalog documentation for Pig type-mapping incorrect for "bag" types
> -
>
> Key: HIVE-17770
> URL: https://issues.apache.org/jira/browse/HIVE-17770
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation, HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>Priority: Minor
>
> Raising on behalf of [~cdrome], to track a change in documentation.
> The [HCatalog LoadStore type-mapping 
> documentation|https://cwiki.apache.org/confluence/display/Hive/HCatalog+LoadStore#HCatalogLoadStore-ComplexTypes]
>  mentions the following:
> ||Hive Type||Pig Type||
> |map (key type should be string)|map|
> |*List<_any type_>*|bag|
> |struct|tuple|
> We should change {{List<_any type_>}} to {{ARRAY<_any type_>}}, as per the 
> description of Hive's complex types, in [the language 
> manual|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-ComplexTypes].





[jira] [Commented] (HIVE-17669) Cache to optimize SearchArgument deserialization

2017-10-11 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201143#comment-16201143
 ] 

Mithun Radhakrishnan commented on HIVE-17669:
-

Thanks for the review, [~prasanth_j]. I've picked this into {{branch-2.2}} as 
well. Cheers!

> Cache to optimize SearchArgument deserialization
> 
>
> Key: HIVE-17669
> URL: https://issues.apache.org/jira/browse/HIVE-17669
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Query Processor
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17669.3.patch, HIVE-17669.4.patch, 
> HIVE-17699.1.patch, HIVE-17699.2.patch
>
>
> And another, from [~selinazh] and [~cdrome]. (YHIVE-927)
> When a mapper needs to process multiple ORC files, it might end up using 
> essentially the same {{SearchArgument}} over several files. It would be 
> good not to have to deserialize from string, over and over again. Caching the 
> object against the string-form should speed things up.
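
The shape of the fix, as a minimal sketch using a simple size-bounded Guava cache (the committed patch may differ, e.g. in its eviction policy): key the deserialized object on its serialized string form, so repeated ORC files in one mapper reuse the same instance.

{code:java}
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.Callable;

public class SargCacheSketch<V> {
  // Bounded cache from serialized form to deserialized object;
  // 64 is an arbitrary illustrative limit.
  private final Cache<String, V> cache =
      CacheBuilder.newBuilder().maximumSize(64).build();

  // Runs the (expensive) deserializer only on a cache miss.
  public V getOrDeserialize(String serialized, Callable<V> deserializer)
      throws Exception {
    return cache.get(serialized, deserializer);
  }
}
{code}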





[jira] [Commented] (HIVE-17669) Cache to optimize SearchArgument deserialization

2017-10-11 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201155#comment-16201155
 ] 

Mithun Radhakrishnan commented on HIVE-17669:
-

Updated documentation, 
[here|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties].

> Cache to optimize SearchArgument deserialization
> 
>
> Key: HIVE-17669
> URL: https://issues.apache.org/jira/browse/HIVE-17669
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Query Processor
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17669.3.patch, HIVE-17669.4.patch, 
> HIVE-17699.1.patch, HIVE-17699.2.patch
>
>
> And another, from [~selinazh] and [~cdrome]. (YHIVE-927)
> When a mapper needs to process multiple ORC files, it might end up using 
> essentially the same {{SearchArgument}} over several files. It would be 
> good not to have to deserialize from string, over and over again. Caching the 
> object against the string-form should speed things up.





[jira] [Assigned] (HIVE-17781) Map MR settings to Tez settings via DeprecatedKeys

2017-10-11 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17781:
---


> Map MR settings to Tez settings via DeprecatedKeys
> --
>
> Key: HIVE-17781
> URL: https://issues.apache.org/jira/browse/HIVE-17781
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration, Tez
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>
> Here's one that [~cdrome] and [~thiruvel] worked on:
> We found that certain Hadoop Map/Reduce settings that are set in site config 
> files do not take effect in Hive jobs, because the Tez site configs do not 
> contain the same settings.
> In Yahoo's case, the problem was that, at the time, there was no mapping 
> between {{MRJobConfig.COMPLETED_MAPS_FOR_REDUCE_SLOWSTART}} and 
> {{TEZ_SHUFFLE_VERTEX_MANAGER_MAX_SRC_FRACTION}}. There were situations where 
> significant capacity on production clusters was being used up doing nothing, 
> while waiting for slow tasks to complete. This would have been avoided, were 
> the mappings in place.
> Tez provides a {{DeprecatedKeys}} utility class, to help map MR settings to 
> Tez settings. Hive should use this to ensure that the mappings are in sync.
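
The two constants named above resolve to plain config keys, so the mapping is mechanical. A hand-rolled sketch of what that mapping does; the point of the JIRA is to get it from Tez's {{DeprecatedKeys}} rather than maintain it by hand like this:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class MrToTezMappingSketch {
  // The real config names behind the two constants mentioned above.
  static final String MR_SLOWSTART =
      "mapreduce.job.reduce.slowstart.completedmaps";
  static final String TEZ_MAX_SRC_FRACTION =
      "tez.shuffle-vertex-manager.max-src-fraction";

  static void carryOver(Configuration siteConf, Configuration tezConf) {
    String slowstart = siteConf.get(MR_SLOWSTART);
    // Only fill the Tez key if the site sets the MR one and Tez
    // doesn't already override it.
    if (slowstart != null && tezConf.get(TEZ_MAX_SRC_FRACTION) == null) {
      tezConf.set(TEZ_MAX_SRC_FRACTION, slowstart);
    }
  }
}
{code}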





[jira] [Commented] (HIVE-17669) Cache to optimize SearchArgument deserialization

2017-10-11 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201253#comment-16201253
 ] 

Mithun Radhakrishnan commented on HIVE-17669:
-

Thanks for pointing this out, [~prasanth_j]. I can commit this to 
{{branch-2.3}} as well.

> Cache to optimize SearchArgument deserialization
> 
>
> Key: HIVE-17669
> URL: https://issues.apache.org/jira/browse/HIVE-17669
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Query Processor
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17669.3.patch, HIVE-17669.4.patch, 
> HIVE-17699.1.patch, HIVE-17699.2.patch
>
>
> And another, from [~selinazh] and [~cdrome]. (YHIVE-927)
> When a mapper needs to process multiple ORC files, it might end up using 
> essentially the same {{SearchArgument}} over several files. It would be 
> good not to have to deserialize from string, over and over again. Caching the 
> object against the string-form should speed things up.





[jira] [Updated] (HIVE-17781) Map MR settings to Tez settings via DeprecatedKeys

2017-10-11 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17781:

Attachment: HIVE-17781.1.patch

Submitting for tests.

> Map MR settings to Tez settings via DeprecatedKeys
> --
>
> Key: HIVE-17781
> URL: https://issues.apache.org/jira/browse/HIVE-17781
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration, Tez
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17781.1.patch
>
>
> Here's one that [~cdrome] and [~thiruvel] worked on:
> We found that certain Hadoop Map/Reduce settings that are set in site config 
> files do not take effect in Hive jobs, because the Tez site configs do not 
> contain the same settings.
> In Yahoo's case, the problem was that, at the time, there was no mapping 
> between {{MRJobConfig.COMPLETED_MAPS_FOR_REDUCE_SLOWSTART}} and 
> {{TEZ_SHUFFLE_VERTEX_MANAGER_MAX_SRC_FRACTION}}. There were situations where 
> significant capacity on production clusters was being used up doing nothing, 
> while waiting for slow tasks to complete. This would have been avoided, were 
> the mappings in place.
> Tez provides a {{DeprecatedKeys}} utility class, to help map MR settings to 
> Tez settings. Hive should use this to ensure that the mappings are in sync.





[jira] [Updated] (HIVE-17781) Map MR settings to Tez settings via DeprecatedKeys

2017-10-11 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17781:

Status: Patch Available  (was: Open)

> Map MR settings to Tez settings via DeprecatedKeys
> --
>
> Key: HIVE-17781
> URL: https://issues.apache.org/jira/browse/HIVE-17781
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration, Tez
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17781.1.patch
>
>
> Here's one that [~cdrome] and [~thiruvel] worked on:
> We found that certain Hadoop Map/Reduce settings that are set in site config 
> files do not take effect in Hive jobs, because the Tez site configs do not 
> contain the same settings.
> In Yahoo's case, the problem was that, at the time, there was no mapping 
> between {{MRJobConfig.COMPLETED_MAPS_FOR_REDUCE_SLOWSTART}} and 
> {{TEZ_SHUFFLE_VERTEX_MANAGER_MAX_SRC_FRACTION}}. There were situations where 
> significant capacity on production clusters was being used up doing nothing, 
> while waiting for slow tasks to complete. This would have been avoided, were 
> the mappings in place.
> Tez provides a {{DeprecatedKeys}} utility class, to help map MR settings to 
> Tez settings. Hive should use this to ensure that the mappings are in sync.





[jira] [Assigned] (HIVE-17784) Make Tez AM's Queue headroom calculation and nParallel tasks configurable.

2017-10-11 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17784:
---


> Make Tez AM's Queue headroom calculation and nParallel tasks configurable.
> --
>
> Key: HIVE-17784
> URL: https://issues.apache.org/jira/browse/HIVE-17784
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Tez
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>
> Here's a couple of customizations we made at Yahoo with Hive Tez AMs:
> # When calculating splits, {{HiveSplitGenerator}} takes the entire queue's 
> capacity as available, and generates splits accordingly. While this greedy 
> algorithm might be acceptable for exclusive queues, on a shared queue, greedy 
> queries will hold other queries up. The algorithm that calculates the queue's 
> headroom should be pluggable. The greedy version can be the default.
> # {{TEZ_AM_VERTEX_MAX_TASK_CONCURRENCY}} and the AM's heap-size can be tuned 
> separately from the AM's container size. We found that users who attempt to 
> increase vertex concurrency tend to forget to bump AM memory/container sizes. 
> It would be handier if those values were derived from the container size.
> I'm combining these into a single patch, for easier review.
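
A hypothetical sketch of point 1 above: a pluggable headroom policy, with the greedy behaviour as the default. The interface is invented for illustration and is not the attached patch.

{code:java}
// Pluggable "how much of the queue may this query assume?" policy.
public interface HeadroomCalculator {
  /** @return capacity (e.g. in MB) the split generator may plan against. */
  long availableHeadroom(long queueCapacity, long queueUsed);
}

// Default, greedy policy: plan against the whole queue (acceptable for
// exclusive queues, unfriendly on shared ones).
class GreedyHeadroom implements HeadroomCalculator {
  public long availableHeadroom(long queueCapacity, long queueUsed) {
    return queueCapacity;
  }
}

// Shared-queue-friendly policy: plan only against free capacity.
class FreeCapacityHeadroom implements HeadroomCalculator {
  public long availableHeadroom(long queueCapacity, long queueUsed) {
    return Math.max(0L, queueCapacity - queueUsed);
  }
}
{code}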





[jira] [Updated] (HIVE-17784) Make Tez AM's Queue headroom calculation and nParallel tasks configurable.

2017-10-11 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17784:

Status: Patch Available  (was: Open)

> Make Tez AM's Queue headroom calculation and nParallel tasks configurable.
> --
>
> Key: HIVE-17784
> URL: https://issues.apache.org/jira/browse/HIVE-17784
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Tez
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17784.1.patch
>
>
> Here's a couple of customizations we made at Yahoo with Hive Tez AMs:
> # When calculating splits, {{HiveSplitGenerator}} takes the entire queue's 
> capacity as available, and generates splits accordingly. While this greedy 
> algorithm might be acceptable for exclusive queues, on a shared queue, greedy 
> queries will hold other queries up. The algorithm that calculates the queue's 
> headroom should be pluggable. The greedy version can be the default.
> # {{TEZ_AM_VERTEX_MAX_TASK_CONCURRENCY}} and the AM's heap-size can be tuned 
> separately from the AM's container size. We found that users who attempt to 
> increase vertex concurrency tend to forget to bump AM memory/container sizes. 
> It would be handier if those values were derived from the container size.
> I'm combining these into a single patch, for easier review.





[jira] [Updated] (HIVE-17784) Make Tez AM's Queue headroom calculation and nParallel tasks configurable.

2017-10-11 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17784:

Attachment: HIVE-17784.1.patch

Tentative first cut.

> Make Tez AM's Queue headroom calculation and nParallel tasks configurable.
> --
>
> Key: HIVE-17784
> URL: https://issues.apache.org/jira/browse/HIVE-17784
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Tez
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17784.1.patch
>
>
> Here's a couple of customizations we made at Yahoo with Hive Tez AMs:
> # When calculating splits, {{HiveSplitGenerator}} takes the entire queue's 
> capacity as available, and generates splits accordingly. While this greedy 
> algorithm might be acceptable for exclusive queues, on a shared queue, greedy 
> queries will hold other queries up. The algorithm that calculates the queue's 
> headroom should be pluggable. The greedy version can be the default.
> # {{TEZ_AM_VERTEX_MAX_TASK_CONCURRENCY}} and the AM's heap-size can be tuned 
> separately from the AM's container size. We found that users who attempt to 
> increase vertex concurrency tend to forget to bump AM memory/container sizes. 
> It would be handier if those values were derived from the container size.
> I'm combining these into a single patch, for easier review.





[jira] [Updated] (HIVE-17784) Make Tez AM's Queue headroom calculation and nParallel tasks configurable.

2017-10-11 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17784:

Description: 
Here's a couple of customizations we made at Yahoo with Hive Tez AMs:
# When calculating splits, {{HiveSplitGenerator}} takes the entire queue's 
capacity as available, and generates splits accordingly. While this greedy 
algorithm might be acceptable for exclusive queues, on a shared queue, greedy 
queries will hold other queries up. The algorithm that calculates the queue's 
headroom should be pluggable. The greedy version can be the default.
# {{TEZ_AM_VERTEX_MAX_TASK_CONCURRENCY}} and the AM's heap-size can be tuned 
separately from the AM's container size. We found that users who attempt to 
increase vertex concurrency tend to forget to bump AM memory/container sizes. 
It would be handier if those values were derived from the container size.

I'm combining these into a single patch, for easier review.

(Note to self: YHIVE-840)

  was:
Here's a couple of customizations we made at Yahoo with Hive Tez AMs:
# When calculating splits, {{HiveSplitGenerator}} takes the entire queue's 
capacity as available, and generates splits accordingly. While this greedy 
algorithm might be acceptable for exclusive queues, on a shared queue, greedy 
queries will hold other queries up. The algorithm that calculates the queue's 
headroom should be pluggable. The greedy version can be the default.
# {{TEZ_AM_VERTEX_MAX_TASK_CONCURRENCY}} and the AM's heap-size can be tuned 
separately from the AM's container size. We found that users who attempt to 
increase vertex concurrency tend to forget to bump AM memory/container sizes. 
It would be handier if those values were derived from the container size.

I'm combining these into a single patch, for easier review.


> Make Tez AM's Queue headroom calculation and nParallel tasks configurable.
> --
>
> Key: HIVE-17784
> URL: https://issues.apache.org/jira/browse/HIVE-17784
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Tez
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17784.1.patch
>
>
> Here's a couple of customizations we made at Yahoo with Hive Tez AMs:
> # When calculating splits, {{HiveSplitGenerator}} takes the entire queue's 
> capacity as available, and generates splits accordingly. While this greedy 
> algorithm might be acceptable for exclusive queues, on a shared queue, greedy 
> queries will hold other queries up. The algorithm that calculates the queue's 
> headroom should be pluggable. The greedy version can be the default.
> # {{TEZ_AM_VERTEX_MAX_TASK_CONCURRENCY}} and the AM's heap-size can be tuned 
> separately from the AM's container size. We found that users who attempt to 
> increase vertex concurrency tend to forget to bump AM memory/container sizes. 
> It would be handier if those values were derived from the container size.
> I'm combining these into a single patch, for easier review.
> (Note to self: YHIVE-840)





[jira] [Updated] (HIVE-17781) Map MR settings to Tez settings via DeprecatedKeys

2017-10-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17781:

Description: 
Here's one that [~cdrome] and [~thiruvel] worked on:

We found that certain Hadoop Map/Reduce settings that are set in site config 
files do not take effect in Hive jobs, because the Tez site configs do not 
contain the same settings.

In Yahoo's case, the problem was that, at the time, there was no mapping 
between {{MRJobConfig.COMPLETED_MAPS_FOR_REDUCE_SLOWSTART}} and 
{{TEZ_SHUFFLE_VERTEX_MANAGER_MAX_SRC_FRACTION}}. There were situations where 
significant capacity on production clusters was being used up doing nothing, 
while waiting for slow tasks to complete. This would have been avoided, were 
the mappings in place.

Tez provides a {{DeprecatedKeys}} utility class, to help map MR settings to Tez 
settings. Hive should use this to ensure that the mappings are in sync.

(Note to self: YHIVE-883)

  was:
Here's one that [~cdrome] and [~thiruvel] worked on:

We found that certain Hadoop Map/Reduce settings that are set in site config 
files do not take effect in Hive jobs, because the Tez site configs do not 
contain the same settings.

In Yahoo's case, the problem was that, at the time, there was no mapping 
between {{MRJobConfig.COMPLETED_MAPS_FOR_REDUCE_SLOWSTART}} and 
{{TEZ_SHUFFLE_VERTEX_MANAGER_MAX_SRC_FRACTION}}. There were situations where 
significant capacity on production clusters was being used up doing nothing, 
while waiting for slow tasks to complete. This would have been avoided, were 
the mappings in place.

Tez provides a {{DeprecatedKeys}} utility class, to help map MR settings to Tez 
settings. Hive should use this to ensure that the mappings are in sync.


> Map MR settings to Tez settings via DeprecatedKeys
> --
>
> Key: HIVE-17781
> URL: https://issues.apache.org/jira/browse/HIVE-17781
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration, Tez
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17781.1.patch
>
>
> Here's one that [~cdrome] and [~thiruvel] worked on:
> We found that certain Hadoop Map/Reduce settings that are set in site config 
> files do not take effect in Hive jobs, because the Tez site configs do not 
> contain the same settings.
> In Yahoo's case, the problem was that, at the time, there was no mapping 
> between {{MRJobConfig.COMPLETED_MAPS_FOR_REDUCE_SLOWSTART}} and 
> {{TEZ_SHUFFLE_VERTEX_MANAGER_MAX_SRC_FRACTION}}. There were situations where 
> significant capacity on production clusters was being used up doing nothing, 
> while waiting for slow tasks to complete. This would have been avoided, were 
> the mappings in place.
> Tez provides a {{DeprecatedKeys}} utility class, to help map MR settings to 
> Tez settings. Hive should use this to ensure that the mappings are in sync.
> (Note to self: YHIVE-883)





[jira] [Reopened] (HIVE-17669) Cache to optimize SearchArgument deserialization

2017-10-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reopened HIVE-17669:
-

Thank you for the heads-up, [~vihangk1]. 

I had a non-lambda version of this patch queued up for {{branch-2}}. I'll 
revert this commit, and check that version in, very shortly.

> Cache to optimize SearchArgument deserialization
> 
>
> Key: HIVE-17669
> URL: https://issues.apache.org/jira/browse/HIVE-17669
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Query Processor
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17669.3.patch, HIVE-17669.4.patch, 
> HIVE-17699.1.patch, HIVE-17699.2.patch
>
>
> And another, from [~selinazh] and [~cdrome]. (YHIVE-927)
> When a mapper needs to process multiple ORC files, it might end up using 
> essentially the same {{SearchArgument}} over several files. It would be 
> good not to have to deserialize from string, over and over again. Caching the 
> object against the string-form should speed things up.





[jira] [Comment Edited] (HIVE-17669) Cache to optimize SearchArgument deserialization

2017-10-12 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202325#comment-16202325
 ] 

Mithun Radhakrishnan edited comment on HIVE-17669 at 10/12/17 5:43 PM:
---

Thank you for the heads-up, [~vihangk1]. 

I had a non-lambda version of this patch queued up for {{branch-2}}. I'll 
revert this commit, and check that version in, very shortly.

(P.S. Apologies for the inconvenience. I could have caught this yesterday. This 
wasn't the version of the patch that I had in mind for {{branch-2}}.)


was (Author: mithun):
Thank you for the heads-up, [~vihangk1]. 

I had a non-lambda version of this patch queued up for {{branch-2}}. I'll 
revert this commit, and check that version in, very shortly.

> Cache to optimize SearchArgument deserialization
> 
>
> Key: HIVE-17669
> URL: https://issues.apache.org/jira/browse/HIVE-17669
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Query Processor
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17669.3.patch, HIVE-17669.4.patch, 
> HIVE-17699.1.patch, HIVE-17699.2.patch
>
>
> And another, from [~selinazh] and [~cdrome]. (YHIVE-927)
> When a mapper needs to process multiple ORC files, it might end up using 
> essentially the same {{SearchArgument}} over several files. It would be 
> good not to have to deserialize from string, over and over again. Caching the 
> object against the string-form should speed things up.





[jira] [Commented] (HIVE-17669) Cache to optimize SearchArgument deserialization

2017-10-12 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202347#comment-16202347
 ] 

Mithun Radhakrishnan commented on HIVE-17669:
-

[~vihangk1]: I've reverted the patch on {{branch-2}}, and checked in a 
non-lambda version. Things should be sorted now.

[~prasanth_j]: I'm running some checks on {{branch-2.3}} before I check this 
in. I'll close this JIRA again after it's checked in.

> Cache to optimize SearchArgument deserialization
> 
>
> Key: HIVE-17669
> URL: https://issues.apache.org/jira/browse/HIVE-17669
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Query Processor
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17669.3.patch, HIVE-17669.4.patch, 
> HIVE-17699.1.patch, HIVE-17699.2.patch
>
>
> And another, from [~selinazh] and [~cdrome]. (YHIVE-927)
> When a mapper needs to process multiple ORC files, it might end up using 
> essentially the same {{SearchArgument}} over several files. It would be 
> good not to have to deserialize from string, over and over again. Caching the 
> object against the string-form should speed things up.





[jira] [Resolved] (HIVE-17669) Cache to optimize SearchArgument deserialization

2017-10-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan resolved HIVE-17669.
-
   Resolution: Fixed
Fix Version/s: 2.2.1

> Cache to optimize SearchArgument deserialization
> 
>
> Key: HIVE-17669
> URL: https://issues.apache.org/jira/browse/HIVE-17669
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Query Processor
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-17669.3.patch, HIVE-17669.4.patch, 
> HIVE-17699.1.patch, HIVE-17699.2.patch
>
>
> And another, from [~selinazh] and [~cdrome]. (YHIVE-927)
> When a mapper needs to process multiple ORC files, it might end up using 
> essentially the same {{SearchArgument}} over several files. It would be 
> good not to have to deserialize from string, over and over again. Caching the 
> object against the string-form should speed things up.





[jira] [Assigned] (HIVE-17791) Temp dirs under the staging directory should honour `inheritPerms`

2017-10-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17791:
---


> Temp dirs under the staging directory should honour `inheritPerms`
> --
>
> Key: HIVE-17791
> URL: https://issues.apache.org/jira/browse/HIVE-17791
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>
> For [~cdrome]:
> CLI creates two levels of staging directories but calls setPermissions on the 
> top-level directory only if {{hive.warehouse.subdir.inherit.perms=true}}.
> The top-level directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1}}
>  is created the first time {{Context.getExternalTmpPath}} is called.
> The child directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1/_tmp.-ext-1}}
>  is created when {{TezTask.execute}} is called at line 164:
> {code:java}
> DAG dag = build(jobConf, work, scratchDir, appJarLr, additionalLr, ctx);
> {code}
> This calls {{DagUtils.createVertex}}, which calls {{Utilities.createTmpDirs}}:
> {code:java}
> private static void createTmpDirs(Configuration conf,
>     List<Operator<? extends OperatorDesc>> ops) throws IOException {
> 
>   while (!ops.isEmpty()) {
>     Operator<? extends OperatorDesc> op = ops.remove(0);
> 
>     if (op instanceof FileSinkOperator) {
>       FileSinkDesc fdesc = ((FileSinkOperator) op).getConf();
>       Path tempDir = fdesc.getDirName();
> 
>       if (tempDir != null) {
>         Path tempPath = Utilities.toTempPath(tempDir);
>         FileSystem fs = tempPath.getFileSystem(conf);
>         fs.mkdirs(tempPath); // <-- HERE!
>       }
>     }
> 
>     if (op.getChildOperators() != null) {
>       ops.addAll(op.getChildOperators());
>     }
>   }
> }
> {code}
> It turns out that {{inheritPerms}} is no longer part of {{master}}. I'll 
> rebase this for {{branch-2}}, and {{branch-2.2}}. {{master}} will have to 
> wait till the issues around {{StorageBasedAuthProvider}}, directory 
> permissions, etc. are sorted out.
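
A sketch of the intended behaviour (not [~cdrome]'s exact patch): after creating the temp dir, copy the parent directory's permissions down whenever {{hive.warehouse.subdir.inherit.perms}} is set. The real branch-2 helper also handles group ownership.

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class InheritPermsSketch {
  public static void mkdirInheriting(FileSystem fs, Path dir, Configuration conf)
      throws IOException {
    fs.mkdirs(dir);
    if (conf.getBoolean("hive.warehouse.subdir.inherit.perms", false)) {
      // Propagate the parent's permissions to the new directory.
      FsPermission parentPerm =
          fs.getFileStatus(dir.getParent()).getPermission();
      fs.setPermission(dir, parentPerm);
    }
  }
}
{code}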





[jira] [Updated] (HIVE-17791) Temp dirs under the staging directory should honour `inheritPerms`

2017-10-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17791:

Attachment: HIVE-17791.1.patch

> Temp dirs under the staging directory should honour `inheritPerms`
> --
>
> Key: HIVE-17791
> URL: https://issues.apache.org/jira/browse/HIVE-17791
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17791.1.patch
>
>
> For [~cdrome]:
> CLI creates two levels of staging directories but calls setPermissions on the 
> top-level directory only if {{hive.warehouse.subdir.inherit.perms=true}}.
> The top-level directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1}}
>  is created the first time {{Context.getExternalTmpPath}} is called.
> The child directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1/_tmp.-ext-1}}
>  is created when {{TezTask.execute}} is called at line 164:
> {code:java}
> DAG dag = build(jobConf, work, scratchDir, appJarLr, additionalLr, ctx);
> {code}
> This calls {{DagUtils.createVertex}}, which calls {{Utilities.createTmpDirs}}:
> {code:java}
> private static void createTmpDirs(Configuration conf,
>     List<Operator<? extends OperatorDesc>> ops) throws IOException {
> 
>   while (!ops.isEmpty()) {
>     Operator<? extends OperatorDesc> op = ops.remove(0);
> 
>     if (op instanceof FileSinkOperator) {
>       FileSinkDesc fdesc = ((FileSinkOperator) op).getConf();
>       Path tempDir = fdesc.getDirName();
> 
>       if (tempDir != null) {
>         Path tempPath = Utilities.toTempPath(tempDir);
>         FileSystem fs = tempPath.getFileSystem(conf);
>         fs.mkdirs(tempPath); // <-- HERE!
>       }
>     }
> 
>     if (op.getChildOperators() != null) {
>       ops.addAll(op.getChildOperators());
>     }
>   }
> }
> {code}
> It turns out that {{inheritPerms}} is no longer part of {{master}}. I'll 
> rebase this for {{branch-2}}, and {{branch-2.2}}. {{master}} will have to 
> wait till the issues around {{StorageBasedAuthProvider}}, directory 
> permissions, etc. are sorted out.





[jira] [Updated] (HIVE-17791) Temp dirs under the staging directory should honour `inheritPerms`

2017-10-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17791:

Status: Open  (was: Patch Available)

> Temp dirs under the staging directory should honour `inheritPerms`
> --
>
> Key: HIVE-17791
> URL: https://issues.apache.org/jira/browse/HIVE-17791
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 2.2.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>
> For [~cdrome]:
> CLI creates two levels of staging directories but calls setPermissions on the 
> top-level directory only if {{hive.warehouse.subdir.inherit.perms=true}}.
> The top-level directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1}}
>  is created the first time {{Context.getExternalTmpPath}} is called.
> The child directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1/_tmp.-ext-1}}
>  is created when {{TezTask.execute}} is called at line 164:
> {code:java}
> DAG dag = build(jobConf, work, scratchDir, appJarLr, additionalLr, ctx);
> {code}
> This calls {{DagUtils.createVertex}}, which calls {{Utilities.createTmpDirs}}:
> {code:java}
> private static void createTmpDirs(Configuration conf,
>     List<Operator<? extends OperatorDesc>> ops) throws IOException {
> 
>   while (!ops.isEmpty()) {
>     Operator<? extends OperatorDesc> op = ops.remove(0);
> 
>     if (op instanceof FileSinkOperator) {
>       FileSinkDesc fdesc = ((FileSinkOperator) op).getConf();
>       Path tempDir = fdesc.getDirName();
> 
>       if (tempDir != null) {
>         Path tempPath = Utilities.toTempPath(tempDir);
>         FileSystem fs = tempPath.getFileSystem(conf);
>         fs.mkdirs(tempPath); // <-- HERE!
>       }
>     }
> 
>     if (op.getChildOperators() != null) {
>       ops.addAll(op.getChildOperators());
>     }
>   }
> }
> {code}
> It turns out that {{inheritPerms}} is no longer part of {{master}}. I'll 
> rebase this for {{branch-2}}, and {{branch-2.2}}. {{master}} will have to 
> wait till the issues around {{StorageBasedAuthProvider}}, directory 
> permissions, etc. are sorted out.





[jira] [Updated] (HIVE-17791) Temp dirs under the staging directory should honour `inheritPerms`

2017-10-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17791:

Attachment: (was: HIVE-17791.1.patch)

> Temp dirs under the staging directory should honour `inheritPerms`
> --
>
> Key: HIVE-17791
> URL: https://issues.apache.org/jira/browse/HIVE-17791
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 2.2.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>
> For [~cdrome]:
> CLI creates two levels of staging directories but calls setPermissions on the 
> top-level directory only if {{hive.warehouse.subdir.inherit.perms=true}}.
> The top-level directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1}}
>  is created the first time {{Context.getExternalTmpPath}} is called.
> The child directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1/_tmp.-ext-1}}
>  is created when {{TezTask.execute}} is called at line 164:
> {code:java}
> DAG dag = build(jobConf, work, scratchDir, appJarLr, additionalLr, ctx);
> {code}
> This calls {{DagUtils.createVertex}}, which calls {{Utilities.createTmpDirs}}:
> {code:java}
> private static void createTmpDirs(Configuration conf,
>     List<Operator<? extends OperatorDesc>> ops) throws IOException {
> 
>   while (!ops.isEmpty()) {
>     Operator<? extends OperatorDesc> op = ops.remove(0);
> 
>     if (op instanceof FileSinkOperator) {
>       FileSinkDesc fdesc = ((FileSinkOperator) op).getConf();
>       Path tempDir = fdesc.getDirName();
> 
>       if (tempDir != null) {
>         Path tempPath = Utilities.toTempPath(tempDir);
>         FileSystem fs = tempPath.getFileSystem(conf);
>         fs.mkdirs(tempPath); // <-- HERE!
>       }
>     }
> 
>     if (op.getChildOperators() != null) {
>       ops.addAll(op.getChildOperators());
>     }
>   }
> }
> {code}
> It turns out that {{inheritPerms}} is no longer part of {{master}}. I'll 
> rebase this for {{branch-2}}, and {{branch-2.2}}. {{master}} will have to 
> wait till the issues around {{StorageBasedAuthProvider}}, directory 
> permissions, etc. are sorted out.





[jira] [Updated] (HIVE-17791) Temp dirs under the staging directory should honour `inheritPerms`

2017-10-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17791:

Affects Version/s: 2.4.0
   2.2.0
   Status: Patch Available  (was: Open)

> Temp dirs under the staging directory should honour `inheritPerms`
> --
>
> Key: HIVE-17791
> URL: https://issues.apache.org/jira/browse/HIVE-17791
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 2.2.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>
> For [~cdrome]:
> CLI creates two levels of staging directories but calls setPermissions on the 
> top-level directory only if {{hive.warehouse.subdir.inherit.perms=true}}.
> The top-level directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1}}
>  is created the first time {{Context.getExternalTmpPath}} is called.
> The child directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1/_tmp.-ext-1}}
>  is created when {{TezTask.execute}} is called at line 164:
> {code:java}
> DAG dag = build(jobConf, work, scratchDir, appJarLr, additionalLr, ctx);
> {code}
> This calls {{DagUtils.createVertex}}, which calls {{Utilities.createTmpDirs}}:
> {code:java}
> 3770   private static void createTmpDirs(Configuration conf,
> 3771   List<Operator<? extends OperatorDesc>> ops) throws IOException {
> 3772 
> 3773 while (!ops.isEmpty()) {
> 3774   Operator<? extends OperatorDesc> op = ops.remove(0);
> 3775 
> 3776   if (op instanceof FileSinkOperator) {
> 3777 FileSinkDesc fdesc = ((FileSinkOperator) op).getConf();
> 3778 Path tempDir = fdesc.getDirName();
> 3779 
> 3780 if (tempDir != null) {
> 3781   Path tempPath = Utilities.toTempPath(tempDir);
> 3782   FileSystem fs = tempPath.getFileSystem(conf);
> 3783   fs.mkdirs(tempPath); // <-- HERE!
> 3784 }
> 3785   }
> 3786 
> 3787   if (op.getChildOperators() != null) {
> 3788 ops.addAll(op.getChildOperators());
> 3789   }
> 3790 }
> 3791   }
> {code}
> It turns out that {{inheritPerms}} is no longer part of {{master}}. I'll 
> rebase this for {{branch-2}}, and {{branch-2.2}}. {{master}} will have to 
> wait till the issues around {{StorageBasedAuthProvider}}, directory 
> permissions, etc. are sorted out.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17791) Temp dirs under the staging directory should honour `inheritPerms`

2017-10-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17791:

Status: Patch Available  (was: Open)

> Temp dirs under the staging directory should honour `inheritPerms`
> --
>
> Key: HIVE-17791
> URL: https://issues.apache.org/jira/browse/HIVE-17791
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 2.2.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17791.1-branch-2.patch
>
>
> For [~cdrome]:
> CLI creates two levels of staging directories but calls setPermissions on the 
> top-level directory only if {{hive.warehouse.subdir.inherit.perms=true}}.
> The top-level directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1}}
>  is created the first time {{Context.getExternalTmpPath}} is called.
> The child directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1/_tmp.-ext-1}}
>  is created when {{TezTask.execute}} is called at line 164:
> {code:java}
> DAG dag = build(jobConf, work, scratchDir, appJarLr, additionalLr, ctx);
> {code}
> This calls {{DagUtils.createVertex}}, which calls {{Utilities.createTmpDirs}}:
> {code:java}
> 3770   private static void createTmpDirs(Configuration conf,
> 3771   List<Operator<? extends OperatorDesc>> ops) throws IOException {
> 3772 
> 3773 while (!ops.isEmpty()) {
> 3774   Operator<? extends OperatorDesc> op = ops.remove(0);
> 3775 
> 3776   if (op instanceof FileSinkOperator) {
> 3777 FileSinkDesc fdesc = ((FileSinkOperator) op).getConf();
> 3778 Path tempDir = fdesc.getDirName();
> 3779 
> 3780 if (tempDir != null) {
> 3781   Path tempPath = Utilities.toTempPath(tempDir);
> 3782   FileSystem fs = tempPath.getFileSystem(conf);
> 3783   fs.mkdirs(tempPath); // <-- HERE!
> 3784 }
> 3785   }
> 3786 
> 3787   if (op.getChildOperators() != null) {
> 3788 ops.addAll(op.getChildOperators());
> 3789   }
> 3790 }
> 3791   }
> {code}
> It turns out that {{inheritPerms}} is no longer part of {{master}}. I'll 
> rebase this for {{branch-2}}, and {{branch-2.2}}. {{master}} will have to 
> wait till the issues around {{StorageBasedAuthProvider}}, directory 
> permissions, etc. are sorted out.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17791) Temp dirs under the staging directory should honour `inheritPerms`

2017-10-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17791:

Attachment: HIVE-17791.1-branch-2.patch

> Temp dirs under the staging directory should honour `inheritPerms`
> --
>
> Key: HIVE-17791
> URL: https://issues.apache.org/jira/browse/HIVE-17791
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 2.2.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17791.1-branch-2.patch
>
>
> For [~cdrome]:
> CLI creates two levels of staging directories but calls setPermissions on the 
> top-level directory only if {{hive.warehouse.subdir.inherit.perms=true}}.
> The top-level directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1}}
>  is created the first time {{Context.getExternalTmpPath}} is called.
> The child directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1/_tmp.-ext-1}}
>  is created when {{TezTask.execute}} is called at line 164:
> {code:java}
> DAG dag = build(jobConf, work, scratchDir, appJarLr, additionalLr, ctx);
> {code}
> This calls {{DagUtils.createVertex}}, which calls {{Utilities.createTmpDirs}}:
> {code:java}
> 3770   private static void createTmpDirs(Configuration conf,
> 3771   List<Operator<? extends OperatorDesc>> ops) throws IOException {
> 3772 
> 3773 while (!ops.isEmpty()) {
> 3774   Operator<? extends OperatorDesc> op = ops.remove(0);
> 3775 
> 3776   if (op instanceof FileSinkOperator) {
> 3777 FileSinkDesc fdesc = ((FileSinkOperator) op).getConf();
> 3778 Path tempDir = fdesc.getDirName();
> 3779 
> 3780 if (tempDir != null) {
> 3781   Path tempPath = Utilities.toTempPath(tempDir);
> 3782   FileSystem fs = tempPath.getFileSystem(conf);
> 3783   fs.mkdirs(tempPath); // <-- HERE!
> 3784 }
> 3785   }
> 3786 
> 3787   if (op.getChildOperators() != null) {
> 3788 ops.addAll(op.getChildOperators());
> 3789   }
> 3790 }
> 3791   }
> {code}
> It turns out that {{inheritPerms}} is no longer part of {{master}}. I'll 
> rebase this for {{branch-2}}, and {{branch-2.2}}. {{master}} will have to 
> wait till the issues around {{StorageBasedAuthProvider}}, directory 
> permissions, etc. are sorted out.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17791) Temp dirs under the staging directory should honour `inheritPerms`

2017-10-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17791:

Attachment: HIVE-17791.1-branch-2.2.patch

> Temp dirs under the staging directory should honour `inheritPerms`
> --
>
> Key: HIVE-17791
> URL: https://issues.apache.org/jira/browse/HIVE-17791
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 2.2.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17791.1-branch-2.2.patch, 
> HIVE-17791.1-branch-2.patch
>
>
> For [~cdrome]:
> CLI creates two levels of staging directories but calls setPermissions on the 
> top-level directory only if {{hive.warehouse.subdir.inherit.perms=true}}.
> The top-level directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1}}
>  is created the first time {{Context.getExternalTmpPath}} is called.
> The child directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1/_tmp.-ext-1}}
>  is created when {{TezTask.execute}} is called at line 164:
> {code:java}
> DAG dag = build(jobConf, work, scratchDir, appJarLr, additionalLr, ctx);
> {code}
> This calls {{DagUtils.createVertex}}, which calls {{Utilities.createTmpDirs}}:
> {code:java}
> 3770   private static void createTmpDirs(Configuration conf,
> 3771   List<Operator<? extends OperatorDesc>> ops) throws IOException {
> 3772 
> 3773 while (!ops.isEmpty()) {
> 3774   Operator<? extends OperatorDesc> op = ops.remove(0);
> 3775 
> 3776   if (op instanceof FileSinkOperator) {
> 3777 FileSinkDesc fdesc = ((FileSinkOperator) op).getConf();
> 3778 Path tempDir = fdesc.getDirName();
> 3779 
> 3780 if (tempDir != null) {
> 3781   Path tempPath = Utilities.toTempPath(tempDir);
> 3782   FileSystem fs = tempPath.getFileSystem(conf);
> 3783   fs.mkdirs(tempPath); // <-- HERE!
> 3784 }
> 3785   }
> 3786 
> 3787   if (op.getChildOperators() != null) {
> 3788 ops.addAll(op.getChildOperators());
> 3789   }
> 3790 }
> 3791   }
> {code}
> It turns out that {{inheritPerms}} is no longer part of {{master}}. I'll 
> rebase this for {{branch-2}}, and {{branch-2.2}}. {{master}} will have to 
> wait till the issues around {{StorageBasedAuthProvider}}, directory 
> permissions, etc. are sorted out.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17791) Temp dirs under the staging directory should honour `inheritPerms`

2017-10-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17791:

Description: 
For [~cdrome]:

CLI creates two levels of staging directories but calls setPermissions on the 
top-level directory only if {{hive.warehouse.subdir.inherit.perms=true}}.

The top-level directory, 
{{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1}}
 is created the first time {{Context.getExternalTmpPath}} is called.

The child directory, 
{{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1/_tmp.-ext-1}}
 is created when {{TezTask.execute}} is called at line 164:

{code:java}
DAG dag = build(jobConf, work, scratchDir, appJarLr, additionalLr, ctx);
{code}

This calls {{DagUtils.createVertex}}, which calls {{Utilities.createTmpDirs}}:

{code:java}
3770   private static void createTmpDirs(Configuration conf,
3771   List<Operator<? extends OperatorDesc>> ops) throws IOException {
3772 
3773 while (!ops.isEmpty()) {
3774   Operator<? extends OperatorDesc> op = ops.remove(0);
3775 
3776   if (op instanceof FileSinkOperator) {
3777 FileSinkDesc fdesc = ((FileSinkOperator) op).getConf();
3778 Path tempDir = fdesc.getDirName();
3779 
3780 if (tempDir != null) {
3781   Path tempPath = Utilities.toTempPath(tempDir);
3782   FileSystem fs = tempPath.getFileSystem(conf);
3783   fs.mkdirs(tempPath); // <-- HERE!
3784 }
3785   }
3786 
3787   if (op.getChildOperators() != null) {
3788 ops.addAll(op.getChildOperators());
3789   }
3790 }
3791   }
{code}

It turns out that {{inheritPerms}} is no longer part of {{master}}. I'll rebase 
this for {{branch-2}}, and {{branch-2.2}}. {{master}} will have to wait till 
the issues around {{StorageBasedAuthProvider}}, directory permissions, etc. are 
sorted out.

(Note to self: YHIVE-857)

  was:
For [~cdrome]:

CLI creates two levels of staging directories but calls setPermissions on the 
top-level directory only if {{hive.warehouse.subdir.inherit.perms=true}}.

The top-level directory, 
{{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1}}
 is created the first time {{Context.getExternalTmpPath}} is called.

The child directory, 
{{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1/_tmp.-ext-1}}
 is created when {{TezTask.execute}} is called at line 164:

{code:java}
DAG dag = build(jobConf, work, scratchDir, appJarLr, additionalLr, ctx);
{code}

This calls {{DagUtils.createVertex}}, which calls {{Utilities.createTmpDirs}}:

{code:java}
3770   private static void createTmpDirs(Configuration conf,
3771   List<Operator<? extends OperatorDesc>> ops) throws IOException {
3772 
3773 while (!ops.isEmpty()) {
3774   Operator<? extends OperatorDesc> op = ops.remove(0);
3775 
3776   if (op instanceof FileSinkOperator) {
3777 FileSinkDesc fdesc = ((FileSinkOperator) op).getConf();
3778 Path tempDir = fdesc.getDirName();
3779 
3780 if (tempDir != null) {
3781   Path tempPath = Utilities.toTempPath(tempDir);
3782   FileSystem fs = tempPath.getFileSystem(conf);
3783   fs.mkdirs(tempPath); // <-- HERE!
3784 }
3785   }
3786 
3787   if (op.getChildOperators() != null) {
3788 ops.addAll(op.getChildOperators());
3789   }
3790 }
3791   }
{code}

It turns out that {{inheritPerms}} is no longer part of {{master}}. I'll rebase 
this for {{branch-2}}, and {{branch-2.2}}. {{master}} will have to wait till 
the issues around {{StorageBasedAuthProvider}}, directory permissions, etc. are 
sorted out.


> Temp dirs under the staging directory should honour `inheritPerms`
> --
>
> Key: HIVE-17791
> URL: https://issues.apache.org/jira/browse/HIVE-17791
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 2.2.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17791.1-branch-2.2.patch, 
> HIVE-17791.1-branch-2.patch
>
>
> For [~cdrome]:
> CLI creates two levels of staging directories but calls setPermissions on the 
> top-level directory only if {{hive.warehouse.subdir.inherit.perms=true}}.
> The top-level directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1}}
>  is created the first time {{Context.getExternalTmpPath}} is called.
> The child directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1/_tmp.-ext-1}}
>  is created when {{TezTask.execute}} is called at line 164:
> {code:java}
> DAG dag = build(jobConf, work, scratchDir, appJarLr, additionalLr, ctx);
> {code}

[jira] [Commented] (HIVE-16874) qurey fail when try to read file from remote hdfs

2017-10-12 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202624#comment-16202624
 ] 

Mithun Radhakrishnan commented on HIVE-16874:
-

[~thejas] is right. I think this was resolved as part of HIVE-14380. 
[~daniel.yj.zh...@gmail.com], could you please confirm whether that solution 
resolves this problem?

> qurey fail when try to read file from remote hdfs
> -
>
> Key: HIVE-16874
> URL: https://issues.apache.org/jira/browse/HIVE-16874
> Project: Hive
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: 1.2.1
>Reporter: Yunjian Zhang
> Attachments: HIVE-11116.ext.patch
>
>
> As per an extended issue on HIVE-11116, table joins and inserts on remote HDFS 
> storage will fail with the same issue.
> Based on 
> https://issues.apache.org/jira/secure/attachment/12820392/HIVE-11116.1.patch, 
> the attached patch will fix the issues mentioned here.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-11116) Can not select data from table which points to remote hdfs location

2017-10-12 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202630#comment-16202630
 ] 

Mithun Radhakrishnan commented on HIVE-11116:
-

[~sushanth] is right. This should have been resolved as part of HIVE-14380. If 
so, can this JIRA be closed?

> Can not select data from table which points to remote hdfs location
> ---
>
> Key: HIVE-11116
> URL: https://issues.apache.org/jira/browse/HIVE-11116
> Project: Hive
>  Issue Type: Bug
>  Components: Encryption
>Affects Versions: 1.2.0, 1.1.0, 1.3.0, 2.0.0
>Reporter: Alexander Pivovarov
>Assignee: David Karoly
> Attachments: HIVE-11116.1.patch
>
>
> I tried to create a new table which points to a remote HDFS location, and to 
> select data from it.
> It works for hive-0.14 and hive-1.0, but it does not work starting from 
> hive-1.1.
> to reproduce the issue
> 1. create folder on remote hdfs
> {code}
> hadoop fs -mkdir -p hdfs://remote-nn/tmp/et1
> {code}
> 2. create table 
> {code}
> CREATE TABLE et1 (
>   a string
> ) stored as textfile
> LOCATION 'hdfs://remote-nn/tmp/et1';
> {code}
> 3. run select
> {code}
> select * from et1 limit 10;
> {code}
> 4. Should get the following error
> {code}
> select * from et1;
> 15/06/25 13:43:44 [main]: ERROR parse.CalcitePlanner: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to determine if 
> hdfs://remote-nn/tmp/et1is encrypted: java.lang.IllegalArgumentException: 
> Wrong FS: hdfs://remote-nn/tmp/et1, expected: hdfs://localhost:8020
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1763)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStagingDirectoryPathname(SemanticAnalyzer.java:1875)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1689)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1427)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10132)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10147)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:190)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://remote-nn/tmp/et1, expected: hdfs://localhost:8020
>   at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1906)
>   at 
> org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
>   at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1097)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1759)
>   ... 25 more
> FAILED: SemanticException Unable to determine if hdfs://remote-nn/tmp/et1is 
> encrypted: java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://remote-nn/tmp/et1, expected: hdfs://localhost:8020
> 15/06/25 13:43:44 [main]: ERROR ql.Driver: FAILED: SemanticException Unable 
> to determine if hdfs://remote-nn/tmp/et1is encrypted: 
> java.lang.IllegalArgumentException: Wrong FS: hdfs://remote-nn/tmp/et1, 
> expected: hdfs://localhost:8020
> {code}

[jira] [Updated] (HIVE-14380) Queries on tables with remote HDFS paths fail in "encryption" checks.

2017-10-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-14380:

Description: 
If a table has table/partition locations set to remote HDFS paths, querying 
them will cause the following IAException:

{noformat}
2016-07-26 01:16:27,471 ERROR parse.CalcitePlanner 
(SemanticAnalyzer.java:getMetaData(1867)) - 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to determine if 
hdfs://foo.ygrid.yahoo.com:8020/projects/my_db/my_table is encrypted: 
java.lang.IllegalArgumentException: Wrong FS: 
hdfs://foo.ygrid.yahoo.com:8020/projects/my_db/my_table, expected: 
hdfs://bar.ygrid.yahoo.com:8020
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:2204)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStrongestEncryptedTablePath(SemanticAnalyzer.java:2274)
...
{noformat}

This is because of the following code in {{SessionState}}:
{code:title=SessionState.java|borderStyle=solid}
 public HadoopShims.HdfsEncryptionShim getHdfsEncryptionShim() throws 
HiveException {
if (hdfsEncryptionShim == null) {
  try {
FileSystem fs = FileSystem.get(sessionConf);
if ("hdfs".equals(fs.getUri().getScheme())) {
  hdfsEncryptionShim = 
ShimLoader.getHadoopShims().createHdfsEncryptionShim(fs, sessionConf);
} else {
  LOG.debug("Could not get hdfsEncryptionShim, it is only applicable to 
hdfs filesystem.");
}
  } catch (Exception e) {
throw new HiveException(e);
  }
}

return hdfsEncryptionShim;
  }
{code}

When the {{FileSystem}} instance is created, using the {{sessionConf}} implies 
that the current HDFS is going to be used. This call should instead fetch the 
{{FileSystem}} instance corresponding to the path being checked.

A fix is forthcoming...

(Note to self: YHIVE-860)

  was:
If a table has table/partition locations set to remote HDFS paths, querying 
them will cause the following IAException:

{noformat}
2016-07-26 01:16:27,471 ERROR parse.CalcitePlanner 
(SemanticAnalyzer.java:getMetaData(1867)) - 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to determine if 
hdfs://foo.ygrid.yahoo.com:8020/projects/my_db/my_table is encrypted: 
java.lang.IllegalArgumentException: Wrong FS: 
hdfs://foo.ygrid.yahoo.com:8020/projects/my_db/my_table, expected: 
hdfs://bar.ygrid.yahoo.com:8020
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:2204)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStrongestEncryptedTablePath(SemanticAnalyzer.java:2274)
...
{noformat}

This is because of the following code in {{SessionState}}:
{code:title=SessionState.java|borderStyle=solid}
 public HadoopShims.HdfsEncryptionShim getHdfsEncryptionShim() throws 
HiveException {
if (hdfsEncryptionShim == null) {
  try {
FileSystem fs = FileSystem.get(sessionConf);
if ("hdfs".equals(fs.getUri().getScheme())) {
  hdfsEncryptionShim = 
ShimLoader.getHadoopShims().createHdfsEncryptionShim(fs, sessionConf);
} else {
  LOG.debug("Could not get hdfsEncryptionShim, it is only applicable to 
hdfs filesystem.");
}
  } catch (Exception e) {
throw new HiveException(e);
  }
}

return hdfsEncryptionShim;
  }
{code}

When the {{FileSystem}} instance is created, using the {{sessionConf}} implies 
that the current HDFS is going to be used. This call should instead fetch the 
{{FileSystem}} instance corresponding to the path being checked.

A fix is forthcoming...


> Queries on tables with remote HDFS paths fail in "encryption" checks.
> -
>
> Key: HIVE-14380
> URL: https://issues.apache.org/jira/browse/HIVE-14380
> Project: Hive
>  Issue Type: Bug
>  Components: Encryption
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 2.2.0
>
> Attachments: HIVE-14380.1.patch, HIVE-14380.branch-1.2.patch
>
>
> If a table has table/partition locations set to remote HDFS paths, querying 
> them will cause the following IAException:
> {noformat}
> 2016-07-26 01:16:27,471 ERROR parse.CalcitePlanner 
> (SemanticAnalyzer.java:getMetaData(1867)) - 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to determine if 
> hdfs://foo.ygrid.yahoo.com:8020/projects/my_db/my_table is encrypted: 
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://foo.ygrid.yahoo.com:8020/projects/my_db/my_table, expected: 
> hdfs://bar.ygrid.yahoo.com:8020
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:2204)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStrongestEncryptedTablePath(SemanticAnalyzer.java:2274)
> ...
> {noformat}
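
The description above already names the direction of the fix: fetch the {{FileSystem}} for the path under inspection rather than the session default. A sketch of that shape follows, with a stand-in shim type so the example stays self-contained; in Hive the shim would come from {{ShimLoader.getHadoopShims().createHdfsEncryptionShim(fs, conf)}}, and the class and method names here are illustrative:

{code:java}
import java.io.IOException;
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PathAwareEncryptionCheck {

  /** Stand-in for HadoopShims.HdfsEncryptionShim, for illustration only. */
  public interface EncryptionShim { }

  private final Map<URI, EncryptionShim> shimPerUri = new ConcurrentHashMap<>();
  private final Configuration conf;

  public PathAwareEncryptionCheck(Configuration conf) {
    this.conf = conf;
  }

  /**
   * Returns the encryption shim for the filesystem that 'path' lives on,
   * or null for non-HDFS paths. Unlike FileSystem.get(sessionConf), this
   * works for tables whose locations point at a *remote* HDFS.
   */
  public EncryptionShim getShimFor(Path path) throws IOException {
    FileSystem fs = path.getFileSystem(conf); // the path's own filesystem
    if (!"hdfs".equals(fs.getUri().getScheme())) {
      return null; // encryption checks only apply to HDFS
    }
    // One shim per namenode URI, so local and remote clusters don't collide.
    return shimPerUri.computeIfAbsent(fs.getUri(),
        uri -> new EncryptionShim() { });
  }
}
{code}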

[jira] [Assigned] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-10-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17794:
---


> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks with the following trace:
> {noformat}
> TaskAttempt 1 failed, info=[Error: Failure while running 
> task:org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: kite_composites_with_segments: Local Rearrange[tuple]
> {chararray}(false) - scope-555-> scope-974 Operator Key: scope-555): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: gup: New For Each(false,false)[bag]
> - scope-548 Operator Key: scope-548): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: gup_filtered: Filter[bag]
> - scope-522 Operator Key: scope-522): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:287)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:127)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:376)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:241)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:362)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> Exception while executing (Name: gup: New For Each(false,false)[bag]
> - scope-548 Operator Key: scope-548): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: gup_filtered: Filter[bag]
> - scope-522 Operator Key: scope-522): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:252)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
> ... 17 more
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> Exception while executing (Name: gup_filtered: Filter[bag]
> - scope-522 Operator Key: scope-522): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNextTuple(POFilter.java:90)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator
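
Underneath the trace, the conversion fails because {{HCatLoader}} converts each record against the table's evolved struct schema while older files carry one fewer struct member. A minimal sketch of the tolerant behaviour, under the assumption that new members are only ever appended at the end of the struct (the class and method names are illustrative, not the patch itself):

{code:java}
import java.util.ArrayList;
import java.util.List;

public final class StructEvolution {
  private StructEvolution() {}

  /**
   * Pads the field values of a struct read from an old file out to the
   * reader's (table-level) field count. Members added after the row was
   * written come back as null instead of causing a conversion error.
   */
  public static List<Object> padToReaderSchema(List<Object> writtenFields,
      int readerFieldCount) {
    if (writtenFields.size() > readerFieldCount) {
      throw new IllegalArgumentException(
          "Row has more struct members than the reader schema allows");
    }
    List<Object> padded = new ArrayList<>(readerFieldCount);
    padded.addAll(writtenFields);
    while (padded.size() < readerFieldCount) {
      padded.add(null); // member added to the struct after this row was written
    }
    return padded;
  }
}
{code}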

[jira] [Updated] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-10-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17794:

Status: Patch Available  (was: Open)

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.1.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks with the following trace:
> {noformat}
> TaskAttempt 1 failed, info=[Error: Failure while running 
> task:org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: kite_composites_with_segments: Local Rearrange[tuple]
> {chararray}(false) - scope-555-> scope-974 Operator Key: scope-555): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: gup: New For Each(false,false)[bag]
> - scope-548 Operator Key: scope-548): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: gup_filtered: Filter[bag]
> - scope-522 Operator Key: scope-522): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:287)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:127)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:376)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:241)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:362)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> Exception while executing (Name: gup: New For Each(false,false)[bag]
> - scope-548 Operator Key: scope-548): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: gup_filtered: Filter[bag]
> - scope-522 Operator Key: scope-522): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:252)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
> ... 17 more
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> Exception while executing (Name: gup_filtered: Filter[bag]
> - scope-522 Operator Key: scope-522): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNextTuple(POFilter.java:90)
> at 

[jira] [Updated] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-10-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17794:

Attachment: HIVE-17794.1.patch

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.1.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks with the following trace:
> {noformat}
> TaskAttempt 1 failed, info=[Error: Failure while running 
> task:org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: kite_composites_with_segments: Local Rearrange[tuple]
> {chararray}(false) - scope-555-> scope-974 Operator Key: scope-555): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: gup: New For Each(false,false)[bag]
> - scope-548 Operator Key: scope-548): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: gup_filtered: Filter[bag]
> - scope-522 Operator Key: scope-522): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:287)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:127)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:376)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:241)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:362)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> Exception while executing (Name: gup: New For Each(false,false)[bag]
> - scope-548 Operator Key: scope-548): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: gup_filtered: Filter[bag]
> - scope-522 Operator Key: scope-522): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:252)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
> ... 17 more
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> Exception while executing (Name: gup_filtered: Filter[bag]
> - scope-522 Operator Key: scope-522): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNextTuple(POFilter.java:90)
> at 
> org.

[jira] [Resolved] (HIVE-16874) qurey fail when try to read file from remote hdfs

2017-10-12 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan resolved HIVE-16874.
-
Resolution: Duplicate

Superb. Thanks for confirming, [~daniel.yj.zh...@gmail.com]. I'll close this 
JIRA.

> qurey fail when try to read file from remote hdfs
> -
>
> Key: HIVE-16874
> URL: https://issues.apache.org/jira/browse/HIVE-16874
> Project: Hive
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: 1.2.1
>Reporter: Yunjian Zhang
> Attachments: HIVE-11116.ext.patch
>
>
> As per an extended issue on HIVE-11116, table joins and inserts on remote HDFS 
> storage will fail with the same issue.
> Based on 
> https://issues.apache.org/jira/secure/attachment/12820392/HIVE-11116.1.patch, 
> the attached patch will fix the issues mentioned here.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17791) Temp dirs under the staging directory should honour `inheritPerms`

2017-10-13 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17791:

Status: Open  (was: Patch Available)

Temporarily removing the {{branch-2.2}} patch, to get {{branch-2}} tests to run.

> Temp dirs under the staging directory should honour `inheritPerms`
> --
>
> Key: HIVE-17791
> URL: https://issues.apache.org/jira/browse/HIVE-17791
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 2.2.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17791.1-branch-2.patch
>
>
> For [~cdrome]:
> CLI creates two levels of staging directories but calls setPermissions on the 
> top-level directory only if {{hive.warehouse.subdir.inherit.perms=true}}.
> The top-level directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1}}
>  is created the first time {{Context.getExternalTmpPath}} is called.
> The child directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1/_tmp.-ext-1}}
>  is created when {{TezTask.execute}} is called at line 164:
> {code:java}
> DAG dag = build(jobConf, work, scratchDir, appJarLr, additionalLr, ctx);
> {code}
> This calls {{DagUtils.createVertex}}, which calls {{Utilities.createTmpDirs}}:
> {code:java}
> 3770   private static void createTmpDirs(Configuration conf,
> 3771   List<Operator<? extends OperatorDesc>> ops) throws IOException {
> 3772 
> 3773 while (!ops.isEmpty()) {
> 3774   Operator<? extends OperatorDesc> op = ops.remove(0);
> 3775 
> 3776   if (op instanceof FileSinkOperator) {
> 3777 FileSinkDesc fdesc = ((FileSinkOperator) op).getConf();
> 3778 Path tempDir = fdesc.getDirName();
> 3779 
> 3780 if (tempDir != null) {
> 3781   Path tempPath = Utilities.toTempPath(tempDir);
> 3782   FileSystem fs = tempPath.getFileSystem(conf);
> 3783   fs.mkdirs(tempPath); // <-- HERE!
> 3784 }
> 3785   }
> 3786 
> 3787   if (op.getChildOperators() != null) {
> 3788 ops.addAll(op.getChildOperators());
> 3789   }
> 3790 }
> 3791   }
> {code}
> It turns out that {{inheritPerms}} is no longer part of {{master}}. I'll 
> rebase this for {{branch-2}}, and {{branch-2.2}}. {{master}} will have to 
> wait till the issues around {{StorageBasedAuthProvider}}, directory 
> permissions, etc. are sorted out.
> (Note to self: YHIVE-857)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17791) Temp dirs under the staging directory should honour `inheritPerms`

2017-10-13 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17791:

Attachment: (was: HIVE-17791.1-branch-2.2.patch)

> Temp dirs under the staging directory should honour `inheritPerms`
> --
>
> Key: HIVE-17791
> URL: https://issues.apache.org/jira/browse/HIVE-17791
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 2.2.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17791.1-branch-2.patch
>
>
> For [~cdrome]:
> CLI creates two levels of staging directories but calls setPermissions on the 
> top-level directory only if {{hive.warehouse.subdir.inherit.perms=true}}.
> The top-level directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1}}
>  is created the first time {{Context.getExternalTmpPath}} is called.
> The child directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1/_tmp.-ext-1}}
>  is created when {{TezTask.execute}} is called at line 164:
> {code:java}
> DAG dag = build(jobConf, work, scratchDir, appJarLr, additionalLr, ctx);
> {code}
> This calls {{DagUtils.createVertex}}, which calls {{Utilities.createTmpDirs}}:
> {code:java}
> 3770   private static void createTmpDirs(Configuration conf,
> 3771   List<Operator<? extends OperatorDesc>> ops) throws IOException {
> 3772 
> 3773 while (!ops.isEmpty()) {
> 3774   Operator<? extends OperatorDesc> op = ops.remove(0);
> 3775 
> 3776   if (op instanceof FileSinkOperator) {
> 3777 FileSinkDesc fdesc = ((FileSinkOperator) op).getConf();
> 3778 Path tempDir = fdesc.getDirName();
> 3779 
> 3780 if (tempDir != null) {
> 3781   Path tempPath = Utilities.toTempPath(tempDir);
> 3782   FileSystem fs = tempPath.getFileSystem(conf);
> 3783   fs.mkdirs(tempPath); // <-- HERE!
> 3784 }
> 3785   }
> 3786 
> 3787   if (op.getChildOperators() != null) {
> 3788 ops.addAll(op.getChildOperators());
> 3789   }
> 3790 }
> 3791   }
> {code}
> It turns out that {{inheritPerms}} is no longer part of {{master}}. I'll 
> rebase this for {{branch-2}}, and {{branch-2.2}}. {{master}} will have to 
> wait till the issues around {{StorageBasedAuthProvider}}, directory 
> permissions, etc. are sorted out.
> (Note to self: YHIVE-857)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17791) Temp dirs under the staging directory should honour `inheritPerms`

2017-10-13 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17791:

Status: Patch Available  (was: Open)

> Temp dirs under the staging directory should honour `inheritPerms`
> --
>
> Key: HIVE-17791
> URL: https://issues.apache.org/jira/browse/HIVE-17791
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 2.2.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17791.1-branch-2.patch
>
>
> For [~cdrome]:
> CLI creates two levels of staging directories but calls setPermissions on the 
> top-level directory only if {{hive.warehouse.subdir.inherit.perms=true}}.
> The top-level directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1}}
>  is created the first time {{Context.getExternalTmpPath}} is called.
> The child directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1/_tmp.-ext-1}}
>  is created when {{TezTask.execute}} is called at line 164:
> {code:java}
> DAG dag = build(jobConf, work, scratchDir, appJarLr, additionalLr, ctx);
> {code}
> This calls {{DagUtils.createVertex}}, which calls {{Utilities.createTmpDirs}}:
> {code:java}
> 3770   private static void createTmpDirs(Configuration conf,
> 3771   List<Operator<? extends OperatorDesc>> ops) throws IOException {
> 3772 
> 3773 while (!ops.isEmpty()) {
> 3774   Operator<? extends OperatorDesc> op = ops.remove(0);
> 3775 
> 3776   if (op instanceof FileSinkOperator) {
> 3777 FileSinkDesc fdesc = ((FileSinkOperator) op).getConf();
> 3778 Path tempDir = fdesc.getDirName();
> 3779 
> 3780 if (tempDir != null) {
> 3781   Path tempPath = Utilities.toTempPath(tempDir);
> 3782   FileSystem fs = tempPath.getFileSystem(conf);
> 3783   fs.mkdirs(tempPath); // <-- HERE!
> 3784 }
> 3785   }
> 3786 
> 3787   if (op.getChildOperators() != null) {
> 3788 ops.addAll(op.getChildOperators());
> 3789   }
> 3790 }
> 3791   }
> {code}
> It turns out that {{inheritPerms}} is no longer part of {{master}}. I'll 
> rebase this for {{branch-2}}, and {{branch-2.2}}. {{master}} will have to 
> wait till the issues around {{StorageBasedAuthProvider}}, directory 
> permissions, etc. are sorted out.
> (Note to self: YHIVE-857)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17802) Remove unnecessary calls to FileSystem.setOwner() from FileOutputCommitterContainer

2017-10-13 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17802:
---


> Remove unnecessary calls to FileSystem.setOwner() from 
> FileOutputCommitterContainer
> ---
>
> Key: HIVE-17802
> URL: https://issues.apache.org/jira/browse/HIVE-17802
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>
> For large Pig/HCat queries that produce a large number of 
> partitions/directories/files, we have seen cases where the HDFS NameNode 
> groaned under the weight of {{FileSystem.setOwner()}} calls, originating from 
> the commit-step. This was the result of the following code in 
> FileOutputCommitterContainer:
> {code:java}
> private void applyGroupAndPerms(FileSystem fs, Path dir, FsPermission 
> permission,
>   List<AclEntry> acls, String group, boolean recursive)
> throws IOException {
> ...
> if (recursive) {
>   for (FileStatus fileStatus : fs.listStatus(dir)) {
> if (fileStatus.isDir()) {
>   applyGroupAndPerms(fs, fileStatus.getPath(), permission, acls, 
> group, true);
> } else {
>   fs.setPermission(fileStatus.getPath(), permission);
>   chown(fs, fileStatus.getPath(), group);
> }
>   }
> }
>   }
>   private void chown(FileSystem fs, Path file, String group) throws 
> IOException {
> try {
>   fs.setOwner(file, null, group);
> } catch (AccessControlException ignore) {
>   // Some users have wrong table group, ignore it.
>   LOG.warn("Failed to change group of partition directories/files: " + 
> file, ignore);
> }
>   }
> {code}
> One call per file/directory is far too many. We have a patch that reduces the 
> namenode pressure.
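
One way to cut the RPC count, sketched below under two assumptions: that file permissions fixed at create time are acceptable, and that HDFS's usual inheritance of the parent directory's group makes the per-file {{setOwner()}} redundant. This shows the direction only, not the attached patch:

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.security.AccessControlException;

public final class DirOnlyGroupAndPerms {
  private DirOnlyGroupAndPerms() {}

  /**
   * Applies group and permission to directories only: one setPermission()
   * and one setOwner() per directory, and zero calls per file. Files pick
   * up the directory's group from HDFS when they are created.
   */
  public static void apply(FileSystem fs, Path dir, FsPermission permission,
      String group) throws IOException {
    fs.setPermission(dir, permission);
    try {
      fs.setOwner(dir, null, group);
    } catch (AccessControlException ignore) {
      // Some users have the wrong table group; skip, as the committer does.
    }
    for (FileStatus status : fs.listStatus(dir)) {
      if (status.isDirectory()) {
        apply(fs, status.getPath(), permission, group); // recurse on dirs only
      }
      // No per-file RPCs: this is where the NameNode load disappears.
    }
  }
}
{code}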



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17803) With Pig multi-query, 2 HCatStorers writing to the same table will trample each other's outputs

2017-10-13 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17803:
---


> With Pig multi-query, 2 HCatStorers writing to the same table will trample 
> each other's outputs
> ---
>
> Key: HIVE-17803
> URL: https://issues.apache.org/jira/browse/HIVE-17803
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>
> When Pig scripts use multi-query and {{HCatStorer}} with 
> dynamic-partitioning, and use more than one {{HCatStorer}} instance to write 
> to the same table, they might trample on each other's outputs. The failure 
> looks as follows:
> {noformat}
> Caused by: org.apache.hive.hcatalog.common.HCatException : 2006 : Error 
> adding partition to metastore. Cause : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>  No lease on /projects/foo/bar/activity_date=2016022306/_placeholder (inode 
> 2878224200): File does not exist. [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-1281544466_4952, pendingcreates: 1]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3429)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3517)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3484)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:791)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:537)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server.call(Server.java:2267)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217)
>   at 
> org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(FileOutputCommitterContainer.java:1022)
>   at 
> org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:269)
>   ... 20 more
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>  No lease on /projects/foo/bar/activity_date=2016022306/_placeholder (inode 
> 2878224200): File does not exist. [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-1281544466_4952, pendingcreates: 1]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3429)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3517)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3484)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:791)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:537)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server.call(Server.java:2267)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1457)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1394)
>   at 
> org.apache.hadoop.ipc.ProtobufRp
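
The collision comes from both storers using the same {{_placeholder}} file under the shared partition directory, so whichever job commits second finds its file already gone. A hedged sketch of one possible direction, assuming the placeholder name can be qualified with identifiers unique to each store (every name here is illustrative, not the eventual fix):

{code:java}
import org.apache.hadoop.fs.Path;

public final class PlaceholderPaths {
  private PlaceholderPaths() {}

  /**
   * Builds a placeholder path that is unique to one store operation, so two
   * HCatStorer instances committing into the same partition directory can
   * no longer create and delete each other's placeholder file.
   */
  public static Path uniquePlaceholder(Path partitionDir, String jobId,
      String storeSuffix) {
    // e.g. _placeholder_job_1490000000000_0001_out1
    return new Path(partitionDir,
        "_placeholder_" + jobId + "_" + storeSuffix);
  }
}
{code}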

[jira] [Updated] (HIVE-17803) With Pig multi-query, 2 HCatStorers writing to the same table will trample each other's outputs

2017-10-13 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17803:

Status: Patch Available  (was: Open)

> With Pig multi-query, 2 HCatStorers writing to the same table will trample 
> each other's outputs
> ---
>
> Key: HIVE-17803
> URL: https://issues.apache.org/jira/browse/HIVE-17803
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17803.1.patch
>
>
> When Pig scripts use multi-query and {{HCatStorer}} with 
> dynamic-partitioning, and use more than one {{HCatStorer}} instance to write 
> to the same table, they might trample on each other's outputs. The failure 
> looks as follows:
> {noformat}
> Caused by: org.apache.hive.hcatalog.common.HCatException : 2006 : Error 
> adding partition to metastore. Cause : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>  No lease on /projects/foo/bar/activity_date=2016022306/_placeholder (inode 
> 2878224200): File does not exist. [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-1281544466_4952, pendingcreates: 1]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3429)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3517)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3484)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:791)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:537)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server.call(Server.java:2267)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217)
>   at 
> org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(FileOutputCommitterContainer.java:1022)
>   at 
> org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:269)
>   ... 20 more
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>  No lease on /projects/foo/bar/activity_date=2016022306/_placeholder (inode 
> 2878224200): File does not exist. [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-1281544466_4952, pendingcreates: 1]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3429)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3517)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3484)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:791)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:537)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server.call(Server.java:2267)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1457)
>   at org.apache.hadoop.ip
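
For illustration: the collision arises because both storers share the table's
dynamic-partition scratch space, including the {{_placeholder}} file above. A
minimal sketch of one possible mitigation, giving each storer its own scratch
sub-directory (helper name hypothetical; this is not the actual HIVE-17803 patch):

{code:java}
import java.util.UUID;
import org.apache.hadoop.fs.Path;

final class ScratchDirsSketch {
  // Hypothetical: a unique suffix per HCatStorer instance keeps concurrent
  // storers in one Pig multi-query job from trampling shared scratch files.
  static Path perStorerScratch(Path tableScratchRoot) {
    return new Path(tableScratchRoot, "_scratch-" + UUID.randomUUID());
  }
}
{code}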

[jira] [Updated] (HIVE-17803) With Pig multi-query, 2 HCatStorers writing to the same table will trample each other's outputs

2017-10-13 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17803:

Attachment: HIVE-17803.1.patch

> With Pig multi-query, 2 HCatStorers writing to the same table will trample 
> each other's outputs
> ---
>
> Key: HIVE-17803
> URL: https://issues.apache.org/jira/browse/HIVE-17803
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17803.1.patch
>
>
> When Pig scripts use multi-query and {{HCatStorer}} with 
> dynamic-partitioning, and use more than one {{HCatStorer}} instance to write 
> to the same table, they might trample on each other's outputs. The failure 
> looks as follows:
> {noformat}
> Caused by: org.apache.hive.hcatalog.common.HCatException : 2006 : Error 
> adding partition to metastore. Cause : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>  No lease on /projects/foo/bar/activity_date=2016022306/_placeholder (inode 
> 2878224200): File does not exist. [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-1281544466_4952, pendingcreates: 1]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3429)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3517)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3484)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:791)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:537)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server.call(Server.java:2267)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217)
>   at 
> org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(FileOutputCommitterContainer.java:1022)
>   at 
> org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:269)
>   ... 20 more
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>  No lease on /projects/foo/bar/activity_date=2016022306/_placeholder (inode 
> 2878224200): File does not exist. [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-1281544466_4952, pendingcreates: 1]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3429)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3517)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3484)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:791)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:537)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server.call(Server.java:2267)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1457)
>   at org.apache.hadoop.ipc.Clie

[jira] [Commented] (HIVE-17803) With Pig multi-query, 2 HCatStorers writing to the same table will trample each other's outputs

2017-10-16 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206263#comment-16206263
 ] 

Mithun Radhakrishnan commented on HIVE-17803:
-

The failing tests are old hat. 

+1. Checking in shortly.

> With Pig multi-query, 2 HCatStorers writing to the same table will trample 
> each other's outputs
> ---
>
> Key: HIVE-17803
> URL: https://issues.apache.org/jira/browse/HIVE-17803
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17803.1.patch
>
>
> When Pig scripts use multi-query and {{HCatStorer}} with 
> dynamic-partitioning, and use more than one {{HCatStorer}} instance to write 
> to the same table, they might trample on each other's outputs. The failure 
> looks as follows:
> {noformat}
> Caused by: org.apache.hive.hcatalog.common.HCatException : 2006 : Error 
> adding partition to metastore. Cause : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>  No lease on /projects/foo/bar/activity_date=2016022306/_placeholder (inode 
> 2878224200): File does not exist. [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-1281544466_4952, pendingcreates: 1]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3429)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3517)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3484)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:791)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:537)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server.call(Server.java:2267)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217)
>   at 
> org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(FileOutputCommitterContainer.java:1022)
>   at 
> org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:269)
>   ... 20 more
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>  No lease on /projects/foo/bar/activity_date=2016022306/_placeholder (inode 
> 2878224200): File does not exist. [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-1281544466_4952, pendingcreates: 1]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3429)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3517)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3484)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:791)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:537)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server.call(Server.java:2267)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217)
>   at org.apac

[jira] [Updated] (HIVE-17802) Remove unnecessary calls to FileSystem.setOwner() from FileOutputCommitterContainer

2017-10-16 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17802:

Status: Patch Available  (was: Open)

> Remove unnecessary calls to FileSystem.setOwner() from 
> FileOutputCommitterContainer
> ---
>
> Key: HIVE-17802
> URL: https://issues.apache.org/jira/browse/HIVE-17802
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17802.1.patch
>
>
> For large Pig/HCat queries that produce a large number of 
> partitions/directories/files, we have seen cases where the HDFS NameNode 
> groaned under the weight of {{FileSystem.setOwner()}} calls, originating from 
> the commit-step. This was the result of the following code in 
> FileOutputCommitterContainer:
> {code:java}
> private void applyGroupAndPerms(FileSystem fs, Path dir, FsPermission permission,
>     List<AclEntry> acls, String group, boolean recursive) throws IOException {
>   ...
>   if (recursive) {
>     for (FileStatus fileStatus : fs.listStatus(dir)) {
>       if (fileStatus.isDir()) {
>         applyGroupAndPerms(fs, fileStatus.getPath(), permission, acls, group, true);
>       } else {
>         fs.setPermission(fileStatus.getPath(), permission);
>         chown(fs, fileStatus.getPath(), group);
>       }
>     }
>   }
> }
>
> private void chown(FileSystem fs, Path file, String group) throws IOException {
>   try {
>     fs.setOwner(file, null, group);
>   } catch (AccessControlException ignore) {
>     // Some users have wrong table group, ignore it.
>     LOG.warn("Failed to change group of partition directories/files: " + file, ignore);
>   }
> }
> {code}
> One call per file/directory is far too many. We have a patch that reduces the 
> namenode pressure.
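
For illustration, a minimal sketch of the kind of reduction involved, assuming
HDFS's usual semantics in which a newly created file inherits its parent
directory's group (helper name hypothetical; this is not the actual patch):

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

final class GroupAndPermsSketch {
  // Hypothetical: set the group once per directory and drop the per-file
  // setOwner() RPCs, since files already inherited the group at create time.
  static void applyGroupAndPermsLean(FileSystem fs, Path dir,
      FsPermission perm, String group) throws IOException {
    fs.setPermission(dir, perm);
    fs.setOwner(dir, null, group); // one group RPC per directory, not per file
    for (FileStatus st : fs.listStatus(dir)) {
      if (st.isDirectory()) {
        applyGroupAndPermsLean(fs, st.getPath(), perm, group);
      } else {
        fs.setPermission(st.getPath(), perm); // permission fix-up still needed
      }
    }
  }
}
{code}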



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17802) Remove unnecessary calls to FileSystem.setOwner() from FileOutputCommitterContainer

2017-10-16 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17802:

Attachment: HIVE-17802.1.patch

Cumulative patch with HIVE-17802, HIVE-17803, and part of HIVE-13989.

> Remove unnecessary calls to FileSystem.setOwner() from 
> FileOutputCommitterContainer
> ---
>
> Key: HIVE-17802
> URL: https://issues.apache.org/jira/browse/HIVE-17802
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17802.1.patch
>
>
> For large Pig/HCat queries that produce a large number of 
> partitions/directories/files, we have seen cases where the HDFS NameNode 
> groaned under the weight of {{FileSystem.setOwner()}} calls, originating from 
> the commit-step. This was the result of the following code in 
> FileOutputCommitterContainer:
> {code:java}
> private void applyGroupAndPerms(FileSystem fs, Path dir, FsPermission permission,
>     List<AclEntry> acls, String group, boolean recursive) throws IOException {
>   ...
>   if (recursive) {
>     for (FileStatus fileStatus : fs.listStatus(dir)) {
>       if (fileStatus.isDir()) {
>         applyGroupAndPerms(fs, fileStatus.getPath(), permission, acls, group, true);
>       } else {
>         fs.setPermission(fileStatus.getPath(), permission);
>         chown(fs, fileStatus.getPath(), group);
>       }
>     }
>   }
> }
>
> private void chown(FileSystem fs, Path file, String group) throws IOException {
>   try {
>     fs.setOwner(file, null, group);
>   } catch (AccessControlException ignore) {
>     // Some users have wrong table group, ignore it.
>     LOG.warn("Failed to change group of partition directories/files: " + file, ignore);
>   }
> }
> {code}
> One call per file/directory is far too many. We have a patch that reduces the 
> namenode pressure.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17802) Remove unnecessary calls to FileSystem.setOwner() from FileOutputCommitterContainer

2017-10-17 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16207980#comment-16207980
 ] 

Mithun Radhakrishnan commented on HIVE-17802:
-

Back to the drawing board. :/ I'll check {{TestHCatMultiOutputFormat}} and 
{{TestHCatOutputFormat}}.

> Remove unnecessary calls to FileSystem.setOwner() from 
> FileOutputCommitterContainer
> ---
>
> Key: HIVE-17802
> URL: https://issues.apache.org/jira/browse/HIVE-17802
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17802.1.patch
>
>
> For large Pig/HCat queries that produce a large number of 
> partitions/directories/files, we have seen cases where the HDFS NameNode 
> groaned under the weight of {{FileSystem.setOwner()}} calls, originating from 
> the commit-step. This was the result of the following code in 
> FileOutputCommitterContainer:
> {code:java}
> private void applyGroupAndPerms(FileSystem fs, Path dir, FsPermission permission,
>     List<AclEntry> acls, String group, boolean recursive) throws IOException {
>   ...
>   if (recursive) {
>     for (FileStatus fileStatus : fs.listStatus(dir)) {
>       if (fileStatus.isDir()) {
>         applyGroupAndPerms(fs, fileStatus.getPath(), permission, acls, group, true);
>       } else {
>         fs.setPermission(fileStatus.getPath(), permission);
>         chown(fs, fileStatus.getPath(), group);
>       }
>     }
>   }
> }
>
> private void chown(FileSystem fs, Path file, String group) throws IOException {
>   try {
>     fs.setOwner(file, null, group);
>   } catch (AccessControlException ignore) {
>     // Some users have wrong table group, ignore it.
>     LOG.warn("Failed to change group of partition directories/files: " + file, ignore);
>   }
> }
> {code}
> One call per file/directory is far too many. We have a patch that reduces the 
> namenode pressure.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17802) Remove unnecessary calls to FileSystem.setOwner() from FileOutputCommitterContainer

2017-10-17 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17802:

Attachment: HIVE-17802.2.patch

> Remove unnecessary calls to FileSystem.setOwner() from 
> FileOutputCommitterContainer
> ---
>
> Key: HIVE-17802
> URL: https://issues.apache.org/jira/browse/HIVE-17802
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17802.1.patch, HIVE-17802.2.patch
>
>
> For large Pig/HCat queries that produce a large number of 
> partitions/directories/files, we have seen cases where the HDFS NameNode 
> groaned under the weight of {{FileSystem.setOwner()}} calls, originating from 
> the commit-step. This was the result of the following code in 
> FileOutputCommitterContainer:
> {code:java}
> private void applyGroupAndPerms(FileSystem fs, Path dir, FsPermission permission,
>     List<AclEntry> acls, String group, boolean recursive) throws IOException {
>   ...
>   if (recursive) {
>     for (FileStatus fileStatus : fs.listStatus(dir)) {
>       if (fileStatus.isDir()) {
>         applyGroupAndPerms(fs, fileStatus.getPath(), permission, acls, group, true);
>       } else {
>         fs.setPermission(fileStatus.getPath(), permission);
>         chown(fs, fileStatus.getPath(), group);
>       }
>     }
>   }
> }
>
> private void chown(FileSystem fs, Path file, String group) throws IOException {
>   try {
>     fs.setOwner(file, null, group);
>   } catch (AccessControlException ignore) {
>     // Some users have wrong table group, ignore it.
>     LOG.warn("Failed to change group of partition directories/files: " + file, ignore);
>   }
> }
> {code}
> One call per file/directory is far too many. We have a patch that reduces the 
> namenode pressure.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17747) HMS DropTableMessage should include the full table object

2017-10-18 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16209810#comment-16209810
 ] 

Mithun Radhakrishnan commented on HIVE-17747:
-

I regret being late to this party, I truly do.

The code in {{hive/metastore/messaging}} mirrors that in 
{{hcatalog/messaging}}, which uses sparse messages. The entire motivation 
behind keeping the messages devoid of the actual Thrift object ({{Database}}, 
{{Table}}, {{Partition}}, {{StorageDescriptor}}) is that they bloat the size of 
the message. It is not uncommon for us in production to have >1000 partitions 
added in a single {{addPartitions()}} call. {{Partition}} objects tend to 
possess a lot of redundant information, causing the message size to become 
enormous.

It looks like that ship sailed with HIVE-15522. Your solution here is 
consistent with HIVE-15522, so it should stand. :/ I'll raise a separate JIRA 
to deal with sparser messages.
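
For illustration, a sparse add-partition message would carry only the
partition key-values, as in this minimal sketch (field names hypothetical):

{code:java}
import java.util.List;
import java.util.Map;

// Hypothetical "sparse" message: a 1000-partition addPartitions() call stays
// small because no Thrift Partition/StorageDescriptor objects are serialized,
// only the partition key-values themselves.
class SparseAddPartitionMessage {
  String db;
  String table;
  List<Map<String, String>> partitionKeyValues; // e.g. {dt=2016022306}
}
{code}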

> HMS DropTableMessage should include the full table object
> -
>
> Key: HIVE-17747
> URL: https://issues.apache.org/jira/browse/HIVE-17747
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Metastore
>Affects Versions: 2.3.0
>Reporter: Dan Burkert
>Assignee: Dan Burkert
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17747.0.patch
>
>
> I have a notification log follower use-case which requires accessing the 
> parameters of dropped tables, so it would be useful if the {{DROP_TABLE}} 
> events in the notification log included the full table object, as the create 
> and alter events do.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17802) Remove unnecessary calls to FileSystem.setOwner() from FileOutputCommitterContainer

2017-10-18 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17802:

Attachment: HIVE-17802.2-branch-2.patch

> Remove unnecessary calls to FileSystem.setOwner() from 
> FileOutputCommitterContainer
> ---
>
> Key: HIVE-17802
> URL: https://issues.apache.org/jira/browse/HIVE-17802
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17802.1.patch, HIVE-17802.2-branch-2.patch, 
> HIVE-17802.2.patch
>
>
> For large Pig/HCat queries that produce a large number of 
> partitions/directories/files, we have seen cases where the HDFS NameNode 
> groaned under the weight of {{FileSystem.setOwner()}} calls, originating from 
> the commit-step. This was the result of the following code in 
> FileOutputCommitterContainer:
> {code:java}
> private void applyGroupAndPerms(FileSystem fs, Path dir, FsPermission permission,
>     List<AclEntry> acls, String group, boolean recursive) throws IOException {
>   ...
>   if (recursive) {
>     for (FileStatus fileStatus : fs.listStatus(dir)) {
>       if (fileStatus.isDir()) {
>         applyGroupAndPerms(fs, fileStatus.getPath(), permission, acls, group, true);
>       } else {
>         fs.setPermission(fileStatus.getPath(), permission);
>         chown(fs, fileStatus.getPath(), group);
>       }
>     }
>   }
> }
>
> private void chown(FileSystem fs, Path file, String group) throws IOException {
>   try {
>     fs.setOwner(file, null, group);
>   } catch (AccessControlException ignore) {
>     // Some users have wrong table group, ignore it.
>     LOG.warn("Failed to change group of partition directories/files: " + file, ignore);
>   }
> }
> {code}
> One call per file/directory is far too many. We have a patch that reduces the 
> namenode pressure.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17802) Remove unnecessary calls to FileSystem.setOwner() from FileOutputCommitterContainer

2017-10-18 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17802:

Attachment: (was: HIVE-17802.2-branch-2.patch)

> Remove unnecessary calls to FileSystem.setOwner() from 
> FileOutputCommitterContainer
> ---
>
> Key: HIVE-17802
> URL: https://issues.apache.org/jira/browse/HIVE-17802
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17802.1.patch, HIVE-17802.2.patch
>
>
> For large Pig/HCat queries that produce a large number of 
> partitions/directories/files, we have seen cases where the HDFS NameNode 
> groaned under the weight of {{FileSystem.setOwner()}} calls, originating from 
> the commit-step. This was the result of the following code in 
> FileOutputCommitterContainer:
> {code:java}
> private void applyGroupAndPerms(FileSystem fs, Path dir, FsPermission permission,
>     List<AclEntry> acls, String group, boolean recursive) throws IOException {
>   ...
>   if (recursive) {
>     for (FileStatus fileStatus : fs.listStatus(dir)) {
>       if (fileStatus.isDir()) {
>         applyGroupAndPerms(fs, fileStatus.getPath(), permission, acls, group, true);
>       } else {
>         fs.setPermission(fileStatus.getPath(), permission);
>         chown(fs, fileStatus.getPath(), group);
>       }
>     }
>   }
> }
>
> private void chown(FileSystem fs, Path file, String group) throws IOException {
>   try {
>     fs.setOwner(file, null, group);
>   } catch (AccessControlException ignore) {
>     // Some users have wrong table group, ignore it.
>     LOG.warn("Failed to change group of partition directories/files: " + file, ignore);
>   }
> }
> {code}
> One call per file/directory is far too many. We have a patch that reduces the 
> namenode pressure.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-15522) REPL LOAD & DUMP support for incremental ALTER_TABLE/ALTER_PTN including renames

2017-10-18 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16210115#comment-16210115
 ] 

Mithun Radhakrishnan commented on HIVE-15522:
-

With this change, {{AddPartitionMessage}} includes the full-fat 
{{api.Partition}} objects. At least in production at Yahoo, the number of 
partitions added in a single call can run into the thousands. This isn't as 
big a problem for {{CreateTableMessage}}, 
{{CreateDatabaseMessage}}, etc. Depending on how the messages are transmitted 
(e.g. over JMS), the message-size can pose a challenge.
Note that the corresponding code in HCatalog leaves out the Thrift objects. 
I'll raise a separate JIRA to see if we can find a middle ground with regard 
to the Thrift objects.

> REPL LOAD & DUMP support for incremental ALTER_TABLE/ALTER_PTN including 
> renames
> 
>
> Key: HIVE-15522
> URL: https://issues.apache.org/jira/browse/HIVE-15522
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Fix For: 2.2.0
>
> Attachments: HIVE-15522.2.patch, HIVE-15522.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17802) Remove unnecessary calls to FileSystem.setOwner() from FileOutputCommitterContainer

2017-10-18 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17802:

Attachment: HIVE-17802.2-branch-2.patch

> Remove unnecessary calls to FileSystem.setOwner() from 
> FileOutputCommitterContainer
> ---
>
> Key: HIVE-17802
> URL: https://issues.apache.org/jira/browse/HIVE-17802
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17802.1.patch, HIVE-17802.2-branch-2.patch, 
> HIVE-17802.2.patch
>
>
> For large Pig/HCat queries that produce a large number of 
> partitions/directories/files, we have seen cases where the HDFS NameNode 
> groaned under the weight of {{FileSystem.setOwner()}} calls, originating from 
> the commit-step. This was the result of the following code in 
> FileOutputCommitterContainer:
> {code:java}
> private void applyGroupAndPerms(FileSystem fs, Path dir, FsPermission permission,
>     List<AclEntry> acls, String group, boolean recursive) throws IOException {
>   ...
>   if (recursive) {
>     for (FileStatus fileStatus : fs.listStatus(dir)) {
>       if (fileStatus.isDir()) {
>         applyGroupAndPerms(fs, fileStatus.getPath(), permission, acls, group, true);
>       } else {
>         fs.setPermission(fileStatus.getPath(), permission);
>         chown(fs, fileStatus.getPath(), group);
>       }
>     }
>   }
> }
>
> private void chown(FileSystem fs, Path file, String group) throws IOException {
>   try {
>     fs.setOwner(file, null, group);
>   } catch (AccessControlException ignore) {
>     // Some users have wrong table group, ignore it.
>     LOG.warn("Failed to change group of partition directories/files: " + file, ignore);
>   }
> }
> {code}
> One call per file/directory is far too many. We have a patch that reduces the 
> namenode pressure.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17802) Remove unnecessary calls to FileSystem.setOwner() from FileOutputCommitterContainer

2017-10-18 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16210240#comment-16210240
 ] 

Mithun Radhakrishnan commented on HIVE-17802:
-

I've uploaded the {{branch-2.2}} patch that's likely to break 
{{itests/TestExtendedAcls}}. We might have to sort out the call to 
{{FileSystem.setOwner()}} to modify the group, in 
{{FileOutputCommitterContainer::applyGroupAndPerms()}}.

> Remove unnecessary calls to FileSystem.setOwner() from 
> FileOutputCommitterContainer
> ---
>
> Key: HIVE-17802
> URL: https://issues.apache.org/jira/browse/HIVE-17802
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17802.1.patch, HIVE-17802.2-branch-2.patch, 
> HIVE-17802.2.patch
>
>
> For large Pig/HCat queries that produce a large number of 
> partitions/directories/files, we have seen cases where the HDFS NameNode 
> groaned under the weight of {{FileSystem.setOwner()}} calls, originating from 
> the commit-step. This was the result of the following code in 
> FileOutputCommitterContainer:
> {code:java}
> private void applyGroupAndPerms(FileSystem fs, Path dir, FsPermission permission,
>     List<AclEntry> acls, String group, boolean recursive) throws IOException {
>   ...
>   if (recursive) {
>     for (FileStatus fileStatus : fs.listStatus(dir)) {
>       if (fileStatus.isDir()) {
>         applyGroupAndPerms(fs, fileStatus.getPath(), permission, acls, group, true);
>       } else {
>         fs.setPermission(fileStatus.getPath(), permission);
>         chown(fs, fileStatus.getPath(), group);
>       }
>     }
>   }
> }
>
> private void chown(FileSystem fs, Path file, String group) throws IOException {
>   try {
>     fs.setOwner(file, null, group);
>   } catch (AccessControlException ignore) {
>     // Some users have wrong table group, ignore it.
>     LOG.warn("Failed to change group of partition directories/files: " + file, ignore);
>   }
> }
> {code}
> One call per file/directory is far too many. We have a patch that reduces the 
> namenode pressure.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-17802) Remove unnecessary calls to FileSystem.setOwner() from FileOutputCommitterContainer

2017-10-18 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16210240#comment-16210240
 ] 

Mithun Radhakrishnan edited comment on HIVE-17802 at 10/18/17 11:04 PM:


I've uploaded the {{branch-2}} patch that's likely to break 
{{itests/TestExtendedAcls}}. We might have to sort out the call to 
{{FileSystem.setOwner()}} to modify the group, in 
{{FileOutputCommitterContainer::applyGroupAndPerms()}}.


was (Author: mithun):
I've uploaded the {{branch-2.2}} patch that's likely to break 
{{itests/TestExtendedAcls}}. We might have to sort out the call to 
{{FileSystem.setOwner()}} to modify the group, in 
{{FileOutputCommitterContainer::applyGroupAndPerms()}}.

> Remove unnecessary calls to FileSystem.setOwner() from 
> FileOutputCommitterContainer
> ---
>
> Key: HIVE-17802
> URL: https://issues.apache.org/jira/browse/HIVE-17802
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17802.1.patch, HIVE-17802.2-branch-2.patch, 
> HIVE-17802.2.patch
>
>
> For large Pig/HCat queries that produce a large number of 
> partitions/directories/files, we have seen cases where the HDFS NameNode 
> groaned under the weight of {{FileSystem.setOwner()}} calls, originating from 
> the commit-step. This was the result of the following code in 
> FileOutputCommitterContainer:
> {code:java}
> private void applyGroupAndPerms(FileSystem fs, Path dir, FsPermission permission,
>     List<AclEntry> acls, String group, boolean recursive) throws IOException {
>   ...
>   if (recursive) {
>     for (FileStatus fileStatus : fs.listStatus(dir)) {
>       if (fileStatus.isDir()) {
>         applyGroupAndPerms(fs, fileStatus.getPath(), permission, acls, group, true);
>       } else {
>         fs.setPermission(fileStatus.getPath(), permission);
>         chown(fs, fileStatus.getPath(), group);
>       }
>     }
>   }
> }
>
> private void chown(FileSystem fs, Path file, String group) throws IOException {
>   try {
>     fs.setOwner(file, null, group);
>   } catch (AccessControlException ignore) {
>     // Some users have wrong table group, ignore it.
>     LOG.warn("Failed to change group of partition directories/files: " + file, ignore);
>   }
> }
> {code}
> One call per file/directory is far too many. We have a patch that reduces the 
> namenode pressure.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17781) Map MR settings to Tez settings via DeprecatedKeys

2017-10-19 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211391#comment-16211391
 ] 

Mithun Radhakrishnan commented on HIVE-17781:
-

+1. Tests pass. Rebasing before commit.


> Map MR settings to Tez settings via DeprecatedKeys
> --
>
> Key: HIVE-17781
> URL: https://issues.apache.org/jira/browse/HIVE-17781
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration, Tez
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17781.1.patch
>
>
> Here's one that [~cdrome] and [~thiruvel] worked on:
> We found that certain Hadoop Map/Reduce settings that are set in site config 
> files do not take effect in Hive jobs, because the Tez site configs do not 
> contain the same settings.
> In Yahoo's case, the problem was that, at the time, there was no mapping 
> between {{MRJobConfig.COMPLETED_MAPS_FOR_REDUCE_SLOWSTART}} and 
> {{TEZ_SHUFFLE_VERTEX_MANAGER_MAX_SRC_FRACTION}}. There were situations where 
> significant capacity on production clusters was being used up doing nothing, 
> while waiting for slow tasks to complete. This would have been avoided, were 
> the mappings in place.
> Tez provides a {{DeprecatedKeys}} utility class, to help map MR settings to 
> Tez settings. Hive should use this to ensure that the mappings are in sync.
> (Note to self: YHIVE-883)
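
For illustration, a minimal sketch of such a mapping, assuming Tez's
{{DeprecatedKeys}} exposes its MR-to-Tez key map via
{{getMRToTezRuntimeParamMap()}} (as in recent Tez releases):

{code:java}
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.tez.mapreduce.hadoop.DeprecatedKeys;

final class MrToTezSettingsSketch {
  // Copy MR site-config values onto their Tez equivalents, so settings like
  // mapreduce.job.reduce.slowstart.completedmaps take effect under Tez.
  // Explicitly-set Tez keys are left untouched.
  static void apply(Configuration conf) {
    for (Map.Entry<String, String> e :
        DeprecatedKeys.getMRToTezRuntimeParamMap().entrySet()) {
      String mrValue = conf.get(e.getKey());
      if (mrValue != null && conf.get(e.getValue()) == null) {
        conf.set(e.getValue(), mrValue);
      }
    }
  }
}
{code}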



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17781) Map MR settings to Tez settings via DeprecatedKeys

2017-10-19 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17781:

Attachment: HIVE-17781.2.patch

> Map MR settings to Tez settings via DeprecatedKeys
> --
>
> Key: HIVE-17781
> URL: https://issues.apache.org/jira/browse/HIVE-17781
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration, Tez
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17781.1.patch, HIVE-17781.2.patch
>
>
> Here's one that [~cdrome] and [~thiruvel] worked on:
> We found that certain Hadoop Map/Reduce settings that are set in site config 
> files do not take effect in Hive jobs, because the Tez site configs do not 
> contain the same settings.
> In Yahoo's case, the problem was that, at the time, there was no mapping 
> between {{MRJobConfig.COMPLETED_MAPS_FOR_REDUCE_SLOWSTART}} and 
> {{TEZ_SHUFFLE_VERTEX_MANAGER_MAX_SRC_FRACTION}}. There were situations where 
> significant capacity on production clusters was being used up doing nothing, 
> while waiting for slow tasks to complete. This would have been avoided, were 
> the mappings in place.
> Tez provides a {{DeprecatedKeys}} utility class, to help map MR settings to 
> Tez settings. Hive should use this to ensure that the mappings are in sync.
> (Note to self: YHIVE-883)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17781) Map MR settings to Tez settings via DeprecatedKeys

2017-10-19 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17781:

Attachment: HIVE-17781.2-branch-2.2.patch
HIVE-17781.2-branch-2.patch

> Map MR settings to Tez settings via DeprecatedKeys
> --
>
> Key: HIVE-17781
> URL: https://issues.apache.org/jira/browse/HIVE-17781
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration, Tez
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17781.1.patch, HIVE-17781.2-branch-2.2.patch, 
> HIVE-17781.2-branch-2.patch, HIVE-17781.2.patch
>
>
> Here's one that [~cdrome] and [~thiruvel] worked on:
> We found that certain Hadoop Map/Reduce settings that are set in site config 
> files do not take effect in Hive jobs, because the Tez site configs do not 
> contain the same settings.
> In Yahoo's case, the problem was that, at the time, there was no mapping 
> between {{MRJobConfig.COMPLETED_MAPS_FOR_REDUCE_SLOWSTART}} and 
> {{TEZ_SHUFFLE_VERTEX_MANAGER_MAX_SRC_FRACTION}}. There were situations where 
> significant capacity on production clusters was being used up doing nothing, 
> while waiting for slow tasks to complete. This would have been avoided, were 
> the mappings in place.
> Tez provides a {{DeprecatedKeys}} utility class, to help map MR settings to 
> Tez settings. Hive should use this to ensure that the mappings are in sync.
> (Note to self: YHIVE-883)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17791) Temp dirs under the staging directory should honour `inheritPerms`

2017-10-19 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17791:

Status: Open  (was: Patch Available)

> Temp dirs under the staging directory should honour `inheritPerms`
> --
>
> Key: HIVE-17791
> URL: https://issues.apache.org/jira/browse/HIVE-17791
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 2.2.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17791.1-branch-2.patch, HIVE-17791.2-branch-2.patch
>
>
> For [~cdrome]:
> CLI creates two levels of staging directories but calls setPermissions on the 
> top-level directory only if {{hive.warehouse.subdir.inherit.perms=true}}.
> The top-level directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1}}
>  is created the first time {{Context.getExternalTmpPath}} is called.
> The child directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1/_tmp.-ext-1}}
>  is created when {{TezTask.execute}} is called at line 164:
> {code:java}
> DAG dag = build(jobConf, work, scratchDir, appJarLr, additionalLr, ctx);
> {code}
> This calls {{DagUtils.createVertex}}, which calls {{Utilities.createTmpDirs}}:
> {code:java}
> private static void createTmpDirs(Configuration conf,
>     List<Operator<? extends OperatorDesc>> ops) throws IOException {
>
>   while (!ops.isEmpty()) {
>     Operator<? extends OperatorDesc> op = ops.remove(0);
>
>     if (op instanceof FileSinkOperator) {
>       FileSinkDesc fdesc = ((FileSinkOperator) op).getConf();
>       Path tempDir = fdesc.getDirName();
>
>       if (tempDir != null) {
>         Path tempPath = Utilities.toTempPath(tempDir);
>         FileSystem fs = tempPath.getFileSystem(conf);
>         fs.mkdirs(tempPath); // <-- HERE!
>       }
>     }
>
>     if (op.getChildOperators() != null) {
>       ops.addAll(op.getChildOperators());
>     }
>   }
> }
> {code}
> It turns out that {{inheritPerms}} is no longer part of {{master}}. I'll 
> rebase this for {{branch-2}} and {{branch-2.2}}. {{master}} will have to 
> wait till the issues around {{StorageBasedAuthProvider}}, directory 
> permissions, etc. are sorted out.
> (Note to self: YHIVE-857)
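
For illustration, a minimal sketch of what honouring {{inheritPerms}} at the
{{mkdirs()}} call-site might look like (helper name hypothetical; this is not
the actual patch):

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

final class TmpDirsSketch {
  // Hypothetical: after creating the temp dir, copy the parent directory's
  // permission and group when hive.warehouse.subdir.inherit.perms is set.
  static void mkdirsInheritingPerms(FileSystem fs, Path tempPath,
      Configuration conf) throws IOException {
    fs.mkdirs(tempPath);
    if (conf.getBoolean("hive.warehouse.subdir.inherit.perms", false)) {
      FileStatus parent = fs.getFileStatus(tempPath.getParent());
      fs.setPermission(tempPath, parent.getPermission());
      fs.setOwner(tempPath, null, parent.getGroup());
    }
  }
}
{code}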



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17791) Temp dirs under the staging directory should honour `inheritPerms`

2017-10-19 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17791:

Attachment: HIVE-17791.2-branch-2.patch

Rebasing the patch.

> Temp dirs under the staging directory should honour `inheritPerms`
> --
>
> Key: HIVE-17791
> URL: https://issues.apache.org/jira/browse/HIVE-17791
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 2.2.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17791.1-branch-2.patch, HIVE-17791.2-branch-2.patch
>
>
> For [~cdrome]:
> CLI creates two levels of staging directories but calls setPermissions on the 
> top-level directory only if {{hive.warehouse.subdir.inherit.perms=true}}.
> The top-level directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1}}
>  is created the first time {{Context.getExternalTmpPath}} is called.
> The child directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1/_tmp.-ext-1}}
>  is created when {{TezTask.execute}} is called at line 164:
> {code:java}
> DAG dag = build(jobConf, work, scratchDir, appJarLr, additionalLr, ctx);
> {code}
> This calls {{DagUtils.createVertex}}, which calls {{Utilities.createTmpDirs}}:
> {code:java}
> private static void createTmpDirs(Configuration conf,
>     List<Operator<? extends OperatorDesc>> ops) throws IOException {
>
>   while (!ops.isEmpty()) {
>     Operator<? extends OperatorDesc> op = ops.remove(0);
>
>     if (op instanceof FileSinkOperator) {
>       FileSinkDesc fdesc = ((FileSinkOperator) op).getConf();
>       Path tempDir = fdesc.getDirName();
>
>       if (tempDir != null) {
>         Path tempPath = Utilities.toTempPath(tempDir);
>         FileSystem fs = tempPath.getFileSystem(conf);
>         fs.mkdirs(tempPath); // <-- HERE!
>       }
>     }
>
>     if (op.getChildOperators() != null) {
>       ops.addAll(op.getChildOperators());
>     }
>   }
> }
> {code}
> It turns out that {{inheritPerms}} is no longer part of {{master}}. I'll 
> rebase this for {{branch-2}} and {{branch-2.2}}. {{master}} will have to 
> wait till the issues around {{StorageBasedAuthProvider}}, directory 
> permissions, etc. are sorted out.
> (Note to self: YHIVE-857)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17791) Temp dirs under the staging directory should honour `inheritPerms`

2017-10-19 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17791:

Status: Patch Available  (was: Open)

> Temp dirs under the staging directory should honour `inheritPerms`
> --
>
> Key: HIVE-17791
> URL: https://issues.apache.org/jira/browse/HIVE-17791
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 2.2.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17791.1-branch-2.patch, HIVE-17791.2-branch-2.patch
>
>
> For [~cdrome]:
> CLI creates two levels of staging directories but calls setPermissions on the 
> top-level directory only if {{hive.warehouse.subdir.inherit.perms=true}}.
> The top-level directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1}}
>  is created the first time {{Context.getExternalTmpPath}} is called.
> The child directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1/_tmp.-ext-1}}
>  is created when {{TezTask.execute}} is called at line 164:
> {code:java}
> DAG dag = build(jobConf, work, scratchDir, appJarLr, additionalLr, ctx);
> {code}
> This calls {{DagUtils.createVertex}}, which calls {{Utilities.createTmpDirs}}:
> {code:java}
> private static void createTmpDirs(Configuration conf,
>     List<Operator<? extends OperatorDesc>> ops) throws IOException {
>
>   while (!ops.isEmpty()) {
>     Operator<? extends OperatorDesc> op = ops.remove(0);
>
>     if (op instanceof FileSinkOperator) {
>       FileSinkDesc fdesc = ((FileSinkOperator) op).getConf();
>       Path tempDir = fdesc.getDirName();
>
>       if (tempDir != null) {
>         Path tempPath = Utilities.toTempPath(tempDir);
>         FileSystem fs = tempPath.getFileSystem(conf);
>         fs.mkdirs(tempPath); // <-- HERE!
>       }
>     }
>
>     if (op.getChildOperators() != null) {
>       ops.addAll(op.getChildOperators());
>     }
>   }
> }
> {code}
> It turns out that {{inheritPerms}} is no longer part of {{master}}. I'll 
> rebase this for {{branch-2}} and {{branch-2.2}}. {{master}} will have to 
> wait till the issues around {{StorageBasedAuthProvider}}, directory 
> permissions, etc. are sorted out.
> (Note to self: YHIVE-857)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17791) Temp dirs under the staging directory should honour `inheritPerms`

2017-10-19 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211426#comment-16211426
 ] 

Mithun Radhakrishnan commented on HIVE-17791:
-

+1 on this change. I'll try to squeeze in another run of {{branch-2}} unit-tests 
on this.

> Temp dirs under the staging directory should honour `inheritPerms`
> --
>
> Key: HIVE-17791
> URL: https://issues.apache.org/jira/browse/HIVE-17791
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 2.2.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17791.1-branch-2.patch, HIVE-17791.2-branch-2.patch
>
>
> For [~cdrome]:
> CLI creates two levels of staging directories but calls setPermissions on the 
> top-level directory only if {{hive.warehouse.subdir.inherit.perms=true}}.
> The top-level directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1}}
>  is created the first time {{Context.getExternalTmpPath}} is called.
> The child directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1/_tmp.-ext-1}}
>  is created when {{TezTask.execute}} is called at line 164:
> {code:java}
> DAG dag = build(jobConf, work, scratchDir, appJarLr, additionalLr, ctx);
> {code}
> This calls {{DagUtils.createVertex}}, which calls {{Utilities.createTmpDirs}}:
> {code:java}
> private static void createTmpDirs(Configuration conf,
>     List<Operator<? extends OperatorDesc>> ops) throws IOException {
>
>   while (!ops.isEmpty()) {
>     Operator<? extends OperatorDesc> op = ops.remove(0);
>
>     if (op instanceof FileSinkOperator) {
>       FileSinkDesc fdesc = ((FileSinkOperator) op).getConf();
>       Path tempDir = fdesc.getDirName();
>
>       if (tempDir != null) {
>         Path tempPath = Utilities.toTempPath(tempDir);
>         FileSystem fs = tempPath.getFileSystem(conf);
>         fs.mkdirs(tempPath); // <-- HERE!
>       }
>     }
>
>     if (op.getChildOperators() != null) {
>       ops.addAll(op.getChildOperators());
>     }
>   }
> }
> {code}
> It turns out that {{inheritPerms}} is no longer part of {{master}}. I'll 
> rebase this for {{branch-2}} and {{branch-2.2}}. {{master}} will have to 
> wait till the issues around {{StorageBasedAuthProvider}}, directory 
> permissions, etc. are sorted out.
> (Note to self: YHIVE-857)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17803) With Pig multi-query, 2 HCatStorers writing to the same table will trample each other's outputs

2017-10-19 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211448#comment-16211448
 ] 

Mithun Radhakrishnan commented on HIVE-17803:
-

Checked into {{master}}, {{branch-2}}, and {{branch-2.2}}. Thanks, [~cdrome].

> With Pig multi-query, 2 HCatStorers writing to the same table will trample 
> each other's outputs
> ---
>
> Key: HIVE-17803
> URL: https://issues.apache.org/jira/browse/HIVE-17803
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-17803.1.patch
>
>
> When Pig scripts use multi-query and {{HCatStorer}} with 
> dynamic-partitioning, and use more than one {{HCatStorer}} instance to write 
> to the same table, they might trample on each other's outputs. The failure 
> looks as follows:
> {noformat}
> Caused by: org.apache.hive.hcatalog.common.HCatException : 2006 : Error 
> adding partition to metastore. Cause : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>  No lease on /projects/foo/bar/activity_date=2016022306/_placeholder (inode 
> 2878224200): File does not exist. [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-1281544466_4952, pendingcreates: 1]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3429)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3517)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3484)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:791)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:537)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server.call(Server.java:2267)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217)
>   at 
> org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(FileOutputCommitterContainer.java:1022)
>   at 
> org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:269)
>   ... 20 more
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>  No lease on /projects/foo/bar/activity_date=2016022306/_placeholder (inode 
> 2878224200): File does not exist. [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-1281544466_4952, pendingcreates: 1]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3429)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3517)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3484)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:791)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:537)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server.call(Server.java:2267)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
>   at org.apache.ha

[jira] [Updated] (HIVE-17803) With Pig multi-query, 2 HCatStorers writing to the same table will trample each other's outputs

2017-10-19 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17803:

   Resolution: Fixed
Fix Version/s: 2.2.1
   2.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

> With Pig multi-query, 2 HCatStorers writing to the same table will trample 
> each other's outputs
> ---
>
> Key: HIVE-17803
> URL: https://issues.apache.org/jira/browse/HIVE-17803
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-17803.1.patch
>
>
> When Pig scripts use multi-query and {{HCatStorer}} with 
> dynamic-partitioning, and use more than one {{HCatStorer}} instance to write 
> to the same table, they might trample on each other's outputs. The failure 
> looks as follows:
> {noformat}
> Caused by: org.apache.hive.hcatalog.common.HCatException : 2006 : Error 
> adding partition to metastore. Cause : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>  No lease on /projects/foo/bar/activity_date=2016022306/_placeholder (inode 
> 2878224200): File does not exist. [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-1281544466_4952, pendingcreates: 1]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3429)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3517)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3484)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:791)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:537)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server.call(Server.java:2267)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217)
>   at 
> org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(FileOutputCommitterContainer.java:1022)
>   at 
> org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:269)
>   ... 20 more
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>  No lease on /projects/foo/bar/activity_date=2016022306/_placeholder (inode 
> 2878224200): File does not exist. [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-1281544466_4952, pendingcreates: 1]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3429)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3517)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3484)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:791)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:537)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server.call(Server.java:2267)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217)
> {noformat}

[jira] [Assigned] (HIVE-17853) RetryingMetaStoreClient loses UGI impersonation-context when reconnecting after timeout

2017-10-19 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17853:
---


> RetryingMetaStoreClient loses UGI impersonation-context when reconnecting 
> after timeout
> ---
>
> Key: HIVE-17853
> URL: https://issues.apache.org/jira/browse/HIVE-17853
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>Priority: Critical
>
> The {{RetryingMetaStoreClient}} is used to automatically reconnect to the 
> Hive metastore, after client timeout, transparently to the user.
> In case of user impersonation (e.g. Oozie super-user {{oozie}} impersonating 
> a Hadoop user {{mithun}}, to run a workflow), in case of timeout, we find 
> that the reconnect causes the {{UGI.doAs()}} context to be lost. Any further 
> metastore operations will be attempted as the login-user ({{oozie}}), as 
> opposed to the effective user ({{mithunr}}).
> We should have a fix for this shortly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17853) RetryingMetaStoreClient loses UGI impersonation-context when reconnecting after timeout

2017-10-19 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17853:

Description: 
The {{RetryingMetaStoreClient}} is used to automatically reconnect to the Hive 
metastore, after client timeout, transparently to the user.

In case of user impersonation (e.g. Oozie super-user {{oozie}} impersonating a 
Hadoop user {{mithun}}, to run a workflow), in case of timeout, we find that 
the reconnect causes the {{UGI.doAs()}} context to be lost. Any further 
metastore operations will be attempted as the login-user ({{oozie}}), as 
opposed to the effective user ({{mithun}}).

We should have a fix for this shortly.

  was:
The {{RetryingMetaStoreClient}} is used to automatically reconnect to the Hive 
metastore, after client timeout, transparently to the user.

In case of user impersonation (e.g. Oozie super-user {{oozie}} impersonating a 
Hadoop user {{mithun}}, to run a workflow), in case of timeout, we find that 
the reconnect causes the {{UGI.doAs()}} context to be lost. Any further 
metastore operations will be attempted as the login-user ({{oozie}}), as 
opposed to the effective user ({{mithunr}}).

We should have a fix for this shortly.


> RetryingMetaStoreClient loses UGI impersonation-context when reconnecting 
> after timeout
> ---
>
> Key: HIVE-17853
> URL: https://issues.apache.org/jira/browse/HIVE-17853
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>Priority: Critical
>
> The {{RetryingMetaStoreClient}} is used to automatically reconnect to the 
> Hive metastore, after client timeout, transparently to the user.
> In case of user impersonation (e.g. Oozie super-user {{oozie}} impersonating 
> a Hadoop user {{mithun}}, to run a workflow), in case of timeout, we find 
> that the reconnect causes the {{UGI.doAs()}} context to be lost. Any further 
> metastore operations will be attempted as the login-user ({{oozie}}), as 
> opposed to the effective user ({{mithun}}).
> We should have a fix for this shortly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17781) Map MR settings to Tez settings via DeprecatedKeys

2017-10-23 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17781:

   Resolution: Fixed
Fix Version/s: 2.2.1
   2.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Committed to {{master}}, {{branch-2}}, and {{branch-2.2}}. Thanks, [~cdrome].

> Map MR settings to Tez settings via DeprecatedKeys
> --
>
> Key: HIVE-17781
> URL: https://issues.apache.org/jira/browse/HIVE-17781
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration, Tez
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-17781.1.patch, HIVE-17781.2-branch-2.2.patch, 
> HIVE-17781.2-branch-2.patch, HIVE-17781.2.patch
>
>
> Here's one that [~cdrome] and [~thiruvel] worked on:
> We found that certain Hadoop Map/Reduce settings that are set in site config 
> files do not take effect in Hive jobs, because the Tez site configs do not 
> contain the same settings.
> In Yahoo's case, the problem was that, at the time, there was no mapping 
> between {{MRJobConfig.COMPLETED_MAPS_FOR_REDUCE_SLOWSTART}} and 
> {{TEZ_SHUFFLE_VERTEX_MANAGER_MAX_SRC_FRACTION}}. There were situations where 
> significant capacity on production clusters was being used up doing nothing, 
> while waiting for slow tasks to complete. This would have been avoided, were 
> the mappings in place.
> Tez provides a {{DeprecatedKeys}} utility class, to help map MR settings to 
> Tez settings. Hive should use this to ensure that the mappings are in sync.
> (Note to self: YHIVE-883)
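
To make the missing mapping concrete, below is a minimal Java sketch of the kind 
of MR-to-Tez key translation described above. The two key strings are the real 
configuration names behind {{MRJobConfig.COMPLETED_MAPS_FOR_REDUCE_SLOWSTART}} 
and {{TEZ_SHUFFLE_VERTEX_MANAGER_MAX_SRC_FRACTION}}; the hand-rolled map and the 
{{translate()}} helper are illustrative only, since the actual fix sources its 
mappings from Tez's {{DeprecatedKeys}} utility rather than maintaining a private 
table.

{code:java}
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;

// Illustrative sketch only: copy MR settings onto their Tez equivalents,
// the way a DeprecatedKeys-driven mapping would.
public class MrToTezConfigSketch {
  private static final Map<String, String> MR_TO_TEZ = new HashMap<String, String>();
  static {
    // mapreduce.job.reduce.slowstart.completedmaps is the value behind
    // MRJobConfig.COMPLETED_MAPS_FOR_REDUCE_SLOWSTART.
    MR_TO_TEZ.put("mapreduce.job.reduce.slowstart.completedmaps",
                  "tez.shuffle-vertex-manager.max-src-fraction");
  }

  public static void translate(Configuration conf) {
    for (Map.Entry<String, String> e : MR_TO_TEZ.entrySet()) {
      String mrValue = conf.get(e.getKey());
      // Only fill in Tez keys that the user has not set explicitly.
      if (mrValue != null && conf.get(e.getValue()) == null) {
        conf.set(e.getValue(), mrValue);
      }
    }
  }
}
{code}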



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17600) Make OrcFile's "enforceBufferSize" user-settable.

2017-10-26 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16220979#comment-16220979
 ] 

Mithun Radhakrishnan commented on HIVE-17600:
-

Hey, [~prasanth_j]. Sorry I missed your reply. I was hoping to fix this 
specifically for Hive {{branch-2.2}}, so that I might base a release off of 
this branch. :]

> Make OrcFile's "enforceBufferSize" user-settable.
> -
>
> Key: HIVE-17600
> URL: https://issues.apache.org/jira/browse/HIVE-17600
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17600.1-branch-2.2.patch
>
>
> This is a duplicate of ORC-238, but it applies to {{branch-2.2}}.
> Compression buffer-sizes in OrcFile are computed at runtime, except when 
> enforceBufferSize is set. The only snag here is that this flag can't be set 
> by the user.
> When runtime-computed buffer-sizes are not optimal (for some reason), the 
> user has no way to work around it by setting a custom value.
> I have a patch that we use at Yahoo.
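
For context, a minimal sketch of what a user-controlled buffer-size override 
looks like on the writer side. {{OrcFile.writerOptions(conf).bufferSize(...)}} 
is the existing API; the {{enforceBufferSize()}} call mentioned in the comment 
is hypothetical here, mirroring the flag ORC-238 exposes, since making that 
flag user-settable is precisely what this patch does.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;

public class OrcBufferSizeSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Request a fixed 256KB compression buffer rather than the runtime-computed size.
    OrcFile.WriterOptions opts = OrcFile.writerOptions(conf).bufferSize(256 * 1024);
    // Hypothetical, per the patch: something like opts.enforceBufferSize() would
    // mark the requested size as binding, so the writer cannot shrink it at runtime.
  }
}
{code}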



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17791) Temp dirs under the staging directory should honour `inheritPerms`

2017-10-26 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221299#comment-16221299
 ] 

Mithun Radhakrishnan commented on HIVE-17791:
-

Committed to {{branch-2}}, and {{branch-2.2}}. Thanks, [~cdrome].

> Temp dirs under the staging directory should honour `inheritPerms`
> --
>
> Key: HIVE-17791
> URL: https://issues.apache.org/jira/browse/HIVE-17791
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 2.2.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17791.1-branch-2.patch, HIVE-17791.2-branch-2.patch
>
>
> For [~cdrome]:
> CLI creates two levels of staging directories but calls setPermissions on the 
> top-level directory only if {{hive.warehouse.subdir.inherit.perms=true}}.
> The top-level directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1}}
>  is created the first time {{Context.getExternalTmpPath}} is called.
> The child directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1/_tmp.-ext-1}}
>  is created when {{TezTask.execute}} is called at line 164:
> {code:java}
> DAG dag = build(jobConf, work, scratchDir, appJarLr, additionalLr, ctx);
> {code}
> This calls {{DagUtils.createVertex}}, which calls {{Utilities.createTmpDirs}}:
> {code:java}
> 3770   private static void createTmpDirs(Configuration conf,
> 3771       List<Operator<? extends OperatorDesc>> ops) throws IOException {
> 3772 
> 3773 while (!ops.isEmpty()) {
> 3774   Operator<? extends OperatorDesc> op = ops.remove(0);
> 3775 
> 3776   if (op instanceof FileSinkOperator) {
> 3777 FileSinkDesc fdesc = ((FileSinkOperator) op).getConf();
> 3778 Path tempDir = fdesc.getDirName();
> 3779 
> 3780 if (tempDir != null) {
> 3781   Path tempPath = Utilities.toTempPath(tempDir);
> 3782   FileSystem fs = tempPath.getFileSystem(conf);
> 3783   fs.mkdirs(tempPath); // <-- HERE!
> 3784 }
> 3785   }
> 3786 
> 3787   if (op.getChildOperators() != null) {
> 3788 ops.addAll(op.getChildOperators());
> 3789   }
> 3790 }
> 3791   }
> {code}
> It turns out that {{inheritPerms}} is no longer part of {{master}}. I'll 
> rebase this for {{branch-2}}, and {{branch-2.2}}. {{master}} will have to 
> wait till the issues around {{StorageBasedAuthProvider}}, directory 
> permissions, etc. are sorted out.
> (Note to self: YHIVE-857)
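
As an illustration (not the actual patch), here is a minimal sketch of an 
{{inheritPerms}}-aware version of that {{mkdirs}} call: create the temp 
directory, then copy the parent directory's permissions onto it, the same 
treatment the top-level staging directory already receives.

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class InheritPermsSketch {
  // Create tempPath; if hive.warehouse.subdir.inherit.perms is set, inherit the
  // parent directory's permissions instead of leaving the fs defaults in place.
  static void mkdirsInheritingPerms(Configuration conf, Path tempPath) throws IOException {
    FileSystem fs = tempPath.getFileSystem(conf);
    fs.mkdirs(tempPath);
    if (conf.getBoolean("hive.warehouse.subdir.inherit.perms", false)) {
      FsPermission parentPerms = fs.getFileStatus(tempPath.getParent()).getPermission();
      fs.setPermission(tempPath, parentPerms);
    }
  }
}
{code}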



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17791) Temp dirs under the staging directory should honour `inheritPerms`

2017-10-26 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17791:

   Resolution: Fixed
Fix Version/s: 2.2.1
   2.4.0
   Status: Resolved  (was: Patch Available)

> Temp dirs under the staging directory should honour `inheritPerms`
> --
>
> Key: HIVE-17791
> URL: https://issues.apache.org/jira/browse/HIVE-17791
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 2.2.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Fix For: 2.4.0, 2.2.1
>
> Attachments: HIVE-17791.1-branch-2.patch, HIVE-17791.2-branch-2.patch
>
>
> For [~cdrome]:
> CLI creates two levels of staging directories but calls setPermissions on the 
> top-level directory only if {{hive.warehouse.subdir.inherit.perms=true}}.
> The top-level directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1}}
>  is created the first time {{Context.getExternalTmpPath}} is called.
> The child directory, 
> {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1/_tmp.-ext-1}}
>  is created when {{TezTask.execute}} is called at line 164:
> {code:java}
> DAG dag = build(jobConf, work, scratchDir, appJarLr, additionalLr, ctx);
> {code}
> This calls {{DagUtils.createVertex}}, which calls {{Utilities.createTmpDirs}}:
> {code:java}
> 3770   private static void createTmpDirs(Configuration conf,
> 3771       List<Operator<? extends OperatorDesc>> ops) throws IOException {
> 3772 
> 3773 while (!ops.isEmpty()) {
> 3774   Operator<? extends OperatorDesc> op = ops.remove(0);
> 3775 
> 3776   if (op instanceof FileSinkOperator) {
> 3777 FileSinkDesc fdesc = ((FileSinkOperator) op).getConf();
> 3778 Path tempDir = fdesc.getDirName();
> 3779 
> 3780 if (tempDir != null) {
> 3781   Path tempPath = Utilities.toTempPath(tempDir);
> 3782   FileSystem fs = tempPath.getFileSystem(conf);
> 3783   fs.mkdirs(tempPath); // <-- HERE!
> 3784 }
> 3785   }
> 3786 
> 3787   if (op.getChildOperators() != null) {
> 3788 ops.addAll(op.getChildOperators());
> 3789   }
> 3790 }
> 3791   }
> {code}
> It turns out that {{inheritPerms}} is no longer part of {{master}}. I'll 
> rebase this for {{branch-2}}, and {{branch-2.2}}. {{master}} will have to 
> wait till the issues around {{StorageBasedAuthProvider}}, directory 
> permissions, etc. are sorted out.
> (Note to self: YHIVE-857)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-10024) LLAP: q file test is broken again

2017-10-30 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225540#comment-16225540
 ] 

Mithun Radhakrishnan commented on HIVE-10024:
-

I have put this off long enough. The sparse summary and description belie the 
significance of this fix. It turns out that this wasn't a fix for test-failures 
at all.

Before this fix, in cases where the last row-group for an ORC stripe contained 
fewer records than {{$\{orc.row.index.stride\}}}, and when predicate pushdown 
is enabled, one sees the following sort of failure:

{noformat}
 java.lang.IllegalArgumentException: Seek in Stream for column 82 kind DATA to 
130 is outside of the data
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1738)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.io.IOException: java.lang.IllegalArgumentException: Seek in Stream for 
column 82 kind DATA to 130 is outside of the data
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:322)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
... 14 more
{noformat}

It's a fairly rare case, but leads to bad reads on valid ORC files. This fix is 
available in {{branch-2}} and forward, but not in {{branch-1}}.

> LLAP: q file test is broken again
> -
>
> Key: HIVE-10024
> URL: https://issues.apache.org/jira/browse/HIVE-10024
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: llap
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-10024) LLAP: q file test is broken again

2017-10-30 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225540#comment-16225540
 ] 

Mithun Radhakrishnan edited comment on HIVE-10024 at 10/30/17 6:53 PM:
---

(I have put this off long enough. Sorry for the delay in updating this JIRA. We 
ran into this a good while back in production.) The sparse summary and 
description belie the significance of this fix. It turns out that this wasn't a 
fix for test-failures at all.

Before this fix, in cases where the last row-group for an ORC stripe contained 
fewer records than {{$\{orc.row.index.stride\}}}, and when predicate pushdown 
is enabled, one sees the following sort of failure:

{noformat}
 java.lang.IllegalArgumentException: Seek in Stream for column 82 kind DATA to 
130 is outside of the data
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1738)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.io.IOException: java.lang.IllegalArgumentException: Seek in Stream for 
column 82 kind DATA to 130 is outside of the data
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:322)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
... 14 more
{noformat}

It's a fairly rare case, but leads to bad reads on valid ORC files. This fix is 
available in {{branch-2}} and forward, but not in {{branch-1}}.


was (Author: mithun):
I have put this off long enough. The sparse summary and description belie the 
significance of this fix. It turns out that this wasn't a fix for test-failures 
at all.

Before this fix, in cases where the last row-group for an ORC stripe contained 
fewer records than {{$\{orc.row.index.stride\}}}, and when predicate pushdown 
is enabled, one sees the following sort of failure:

{noformat}
 java.lang.IllegalArgumentException: Seek in Stream for column 82 kind DATA to 
130 is outside of the data
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1738)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.io.IOException: java.lang.IllegalArgumentException: Seek in Stream for 
column 82 kind DATA to 130 is outside of the data
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:322)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
... 14 more
{noformat}

It's a fairly rare case, but leads to bad reads on valid ORC files. This fix is 
available in {{branch-2}} and forward, but not in {{branch-1}}.

[jira] [Commented] (HIVE-10024) LLAP: q file test is broken again

2017-10-30 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225670#comment-16225670
 ] 

Mithun Radhakrishnan commented on HIVE-10024:
-

bq. this was a branch fix so I didn't think it affected a released version.
Right, that makes sense. We figured as much from the {{Fix version/s}}. I 
thought I'd include the stack-trace here, for anyone who might have run into a 
similar problem.

bq. Feel free to backport (on separate jira given its age ;))
Roger that. I'll port this back to {{branch-1}}, and {{branch-1.2}}. Thanks for 
this fix, [~sershe]!

> LLAP: q file test is broken again
> -
>
> Key: HIVE-10024
> URL: https://issues.apache.org/jira/browse/HIVE-10024
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: llap
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17940) IllegalArgumentException when reading last row-group in an ORC stripe

2017-10-30 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17940:
---


> IllegalArgumentException when reading last row-group in an ORC stripe
> -
>
> Key: HIVE-17940
> URL: https://issues.apache.org/jira/browse/HIVE-17940
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 1.2.2, 1.3.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>
> (This is a backport of HIVE-10024 to {{branch-1.2}}, and {{branch-1}}.)
> When the last row-group in an ORC stripe contains fewer records than 
> specified in {{\$\{orc.row.index.stride\}}}, and if a column value is sparse 
> (i.e. mostly nulls), then one sees the following failure when reading the ORC 
> stripe:
> {noformat}
>  java.lang.IllegalArgumentException: Seek in Stream for column 82 kind DATA 
> to 130 is outside of the data
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1738)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.lang.IllegalArgumentException: Seek in Stream for 
> column 82 kind DATA to 130 is outside of the data
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:322)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
> ... 14 more
> {noformat}
> [~sershe] had a fix for this in HIVE-10024, in {{branch-2}}. After running 
> into this in production with {{branch-1}}+, we find that the fix for 
> HIVE-10024 sorts this out in {{branch-1}} as well.
> This is a fairly rare case, but it leads to bad reads on valid ORC files. I 
> will back-port this shortly.
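
To make the failing shape concrete, here is a hypothetical writer-side sketch of 
the conditions described above (the path, schema, and row counts are made up): a 
10,000-row index stride, a mostly-null column, and a final row-group that is 
only partially filled.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.Writer;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;

public class SparseLastRowGroupSketch {
  public static class Row {
    public String col;  // sparse column: mostly nulls
    public Row(String col) { this.col = col; }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    ObjectInspector inspector = ObjectInspectorFactory.getReflectionObjectInspector(
        Row.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA);
    Writer writer = OrcFile.createWriter(new Path("/tmp/sparse_sketch.orc"),
        OrcFile.writerOptions(conf).inspector(inspector).rowIndexStride(10000));
    // 25,000 rows at a 10,000-row stride: the last row-group holds only
    // 5,000 rows, i.e. fewer than the configured orc.row.index.stride.
    for (int i = 0; i < 25000; i++) {
      writer.addRow(new Row(i % 1000 == 0 ? "rare" : null));
    }
    writer.close();
    // Reading this file back with predicate pushdown enabled is the scenario
    // that produced the "Seek in Stream ... is outside of the data" failure.
  }
}
{code}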



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17940) IllegalArgumentException when reading last row-group in an ORC stripe

2017-10-30 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17940:

Description: 
(This is a backport of HIVE-10024 to {{branch-1.2}}, and {{branch-1}}.)

When the last row-group in an ORC stripe contains fewer records than specified 
in {{$\{orc.row.index.stride\}}}, and if a column value is sparse (i.e. mostly 
nulls), then one sees the following failure when reading the ORC stripe:

{noformat}
 java.lang.IllegalArgumentException: Seek in Stream for column 82 kind DATA to 
130 is outside of the data
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1738)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.io.IOException: java.lang.IllegalArgumentException: Seek in Stream for 
column 82 kind DATA to 130 is outside of the data
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:322)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
... 14 more
{noformat}

[~sershe] had a fix for this in HIVE-10024, in {{branch-2}}. After running into 
this in production with {{branch-1}}+, we find that the fix for HIVE-10024 
sorts this out in {{branch-1}} as well.

This is a fairly rare case, but it leads to bad reads on valid ORC files. I 
will back-port this shortly.

  was:
(This is a backport of HIVE-10024 to {{branch-1.2}}, and {{branch-1}}.)

When the last row-group in an ORC stripe contains fewer records than specified 
in {{\$\{orc.row.index.stride\}}}, and if a column value is sparse (i.e. mostly 
nulls), then one sees the following failure when reading the ORC stripe:

{noformat}
 java.lang.IllegalArgumentException: Seek in Stream for column 82 kind DATA to 
130 is outside of the data
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1738)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.io.IOException: java.lang.IllegalArgumentException: Seek in Stream for 
column 82 kind DATA to 130 is outside of the data
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:322)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
... 14 more
{noformat}

[~sershe] had a fix for this in HIVE-10024, in {{branch-2}}. After running into 
this in production with {{branch-1}}+, we find that the fix for HIVE-10024 
sorts this out in {{branch-1}} as well.

This is a fairly rare case, but it leads to bad reads on valid ORC files. I 
will back-port this shortly.

[jira] [Updated] (HIVE-17940) IllegalArgumentException when reading last row-group in an ORC stripe

2017-10-30 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17940:

Attachment: HIVE-17940.1-branch-1.patch

> IllegalArgumentException when reading last row-group in an ORC stripe
> -
>
> Key: HIVE-17940
> URL: https://issues.apache.org/jira/browse/HIVE-17940
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 1.3.0, 1.2.2
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17940.1-branch-1.patch
>
>
> (This is a backport of HIVE-10024 to {{branch-1.2}}, and {{branch-1}}.)
> When the last row-group in an ORC stripe contains fewer records than 
> specified in {{$\{orc.row.index.stride\}}}, and if a column value is sparse 
> (i.e. mostly nulls), then one sees the following failure when reading the ORC 
> stripe:
> {noformat}
>  java.lang.IllegalArgumentException: Seek in Stream for column 82 kind DATA 
> to 130 is outside of the data
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1738)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.lang.IllegalArgumentException: Seek in Stream for 
> column 82 kind DATA to 130 is outside of the data
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:322)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
> ... 14 more
> {noformat}
> [~sershe] had a fix for this in HIVE-10024, in {{branch-2}}. After running 
> into this in production with {{branch-1}}+, we find that the fix for 
> HIVE-10024 sorts this out in {{branch-1}} as well.
> This is a fairly rare case, but it leads to bad reads on valid ORC files. I 
> will back-port this shortly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17940) IllegalArgumentException when reading last row-group in an ORC stripe

2017-10-30 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17940:

Status: Patch Available  (was: Open)

> IllegalArgumentException when reading last row-group in an ORC stripe
> -
>
> Key: HIVE-17940
> URL: https://issues.apache.org/jira/browse/HIVE-17940
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 1.2.2, 1.3.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17940.1-branch-1.patch
>
>
> (This is a backport of HIVE-10024 to {{branch-1.2}}, and {{branch-1}}.)
> When the last row-group in an ORC stripe contains fewer records than 
> specified in {{$\{orc.row.index.stride\}}}, and if a column value is sparse 
> (i.e. mostly nulls), then one sees the following failure when reading the ORC 
> stripe:
> {noformat}
>  java.lang.IllegalArgumentException: Seek in Stream for column 82 kind DATA 
> to 130 is outside of the data
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1738)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.lang.IllegalArgumentException: Seek in Stream for 
> column 82 kind DATA to 130 is outside of the data
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:322)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
> ... 14 more
> {noformat}
> [~sershe] had a fix for this in HIVE-10024, in {{branch-2}}. After running 
> into this in production with {{branch-1}}+, we find that the fix for 
> HIVE-10024 sorts this out in {{branch-1}} as well.
> This is a fairly rare case, but it leads to bad reads on valid ORC files. I 
> will back-port this shortly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17940) IllegalArgumentException when reading last row-group in an ORC stripe

2017-10-30 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17940:

Attachment: HIVE-17940.1-branch-1.2.patch

> IllegalArgumentException when reading last row-group in an ORC stripe
> -
>
> Key: HIVE-17940
> URL: https://issues.apache.org/jira/browse/HIVE-17940
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 1.3.0, 1.2.2
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17940.1-branch-1.2.patch, 
> HIVE-17940.1-branch-1.patch
>
>
> (This is a backport of HIVE-10024 to {{branch-1.2}}, and {{branch-1}}.)
> When the last row-group in an ORC stripe contains fewer records than 
> specified in {{$\{orc.row.index.stride\}}}, and if a column value is sparse 
> (i.e. mostly nulls), then one sees the following failure when reading the ORC 
> stripe:
> {noformat}
>  java.lang.IllegalArgumentException: Seek in Stream for column 82 kind DATA 
> to 130 is outside of the data
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1738)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.lang.IllegalArgumentException: Seek in Stream for 
> column 82 kind DATA to 130 is outside of the data
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:322)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
> ... 14 more
> {noformat}
> [~sershe] had a fix for this in HIVE-10024, in {{branch-2}}. After running 
> into this in production with {{branch-1}}+, we find that the fix for 
> HIVE-10024 sorts this out in {{branch-1}} as well.
> This is a fairly rare case, but it leads to bad reads on valid ORC files. I 
> will back-port this shortly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17940) IllegalArgumentException when reading last row-group in an ORC stripe

2017-10-31 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227177#comment-16227177
 ] 

Mithun Radhakrishnan commented on HIVE-17940:
-

How strange... {{branch-1.2}} builds on my box. I'll check this patch again.

> IllegalArgumentException when reading last row-group in an ORC stripe
> -
>
> Key: HIVE-17940
> URL: https://issues.apache.org/jira/browse/HIVE-17940
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 1.3.0, 1.2.2
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17940.1-branch-1.2.patch, 
> HIVE-17940.1-branch-1.patch
>
>
> (This is a backport of HIVE-10024 to {{branch-1.2}}, and {{branch-1}}.)
> When the last row-group in an ORC stripe contains fewer records than 
> specified in {{$\{orc.row.index.stride\}}}, and if a column value is sparse 
> (i.e. mostly nulls), then one sees the following failure when reading the ORC 
> stripe:
> {noformat}
>  java.lang.IllegalArgumentException: Seek in Stream for column 82 kind DATA 
> to 130 is outside of the data
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1738)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.lang.IllegalArgumentException: Seek in Stream for 
> column 82 kind DATA to 130 is outside of the data
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:322)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
> ... 14 more
> {noformat}
> [~sershe] had a fix for this in HIVE-10024, in {{branch-2}}. After running 
> into this in production with {{branch-1}}+, we find that the fix for 
> HIVE-10024 sorts this out in {{branch-1}} as well.
> This is a fairly rare case, but it leads to bad reads on valid ORC files. I 
> will back-port this shortly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17853) RetryingMetaStoreClient loses UGI impersonation-context when reconnecting after timeout

2017-10-31 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227401#comment-16227401
 ] 

Mithun Radhakrishnan commented on HIVE-17853:
-

[~vihangk1],

bq. All subsequent actions coming via HS2 should also do a doAs() using the 
sessionProxy. Is this happening in case of HCatalog...
Right. HS2 doesn't come into it, since this has more to do with {{HCatClient}}. 
The HCatalog APIs use {{HiveClientCache}} to amortize the cost of 
{{HiveMetaStoreClient}} construction and metastore connections.
Systems like Oozie/Falcon that use {{HCatClient}} to make metastore-calls 
within a {{doAs()}} context might end up losing their {{UGI.doAs()}} contexts 
after timeout, causing any retried actions to run as the privileged user, rather 
than as the impersonated user.
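
As a concrete illustration of that pattern, here is a minimal sketch of the 
Oozie-style call sequence (user and table names are made up): the initial call 
runs inside {{UGI.doAs()}}, but once the metastore connection times out, the 
transparent reconnect drops the impersonation context.

{code:java}
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hive.hcatalog.api.HCatClient;

public class ImpersonatedHCatCallSketch {
  public static void main(String[] args) throws Exception {
    final Configuration conf = new Configuration();
    // Super-user "oozie" impersonates the end-user "mithun".
    UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser(
        "mithun", UserGroupInformation.getLoginUser());

    proxyUgi.doAs(new PrivilegedExceptionAction<Void>() {
      @Override
      public Void run() throws Exception {
        HCatClient client = HCatClient.create(conf);
        // This call runs as "mithun". With the bug, a metastore timeout makes
        // the reconnect (via RetryingMetaStoreClient) fall back to the
        // login-user "oozie" for all subsequent calls.
        client.getTable("default", "some_table");
        client.close();
        return null;
      }
    });
  }
}
{code}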

> RetryingMetaStoreClient loses UGI impersonation-context when reconnecting 
> after timeout
> ---
>
> Key: HIVE-17853
> URL: https://issues.apache.org/jira/browse/HIVE-17853
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>Priority: Critical
> Attachments: HIVE-17853.01-branch-2.patch, HIVE-17853.01.patch
>
>
> The {{RetryingMetaStoreClient}} is used to automatically reconnect to the 
> Hive metastore, after client timeout, transparently to the user.
> In case of user impersonation (e.g. Oozie super-user {{oozie}} impersonating 
> a Hadoop user {{mithun}}, to run a workflow), in case of timeout, we find 
> that the reconnect causes the {{UGI.doAs()}} context to be lost. Any further 
> metastore operations will be attempted as the login-user ({{oozie}}), as 
> opposed to the effective user ({{mithun}}).
> We should have a fix for this shortly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17940) IllegalArgumentException when reading last row-group in an ORC stripe

2017-10-31 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227419#comment-16227419
 ] 

Mithun Radhakrishnan commented on HIVE-17940:
-

bq. ... {{branch-1.2}} builds on my box.
I spoke too soon. Looks like {{branch-1.2}} is busted:

{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile 
(default-testCompile) on project hive-it-unit: Compilation failure: Compilation 
failure:
[ERROR] 
/Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[31,41]
 package org.apache.hadoop.hbase.zookeeper does not exist
[ERROR] 
/Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[42,11]
 cannot find symbol
[ERROR]   symbol:   class MiniZooKeeperCluster
[ERROR]   location: class org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore
[ERROR] 
/Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestAdminUser.java:[41,12]
 cannot find symbol
[ERROR]   symbol:   method getPrivilege()
[ERROR]   location: class 
org.apache.hadoop.hive.metastore.api.HiveObjectPrivilege
[ERROR] 
/Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestAdminUser.java:[42,75]
 cannot find symbol
[ERROR]   symbol:   method getRole()
[ERROR]   location: class org.apache.hadoop.hive.metastore.api.Role
[ERROR] 
/Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java:[512,19]
 no suitable method found for 
updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse)
[ERROR] method 
org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
 is not applicable
[ERROR]   (actual and formal argument lists differ in length)
[ERROR] method 
org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
 is not applicable
[ERROR]   (actual and formal argument lists differ in length)
[ERROR] method 
org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,boolean,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
 is not applicable
[ERROR]   (actual and formal argument lists differ in length)
[ERROR] method 
org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.partition.spec.PartitionSpecProxy.PartitionIterator,org.apache.hadoop.hive.metastore.Warehouse,boolean,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
 is not applicable
[ERROR]   (actual and formal argument lists differ in length)
[ERROR] 
/Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java:[181,45]
 incompatible types: org.apache.hadoop.hive.metastore.api.EnvironmentContext 
cannot be converted to boolean
[ERROR] 
/Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java:[190,45]
 incompatible types: org.apache.hadoop.hive.metastore.api.EnvironmentContext 
cannot be converted to boolean
[ERROR] 
/Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[53,26]
 cannot find symbol
[ERROR]   symbol:   class MiniZooKeeperCluster
[ERROR]   location: class org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hive-it-unit
{noformat}

This is without HIVE-17940. I'll raise (yet) another JIRA to sort out the 
breakage. 

> IllegalArgumentException when reading last row-group in an ORC stripe
> -
>
> Key: HIVE-17940

[jira] [Assigned] (HIVE-17949) itests compile is busted on branch-1.2

2017-10-31 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17949:
---


> itests compile is busted on branch-1.2
> --
>
> Key: HIVE-17949
> URL: https://issues.apache.org/jira/browse/HIVE-17949
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 1.2.3
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>
> {{commit 18ddf46e0a8f092358725fc102235cbe6ba3e24d}} on {{branch-1.2}} was for 
> {{Preparing for 1.2.3 development}}. This should have also included 
> corresponding changes to all the pom-files under {{itests}}. As it stands 
> now, the build fails with the following:
> {noformat}
> [ERROR]   location: class org.apache.hadoop.hive.metastore.api.Role
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java:[512,19]
>  no suitable method found for 
> updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,boolean,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.partition.spec.PartitionSpecProxy.PartitionIterator,org.apache.hadoop.hive.metastore.Warehouse,boolean,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java:[181,45]
>  incompatible types: org.apache.hadoop.hive.metastore.api.EnvironmentContext 
> cannot be converted to boolean
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java:[190,45]
>  incompatible types: org.apache.hadoop.hive.metastore.api.EnvironmentContext 
> cannot be converted to boolean
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[53,26]
>  cannot find symbol
> [ERROR]   symbol:   class MiniZooKeeperCluster
> [ERROR]   location: class 
> org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :hive-it-unit
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17949) itests compile is busted on branch-1.2

2017-10-31 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17949:

Attachment: HIVE-17949.01-branch-1.2.patch

Here's the fix. Could I please bother either of \[[~vgumashta], [~sershe]\] to 
review?

> itests compile is busted on branch-1.2
> --
>
> Key: HIVE-17949
> URL: https://issues.apache.org/jira/browse/HIVE-17949
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 1.2.3
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17949.01-branch-1.2.patch
>
>
> {{commit 18ddf46e0a8f092358725fc102235cbe6ba3e24d}} on {{branch-1.2}} was for 
> {{Preparing for 1.2.3 development}}. That commit should also have included 
> corresponding changes to all the pom-files under {{itests}}. As it stands 
> now, the build fails with the following:
> {noformat}
> [ERROR]   location: class org.apache.hadoop.hive.metastore.api.Role
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java:[512,19]
>  no suitable method found for 
> updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.api.Partition,org.apache.hadoop.hive.metastore.Warehouse,boolean,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.updatePartitionStatsFast(org.apache.hadoop.hive.metastore.partition.spec.PartitionSpecProxy.PartitionIterator,org.apache.hadoop.hive.metastore.Warehouse,boolean,boolean,org.apache.hadoop.hive.metastore.api.EnvironmentContext)
>  is not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java:[181,45]
>  incompatible types: org.apache.hadoop.hive.metastore.api.EnvironmentContext 
> cannot be converted to boolean
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java:[190,45]
>  incompatible types: org.apache.hadoop.hive.metastore.api.EnvironmentContext 
> cannot be converted to boolean
> [ERROR] 
> /Users/mithunr/workspace/dev/hive/apache/branch-1.2/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[53,26]
>  cannot find symbol
> [ERROR]   symbol:   class MiniZooKeeperCluster
> [ERROR]   location: class 
> org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :hive-it-unit
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17949) itests compile is busted on branch-1.2

2017-10-31 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17949:

Status: Patch Available  (was: Open)

Submitting for tests. 




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17467) HCatClient APIs for discovering partition key-values

2017-10-31 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17467:

Attachment: HIVE-17467.2.patch

> HCatClient APIs for discovering partition key-values
> 
>
> Key: HIVE-17467
> URL: https://issues.apache.org/jira/browse/HIVE-17467
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Metastore
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17467.1-branch-2.patch, HIVE-17467.1.patch, 
> HIVE-17467.2.patch
>
>
> This is a followup to HIVE-17466, which adds the {{HiveMetaStore}}-level call 
> to retrieve the unique combinations of partition-key values that satisfy a 
> specified predicate.
> Attached herewith are the {{HCatClient}} APIs that will be used by Apache 
> Oozie before launching workflows.
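
The patch itself isn't quoted here, so the following is only a usage sketch:
{{getPartitionKeyValues()}}, its signature, and the table/filter names are
illustrative assumptions, not the API from the attachment. Only
{{HCatClient.create()}} and {{close()}} are existing entry points.

{code:java}
import java.util.List;
import java.util.Map;

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hive.hcatalog.api.HCatClient;

public class PartKeyValueDiscoverySketch {
    public static void main(String[] args) throws Exception {
        HCatClient client = HCatClient.create(new HiveConf());
        try {
            // Hypothetical call: fetch the distinct partition key-value
            // combinations under default.page_views that satisfy a filter,
            // so that a scheduler like Oozie can decide which workflow
            // instances to launch.
            List<Map<String, String>> keyValues = client.getPartitionKeyValues(
                    "default", "page_views", "dt > '2017-10-01'");
            for (Map<String, String> kv : keyValues) {
                System.out.println(kv);
            }
        } finally {
            client.close();
        }
    }
}
{code}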



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17467) HCatClient APIs for discovering partition key-values

2017-10-31 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17467:

Affects Version/s: 2.4.0
   3.0.0
   Status: Patch Available  (was: Open)




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17949) itests compile is busted on branch-1.2

2017-10-31 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227758#comment-16227758
 ] 

Mithun Radhakrishnan commented on HIVE-17949:
-

Thanks for the review, [~sershe]. I'm just waiting for the tests to kick in 
before I check this in.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17467) HCatClient APIs for discovering partition key-values

2017-10-31 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17467:

Attachment: HIVE-17467.2-branch-2.patch

Periodic rebase. :/




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17853) RetryingMetaStoreClient loses UGI impersonation-context when reconnecting after timeout

2017-10-31 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233534#comment-16233534
 ] 

Mithun Radhakrishnan commented on HIVE-17853:
-

For the record, we've tested this out manually, using an Oozie setup with 
user-impersonation. I suppose it might be possible to set up a unit-test using 
something based on {{MiniHiveKdc}}, but it looks non-trivial. Hmm...

> RetryingMetaStoreClient loses UGI impersonation-context when reconnecting 
> after timeout
> ---
>
> Key: HIVE-17853
> URL: https://issues.apache.org/jira/browse/HIVE-17853
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>Priority: Critical
> Attachments: HIVE-17853.01-branch-2.patch, HIVE-17853.01.patch
>
>
> The {{RetryingMetaStoreClient}} is used to reconnect automatically to the 
> Hive metastore after a client timeout, transparently to the user.
> With user impersonation (e.g. the Oozie super-user {{oozie}} impersonating a 
> Hadoop user {{mithun}} to run a workflow), we find that reconnecting after a 
> timeout causes the {{UGI.doAs()}} context to be lost. Any further metastore 
> operations are then attempted as the login-user ({{oozie}}), as opposed to 
> the effective user ({{mithun}}).
> We should have a fix for this shortly.
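
To make the failure mode concrete, here is a minimal sketch of the
impersonation pattern described above, using the standard Hadoop
{{UserGroupInformation}} calls ({{createProxyUser()}}, {{doAs()}}); the
metastore connection itself is elided:

{code:java}
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserSketch {
    public static void main(String[] args) throws Exception {
        // The service's own (login) identity, e.g. "oozie".
        UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
        // Impersonate the workflow owner, e.g. "mithun".
        UserGroupInformation proxyUser =
                UserGroupInformation.createProxyUser("mithun", loginUser);

        proxyUser.doAs((PrivilegedExceptionAction<Void>) () -> {
            // A metastore client opened here authenticates as the effective
            // user ("mithun"). The bug: when the retrying client reconnects
            // later, outside this doAs() block, the fresh connection is made
            // as the login user ("oozie") instead.
            return null;
        });
    }
}
{code}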



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17853) RetryingMetaStoreClient loses UGI impersonation-context when reconnecting after timeout

2017-11-01 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234314#comment-16234314
 ] 

Mithun Radhakrishnan commented on HIVE-17853:
-

[~vihangk1], would you be averse to our adding a unit-test in a followup JIRA?




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

