[jira] [Updated] (HIVE-11548) HCatLoader should support predicate pushdown.

2017-08-09 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-11548:

Attachment: HIVE-11548.4.patch

Yet another patch for master.

> HCatLoader should support predicate pushdown.
> -
>
> Key: HIVE-11548
> URL: https://issues.apache.org/jira/browse/HIVE-11548
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-11548.1.patch, HIVE-11548.2.patch, 
> HIVE-11548.3.patch, HIVE-11548.4.patch
>
>
> When one uses {{HCatInputFormat}}/{{HCatLoader}} to read from file-formats 
> that support predicate pushdown (such as ORC, with 
> {{hive.optimize.index.filter=true}}), one sees that the predicates aren't 
> actually pushed down into the storage layer.
> The forthcoming patch should allow for filter-pushdown, if any of the 
> partitions being scanned with {{HCatLoader}} support the functionality. The 
> patch should technically allow the same for users of {{HCatInputFormat}}, but 
> I don't currently have a neat interface to build a compound 
> predicate-expression. Will add this separately, if required.
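> To make the intended behaviour concrete, here is a hedged sketch (table,
> column, and value are hypothetical) of a Pig script whose filter should,
> with this patch, reach the ORC reader instead of being applied client-side:
> {code}
> SET hive.optimize.index.filter true;
> clicks = LOAD 'mydb.clicks_orc' USING org.apache.hive.hcatalog.pig.HCatLoader();
> recent = FILTER clicks BY user_id == 42;
> DUMP recent;
> {code}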



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-08-09 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120303#comment-16120303
 ] 

Mithun Radhakrishnan edited comment on HIVE-8472 at 8/9/17 5:27 PM:


Thank you, [~leftylev].

bq. If you post your username here, I'll see it and set you up for editing.
I'd be most grateful if you'd grant permissions for {{mit...@apache.org}}.


was (Author: mithun):
Thank you, [~leftylev].

bq. If you post your username here, I'll see it and set you up for editing.
I'd be most grateful if you granted permissions for {{mit...@apache.org}}.

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-8472.1.patch, HIVE-8472.3.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.
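> For concreteness, a hedged sketch of the requested statement, mirroring the
> table-level syntax (the database name and path are hypothetical):
> {code:sql}
> ALTER DATABASE mydb SET LOCATION 'hdfs://namenode:8020/warehouse/mydb.db';
> {code}
> As with tables, this would only update the metadata: it would change the
> default parent directory for tables created afterwards, without moving any
> existing data.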



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-08-09 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120303#comment-16120303
 ] 

Mithun Radhakrishnan commented on HIVE-8472:


Thank you, [~leftylev].

bq. If you post your username here, I'll see it and set you up for editing.
I'd be most grateful if you granted permissions for {{mit...@apache.org}}.

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-8472.1.patch, HIVE-8472.3.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17275) Auto-merge fails on writes of UNION ALL output to ORC file with dynamic partitioning

2017-08-09 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120270#comment-16120270
 ] 

Mithun Radhakrishnan commented on HIVE-17275:
-

Thanks for the patch, [~cdrome]. +1.

> Auto-merge fails on writes of UNION ALL output to ORC file with dynamic 
> partitioning
> 
>
> Key: HIVE-17275
> URL: https://issues.apache.org/jira/browse/HIVE-17275
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-17275-branch-2.2.patch, HIVE-17275-branch-2.patch, 
> HIVE-17275.patch
>
>
> If dynamic partitioning is used to write the output of UNION or UNION ALL 
> queries into ORC files with hive.merge.tezfiles=true, the merge step fails as 
> follows:
> {noformat}
> 2017-08-08T11:27:19,958 ERROR [e7b1f06d-d632-408a-9dff-f7ae042cd25a main] 
> SessionState: Vertex failed, vertexName=File Merge, 
> vertexId=vertex_1502216690354_0001_33_00, diagnostics=[Task failed, 
> taskId=task_1502216690354_0001_33_00_00, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Error while running task ( failure ) : 
> attempt_1502216690354_0001_33_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: Multiple partitions for one merge mapper: 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/1
>  NOT EQUAL TO 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/2
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> Multiple partitions for one merge mapper: 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/1
>  NOT EQUAL TO 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/2
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:225)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:154)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: Multiple partitions for one merge mapper: 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/1
>  NOT EQUAL TO 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/2
>   at 
> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.processKeyValuePairs(OrcFileMergeOperator.java:169)
>   at 
> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.process(OrcFileMergeOperator.java:72)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:216)
>   ... 16 more
> Caused by: java.io.IOException: Multiple partitions for one merge mapper: 
> 

[jira] [Comment Edited] (HIVE-16820) TezTask may not shut down correctly before submit

2017-08-08 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119041#comment-16119041
 ] 

Mithun Radhakrishnan edited comment on HIVE-16820 at 8/8/17 9:11 PM:
-

I've filed HIVE-17273 for extending this to {{MergeFileTask}}, so that I don't 
drop the ball on this. Thanks, [~sershe]!


was (Author: mithun):
I've filed HIVE-17273 for this, so that I don't drop the ball on this. Thanks, 
[~sershe]!

> TezTask may not shut down correctly before submit
> -
>
> Key: HIVE-16820
> URL: https://issues.apache.org/jira/browse/HIVE-16820
> Project: Hive
>  Issue Type: Bug
>Reporter: Visakh Nair
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-16820.01.patch, HIVE-16820.patch
>
>
> The query will run and only fail at the very end when the driver checks its 
> own shutdown flag.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16820) TezTask may not shut down correctly before submit

2017-08-08 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119041#comment-16119041
 ] 

Mithun Radhakrishnan commented on HIVE-16820:
-

I've filed HIVE-17273 for this, so that I don't drop the ball on this. Thanks, 
[~sershe]!

> TezTask may not shut down correctly before submit
> -
>
> Key: HIVE-16820
> URL: https://issues.apache.org/jira/browse/HIVE-16820
> Project: Hive
>  Issue Type: Bug
>Reporter: Visakh Nair
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-16820.01.patch, HIVE-16820.patch
>
>
> The query will run and only fail at the very end when the driver checks its 
> own shutdown flag.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17273) MergeFileTask needs to be interruptible

2017-08-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17273:

Status: Patch Available  (was: Open)

> MergeFileTask needs to be interruptible
> ---
>
> Key: HIVE-17273
> URL: https://issues.apache.org/jira/browse/HIVE-17273
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17273.1.patch
>
>
> This is an extension to the work done in HIVE-16820 (which made {{TezTask}} 
> exit correctly when the job is cancelled).
> If a Hive job involves a {{MergeFileTask}} (say {{ALTER TABLE ... PARTITION 
> ... CONCATENATE}}), and is cancelled *after* the merge-task has kicked off, 
> then the merge-task might not be cancelled, and might run through to 
> completion.
> The code should check if the merge-job has already been scheduled, and cancel 
> it if required.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17273) MergeFileTask needs to be interruptible

2017-08-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17273:

Attachment: HIVE-17273.1.patch

Here's the initial crack at a fix.
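
The gist, as a simplified sketch (it assumes {{MergeFileTask}} retains the
{{RunningJob}} handle, {{rj}}, from job-submission; the actual patch may
differ in detail):
{code:java}
@Override
public void shutdown() {
  super.shutdown();
  // If the merge-job was already submitted to the cluster, kill it,
  // instead of letting it run through to completion.
  if (rj != null) {
    try {
      rj.killJob();
    } catch (Exception e) {
      LOG.warn("Failed to kill merge-job " + rj.getID(), e);
    }
    rj = null;
  }
}
{code}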

> MergeFileTask needs to be interruptible
> ---
>
> Key: HIVE-17273
> URL: https://issues.apache.org/jira/browse/HIVE-17273
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17273.1.patch
>
>
> This is an extension to the work done in HIVE-16820 (which made {{TezTask}} 
> exit correctly when the job is cancelled).
> If a Hive job involves a {{MergeFileTask}} (say {{ALTER TABLE ... PARTITION 
> ... CONCATENATE}}), and is cancelled *after* the merge-task has kicked off, 
> then the merge-task might not be cancelled, and might run through to 
> completion.
> The code should check if the merge-job has already been scheduled, and cancel 
> it if required.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17273) MergeFileTask needs to be interruptible

2017-08-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17273:
---


> MergeFileTask needs to be interruptible
> ---
>
> Key: HIVE-17273
> URL: https://issues.apache.org/jira/browse/HIVE-17273
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>
> This is an extension to the work done in HIVE-16820 (which made {{TezTask}} 
> exit correctly when the job is cancelled).
> If a Hive job involves a {{MergeFileTask}} (say {{ALTER TABLE ... PARTITION 
> ... CONCATENATE}}), and is cancelled *after* the merge-task has kicked off, 
> then the merge-task might not be cancelled, and might run through to 
> completion.
> The code should check if the merge-job has already been scheduled, and cancel 
> it if required.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17251) Remove usage of org.apache.pig.ResourceStatistics#setmBytes method in HCatLoader

2017-08-08 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118734#comment-16118734
 ] 

Mithun Radhakrishnan commented on HIVE-17251:
-

+1. This looks good. 

> Remove usage of org.apache.pig.ResourceStatistics#setmBytes method in 
> HCatLoader
> 
>
> Key: HIVE-17251
> URL: https://issues.apache.org/jira/browse/HIVE-17251
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Reporter: Nandor Kollar
>Assignee: Adam Szita
>Priority: Minor
> Attachments: HIVE-17251.0.patch
>
>
> org.apache.pig.ResourceStatistics#setmBytes is marked as deprecated, and is 
> going to be removed from Pig. Is it possible to use the proper 
> replacement method (ResourceStatistics#setSizeInBytes) instead?
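> A minimal sketch of the substitution (note the unit change: {{setmBytes()}}
> takes megabytes while {{setSizeInBytes()}} takes bytes; the conversion
> below is an assumption, not lifted from the patch):
> {code:java}
> long sizeInMb = 10; // example value
> ResourceStatistics stats = new ResourceStatistics();
> // Deprecated: stats.setmBytes(sizeInMb);
> stats.setSizeInBytes(sizeInMb * 1024L * 1024L);
> {code}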



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-08-08 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118710#comment-16118710
 ] 

Mithun Radhakrishnan commented on HIVE-8472:


The test-failures are being dealt with in separate JIRAs. 

[~leftylev], unless I've completely missed it, it appears that I don't have 
EDIT rights on the documentation page. :/ Where might one apply for it?

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-8472.1.patch, HIVE-8472.3.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-08-07 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-8472:
---
Affects Version/s: 3.0.0

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-8472.1.patch, HIVE-8472.3.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-08-07 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-8472:
---
Status: Patch Available  (was: Open)

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-8472.1.patch, HIVE-8472.3.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-08-07 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-8472:
---
Attachment: HIVE-8472.3.patch

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-8472.1.patch, HIVE-8472.3.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-08-07 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-8472:
---
Attachment: (was: HIVE-8472.3.patch)

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-8472.1.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-08-07 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-8472:
---
Status: Open  (was: Patch Available)

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-8472.1.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-08-07 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-8472:
---
Attachment: HIVE-8472.3.patch

Thank you, [~alangates]. I've made the changes you suggested.

[~leftylev], where might one make the documentation changes that [~alangates] 
mentions?

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-8472.1.patch, HIVE-8472.3.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs

2017-08-07 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17181:

Affects Version/s: 3.0.0
   Status: Patch Available  (was: Open)

> HCatOutputFormat should expose complete output-schema (including 
> partition-keys) for dynamic-partitioning MR jobs
> -
>
> Key: HIVE-17181
> URL: https://issues.apache.org/jira/browse/HIVE-17181
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17181.1.patch, HIVE-17181.2.patch, 
> HIVE-17181.3.patch, HIVE-17181.branch-2.patch
>
>
> Map/Reduce jobs that use HCatalog APIs to write to Hive tables using dynamic 
> partitioning are expected to call the following API methods:
> # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to 
> write to. This call populates the {{OutputJobInfo}} with details fetched from 
> the Metastore.
> # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data 
> being written.
> It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows:
> {code:java}
> HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf));
> {code}
> Unfortunately, {{getTableSchema()}} returns only the record-schema, not the 
> entire table's schema. We'll need a better API for use in M/R jobs to get the 
> complete table-schema.
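> In the meantime, a hedged workaround sketch; the accessors below reflect
> one reading of the HCatalog API (partition keys reached via
> {{OutputJobInfo}}'s {{HCatTableInfo}}), not the API this patch introduces:
> {code:java}
> HCatSchema recordSchema = HCatOutputFormat.getTableSchema(conf);
> List<HCatFieldSchema> allColumns = new ArrayList<>(recordSchema.getFields());
> // Append the partition keys, which getTableSchema() omits.
> OutputJobInfo jobInfo = HCatOutputFormat.getJobInfo(conf);
> allColumns.addAll(jobInfo.getTableInfo().getPartitionColumns().getFields());
> HCatOutputFormat.setSchema(conf, new HCatSchema(allColumns));
> {code}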



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs

2017-08-07 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17181:

Status: Open  (was: Patch Available)

> HCatOutputFormat should expose complete output-schema (including 
> partition-keys) for dynamic-partitioning MR jobs
> -
>
> Key: HIVE-17181
> URL: https://issues.apache.org/jira/browse/HIVE-17181
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17181.1.patch, HIVE-17181.2.patch, 
> HIVE-17181.3.patch, HIVE-17181.branch-2.patch
>
>
> Map/Reduce jobs that use HCatalog APIs to write to Hive tables using dynamic 
> partitioning are expected to call the following API methods:
> # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to 
> write to. This call populates the {{OutputJobInfo}} with details fetched from 
> the Metastore.
> # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data 
> being written.
> It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows:
> {code:java}
> HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf));
> {code}
> Unfortunately, {{getTableSchema()}} returns only the record-schema, not the 
> entire table's schema. We'll need a better API for use in M/R jobs to get the 
> complete table-schema.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs

2017-08-07 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17181:

Attachment: HIVE-17181.3.patch

Yikes, that's a good point, [~thejas]. Here's the corrected patch.

> HCatOutputFormat should expose complete output-schema (including 
> partition-keys) for dynamic-partitioning MR jobs
> -
>
> Key: HIVE-17181
> URL: https://issues.apache.org/jira/browse/HIVE-17181
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17181.1.patch, HIVE-17181.2.patch, 
> HIVE-17181.3.patch, HIVE-17181.branch-2.patch
>
>
> Map/Reduce jobs that use HCatalog APIs to write to Hive tables using dynamic 
> partitioning are expected to call the following API methods:
> # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to 
> write to. This call populates the {{OutputJobInfo}} with details fetched from 
> the Metastore.
> # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data 
> being written.
> It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows:
> {code:java}
> HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf));
> {code}
> Unfortunately, {{getTableSchema()}} returns only the record-schema, not the 
> entire table's schema. We'll need a better API for use in M/R jobs to get the 
> complete table-schema.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs

2017-08-04 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17181:

Attachment: HIVE-17181.2.patch

Thanks for the review, [~thejas]. :] I've added a test.

> HCatOutputFormat should expose complete output-schema (including 
> partition-keys) for dynamic-partitioning MR jobs
> -
>
> Key: HIVE-17181
> URL: https://issues.apache.org/jira/browse/HIVE-17181
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17181.1.patch, HIVE-17181.2.patch, 
> HIVE-17181.branch-2.patch
>
>
> Map/Reduce jobs that use HCatalog APIs to write to Hive tables using dynamic 
> partitioning are expected to call the following API methods:
> # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to 
> write to. This call populates the {{OutputJobInfo}} with details fetched from 
> the Metastore.
> # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data 
> being written.
> It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows:
> {code:java}
> HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf));
> {code}
> Unfortunately, {{getTableSchema()}} returns only the record-schema, not the 
> entire table's schema. We'll need a better API for use in M/R jobs to get the 
> complete table-schema.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs

2017-08-04 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17181:

Status: Open  (was: Patch Available)

> HCatOutputFormat should expose complete output-schema (including 
> partition-keys) for dynamic-partitioning MR jobs
> -
>
> Key: HIVE-17181
> URL: https://issues.apache.org/jira/browse/HIVE-17181
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17181.1.patch, HIVE-17181.2.patch, 
> HIVE-17181.branch-2.patch
>
>
> Map/Reduce jobs that use HCatalog APIs to write to Hive tables using dynamic 
> partitioning are expected to call the following API methods:
> # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to 
> write to. This call populates the {{OutputJobInfo}} with details fetched from 
> the Metastore.
> # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data 
> being written.
> It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows:
> {code:java}
> HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf));
> {code}
> Unfortunately, {{getTableSchema()}} returns only the record-schema, not the 
> entire table's schema. We'll need a better API for use in M/R jobs to get the 
> complete table-schema.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs

2017-08-04 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17181:

Status: Patch Available  (was: Open)

> HCatOutputFormat should expose complete output-schema (including 
> partition-keys) for dynamic-partitioning MR jobs
> -
>
> Key: HIVE-17181
> URL: https://issues.apache.org/jira/browse/HIVE-17181
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17181.1.patch, HIVE-17181.2.patch, 
> HIVE-17181.branch-2.patch
>
>
> Map/Reduce jobs that use HCatalog APIs to write to Hive tables using dynamic 
> partitioning are expected to call the following API methods:
> # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to 
> write to. This call populates the {{OutputJobInfo}} with details fetched from 
> the Metastore.
> # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data 
> being written.
> It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows:
> {code:java}
> HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf));
> {code}
> Unfortunately, {{getTableSchema()}} returns only the record-schema, not the 
> entire table's schema. We'll need a better API for use in M/R jobs to get the 
> complete table-schema.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Reopened] (HIVE-17229) HiveMetastore HMSHandler locks during initialization, even though its static variable threadPool is not null

2017-08-04 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reopened HIVE-17229:
-

Sorry, [~yuan_zac], there has been a misunderstanding. This issue is not 
resolved yet. I've modified your patch so that it applies properly to master, 
and I'm reopening the issue to run tests.

> HiveMetastore HMSHandler locks during initialization, even though its static 
> variable threadPool is not null
> 
>
> Key: HIVE-17229
> URL: https://issues.apache.org/jira/browse/HIVE-17229
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Zac Zhou
>Assignee: Zac Zhou
> Attachments: HIVE-17229.2.patch, HIVE-17229.patch
>
>
> Since [HIVE-13901|https://issues.apache.org/jira/browse/HIVE-13901], a 
> thread pool has been used to accelerate the add-partitions operation. 
> However, HMSHandler acquires a lock on every initialization, even though 
> its static variable threadPool is not null.
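> In other words, the initialization could use double-checked locking, so
> that the common case skips the monitor entirely. An illustrative sketch of
> the pattern (method name hypothetical), not the submitted patch:
> {code:java}
> // Uses java.util.concurrent.{ExecutorService, Executors}.
> private static volatile ExecutorService threadPool;
> 
> private static void initPartitionThreadPool(int poolSize) {
>   if (threadPool == null) {                 // fast path: no lock taken
>     synchronized (HMSHandler.class) {
>       if (threadPool == null) {             // re-check under the lock
>         threadPool = Executors.newFixedThreadPool(poolSize);
>       }
>     }
>   }
> }
> {code}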



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17229) HiveMetastore HMSHandler locks during initialization, even though its static variable threadPool is not null

2017-08-04 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17229:

Status: Patch Available  (was: Reopened)

> HiveMetastore HMSHandler locks during initialization, even though its static 
> variable threadPool is not null
> 
>
> Key: HIVE-17229
> URL: https://issues.apache.org/jira/browse/HIVE-17229
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Zac Zhou
>Assignee: Zac Zhou
> Attachments: HIVE-17229.2.patch, HIVE-17229.patch
>
>
> Since [HIVE-13901|https://issues.apache.org/jira/browse/HIVE-13901], a 
> thread pool has been used to accelerate the add-partitions operation. 
> However, HMSHandler acquires a lock on every initialization, even though 
> its static variable threadPool is not null.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-14380) Queries on tables with remote HDFS paths fail in "encryption" checks.

2017-08-03 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-14380:

Attachment: HIVE-14380.branch-1.2.patch

Sorry I missed your message, [~cltlfcjin]. Here's a patch that applies to 
{{branch-1.2}}. I've had to pull in part of HIVE-7476 for this.

> Queries on tables with remote HDFS paths fail in "encryption" checks.
> -
>
> Key: HIVE-14380
> URL: https://issues.apache.org/jira/browse/HIVE-14380
> Project: Hive
>  Issue Type: Bug
>  Components: Encryption
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 2.2.0
>
> Attachments: HIVE-14380.1.patch, HIVE-14380.branch-1.2.patch
>
>
> If a table has table/partition locations set to remote HDFS paths, querying 
> them will cause the following IAException:
> {noformat}
> 2016-07-26 01:16:27,471 ERROR parse.CalcitePlanner 
> (SemanticAnalyzer.java:getMetaData(1867)) - 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to determine if 
> hdfs://foo.ygrid.yahoo.com:8020/projects/my_db/my_table is encrypted: 
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://foo.ygrid.yahoo.com:8020/projects/my_db/my_table, expected: 
> hdfs://bar.ygrid.yahoo.com:8020
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:2204)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStrongestEncryptedTablePath(SemanticAnalyzer.java:2274)
> ...
> {noformat}
> This is because of the following code in {{SessionState}}:
> {code:title=SessionState.java|borderStyle=solid}
>  public HadoopShims.HdfsEncryptionShim getHdfsEncryptionShim() throws 
> HiveException {
> if (hdfsEncryptionShim == null) {
>   try {
> FileSystem fs = FileSystem.get(sessionConf);
> if ("hdfs".equals(fs.getUri().getScheme())) {
>   hdfsEncryptionShim = 
> ShimLoader.getHadoopShims().createHdfsEncryptionShim(fs, sessionConf);
> } else {
>   LOG.debug("Could not get hdfsEncryptionShim, it is only applicable 
> to hdfs filesystem.");
> }
>   } catch (Exception e) {
> throw new HiveException(e);
>   }
> }
> return hdfsEncryptionShim;
>   }
> {code}
> When the {{FileSystem}} instance is created, using the {{sessionConf}} 
> implies that the current HDFS is going to be used. This call should instead 
> fetch the {{FileSystem}} instance corresponding to the path being checked.
> A fix is forthcoming...
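> The gist of the fix, as a simplified sketch (the committed patch may
> differ): resolve the {{FileSystem}} from the path being checked, rather
> than from the session's default filesystem:
> {code:java}
> FileSystem fs = path.getFileSystem(sessionConf);
> if ("hdfs".equals(fs.getUri().getScheme())) {
>   hdfsEncryptionShim = ShimLoader.getHadoopShims()
>       .createHdfsEncryptionShim(fs, sessionConf);
> }
> {code}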



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16820) TezTask may not shut down correctly before submit

2017-08-03 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113354#comment-16113354
 ] 

Mithun Radhakrishnan commented on HIVE-16820:
-

bq. Yeah, what I was saying is that we don't need the same bugfix for that task 
...
Right. Sorry, I didn't mean to imply otherwise.

bq. It probably does need an implementation (without bugs like this).
I'm working on it. Our fix for this was not as complete/correct as yours. I'll 
extend this for {{MergeFileTask}}.

> TezTask may not shut down correctly before submit
> -
>
> Key: HIVE-16820
> URL: https://issues.apache.org/jira/browse/HIVE-16820
> Project: Hive
>  Issue Type: Bug
>Reporter: Visakh Nair
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-16820.01.patch, HIVE-16820.patch
>
>
> The query will run and only fail at the very end when the driver checks its 
> own shutdown flag.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().

2017-08-03 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113351#comment-16113351
 ] 

Mithun Radhakrishnan commented on HIVE-17188:
-

[~leftylev]: Thank you for correcting me. I've added 2.4, as per your advice.

bq. would probably make sense to merge this in branch-2.3 as well...
A fair point, [~vihangk1]. [~pxiong], do I have your permission to merge this 
into {{branch-2.3}}?

> ObjectStore runs out of memory for large batches of addPartitions().
> 
>
> Key: HIVE-17188
> URL: https://issues.apache.org/jira/browse/HIVE-17188
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Fix For: 2.2.0, 3.0.0, 2.4.0
>
> Attachments: HIVE-17188.1.patch
>
>
> For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} 
> runs out of memory. Flushing the {{PersistenceManager}} alleviates the 
> problem.
> Note: The problem being addressed here isn't so much with the size of the 
> hundreds of Partition objects, but the cruft that accumulates in the 
> PersistenceManager, in the JDO layer, as confirmed through memory-profiling.
> (Raising this on behalf of [~cdrome] and [~thiruvel].)
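> To illustrate the mitigation (a sketch only; the flush interval and the
> surrounding loop are hypothetical, not lifted from the patch):
> {code:java}
> int flushInterval = 100; // hypothetical batch size
> for (int i = 0; i < parts.size(); i++) {
>   pm.makePersistent(convertToMPart(parts.get(i), true));
>   if ((i + 1) % flushInterval == 0) {
>     // javax.jdo.PersistenceManager#flush writes pending changes, letting
>     // the JDO layer release the state it has accumulated.
>     pm.flush();
>   }
> }
> {code}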



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().

2017-08-03 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17188:

Fix Version/s: 2.4.0

> ObjectStore runs out of memory for large batches of addPartitions().
> 
>
> Key: HIVE-17188
> URL: https://issues.apache.org/jira/browse/HIVE-17188
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Fix For: 2.2.0, 3.0.0, 2.4.0
>
> Attachments: HIVE-17188.1.patch
>
>
> For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} 
> runs out of memory. Flushing the {{PersistenceManager}} alleviates the 
> problem.
> Note: The problem being addressed here isn't so much with the size of the 
> hundreds of Partition objects, but the cruft that accumulates in the 
> PersistenceManager, in the JDO layer, as confirmed through memory-profiling.
> (Raising this on behalf of [~cdrome] and [~thiruvel].)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111753#comment-16111753
 ] 

Mithun Radhakrishnan commented on HIVE-15686:
-

For the record, this patch is only for {{branch-2}} and {{branch-2.2}}. 

> Partitions on Remote HDFS break encryption-zone checks
> --
>
> Key: HIVE-15686
> URL: https://issues.apache.org/jira/browse/HIVE-15686
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-15686.branch-2.patch
>
>
> This is in relation to HIVE-13243, which fixes encryption-zone checks for 
> external tables.
> Unfortunately, this is still borked for partitions with remote HDFS paths. 
> The code fails as follows:
> {noformat}
> 2015-12-09 19:26:14,997 ERROR [pool-4-thread-1476] server.TThreadPoolServer 
> (TThreadPoolServer.java:run_aroundBody0(305)) - Error occurred during 
> processing of message.
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120, 
> expected: hdfs://local-cluster-n1.myth.net:8020
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1985)
> at 
> org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1290)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.checkTrashPurgeCombination(HiveMetaStore.java:1746)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:2974)
> at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy5.drop_partitions_req(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10005)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:9989)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:767)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:763)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:763)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:285)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I have a really simple fix.
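> Roughly (a sketch of the idea, with variable names invented here): the
> metastore should derive the encryption shim from the filesystem that owns
> the partition path, not from its local default filesystem:
> {code:java}
> FileSystem fs = partitionPath.getFileSystem(conf);
> HadoopShims.HdfsEncryptionShim shim =
>     ShimLoader.getHadoopShims().createHdfsEncryptionShim(fs, conf);
> boolean encrypted = shim.isPathEncrypted(partitionPath);
> {code}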



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17233) Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs.

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17233:

Status: Patch Available  (was: Open)

> Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs.
> 
>
> Key: HIVE-17233
> URL: https://issues.apache.org/jira/browse/HIVE-17233
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17233.1.patch
>
>
> This has to do with {{HIVE-15575}}. {{TezCompiler}} seems to set 
> {{mapred.input.dir.recursive}} to {{true}}. This is acceptable for Hive jobs, 
> since this allows Hive to consume its peculiar {{UNION ALL}} output, where 
> the output of each relation is stored in a separate sub-directory of the 
> output-dir.
> For such output to be readable through HCatalog (via Pig/HCatLoader), 
> {{mapred.input.dir.recursive}} should be set from {{HCatInputFormat}} as 
> well. Otherwise, one gets zero records for that input.
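> As a hedged interim workaround (database/table names hypothetical), a job
> can set the flag itself until {{HCatInputFormat}} does so:
> {code:java}
> Job job = Job.getInstance();
> job.getConfiguration().setBoolean("mapred.input.dir.recursive", true);
> HCatInputFormat.setInput(job, "mydb", "union_output_table");
> {code}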



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17233) Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs.

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17233:

Attachment: HIVE-17233.1.patch

The proposed fix.

> Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs.
> 
>
> Key: HIVE-17233
> URL: https://issues.apache.org/jira/browse/HIVE-17233
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17233.1.patch
>
>
> This has to do with {{HIVE-15575}}. {{TezCompiler}} seems to set 
> {{mapred.input.dir.recursive}} to {{true}}. This is acceptable for Hive jobs, 
> since this allows Hive to consume its peculiar {{UNION ALL}} output, where 
> the output of each relation is stored in a separate sub-directory of the 
> output-dir.
> For such output to be readable through HCatalog (via Pig/HCatLoader), 
> {{mapred.input.dir.recursive}} should be set from {{HCatInputFormat}} as 
> well. Otherwise, one gets zero records for that input.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17233) Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs.

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17233:
---

Assignee: Mithun Radhakrishnan

> Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs.
> 
>
> Key: HIVE-17233
> URL: https://issues.apache.org/jira/browse/HIVE-17233
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>
> This has to do with {{HIVE-15575}}. {{TezCompiler}} seems to set 
> {{mapred.input.dir.recursive}} to {{true}}. This is acceptable for Hive jobs, 
> since this allows Hive to consume its peculiar {{UNION ALL}} output, where 
> the output of each relation is stored in a separate sub-directory of the 
> output-dir.
> For such output to be readable through HCatalog (via Pig/HCatLoader), 
> {{mapred.input.dir.recursive}} should be set from {{HCatInputFormat}} as 
> well. Otherwise, one gets zero records for that input.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-15686:

Status: Patch Available  (was: Open)

> Partitions on Remote HDFS break encryption-zone checks
> --
>
> Key: HIVE-15686
> URL: https://issues.apache.org/jira/browse/HIVE-15686
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-15686.branch-2.patch
>
>
> This is in relation to HIVE-13243, which fixes encryption-zone checks for 
> external tables.
> Unfortunately, this is still borked for partitions with remote HDFS paths. 
> The code fails as follows:
> {noformat}
> 2015-12-09 19:26:14,997 ERROR [pool-4-thread-1476] server.TThreadPoolServer 
> (TThreadPoolServer.java:run_aroundBody0(305)) - Error occurred during 
> processing of message.
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120, 
> expected: hdfs://local-cluster-n1.myth.net:8020
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1985)
> at 
> org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1290)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.checkTrashPurgeCombination(HiveMetaStore.java:1746)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:2974)
> at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy5.drop_partitions_req(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10005)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:9989)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:767)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:763)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:763)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:285)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I have a really simple fix.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-15686:

Attachment: HIVE-15686.branch-2.patch

> Partitions on Remote HDFS break encryption-zone checks
> --
>
> Key: HIVE-15686
> URL: https://issues.apache.org/jira/browse/HIVE-15686
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-15686.branch-2.patch
>
>
> This is in relation to HIVE-13243, which fixes encryption-zone checks for 
> external tables.
> Unfortunately, this is still borked for partitions with remote HDFS paths. 
> The code fails as follows:
> {noformat}
> 2015-12-09 19:26:14,997 ERROR [pool-4-thread-1476] server.TThreadPoolServer 
> (TThreadPoolServer.java:run_aroundBody0(305)) - Error occurred during 
> processing of message.
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120, 
> expected: hdfs://local-cluster-n1.myth.net:8020
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1985)
> at 
> org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1290)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.checkTrashPurgeCombination(HiveMetaStore.java:1746)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:2974)
> at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy5.drop_partitions_req(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10005)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:9989)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:767)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:763)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:763)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:285)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I have a really simple fix.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-15686:

 Assignee: Mithun Radhakrishnan
Affects Version/s: 2.2.0
                       (was: 1.2.1)
   Status: Open  (was: Patch Available)

Attaching patch for {{branch-2}}.

> Partitions on Remote HDFS break encryption-zone checks
> --
>
> Key: HIVE-15686
> URL: https://issues.apache.org/jira/browse/HIVE-15686
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-15686.branch-2.patch
>
>
> This is in relation to HIVE-13243, which fixes encryption-zone checks for 
> external tables.
> Unfortunately, this is still borked for partitions with remote HDFS paths. 
> The code fails as follows:
> {noformat}
> 2015-12-09 19:26:14,997 ERROR [pool-4-thread-1476] server.TThreadPoolServer 
> (TThreadPoolServer.java:run_aroundBody0(305)) - Error occurred during 
> processing of message.
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120, 
> expected: hdfs://local-cluster-n1.myth.net:8020
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1985)
> at 
> org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1290)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.checkTrashPurgeCombination(HiveMetaStore.java:1746)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:2974)
> at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy5.drop_partitions_req(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10005)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:9989)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:767)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:763)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:763)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:285)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I have a really simple fix.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)



[jira] [Updated] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-15686:

Attachment: (was: HADOOP-14015.1.patch)

> Partitions on Remote HDFS break encryption-zone checks
> --
>
> Key: HIVE-15686
> URL: https://issues.apache.org/jira/browse/HIVE-15686
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Mithun Radhakrishnan
>
> This is in relation to HIVE-13243, which fixes encryption-zone checks for 
> external tables.
> Unfortunately, this is still borked for partitions with remote HDFS paths. 
> The code fails as follows:
> {noformat}
> 2015-12-09 19:26:14,997 ERROR [pool-4-thread-1476] server.TThreadPoolServer 
> (TThreadPoolServer.java:run_aroundBody0(305)) - Error occurred during 
> processing of message.
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120, 
> expected: hdfs://local-cluster-n1.myth.net:8020
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1985)
> at 
> org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1290)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.checkTrashPurgeCombination(HiveMetaStore.java:1746)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:2974)
> at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy5.drop_partitions_req(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10005)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:9989)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:767)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:763)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:763)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:285)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I have a really simple fix.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16820) TezTask may not shut down correctly before submit

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111648#comment-16111648
 ] 

Mithun Radhakrishnan commented on HIVE-16820:
-

bq. MergeFileTask doesn't appear to implement shutdown at all.

:] Ah, but doesn't it need one? It is conceivable that a user might cancel a 
query between the TezTask and the MergeFileTask (or simply interrupt an 
{{ALTER TABLE CONCATENATE}}). In that case, the merge will run through to the 
end, in spite of cancellation. 

I wonder if there isn't value in applying the HIVE-12556 + HIVE-16820 treatment 
for MergeFileTask as well.
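For concreteness, a hypothetical sketch of that treatment (this is not 
{{MergeFileTask}}'s actual code; the names and structure here are assumptions):

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: gate job submission on a cancellation flag, and make
// submission and shutdown mutually exclusive, so a cancelled query cannot
// still launch the merge job.
public class CancellableMergeTask {
  private final AtomicBoolean cancelRequested = new AtomicBoolean(false);
  private final Object submitLock = new Object();

  public int execute() {
    synchronized (submitLock) {
      // Check *inside* the lock, so shutdown() cannot race with submission.
      if (cancelRequested.get()) {
        return 1;            // cancelled before submission; skip the merge
      }
      submitMergeJob();      // stands in for the real job submission
    }
    return 0;                // in reality: await completion, return job status
  }

  public void shutdown() {
    synchronized (submitLock) {
      cancelRequested.set(true);
      // If the job were already running, this is where it would be killed.
    }
  }

  private void submitMergeJob() { /* stub for illustration */ }
}
{code}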

> TezTask may not shut down correctly before submit
> -
>
> Key: HIVE-16820
> URL: https://issues.apache.org/jira/browse/HIVE-16820
> Project: Hive
>  Issue Type: Bug
>Reporter: Visakh Nair
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-16820.01.patch, HIVE-16820.patch
>
>
> The query will run and only fail at the very end when the driver checks its 
> own shutdown flag.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17229) HiveMetastore HMSHandler locks during initialization, even though its static variable threadPool is not null

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17229:
---

Assignee: Zac Zhou  (was: Mithun Radhakrishnan)

> HiveMetastore HMSHandler locks during initialization, even though its static 
> variable threadPool is not null
> 
>
> Key: HIVE-17229
> URL: https://issues.apache.org/jira/browse/HIVE-17229
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Zac Zhou
>Assignee: Zac Zhou
> Attachments: HIVE-17229.2.patch, HIVE-17229.patch
>
>
> A thread pool is used to accelerate the add-partitions operation, since 
> [HIVE-13901|https://issues.apache.org/jira/browse/HIVE-13901]. 
> However, HMSHandler takes a lock during initialization every time, even 
> though its static variable threadPool is not null.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17229) HiveMetastore HMSHandler locks during initialization, even though its static variable threadPool is not null

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17229:

Status: Patch Available  (was: Open)

> HiveMetastore HMSHandler locks during initialization, even though its static 
> variable threadPool is not null
> 
>
> Key: HIVE-17229
> URL: https://issues.apache.org/jira/browse/HIVE-17229
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Zac Zhou
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17229.2.patch, HIVE-17229.patch
>
>
> A thread pool is used to accelerate the add-partitions operation, since 
> [HIVE-13901|https://issues.apache.org/jira/browse/HIVE-13901]. 
> However, HMSHandler takes a lock during initialization every time, even 
> though its static variable threadPool is not null.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17229) HiveMetastore HMSHandler locks during initialization, even though its static variable threadPool is not null

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17229:

Status: Open  (was: Patch Available)

> HiveMetastore HMSHandler locks during initialization, even though its static 
> variable threadPool is not null
> 
>
> Key: HIVE-17229
> URL: https://issues.apache.org/jira/browse/HIVE-17229
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Zac Zhou
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17229.2.patch, HIVE-17229.patch
>
>
> A thread pool is used to accelerate the add-partitions operation, since 
> [HIVE-13901|https://issues.apache.org/jira/browse/HIVE-13901]. 
> However, HMSHandler takes a lock during initialization every time, even 
> though its static variable threadPool is not null.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17229) HiveMetastore HMSHandler locks during initialization, even though its static variable threadPool is not null

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17229:

Attachment: HIVE-17229.2.patch

Attempting to fix the patch, to have the tests run.
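For readers following along: the usual shape of such a fix is double-checked 
locking on the static pool, so the common case takes no lock at all. A minimal 
sketch, assuming the patch takes roughly this form:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class HMSHandlerPoolInit {
  private static volatile ExecutorService threadPool;  // volatile is essential

  // Double-checked lazy initialization: once the pool exists, callers skip
  // the synchronized block entirely; the lock is taken only on first use.
  static void initThreadPool(int poolSize) {
    if (threadPool == null) {                    // first check, lock-free
      synchronized (HMSHandlerPoolInit.class) {
        if (threadPool == null) {                // second check, under the lock
          threadPool = Executors.newFixedThreadPool(poolSize);
        }
      }
    }
  }
}
{code}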

> HiveMetastore HMSHandler locks during initialization, even though its static 
> variable threadPool is not null
> 
>
> Key: HIVE-17229
> URL: https://issues.apache.org/jira/browse/HIVE-17229
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Zac Zhou
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17229.2.patch, HIVE-17229.patch
>
>
> A thread pool is used to accelerate the add-partitions operation, since 
> [HIVE-13901|https://issues.apache.org/jira/browse/HIVE-13901]. 
> However, HMSHandler takes a lock during initialization every time, even 
> though its static variable threadPool is not null.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17229) HiveMetastore HMSHandler locks during initialization, even though its static variable threadPool is not null

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17229:
---

Assignee: Mithun Radhakrishnan  (was: Zac Zhou)

> HiveMetastore HMSHandler locks during initialization, even though its static 
> variable threadPool is not null
> 
>
> Key: HIVE-17229
> URL: https://issues.apache.org/jira/browse/HIVE-17229
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Zac Zhou
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17229.patch
>
>
> A thread pool is used to accelerate the add-partitions operation, since 
> [HIVE-13901|https://issues.apache.org/jira/browse/HIVE-13901]. 
> However, HMSHandler takes a lock during initialization every time, even 
> though its static variable threadPool is not null.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17188:

   Resolution: Fixed
Fix Version/s: 2.2.0
               3.0.0
       Status: Resolved  (was: Patch Available)

Committed to {{master}}, {{branch-2}}, and {{branch-2.2}}, as advised by 
[~owen.omalley].

Thank you, [~cdrome], for the patch. Thanks, [~vihangk1], for the review.

> ObjectStore runs out of memory for large batches of addPartitions().
> 
>
> Key: HIVE-17188
> URL: https://issues.apache.org/jira/browse/HIVE-17188
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HIVE-17188.1.patch
>
>
> For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} 
> runs out of memory. Flushing the {{PersistenceManager}} alleviates the 
> problem.
> Note: The problem being addressed here isn't so much with the size of the 
> hundreds of Partition objects, but the cruft that builds up in the 
> PersistenceManager, in the JDO layer, as confirmed through memory-profiling.
> (Raising this on behalf of [~cdrome] and [~thiruvel].)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111334#comment-16111334
 ] 

Mithun Radhakrishnan commented on HIVE-16908:
-

bq. I am looking into the option of launching metastore in a different process.
Thank you for holding out for the proper solution to this, [~sbeeram]. A couple 
of challenges come to mind:
# The test will need to communicate which port the remote metastore was started 
at. We might consider pre-setting the port, but I wonder if this might make the 
test's start-up brittle. (A sketch of one workaround follows at the end of this 
comment.)
# Cleanup for the remote metastore's state, after the test.

bq. sorry about being so anal about this...
I fear that label fits my stand more than yours. :/

In the spirit of detente, I'd be happy to revisit your first proposal, if the 
multi-process solution proves cumbersome.
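On point 1 above, one well-worn workaround is to grab an ephemeral port up front 
and hand it to the forked metastore process. A sketch (note the usual small race 
between releasing the socket and the child re-binding it):

{code:java}
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortFinder {
  // Ask the OS for any free port, release it, and pass it to the child
  // process. Small race window: another process could grab the port between
  // close() and the child's bind(), so retries are advisable in practice.
  static int findFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      socket.setReuseAddress(true);
      return socket.getLocalPort();
    }
  }
}
{code}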

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch, 
> HIVE-16908.3.patch
>
>
> Some of the tests in TestHCatClient.java, for ex:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second metastore thread with a different conf object, which results 
> in the PersistenceManagerFactory being closed; hence the tests fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16820) TezTask may not shut down correctly before submit

2017-08-01 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110071#comment-16110071
 ] 

Mithun Radhakrishnan commented on HIVE-16820:
-

[~sershe], I wonder if a similar fix should go into 
{{MergeFileTask::execute()}}, to check for cancellation before job-submission. 

> TezTask may not shut down correctly before submit
> -
>
> Key: HIVE-16820
> URL: https://issues.apache.org/jira/browse/HIVE-16820
> Project: Hive
>  Issue Type: Bug
>Reporter: Visakh Nair
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-16820.01.patch, HIVE-16820.patch
>
>
> The query will run and only fail at the very end when the driver checks its 
> own shutdown flag.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().

2017-07-31 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108024#comment-16108024
 ] 

Mithun Radhakrishnan commented on HIVE-17188:
-

I'm +1 on this change. This has been running in production for some time now. 

I'd like to commit this to {{master}} and {{branch-2}}, tomorrow, unless there 
is objection.

> ObjectStore runs out of memory for large batches of addPartitions().
> 
>
> Key: HIVE-17188
> URL: https://issues.apache.org/jira/browse/HIVE-17188
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17188.1.patch
>
>
> For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} 
> runs out of memory. Flushing the {{PersistenceManager}} alleviates the 
> problem.
> Note: The problem being addressed here isn't so much with the size of the 
> hundreds of Partition objects, but the cruft that builds with the 
> PersistenceManager, in the JDO layer, as confirmed through memory-profiling.
> (Raising this on behalf of [~cdrome] and [~thiruvel].)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17218) Canonical-ize hostnames for Hive metastore, and HS2 servers.

2017-07-31 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17218:

Status: Patch Available  (was: Open)

> Canonical-ize hostnames for Hive metastore, and HS2 servers.
> 
>
> Key: HIVE-17218
> URL: https://issues.apache.org/jira/browse/HIVE-17218
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore, Security
>Affects Versions: 2.2.0, 1.2.2, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17218.1.patch
>
>
> Currently, the {{HiveMetastoreClient}} and {{HiveConnection}} do not 
> canonical-ize the hostnames of the metastore/HS2 servers. In deployments 
> where there are multiple such servers behind a VIP, this causes a number of 
> inconveniences:
> # The client-side configuration (e.g. {{hive.metastore.uris}} in 
> {{hive-site.xml}}) needs to specify the VIP's hostname, and cannot use a 
> simplified CNAME, in the thrift URL. If the 
> {{hive.metastore.kerberos.principal}} is specified using {{_HOST}}, one sees 
> GSS failures as follows:
> {noformat}
> hive --hiveconf hive.metastore.kerberos.principal=hive/_h...@grid.myth.net 
> --hiveconf 
> hive.metastore.uris="thrift://simplified-hcat-cname.grid.myth.net:56789"
> ...
> Exception in thread "main" java.lang.RuntimeException: 
> java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:542)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
> ...
> {noformat}
> This is because {{_HOST}} is filled in with the CNAME, and not the 
> canonicalized name.
> # Oozie workflows that use HCat {{<credentials>}} always have to use the VIP 
> hostname, and can't use {{_HOST}}-based service principals, if the CNAME 
> differs from the VIP name.
> If the client-code simply canonical-ized the hostnames, it would enable the 
> use of both simplified CNAMEs, and _HOST in service principals.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17218) Canonical-ize hostnames for Hive metastore, and HS2 servers.

2017-07-31 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17218:

Attachment: HIVE-17218.1.patch

Here's the proposed fix.
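The gist, sketched below under the assumption that the patch canonicalizes the 
configured hostname before {{_HOST}} substitution (the CNAME is the example from 
the description; the principal format is standard Kerberos):

{code:java}
import java.net.InetAddress;

public class CanonicalHost {
  // Resolve a (possibly simplified) CNAME to the canonical hostname before
  // substituting it into an _HOST-based Kerberos service principal.
  static String canonicalize(String host) throws Exception {
    return InetAddress.getByName(host).getCanonicalHostName();
  }

  public static void main(String[] args) throws Exception {
    String cname = "simplified-hcat-cname.grid.myth.net";
    String principal =
        "hive/_HOST@GRID.MYTH.NET".replace("_HOST", canonicalize(cname));
    System.out.println(principal);  // now carries the canonical (VIP) name
  }
}
{code}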

> Canonical-ize hostnames for Hive metastore, and HS2 servers.
> 
>
> Key: HIVE-17218
> URL: https://issues.apache.org/jira/browse/HIVE-17218
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore, Security
>Affects Versions: 1.2.2, 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17218.1.patch
>
>
> Currently, the {{HiveMetastoreClient}} and {{HiveConnection}} do not 
> canonical-ize the hostnames of the metastore/HS2 servers. In deployments 
> where there are multiple such servers behind a VIP, this causes a number of 
> inconveniences:
> # The client-side configuration (e.g. {{hive.metastore.uris}} in 
> {{hive-site.xml}}) needs to specify the VIP's hostname, and cannot use a 
> simplified CNAME, in the thrift URL. If the 
> {{hive.metastore.kerberos.principal}} is specified using {{_HOST}}, one sees 
> GSS failures as follows:
> {noformat}
> hive --hiveconf hive.metastore.kerberos.principal=hive/_h...@grid.myth.net 
> --hiveconf 
> hive.metastore.uris="thrift://simplified-hcat-cname.grid.myth.net:56789"
> ...
> Exception in thread "main" java.lang.RuntimeException: 
> java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:542)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
> ...
> {noformat}
> This is because {{_HOST}} is filled in with the CNAME, and not the 
> canonicalized name.
> # Oozie workflows that use HCat {{<credentials>}} always have to use the VIP 
> hostname, and can't use {{_HOST}}-based service principals, if the CNAME 
> differs from the VIP name.
> If the client-code simply canonical-ized the hostnames, it would enable the 
> use of both simplified CNAMEs, and _HOST in service principals.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17218) Canonical-ize hostnames for Hive metastore, and HS2 servers.

2017-07-31 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17218:
---


> Canonical-ize hostnames for Hive metastore, and HS2 servers.
> 
>
> Key: HIVE-17218
> URL: https://issues.apache.org/jira/browse/HIVE-17218
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore, Security
>Affects Versions: 2.2.0, 1.2.2, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>
> Currently, the {{HiveMetastoreClient}} and {{HiveConnection}} do not 
> canonical-ize the hostnames of the metastore/HS2 servers. In deployments 
> where there are multiple such servers behind a VIP, this causes a number of 
> inconveniences:
> # The client-side configuration (e.g. {{hive.metastore.uris}} in 
> {{hive-site.xml}}) needs to specify the VIP's hostname, and cannot use a 
> simplified CNAME, in the thrift URL. If the 
> {{hive.metastore.kerberos.principal}} is specified using {{_HOST}}, one sees 
> GSS failures as follows:
> {noformat}
> hive --hiveconf hive.metastore.kerberos.principal=hive/_h...@grid.myth.net 
> --hiveconf 
> hive.metastore.uris="thrift://simplified-hcat-cname.grid.myth.net:56789"
> ...
> Exception in thread "main" java.lang.RuntimeException: 
> java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:542)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
> ...
> {noformat}
> This is because {{_HOST}} is filled in with the CNAME, and not the 
> canonicalized name.
> # Oozie workflows that use HCat {{<credentials>}} always have to use the VIP 
> hostname, and can't use {{_HOST}}-based service principals, if the CNAME 
> differs from the VIP name.
> If the client-code simply canonical-ized the hostnames, it would enable the 
> use of both simplified CNAMEs, and _HOST in service principals.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-07-31 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107621#comment-16107621
 ] 

Mithun Radhakrishnan commented on HIVE-16908:
-

After co-opting this JIRA so that I could provide an alternative fix, I've 
reset ownership back to [~sbeeram].

Rather than rewrite the test to launch the target metastore in a separate 
process, I have reworded the failing tests without changing their intention. In 
the places where calls to the source and target metastores are interleaved, I 
fetch a new HMS client.

Fixing the static-state in the metastore is a larger problem that should be 
addressed separately.

[~sbeeram], I wonder if you might find this satisfactory.
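In other words, a sketch of the workaround (the thrift URIs are illustrative, and 
this is not the attached patch):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hive.hcatalog.api.HCatClient;

public class TwoMetastoreSketch {
  // Build a fresh client against the given metastore URI, instead of reusing
  // one whose shared static state the other metastore instance has clobbered.
  static HCatClient clientFor(String thriftUri) throws Exception {
    Configuration conf = new Configuration();
    conf.set(HiveConf.ConfVars.METASTOREURIS.varname, thriftUri);
    return HCatClient.create(conf);
  }

  public static void main(String[] args) throws Exception {
    HCatClient source = clientFor("thrift://localhost:9083");  // illustrative
    // ... calls against the source metastore ...
    source.close();

    HCatClient target = clientFor("thrift://localhost:9084");  // illustrative
    // ... interleaved calls against the target metastore ...
    target.close();
  }
}
{code}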

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch, 
> HIVE-16908.3.patch
>
>
> Some of the tests in TestHCatClient.java, for ex:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second metastore thread with a different conf object, which results 
> in the PersistenceManagerFactory being closed; hence the tests fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-07-31 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-16908:
---

Assignee: Sunitha Beeram  (was: Mithun Radhakrishnan)

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch, 
> HIVE-16908.3.patch
>
>
> Some of the tests in TestHCatClient.java, for ex:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second metastore thread with a different conf object, which results 
> in the PersistenceManagerFactory being closed; hence the tests fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-07-28 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-16908:

Status: Open  (was: Patch Available)

Resubmitting, to run tests.

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch, 
> HIVE-16908.3.patch
>
>
> Some of the tests in TestHCatClient.java, for ex:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second metastore thread with a different conf object, which results 
> in the PersistenceManagerFactory being closed; hence the tests fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-07-28 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-16908:

Status: Patch Available  (was: Open)

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch, 
> HIVE-16908.3.patch
>
>
> Some of the tests in TestHCatClient.java, for ex:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second metastore thread with a different conf object, which results 
> in the PersistenceManagerFactory being closed; hence the tests fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-07-28 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-16908:

Attachment: HIVE-16908.3.patch

Not my finest moment, but the following is the least obtrusive change to fix 
the problem, methinks. 

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch, 
> HIVE-16908.3.patch
>
>
> Some of the tests in TestHCatClient.java, for ex:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second metastore thread with a different conf object, which results 
> in the PersistenceManagerFactory being closed; hence the tests fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-07-28 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105880#comment-16105880
 ] 

Mithun Radhakrishnan commented on HIVE-16908:
-

I'm taking a crack at this failure. Assigning this to myself, for the moment, 
for the sake of uploading a patch.

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch
>
>
> Some of the tests in TestHCatClient.java, for ex:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second metastore thread with a different conf object, which results 
> in the PersistenceManagerFactory being closed; hence the tests fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-07-28 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-16908:
---

Assignee: Mithun Radhakrishnan  (was: Sunitha Beeram)

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch
>
>
> Some of the tests in TestHCatClient.java, for ex:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second metastore thread with a different conf object, which results 
> in the PersistenceManagerFactory being closed; hence the tests fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-07-28 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105373#comment-16105373
 ] 

Mithun Radhakrishnan commented on HIVE-16908:
-

[~sbeeram]: I have raised HIVE-17201 to temporarily disable these three tests, 
to clean up the tests for others. This is only temporary, until we fix the 
tests properly.

bq. If the target metastore instance were accessed through a different 
classloader...
I made an initial pass at doing this. I don't have a proper solution yet. Will 
update.

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch
>
>
> Some of the tests in TestHCatClient.java, for ex:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second metastore thread with a different conf object, which results 
> in the PersistenceManagerFactory being closed; hence the tests fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17201) (Temporarily) Disable failing tests in TestHCatClient

2017-07-28 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17201:

Status: Patch Available  (was: Open)

Submitting, to check the tests.

> (Temporarily) Disable failing tests in TestHCatClient
> -
>
> Key: HIVE-17201
> URL: https://issues.apache.org/jira/browse/HIVE-17201
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Tests
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17201.1.patch
>
>
> This is with regard to the recent test-failures in {{TestHCatClient}}. 
> While [~sbeeram] and I joust over the best way to rephrase the failing tests 
> (in HIVE-16908), perhaps it's best that we temporarily disable the following 
> failing tests:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17201) (Temporarily) Disable failing tests in TestHCatClient

2017-07-28 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17201:

Attachment: HIVE-17201.1.patch

The following should clean up the build, for now.
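Presumably along these lines: a sketch using JUnit 4's {{@Ignore}}, with the test 
name taken from the list below and the body elided:

{code:java}
import org.junit.Ignore;
import org.junit.Test;

public class TestHCatClientSketch {
  // Temporarily disabling a failing test, with a pointer back to the
  // tracking JIRAs so the disablement is not forgotten.
  @Ignore("Temporarily disabled; tracked by HIVE-16908 / HIVE-17201")
  @Test
  public void testTableSchemaPropagation() throws Exception {
    // test body unchanged
  }
}
{code}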

> (Temporarily) Disable failing tests in TestHCatClient
> -
>
> Key: HIVE-17201
> URL: https://issues.apache.org/jira/browse/HIVE-17201
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Tests
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17201.1.patch
>
>
> This is with regard to the recent test-failures in {{TestHCatClient}}. 
> While [~sbeeram] and I joust over the best way to rephrase the failing tests 
> (in HIVE-16908), perhaps it's best that we temporarily disable the following 
> failing tests:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17201) (Temporarily) Disable failing tests in TestHCatClient

2017-07-28 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17201:
---


> (Temporarily) Disable failing tests in TestHCatClient
> -
>
> Key: HIVE-17201
> URL: https://issues.apache.org/jira/browse/HIVE-17201
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Tests
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>
> This is with regard to the recent test-failures in {{TestHCatClient}}. 
> While [~sbeeram] and I joust over the best way to rephrase the failing tests 
> (in HIVE-16908), perhaps it's best that we temporarily disable the following 
> failing tests:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17169) Avoid extra call to KeyProvider::getMetadata()

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104059#comment-16104059
 ] 

Mithun Radhakrishnan commented on HIVE-17169:
-

An additional reason to avoid {{KeyProvider::getMetadata()}} is that HDFS 
might be set up to disallow this call for all but HDFS super-users. The 
{{EncryptionZone}} instance already provides what we need.

> Avoid extra call to KeyProvider::getMetadata()
> --
>
> Key: HIVE-17169
> URL: https://issues.apache.org/jira/browse/HIVE-17169
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17169.1.patch
>
>
> Here's the code from {{Hadoop23Shims}}:
> {code:title=Hadoop23Shims.java|borderStyle=solid}
> @Override
> public int comparePathKeyStrength(Path path1, Path path2) throws 
> IOException {
>   EncryptionZone zone1, zone2;
>   zone1 = hdfsAdmin.getEncryptionZoneForPath(path1);
>   zone2 = hdfsAdmin.getEncryptionZoneForPath(path2);
>   if (zone1 == null && zone2 == null) {
> return 0;
>   } else if (zone1 == null) {
> return -1;
>   } else if (zone2 == null) {
> return 1;
>   }
>   return compareKeyStrength(zone1.getKeyName(), zone2.getKeyName());
> }
> private int compareKeyStrength(String keyname1, String keyname2) throws 
> IOException {
>   KeyProvider.Metadata meta1, meta2;
>   if (keyProvider == null) {
> throw new IOException("HDFS security key provider is not configured 
> on your server.");
>   }
>   meta1 = keyProvider.getMetadata(keyname1);
>   meta2 = keyProvider.getMetadata(keyname2);
>   if (meta1.getBitLength() < meta2.getBitLength()) {
> return -1;
>   } else if (meta1.getBitLength() == meta2.getBitLength()) {
> return 0;
>   } else {
> return 1;
>   }
> }
>   }
> {code}
> It turns out that {{EncryptionZone}} already has the cipher's bit-length 
> stored in a member variable. One shouldn't need an additional name-node call 
> ({{KeyProvider::getMetadata()}}) only to fetch it again.
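For contrast with the block above, a sketch of the simplified comparison. The 
{{keyBitLength}} accessor below is hypothetical, standing in for whichever member 
{{EncryptionZone}} actually exposes:

{code:java}
// Sketch only: compare key strengths using what the EncryptionZone already
// carries, with no round trip through KeyProvider.getMetadata().
@Override
public int comparePathKeyStrength(Path path1, Path path2) throws IOException {
  EncryptionZone zone1 = hdfsAdmin.getEncryptionZoneForPath(path1);
  EncryptionZone zone2 = hdfsAdmin.getEncryptionZoneForPath(path2);
  if (zone1 == null && zone2 == null) {
    return 0;
  } else if (zone1 == null) {
    return -1;
  } else if (zone2 == null) {
    return 1;
  }
  return Integer.compare(keyBitLength(zone1), keyBitLength(zone2));
}

private int keyBitLength(EncryptionZone zone) {
  // Hypothetical accessor: the description says the zone stores the cipher's
  // bit-length in a member variable; the exact getter name is assumed here.
  return zone.getKeyBitLength();
}
{code}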



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17169) Avoid extra call to KeyProvider::getMetadata()

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17169:

Status: Patch Available  (was: Open)

Submitting patch for tests.

> Avoid extra call to KeyProvider::getMetadata()
> --
>
> Key: HIVE-17169
> URL: https://issues.apache.org/jira/browse/HIVE-17169
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17169.1.patch
>
>
> Here's the code from {{Hadoop23Shims}}:
> {code:title=Hadoop23Shims.java|borderStyle=solid}
> @Override
> public int comparePathKeyStrength(Path path1, Path path2) throws 
> IOException {
>   EncryptionZone zone1, zone2;
>   zone1 = hdfsAdmin.getEncryptionZoneForPath(path1);
>   zone2 = hdfsAdmin.getEncryptionZoneForPath(path2);
>   if (zone1 == null && zone2 == null) {
> return 0;
>   } else if (zone1 == null) {
> return -1;
>   } else if (zone2 == null) {
> return 1;
>   }
>   return compareKeyStrength(zone1.getKeyName(), zone2.getKeyName());
> }
> private int compareKeyStrength(String keyname1, String keyname2) throws 
> IOException {
>   KeyProvider.Metadata meta1, meta2;
>   if (keyProvider == null) {
> throw new IOException("HDFS security key provider is not configured 
> on your server.");
>   }
>   meta1 = keyProvider.getMetadata(keyname1);
>   meta2 = keyProvider.getMetadata(keyname2);
>   if (meta1.getBitLength() < meta2.getBitLength()) {
> return -1;
>   } else if (meta1.getBitLength() == meta2.getBitLength()) {
> return 0;
>   } else {
> return 1;
>   }
> }
>   }
> {code}
> It turns out that {{EncryptionZone}} already has the cipher's bit-length 
> stored in a member variable. One shouldn't need an additional name-node call 
> ({{KeyProvider::getMetadata()}}) only to fetch it again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17169) Avoid extra call to KeyProvider::getMetadata()

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17169:

Attachment: (was: HIVE-17169.branch-2.2.patch)

> Avoid extra call to KeyProvider::getMetadata()
> --
>
> Key: HIVE-17169
> URL: https://issues.apache.org/jira/browse/HIVE-17169
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17169.1.patch
>
>
> Here's the code from {{Hadoop23Shims}}:
> {code:title=Hadoop23Shims.java|borderStyle=solid}
> @Override
> public int comparePathKeyStrength(Path path1, Path path2) throws 
> IOException {
>   EncryptionZone zone1, zone2;
>   zone1 = hdfsAdmin.getEncryptionZoneForPath(path1);
>   zone2 = hdfsAdmin.getEncryptionZoneForPath(path2);
>   if (zone1 == null && zone2 == null) {
> return 0;
>   } else if (zone1 == null) {
> return -1;
>   } else if (zone2 == null) {
> return 1;
>   }
>   return compareKeyStrength(zone1.getKeyName(), zone2.getKeyName());
> }
> private int compareKeyStrength(String keyname1, String keyname2) throws 
> IOException {
>   KeyProvider.Metadata meta1, meta2;
>   if (keyProvider == null) {
> throw new IOException("HDFS security key provider is not configured 
> on your server.");
>   }
>   meta1 = keyProvider.getMetadata(keyname1);
>   meta2 = keyProvider.getMetadata(keyname2);
>   if (meta1.getBitLength() < meta2.getBitLength()) {
> return -1;
>   } else if (meta1.getBitLength() == meta2.getBitLength()) {
> return 0;
>   } else {
> return 1;
>   }
> }
>   }
> {code}
> It turns out that {{EncryptionZone}} already has the cipher's bit-length 
> stored in a member variable. One shouldn't need an additional name-node call 
> ({{KeyProvider::getMetadata()}}) only to fetch it again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104030#comment-16104030
 ] 

Mithun Radhakrishnan commented on HIVE-17188:
-

P.S. I've added clarification in the JIRA description.

We've had a rash of JIRAs with anaemic descriptions recently. I hope this 
version is clearer.

> ObjectStore runs out of memory for large batches of addPartitions().
> 
>
> Key: HIVE-17188
> URL: https://issues.apache.org/jira/browse/HIVE-17188
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17188.1.patch
>
>
> For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} 
> runs out of memory. Flushing the {{PersistenceManager}} alleviates the 
> problem.
> Note: The problem being addressed here isn't so much with the size of the 
> hundreds of Partition objects, but the cruft that builds up in the 
> PersistenceManager, in the JDO layer, as confirmed through memory-profiling.
> (Raising this on behalf of [~cdrome] and [~thiruvel].)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17188:

Description: 
For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} 
runs out of memory. Flushing the {{PersistenceManager}} alleviates the problem.

Note: The problem being addressed here isn't so much with the size of the 
hundreds of Partition objects, but the cruft that builds up in the 
PersistenceManager, in the JDO layer, as confirmed through memory-profiling.

(Raising this on behalf of [~cdrome] and [~thiruvel].)

  was:
For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} 
runs out of memory. Flushing the {{PersistenceManager}} alleviates the problem.

(Raising this on behalf of [~cdrome] and [~thiruvel].)


> ObjectStore runs out of memory for large batches of addPartitions().
> 
>
> Key: HIVE-17188
> URL: https://issues.apache.org/jira/browse/HIVE-17188
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17188.1.patch
>
>
> For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} 
> runs out of memory. Flushing the {{PersistenceManager}} alleviates the 
> problem.
> Note: The problem being addressed here isn't so much with the size of the 
> hundreds of Partition objects, but the cruft that builds up in the 
> PersistenceManager, in the JDO layer, as confirmed through memory-profiling.
> (Raising this on behalf of [~cdrome] and [~thiruvel].)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104014#comment-16104014
 ] 

Mithun Radhakrishnan commented on HIVE-17188:
-

@[~vihangk1]: Thank you for your attention. :]

bq. Can you please update the patch with HIVE specific JIRA number and 
description of this JIRA as per our convention?
Sorry, it's been a while, so perhaps you could clarify for me. My memory of the 
convention is that patches are named 
{{HIVE-<jira-number>.<patch-number>.patch}}, and, for a port to another branch, 
{{HIVE-<jira-number>.<branch-name>.patch}}. From perusing the JIRAs included in 
[the Hive 2.2 
release|https://issues.apache.org/jira/projects/HIVE/versions/12335837], this 
seems like the format of choice. Could you please clarify what I'm missing?

bq. You can add a line in the description where this patch was cherry-picked 
from, if you like...
This is a port from Yahoo's internal production branch. The commit dates back 
to April of 2014. :]

bq. If there are hundreds of partitions being added, aren't they already in 
memory in the {{List}} parts object?
A fair question. :] I can try to answer this, although [~cdrome] and [~thiruvel] 
are really the experts on this one. 
The problem being addressed here isn't so much with the size of the hundreds of 
{{Partition}} objects, but the cruft that builds up in the 
{{PersistenceManager}}, in the JDO layer, as confirmed through memory-profiling.

Our larger commit also plugged leaks from neglecting to call 
{{Query::close()}}, etc. It looks like those have independently been solved 
already.


> ObjectStore runs out of memory for large batches of addPartitions().
> 
>
> Key: HIVE-17188
> URL: https://issues.apache.org/jira/browse/HIVE-17188
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17188.1.patch
>
>
> For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} 
> runs out of memory. Flushing the {{PersistenceManager}} alleviates the 
> problem.
> (Raising this on behalf of [~cdrome] and [~thiruvel].)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17188:

Status: Patch Available  (was: Open)

Submitting, to run tests.

> ObjectStore runs out of memory for large batches of addPartitions().
> 
>
> Key: HIVE-17188
> URL: https://issues.apache.org/jira/browse/HIVE-17188
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17188.1.patch
>
>
> For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} 
> runs out of memory. Flushing the {{PersistenceManager}} alleviates the 
> problem.
> (Raising this on behalf of [~cdrome] and [~thiruvel].)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17188:

Attachment: HIVE-17188.1.patch

Here's the patch, ported for {{master}}.

I wonder if it's better to flush at an interval, instead of for *every* 
partition.
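A minimal sketch of interval-based flushing (the interval and loop here are 
illustrative assumptions, not the attached patch):

{code:java}
import java.util.List;
import javax.jdo.PersistenceManager;

public class BatchAddPartitions {
  private static final int FLUSH_INTERVAL = 100;  // assumption: tune per heap

  // Push pending JDO state to the store every N objects, so the
  // PersistenceManager's internal bookkeeping doesn't accumulate across a
  // batch of hundreds of partitions.
  static void persistBatch(PersistenceManager pm, List<Object> mparts) {
    int count = 0;
    for (Object mpart : mparts) {
      pm.makePersistent(mpart);
      if (++count % FLUSH_INTERVAL == 0) {
        pm.flush();  // release the cruft built up in the JDO layer
      }
    }
    pm.flush();      // final flush for the tail of the batch
  }
}
{code}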

> ObjectStore runs out of memory for large batches of addPartitions().
> 
>
> Key: HIVE-17188
> URL: https://issues.apache.org/jira/browse/HIVE-17188
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17188.1.patch
>
>
> For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} 
> runs out of memory. Flushing the {{PersistenceManager}} alleviates the 
> problem.
> (Raising this on behalf of [~cdrome] and [~thiruvel].)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17188:
---


> ObjectStore runs out of memory for large batches of addPartitions().
> 
>
> Key: HIVE-17188
> URL: https://issues.apache.org/jira/browse/HIVE-17188
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>
> For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} 
> runs out of memory. Flushing the {{PersistenceManager}} alleviates the 
> problem.
> (Raising this on behalf of [~cdrome] and [~thiruvel].)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-8472:
---
Status: Open  (was: Patch Available)

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-8472.1.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-8472:
---
Status: Patch Available  (was: Open)

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-8472.1.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-8472:
---
Attachment: (was: HIVE-8472.branch-2.2.patch)

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-8472.1.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs

2017-07-26 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17181:

Affects Version/s: 2.2.0

> HCatOutputFormat should expose complete output-schema (including 
> partition-keys) for dynamic-partitioning MR jobs
> -
>
> Key: HIVE-17181
> URL: https://issues.apache.org/jira/browse/HIVE-17181
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17181.1.patch, HIVE-17181.branch-2.patch
>
>
> Map/Reduce jobs that use HCatalog APIs to write to Hive tables using Dynamic 
> partitioning are expected to call the following API methods:
> # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to 
> write to. This call populates the {{OutputJobInfo}} with details fetched from 
> the Metastore.
> # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data 
> being written.
> It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows:
> {code:java}
> HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf));
> {code}
> Unfortunately, {{getTableSchema()}} returns only the record-schema, not the 
> entire table's schema. We'll need a better API for use in M/R jobs to get the 
> complete table-schema.
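
As a hedged illustration (not the API this patch adds), one way an M/R job could 
assemble the complete schema is to append the partition-keys from the 
{{OutputJobInfo}}'s table-info onto the record-schema. The method names below are 
from the public HCatalog API, but signatures vary slightly across Hive versions, 
and exception-handling is elided:

{code:java}
// Sketch: record-schema + partition-keys = complete output-schema.
HCatSchema full = new HCatSchema(
    new java.util.ArrayList<HCatFieldSchema>(
        HCatOutputFormat.getTableSchema(conf).getFields()));
HCatTableInfo tableInfo = HCatOutputFormat.getJobInfo(conf).getTableInfo();
for (HCatFieldSchema partKey : tableInfo.getPartitionColumns().getFields()) {
  full.append(partKey); // partition columns come after the data columns
}
HCatOutputFormat.setSchema(conf, full);
{code}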



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-07-26 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-8472:
---
Affects Version/s: (was: 0.13.1)
   2.2.0
   Status: Patch Available  (was: Open)

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-8472.1.patch, HIVE-8472.branch-2.2.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs

2017-07-26 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17181:

Status: Patch Available  (was: Open)

> HCatOutputFormat should expose complete output-schema (including 
> partition-keys) for dynamic-partitioning MR jobs
> -
>
> Key: HIVE-17181
> URL: https://issues.apache.org/jira/browse/HIVE-17181
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17181.1.patch, HIVE-17181.branch-2.patch
>
>
> Map/Reduce jobs that use HCatalog APIs to write to Hive tables using Dynamic 
> partitioning are expected to call the following API methods:
> # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to 
> write to. This call populates the {{OutputJobInfo}} with details fetched from 
> the Metastore.
> # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data 
> being written.
> It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows:
> {code:java}
> HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf));
> {code}
> Unfortunately, {{getTableSchema()}} returns only the record-schema, not the 
> entire table's schema. We'll need a better API for use in M/R jobs to get the 
> complete table-schema.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs

2017-07-26 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17181:

Attachment: HIVE-17181.1.patch

Patch for master.

> HCatOutputFormat should expose complete output-schema (including 
> partition-keys) for dynamic-partitioning MR jobs
> -
>
> Key: HIVE-17181
> URL: https://issues.apache.org/jira/browse/HIVE-17181
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17181.1.patch, HIVE-17181.branch-2.patch
>
>
> Map/Reduce jobs that use HCatalog APIs to write to Hive tables using Dynamic 
> partitioning are expected to call the following API methods:
> # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to 
> write to. This call populates the {{OutputJobInfo}} with details fetched from 
> the Metastore.
> # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data 
> being written.
> It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows:
> {code:java}
> HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf));
> {code}
> Unfortunately, {{getTableSchema()}} returns only the record-schema, not the 
> entire table's schema. We'll need a better API for use in M/R jobs to get the 
> complete table-schema.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs

2017-07-26 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17181:

Attachment: HIVE-17181.branch-2.patch

Patch for branch-2.

> HCatOutputFormat should expose complete output-schema (including 
> partition-keys) for dynamic-partitioning MR jobs
> -
>
> Key: HIVE-17181
> URL: https://issues.apache.org/jira/browse/HIVE-17181
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17181.branch-2.patch
>
>
> Map/Reduce jobs that use HCatalog APIs to write to Hive tables using Dynamic 
> partitioning are expected to call the following API methods:
> # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to 
> write to. This call populates the {{OutputJobInfo}} with details fetched from 
> the Metastore.
> # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data 
> being written.
> It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows:
> {code:java}
> HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf));
> {code}
> Unfortunately, {{getTableSchema()}} returns only the record-schema, not the 
> entire table's schema. We'll need a better API for use in M/R jobs to get the 
> complete table-schema.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs

2017-07-26 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17181:
---


> HCatOutputFormat should expose complete output-schema (including 
> partition-keys) for dynamic-partitioning MR jobs
> -
>
> Key: HIVE-17181
> URL: https://issues.apache.org/jira/browse/HIVE-17181
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>
> Map/Reduce jobs that use HCatalog APIs to write to Hive tables using Dynamic 
> partitioning are expected to call the following API methods:
> # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to 
> write to. This call populates the {{OutputJobInfo}} with details fetched from 
> the Metastore.
> # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data 
> being written.
> It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows:
> {code:java}
> HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf));
> {code}
> Unfortunately, {{getTableSchema()}} returns only the record-schema, not the 
> entire table's schema. We'll need a better API for use in M/R jobs to get the 
> complete table-schema.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-07-26 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-8472:
---
Attachment: HIVE-8472.branch-2.2.patch

And here's one for branch-2.2.

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 0.13.1
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-8472.1.patch, HIVE-8472.branch-2.2.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-07-26 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-8472:
---
Attachment: HIVE-8472.1.patch

Here's a patch for the master branch.

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 0.13.1
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-8472.1.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-07-26 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-8472:
--

Assignee: Mithun Radhakrishnan

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 0.13.1
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17169) Avoid extra call to KeyProvider::getMetadata()

2017-07-25 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17169:

Attachment: HIVE-17169.branch-2.2.patch

Patch for branch-2.2.

> Avoid extra call to KeyProvider::getMetadata()
> --
>
> Key: HIVE-17169
> URL: https://issues.apache.org/jira/browse/HIVE-17169
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17169.1.patch, HIVE-17169.branch-2.2.patch
>
>
> Here's the code from {{Hadoop23Shims}}:
> {code:title=Hadoop23Shims.java|borderStyle=solid}
> @Override
> public int comparePathKeyStrength(Path path1, Path path2) throws 
> IOException {
>   EncryptionZone zone1, zone2;
>   zone1 = hdfsAdmin.getEncryptionZoneForPath(path1);
>   zone2 = hdfsAdmin.getEncryptionZoneForPath(path2);
>   if (zone1 == null && zone2 == null) {
> return 0;
>   } else if (zone1 == null) {
> return -1;
>   } else if (zone2 == null) {
> return 1;
>   }
>   return compareKeyStrength(zone1.getKeyName(), zone2.getKeyName());
> }
> private int compareKeyStrength(String keyname1, String keyname2) throws 
> IOException {
>   KeyProvider.Metadata meta1, meta2;
>   if (keyProvider == null) {
> throw new IOException("HDFS security key provider is not configured 
> on your server.");
>   }
>   meta1 = keyProvider.getMetadata(keyname1);
>   meta2 = keyProvider.getMetadata(keyname2);
>   if (meta1.getBitLength() < meta2.getBitLength()) {
> return -1;
>   } else if (meta1.getBitLength() == meta2.getBitLength()) {
> return 0;
>   } else {
> return 1;
>   }
> }
>   }
> {code}
> It turns out that {{EncryptionZone}} already has the cipher's bit-length 
> stored in a member variable. One shouldn't need an additional name-node call 
> ({{KeyProvider::getMetadata()}}) only to fetch it again.
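
A hedged sketch of the direction this suggests (not the committed patch); 
{{keyBitLength()}} below is a hypothetical accessor standing in for whichever 
{{EncryptionZone}} member actually carries the cipher's bit-length:

{code:java}
// With the bit-length available on the EncryptionZone itself, the comparison
// needs no further KeyProvider.getMetadata() round-trip to the name-node.
private int compareKeyStrength(EncryptionZone zone1, EncryptionZone zone2) {
  return Integer.compare(keyBitLength(zone1), keyBitLength(zone2)); // hypothetical helper
}
{code}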



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16844) Fix Connection leak in ObjectStore when new Conf object is used

2017-07-25 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100549#comment-16100549
 ] 

Mithun Radhakrishnan commented on HIVE-16844:
-

[~sbeeram], my apologies for the radio silence.

bq. ... let me know if you think its better to revert the change until this is 
sorted out.
Your argument regarding connection-leaks is (and was) a compelling one. Keeping 
the {{PersistenceManagerFactory}} open in lieu of addressing the concurrency 
issue would not be right. As such, perhaps it's best to keep this commit for 
the moment.

bq. ... we are pulling the plug on the connections while another thread is 
possibly using it.
I'll examine the code more closely to see if retries from other threads will go 
through. This code is nasty. :/

bq. I am just beginning to catch up on the implementation of 
HCatClientHMSImpl/HCatUtil and will need to look through a bit more through 
that code on the JDO configuration that will ultimately get used etc.
There isn't all that much to that side of the code. It functions purely as an 
HMSClient. I'm not sure that it has bearing on the JDO-config (unless I've 
missed something again).


> Fix Connection leak in ObjectStore when new Conf object is used
> ---
>
> Key: HIVE-16844
> URL: https://issues.apache.org/jira/browse/HIVE-16844
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Fix For: 3.0.0
>
> Attachments: HIVE-16844.1.patch
>
>
> The code path in ObjectStore.java currently leaks BoneCP (or Hikari) 
> connection pools when a new configuration object is passed in. The code needs 
> to ensure that the persistence-factory is closed before it is nullified.
> The relevant code is 
> [here|https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L290].
>  Note that pmf is set to null, but the underlying connection pool is not 
> closed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17169) Avoid extra call to KeyProvider::getMetadata()

2017-07-25 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17169:

Attachment: HIVE-17169.1.patch

Here's the proposed change.

> Avoid extra call to KeyProvider::getMetadata()
> --
>
> Key: HIVE-17169
> URL: https://issues.apache.org/jira/browse/HIVE-17169
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17169.1.patch
>
>
> Here's the code from {{Hadoop23Shims}}:
> {code:title=Hadoop23Shims.java|borderStyle=solid}
> @Override
> public int comparePathKeyStrength(Path path1, Path path2) throws 
> IOException {
>   EncryptionZone zone1, zone2;
>   zone1 = hdfsAdmin.getEncryptionZoneForPath(path1);
>   zone2 = hdfsAdmin.getEncryptionZoneForPath(path2);
>   if (zone1 == null && zone2 == null) {
> return 0;
>   } else if (zone1 == null) {
> return -1;
>   } else if (zone2 == null) {
> return 1;
>   }
>   return compareKeyStrength(zone1.getKeyName(), zone2.getKeyName());
> }
> private int compareKeyStrength(String keyname1, String keyname2) throws 
> IOException {
>   KeyProvider.Metadata meta1, meta2;
>   if (keyProvider == null) {
> throw new IOException("HDFS security key provider is not configured 
> on your server.");
>   }
>   meta1 = keyProvider.getMetadata(keyname1);
>   meta2 = keyProvider.getMetadata(keyname2);
>   if (meta1.getBitLength() < meta2.getBitLength()) {
> return -1;
>   } else if (meta1.getBitLength() == meta2.getBitLength()) {
> return 0;
>   } else {
> return 1;
>   }
> }
>   }
> {code}
> It turns out that {{EncryptionZone}} already has the cipher's bit-length 
> stored in a member variable. One shouldn't need an additional name-node call 
> ({{KeyProvider::getMetadata()}}) only to fetch it again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17169) Avoid extra call to KeyProvider::getMetadata()

2017-07-25 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17169:

Summary: Avoid extra call to KeyProvider::getMetadata()  (was: Avoid call 
to KeyProvider::getMetadata())

> Avoid extra call to KeyProvider::getMetadata()
> --
>
> Key: HIVE-17169
> URL: https://issues.apache.org/jira/browse/HIVE-17169
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>
> Here's the code from {{Hadoop23Shims}}:
> {code:title=Hadoop23Shims.java|borderStyle=solid}
> @Override
> public int comparePathKeyStrength(Path path1, Path path2) throws 
> IOException {
>   EncryptionZone zone1, zone2;
>   zone1 = hdfsAdmin.getEncryptionZoneForPath(path1);
>   zone2 = hdfsAdmin.getEncryptionZoneForPath(path2);
>   if (zone1 == null && zone2 == null) {
> return 0;
>   } else if (zone1 == null) {
> return -1;
>   } else if (zone2 == null) {
> return 1;
>   }
>   return compareKeyStrength(zone1.getKeyName(), zone2.getKeyName());
> }
> private int compareKeyStrength(String keyname1, String keyname2) throws 
> IOException {
>   KeyProvider.Metadata meta1, meta2;
>   if (keyProvider == null) {
> throw new IOException("HDFS security key provider is not configured 
> on your server.");
>   }
>   meta1 = keyProvider.getMetadata(keyname1);
>   meta2 = keyProvider.getMetadata(keyname2);
>   if (meta1.getBitLength() < meta2.getBitLength()) {
> return -1;
>   } else if (meta1.getBitLength() == meta2.getBitLength()) {
> return 0;
>   } else {
> return 1;
>   }
> }
>   }
> {code}
> It turns out that {{EncryptionZone}} already has the cipher's bit-length 
> stored in a member variable. One shouldn't need an additional name-node call 
> ({{KeyProvider::getMetadata()}}) only to fetch it again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17169) Avoid call to KeyProvider::getMetadata()

2017-07-25 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17169:
---


> Avoid call to KeyProvider::getMetadata()
> 
>
> Key: HIVE-17169
> URL: https://issues.apache.org/jira/browse/HIVE-17169
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>
> Here's the code from {{Hadoop23Shims}}:
> {code:title=Hadoop23Shims.java|borderStyle=solid}
> @Override
> public int comparePathKeyStrength(Path path1, Path path2) throws 
> IOException {
>   EncryptionZone zone1, zone2;
>   zone1 = hdfsAdmin.getEncryptionZoneForPath(path1);
>   zone2 = hdfsAdmin.getEncryptionZoneForPath(path2);
>   if (zone1 == null && zone2 == null) {
> return 0;
>   } else if (zone1 == null) {
> return -1;
>   } else if (zone2 == null) {
> return 1;
>   }
>   return compareKeyStrength(zone1.getKeyName(), zone2.getKeyName());
> }
> private int compareKeyStrength(String keyname1, String keyname2) throws 
> IOException {
>   KeyProvider.Metadata meta1, meta2;
>   if (keyProvider == null) {
> throw new IOException("HDFS security key provider is not configured 
> on your server.");
>   }
>   meta1 = keyProvider.getMetadata(keyname1);
>   meta2 = keyProvider.getMetadata(keyname2);
>   if (meta1.getBitLength() < meta2.getBitLength()) {
> return -1;
>   } else if (meta1.getBitLength() == meta2.getBitLength()) {
> return 0;
>   } else {
> return 1;
>   }
> }
>   }
> {code}
> It turns out that {{EncryptionZone}} already has the cipher's bit-length 
> stored in a member variable. One shouldn't need an additional name-node call 
> ({{KeyProvider::getMetadata()}}) only to fetch it again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-07-05 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075593#comment-16075593
 ] 

Mithun Radhakrishnan commented on HIVE-16908:
-

Yes, this is what I was afraid of. The intention of 
{{testTableSchemaPropagation()}} was to simulate table-propagation across 
different clusters/HCat instances, as Apache Falcon (or similar projects) do. I 
wonder if this change dilutes that intention. :/ I do recognize that the static 
state in {{ObjectStore}} makes this problematic. I'm trying to figure out an 
alternative.

Question: If the target metastore instance were accessed through a different 
classloader, their states would be isolated, right? Would that be an acceptable 
solution?
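
For concreteness, a rough sketch of the classloader-isolation idea; 
{{classpathUrls}} is assumed to carry the full metastore classpath, and none of 
this is code from the attached patches:

{code:java}
// Load the second metastore's classes through their own classloader, so that
// static state (e.g. ObjectStore's PersistenceManagerFactory) is per-instance.
java.net.URLClassLoader isolated =
    new java.net.URLClassLoader(classpathUrls, null /* no parent */);
Class<?> storeClass =
    isolated.loadClass("org.apache.hadoop.hive.metastore.ObjectStore");
Object secondStore = storeClass.newInstance(); // isolated static fields
{code}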

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch
>
>
> Some of the tests in TestHCatClient.java, for ex:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second metastore thread with a different conf object, which results in the 
> PersistenceManagerFactory being closed; hence the tests fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16844) Fix Connection leak in ObjectStore when new Conf object is used

2017-07-05 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075583#comment-16075583
 ] 

Mithun Radhakrishnan commented on HIVE-16844:
-

Sorry to resurrect this discussion. I was pondering over the solution on 
HIVE-16908, and wondered whether the solution here is complete. Here's the code 
to {{ObjectStore::setConf()}}:

{code:java|title=ObjectStore.java}
  @Override
  @SuppressWarnings("nls")
  public void setConf(Configuration conf) {
// Although an instance of ObjectStore is accessed by one thread, there may
// be many threads with ObjectStore instances. So the static variables
// pmf and prop need to be protected with locks.
pmfPropLock.lock();
try {
  isInitialized = false;
  hiveConf = conf;
  configureSSL(conf);
  Properties propsFromConf = getDataSourceProps(conf);
  boolean propsChanged = !propsFromConf.equals(prop);

  if (propsChanged) {
if (pmf != null){
  clearOutPmfClassLoaderCache(pmf);
  // close the underlying connection pool to avoid leaks
  pmf.close();
}
pmf = null;
prop = null;
  }
...
  }
{code}

Note that {{pmfPropLock}} is locked before {{pmf.close()}} is called. But this 
is also the only place where {{pmfPropLock}} is used. So, if another thread is 
in the middle of accessing {{pmf}}, it is possible that the instance is messed 
up for that thread.
Before this code change, resetting {{pmf}} would not affect any threads with an 
outstanding reference.
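
To illustrate the interleaving I'm worried about (threads and timing assumed):

{code:java}
// T1 (request thread)                     T2 (setConf() with changed props)
// pm = pmf.getPersistenceManager();
//                                         pmfPropLock.lock();
//                                         pmf.close();  // closes T1's pool
// pm.newQuery(...);                       // T1 fails: factory already closed
{code}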

> Fix Connection leak in ObjectStore when new Conf object is used
> ---
>
> Key: HIVE-16844
> URL: https://issues.apache.org/jira/browse/HIVE-16844
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Fix For: 3.0.0
>
> Attachments: HIVE-16844.1.patch
>
>
> The code path in ObjectStore.java currently leaks BoneCP (or Hikari) 
> connection pools when a new configuration object is passed in. The code needs 
> to ensure that the persistence-factory is closed before it is nullified.
> The relevant code is 
> [here|https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L290].
>  Note that pmf is set to null, but the underlying connection pool is not 
> closed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-07-05 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075241#comment-16075241
 ] 

Mithun Radhakrishnan edited comment on HIVE-16908 at 7/5/17 6:53 PM:
-

I'll need a little time to review. On the face of it, this change is 
disconcerting, since it looks like this changes the intention of the tests 
added in HIVE-7341. :/ Let me take a closer look.


was (Author: mithun):
I'll need a little time to review. On the face of it, this change is 
disconcerting, since it looks like changes the intention of the tests added in 
HIVE-7341. :/ Let me take a closer look.

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch
>
>
> Some of the tests in TestHCatClient.java, for ex:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second metastore thread with a different conf object, which results in the 
> PersistenceManagerFactory being closed; hence the tests fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-07-05 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075241#comment-16075241
 ] 

Mithun Radhakrishnan commented on HIVE-16908:
-

I'll need a little time to review. On the face of it, this change is 
disconcerting, since it looks like this changes the intention of the tests 
added in HIVE-7341. :/ Let me take a closer look.

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch
>
>
> Some of the tests in TestHCatClient.java, for ex:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second metastore thread with a different conf object, which results in the 
> PersistenceManagerFactory being closed; hence the tests fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16918) Skip ReplCopyTask distcp for _metadata copying. Also enable -pb for distcp

2017-06-19 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054961#comment-16054961
 ] 

Mithun Radhakrishnan commented on HIVE-16918:
-

Is the call to {{isLocalFile()}} the reason why you no longer need explicit 
handling for {{_metadata}} in {{ExportSemanticAnalyzer}}?

> Skip ReplCopyTask distcp for _metadata copying. Also enable -pb for distcp
> --
>
> Key: HIVE-16918
> URL: https://issues.apache.org/jira/browse/HIVE-16918
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 3.0.0
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-16918.2.patch, HIVE-16918.patch
>
>
> With HIVE-16686, we switched ReplCopyTask to always use a privileged DistCp. 
> This, however, is incorrect for copying _metadata generated from a temporary 
> scratch directory to hdfs. We need to change that so that it routes to a 
> regular CopyTask. The issue with using distcp for this is that distcp 
> launches another job, which may be queued on another machine that does not 
> have access to this file:// URI. Distcp should only ever be used when 
> copying from non-local filesystems.
> Also, in the spirit of following up on HIVE-16686, we missed adding "-pb" as 
> a default for invocations of distcp from hive. Adding that in. This would not 
> be necessary if HADOOP-8143 had made it in, but until it does, we need it.
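
A hedged sketch of the routing this describes; {{regularCopyTask()}} and 
{{distCpTask()}} are hypothetical helpers, though 
{{Path.getFileSystem(Configuration)}} is the standard Hadoop call:

{code:java}
// Fall back to a plain copy when the source is local: a DistCp job queued on
// another host cannot see this machine's file:// URIs.
private static boolean isLocalFile(Path path, Configuration conf) throws IOException {
  return "file".equals(path.getFileSystem(conf).getUri().getScheme());
}

private Task<?> chooseCopyTask(Path src, Path dst, Configuration conf) throws IOException {
  return isLocalFile(src, conf)
      ? regularCopyTask(src, dst)   // hypothetical helper
      : distCpTask(src, dst);       // hypothetical helper
}
{code}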



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-14380) Queries on tables with remote HDFS paths fail in "encryption" checks.

2017-06-16 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052223#comment-16052223
 ] 

Mithun Radhakrishnan commented on HIVE-14380:
-

bq. I have a related fix on the metastore server-side.

Looks like my fix for the {{HiveMetaStore}} is obviated by HIVE-14688. I'll 
leave my fix out.

> Queries on tables with remote HDFS paths fail in "encryption" checks.
> -
>
> Key: HIVE-14380
> URL: https://issues.apache.org/jira/browse/HIVE-14380
> Project: Hive
>  Issue Type: Bug
>  Components: Encryption
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 2.2.0
>
> Attachments: HIVE-14380.1.patch
>
>
> If a table has table/partition locations set to remote HDFS paths, querying 
> them will cause the following IAException:
> {noformat}
> 2016-07-26 01:16:27,471 ERROR parse.CalcitePlanner 
> (SemanticAnalyzer.java:getMetaData(1867)) - 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to determine if 
> hdfs://foo.ygrid.yahoo.com:8020/projects/my_db/my_table is encrypted: 
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://foo.ygrid.yahoo.com:8020/projects/my_db/my_table, expected: 
> hdfs://bar.ygrid.yahoo.com:8020
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:2204)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStrongestEncryptedTablePath(SemanticAnalyzer.java:2274)
> ...
> {noformat}
> This is because of the following code in {{SessionState}}:
> {code:title=SessionState.java|borderStyle=solid}
>  public HadoopShims.HdfsEncryptionShim getHdfsEncryptionShim() throws 
> HiveException {
> if (hdfsEncryptionShim == null) {
>   try {
> FileSystem fs = FileSystem.get(sessionConf);
> if ("hdfs".equals(fs.getUri().getScheme())) {
>   hdfsEncryptionShim = 
> ShimLoader.getHadoopShims().createHdfsEncryptionShim(fs, sessionConf);
> } else {
>   LOG.debug("Could not get hdfsEncryptionShim, it is only applicable 
> to hdfs filesystem.");
> }
>   } catch (Exception e) {
> throw new HiveException(e);
>   }
> }
> return hdfsEncryptionShim;
>   }
> {code}
> When the {{FileSystem}} instance is created, using the {{sessionConf}} 
> implies that the current HDFS is going to be used. This call should instead 
> fetch the {{FileSystem}} instance corresponding to the path being checked.
> A fix is forthcoming...
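
A minimal sketch of the fix described, assuming the path being checked is in 
scope as {{path}}; {{Path.getFileSystem(Configuration)}} resolves the filesystem 
from the URI itself rather than from the default filesystem:

{code:java}
// Resolve the FileSystem from the path under test, not the session default,
// so remote hdfs:// URIs map to the right cluster's encryption shim.
FileSystem fs = path.getFileSystem(sessionConf);
if ("hdfs".equals(fs.getUri().getScheme())) {
  hdfsEncryptionShim =
      ShimLoader.getHadoopShims().createHdfsEncryptionShim(fs, sessionConf);
}
{code}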



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-14274) When columns are added to structs in a Hive table, HCatLoader breaks.

2017-02-21 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876489#comment-15876489
 ] 

Mithun Radhakrishnan commented on HIVE-14274:
-

bq. Any one any ideas? 

Yes. :] The patch I have up on this JIRA should allow for the addition of 
columns to the end of a struct. Supporting the general case will be a very 
large change to HCatalog.

> When columns are added to structs in a Hive table, HCatLoader breaks.
> -
>
> Key: HIVE-14274
> URL: https://issues.apache.org/jira/browse/HIVE-14274
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-14274.1.patch
>
>
> Consider this sequence of table/partition creation and schema evolution:
> {code:sql}
> -- Create table.
> CREATE EXTERNAL TABLE `simple_text` (
> foo STRING,
> bar STRUCT<goo:STRING>
> )
> PARTITIONED BY ( dt STRING )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> COLLECTION ITEMS TERMINATED BY ':'
> STORED AS TEXTFILE ;
> -- Add partition.
> ALTER TABLE simple_text ADD PARTITION ( dt='0' );
> -- Alter the struct-column to add a new sub-field.
> ALTER TABLE simple_text CHANGE COLUMN bar bar STRUCT<goo:STRING, zoo:STRING>;
> {code}
> The {{dt='0'}} partition's schema indicates 2 fields in {{bar}}. The data can 
> be read using Hive, but not through HCatLoader. The error looks as follows:
> {noformat}
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: data_raw: 
> Store(hdfs://dilithiumblue-nn1.blue.ygrid.yahoo.com:8020/tmp/temp-643668868/tmp-1639945319:org.apache.pig.impl.io.TFileStorage)
>  - scope-1 Operator Key: scope-1): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:123)
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:376)
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:241)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:362)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:160)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
>   ... 16 more
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6018: 
> Error converting read value to tuple
>   at 
> org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
>   at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:63)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:118)
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:140)
>   ... 

[jira] [Updated] (HIVE-15491) Failures are masked/swallowed in GenericUDTFJSONTuple::process().

2016-12-21 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-15491:

Status: Patch Available  (was: Open)

> Failures are masked/swallowed in GenericUDTFJSONTuple::process().
> -
>
> Key: HIVE-15491
> URL: https://issues.apache.org/jira/browse/HIVE-15491
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Mithun Radhakrishnan
> Attachments: HIVE-15491.patch
>
>
> I draw your attention to the following piece of code in 
> {{GenericUDTFJSONTuple::process()}}:
> {code:java}
>   @Override
>   public void process(Object[] o) throws HiveException {
>   ...
> for (int i = 0; i < numCols; ++i) {
> if (retCols[i] == null) {
>   retCols[i] = cols[i]; // use the object pool rather than creating a 
> new object
> }
> Object extractObject = ((Map)jsonObj).get(paths[i]);
> if (extractObject instanceof Map || extractObject instanceof List) {
>   retCols[i].set(MAPPER.writeValueAsString(extractObject));
> } else if (extractObject != null) {
>   retCols[i].set(extractObject.toString());
> } else {
>   retCols[i] = null;
> }
>   }
>   forward(retCols);
>   return;
> } catch (Throwable e) {  <= Yikes.
>   LOG.error("JSON parsing/evaluation exception" + e);
>   forward(nullCols);
> }
>   }
> {code}
> The error-handling here seems suspect. Judging from the error message, the 
> intention here seems to be to catch JSON-specific errors arising from 
> {{MAPPER.readValue()}} and {{MAPPER.writeValueAsString()}}. By catching 
> {{Throwable}}, this code masks the errors that arise from the call to 
> {{forward(retCols)}}.
> I just ran into this in production. A user with a nearly exhausted HDFS quota 
> attempted to use {{json_tuple}} to extract fields from json strings in his 
> data. The data turned out to have large record counts and the query used over 
> 25K mappers. Every task failed to create a {{RecordWriter}}, thanks to the 
> exhausted quota. But the thrown exception was swallowed in the code above. 
> {{process()}} ignored the failure for the record and proceeded to the next 
> one. Eventually, this resulted in DDoS-ing the name-node.
> I'll have a patch for this shortly.
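
One possible narrowing, sketched for illustration (not necessarily what the 
attached patch does): confine the catch to the JSON work, so that failures 
thrown by {{forward()}} propagate as {{HiveException}}:

{code:java}
@Override
public void process(Object[] o) throws HiveException {
  Object jsonObj;
  try {
    jsonObj = MAPPER.readValue(o[0].toString(), Object.class);
  } catch (Exception e) {          // JSON parse errors only
    LOG.error("JSON parsing exception: " + e);
    forward(nullCols);
    return;
  }
  // ... populate retCols from jsonObj, as before ...
  forward(retCols);                // outside the catch: RecordWriter failures surface
}
{code}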



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-15491) Failures are masked/swallowed in GenericUDTFJSONTuple::process().

2016-12-21 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-15491:
---

Assignee: Mithun Radhakrishnan

> Failures are masked/swallowed in GenericUDTFJSONTuple::process().
> -
>
> Key: HIVE-15491
> URL: https://issues.apache.org/jira/browse/HIVE-15491
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-15491.patch
>
>
> I draw your attention to the following piece of code in 
> {{GenericUDTFJSONTuple::process()}}:
> {code:java}
>   @Override
>   public void process(Object[] o) throws HiveException {
>   ...
> for (int i = 0; i < numCols; ++i) {
> if (retCols[i] == null) {
>   retCols[i] = cols[i]; // use the object pool rather than creating a 
> new object
> }
> Object extractObject = ((Map)jsonObj).get(paths[i]);
> if (extractObject instanceof Map || extractObject instanceof List) {
>   retCols[i].set(MAPPER.writeValueAsString(extractObject));
> } else if (extractObject != null) {
>   retCols[i].set(extractObject.toString());
> } else {
>   retCols[i] = null;
> }
>   }
>   forward(retCols);
>   return;
> } catch (Throwable e) {  <= Yikes.
>   LOG.error("JSON parsing/evaluation exception" + e);
>   forward(nullCols);
> }
>   }
> {code}
> The error-handling here seems suspect. Judging from the error message, the 
> intention here seems to be to catch JSON-specific errors arising from 
> {{MAPPER.readValue()}} and {{MAPPER.writeValueAsString()}}. By catching 
> {{Throwable}}, this code masks the errors that arise from the call to 
> {{forward(retCols)}}.
> I just ran into this in production. A user with a nearly exhausted HDFS quota 
> attempted to use {{json_tuple}} to extract fields from json strings in his 
> data. The data turned out to have large record counts and the query used over 
> 25K mappers. Every task failed to create a {{RecordWriter}}, thanks to the 
> exhausted quota. But the thrown exception was swallowed in the code above. 
> {{process()}} ignored the failure for the record and proceeded to the next 
> one. Eventually, this resulted in DDoS-ing the name-node.
> I'll have a patch for this shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15491) Failures are masked/swallowed in GenericUDTFJSONTuple::process().

2016-12-21 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-15491:

Attachment: HIVE-15491.patch

The patch.

> Failures are masked/swallowed in GenericUDTFJSONTuple::process().
> -
>
> Key: HIVE-15491
> URL: https://issues.apache.org/jira/browse/HIVE-15491
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Mithun Radhakrishnan
> Attachments: HIVE-15491.patch
>
>
> I draw your attention to the following piece of code in 
> {{GenericUDTFJSONTuple::process()}}:
> {code:java}
>   @Override
>   public void process(Object[] o) throws HiveException {
>   ...
> for (int i = 0; i < numCols; ++i) {
> if (retCols[i] == null) {
>   retCols[i] = cols[i]; // use the object pool rather than creating a 
> new object
> }
> Object extractObject = ((Map)jsonObj).get(paths[i]);
> if (extractObject instanceof Map || extractObject instanceof List) {
>   retCols[i].set(MAPPER.writeValueAsString(extractObject));
> } else if (extractObject != null) {
>   retCols[i].set(extractObject.toString());
> } else {
>   retCols[i] = null;
> }
>   }
>   forward(retCols);
>   return;
> } catch (Throwable e) {  <= Yikes.
>   LOG.error("JSON parsing/evaluation exception" + e);
>   forward(nullCols);
> }
>   }
> {code}
> The error-handling here seems suspect. Judging from the error message, the 
> intention here seems to be to catch JSON-specific errors arising from 
> {{MAPPER.readValue()}} and {{MAPPER.writeValueAsString()}}. By catching 
> {{Throwable}}, this code masks the errors that arise from the call to 
> {{forward(retCols)}}.
> I just ran into this in production. A user with a nearly exhausted HDFS quota 
> attempted to use {{json_tuple}} to extract fields from json strings in his 
> data. The data turned out to have large record counts and the query used over 
> 25K mappers. Every task failed to create a {{RecordWriter}}, thanks to the 
> exhausted quota. But the thrown exception was swallowed in the code above. 
> {{process()}} ignored the failure for the record and proceeded to the next 
> one. Eventually, this resulted in DDoS-ing the name-node.
> I'll have a patch for this shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15491) Failures are masked/swallowed in GenericUDTFJSONTuple::process().

2016-12-21 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-15491:

Affects Version/s: 2.1.1

> Failures are masked/swallowed in GenericUDTFJSONTuple::process().
> -
>
> Key: HIVE-15491
> URL: https://issues.apache.org/jira/browse/HIVE-15491
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Mithun Radhakrishnan
>
> I draw your attention to the following piece of code in 
> {{GenericUDTFJSONTuple::process()}}:
> {code:java}
>   @Override
>   public void process(Object[] o) throws HiveException {
>   ...
> for (int i = 0; i < numCols; ++i) {
> if (retCols[i] == null) {
>   retCols[i] = cols[i]; // use the object pool rather than creating a 
> new object
> }
> Object extractObject = ((Map)jsonObj).get(paths[i]);
> if (extractObject instanceof Map || extractObject instanceof List) {
>   retCols[i].set(MAPPER.writeValueAsString(extractObject));
> } else if (extractObject != null) {
>   retCols[i].set(extractObject.toString());
> } else {
>   retCols[i] = null;
> }
>   }
>   forward(retCols);
>   return;
> } catch (Throwable e) {  <= Yikes.
>   LOG.error("JSON parsing/evaluation exception" + e);
>   forward(nullCols);
> }
>   }
> {code}
> The error-handling here seems suspect. Judging from the error message, the 
> intention here seems to be to catch JSON-specific errors arising from 
> {{MAPPER.readValue()}} and {{MAPPER.writeValueAsString()}}. By catching 
> {{Throwable}}, this code masks the errors that arise from the call to 
> {{forward(retCols)}}.
> I just ran into this in production. A user with a nearly exhausted HDFS quota 
> attempted to use {{json_tuple}} to extract fields from json strings in his 
> data. The data turned out to have large record counts and the query used over 
> 25K mappers. Every task failed to create a {{RecordWriter}}, thanks to the 
> exhausted quota. But the thrown exception was swallowed in the code above. 
> {{process()}} ignored the failure for the record and proceeded to the next 
> one. Eventually, this resulted in DDoS-ing the name-node.
> I'll have a patch for this shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15491) Failures are masked/swallowed in GenericUDTFJSONTuple::process().

2016-12-21 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-15491:

Description: 
I draw your attention to the following piece of code in 
{{GenericUDTFJSONTuple::process()}}:

{code:java}
  @Override
  public void process(Object[] o) throws HiveException {
  ...
for (int i = 0; i < numCols; ++i) {
if (retCols[i] == null) {
  retCols[i] = cols[i]; // use the object pool rather than creating a 
new object
}
Object extractObject = ((Map)jsonObj).get(paths[i]);
if (extractObject instanceof Map || extractObject instanceof List) {
  retCols[i].set(MAPPER.writeValueAsString(extractObject));
} else if (extractObject != null) {
  retCols[i].set(extractObject.toString());
} else {
  retCols[i] = null;
}
  }
  forward(retCols);
  return;
} catch (Throwable e) {  <= Yikes.
  LOG.error("JSON parsing/evaluation exception" + e);
  forward(nullCols);
}
  }
{code}

The error-handling here seems suspect. Judging from the error message, the 
intention here seems to be to catch JSON-specific errors arising from 
{{MAPPER.readValue()}} and {{MAPPER.writeValueAsString()}}. By catching 
{{Throwable}}, this code masks the errors that arise from the call to 
{{forward(retCols)}}.

I just ran into this in production. A user with a nearly exhausted HDFS quota 
attempted to use {{json_tuple}} to extract fields from json strings in his 
data. The data turned out to have large record counts and the query used over 
25K mappers. Every task failed to create a {{RecordWriter}}, thanks to the 
exhausted quota. But the thrown exception was swallowed in the code above. 
{{process()}} ignored the failure for the record and proceeded to the next one. 
Eventually, this resulted in DDoS-ing the name-node.

I'll have a patch for this shortly.

> Failures are masked/swallowed in GenericUDTFJSONTuple::process().
> -
>
> Key: HIVE-15491
> URL: https://issues.apache.org/jira/browse/HIVE-15491
> Project: Hive
>  Issue Type: Bug
>Reporter: Mithun Radhakrishnan
>
> I draw your attention to the following piece of code in 
> {{GenericUDTFJSONTuple::process()}}:
> {code:java}
>   @Override
>   public void process(Object[] o) throws HiveException {
>   ...
> for (int i = 0; i < numCols; ++i) {
> if (retCols[i] == null) {
>   retCols[i] = cols[i]; // use the object pool rather than creating a 
> new object
> }
> Object extractObject = ((Map)jsonObj).get(paths[i]);
> if (extractObject instanceof Map || extractObject instanceof List) {
>   retCols[i].set(MAPPER.writeValueAsString(extractObject));
> } else if (extractObject != null) {
>   retCols[i].set(extractObject.toString());
> } else {
>   retCols[i] = null;
> }
>   }
>   forward(retCols);
>   return;
> } catch (Throwable e) {  <= Yikes.
>   LOG.error("JSON parsing/evaluation exception" + e);
>   forward(nullCols);
> }
>   }
> {code}
> The error-handling here seems suspect. Judging from the error message, the 
> intention here seems to be to catch JSON-specific errors arising from 
> {{MAPPER.readValue()}} and {{MAPPER.writeValueAsString()}}. By catching 
> {{Throwable}}, this code masks the errors that arise from the call to 
> {{forward(retCols)}}.
> I just ran into this in production. A user with a nearly exhausted HDFS quota 
> attempted to use {{json_tuple}} to extract fields from json strings in his 
> data. The data turned out to have large record counts and the query used over 
> 25K mappers. Every task failed to create a {{RecordWriter}}, thanks to the 
> exhausted quota. But the thrown exception was swallowed in the code above. 
> {{process()}} ignored the failure for the record and proceeded to the next 
> one. Eventually, this resulted in DDoS-ing the name-node.
> I'll have a patch for this shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11475) Bad rename of directory during commit, when using HCat dynamic-partitioning.

2016-12-19 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-11475:

Status: Open  (was: Patch Available)

> Bad rename of directory during commit, when using HCat dynamic-partitioning.
> 
>
> Key: HIVE-11475
> URL: https://issues.apache.org/jira/browse/HIVE-11475
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Critical
> Attachments: HIVE-11475.1.patch
>
>
> Here's one that [~knoguchi] found and root-caused. This one's a doozy. 
> Under seemingly random conditions, the temporary output (under 
> {{_SCRATCH1.234*}}) for HCat's dynamic partitioner isn't promoted correctly 
> to the final table directory.
> The namenode logs indicated a botched directory-rename:
> {noformat}
> 2015-08-02 03:24:29,090 INFO FSNamesystem.audit: allowed=true ugi=myth 
> (auth:TOKEN) via wrkf...@grid.myth.net (auth:TOKEN) ip=/10.192.100.117 
> cmd=rename 
> src=/projects/hive/myth.db/myth_table_15m/_SCRATCH2.8772158158263395E-4/tc=1/utc_time=201508020145/part-r-0
>  
> dst=/projects/hive/myth.db/myth_table_15mE-4/tc=1/utc_time=201508020145/part-r-0
>  perm=myth:madcaps:rw-r-r- proto=rpc
> {noformat}
> Note that the table-directory name {{"myth_table_15m"}} is appended with 
> {{"E-4"}}. This'll break anything that uses HDFS-based polling.
> [~knoguchi] points out the following code:
> {code:title=HCatOutputFormat.java}
> 119   if ((idHash = conf.get(HCatConstants.HCAT_OUTPUT_ID_HASH)) == null) {
> 120 idHash = String.valueOf(Math.random());
> 121   }
> {code}
> {code:title=FileOutputCommitterContainer.java}
> 370   String finalLocn = jobLocation.replaceAll(Path.SEPARATOR + 
> SCRATCH_DIR_NAME + "\\d\\.?\\d+","");
> {code}
> The problem is that when {{Math.random()}} produces a number smaller than 
> 10^-3^, {{String.valueOf(double)}} uses exponential notation. The regex 
> doesn't capture or handle this notation.
> The fix belies the debugging-effort.
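
The failure mode is easy to reproduce standalone, using the value from the 
name-node log above:

{code:java}
double random = 2.8772158158263395E-4;        // Math.random() below 10^-3
String idHash = String.valueOf(random);       // "2.8772158158263395E-4"
String jobLocation =
    "/projects/hive/myth.db/myth_table_15m/_SCRATCH" + idHash + "/tc=1";
// The regex only consumes decimal notation, so the "E-4" tail survives:
String finalLocn = jobLocation.replaceAll("/_SCRATCH" + "\\d\\.?\\d+", "");
System.out.println(finalLocn);  // /projects/hive/myth.db/myth_table_15mE-4/tc=1
{code}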



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

