[jira] [Commented] (HIVE-12325) Turn hive.map.groupby.sorted on by default

2015-11-07 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995113#comment-14995113
 ] 

Lefty Leverenz commented on HIVE-12325:
---

Doc note:  This changes the default value of *hive.map.groupby.sorted* and 
removes *hive.map.groupby.sorted.testmode* from HiveConf.java.  Neither is 
documented yet in the wiki (except for their inclusion in the 0.13 default for 
*hive.security.authorization.sqlstd.confwhitelist*), so both need to be 
documented with version information about these changes.

* [Configuration Properties -- Query and DDL Execution | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution]

*hive.map.groupby.sorted* was introduced by HIVE-3432 in 0.10.0 and 
*hive.map.groupby.sorted.testmode* was introduced by HIVE-4281 in 0.11.0.

> Turn hive.map.groupby.sorted on by default
> --
>
> Key: HIVE-12325
> URL: https://issues.apache.org/jira/browse/HIVE-12325
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Reporter: Ashutosh Chauhan
>Assignee: Chetna Chaudhari
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-12325.1.patch
>
>
> When applicable, it can avoid the shuffle phase altogether for group by, which 
> will be a performance win.
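
To illustrate why sorted input can make the shuffle unnecessary, here is a minimal sketch, not Hive's GroupByOperator. It assumes rows arrive already sorted on the grouping key, so each group is contiguous and can be finished in a single pass on the map side. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; this is not Hive code. */
final class SortedGroupBy {
    /** Counts rows per key in one pass, assuming the keys are already sorted. */
    static List<String> countSorted(List<String> sortedKeys) {
        List<String> out = new ArrayList<>();
        String current = null;
        long count = 0;
        for (String key : sortedKeys) {
            if (current != null && !key.equals(current)) {
                out.add(current + "=" + count); // key changed, so the previous group is complete
                count = 0;
            }
            current = key;
            count++;
        }
        if (current != null) {
            out.add(current + "=" + count); // flush the last group
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [a=2, b=1, c=3]
        System.out.println(countSorted(Arrays.asList("a", "a", "b", "c", "c", "c")));
    }
}
```

Because no group can reappear later in sorted input, nothing has to be repartitioned by key before the aggregate is final.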



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11105) NegativeArraySizeException from org.apache.hadoop.io.BytesWritable.setCapacity during serialization phase

2015-11-07 Thread Priyesh Raj (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995132#comment-14995132
 ] 

Priyesh Raj commented on HIVE-11105:


Yes, this can happen in any version where the integer math overflows.
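
The overflow can be sketched without Hadoop on the classpath. This is a minimal sketch that assumes the buffer grows to 1.5 times the requested size using 32-bit int arithmetic; that growth rule is an assumption about BytesWritable, not something confirmed in this thread, and the class and method names are hypothetical.

```java
/** Hypothetical illustration only; this is not the Hadoop BytesWritable source. */
final class CapacityOverflow {
    /** Assumed growth rule: 1.5x the requested size, computed in int arithmetic. */
    static int grownCapacityInt(int size) {
        return size * 3 / 2; // size * 3 wraps around once size exceeds Integer.MAX_VALUE / 3
    }

    /** Same rule computed in 64-bit arithmetic and clamped to the int range. */
    static int grownCapacitySafe(int size) {
        return (int) Math.min(Integer.MAX_VALUE, 3L * size / 2);
    }

    public static void main(String[] args) {
        int size = 800_000_000;
        System.out.println(grownCapacityInt(size));  // prints -947483648
        System.out.println(grownCapacitySafe(size)); // prints 1200000000
    }
}
```

Passing a negative value such as the first result to `new byte[...]` throws java.lang.NegativeArraySizeException, which matches the innermost cause in the stack trace below.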

> NegativeArraySizeException from 
> org.apache.hadoop.io.BytesWritable.setCapacity during serialization phase
> -
>
> Key: HIVE-11105
> URL: https://issues.apache.org/jira/browse/HIVE-11105
> Project: Hive
>  Issue Type: Bug
>Reporter: Priyesh Raj
>
> I am getting this exception while running a query on a very large data set. 
> The issue surfaces in Hive, but my understanding is that it is a problem in 
> Hadoop's setCapacity function: the variable is declared as an integer and 
> cannot handle such a large count.
> Please look into it.
> {code}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1141)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1099)
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1138)
>   ... 13 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:336)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1064)
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1082)
>   ... 14 more
> Caused by: java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
>   at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:123)
>   at org.apache.hadoop.io.BytesWritable.set(BytesWritable.java:171)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:213)
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:456)
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:316)
> {code}





[jira] [Commented] (HIVE-12354) MapJoin with double keys is slow on MR

2015-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995124#comment-14995124
 ] 

Hive QA commented on HIVE-12354:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12771133/HIVE-12354.03.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9777 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5960/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5960/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5960/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12771133 - PreCommit-HIVE-TRUNK-Build

> MapJoin with double keys is slow on MR
> --
>
> Key: HIVE-12354
> URL: https://issues.apache.org/jira/browse/HIVE-12354
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12354.01.patch, HIVE-12354.03.patch, 
> HIVE-12354.patch
>
>
> Double keys are also common when comparing numbers with strings and the like, 
> so this happens more often than one would expect. This is due to 
> HADOOP-12217.
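
HADOOP-12217 is not described in this thread. The sketch below assumes it refers to a hash for double values that keeps only the low 32 bits of the IEEE 754 bit pattern, which would put many common keys into the same hash bucket. The class and method names are hypothetical; only java.lang.Double is used.

```java
/** Hypothetical illustration only; this is not the Hadoop DoubleWritable source. */
final class DoubleKeyHash {
    /** Assumed problematic hash: keeps only the low 32 bits of the bit pattern. */
    static int truncatedHash(double d) {
        return (int) Double.doubleToLongBits(d);
    }

    /** Folds the high and low halves together, as java.lang.Double.hashCode does. */
    static int foldedHash(double d) {
        long bits = Double.doubleToLongBits(d);
        return (int) (bits ^ (bits >>> 32));
    }

    public static void main(String[] args) {
        // Whole-number doubles have all-zero low mantissa bits, so truncatedHash is 0 for each.
        for (double d : new double[] {1.0, 2.0, 3.0, 1024.0}) {
            System.out.println(d + " truncated=" + truncatedHash(d) + " folded=" + foldedHash(d));
        }
    }
}
```

If every key hashes to the same value, a hash table used for the join degrades to a linear scan per lookup, which would be consistent with the slowdown reported here.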





[jira] [Commented] (HIVE-11110) Reorder applyPreJoinOrderingTransforms, add NotNULL/FilterMerge rules, improve Filter selectivity estimation

2015-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995141#comment-14995141
 ] 

Hive QA commented on HIVE-11110:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12771165/HIVE-11110.21.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 32 failed/errored test(s), 9777 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pcs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pointlookup4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_gby_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_outer_join4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_special_character_in_tabnames_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_bucket_map_join_tez1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_smb_empty
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cross_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cross_product_check_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cross_product_check_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_filter_join_breaktask
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_single_reducer3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_position
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_sort_1_23
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_sort_skew_1_23
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_identity_project_remove_skip
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_index_auto_self_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_gby_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_outer_join4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5961/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5961/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5961/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 32 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12771165 - PreCommit-HIVE-TRUNK-Build

> Reorder applyPreJoinOrderingTransforms, add NotNULL/FilterMerge rules, 
> improve Filter selectivity estimation
> 
>
> Key: HIVE-11110
> URL: https://issues.apache.org/jira/browse/HIVE-11110
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-11110-10.patch, HIVE-11110-11.patch, 
> HIVE-11110-12.patch, HIVE-11110-branch-1.2.patch, HIVE-11110.1.patch, 
> HIVE-11110.13.patch, HIVE-11110.14.patch, HIVE-11110.15.patch, 
> HIVE-11110.16.patch, HIVE-11110.17.patch, HIVE-11110.18.patch, 
> HIVE-11110.19.patch, HIVE-11110.2.patch, HIVE-11110.20.patch, 
> HIVE-11110.21.patch, HIVE-11110.4.patch, HIVE-11110.5.patch, 
> HIVE-11110.6.patch, HIVE-11110.7.patch, HIVE-11110.8.patch, 
> HIVE-11110.9.patch, HIVE-11110.91.patch, HIVE-11110.92.patch, HIVE-11110.patch
>
>
> Query
> {code}
> select  count(*)
>  from store_sales
>  ,store_returns
>  ,date_dim d1
>  ,date_dim d2
>  where d1.d_quarter_name = '2000Q1'
>and d1.d_date_sk = ss_sold_date_sk
>

[jira] [Updated] (HIVE-12365) Added resource path is sent to cluster as an empty string when externally removed

2015-11-07 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-12365:
---
Attachment: HIVE-12365.patch

The path of a nonexistent resource should be omitted from job config values 
such as tmpjars; currently it is included as an empty string.
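
A minimal sketch of that idea, not the attached patch: keep only the resources that still exist before joining them into a comma-separated config value. It assumes local file system paths, whereas real resources may be URIs on other file systems. The class and method names are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical illustration only; this is not the HIVE-12365 patch. */
final class ResourcePaths {
    /** Joins only non-empty paths that still exist into a comma-separated value. */
    static String toJobConfValue(List<String> addedResources) {
        return addedResources.stream()
                .filter(p -> p != null && !p.isEmpty())  // never emit an empty entry
                .filter(p -> Files.exists(Paths.get(p))) // drop resources removed externally
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // "." exists; the empty string and the missing jar are dropped, so this prints "."
        System.out.println(toJobConfValue(Arrays.asList(".", "", "/no/such/dir/missing.jar")));
    }
}
```

If every resource has been removed, the result is an empty string, so a caller would also need to skip setting the config value in that case.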

> Added resource path is sent to cluster as an empty string when externally 
> removed
> -
>
> Key: HIVE-12365
> URL: https://issues.apache.org/jira/browse/HIVE-12365
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-12365.patch
>
>
> Sometimes resources (e.g. jars) added via a command like "add jars" are 
> removed externally from their file path for some reason. Their paths are 
> then sent to the cluster as empty strings, which causes failures even for 
> queries that do not need these jars in execution. The error looks like the 
> following:
> {code}
> 15/11/06 21:56:44 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> file:/tmp/hadoop-ctang/mapred/staging/ctang734817191/.staging/job_local734817191_0003
> java.lang.IllegalArgumentException: Can not create a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:135)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:215)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
> {code}





[jira] [Commented] (HIVE-12311) explain CTAS fails if the table already exists

2015-11-07 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995363#comment-14995363
 ] 

Ashutosh Chauhan commented on HIVE-12311:
-

+1

> explain CTAS fails if the table already exists
> --
>
> Key: HIVE-12311
> URL: https://issues.apache.org/jira/browse/HIVE-12311
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
>Priority: Minor
> Attachments: HIVE-12311.1.patch
>
>
> Explain of a CTAS will fail if the table already exists.
> This is an annoyance when you're seeing if a large body of SQL queries will 
> function by putting explain in front of every query. 
> {code}
> hive> create table temp (x int);
> OK
> Time taken: 0.252 seconds
> hive> create table temp2 (x int);
> OK
> Time taken: 0.407 seconds
> hive> explain create table temp as select * from temp2;
> FAILED: SemanticException org.apache.hadoop.hive.ql.parse.SemanticException: 
> Table already exists: mydb.temp
> {code}
> If we compare to Postgres "The Zinc Standard of SQL Compliance":
> {code}
> carter=# create table temp (x int);
> CREATE TABLE
> carter=# create table temp2 (x int);
> CREATE TABLE
> carter=# explain create table temp as select * from temp2;
>QUERY PLAN
> -
>  Seq Scan on temp2  (cost=0.00..34.00 rows=2400 width=4)
> (1 row)
> {code}
> If the CTAS is something complex it would be nice to see the query plan in 
> advance.





[jira] [Commented] (HIVE-12365) Added resource path is sent to cluster as an empty string when externally removed

2015-11-07 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995385#comment-14995385
 ] 

Chaoyu Tang commented on HIVE-12365:


I was not able to reproduce the 
TestJdbcWithMiniHS2.testAddJarDataNucleusUnCaching failure on my local 
machine. The other three are existing failures. I am reattaching the patch to 
see whether the failure can be reproduced.

> Added resource path is sent to cluster as an empty string when externally 
> removed
> -
>
> Key: HIVE-12365
> URL: https://issues.apache.org/jira/browse/HIVE-12365
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-12365.patch
>
>
> Sometimes resources (e.g. jars) added via a command like "add jars" are 
> removed externally from their file path for some reason. Their paths are 
> then sent to the cluster as empty strings, which causes failures even for 
> queries that do not need these jars in execution. The error looks like the 
> following:
> {code}
> 15/11/06 21:56:44 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> file:/tmp/hadoop-ctang/mapred/staging/ctang734817191/.staging/job_local734817191_0003
> java.lang.IllegalArgumentException: Can not create a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:135)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:215)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
> {code}





[jira] [Updated] (HIVE-12365) Added resource path is sent to cluster as an empty string when externally removed

2015-11-07 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-12365:
---
Attachment: HIVE-12365.patch

> Added resource path is sent to cluster as an empty string when externally 
> removed
> -
>
> Key: HIVE-12365
> URL: https://issues.apache.org/jira/browse/HIVE-12365
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-12365.patch, HIVE-12365.patch
>
>
> Sometimes resources (e.g. jars) added via a command like "add jars" are 
> removed externally from their file path for some reason. Their paths are 
> then sent to the cluster as empty strings, which causes failures even for 
> queries that do not need these jars in execution. The error looks like the 
> following:
> {code}
> 15/11/06 21:56:44 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> file:/tmp/hadoop-ctang/mapred/staging/ctang734817191/.staging/job_local734817191_0003
> java.lang.IllegalArgumentException: Can not create a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:135)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:215)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
> {code}





[jira] [Commented] (HIVE-12346) Internally used variables in HiveConf should not be settable via command

2015-11-07 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995281#comment-14995281
 ] 

Chaoyu Tang commented on HIVE-12346:


[~leftylev] I have updated the wiki, please take a look. Thanks.

> Internally used variables in HiveConf should not be settable via command
> 
>
> Key: HIVE-12346
> URL: https://issues.apache.org/jira/browse/HIVE-12346
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration
>Affects Versions: 1.2.1
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>  Labels: TODOC1.3
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12346.1.patch, HIVE-12346.patch
>
>
> Some HiveConf variables, such as hive.added.jars.path, are for internal use 
> only and should not be settable via the set command. 
> We have seen many cases where users mistakenly set these variables with the 
> set command even though some of them are documented as "internal parameter" 
> in Hive. The command usually succeeds but sometimes has no effect, which 
> causes confusion. For example, hive.added.jars.path can be set via the set 
> command, but it is sometimes overridden by session resource jars at runtime.
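
One way to picture the requested behavior is a guard on the user-facing path only. This is a minimal sketch, not HiveConf or the attached patch: it assumes a fixed set of internal keys that the set command must reject while the engine can still write them. All class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; this is not HiveConf. */
final class GuardedConf {
    private final Set<String> internalKeys;
    private final Map<String, String> values = new HashMap<>();

    GuardedConf(String... internalKeys) {
        this.internalKeys = new HashSet<>(Arrays.asList(internalKeys));
    }

    /** Path taken by a user-issued set command: internal keys are rejected. */
    void setFromUser(String key, String value) {
        if (internalKeys.contains(key)) {
            throw new IllegalArgumentException("Cannot modify internal parameter: " + key);
        }
        values.put(key, value);
    }

    /** Path taken by the engine itself: no restriction. */
    void setInternal(String key, String value) {
        values.put(key, value);
    }

    String get(String key) {
        return values.get(key);
    }

    public static void main(String[] args) {
        GuardedConf conf = new GuardedConf("hive.added.jars.path");
        conf.setInternal("hive.added.jars.path", "/tmp/a.jar");
        try {
            conf.setFromUser("hive.added.jars.path", "/tmp/b.jar");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // the user-facing set is refused
        }
        System.out.println(conf.get("hive.added.jars.path")); // prints /tmp/a.jar
    }
}
```

Failing loudly avoids the confusing case described above, where the command appears to succeed but the value is later overridden.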





[jira] [Commented] (HIVE-12365) Added resource path is sent to cluster as an empty string when externally removed

2015-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995336#comment-14995336
 ] 

Hive QA commented on HIVE-12365:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12771186/HIVE-12365.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9778 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarDataNucleusUnCaching
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5962/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5962/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5962/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12771186 - PreCommit-HIVE-TRUNK-Build

> Added resource path is sent to cluster as an empty string when externally 
> removed
> -
>
> Key: HIVE-12365
> URL: https://issues.apache.org/jira/browse/HIVE-12365
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-12365.patch
>
>
> Sometimes resources (e.g. jars) added via a command like "add jars" are 
> removed externally from their file path for some reason. Their paths are 
> then sent to the cluster as empty strings, which causes failures even for 
> queries that do not need these jars in execution. The error looks like the 
> following:
> {code}
> 15/11/06 21:56:44 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> file:/tmp/hadoop-ctang/mapred/staging/ctang734817191/.staging/job_local734817191_0003
> java.lang.IllegalArgumentException: Can not create a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:135)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:215)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
> {code}





[jira] [Commented] (HIVE-12358) Categorize vectorization benchmarks into arithmetic, comparison, logic

2015-11-07 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995351#comment-14995351
 ] 

Ashutosh Chauhan commented on HIVE-12358:
-

+1

> Categorize vectorization benchmarks into arithmetic, comparison, logic
> --
>
> Key: HIVE-12358
> URL: https://issues.apache.org/jira/browse/HIVE-12358
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Teddy Choi
>Assignee: Teddy Choi
> Attachments: HIVE-12358.patch
>
>
> There are 30+ vectorization benchmarks in the VectorizationBench.java file, 
> which has 500+ lines. They need to be grouped into arithmetic, logic, and 
> comparison categories for ease of management.





[jira] [Commented] (HIVE-12017) Do not disable CBO by default when number of joins in a query is equal or less than 1

2015-11-07 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995367#comment-14995367
 ] 

Ashutosh Chauhan commented on HIVE-12017:
-

[~jcamachorodriguez] Can you create a RB entry for this?

> Do not disable CBO by default when number of joins in a query is equal or 
> less than 1
> -
>
> Key: HIVE-12017
> URL: https://issues.apache.org/jira/browse/HIVE-12017
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-12017.01.patch, HIVE-12017.02.patch, 
> HIVE-12017.03.patch, HIVE-12017.04.patch, HIVE-12017.05.patch, 
> HIVE-12017.06.patch, HIVE-12017.07.patch, HIVE-12017.08.patch
>
>
> Instead, we could disable some parts of CBO that are not relevant if the 
> query contains 1 or 0 joins. The implementation should make it easy to 
> define other query patterns for which we might disable some parts of CBO (in 
> case we want to do so in the future).





[jira] [Commented] (HIVE-12365) Added resource path is sent to cluster as an empty string when externally removed

2015-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995428#comment-14995428
 ] 

Hive QA commented on HIVE-12365:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12771202/HIVE-12365.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9778 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5963/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5963/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5963/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12771202 - PreCommit-HIVE-TRUNK-Build

> Added resource path is sent to cluster as an empty string when externally 
> removed
> -
>
> Key: HIVE-12365
> URL: https://issues.apache.org/jira/browse/HIVE-12365
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-12365.patch, HIVE-12365.patch
>
>
> Sometimes resources (e.g. jars) added via a command like "add jars" are 
> removed externally from their file path for some reason. Their paths are 
> then sent to the cluster as empty strings, which causes failures even for 
> queries that do not need these jars in execution. The error looks like the 
> following:
> {code}
> 15/11/06 21:56:44 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> file:/tmp/hadoop-ctang/mapred/staging/ctang734817191/.staging/job_local734817191_0003
> java.lang.IllegalArgumentException: Can not create a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:135)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:215)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
> {code}





[jira] [Updated] (HIVE-11948) Investigate TxnHandler and CompactionTxnHandler to see where we can reduce transaction isolation level

2015-11-07 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11948:
--
Attachment: HIVE-11948.patch

> Investigate TxnHandler and CompactionTxnHandler to see where we can reduce 
> transaction isolation level
> --
>
> Key: HIVE-11948
> URL: https://issues.apache.org/jira/browse/HIVE-11948
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-11948.patch
>
>
> At least some operations (or parts of operations) can run at READ_COMMITTED:
> * CompactionTxnHandler.setRunAs()
> * CompactionTxnHandler.findNextToCompact(), if the update statement includes 
> cq_state = '" + INITIATED_STATE + "'" in the WHERE clause and logic to look 
> for the "next" candidate
> * CompactionTxnHandler.markCompacted(); perhaps add cq_state=WORKING_STATE in 
> the WHERE clause (mostly as an extra consistency check)
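
The markCompacted() suggestion above can be sketched in plain JDBC as follows. This is a minimal illustration, not the attached patch: the class name, the single-character state codes, and the exact statement text are assumptions made for the example, and the real values should be checked against TxnHandler.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class CompactionStateCheckSketch {
    // Assumed state codes, for illustration only; verify against TxnHandler.
    static final char WORKING_STATE = 'w';
    static final char READY_FOR_CLEANING = 'r';

    // Builds an UPDATE whose WHERE clause also checks the current state,
    // so a row that is not in the working state is left untouched.
    static String markCompactedSql() {
        return "UPDATE COMPACTION_QUEUE SET cq_state = '" + READY_FOR_CLEANING + "'"
             + " WHERE cq_id = ? AND cq_state = '" + WORKING_STATE + "'";
    }

    // Runs the guarded update at READ_COMMITTED and treats any update count
    // other than 1 as a consistency failure.
    static void markCompacted(Connection conn, long compactionId) throws SQLException {
        conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(markCompactedSql())) {
            ps.setLong(1, compactionId);
            int updated = ps.executeUpdate();
            if (updated != 1) {
                conn.rollback();
                throw new IllegalStateException(
                    "Expected 1 row for compaction " + compactionId + ", updated " + updated);
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}
```

Because the state predicate is part of the UPDATE itself, the check and the write happen in one statement, which is what makes a lower isolation level plausible for this operation.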





[jira] [Commented] (HIVE-12365) Added resource path is sent to cluster as an empty string when externally removed

2015-11-07 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995446#comment-14995446
 ] 

Chaoyu Tang commented on HIVE-12365:


testAddJarDataNucleusUnCaching failure could not be reproduced. The three 
failures are not related to this patch.

> Added resource path is sent to cluster as an empty string when externally 
> removed
> -
>
> Key: HIVE-12365
> URL: https://issues.apache.org/jira/browse/HIVE-12365
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-12365.patch, HIVE-12365.patch
>
>
> Sometimes resources (e.g. jars) added via a command like "add jars 
> " are removed externally from their file path for some reason. 
> Their paths are then sent to the cluster as empty strings, which causes 
> failures even for queries that do not need these jars for execution. The 
> error looks like the following:
> {code}
> 15/11/06 21:56:44 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/tmp/hadoop-ctang/mapred/staging/ctang734817191/.staging/job_local734817191_0003
> java.lang.IllegalArgumentException: Can not create a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:135)
>   at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:215)
>   at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
>   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
> {code}
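
One way to avoid handing an empty path to job submission is to drop blank or missing entries before the resource list is joined. The sketch below is an assumption about how such a guard could look, not the content of HIVE-12365.patch; the class and method names are hypothetical, and it only checks local file paths.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class AddedResourceFilterSketch {
    // Returns a comma-separated list containing only the non-blank paths
    // that still exist on the local file system.
    static String existingResources(List<String> addedPaths) {
        List<String> kept = new ArrayList<>();
        for (String p : addedPaths) {
            if (p == null || p.trim().isEmpty()) {
                continue; // an empty entry is what triggers the exception above
            }
            if (Files.exists(Paths.get(p))) {
                kept.add(p);
            }
        }
        return String.join(",", kept);
    }
}
```

If the result is an empty string, the caller would presumably skip setting the resource property altogether rather than pass the empty value on.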





[jira] [Updated] (HIVE-12346) Internally used variables in HiveConf should not be settable via command

2015-11-07 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-12346:
--
Labels:   (was: TODOC1.3)

> Internally used variables in HiveConf should not be settable via command
> 
>
> Key: HIVE-12346
> URL: https://issues.apache.org/jira/browse/HIVE-12346
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration
>Affects Versions: 1.2.1
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12346.1.patch, HIVE-12346.patch
>
>
> Some HiveConf variables, such as hive.added.jars.path, are only for internal 
> use and should not be settable via the set command. 
> We have seen many cases where users mistakenly set these variables using the 
> set command, even though some of them are documented as "internal parameter" 
> in Hive. The command usually succeeds, but it sometimes has no effect, which 
> causes confusion. For example, hive.added.jars.path can be set via the 
> set command, but it is sometimes overridden by session resource jars at 
> runtime.
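
The behaviour asked for here can be illustrated with a small guard around the user-facing set path. This is a hypothetical sketch, not HiveConf's actual API: the class, its methods, and the way the list of internal keys is supplied are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class InternalConfGuardSketch {
    private final Set<String> internalKeys;
    private final Map<String, String> values = new HashMap<>();

    InternalConfGuardSketch(String... internalKeys) {
        this.internalKeys = new HashSet<>(Arrays.asList(internalKeys));
    }

    // Path taken by a user-issued "set key=value" command: internal keys are rejected.
    void setFromCommand(String key, String value) {
        if (internalKeys.contains(key)) {
            throw new IllegalArgumentException(
                "Cannot modify " + key + ": it is an internal parameter");
        }
        values.put(key, value);
    }

    // Path taken by the engine itself, which may still update internal keys.
    void setInternal(String key, String value) {
        values.put(key, value);
    }

    String get(String key) {
        return values.get(key);
    }
}
```

Failing loudly on the user path avoids the confusing case described above, where the command succeeds but the value is later overridden.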





[jira] [Commented] (HIVE-11948) Investigate TxnHandler and CompactionTxnHandler to see where we improve concurrency

2015-11-07 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995462#comment-14995462
 ] 

Eugene Koifman commented on HIVE-11948:
---

[~alangates] could you take a look please?

> Investigate TxnHandler and CompactionTxnHandler to see where we improve 
> concurrency
> ---
>
> Key: HIVE-11948
> URL: https://issues.apache.org/jira/browse/HIVE-11948
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-11948.patch
>
>
> At least some operations (or parts of operations) can run at READ_COMMITTED:
> * CompactionTxnHandler.setRunAs()
> * CompactionTxnHandler.findNextToCompact(), if the update statement includes 
> cq_state = '" + INITIATED_STATE + "'" in the WHERE clause and logic to look 
> for the "next" candidate
> * CompactionTxnHandler.markCompacted(); perhaps add cq_state=WORKING_STATE in 
> the WHERE clause (mostly as an extra consistency check)





[jira] [Updated] (HIVE-11948) Investigate TxnHandler and CompactionTxnHandler to see where we improve concurrency

2015-11-07 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11948:
--
Summary: Investigate TxnHandler and CompactionTxnHandler to see where we 
improve concurrency  (was: Investigate TxnHandler and CompactionTxnHandler to 
see where we can reduce transaction isolation level)

> Investigate TxnHandler and CompactionTxnHandler to see where we improve 
> concurrency
> ---
>
> Key: HIVE-11948
> URL: https://issues.apache.org/jira/browse/HIVE-11948
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-11948.patch
>
>
> At least some operations (or parts of operations) can run at READ_COMMITTED:
> * CompactionTxnHandler.setRunAs()
> * CompactionTxnHandler.findNextToCompact(), if the update statement includes 
> cq_state = '" + INITIATED_STATE + "'" in the WHERE clause and logic to look 
> for the "next" candidate
> * CompactionTxnHandler.markCompacted(); perhaps add cq_state=WORKING_STATE in 
> the WHERE clause (mostly as an extra consistency check)





[jira] [Commented] (HIVE-11948) Investigate TxnHandler and CompactionTxnHandler to see where we improve concurrency

2015-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995480#comment-14995480
 ] 

Hive QA commented on HIVE-11948:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12771209/HIVE-11948.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9777 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_corr
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5964/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5964/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5964/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12771209 - PreCommit-HIVE-TRUNK-Build

> Investigate TxnHandler and CompactionTxnHandler to see where we improve 
> concurrency
> ---
>
> Key: HIVE-11948
> URL: https://issues.apache.org/jira/browse/HIVE-11948
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-11948.patch
>
>
> At least some operations (or parts of operations) can run at READ_COMMITTED:
> * CompactionTxnHandler.setRunAs()
> * CompactionTxnHandler.findNextToCompact(), if the update statement includes 
> cq_state = '" + INITIATED_STATE + "'" in the WHERE clause and logic to look 
> for the "next" candidate
> * CompactionTxnHandler.markCompacted(); perhaps add cq_state=WORKING_STATE in 
> the WHERE clause (mostly as an extra consistency check)





[jira] [Commented] (HIVE-12346) Internally used variables in HiveConf should not be settable via command

2015-11-07 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995489#comment-14995489
 ] 

Chaoyu Tang commented on HIVE-12346:


Thanks, [~leftylev]

> Internally used variables in HiveConf should not be settable via command
> 
>
> Key: HIVE-12346
> URL: https://issues.apache.org/jira/browse/HIVE-12346
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration
>Affects Versions: 1.2.1
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12346.1.patch, HIVE-12346.patch
>
>
> Some HiveConf variables, such as hive.added.jars.path, are only for internal 
> use and should not be settable via the set command. 
> We have seen many cases where users mistakenly set these variables using the 
> set command, even though some of them are documented as "internal parameter" 
> in Hive. The command usually succeeds, but it sometimes has no effect, which 
> causes confusion. For example, hive.added.jars.path can be set via the 
> set command, but it is sometimes overridden by session resource jars at 
> runtime.





[jira] [Commented] (HIVE-12365) Added resource path is sent to cluster as an empty string when externally removed

2015-11-07 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995497#comment-14995497
 ] 

Xuefu Zhang commented on HIVE-12365:


+1

> Added resource path is sent to cluster as an empty string when externally 
> removed
> -
>
> Key: HIVE-12365
> URL: https://issues.apache.org/jira/browse/HIVE-12365
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-12365.patch, HIVE-12365.patch
>
>
> Sometimes resources (e.g. jars) added via a command like "add jars 
> " are removed externally from their file path for some reason. 
> Their paths are then sent to the cluster as empty strings, which causes 
> failures even for queries that do not need these jars for execution. The 
> error looks like the following:
> {code}
> 15/11/06 21:56:44 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/tmp/hadoop-ctang/mapred/staging/ctang734817191/.staging/job_local734817191_0003
> java.lang.IllegalArgumentException: Can not create a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:135)
>   at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:215)
>   at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
>   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
> {code}





[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive

2015-11-07 Thread Ratandeep Ratti (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995540#comment-14995540
 ] 

Ratandeep Ratti commented on HIVE-11878:


Hi [~jdere],
  Thanks for the comments. When creating the {{UDFClassLoader}}, we can change 
the parent classloader to {{SessionState.class.getClassLoader()}}, which will be 
the system classloader. What do you think?

About the second point, I think you are right. I was mistaken.

> ClassNotFoundException can possibly  occur if multiple jars are registered 
> one at a time in Hive
> 
>
> Key: HIVE-11878
> URL: https://issues.apache.org/jira/browse/HIVE-11878
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>  Labels: URLClassLoader
> Attachments: HIVE-11878.patch, HIVE-11878_approach3.patch, 
> HIVE-11878_approach3_per_session_clasloader.patch, HIVE-11878_qtest.patch
>
>
> When we register a jar on the Hive console, Hive creates a fresh URL 
> classloader which includes the path of the current jar to be registered and 
> all the jar paths of the parent classloader. The parent classloader is the 
> current ThreadContextClassLoader. Once the URLClassLoader is created, Hive 
> sets it as the current ThreadContextClassLoader.
> So if we register multiple jars in Hive, multiple URLClassLoaders are 
> created, each including the jars from its parent plus the one extra jar 
> being registered. The last URLClassLoader created ends up as the current 
> ThreadContextClassLoader. (For details see: 
> org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath)
> Here is an example in which this strategy can lead to a 
> ClassNotFoundException.
> We register 2 jars, *j1* and *j2*, in the Hive console. *j1* contains the UDF 
> class *c1*, which internally relies on class *c2* in jar *j2*. We register 
> *j1* first; the URLClassLoader *u1* is created and set as the 
> ThreadContextClassLoader. We register *j2* next; the new URLClassLoader *u2* 
> is created with *u1* as its parent, and *u2* becomes the new 
> ThreadContextClassLoader. Note that *u2* includes paths to both jars *j1* and 
> *j2*, whereas *u1* only has the path to *j1* (For details see: 
> org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath).
> Now when we register class *c1* under a temporary function in Hive, we load 
> the class using {code} Class.forName("c1", true, 
> Thread.currentThread().getContextClassLoader()) {code} . The 
> current thread context classloader is *u2*, and it has the path to class 
> *c1*, but classloaders work by delegating to the parent classloader 
> first. In this case class *c1* is found and *defined* by classloader 
> *u1*.
> Now *c1* from jar *j1* has *u1* as its classloader. If a method (say 
> initialize) that references class *c2* is called on *c1*, *c2* will not 
> be found, since the classloader used to search for *c2* is *u1* (the 
> caller's classloader is used to load a class).
> I've added a qtest to explain the problem. Please see the attached patch.
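
The delegation chain described above can be modelled without real jars. The sketch below is a toy model, not Hive code: `Loader` is a hypothetical stand-in for a URLClassLoader that only records which class names are on its own path, and `definingLoader` mimics parent-first lookup.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class DelegationModelSketch {
    // Toy stand-in for a URLClassLoader: a parent plus the class names on its own path.
    static final class Loader {
        final String name;
        final Loader parent;
        final Set<String> visible;

        Loader(String name, Loader parent, String... visible) {
            this.name = name;
            this.parent = parent;
            this.visible = new HashSet<>(Arrays.asList(visible));
        }

        // Parent-first lookup: returns the loader that would define the class,
        // or null when neither the ancestors nor this loader can see it.
        Loader definingLoader(String className) {
            if (parent != null) {
                Loader found = parent.definingLoader(className);
                if (found != null) {
                    return found;
                }
            }
            return visible.contains(className) ? this : null;
        }
    }

    public static void main(String[] args) {
        Loader u1 = new Loader("u1", null, "c1");     // after registering j1
        Loader u2 = new Loader("u2", u1, "c1", "c2"); // after registering j2

        // c1 is requested through u2 but ends up defined by u1.
        Loader ownerOfC1 = u2.definingLoader("c1");
        System.out.println("c1 defined by " + ownerOfC1.name);

        // c1's own loader is then asked for c2 and cannot find it.
        System.out.println("c2 visible from c1's loader: "
            + (ownerOfC1.definingLoader("c2") != null));
    }
}
```

In this model a single loader that sees both jars and has no intermediate per-jar parent resolves both classes, which is roughly the direction of the per-session classloader approach named in the attachments.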


