[jira] [Assigned] (HIVE-11051) Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object;

2015-06-23 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline reassigned HIVE-11051:
---

Assignee: Matt McCline  (was: Gopal V)

> Hive 1.2.0  MapJoin w/Tez - LazyBinaryArray cannot be cast to 
> [Ljava.lang.Object;
> -
>
> Key: HIVE-11051
> URL: https://issues.apache.org/jira/browse/HIVE-11051
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers, Tez
>Affects Versions: 1.2.0
>Reporter: Greg Senia
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-11051.01.patch, problem_table_joins.tar.gz
>
>
> I tried applying HIVE-10729, which did not solve the issue.
> The following exception is thrown on a Tez MapJoin with Hive 1.2.0 and Tez 
> 0.5.4/0.5.3:
> {code}
> Status: Running (Executing on YARN cluster with App id 
> application_1434641270368_1038)
> 
> VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
> KILLED
> 
> Map 1 ..   SUCCEEDED  3  300   0  
>  0
> Map 2 ... FAILED  3  102   7  
>  0
> 
> VERTICES: 01/02  [=>>-] 66%   ELAPSED TIME: 7.39 s
>  
> 
> Status: Failed
> Vertex failed, vertexName=Map 2, vertexId=vertex_1434641270368_1038_2_01, 
> diagnostics=[Task failed, taskId=task_1434641270368_1038_2_01_02, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row 
> {"cnctevn_id":"002245282386","svcrqst_id":"003627217285","svcrqst_crt_dts":"2015-04-23
>  11:54:39.238357","subject_seq_no":1,"plan_component":"HMOM1 
> ","cust_segment":"RM 
> ","cnctyp_cd":"001","cnctmd_cd":"D02","cnctevs_cd":"007","svcrtyp_cd":"335","svrstyp_cd":"088","cmpltyp_cd":"
>  ","catsrsn_cd":"","apealvl_cd":" 
> ","cnstnty_cd":"001","svcrqst_asrqst_ind":"Y","svcrqst_rtnorig_in":"N","svcrqst_vwasof_dt":"null","sum_reason_cd":"98","sum_reason":"Exclude","crsr_master_claim_index":null,"svcrqct_cds":["
>"],"svcrqst_lupdt":"2015-04-23 
> 22:14:01.288132","crsr_lupdt":null,"cntevsds_lupdt":"2015-04-23 
> 11:54:40.740061","ignore_me":1,"notes":null}
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row 
> {"cnctevn_id":"002245282386","svcrqst_id":"003627217285","svcrqst_crt_dts":"2015-04-23
>  11:54:39.238357","subject_seq_no":1,"plan_component":"HMOM1 
> ","cust_segment":"RM 
> ","cnctyp_cd":"001","cnctmd_cd":"D02","cnctevs_cd":"007","svcrtyp_cd":"335","svrstyp_cd":"088","cmpltyp_cd":"
>  ","catsrsn_cd":"","apealvl_cd":" 
> ","cnstnty_cd":"001","svcrqst_asrqst_ind":"Y","svcrqst_rtnorig_in":"N","svcrqst_vwasof_dt":"null","sum_reason_cd":"98","sum_reason":"Exclude","crsr_master_claim_index":null,"svcrqct_cds":["
>"],"svcrqst_lupdt":"2015-04-23 
> 22:14:01.288132","crsr_lupdt":null,"cntevsds_lupdt":"2015-04-23 
> 11:54:40.740061","ignore_me":1,"notes":null}
> at 
> org.apache.ha
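The failure mode in the trace above is easy to show in isolation. In this hedged, self-contained sketch, `LazyBinaryArray` is a stand-in class (not Hive's real one): a lazily deserialized array column is handed to code that assumes the value is already materialized as `Object[]`, and the blind cast throws exactly this kind of `ClassCastException`.

```java
public class CastSketch {
    // Stand-in for Hive's LazyBinaryArray; the real class wraps serialized bytes.
    static class LazyBinaryArray { }

    // Mimics a consumer that assumes an array column is already an Object[].
    static String castColumn(Object column) {
        try {
            Object[] fields = (Object[]) column;
            return "ok:" + fields.length;
        } catch (ClassCastException e) {
            // Analogous to "LazyBinaryArray cannot be cast to [Ljava.lang.Object;"
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(castColumn(new LazyBinaryArray())); // lazy value: cast fails
        System.out.println(castColumn(new Object[] {1, 2}));   // materialized: cast works
    }
}
```

The mismatch is purely about which representation (lazy wrapper vs. materialized array) reaches the operator, which is why it surfaces only on certain execution paths such as the Tez MapJoin here.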

[jira] [Updated] (HIVE-11051) Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object;

2015-06-23 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11051:

Attachment: HIVE-11051.01.patch

> Hive 1.2.0  MapJoin w/Tez - LazyBinaryArray cannot be cast to 
> [Ljava.lang.Object;
> -
>
> Key: HIVE-11051
> URL: https://issues.apache.org/jira/browse/HIVE-11051
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers, Tez
>Affects Versions: 1.2.0
>Reporter: Greg Senia
>Assignee: Gopal V
>Priority: Critical
> Attachments: HIVE-11051.01.patch, problem_table_joins.tar.gz
>
>

[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo

2015-06-23 Thread Nishant Kelkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598962#comment-14598962
 ] 

Nishant Kelkar commented on HIVE-9557:
--

I can volunteer to create a patch for this task (it would be my first patch 
ever!). If someone could point me to the place where I should create this 
class and its tests, I could upload a patch. Thank you!

> create UDF to measure strings similarity using Cosine Similarity algo
> -
>
> Key: HIVE-9557
> URL: https://issues.apache.org/jira/browse/HIVE-9557
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
>
> algo description http://en.wikipedia.org/wiki/Cosine_similarity
> {code}
> --one word different, total 2 words
> str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f
> {code}
> reference implementation:
> https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java
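The quoted example works out under a set-of-words reading of cosine similarity, where the score is |A ∩ B| / sqrt(|A| · |B|) over binary word-presence vectors. Here is a hedged sketch (the method name is borrowed from the example; the real UDF's tokenization and weighting may differ):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CosineSimilaritySketch {
    // Cosine similarity over binary word-presence vectors:
    // |A intersect B| / sqrt(|A| * |B|).
    static double strSimCosine(String a, String b) {
        Set<String> wa = new HashSet<>(Arrays.asList(a.split("\\s+")));
        Set<String> wb = new HashSet<>(Arrays.asList(b.split("\\s+")));
        if (wa.isEmpty() || wb.isEmpty()) {
            return 0d;
        }
        Set<String> common = new HashSet<>(wa);
        common.retainAll(wb); // words shared by both strings
        return common.size() / Math.sqrt((double) wa.size() * wb.size());
    }

    public static void main(String[] args) {
        // One shared word ("Test") out of two per string -> 1 / sqrt(2 * 2) = 0.5
        System.out.println(strSimCosine("Test String1", "Test String2"));
    }
}
```

With this reading, the `(2 - 1) / 2 = 0.5` in the issue description corresponds to one shared token out of two per string.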



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7292) Hive on Spark

2015-06-23 Thread Zhang Jingpeng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598949#comment-14598949
 ] 

Zhang Jingpeng commented on HIVE-7292:
--

Is the branch already usable in production?

> Hive on Spark
> -
>
> Key: HIVE-7292
> URL: https://issues.apache.org/jira/browse/HIVE-7292
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>  Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5
> Attachments: Hive-on-Spark.pdf
>
>
> Spark, an open-source data analytics cluster computing framework, has gained 
> significant momentum recently. Many Hive users already have Spark installed 
> as their computing backbone. To take advantage of Hive, they still need to 
> have either MapReduce or Tez on their cluster. This initiative will provide 
> users a new alternative so that they can consolidate their backends. 
> Secondly, providing such an alternative further increases Hive's adoption, as 
> it exposes Spark users to a viable, feature-rich, de facto standard SQL tool 
> on Hadoop.
> Finally, allowing Hive to run on Spark also has performance benefits. Hive 
> queries, especially those involving multiple reducer stages, will run faster, 
> improving the user experience as Tez does.
> This is an umbrella JIRA which will cover many coming subtasks. The design 
> doc will be attached here shortly and will be on the wiki as well. Feedback 
> from the community is greatly appreciated!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-23 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598935#comment-14598935
 ] 

Chengxiang Li commented on HIVE-10999:
--

The classpath update code change looks good to me; I'm +1 on this patch.

> Upgrade Spark dependency to 1.4 [Spark Branch]
> --
>
> Key: HIVE-10999
> URL: https://issues.apache.org/jira/browse/HIVE-10999
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
> Attachments: HIVE-10999.1-spark.patch, HIVE-10999.2-spark.patch, 
> HIVE-10999.3-spark.patch, HIVE-10999.3-spark.patch
>
>
> Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 
> 1.4.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo

2015-06-23 Thread Nishant Kelkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598936#comment-14598936
 ] 

Nishant Kelkar commented on HIVE-9557:
--

[~apivovarov]: The reference implementation link you've provided seems to be 
broken. Did you mean to point here? -- 
https://github.com/Simmetrics/simmetrics/blob/master/simmetrics-core/src/main/java/org/simmetrics/metrics/CosineSimilarity.java

> create UDF to measure strings similarity using Cosine Similarity algo
> -
>
> Key: HIVE-9557
> URL: https://issues.apache.org/jira/browse/HIVE-9557
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
>
> algo description http://en.wikipedia.org/wiki/Cosine_similarity
> {code}
> --one word different, total 2 words
> str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f
> {code}
> reference implementation:
> https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11079) Fix qfile tests that fail on Windows due to CR/character escape differences

2015-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598894#comment-14598894
 ] 

Hive QA commented on HIVE-11079:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741433/HIVE-11079.5.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9019 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_partitioned
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4360/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4360/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4360/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12741433 - PreCommit-HIVE-TRUNK-Build

> Fix qfile tests that fail on Windows due to CR/character escape differences
> ---
>
> Key: HIVE-11079
> URL: https://issues.apache.org/jira/browse/HIVE-11079
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-11079.1.patch, HIVE-11079.2.patch, 
> HIVE-11079.3.patch, HIVE-11079.4.patch, HIVE-11079.5.patch
>
>
> A few qfile tests are failing on Windows due to a couple of windows-specific 
> issues:
> - The table comment for the test includes a CR character, which is different 
> on Windows compared to Unix.
> - The partition path in the test includes a space character. Unlike Unix, on 
> Windows space characters in Hive paths are escaped.
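Both differences are mechanical to account for in a test harness. A hedged sketch of the two normalizations a qfile comparison would need (the `%20` escaping below is purely illustrative; Hive's actual path escaper may use different rules):

```java
public class WindowsDiffSketch {
    // 1. Normalize Windows CRLF line endings before comparing against
    //    Unix-generated golden output files.
    static String normalizeLineEndings(String s) {
        return s.replace("\r\n", "\n");
    }

    // 2. Escape spaces in a partition path the way an escaping filesystem
    //    might (illustrative only; not Hive's real escaping scheme).
    static String escapeSpaces(String path) {
        return path.replace(" ", "%20");
    }

    public static void main(String[] args) {
        System.out.println(normalizeLineEndings("table comment line 1\r\nline 2"));
        System.out.println(escapeSpaces("part_col=a b"));
    }
}
```

Applying such normalizations on the comparison side keeps one set of golden files valid on both platforms.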



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11043) ORC split strategies should adapt based on number of files

2015-06-23 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598893#comment-14598893
 ] 

Gopal V commented on HIVE-11043:


[~prasanth_j]: sure, it looks like errors occur when reading footers in the 1 
file/1 split case.

The error is actually

{code}
Caused by: java.lang.IndexOutOfBoundsException: Index: 0
at java.util.Collections$EmptyList.get(Collections.java:3212)
at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240)
at 
org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getColumnIndicesFromNames(ReaderImpl.java:651)
at 
org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getRawDataSizeOfColumns(ReaderImpl.java:634)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:938)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:847)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:713)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
{code}
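The root frame is `Collections$EmptyList.get`, i.e. the Protobuf type list read from the footer is empty when `getSubtypes(0)` is asked for the root type's children. A minimal reproduction of just that failure mode:

```java
import java.util.Collections;
import java.util.List;

public class EmptyFooterSketch {
    // Returns the exception message produced when indexing an empty list,
    // as getColumnIndicesFromNames effectively does via getSubtypes(0).
    static String failureMessage() {
        List<Integer> subtypes = Collections.emptyList(); // footer with no type tree
        try {
            subtypes.get(0);
            return "no error";
        } catch (IndexOutOfBoundsException e) {
            return e.getMessage(); // "Index: 0" on stock JDKs, as in the trace
        }
    }

    public static void main(String[] args) {
        System.out.println(failureMessage());
    }
}
```

This points at the split generator consulting a footer whose type metadata was never populated for the 1 file/1 split case, rather than at the indexing code itself.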

> ORC split strategies should adapt based on number of files
> --
>
> Key: HIVE-11043
> URL: https://issues.apache.org/jira/browse/HIVE-11043
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Gopal V
> Fix For: 2.0.0
>
> Attachments: HIVE-11043.1.patch, HIVE-11043.2.patch
>
>
> ORC split strategies added in HIVE-10114 chose strategies based on average 
> file size. It would be beneficial to choose a different strategy based on 
> number of files as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11089) Hive Streaming: connection fails when using a proxy user UGI

2015-06-23 Thread Adam Kunicki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kunicki updated HIVE-11089:

Description: 
HIVE-7508 "Add Kerberos Support" seems to also remove the ability to specify a 
proxy user.

HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the 
connection is supposed to be a secure connection.

This however breaks support for Proxy Users as a proxy user UGI will always 
return false to hasKerberosCredentials().

See lines 273, 274 of HiveEndPoint.java
{code}
this.secureMode = ugi==null ? false : ugi.hasKerberosCredentials();
this.msClient = getMetaStoreClient(endPoint, conf, secureMode);
{code}

It also seems that between 0.13.1 and 0.14 the newConnection() method that 
includes a proxy user was removed.

for reference: 
https://github.com/apache/hive/commit/8e423a12db47759196c24535fbc32236b79f464a

  was:
HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the 
connection is supposed to be a secure connection.

This however breaks support for Proxy Users as a proxy user UGI will always 
return false to hasKerberosCredentials().

See lines 273, 274 of HiveEndPoint.java
{code}
this.secureMode = ugi==null ? false : ugi.hasKerberosCredentials();
this.msClient = getMetaStoreClient(endPoint, conf, secureMode);
{code}

It also seems that between 0.13.1 and 0.14 the newConnection() method that 
includes a proxy user was removed.

for reference: 
https://github.com/apache/hive/commit/8e423a12db47759196c24535fbc32236b79f464a


> Hive Streaming: connection fails when using a proxy user UGI
> 
>
> Key: HIVE-11089
> URL: https://issues.apache.org/jira/browse/HIVE-11089
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
>Reporter: Adam Kunicki
>  Labels: ACID, Streaming
>
> HIVE-7508 "Add Kerberos Support" seems to also remove the ability to specify 
> a proxy user.
> HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the 
> connection is supposed to be a secure connection.
> This however breaks support for Proxy Users as a proxy user UGI will always 
> return false to hasKerberosCredentials().
> See lines 273, 274 of HiveEndPoint.java
> {code}
> this.secureMode = ugi==null ? false : ugi.hasKerberosCredentials();
> this.msClient = getMetaStoreClient(endPoint, conf, secureMode);
> {code}
> It also seems that between 0.13.1 and 0.14 the newConnection() method that 
> includes a proxy user was removed.
> for reference: 
> https://github.com/apache/hive/commit/8e423a12db47759196c24535fbc32236b79f464a
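The two checks can be contrasted with a self-contained stand-in for the UGI. The names below mirror the Hadoop `UserGroupInformation` API, but this is a sketch, not Hadoop code; in Hadoop, a proxy UGI carries no Kerberos tickets of its own while its real (proxying) user's authentication method is still KERBEROS.

```java
public class SecureModeCheck {
    enum AuthenticationMethod { SIMPLE, KERBEROS }

    // Stand-in for org.apache.hadoop.security.UserGroupInformation.
    static class Ugi {
        final boolean kerberosCredentials;
        final AuthenticationMethod realAuth;
        Ugi(boolean creds, AuthenticationMethod realAuth) {
            this.kerberosCredentials = creds;
            this.realAuth = realAuth;
        }
        boolean hasKerberosCredentials() { return kerberosCredentials; }
        AuthenticationMethod getRealAuthenticationMethod() { return realAuth; }
    }

    // Check from HIVE-8427: misses proxy users, whose UGI holds no tickets.
    static boolean currentCheck(Ugi ugi) {
        return ugi == null ? false : ugi.hasKerberosCredentials();
    }

    // Suggested check: consult the real user's authentication method instead.
    static boolean suggestedCheck(Ugi ugi) {
        return ugi != null
            && ugi.getRealAuthenticationMethod() != AuthenticationMethod.SIMPLE;
    }

    public static void main(String[] args) {
        // Proxy of a Kerberos-authenticated real user: no tickets of its own.
        Ugi proxy = new Ugi(false, AuthenticationMethod.KERBEROS);
        System.out.println(currentCheck(proxy));   // insecure path, connection fails
        System.out.println(suggestedCheck(proxy)); // secure path, as intended
    }
}
```

This is why the issue title says the connection fails specifically for proxy-user UGIs: the current check routes them down the insecure metastore path on a secure cluster.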



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11089) Hive Streaming: connection fails when using a proxy user UGI

2015-06-23 Thread Adam Kunicki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kunicki updated HIVE-11089:

Description: 
HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the 
connection is supposed to be a secure connection.

This however breaks support for Proxy Users as a proxy user UGI will always 
return false to hasKerberosCredentials().

See lines 273, 274 of HiveEndPoint.java
{code}
this.secureMode = ugi==null ? false : ugi.hasKerberosCredentials();
this.msClient = getMetaStoreClient(endPoint, conf, secureMode);
{code}

It also seems that between 0.13.1 and 0.14 the newConnection() method that 
includes a proxy user was removed.

for reference: 
https://github.com/apache/hive/commit/8e423a12db47759196c24535fbc32236b79f464a

  was:
HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the 
connection is supposed to be a secure connection.

This however breaks support for Proxy Users as a proxy user UGI will always 
return false to hasKerberosCredentials().

If the goal is to determine whether this is a secure cluster, we could instead 
call:
{code}
this.secureMode = ugi == null ? false : ugi.getRealAuthenticationMethod() != AuthenticationMethod.SIMPLE;
{code}

This change would work for both proxy users and real users.

See lines 273, 274 of HiveEndPoint.java
{code}
this.secureMode = ugi==null ? false : ugi.hasKerberosCredentials();
this.msClient = getMetaStoreClient(endPoint, conf, secureMode);
{code}

for reference: 
https://github.com/apache/hive/commit/8e423a12db47759196c24535fbc32236b79f464a


> Hive Streaming: connection fails when using a proxy user UGI
> 
>
> Key: HIVE-11089
> URL: https://issues.apache.org/jira/browse/HIVE-11089
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
>Reporter: Adam Kunicki
>  Labels: ACID, Streaming
>
> HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the 
> connection is supposed to be a secure connection.
> This however breaks support for Proxy Users as a proxy user UGI will always 
> return false to hasKerberosCredentials().
> See lines 273, 274 of HiveEndPoint.java
> {code}
> this.secureMode = ugi==null ? false : ugi.hasKerberosCredentials();
> this.msClient = getMetaStoreClient(endPoint, conf, secureMode);
> {code}
> It also seems that between 0.13.1 and 0.14 the newConnection() method that 
> includes a proxy user was removed.
> for reference: 
> https://github.com/apache/hive/commit/8e423a12db47759196c24535fbc32236b79f464a



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources

2015-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598838#comment-14598838
 ] 

Hive QA commented on HIVE-10895:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741385/HIVE-10895.2.patch

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 9020 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete_own_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update_own_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join4
org.apache.hive.hcatalog.pig.TestHCatStorer.testEmptyStore[3]
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4359/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4359/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4359/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12741385 - PreCommit-HIVE-TRUNK-Build

> ObjectStore does not close Query objects in some calls, causing a potential 
> leak in some metastore db resources
> ---
>
> Key: HIVE-10895
> URL: https://issues.apache.org/jira/browse/HIVE-10895
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
>Reporter: Takahiko Saito
>Assignee: Aihua Xu
> Attachments: HIVE-10895.1.patch, HIVE-10895.2.patch
>
>
> During testing, we've noticed the Oracle db running out of cursors, which 
> might be related to this.
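The usual fix pattern for this class of leak is closing each query in a finally block so the server-side cursor is released even on error paths. A self-contained sketch with a stand-in for `javax.jdo.Query` (the actual patch's shape may differ):

```java
public class QueryLeakSketch {
    // Stand-in for javax.jdo.Query, tracking whether closeAll() was called.
    static class Query {
        boolean closed = false;
        Object execute() { return "rows"; }   // may throw in real code
        void closeAll() { closed = true; }    // releases the server-side cursor
    }

    public static void main(String[] args) {
        Query q = new Query();
        try {
            q.execute();
        } finally {
            q.closeAll(); // guaranteed even if execute() throws
        }
        System.out.println(q.closed);
    }
}
```

Without the finally-block close, each ObjectStore call that returns early or throws leaves a cursor open, which matches the observed exhaustion on Oracle.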



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11089) Hive Streaming: connection fails when using a proxy user UGI

2015-06-23 Thread Adam Kunicki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kunicki updated HIVE-11089:

Description: 
HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the 
connection is supposed to be a secure connection.

This however breaks support for Proxy Users as a proxy user UGI will always 
return false to hasKerberosCredentials().

If the goal is to determine whether this is a secure cluster, we could instead 
call:
{code}
this.secureMode = ugi == null ? false : ugi.getRealAuthenticationMethod() != AuthenticationMethod.SIMPLE;
{code}

This change would work for both proxy users and real users.

See lines 273, 274 of HiveEndPoint.java
{code}
this.secureMode = ugi==null ? false : ugi.hasKerberosCredentials();
this.msClient = getMetaStoreClient(endPoint, conf, secureMode);
{code}

for reference: 
https://github.com/apache/hive/commit/8e423a12db47759196c24535fbc32236b79f464a

  was:
HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the 
connection is supposed to be a secure connection.

This however breaks support for Proxy Users as a proxy user UGI will always 
return false to hasKerberosCredentials().


> Hive Streaming: connection fails when using a proxy user UGI
> 
>
> Key: HIVE-11089
> URL: https://issues.apache.org/jira/browse/HIVE-11089
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
>Reporter: Adam Kunicki
>  Labels: ACID, Streaming
>
> HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the 
> connection is supposed to be a secure connection.
> This however breaks support for Proxy Users as a proxy user UGI will always 
> return false to hasKerberosCredentials().
> If the goal is to determine whether this is a secure cluster, we could 
> instead call:
> {code}
> this.secureMode = ugi == null ? false : ugi.getRealAuthenticationMethod() != AuthenticationMethod.SIMPLE;
> {code}
> This change would work for both proxy users and real users.
> See lines 273, 274 of HiveEndPoint.java
> {code}
> this.secureMode = ugi==null ? false : ugi.hasKerberosCredentials();
> this.msClient = getMetaStoreClient(endPoint, conf, secureMode);
> {code}
> for reference: 
> https://github.com/apache/hive/commit/8e423a12db47759196c24535fbc32236b79f464a



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HIVE-11043) ORC split strategies should adapt based on number of files

2015-06-23 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reopened HIVE-11043:
--

[~pxiong] My bad. From the looks of it, the test failures seemed unrelated. I 
reverted the patch on branch-1 and master. [~gopalv] Can you look at the test 
failures?

> ORC split strategies should adapt based on number of files
> --
>
> Key: HIVE-11043
> URL: https://issues.apache.org/jira/browse/HIVE-11043
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Gopal V
> Fix For: 2.0.0
>
> Attachments: HIVE-11043.1.patch, HIVE-11043.2.patch
>
>
> ORC split strategies added in HIVE-10114 chose strategies based on average 
> file size. It would be beneficial to choose a different strategy based on 
> number of files as well.





[jira] [Comment Edited] (HIVE-11043) ORC split strategies should adapt based on number of files

2015-06-23 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598829#comment-14598829
 ] 

Prasanth Jayachandran edited comment on HIVE-11043 at 6/24/15 3:54 AM:
---

[~pxiong] My bad. From the looks of it, the test failures seemed unrelated. I 
reverted the patch on branch-1 and master. [~gopalv] Can you look at the test 
failures?


was (Author: prasanth_j):
[~pxiong] My bad. From the looks of the the test failures seemed unrelated. I 
reverted the patch on branch-1 and master. [~gopalv] Can you look at the test 
failures?

> ORC split strategies should adapt based on number of files
> --
>
> Key: HIVE-11043
> URL: https://issues.apache.org/jira/browse/HIVE-11043
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Gopal V
> Fix For: 2.0.0
>
> Attachments: HIVE-11043.1.patch, HIVE-11043.2.patch
>
>
> ORC split strategies added in HIVE-10114 chose strategies based on average 
> file size. It would be beneficial to choose a different strategy based on 
> number of files as well.





[jira] [Commented] (HIVE-11089) Hive Streaming: connection fails when using a proxy user UGI

2015-06-23 Thread Adam Kunicki (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598822#comment-14598822
 ] 

Adam Kunicki commented on HIVE-11089:
-

Additionally, it seems that the Hive docs 
https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest reflect 
the HiveEndPoint API prior to HIVE-8427 which does allow for specifying a proxy 
user.

> Hive Streaming: connection fails when using a proxy user UGI
> 
>
> Key: HIVE-11089
> URL: https://issues.apache.org/jira/browse/HIVE-11089
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
>Reporter: Adam Kunicki
>  Labels: ACID, Streaming
>
> HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the 
> connection is supposed to be a secure connection.
> This however breaks support for Proxy Users as a proxy user UGI will always 
> return false to hasKerberosCredentials().





[jira] [Commented] (HIVE-10729) Query failed when select complex columns from joinned table (tez map join only)

2015-06-23 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598804#comment-14598804
 ] 

Greg Senia commented on HIVE-10729:
---

Gunther Hagleitner and Matt McCline: using this patch against my JIRA HIVE-11051 
and its test case on Hadoop 2.4.1 with Hive 1.2.0 and Tez 0.5.4, it still fails:

Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row 
{"cnctevn_id":"002246948195","svcrqst_id":"003629537980","svcrqst_crt_dts":"2015-04-24
 12:48:37.859683","subject_seq_no":1,"plan_component":"HMOM1 
","cust_segment":"RM 
","cnctyp_cd":"001","cnctmd_cd":"D02","cnctevs_cd":"007","svcrtyp_cd":"335","svrstyp_cd":"088","cmpltyp_cd":"
 ","catsrsn_cd":"","apealvl_cd":" 
","cnstnty_cd":"001","svcrqst_asrqst_ind":"Y","svcrqst_rtnorig_in":"N","svcrqst_vwasof_dt":"null","sum_reason_cd":"98","sum_reason":"Exclude","crsr_master_claim_index":null,"svcrqct_cds":["
   "],"svcrqst_lupdt":"2015-04-24 
12:48:37.859683","crsr_lupdt":null,"cntevsds_lupdt":"2015-04-24 
12:48:40.499238","ignore_me":1,"notes":null}
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row 
{"cnctevn_id":"002246948195","svcrqst_id":"003629537980","svcrqst_crt_dts":"2015-04-24
 12:48:37.859683","subject_seq_no":1,"plan_component":"HMOM1 
","cust_segment":"RM 
","cnctyp_cd":"001","cnctmd_cd":"D02","cnctevs_cd":"007","svcrtyp_cd":"335","svrstyp_cd":"088","cmpltyp_cd":"
 ","catsrsn_cd":"","apealvl_cd":" 
","cnstnty_cd":"001","svcrqst_asrqst_ind":"Y","svcrqst_rtnorig_in":"N","svcrqst_vwasof_dt":"null","sum_reason_cd":"98","sum_reason":"Exclude","crsr_master_claim_index":null,"svcrqct_cds":["
   "],"svcrqst_lupdt":"2015-04-24 
12:48:37.859683","crsr_lupdt":null,"cntevsds_lupdt":"2015-04-24 
12:48:40.499238","ignore_me":1,"notes":null}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected 
exception: Index: 0, Size: 0
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:426)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at 
org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:122)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
... 17 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.set(ArrayList.java:426)
at 
org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.fixupComplexObjects(MapJoinBytesTableContainer.java:424)
at 
org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer$ReusableRowContainer.uppack(HybridHashTableContainer.java:875)
at 
org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer$ReusableRowContainer.first(HybridHashTableContainer.java:845)
at 
org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer$ReusableRowContainer.first(HybridHashTableContainer.java:722)
at 
org.apache.hadoop.hive.ql.exec.persistence.UnwrapRowContainer.first(UnwrapRowContainer.java:62)
at 
org.apache.hadoop.hive.ql.exec.persistence.UnwrapRowContainer.first(UnwrapRowContainer.java:33)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:650)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:756)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:414)
... 23 more
]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex 
vertex_1434641270368_13820_2_01 [Map 2] killed/failed due to:null]DAG failed 
due to vertex failure. failedVertices:1 killedVertices:0

> Query failed when select complex columns from joinned table (tez map join 
> only)
> ---

[jira] [Commented] (HIVE-11043) ORC split strategies should adapt based on number of files

2015-06-23 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598799#comment-14598799
 ] 

Pengcheng Xiong commented on HIVE-11043:


[~prasanth_j] and [~gopalv], [~jpullokkaran] asked me to track the test cases 
that have been failing constantly on master recently, and I came here. It seems 
that this patch causes the problem. At first sight, authorization_delete.q 
sounds unrelated. However, it includes creating a table stored as ORC. If I 
revert this patch, the test cases pass. Could you guys take a look? Thanks.

> ORC split strategies should adapt based on number of files
> --
>
> Key: HIVE-11043
> URL: https://issues.apache.org/jira/browse/HIVE-11043
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Gopal V
> Fix For: 2.0.0
>
> Attachments: HIVE-11043.1.patch, HIVE-11043.2.patch
>
>
> ORC split strategies added in HIVE-10114 chose strategies based on average 
> file size. It would be beneficial to choose a different strategy based on 
> number of files as well.





[jira] [Reopened] (HIVE-10983) LazySimpleSerDe bug ,when Text is reused

2015-06-23 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang reopened HIVE-10983:
-

> LazySimpleSerDe bug  ,when Text is reused 
> --
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
>  Labels: patch
> Fix For: 0.14.1, 1.2.0
>
> Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt
>
>
> When I query data from an LZO table, I found in the results that the length of 
> the current row is always larger than that of the previous row, and sometimes 
> the current row contains the contents of the previous row. For example, I 
> executed the SQL "select * from web_searchhub where logdate=2015061003"; the 
> result is shown below. Notice that the second row's content contains the first 
> row's content.
> INFO [03:00:05.589] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
> INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
> session=901,thread=223ession=3151,thread=254 2015061003
> The content of the original LZO file is shown below; it has just 2 rows.
> INFO [03:00:05.635]  
> session=3148,thread=285
> INFO [03:00:05.635] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
> I think this error is caused by Text reuse, and I found a solution.
> Additionally, the table create SQL is:
> CREATE EXTERNAL TABLE `web_searchhub`(
>   `line` string)
> PARTITIONED BY (
>   `logdate` string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\\U'
> WITH SERDEPROPERTIES (
>   'serialization.encoding'='GBK')
> STORED AS INPUTFORMAT  "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
>   OUTPUTFORMAT 
> "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ;
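
The contamination pattern described above is the classic Writable-reuse pitfall: `Text.set()` reuses a backing byte array that only grows, so a consumer that decodes the whole array instead of honoring the valid length sees trailing bytes from a longer earlier row. Below is a self-contained Java sketch of the same bug class; `ReusedBuffer` is a hypothetical stand-in for `org.apache.hadoop.io.Text`, with no Hadoop dependency, and not the actual LazySimpleSerDe code.

```java
import java.nio.charset.StandardCharsets;

// ReusedBuffer is a stand-in for org.apache.hadoop.io.Text: its backing
// array is reused and only ever grows; 'length' marks the valid prefix.
public class TextReuseDemo {

    static class ReusedBuffer {
        byte[] bytes = new byte[0];
        int length;

        void set(String s) {
            byte[] src = s.getBytes(StandardCharsets.UTF_8);
            if (src.length > bytes.length) {
                bytes = new byte[src.length]; // grow, never shrink
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length;
        }
    }

    // Buggy consumer: decodes the whole backing array, like a reader that
    // ignores the valid length -- this reproduces the row contamination.
    static String buggyRead(ReusedBuffer b) {
        return new String(b.bytes, StandardCharsets.UTF_8);
    }

    // Correct consumer: honors the valid length.
    static String correctRead(ReusedBuffer b) {
        return new String(b.bytes, 0, b.length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ReusedBuffer row = new ReusedBuffer();
        row.set("a long first row");
        row.set("short"); // second record is shorter; buffer is reused

        System.out.println(buggyRead(row));   // "shortg first row" -- contaminated
        System.out.println(correctRead(row)); // "short"
    }
}
```

The fix for this bug class is always the same: read only the first `getLength()` bytes of the reused buffer, or copy the valid prefix before the buffer is refilled.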





[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-23 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598766#comment-14598766
 ] 

Xuefu Zhang commented on HIVE-10999:


[~chengxiang li], [~spena] is aware of the problem and is investigating it. In the 
meantime, please feel free to move this JIRA forward, ignoring that failure. 
Thanks.

> Upgrade Spark dependency to 1.4 [Spark Branch]
> --
>
> Key: HIVE-10999
> URL: https://issues.apache.org/jira/browse/HIVE-10999
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
> Attachments: HIVE-10999.1-spark.patch, HIVE-10999.2-spark.patch, 
> HIVE-10999.3-spark.patch, HIVE-10999.3-spark.patch
>
>
> Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 
> 1.4.0.





[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-23 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598762#comment-14598762
 ] 

Chengxiang Li commented on HIVE-10999:
--

Does anyone know why we get 
org.apache.hadoop.hive.cli.TestCliDriver.initializationError? If not, we should 
create a JIRA to track this, as it blocks all the TestCliDriver tests.

> Upgrade Spark dependency to 1.4 [Spark Branch]
> --
>
> Key: HIVE-10999
> URL: https://issues.apache.org/jira/browse/HIVE-10999
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
> Attachments: HIVE-10999.1-spark.patch, HIVE-10999.2-spark.patch, 
> HIVE-10999.3-spark.patch, HIVE-10999.3-spark.patch
>
>
> Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 
> 1.4.0.





[jira] [Updated] (HIVE-11079) Fix qfile tests that fail on Windows due to CR/character escape differences

2015-06-23 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-11079:
--
Attachment: HIVE-11079.5.patch

Patch needed rebase after HIVE-11037

> Fix qfile tests that fail on Windows due to CR/character escape differences
> ---
>
> Key: HIVE-11079
> URL: https://issues.apache.org/jira/browse/HIVE-11079
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-11079.1.patch, HIVE-11079.2.patch, 
> HIVE-11079.3.patch, HIVE-11079.4.patch, HIVE-11079.5.patch
>
>
> A few qfile tests are failing on Windows due to a couple of windows-specific 
> issues:
> - The table comment for the test includes a CR character, which is different 
> on Windows compared to Unix.
> - The partition path in the test includes a space character. Unlike Unix, on 
> Windows space characters in Hive paths are escaped.





[jira] [Commented] (HIVE-10233) Hive on tez: memory manager for grace hash join

2015-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598760#comment-14598760
 ] 

Hive QA commented on HIVE-10233:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741403/HIVE-10233.13.patch

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 9016 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete_own_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update_own_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join4
org.apache.hive.hcatalog.pig.TestHCatStorer.testEmptyStore[3]
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4358/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4358/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4358/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12741403 - PreCommit-HIVE-TRUNK-Build

> Hive on tez: memory manager for grace hash join
> ---
>
> Key: HIVE-10233
> URL: https://issues.apache.org/jira/browse/HIVE-10233
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: llap, 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Gunther Hagleitner
> Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, 
> HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, 
> HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, 
> HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch, 
> HIVE-10233.12.patch, HIVE-10233.13.patch, HIVE-10233.14.patch
>
>
> We need a memory manager in llap/tez to manage the usage of memory across 
> threads. 





[jira] [Updated] (HIVE-11053) Add more tests for HIVE-10844[Spark Branch]

2015-06-23 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-11053:
-
Assignee: GAOLUN

> Add more tests for HIVE-10844[Spark Branch]
> ---
>
> Key: HIVE-11053
> URL: https://issues.apache.org/jira/browse/HIVE-11053
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: GAOLUN
>Priority: Minor
>
> Add some test cases for self-union, self-join, CTE, and repeated sub-queries 
> to verify the job of combining equivalent works in HIVE-10844.





[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-23 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598755#comment-14598755
 ] 

Chengxiang Li commented on HIVE-10999:
--

Seems the latest uploaded patch passes all the tests, except 
org.apache.hadoop.hive.cli.TestCliDriver.initializationError. :)

> Upgrade Spark dependency to 1.4 [Spark Branch]
> --
>
> Key: HIVE-10999
> URL: https://issues.apache.org/jira/browse/HIVE-10999
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
> Attachments: HIVE-10999.1-spark.patch, HIVE-10999.2-spark.patch, 
> HIVE-10999.3-spark.patch, HIVE-10999.3-spark.patch
>
>
> Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 
> 1.4.0.





[jira] [Updated] (HIVE-11053) Add more tests for HIVE-10844[Spark Branch]

2015-06-23 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-11053:
-
Assignee: (was: GAOLUN)

> Add more tests for HIVE-10844[Spark Branch]
> ---
>
> Key: HIVE-11053
> URL: https://issues.apache.org/jira/browse/HIVE-11053
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Priority: Minor
>
> Add some test cases for self-union, self-join, CTE, and repeated sub-queries 
> to verify the job of combining equivalent works in HIVE-10844.





[jira] [Updated] (HIVE-11053) Add more tests for HIVE-10844[Spark Branch]

2015-06-23 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-11053:
-
Assignee: GAOLUN

> Add more tests for HIVE-10844[Spark Branch]
> ---
>
> Key: HIVE-11053
> URL: https://issues.apache.org/jira/browse/HIVE-11053
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: GAOLUN
>Priority: Minor
>
> Add some test cases for self-union, self-join, CTE, and repeated sub-queries 
> to verify the job of combining equivalent works in HIVE-10844.





[jira] [Updated] (HIVE-11079) Fix qfile tests that fail on Windows due to CR/character escape differences

2015-06-23 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-11079:
--
Attachment: HIVE-11079.4.patch

Actually, it looks like a couple of the test fixes are not necessary. I had two 
different build environments, and one of them had git core.autocrlf=true, which 
caused the files to have Windows-style CRLF line endings, which affects some 
tests. In patch v4 I've removed the changes for decimal_udf2.q and 
describe_comment_indent.q.

> Fix qfile tests that fail on Windows due to CR/character escape differences
> ---
>
> Key: HIVE-11079
> URL: https://issues.apache.org/jira/browse/HIVE-11079
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-11079.1.patch, HIVE-11079.2.patch, 
> HIVE-11079.3.patch, HIVE-11079.4.patch
>
>
> A few qfile tests are failing on Windows due to a couple of windows-specific 
> issues:
> - The table comment for the test includes a CR character, which is different 
> on Windows compared to Unix.
> - The partition path in the test includes a space character. Unlike Unix, on 
> Windows space characters in Hive paths are escaped.





[jira] [Commented] (HIVE-11079) Fix qfile tests that fail on Windows due to CR/character escape differences

2015-06-23 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598686#comment-14598686
 ] 

Gunther Hagleitner commented on HIVE-11079:
---

Test failures are unrelated (they failed on the next run as well).

> Fix qfile tests that fail on Windows due to CR/character escape differences
> ---
>
> Key: HIVE-11079
> URL: https://issues.apache.org/jira/browse/HIVE-11079
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-11079.1.patch, HIVE-11079.2.patch, 
> HIVE-11079.3.patch
>
>
> A few qfile tests are failing on Windows due to a couple of windows-specific 
> issues:
> - The table comment for the test includes a CR character, which is different 
> on Windows compared to Unix.
> - The partition path in the test includes a space character. Unlike Unix, on 
> Windows space characters in Hive paths are escaped.





[jira] [Updated] (HIVE-10233) Hive on tez: memory manager for grace hash join

2015-06-23 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-10233:
--
Attachment: HIVE-10233.14.patch

fix indent in .14

> Hive on tez: memory manager for grace hash join
> ---
>
> Key: HIVE-10233
> URL: https://issues.apache.org/jira/browse/HIVE-10233
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: llap, 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Gunther Hagleitner
> Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, 
> HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, 
> HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, 
> HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch, 
> HIVE-10233.12.patch, HIVE-10233.13.patch, HIVE-10233.14.patch
>
>
> We need a memory manager in llap/tez to manage the usage of memory across 
> threads. 





[jira] [Commented] (HIVE-11037) HiveOnTez: make explain user level = true as default

2015-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598668#comment-14598668
 ] 

Hive QA commented on HIVE-11037:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741353/HIVE-11037.08.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4357/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4357/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4357/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4357/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 55c6d41 HIVE-10790 : orc write on viewFS throws exception 
(Xioawei Wang via Ashutosh Chauhan)
+ git clean -f -d
Removing serde/src/java/org/apache/hadoop/hive/ql/io/sarg/ExpressionTree.java
+ git checkout master
Already on 'master'
+ git reset --hard origin/master
HEAD is now at 55c6d41 HIVE-10790 : orc write on viewFS throws exception 
(Xioawei Wang via Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12741353 - PreCommit-HIVE-TRUNK-Build

> HiveOnTez: make explain user level = true as default
> 
>
> Key: HIVE-11037
> URL: https://issues.apache.org/jira/browse/HIVE-11037
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11037.01.patch, HIVE-11037.02.patch, 
> HIVE-11037.03.patch, HIVE-11037.04.patch, HIVE-11037.05.patch, 
> HIVE-11037.06.patch, HIVE-11037.07.patch, HIVE-11037.08.patch
>
>
> In Hive-9780, we introduced a new level of explain for hive on tez. We would 
> like to make it running by default.





[jira] [Commented] (HIVE-10553) Remove hardcoded Parquet references from SearchArgumentImpl

2015-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598665#comment-14598665
 ] 

Hive QA commented on HIVE-10553:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741348/HIVE-10553.patch

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 9016 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete_own_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update_own_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join4
org.apache.hive.hcatalog.pig.TestHCatStorer.testEmptyStore[3]
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4356/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4356/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4356/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12741348 - PreCommit-HIVE-TRUNK-Build

> Remove hardcoded Parquet references from SearchArgumentImpl
> ---
>
> Key: HIVE-10553
> URL: https://issues.apache.org/jira/browse/HIVE-10553
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Gopal V
>Assignee: Owen O'Malley
> Attachments: HIVE-10553.patch, HIVE-10553.patch, HIVE-10553.patch
>
>
> SARGs currently depend on Parquet code, which causes a tight coupling between 
> parquet releases and storage-api versions.
> Move Parquet code out to its own RecordReader, similar to ORC's SargApplier 
> implementation.





[jira] [Commented] (HIVE-11089) Hive Streaming: connection fails when using a proxy user UGI

2015-06-23 Thread Adam Kunicki (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598620#comment-14598620
 ] 

Adam Kunicki commented on HIVE-11089:
-

HIVE-8427 introduces a change that causes incorrect behavior when using a proxy 
user with HiveEndPoint.newConnection().

> Hive Streaming: connection fails when using a proxy user UGI
> 
>
> Key: HIVE-11089
> URL: https://issues.apache.org/jira/browse/HIVE-11089
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
>Reporter: Adam Kunicki
>  Labels: ACID, Streaming
>
> HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the 
> connection is supposed to be a secure connection.
> This however breaks support for Proxy Users as a proxy user UGI will always 
> return false to hasKerberosCredentials().





[jira] [Updated] (HIVE-10233) Hive on tez: memory manager for grace hash join

2015-06-23 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-10233:
--
Attachment: HIVE-10233.13.patch

Fix for unions.

> Hive on tez: memory manager for grace hash join
> ---
>
> Key: HIVE-10233
> URL: https://issues.apache.org/jira/browse/HIVE-10233
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: llap, 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Gunther Hagleitner
> Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, 
> HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, 
> HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, 
> HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch, 
> HIVE-10233.12.patch, HIVE-10233.13.patch
>
>
> We need a memory manager in llap/tez to manage the usage of memory across 
> threads. 





[jira] [Commented] (HIVE-10173) ThreadLocal synchronized initialvalue() is irrelevant in JDK7

2015-06-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598566#comment-14598566
 ] 

Ashutosh Chauhan commented on HIVE-10173:
-

+1 LGTM

> ThreadLocal synchronized initialvalue() is irrelevant in JDK7
> -
>
> Key: HIVE-10173
> URL: https://issues.apache.org/jira/browse/HIVE-10173
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.0
>Reporter: Gopal V
>Assignee: Ferdinand Xu
>Priority: Minor
> Attachments: HIVE-10173.patch
>
>
> The ThreadLocals need not synchronize the calls to initialValue(), since that 
> is effectively called only once per thread in JDK7.
> The anti-pattern lives on because of a very old JDK bug: 
> https://bugs.openjdk.java.net/browse/JDK-6550283
> {code}
> $ git grep --name-only -c "protected.*synchronized.*initialValue"
> common/src/java/org/apache/hadoop/hive/conf/LoopingByteArrayInputStream.java
> contrib/src/java/org/apache/hadoop/hive/contrib/util/typedbytes/TypedBytesInput.java
> contrib/src/java/org/apache/hadoop/hive/contrib/util/typedbytes/TypedBytesOutput.java
> contrib/src/java/org/apache/hadoop/hive/contrib/util/typedbytes/TypedBytesRecordInput.java
> contrib/src/java/org/apache/hadoop/hive/contrib/util/typedbytes/TypedBytesRecordOutput.java
> contrib/src/java/org/apache/hadoop/hive/contrib/util/typedbytes/TypedBytesWritableInput.java
> contrib/src/java/org/apache/hadoop/hive/contrib/util/typedbytes/TypedBytesWritableOutput.java
> metastore/src/java/org/apache/hadoop/hive/metastore/Deadline.java
> metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
> ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
> ql/src/java/org/apache/hadoop/hive/ql/io/IOContext.java
> ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
> ql/src/java/org/apache/hadoop/hive/ql/session/OperationLog.java
> serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java
> serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampWritable.java
> service/src/java/org/apache/hive/service/auth/TSetIpAddressProcessor.java
> service/src/java/org/apache/hive/service/cli/session/SessionManager.java
> shims/common/src/main/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge.java
> {code}
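The before/after shape of this cleanup can be sketched as follows (the class and field names are illustrative, not taken from the Hive patch; SimpleDateFormat stands in for any non-thread-safe per-thread object):

```java
import java.text.SimpleDateFormat;

public class ThreadLocalInit {
    // Anti-pattern: the synchronized modifier is redundant, because the JVM
    // already guarantees initialValue() runs at most once per thread.
    static final ThreadLocal<SimpleDateFormat> OLD =
        new ThreadLocal<SimpleDateFormat>() {
            @Override
            protected synchronized SimpleDateFormat initialValue() {
                return new SimpleDateFormat("yyyy-MM-dd");
            }
        };

    // Equivalent behavior without the redundant lock.
    static final ThreadLocal<SimpleDateFormat> NEW =
        new ThreadLocal<SimpleDateFormat>() {
            @Override
            protected SimpleDateFormat initialValue() {
                return new SimpleDateFormat("yyyy-MM-dd");
            }
        };

    public static void main(String[] args) {
        // Both variants lazily produce one instance per calling thread.
        System.out.println(OLD.get().toPattern());
        System.out.println(NEW.get().toPattern());
    }
}
```

Dropping `synchronized` changes nothing observable; it only removes a lock acquisition that dates back to the JDK bug linked above.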





[jira] [Commented] (HIVE-11079) Fix qfile tests that fail on Windows due to CR/character escape differences

2015-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598558#comment-14598558
 ] 

Hive QA commented on HIVE-11079:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741338/HIVE-11079.2.patch

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 9021 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete_own_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update_own_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28
org.apache.hive.hcatalog.pig.TestHCatStorer.testEmptyStore[3]
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4355/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4355/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4355/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12741338 - PreCommit-HIVE-TRUNK-Build

> Fix qfile tests that fail on Windows due to CR/character escape differences
> ---
>
> Key: HIVE-11079
> URL: https://issues.apache.org/jira/browse/HIVE-11079
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-11079.1.patch, HIVE-11079.2.patch, 
> HIVE-11079.3.patch
>
>
> A few qfile tests are failing on Windows due to a couple of windows-specific 
> issues:
> - The table comment for the test includes a CR character, which is different 
> on Windows compared to Unix.
> - The partition path in the test includes a space character. Unlike Unix, on 
> Windows space characters in Hive paths are escaped.





[jira] [Updated] (HIVE-11079) Fix qfile tests that fail on Windows due to CR/character escape differences

2015-06-23 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-11079:
--
Attachment: HIVE-11079.3.patch

Patch v3 - also updates decimal_udf2.q, which has stats/file-size differences 
on Windows due to CR differences in text files. The fix changes the table's 
storage format from text to ORC, which should give more consistent data sizes 
across platforms. The resulting stats values look different from before, but 
stats are not really important for this test.

> Fix qfile tests that fail on Windows due to CR/character escape differences
> ---
>
> Key: HIVE-11079
> URL: https://issues.apache.org/jira/browse/HIVE-11079
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-11079.1.patch, HIVE-11079.2.patch, 
> HIVE-11079.3.patch
>
>
> A few qfile tests are failing on Windows due to a couple of windows-specific 
> issues:
> - The table comment for the test includes a CR character, which is different 
> on Windows compared to Unix.
> - The partition path in the test includes a space character. Unlike Unix, on 
> Windows space characters in Hive paths are escaped.





[jira] [Commented] (HIVE-11079) Fix qfile tests that fail on Windows due to CR/character escape differences

2015-06-23 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598460#comment-14598460
 ] 

Gunther Hagleitner commented on HIVE-11079:
---

+1

> Fix qfile tests that fail on Windows due to CR/character escape differences
> ---
>
> Key: HIVE-11079
> URL: https://issues.apache.org/jira/browse/HIVE-11079
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-11079.1.patch, HIVE-11079.2.patch
>
>
> A few qfile tests are failing on Windows due to a couple of windows-specific 
> issues:
> - The table comment for the test includes a CR character, which is different 
> on Windows compared to Unix.
> - The partition path in the test includes a space character. Unlike Unix, on 
> Windows space characters in Hive paths are escaped.





[jira] [Updated] (HIVE-10790) orc write on viewFS throws exception

2015-06-23 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-10790:

Summary: orc write on viewFS throws exception  (was: orc file sql excute 
fail )

> orc write on viewFS throws exception
> 
>
> Key: HIVE-10790
> URL: https://issues.apache.org/jira/browse/HIVE-10790
> Project: Hive
>  Issue Type: Bug
>  Components: API
>Affects Versions: 0.13.0, 0.14.0
> Environment: Hadoop 2.5.0-cdh5.3.2 
> hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>  Labels: patch
> Fix For: 0.14.1
>
> Attachments: HIVE-10790.0.patch.txt
>
>
> Inserting from a text table into an ORC table, for example:
> {code:sql}
> insert overwrite table custom.rank_less_orc_none 
> partition(logdate='2015051500') 
> select ur,rf,it,dt from custom.rank_text where logdate='2015051500';
> {code}
> throws an error:
> {noformat}
> Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: 
> getDefaultReplication on empty path is invalid
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593)
> at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750)
> at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767)
> at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
> ... 8 more
> {noformat}
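The trace above shows ViewFileSystem rejecting the path-less getDefaultReplication() call: on a federated view filesystem, the default replication depends on which mount point a path resolves to, so there is no single answer without a path. A self-contained sketch with stand-in types (this mirrors the shape of Hadoop's FileSystem API, where a Path-qualified overload exists; it is not the actual Hive/Hadoop patch):

```java
import java.util.HashMap;
import java.util.Map;

public class ViewFsSketch {
    interface Fs {
        short getDefaultReplication();          // legacy, path-less form
        short getDefaultReplication(String p);  // path-qualified form
    }

    // A "view" filesystem routing paths to mount points with different defaults.
    static class ViewFs implements Fs {
        private final Map<String, Short> mounts = new HashMap<>();

        ViewFs() {
            mounts.put("/user", (short) 3);
            mounts.put("/tmp", (short) 1);
        }

        public short getDefaultReplication() {
            // Without a path there is no single answer across mount points,
            // which is exactly the exception seen in the stack trace.
            throw new IllegalStateException(
                "getDefaultReplication on empty path is invalid");
        }

        public short getDefaultReplication(String p) {
            for (Map.Entry<String, Short> e : mounts.entrySet()) {
                if (p.startsWith(e.getKey())) {
                    return e.getValue();
                }
            }
            throw new IllegalArgumentException("not in a mount point: " + p);
        }
    }

    public static void main(String[] args) {
        Fs fs = new ViewFs();
        // Passing the target path lets the view FS resolve the mount point.
        System.out.println(fs.getDefaultReplication("/user/hive/warehouse/t"));
    }
}
```

The likely remedy, under this reading, is for the ORC writer to query replication with the file's path rather than the path-less legacy call.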





[jira] [Commented] (HIVE-10795) Remove use of PerfLogger from Orc

2015-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598436#comment-14598436
 ] 

Hive QA commented on HIVE-10795:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741327/HIVE-10795.patch

{color:green}SUCCESS:{color} +1 9014 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4354/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4354/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4354/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12741327 - PreCommit-HIVE-TRUNK-Build

> Remove use of PerfLogger from Orc
> -
>
> Key: HIVE-10795
> URL: https://issues.apache.org/jira/browse/HIVE-10795
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-10795.patch, HIVE-10795.patch, HIVE-10795.patch
>
>
> PerfLogger is yet another class with a huge dependency set that Orc doesn't 
> need.





[jira] [Resolved] (HIVE-11058) Make alter_merge* tests (ORC only) stable across different OSes

2015-06-23 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-11058.
--
Resolution: Won't Fix

The stats difference can occur when tests are run in different timezones: ORC 
stores the timezone id in stripe metadata, causing differences in file sizes.

> Make alter_merge* tests (ORC only) stable across different OSes
> ---
>
> Key: HIVE-11058
> URL: https://issues.apache.org/jira/browse/HIVE-11058
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> alter_merge* tests (ORC only) show stats diffs on different OSes.





[jira] [Updated] (HIVE-7288) Enable support for -libjars and -archives in WebHcat for Streaming MapReduce jobs

2015-06-23 Thread Jason Howell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Howell updated HIVE-7288:
---
Tags: hadoop streaming, WebHcat, libjars, archives, MicrosoftSupport  (was: 
hadoop streaming, WebHcat, libjars, archives, MicrosoftCSS)

> Enable support for -libjars and -archives in WebHcat for Streaming MapReduce 
> jobs
> -
>
> Key: HIVE-7288
> URL: https://issues.apache.org/jira/browse/HIVE-7288
> Project: Hive
>  Issue Type: New Feature
>  Components: WebHCat
>Affects Versions: 0.11.0, 0.12.0, 0.13.0, 0.13.1
> Environment: HDInsight deploying HDP 2.1;  Also HDP 2.1 on Windows 
>Reporter: Azim Uddin
>Assignee: shanyu zhao
> Attachments: HIVE-7288.1.patch, hive-7288.patch
>
>
> Issue:
> ==
> Because the WebHCat REST API lacks parameters equivalent to '-libjars' and 
> '-archives', we cannot use external Java JARs or archive files with a 
> streaming MapReduce job when the job is submitted via WebHCat/Templeton.
> I am citing a few use cases here, but there can be plenty of scenarios like 
> this:
> #1 (for -archives): In order to use R with a Hadoop distribution such as 
> HDInsight or HDP on Windows, we could package the R directory in a zip file, 
> rename it to r.jar, and put it into HDFS or WASB. We can then run something 
> like this from the Hadoop command line (ignore the wasb syntax; the same 
> command can be run with hdfs):
> hadoop jar %HADOOP_HOME%\lib\hadoop-streaming.jar -archives 
> wasb:///example/jars/r.jar -files 
> "wasb:///example/apps/mapper.r,wasb:///example/apps/reducer.r" -mapper 
> "./r.jar/bin/Rscript.exe mapper.r" -reducer "./r.jar/bin/Rscript.exe 
> reducer.r" -input /example/data/gutenberg -output /probe/r/wordcount
> This works from the Hadoop command line, but due to the lack of support for 
> the '-archives' parameter in WebHCat, we can't submit the same streaming MR 
> job via WebHCat.
> #2 (for -libjars):
> Consider a scenario where a user would like to use a custom InputFormat with 
> a streaming MapReduce job and has written a custom InputFormat JAR. From the 
> Hadoop command line we can do something like this:
> hadoop jar /path/to/hadoop-streaming.jar \
> -libjars /path/to/custom-formats.jar \
> -D map.output.key.field.separator=, \
> -D mapred.text.key.partitioner.options=-k1,1 \
> -input my_data/ \
> -output my_output/ \
> -outputformat test.example.outputformat.DateFieldMultipleOutputFormat 
> \
> -mapper my_mapper.py \
> -reducer my_reducer.py \
> But due to the lack of support for the '-libjars' parameter for streaming 
> MapReduce jobs in WebHCat, we can't submit the above streaming MR job (which 
> uses a custom Java JAR) via WebHCat.
> Impact:
> 
> We think being able to submit jobs remotely is a vital feature for Hadoop to 
> be enterprise-ready, and WebHCat plays an important role there. Streaming 
> MapReduce jobs are also very important for interoperability, so it would be 
> very useful to keep WebHCat on par with the Hadoop command line in terms of 
> streaming MR job submission capability.
> Ask:
> 
> Enable parameter support for '-libjars' and '-archives' for Hadoop streaming 
> jobs in WebHCat.





[jira] [Updated] (HIVE-7347) Pig Query with defined schema fails when submitted via WebHcat 'execute' parameter

2015-06-23 Thread Jason Howell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Howell updated HIVE-7347:
---
Tags: webhcat, Pig, execute, schema, MicrosoftSupport  (was: webhcat, Pig, 
execute, schema, MicrosoftCSS)

> Pig Query with defined schema fails when submitted via WebHcat 'execute' 
> parameter
> --
>
> Key: HIVE-7347
> URL: https://issues.apache.org/jira/browse/HIVE-7347
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 0.12.0, 0.13.0
> Environment: HDP 2.1 on Windows; HDInsight deploying HDP 2.1  
>Reporter: Azim Uddin
>
> 1. Suppose you are using HDP 2.1 on Windows and you have a tsv file (named 
> rawInput.tsv) like this (just an example; you can use any):
> http://a.com  http://b.com  1
> http://b.com  http://c.com  2
> http://d.com  http://e.com  3
> 2. With the tsv file uploaded to HDFS, run the following Pig job via WebHcat 
> using 'execute' parameter, something like this-
> curl.exe -d execute="rawInput = load '/test/data' using PigStorage as 
> (SourceUrl:chararray, DestinationUrl:chararray, InstanceCount:int); 
> readyInput = limit rawInput 10; store readyInput into '/test/output' using 
> PigStorage;" -d statusdir="/test/status" 
> "http://localhost:50111/templeton/v1/pig?user.name=hadoop"; --user hadoop:any
> The job fails with exit code 255 -
> "[main] org.apache.hive.hcatalog.templeton.tool.LaunchMapper: templeton: job 
> failed with exit code 255"
> From stderr, we see the following -"readyInput was unexpected at this time."
> 3. The same job works via Pig Grunt Shell and if we use the WebHcat 'file' 
> parameter, instead of 'execute' parameter - 
> a. Create a pig script called pig-script.txt with the query below and put it 
> in HDFS at /test/script:
> rawInput = load '/test/data' using PigStorage as (SourceUrl:chararray, 
> DestinationUrl:chararray, InstanceCount:int);
> readyInput = limit rawInput 10;
> store readyInput into '/test/Output' using PigStorage;
> b. Run the job via webHcat:
> curl.exe -d file="/test/script/pig_script.txt" -d statusdir="/test/status" 
> "http://localhost:50111/templeton/v1/pig?user.name=hadoop"; --user hadoop:any
> 4. Also, WebHcat 'execute' option works if we don't define the schema in the 
> Pig query, something like this-
> curl.exe -d execute="rawInput = load '/test/data' using PigStorage; 
> readyInput = limit rawInput 10; store readyInput into '/test/output' using 
> PigStorage;" -d statusdir="/test/status" 
> "http://localhost:50111/templeton/v1/pig?user.name=hadoop"; --user hadoop:any
> The ask is:
> The WebHCat 'execute' option should work for a Pig query with a schema 
> defined - it appears to be a parsing issue with WebHCat.





[jira] [Commented] (HIVE-10790) orc file sql excute fail

2015-06-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598398#comment-14598398
 ] 

Ashutosh Chauhan commented on HIVE-10790:
-

+1

> orc file sql excute fail 
> -
>
> Key: HIVE-10790
> URL: https://issues.apache.org/jira/browse/HIVE-10790
> Project: Hive
>  Issue Type: Bug
>  Components: API
>Affects Versions: 0.13.0, 0.14.0
> Environment: Hadoop 2.5.0-cdh5.3.2 
> hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>  Labels: patch
> Fix For: 0.14.1
>
> Attachments: HIVE-10790.0.patch.txt
>
>
> Inserting from a text table into an ORC table, for example:
> {code:sql}
> insert overwrite table custom.rank_less_orc_none 
> partition(logdate='2015051500') 
> select ur,rf,it,dt from custom.rank_text where logdate='2015051500';
> {code}
> throws an error:
> {noformat}
> Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: 
> getDefaultReplication on empty path is invalid
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593)
> at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750)
> at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767)
> at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
> ... 8 more
> {noformat}





[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results

2015-06-23 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598351#comment-14598351
 ] 

Xuefu Zhang commented on HIVE-10996:


Okay. Thanks for the explanation.

> Aggregation / Projection over Multi-Join Inner Query producing incorrect 
> results
> 
>
> Key: HIVE-10996
> URL: https://issues.apache.org/jira/browse/HIVE-10996
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0
>Reporter: Gautam Kowshik
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
> Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, 
> HIVE-10996.03.patch, HIVE-10996.04.patch, HIVE-10996.05.patch, 
> HIVE-10996.06.patch, HIVE-10996.07.patch, HIVE-10996.08.patch, 
> HIVE-10996.09.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt
>
>
> We see the following problem on 1.1.0 and 1.2.0 but not on 0.13, which seems 
> like a regression.
> The following query (Q1) produces no results:
> {code}
> select s
> from (
>   select last.*, action.st2, action.n
>   from (
> select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
> last_stage_timestamp
> from (select * from purchase_history) purchase
> join (select * from cart_history) mevt
> on purchase.s = mevt.s
> where purchase.timestamp > mevt.timestamp
> group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> {code}
> While this one (Q2) does produce results :
> {code}
> select *
> from (
>   select last.*, action.st2, action.n
>   from (
> select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
> last_stage_timestamp
> from (select * from purchase_history) purchase
> join (select * from cart_history) mevt
> on purchase.s = mevt.s
> where purchase.timestamp > mevt.timestamp
> group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> 1 21  20  Bob 1234
> 1 31  30  Bob 1234
> 3 51  50  Jeff1234
> {code}
> The setup to test this is:
> {code}
> create table purchase_history (s string, product string, price double, 
> timestamp int);
> insert into purchase_history values ('1', 'Belt', 20.00, 21);
> insert into purchase_history values ('1', 'Socks', 3.50, 31);
> insert into purchase_history values ('3', 'Belt', 20.00, 51);
> insert into purchase_history values ('4', 'Shirt', 15.50, 59);
> create table cart_history (s string, cart_id int, timestamp int);
> insert into cart_history values ('1', 1, 10);
> insert into cart_history values ('1', 2, 20);
> insert into cart_history values ('1', 3, 30);
> insert into cart_history values ('1', 4, 40);
> insert into cart_history values ('3', 5, 50);
> insert into cart_history values ('4', 6, 60);
> create table events (s string, st2 string, n int, timestamp int);
> insert into events values ('1', 'Bob', 1234, 20);
> insert into events values ('1', 'Bob', 1234, 30);
> insert into events values ('1', 'Bob', 1234, 25);
> insert into events values ('2', 'Sam', 1234, 30);
> insert into events values ('3', 'Jeff', 1234, 50);
> insert into events values ('4', 'Ted', 1234, 60);
> {code}
> I realize select * and select s are not all that interesting in this context, 
> but what led us to this issue was that select count(distinct s) was not 
> returning results. The above queries are the simplified queries that 
> reproduce the issue.
> I will note that if I convert the inner join to a table and select from that, 
> the issue does not appear.
> Update: Found that turning off hive.optimize.remove.identity.project fixes 
> this issue. This optimization was introduced in 
> https://issues.apache.org/jira/browse/HIVE-8435
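The reported workaround above is a session-level Hive setting, not a permanent fix; in a Hive session it would look like this:

```sql
-- Workaround reported in the issue: disable the identity-project
-- removal optimization introduced by HIVE-8435, then re-run Q1
-- in the same session.
SET hive.optimize.remove.identity.project=false;
```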





[jira] [Updated] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources

2015-06-23 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10895:

Attachment: HIVE-10895.2.patch

> ObjectStore does not close Query objects in some calls, causing a potential 
> leak in some metastore db resources
> ---
>
> Key: HIVE-10895
> URL: https://issues.apache.org/jira/browse/HIVE-10895
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
>Reporter: Takahiko Saito
>Assignee: Aihua Xu
> Attachments: HIVE-10895.1.patch, HIVE-10895.2.patch
>
>
> During testing, we've noticed the Oracle db running out of cursors; this 
> might be related.





[jira] [Updated] (HIVE-7288) Enable support for -libjars and -archives in WebHcat for Streaming MapReduce jobs

2015-06-23 Thread Jason Howell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Howell updated HIVE-7288:
---
Tags: hadoop streaming, WebHcat, libjars, archives, MicrosoftCSS  (was: 
hadoop streaming, WebHcat, libjars, archives, CSS)

> Enable support for -libjars and -archives in WebHcat for Streaming MapReduce 
> jobs
> -
>
> Key: HIVE-7288
> URL: https://issues.apache.org/jira/browse/HIVE-7288
> Project: Hive
>  Issue Type: New Feature
>  Components: WebHCat
>Affects Versions: 0.11.0, 0.12.0, 0.13.0, 0.13.1
> Environment: HDInsight deploying HDP 2.1;  Also HDP 2.1 on Windows 
>Reporter: Azim Uddin
>Assignee: shanyu zhao
> Attachments: HIVE-7288.1.patch, hive-7288.patch
>
>
> Issue:
> ==
> Because the WebHCat REST API lacks parameters equivalent to '-libjars' and 
> '-archives', we cannot use external Java JARs or archive files with a 
> streaming MapReduce job when the job is submitted via WebHCat/Templeton.
> I am citing a few use cases here, but there can be plenty of scenarios like 
> this:
> #1 (for -archives): In order to use R with a Hadoop distribution such as 
> HDInsight or HDP on Windows, we could package the R directory in a zip file, 
> rename it to r.jar, and put it into HDFS or WASB. We can then run something 
> like this from the Hadoop command line (ignore the wasb syntax; the same 
> command can be run with hdfs):
> hadoop jar %HADOOP_HOME%\lib\hadoop-streaming.jar -archives 
> wasb:///example/jars/r.jar -files 
> "wasb:///example/apps/mapper.r,wasb:///example/apps/reducer.r" -mapper 
> "./r.jar/bin/Rscript.exe mapper.r" -reducer "./r.jar/bin/Rscript.exe 
> reducer.r" -input /example/data/gutenberg -output /probe/r/wordcount
> This works from the Hadoop command line, but due to the lack of support for 
> the '-archives' parameter in WebHCat, we can't submit the same streaming MR 
> job via WebHCat.
> #2 (for -libjars):
> Consider a scenario where a user would like to use a custom InputFormat with 
> a streaming MapReduce job and has written a custom InputFormat JAR. From the 
> Hadoop command line we can do something like this:
> hadoop jar /path/to/hadoop-streaming.jar \
> -libjars /path/to/custom-formats.jar \
> -D map.output.key.field.separator=, \
> -D mapred.text.key.partitioner.options=-k1,1 \
> -input my_data/ \
> -output my_output/ \
> -outputformat test.example.outputformat.DateFieldMultipleOutputFormat 
> \
> -mapper my_mapper.py \
> -reducer my_reducer.py \
> But due to the lack of support for the '-libjars' parameter for streaming 
> MapReduce jobs in WebHCat, we can't submit the above streaming MR job (which 
> uses a custom Java JAR) via WebHCat.
> Impact:
> 
> We think being able to submit jobs remotely is a vital feature for Hadoop to 
> be enterprise-ready, and WebHCat plays an important role there. Streaming 
> MapReduce jobs are also very important for interoperability, so it would be 
> very useful to keep WebHCat on par with the Hadoop command line in terms of 
> streaming MR job submission capability.
> Ask:
> 
> Enable parameter support for '-libjars' and '-archives' for Hadoop streaming 
> jobs in WebHCat.





[jira] [Updated] (HIVE-7347) Pig Query with defined schema fails when submitted via WebHcat 'execute' parameter

2015-06-23 Thread Jason Howell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Howell updated HIVE-7347:
---
Tags: webhcat, Pig, execute, schema, MicrosoftCSS  (was: webhcat, Pig, 
execute, schema, CSS)

> Pig Query with defined schema fails when submitted via WebHcat 'execute' 
> parameter
> --
>
> Key: HIVE-7347
> URL: https://issues.apache.org/jira/browse/HIVE-7347
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 0.12.0, 0.13.0
> Environment: HDP 2.1 on Windows; HDInsight deploying HDP 2.1  
>Reporter: Azim Uddin
>
> 1. Consider you are using HDP 2.1 on Windows, and you have a tsv file (named 
> rawInput.tsv) like this (just an example, you can use any) -
> http://a.com    http://b.com    1
> http://b.com    http://c.com    2
> http://d.com    http://e.com    3
> 2. With the tsv file uploaded to HDFS, run the following Pig job via WebHcat 
> using 'execute' parameter, something like this-
> curl.exe -d execute="rawInput = load '/test/data' using PigStorage as 
> (SourceUrl:chararray, DestinationUrl:chararray, InstanceCount:int); 
> readyInput = limit rawInput 10; store readyInput into '/test/output' using 
> PigStorage;" -d statusdir="/test/status" 
> "http://localhost:50111/templeton/v1/pig?user.name=hadoop" --user hadoop:any
> The job fails with exit code 255 -
> "[main] org.apache.hive.hcatalog.templeton.tool.LaunchMapper: templeton: job 
> failed with exit code 255"
> From stderr, we see the following - "readyInput was unexpected at this time."
> 3. The same job works via Pig Grunt Shell and if we use the WebHcat 'file' 
> parameter, instead of 'execute' parameter - 
> a. Create a pig script called pig_script.txt with the query below and put it 
> in HDFS at /test/script:
> rawInput = load '/test/data' using PigStorage as (SourceUrl:chararray, 
> DestinationUrl:chararray, InstanceCount:int);
> readyInput = limit rawInput 10;
> store readyInput into '/test/Output' using PigStorage;
> b. Run the job via webHcat:
> curl.exe -d file="/test/script/pig_script.txt" -d statusdir="/test/status" 
> "http://localhost:50111/templeton/v1/pig?user.name=hadoop" --user hadoop:any
> 4. Also, WebHcat 'execute' option works if we don't define the schema in the 
> Pig query, something like this-
> curl.exe -d execute="rawInput = load '/test/data' using PigStorage; 
> readyInput = limit rawInput 10; store readyInput into '/test/output' using 
> PigStorage;" -d statusdir="/test/status" 
> "http://localhost:50111/templeton/v1/pig?user.name=hadoop" --user hadoop:any
> Ask is-
> The WebHcat 'execute' option should work for a Pig query with a schema 
> defined - this appears to be a parsing issue in WebHcat.
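The failing variant differs from the working ones only in the parenthesized schema, and the stderr text ("... was unexpected at this time.") matches the wording cmd.exe uses on Windows when it hits an unescaped token, which fits the report's suspicion of a parsing problem in WebHcat's command construction. As a sanity check that the parentheses at least survive the HTTP request itself, here is a minimal Python sketch; the script text is taken from the report, and the form encoding mirrors what curl -d sends:

```python
from urllib.parse import urlencode

# The Pig script from the failing 'execute' submission above.
pig = ("rawInput = load '/test/data' using PigStorage as "
       "(SourceUrl:chararray, DestinationUrl:chararray, InstanceCount:int); "
       "readyInput = limit rawInput 10; "
       "store readyInput into '/test/output' using PigStorage;")

# curl -d percent-encodes the form body the same way, so the
# parentheses reach the server intact as %28 / %29.
body = urlencode({"execute": pig, "statusdir": "/test/status"})
print(body[:72])
```

Since the request body is well-formed, the corruption would have to happen where WebHcat rebuilds the command line for Pig on the Windows node.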



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7347) Pig Query with defined schema fails when submitted via WebHcat 'execute' parameter

2015-06-23 Thread Jason Howell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Howell updated HIVE-7347:
---
Tags: webhcat, Pig, execute, schema, CSS  (was: webhcat, Pig, execute, 
schema)

> Pig Query with defined schema fails when submitted via WebHcat 'execute' 
> parameter
> --
>
> Key: HIVE-7347
> URL: https://issues.apache.org/jira/browse/HIVE-7347
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 0.12.0, 0.13.0
> Environment: HDP 2.1 on Windows; HDInsight deploying HDP 2.1  
>Reporter: Azim Uddin
>
> 1. Consider you are using HDP 2.1 on Windows, and you have a tsv file (named 
> rawInput.tsv) like this (just an example, you can use any) -
> http://a.com    http://b.com    1
> http://b.com    http://c.com    2
> http://d.com    http://e.com    3
> 2. With the tsv file uploaded to HDFS, run the following Pig job via WebHcat 
> using 'execute' parameter, something like this-
> curl.exe -d execute="rawInput = load '/test/data' using PigStorage as 
> (SourceUrl:chararray, DestinationUrl:chararray, InstanceCount:int); 
> readyInput = limit rawInput 10; store readyInput into '/test/output' using 
> PigStorage;" -d statusdir="/test/status" 
> "http://localhost:50111/templeton/v1/pig?user.name=hadoop" --user hadoop:any
> The job fails with exit code 255 -
> "[main] org.apache.hive.hcatalog.templeton.tool.LaunchMapper: templeton: job 
> failed with exit code 255"
> From stderr, we see the following - "readyInput was unexpected at this time."
> 3. The same job works via Pig Grunt Shell and if we use the WebHcat 'file' 
> parameter, instead of 'execute' parameter - 
> a. Create a pig script called pig_script.txt with the query below and put it 
> in HDFS at /test/script:
> rawInput = load '/test/data' using PigStorage as (SourceUrl:chararray, 
> DestinationUrl:chararray, InstanceCount:int);
> readyInput = limit rawInput 10;
> store readyInput into '/test/Output' using PigStorage;
> b. Run the job via webHcat:
> curl.exe -d file="/test/script/pig_script.txt" -d statusdir="/test/status" 
> "http://localhost:50111/templeton/v1/pig?user.name=hadoop" --user hadoop:any
> 4. Also, WebHcat 'execute' option works if we don't define the schema in the 
> Pig query, something like this-
> curl.exe -d execute="rawInput = load '/test/data' using PigStorage; 
> readyInput = limit rawInput 10; store readyInput into '/test/output' using 
> PigStorage;" -d statusdir="/test/status" 
> "http://localhost:50111/templeton/v1/pig?user.name=hadoop" --user hadoop:any
> Ask is-
> The WebHcat 'execute' option should work for a Pig query with a schema 
> defined - this appears to be a parsing issue in WebHcat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7288) Enable support for -libjars and -archives in WebHcat for Streaming MapReduce jobs

2015-06-23 Thread Jason Howell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Howell updated HIVE-7288:
---
Tags: hadoop streaming, WebHcat, libjars, archives, CSS  (was: hadoop 
streaming, WebHcat, libjars, archives)

> Enable support for -libjars and -archives in WebHcat for Streaming MapReduce 
> jobs
> -
>
> Key: HIVE-7288
> URL: https://issues.apache.org/jira/browse/HIVE-7288
> Project: Hive
>  Issue Type: New Feature
>  Components: WebHCat
>Affects Versions: 0.11.0, 0.12.0, 0.13.0, 0.13.1
> Environment: HDInsight deploying HDP 2.1;  Also HDP 2.1 on Windows 
>Reporter: Azim Uddin
>Assignee: shanyu zhao
> Attachments: HIVE-7288.1.patch, hive-7288.patch
>
>
> Issue:
> ==
> Due to the lack of parameters equivalent to '-libjars' and 
> '-archives' in the WebHcat REST API, we cannot use external Java JARs or 
> archive files with a Streaming MapReduce job when the job is submitted via 
> WebHcat/templeton. 
> I am citing a few use cases here, but there can be plenty of scenarios like 
> this-
> #1 (for -archives):
> In order to use R with a hadoop distribution like HDInsight or HDP on 
> Windows, we could package the R directory up in a zip file, rename it to 
> r.jar, and put it into HDFS or WASB. We can then do something like this from 
> the hadoop command line (ignore the wasb syntax; the same command can be run 
> with hdfs) - 
> hadoop jar %HADOOP_HOME%\lib\hadoop-streaming.jar -archives 
> wasb:///example/jars/r.jar -files 
> "wasb:///example/apps/mapper.r,wasb:///example/apps/reducer.r" -mapper 
> "./r.jar/bin/Rscript.exe mapper.r" -reducer "./r.jar/bin/Rscript.exe 
> reducer.r" -input /example/data/gutenberg -output /probe/r/wordcount
> This works from the hadoop command line, but due to the lack of support for 
> the '-archives' parameter in WebHcat, we can't submit the same Streaming MR 
> job via WebHcat.
> #2 (for -libjars):
> Consider a scenario where a user wants to use a custom InputFormat with 
> a Streaming MapReduce job and has written a custom InputFormat JAR. From the 
> hadoop command line we can do something like this - 
> hadoop jar /path/to/hadoop-streaming.jar \
> -libjars /path/to/custom-formats.jar \
> -D map.output.key.field.separator=, \
> -D mapred.text.key.partitioner.options=-k1,1 \
> -input my_data/ \
> -output my_output/ \
> -outputformat test.example.outputformat.DateFieldMultipleOutputFormat 
> \
> -mapper my_mapper.py \
> -reducer my_reducer.py
> But due to the lack of support for the '-libjars' parameter for streaming 
> MapReduce jobs in WebHcat, we can't submit the above streaming MR job (which 
> uses a custom Java JAR) via WebHcat.
> Impact:
> 
> We think being able to submit jobs remotely is a vital feature for hadoop to 
> be enterprise-ready, and WebHcat plays an important role there. Streaming 
> MapReduce jobs are also very important for interoperability, so it would be 
> very useful to keep WebHcat on par with the hadoop command line in terms of 
> streaming MR job submission capability.
> Ask:
> 
> Enable parameter support for 'libjars' and 'archives' for Hadoop 
> streaming jobs in WebHcat.
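If the ask were implemented, the streaming submission from use case #1 could be expressed as a plain form POST to the streaming endpoint. A sketch of the request body is below; the 'archives' and 'files' parameter names are assumptions mirroring the hadoop CLI flags (WebHcat's actual streaming parameters may differ), and the values come from the report:

```python
from urllib.parse import urlencode

# Hypothetical form body for a WebHcat streaming submission with the
# proposed 'archives' support. Parameter names 'files' and 'archives'
# are assumptions mirroring the hadoop CLI flags; values are from
# use case #1 above.
params = [
    ("input", "/example/data/gutenberg"),
    ("output", "/probe/r/wordcount"),
    ("mapper", "./r.jar/bin/Rscript.exe mapper.r"),
    ("reducer", "./r.jar/bin/Rscript.exe reducer.r"),
    ("files", "wasb:///example/apps/mapper.r,wasb:///example/apps/reducer.r"),
    ("archives", "wasb:///example/jars/r.jar"),
]
body = urlencode(params)
print(body[:60])
```

The point of the sketch is that nothing new is needed on the wire: the two missing flags map onto ordinary form parameters just like the existing ones.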



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11007) CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's mapInputToDP should depends on the last SEL

2015-06-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598355#comment-14598355
 ] 

Ashutosh Chauhan commented on HIVE-11007:
-

+1

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's 
> mapInputToDP should depends on the last SEL
> -
>
> Key: HIVE-11007
> URL: https://issues.apache.org/jira/browse/HIVE-11007
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11007.01.patch, HIVE-11007.02.patch
>
>
> In dynamic partitioning case, for example, we are going to have 
> TS0-SEL1-SEL2-FS3. The dpCtx's mapInputToDP is populated by SEL1 rather than 
> SEL2, which causes an error in the return path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results

2015-06-23 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598349#comment-14598349
 ] 

Jesus Camacho Rodriguez commented on HIVE-10996:


[~xuefuz], that test has been failing intermittently in recent QA runs, not 
only those related to this patch:

{noformat}
...
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4309/testReport/
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4313/testReport/
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4317/testReport/
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4321/testReport/
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4324/testReport/
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4332/testReport/
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4336/testReport/
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4337/testReport/
{noformat}

> Aggregation / Projection over Multi-Join Inner Query producing incorrect 
> results
> 
>
> Key: HIVE-10996
> URL: https://issues.apache.org/jira/browse/HIVE-10996
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0
>Reporter: Gautam Kowshik
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
> Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, 
> HIVE-10996.03.patch, HIVE-10996.04.patch, HIVE-10996.05.patch, 
> HIVE-10996.06.patch, HIVE-10996.07.patch, HIVE-10996.08.patch, 
> HIVE-10996.09.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt
>
>
> We see the following problem on 1.1.0 and 1.2.0 but not on 0.13, which seems 
> like a regression.
> The following query (Q1) produces no results:
> {code}
> select s
> from (
>   select last.*, action.st2, action.n
>   from (
> select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
> last_stage_timestamp
> from (select * from purchase_history) purchase
> join (select * from cart_history) mevt
> on purchase.s = mevt.s
> where purchase.timestamp > mevt.timestamp
> group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> {code}
> While this one (Q2) does produce results :
> {code}
> select *
> from (
>   select last.*, action.st2, action.n
>   from (
> select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
> last_stage_timestamp
> from (select * from purchase_history) purchase
> join (select * from cart_history) mevt
> on purchase.s = mevt.s
> where purchase.timestamp > mevt.timestamp
> group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> 1 21  20  Bob 1234
> 1 31  30  Bob 1234
> 3 51  50  Jeff 1234
> {code}
> The setup to test this is:
> {code}
> create table purchase_history (s string, product string, price double, 
> timestamp int);
> insert into purchase_history values ('1', 'Belt', 20.00, 21);
> insert into purchase_history values ('1', 'Socks', 3.50, 31);
> insert into purchase_history values ('3', 'Belt', 20.00, 51);
> insert into purchase_history values ('4', 'Shirt', 15.50, 59);
> create table cart_history (s string, cart_id int, timestamp int);
> insert into cart_history values ('1', 1, 10);
> insert into cart_history values ('1', 2, 20);
> insert into cart_history values ('1', 3, 30);
> insert into cart_history values ('1', 4, 40);
> insert into cart_history values ('3', 5, 50);
> insert into cart_history values ('4', 6, 60);
> create table events (s string, st2 string, n int, timestamp int);
> insert into events values ('1', 'Bob', 1234, 20);
> insert into events values ('1', 'Bob', 1234, 30);
> insert into events values ('1', 'Bob', 1234, 25);
> insert into events values ('2', 'Sam', 1234, 30);
> insert into events values ('3', 'Jeff', 1234, 50);
> insert into events values ('4', 'Ted', 1234, 60);
> {code}
> I realize select * and select s are not all that interesting in this context, 
> but what led us to this issue was that select count(distinct s) was not 
> returning results. The above queries are the simplified queries that produce 
> the issue. I will note that if I convert the inner join to a table and select 
> from that, the issue does not appear.
> Update: Found that turning off  hive.optimize.remove.identity.project 

[jira] [Updated] (HIVE-10438) Architecture for ResultSet Compression via external plugin

2015-06-23 Thread Rohit Dholakia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohit Dholakia updated HIVE-10438:
--
Description: 
This JIRA proposes an architecture for enabling ResultSet compression which 
uses an external plugin. 

The patch has three aspects to it: 
0. An architecture for enabling ResultSet compression with external plugins
1. An example plugin to demonstrate end-to-end functionality 
2. A container to allow everyone to write and test ResultSet compressors with a 
query submitter (https://github.com/xiaom/hs2driver) 

Also attaching a design document explaining the changes, an experimental 
results document, and a PDF explaining how to set up the Docker container to 
observe end-to-end functionality of ResultSet compression. 

Review Board link: https://reviews.apache.org/r/35792/

  was:
This JIRA proposes an architecture for enabling ResultSet compression which 
uses an external plugin. 

The patch has three aspects to it: 
0. An architecture for enabling ResultSet compression with external plugins
1. An example plugin to demonstrate end-to-end functionality 
2. A container to allow everyone to write and test ResultSet compressors with a 
query submitter (https://github.com/xiaom/hs2driver) 

Also attaching a design document explaining the changes, experimental results 
document, and a pdf explaining how to setup the docker container to observe 
end-to-end functionality of ResultSet compression. 




> Architecture for  ResultSet Compression via external plugin
> ---
>
> Key: HIVE-10438
> URL: https://issues.apache.org/jira/browse/HIVE-10438
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive, Thrift API
>Affects Versions: 1.2.0
>Reporter: Rohit Dholakia
>Assignee: Rohit Dholakia
>  Labels: patch
> Attachments: HIVE-10438.patch, Proposal-rscompressor.pdf, 
> Results_Snappy_protobuf_TBinary_TCompact.pdf, hs2driver-master.zip, 
> hs2resultSetcompressor.zip, readme.txt
>
>
> This JIRA proposes an architecture for enabling ResultSet compression which 
> uses an external plugin. 
> The patch has three aspects to it: 
> 0. An architecture for enabling ResultSet compression with external plugins
> 1. An example plugin to demonstrate end-to-end functionality 
> 2. A container to allow everyone to write and test ResultSet compressors with 
> a query submitter (https://github.com/xiaom/hs2driver) 
> Also attaching a design document explaining the changes, an experimental 
> results document, and a PDF explaining how to set up the Docker container to 
> observe end-to-end functionality of ResultSet compression. 
> Review Board link: https://reviews.apache.org/r/35792/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results

2015-06-23 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598338#comment-14598338
 ] 

Xuefu Zhang commented on HIVE-10996:


{quote}
fail is unrelated. It is ready to go in. 
{quote}
[~jcamachorodriguez], could you elaborate on why you think the test failure 
isn't related? I can clearly see a result diff generated by your patch for the 
test org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28:
{code}
173c173
< POSTHOOK: Lineage: dest_j1.key EXPRESSION [(src1)x.FieldSchema(name:key, 
type:string, comment:default), ]
---
> POSTHOOK: Lineage: dest_j1.key SIMPLE [(src1)x.FieldSchema(name:key, 
> type:string, comment:default), ]
{code}

> Aggregation / Projection over Multi-Join Inner Query producing incorrect 
> results
> 
>
> Key: HIVE-10996
> URL: https://issues.apache.org/jira/browse/HIVE-10996
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0
>Reporter: Gautam Kowshik
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
> Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, 
> HIVE-10996.03.patch, HIVE-10996.04.patch, HIVE-10996.05.patch, 
> HIVE-10996.06.patch, HIVE-10996.07.patch, HIVE-10996.08.patch, 
> HIVE-10996.09.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt
>
>
> We see the following problem on 1.1.0 and 1.2.0 but not on 0.13, which seems 
> like a regression.
> The following query (Q1) produces no results:
> {code}
> select s
> from (
>   select last.*, action.st2, action.n
>   from (
> select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
> last_stage_timestamp
> from (select * from purchase_history) purchase
> join (select * from cart_history) mevt
> on purchase.s = mevt.s
> where purchase.timestamp > mevt.timestamp
> group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> {code}
> While this one (Q2) does produce results :
> {code}
> select *
> from (
>   select last.*, action.st2, action.n
>   from (
> select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
> last_stage_timestamp
> from (select * from purchase_history) purchase
> join (select * from cart_history) mevt
> on purchase.s = mevt.s
> where purchase.timestamp > mevt.timestamp
> group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> 1 21  20  Bob 1234
> 1 31  30  Bob 1234
> 3 51  50  Jeff 1234
> {code}
> The setup to test this is:
> {code}
> create table purchase_history (s string, product string, price double, 
> timestamp int);
> insert into purchase_history values ('1', 'Belt', 20.00, 21);
> insert into purchase_history values ('1', 'Socks', 3.50, 31);
> insert into purchase_history values ('3', 'Belt', 20.00, 51);
> insert into purchase_history values ('4', 'Shirt', 15.50, 59);
> create table cart_history (s string, cart_id int, timestamp int);
> insert into cart_history values ('1', 1, 10);
> insert into cart_history values ('1', 2, 20);
> insert into cart_history values ('1', 3, 30);
> insert into cart_history values ('1', 4, 40);
> insert into cart_history values ('3', 5, 50);
> insert into cart_history values ('4', 6, 60);
> create table events (s string, st2 string, n int, timestamp int);
> insert into events values ('1', 'Bob', 1234, 20);
> insert into events values ('1', 'Bob', 1234, 30);
> insert into events values ('1', 'Bob', 1234, 25);
> insert into events values ('2', 'Sam', 1234, 30);
> insert into events values ('3', 'Jeff', 1234, 50);
> insert into events values ('4', 'Ted', 1234, 60);
> {code}
> I realize select * and select s are not all that interesting in this context, 
> but what led us to this issue was that select count(distinct s) was not 
> returning results. The above queries are the simplified queries that produce 
> the issue. I will note that if I convert the inner join to a table and select 
> from that, the issue does not appear.
> Update: Found that turning off hive.optimize.remove.identity.project fixes 
> this issue. This optimization was introduced in 
> https://issues.apache.org/jira/browse/HIVE-8435
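For reference, the expected (non-empty) Q1 result can be cross-checked against the sample data with any SQL engine. A minimal sketch using Python's sqlite3, with the data and join logic taken from the setup above (identifiers such as `timestamp` and `last` are renamed, since they are awkward outside Hive):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
create table purchase_history (s text, product text, price real, ts int);
insert into purchase_history values
  ('1','Belt',20.00,21), ('1','Socks',3.50,31),
  ('3','Belt',20.00,51), ('4','Shirt',15.50,59);
create table cart_history (s text, cart_id int, ts int);
insert into cart_history values
  ('1',1,10), ('1',2,20), ('1',3,30), ('1',4,40), ('3',5,50), ('4',6,60);
create table events (s text, st2 text, n int, ts int);
insert into events values
  ('1','Bob',1234,20), ('1','Bob',1234,30), ('1','Bob',1234,25),
  ('2','Sam',1234,30), ('3','Jeff',1234,50), ('4','Ted',1234,60);
""")

# Q1 with renamed identifiers: for each purchase, find the most recent
# earlier cart event, then match it to an events row.
rows = cur.execute("""
select lst.s
from (
  select p.s, p.ts, max(m.ts) as last_stage_ts
  from purchase_history p join cart_history m on p.s = m.s
  where p.ts > m.ts
  group by p.s, p.ts
) lst
join events a on lst.s = a.s and lst.last_stage_ts = a.ts
""").fetchall()
print(sorted(rows))  # [('1',), ('1',), ('3',)] -- one s per row of Q2's output
```

This confirms Q1 should return three rows, matching the s column of the Q2 output shown in the report, so the empty result on 1.1.0/1.2.0 is indeed wrong.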



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11030) Enhance storage layer to create one delta file per write

2015-06-23 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598314#comment-14598314
 ] 

Eugene Koifman commented on HIVE-11030:
---

The test failure is not related.  The same failure appears in other runs w/o 
this patch.

[~alangates] could you review please?

> Enhance storage layer to create one delta file per write
> 
>
> Key: HIVE-11030
> URL: https://issues.apache.org/jira/browse/HIVE-11030
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 1.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-11030.2.patch, HIVE-11030.3.patch
>
>
> Currently each txn using ACID insert/update/delete will generate a delta 
> directory like delta_100_101.  In order to support multi-statement 
> transactions we must generate one delta per operation within the transaction 
> so the deltas would be named like delta_100_101_0001, etc.
> Support for MERGE (HIVE-10924) would need the same.
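The naming scheme being described can be sketched as a small helper. The 4-digit zero padding of the statement id is inferred from the delta_100_101_0001 example and is an assumption (the actual on-disk format may pad the transaction ids as well):

```python
def delta_dir(min_txn, max_txn, stmt_id=None):
    """Delta directory name for an ACID write (illustrative sketch).

    Today: one delta per transaction  -> delta_100_101
    Proposed: one delta per statement -> delta_100_101_0001
    """
    name = "delta_{}_{}".format(min_txn, max_txn)
    if stmt_id is not None:
        name += "_{:04d}".format(stmt_id)  # padding width is an assumption
    return name

print(delta_dir(100, 101))     # delta_100_101
print(delta_dir(100, 101, 1))  # delta_100_101_0001
```

Keeping the statement id as an optional suffix means readers can treat the old two-part names and the new three-part names uniformly.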



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11037) HiveOnTez: make explain user level = true as default

2015-06-23 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598303#comment-14598303
 ] 

Laljo John Pullokkaran commented on HIVE-11037:
---

Committed to master.

> HiveOnTez: make explain user level = true as default
> 
>
> Key: HIVE-11037
> URL: https://issues.apache.org/jira/browse/HIVE-11037
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11037.01.patch, HIVE-11037.02.patch, 
> HIVE-11037.03.patch, HIVE-11037.04.patch, HIVE-11037.05.patch, 
> HIVE-11037.06.patch, HIVE-11037.07.patch, HIVE-11037.08.patch
>
>
> In HIVE-9780, we introduced a new level of explain for Hive on Tez. We would 
> like to enable it by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11030) Enhance storage layer to create one delta file per write

2015-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598284#comment-14598284
 ] 

Hive QA commented on HIVE-11030:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741324/HIVE-11030.3.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9050 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4353/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4353/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4353/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12741324 - PreCommit-HIVE-TRUNK-Build

> Enhance storage layer to create one delta file per write
> 
>
> Key: HIVE-11030
> URL: https://issues.apache.org/jira/browse/HIVE-11030
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 1.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-11030.2.patch, HIVE-11030.3.patch
>
>
> Currently each txn using ACID insert/update/delete will generate a delta 
> directory like delta_100_101.  In order to support multi-statement 
> transactions we must generate one delta per operation within the transaction 
> so the deltas would be named like delta_100_101_0001, etc.
> Support for MERGE (HIVE-10924) would need the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10233) Hive on tez: memory manager for grace hash join

2015-06-23 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-10233:
--
Attachment: HIVE-10233.12.patch

> Hive on tez: memory manager for grace hash join
> ---
>
> Key: HIVE-10233
> URL: https://issues.apache.org/jira/browse/HIVE-10233
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: llap, 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Gunther Hagleitner
> Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, 
> HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, 
> HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, 
> HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch, 
> HIVE-10233.12.patch
>
>
> We need a memory manager in llap/tez to manage the usage of memory across 
> threads. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11037) HiveOnTez: make explain user level = true as default

2015-06-23 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11037:
---
Attachment: HIVE-11037.08.patch

Rebased to master (no differences except line offsets) per [~jpullokkaran]'s 
request.

> HiveOnTez: make explain user level = true as default
> 
>
> Key: HIVE-11037
> URL: https://issues.apache.org/jira/browse/HIVE-11037
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11037.01.patch, HIVE-11037.02.patch, 
> HIVE-11037.03.patch, HIVE-11037.04.patch, HIVE-11037.05.patch, 
> HIVE-11037.06.patch, HIVE-11037.07.patch, HIVE-11037.08.patch
>
>
> In HIVE-9780, we introduced a new level of explain for Hive on Tez. We would 
> like to enable it by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11083) Make test cbo_windowing robust

2015-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598148#comment-14598148
 ] 

Hive QA commented on HIVE-11083:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741305/HIVE-11083.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9013 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4352/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4352/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4352/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12741305 - PreCommit-HIVE-TRUNK-Build

> Make test cbo_windowing robust
> --
>
> Key: HIVE-11083
> URL: https://issues.apache.org/jira/browse/HIVE-11083
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 1.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-11083.patch
>
>
> Make result set deterministic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598142#comment-14598142
 ] 

Hive QA commented on HIVE-10999:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741320/HIVE-10999.3-spark.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7997 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/903/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/903/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-903/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12741320 - PreCommit-HIVE-SPARK-Build

> Upgrade Spark dependency to 1.4 [Spark Branch]
> --
>
> Key: HIVE-10999
> URL: https://issues.apache.org/jira/browse/HIVE-10999
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
> Attachments: HIVE-10999.1-spark.patch, HIVE-10999.2-spark.patch, 
> HIVE-10999.3-spark.patch, HIVE-10999.3-spark.patch
>
>
> Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 
> 1.4.0.
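Assuming the Spark branch factors the Spark version out as a Maven property (the property name below is an assumption, not confirmed by this thread), the bump is a one-line pom change:

```xml
<!-- pom.xml: bump the Spark dependency used by the Spark branch.
     Property name is illustrative. -->
<properties>
  <spark.version>1.4.0</spark.version>
</properties>
```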





[jira] [Updated] (HIVE-10553) Remove hardcoded Parquet references from SearchArgumentImpl

2015-06-23 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-10553:
-
Attachment: HIVE-10553.patch

My patch had gone a little stale, so I updated it.

I also manually re-ran the test case that failed in Jenkins, and it passed.

> Remove hardcoded Parquet references from SearchArgumentImpl
> ---
>
> Key: HIVE-10553
> URL: https://issues.apache.org/jira/browse/HIVE-10553
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Gopal V
>Assignee: Owen O'Malley
> Attachments: HIVE-10553.patch, HIVE-10553.patch, HIVE-10553.patch
>
>
> SARGs currently depend on Parquet code, which causes a tight coupling between 
> Parquet releases and storage-api versions.
> Move Parquet code out to its own RecordReader, similar to ORC's SargApplier 
> implementation.





[jira] [Updated] (HIVE-11084) Issue in Parquet Hive Table

2015-06-23 Thread Chanchal Kumar Ghosh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chanchal Kumar Ghosh updated HIVE-11084:

Description: 
{code}
hive> CREATE TABLE intable_p (
>   sr_no int,
>   name string,
>   emp_id int
> ) PARTITIONED BY (
>   a string,
>   b string,
>   c string
> ) ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\t'
>   LINES TERMINATED BY '\n'
> STORED AS PARQUET;

hive> insert overwrite table intable_p partition (a='a', b='b', c='c') select * 
from intable;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator

MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 2.59 sec   HDFS Read: 247 HDFS Write: 
410 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 590 msec
OK
Time taken: 30.382 seconds
hive> show create table intable_p;
OK
CREATE  TABLE `intable_p`(
  `sr_no` int,
  `name` string,
  `emp_id` int)
PARTITIONED BY (
  `a` string,
  `b` string,
  `c` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://nameservice1/hive/db/intable_p'
TBLPROPERTIES (
  'transient_lastDdlTime'='1435080569')
Time taken: 0.212 seconds, Fetched: 19 row(s)
hive> CREATE  TABLE `intable_p2`(
>   `sr_no` int,
>   `name` string,
>   `emp_id` int)
> PARTITIONED BY (
>   `a` string,
>   `b` string,
>   `c` string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\t'
>   LINES TERMINATED BY '\n'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
OK
Time taken: 0.179 seconds
hive> insert overwrite table intable_p2 partition (a='a', b='b', c='c') select 
* from intable;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
...
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-06-23 17:34:40,471 Stage-1 map = 0%,  reduce = 0%
2015-06-23 17:35:10,753 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_1433246369760_7947 with errors
Error during job, obtaining debugging information...
Examining task ID: task_ (and more) from job job_

Task with the most failures(4):
-
Task ID:
  task_

URL:
  
-
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row {"sr_no":1,"name":"ABC","emp_id":1001}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row {"sr_no":1,"name":"ABC","emp_id":1001}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:549)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
... 8 more
Caused by: {color:red}java.lang.ClassCastException: org.apache.hadoop.io.Text 
cannot be cast to org.apache.hadoop.io.ArrayWritable{color}
at 
org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:105)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:628)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
... 9 more


FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
hive>
{code}

What is the issue with my second table?
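A likely cause, though unconfirmed here, is that intable_p2 declares only the Parquet input/output formats, so the default text SerDe is used and hands the writer a Text object instead of the ArrayWritable Parquet expects. A sketch of a DDL that also pins the SerDe explicitly:

```sql
CREATE TABLE `intable_p2` (
  `sr_no` int,
  `name` string,
  `emp_id` int)
PARTITIONED BY (
  `a` string,
  `b` string,
  `c` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
```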

  was:
{quote}
hiv

[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results

2015-06-23 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598131#comment-14598131
 ] 

Laljo John Pullokkaran commented on HIVE-10996:
---

[~jcamachorodriguez] The patch needs to be modified for branch-1 and possibly 
for branch-1.0.

> Aggregation / Projection over Multi-Join Inner Query producing incorrect 
> results
> 
>
> Key: HIVE-10996
> URL: https://issues.apache.org/jira/browse/HIVE-10996
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0
>Reporter: Gautam Kowshik
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
> Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, 
> HIVE-10996.03.patch, HIVE-10996.04.patch, HIVE-10996.05.patch, 
> HIVE-10996.06.patch, HIVE-10996.07.patch, HIVE-10996.08.patch, 
> HIVE-10996.09.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt
>
>
> We see the following problem on 1.1.0 and 1.2.0 but not 0.13 which seems like 
> a regression.
> The following query (Q1) produces no results:
> {code}
> select s
> from (
>   select last.*, action.st2, action.n
>   from (
> select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
> last_stage_timestamp
> from (select * from purchase_history) purchase
> join (select * from cart_history) mevt
> on purchase.s = mevt.s
> where purchase.timestamp > mevt.timestamp
> group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> {code}
> While this one (Q2) does produce results :
> {code}
> select *
> from (
>   select last.*, action.st2, action.n
>   from (
> select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
> last_stage_timestamp
> from (select * from purchase_history) purchase
> join (select * from cart_history) mevt
> on purchase.s = mevt.s
> where purchase.timestamp > mevt.timestamp
> group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> 1    21    20    Bob     1234
> 1    31    30    Bob     1234
> 3    51    50    Jeff    1234
> {code}
> The setup to test this is:
> {code}
> create table purchase_history (s string, product string, price double, 
> timestamp int);
> insert into purchase_history values ('1', 'Belt', 20.00, 21);
> insert into purchase_history values ('1', 'Socks', 3.50, 31);
> insert into purchase_history values ('3', 'Belt', 20.00, 51);
> insert into purchase_history values ('4', 'Shirt', 15.50, 59);
> create table cart_history (s string, cart_id int, timestamp int);
> insert into cart_history values ('1', 1, 10);
> insert into cart_history values ('1', 2, 20);
> insert into cart_history values ('1', 3, 30);
> insert into cart_history values ('1', 4, 40);
> insert into cart_history values ('3', 5, 50);
> insert into cart_history values ('4', 6, 60);
> create table events (s string, st2 string, n int, timestamp int);
> insert into events values ('1', 'Bob', 1234, 20);
> insert into events values ('1', 'Bob', 1234, 30);
> insert into events values ('1', 'Bob', 1234, 25);
> insert into events values ('2', 'Sam', 1234, 30);
> insert into events values ('3', 'Jeff', 1234, 50);
> insert into events values ('4', 'Ted', 1234, 60);
> {code}
> I realize select * and select s are not all that interesting in this context, 
> but what led us to this issue was that select count(distinct s) was not 
> returning results. The above queries are the simplified queries that produce the issue. 
> I will note that if I convert the inner join to a table and select from that 
> the issue does not appear.
> Update: Found that turning off  hive.optimize.remove.identity.project fixes 
> this issue. This optimization was introduced in 
> https://issues.apache.org/jira/browse/HIVE-8435
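The workaround mentioned in the update can be applied per session before re-running Q1 (assuming a session-level override is acceptable in your setup):

```sql
-- Disable the identity-project removal introduced in HIVE-8435:
set hive.optimize.remove.identity.project=false;
```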





[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results

2015-06-23 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598128#comment-14598128
 ] 

Laljo John Pullokkaran commented on HIVE-10996:
---

Committed to master and the 1.2 branch.


> Aggregation / Projection over Multi-Join Inner Query producing incorrect 
> results
> 
>
> Key: HIVE-10996
> URL: https://issues.apache.org/jira/browse/HIVE-10996
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0
>Reporter: Gautam Kowshik
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
> Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, 
> HIVE-10996.03.patch, HIVE-10996.04.patch, HIVE-10996.05.patch, 
> HIVE-10996.06.patch, HIVE-10996.07.patch, HIVE-10996.08.patch, 
> HIVE-10996.09.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt
>
>
> We see the following problem on 1.1.0 and 1.2.0 but not 0.13 which seems like 
> a regression.
> The following query (Q1) produces no results:
> {code}
> select s
> from (
>   select last.*, action.st2, action.n
>   from (
> select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
> last_stage_timestamp
> from (select * from purchase_history) purchase
> join (select * from cart_history) mevt
> on purchase.s = mevt.s
> where purchase.timestamp > mevt.timestamp
> group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> {code}
> While this one (Q2) does produce results :
> {code}
> select *
> from (
>   select last.*, action.st2, action.n
>   from (
> select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
> last_stage_timestamp
> from (select * from purchase_history) purchase
> join (select * from cart_history) mevt
> on purchase.s = mevt.s
> where purchase.timestamp > mevt.timestamp
> group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> 1    21    20    Bob     1234
> 1    31    30    Bob     1234
> 3    51    50    Jeff    1234
> {code}
> The setup to test this is:
> {code}
> create table purchase_history (s string, product string, price double, 
> timestamp int);
> insert into purchase_history values ('1', 'Belt', 20.00, 21);
> insert into purchase_history values ('1', 'Socks', 3.50, 31);
> insert into purchase_history values ('3', 'Belt', 20.00, 51);
> insert into purchase_history values ('4', 'Shirt', 15.50, 59);
> create table cart_history (s string, cart_id int, timestamp int);
> insert into cart_history values ('1', 1, 10);
> insert into cart_history values ('1', 2, 20);
> insert into cart_history values ('1', 3, 30);
> insert into cart_history values ('1', 4, 40);
> insert into cart_history values ('3', 5, 50);
> insert into cart_history values ('4', 6, 60);
> create table events (s string, st2 string, n int, timestamp int);
> insert into events values ('1', 'Bob', 1234, 20);
> insert into events values ('1', 'Bob', 1234, 30);
> insert into events values ('1', 'Bob', 1234, 25);
> insert into events values ('2', 'Sam', 1234, 30);
> insert into events values ('3', 'Jeff', 1234, 50);
> insert into events values ('4', 'Ted', 1234, 60);
> {code}
> I realize select * and select s are not all that interesting in this context, 
> but what led us to this issue was that select count(distinct s) was not 
> returning results. The above queries are the simplified queries that produce the issue. 
> I will note that if I convert the inner join to a table and select from that 
> the issue does not appear.
> Update: Found that turning off  hive.optimize.remove.identity.project fixes 
> this issue. This optimization was introduced in 
> https://issues.apache.org/jira/browse/HIVE-8435





[jira] [Commented] (HIVE-10533) CBO (Calcite Return Path): Join to MultiJoin support for outer joins

2015-06-23 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598110#comment-14598110
 ] 

Jesus Camacho Rodriguez commented on HIVE-10533:


Thanks [~ashutoshc], I need to check why the changes in the last version of the 
patch introduced these regressions. I'll take a look and submit a new patch.

> CBO (Calcite Return Path): Join to MultiJoin support for outer joins
> 
>
> Key: HIVE-10533
> URL: https://issues.apache.org/jira/browse/HIVE-10533
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-10533.01.patch, HIVE-10533.02.patch, 
> HIVE-10533.02.patch, HIVE-10533.03.patch, HIVE-10533.04.patch, 
> HIVE-10533.patch
>
>
> CBO return path: auto_join7.q can be used to reproduce the problem.





[jira] [Commented] (HIVE-10233) Hive on tez: memory manager for grace hash join

2015-06-23 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598105#comment-14598105
 ] 

Wei Zheng commented on HIVE-10233:
--

[~hagleitn] [~vikram.dixit]
1. Same question as [~mmokhtar] raised: why do we allocate less than 1% of 
memory to the map join?
2. What is the use of pctx.getConf() at the beginning of 
MemoryDecider.resolve()?
3. In these three method calls, the work parameter is not used, so it can be 
removed. Also consider removing the work parameter from evaluateWork(TezWork 
work, BaseWork w):
evaluateMapWork(work, (MapWork) w);
evaluateReduceWork(work, (ReduceWork) w);
evaluateMergeWork(work, (MergeJoinWork) w);
4. In evaluateOperators(BaseWork w, PhysicalContext pctx), pctx is not used.
5. The indentation of evaluateOperators needs to be adjusted.

> Hive on tez: memory manager for grace hash join
> ---
>
> Key: HIVE-10233
> URL: https://issues.apache.org/jira/browse/HIVE-10233
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: llap, 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Gunther Hagleitner
> Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, 
> HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, 
> HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, 
> HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch
>
>
> We need a memory manager in llap/tez to manage the usage of memory across 
> threads. 





[jira] [Updated] (HIVE-11079) Fix qfile tests that fail on Windows due to CR/character escape differences

2015-06-23 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-11079:
--
Attachment: HIVE-11079.2.patch

Update MiniTez test for vector_partitioned_date_time.q

> Fix qfile tests that fail on Windows due to CR/character escape differences
> ---
>
> Key: HIVE-11079
> URL: https://issues.apache.org/jira/browse/HIVE-11079
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-11079.1.patch, HIVE-11079.2.patch
>
>
> A few qfile tests are failing on Windows due to a couple of Windows-specific 
> issues:
> - The table comment for the test includes a CR character, which is different 
> on Windows compared to Unix.
> - The partition path in the test includes a space character. Unlike Unix, on 
> Windows space characters in Hive paths are escaped.





[jira] [Updated] (HIVE-11084) Issue in Parquet Hive Table

2015-06-23 Thread Chanchal Kumar Ghosh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chanchal Kumar Ghosh updated HIVE-11084:

Summary: Issue in Parquet Hive Table  (was: Issue in Parquet Hove Table)

> Issue in Parquet Hive Table
> ---
>
> Key: HIVE-11084
> URL: https://issues.apache.org/jira/browse/HIVE-11084
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.9.0
> Environment: GNU/Linux
>Reporter: Chanchal Kumar Ghosh
>
> {quote}
> hive> CREATE TABLE intable_p (
> >   sr_no int,
> >   name string,
> >   emp_id int
> > ) PARTITIONED BY (
> >   a string,
> >   b string,
> >   c string
> > ) ROW FORMAT DELIMITED
> >   FIELDS TERMINATED BY '\t'
> >   LINES TERMINATED BY '\n'
> > STORED AS PARQUET;
> hive> insert overwrite table intable_p partition (a='a', b='b', c='c') select 
> * from intable;
> Total jobs = 3
> Launching Job 1 out of 3
> Number of reduce tasks is set to 0 since there's no reduce operator
> 
> MapReduce Jobs Launched:
> Stage-Stage-1: Map: 1   Cumulative CPU: 2.59 sec   HDFS Read: 247 HDFS Write: 
> 410 SUCCESS
> Total MapReduce CPU Time Spent: 2 seconds 590 msec
> OK
> Time taken: 30.382 seconds
> hive> show create table intable_p;
> OK
> CREATE  TABLE `intable_p`(
>   `sr_no` int,
>   `name` string,
>   `emp_id` int)
> PARTITIONED BY (
>   `a` string,
>   `b` string,
>   `c` string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\t'
>   LINES TERMINATED BY '\n'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION
>   'hdfs://nameservice1/hive/db/intable_p'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1435080569')
> Time taken: 0.212 seconds, Fetched: 19 row(s)
> hive> CREATE  TABLE `intable_p2`(
> >   `sr_no` int,
> >   `name` string,
> >   `emp_id` int)
> > PARTITIONED BY (
> >   `a` string,
> >   `b` string,
> >   `c` string)
> > ROW FORMAT DELIMITED
> >   FIELDS TERMINATED BY '\t'
> >   LINES TERMINATED BY '\n'
> > STORED AS INPUTFORMAT
> >   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> > OUTPUTFORMAT
> >   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
> OK
> Time taken: 0.179 seconds
> hive> insert overwrite table intable_p2 partition (a='a', b='b', c='c') 
> select * from intable;
> Total jobs = 3
> Launching Job 1 out of 3
> Number of reduce tasks is set to 0 since there's no reduce operator
> ...
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
> 2015-06-23 17:34:40,471 Stage-1 map = 0%,  reduce = 0%
> 2015-06-23 17:35:10,753 Stage-1 map = 100%,  reduce = 0%
> Ended Job = job_1433246369760_7947 with errors
> Error during job, obtaining debugging information...
> Examining task ID: task_ (and more) from job job_
> Task with the most failures(4):
> -
> Task ID:
>   task_
> URL:
>   
> -
> Diagnostic Messages for this Task:
> Error: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row {"sr_no":1,"name":"ABC","emp_id":1001}
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row {"sr_no":1,"name":"ABC","emp_id":1001}
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:549)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
> ... 8 more
> Caused by: {color:red}java.lang.ClassCastException: org.apache.hadoop.io.Text 
> cannot be cast to org.apache.hadoop.io.ArrayWritable{color}
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:105)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:628)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOper

[jira] [Updated] (HIVE-11062) Remove Exception stacktrace from Log.info when ACL is not supported.

2015-06-23 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-11062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-11062:
---
Fix Version/s: 2.0.0

> Remove Exception stacktrace from Log.info when ACL is not supported.
> 
>
> Key: HIVE-11062
> URL: https://issues.apache.org/jira/browse/HIVE-11062
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 1.1.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HIVE-11062.1.patch
>
>
> When logging is set to info, extended ACLs are enabled, and the file system 
> does not support ACLs, a lot of Exception stack traces appear in the log file. 
> Although this is benign, it can easily frustrate users. We should only show 
> the Exception at debug level. 
> Currently, the Exception in the log looks like:
> {noformat}
> 2015-06-19 05:09:59,376 INFO org.apache.hadoop.hive.shims.HadoopShimsSecure: 
> Skipping ACL inheritance: File system for path s3a://yibing/hive does not 
> support ACLs but dfs.namenode.acls.enabled is set to true: 
> java.lang.UnsupportedOperationException: S3AFileSystem doesn't support 
> getAclStatus
> java.lang.UnsupportedOperationException: S3AFileSystem doesn't support 
> getAclStatus
>   at org.apache.hadoop.fs.FileSystem.getAclStatus(FileSystem.java:2429)
>   at 
> org.apache.hadoop.hive.shims.Hadoop23Shims.getFullFileStatus(Hadoop23Shims.java:729)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.inheritFromTable(Hive.java:2786)
>   at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2694)
>   at org.apache.hadoop.hive.ql.metadata.Table.replaceFiles(Table.java:640)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1587)
>   at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:297)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1181)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1042)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:145)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:70)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:197)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:209)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
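The proposed change can be sketched as a guarded-logging pattern: emit a one-line message at INFO, and log the full stack trace only at debug level. Class and method names below are illustrative, not the actual Hadoop23Shims code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class AclLogSketch {
    private static final Logger LOG = Logger.getLogger(AclLogSketch.class.getName());

    /** Builds the single-line message emitted at INFO (no stack trace). */
    static String infoMessage(String path, Exception e) {
        return "Skipping ACL inheritance: file system for path " + path
                + " does not support ACLs: " + e.getMessage();
    }

    static void logAclSkip(String path, Exception e) {
        // Concise one-liner at INFO level.
        LOG.info(infoMessage(path, e));
        // Full stack trace only when debug (FINE) logging is enabled.
        LOG.log(Level.FINE, "ACL inheritance failure", e);
    }

    public static void main(String[] args) {
        logAclSkip("s3a://bucket/hive",
                new UnsupportedOperationException("S3AFileSystem doesn't support getAclStatus"));
    }
}
```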





[jira] [Updated] (HIVE-10795) Remove use of PerfLogger from Orc

2015-06-23 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-10795:
-
Attachment: HIVE-10795.patch

Thanks for the catch, Damien.

I removed both of the now unused CLASS_NAME variables.

I realized that OrcInputFormat already had a static flag for 
LOG.isDebugEnabled(), so I switched all of the calls to use it.

> Remove use of PerfLogger from Orc
> -
>
> Key: HIVE-10795
> URL: https://issues.apache.org/jira/browse/HIVE-10795
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-10795.patch, HIVE-10795.patch, HIVE-10795.patch
>
>
> PerfLogger is yet another class with a huge dependency set that Orc doesn't 
> need.





[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results

2015-06-23 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598006#comment-14598006
 ] 

Jesus Camacho Rodriguez commented on HIVE-10996:


[~jpullokkaran], the test failure is unrelated. It is ready to go in. Thanks

> Aggregation / Projection over Multi-Join Inner Query producing incorrect 
> results
> 
>
> Key: HIVE-10996
> URL: https://issues.apache.org/jira/browse/HIVE-10996
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0
>Reporter: Gautam Kowshik
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
> Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, 
> HIVE-10996.03.patch, HIVE-10996.04.patch, HIVE-10996.05.patch, 
> HIVE-10996.06.patch, HIVE-10996.07.patch, HIVE-10996.08.patch, 
> HIVE-10996.09.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt
>
>
> We see the following problem on 1.1.0 and 1.2.0 but not 0.13 which seems like 
> a regression.
> The following query (Q1) produces no results:
> {code}
> select s
> from (
>   select last.*, action.st2, action.n
>   from (
> select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
> last_stage_timestamp
> from (select * from purchase_history) purchase
> join (select * from cart_history) mevt
> on purchase.s = mevt.s
> where purchase.timestamp > mevt.timestamp
> group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> {code}
> While this one (Q2) does produce results :
> {code}
> select *
> from (
>   select last.*, action.st2, action.n
>   from (
> select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
> last_stage_timestamp
> from (select * from purchase_history) purchase
> join (select * from cart_history) mevt
> on purchase.s = mevt.s
> where purchase.timestamp > mevt.timestamp
> group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> 1    21    20    Bob     1234
> 1    31    30    Bob     1234
> 3    51    50    Jeff    1234
> {code}
> The setup to test this is:
> {code}
> create table purchase_history (s string, product string, price double, 
> timestamp int);
> insert into purchase_history values ('1', 'Belt', 20.00, 21);
> insert into purchase_history values ('1', 'Socks', 3.50, 31);
> insert into purchase_history values ('3', 'Belt', 20.00, 51);
> insert into purchase_history values ('4', 'Shirt', 15.50, 59);
> create table cart_history (s string, cart_id int, timestamp int);
> insert into cart_history values ('1', 1, 10);
> insert into cart_history values ('1', 2, 20);
> insert into cart_history values ('1', 3, 30);
> insert into cart_history values ('1', 4, 40);
> insert into cart_history values ('3', 5, 50);
> insert into cart_history values ('4', 6, 60);
> create table events (s string, st2 string, n int, timestamp int);
> insert into events values ('1', 'Bob', 1234, 20);
> insert into events values ('1', 'Bob', 1234, 30);
> insert into events values ('1', 'Bob', 1234, 25);
> insert into events values ('2', 'Sam', 1234, 30);
> insert into events values ('3', 'Jeff', 1234, 50);
> insert into events values ('4', 'Ted', 1234, 60);
> {code}
> I realize select * and select s are not all that interesting in this context, 
> but what led us to this issue was that select count(distinct s) was not 
> returning results. The above queries are the simplified queries that produce the issue. 
> I will note that if I convert the inner join to a table and select from that 
> the issue does not appear.
> Update: Found that turning off  hive.optimize.remove.identity.project fixes 
> this issue. This optimization was introduced in 
> https://issues.apache.org/jira/browse/HIVE-8435





[jira] [Commented] (HIVE-10729) Query failed when select complex columns from joinned table (tez map join only)

2015-06-23 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597984#comment-14597984
 ] 

Gunther Hagleitner commented on HIVE-10729:
---

[~mmccline] the failure on this bug doesn't happen anymore, but there is 
https://issues.apache.org/jira/browse/HIVE-11051. The attached test on that bug 
used to be fixed by this patch. It might make sense to resolve this one and 
move the code over to HIVE-11051 if that's the case.

> Query failed when select complex columns from joinned table (tez map join 
> only)
> ---
>
> Key: HIVE-10729
> URL: https://issues.apache.org/jira/browse/HIVE-10729
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.0
>Reporter: Selina Zhang
>Assignee: Matt McCline
> Attachments: HIVE-10729.03.patch, HIVE-10729.1.patch, 
> HIVE-10729.2.patch
>
>
> When a map join happens and the projection columns include complex data 
> types, the query will fail. 
> Steps to reproduce:
> {code:sql}
> hive> set hive.auto.convert.join;
> hive.auto.convert.join=true
> hive> desc foo;
> a     array<int>
> hive> select * from foo;
> [1,2]
> hive> desc src_int;
> key   int
> value string
> hive> select * from src_int where key=2;
> 2     val_2
> hive> select * from foo join src_int src on src.key = foo.a[1];
> {code}
> The query fails with the following stack trace:
> {noformat}
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to 
> [Ljava.lang.Object;
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:111)
>   at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:314)
>   at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262)
>   at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
>   at 
> org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:692)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:644)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:676)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:754)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:386)
>   ... 23 more
> {noformat}
> Similar error when projection columns include a map:
> {code:sql}
> hive> CREATE TABLE test (a INT, b MAP<INT, STRING>) STORED AS ORC;
> hive> INSERT OVERWRITE TABLE test SELECT 1, MAP(1, "val_1", 2, "val_2") FROM 
> src LIMIT 1;
> hive> select * from src join test where src.key=test.a;
> {code}
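The failure pattern above can be sketched as follows (illustrative Python, not Hive's actual Java; the class and function names here are stand-ins): an inspector assumes the raw value is already a plain object array, while the map-join path hands it a lazily serialized wrapper that must be materialized first.

```python
class LazyBinaryArray:
    """Stand-in for a lazily-deserialized list (not a plain list)."""
    def __init__(self, raw):
        self.raw = raw

    def materialize(self):
        return list(self.raw)


def get_list_naive(data):
    # Mirrors an inspector that assumes data is already a plain list,
    # so a lazy wrapper makes it blow up -- analogous to the
    # ClassCastException in the stack trace above.
    if not isinstance(data, list):
        raise TypeError(f"{type(data).__name__} cannot be cast to list")
    return data


def get_list_safe(data):
    # The fix pattern: detect the lazy representation and materialize it
    # before treating it as a plain list.
    if isinstance(data, LazyBinaryArray):
        return data.materialize()
    return get_list_naive(data)


lazy = LazyBinaryArray((1, 2))
try:
    get_list_naive(lazy)
except TypeError as e:
    print(e)
print(get_list_safe(lazy))
```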



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11030) Enhance storage layer to create one delta file per write

2015-06-23 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11030:
--
Attachment: HIVE-11030.3.patch

> Enhance storage layer to create one delta file per write
> 
>
> Key: HIVE-11030
> URL: https://issues.apache.org/jira/browse/HIVE-11030
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 1.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-11030.2.patch, HIVE-11030.3.patch
>
>
> Currently each txn using ACID insert/update/delete will generate a delta 
> directory like delta_100_101.  In order to support multi-statement 
> transactions we must generate one delta per operation within the transaction 
> so the deltas would be named like delta_100_101_0001, etc.
> Support for MERGE (HIVE-10924) would need the same.
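The naming scheme described above can be sketched as a small formatter (directory-name layout only; the zero-padded statement-id width is an assumption for illustration, not the exact Hive convention):

```python
def delta_dir(min_txn, max_txn, stmt_id=None):
    """Build an ACID delta directory name.

    delta_<min>_<max> for a single write per transaction, and
    delta_<min>_<max>_<stmt> once each statement gets its own delta.
    """
    name = f"delta_{min_txn}_{max_txn}"
    if stmt_id is not None:
        name += f"_{stmt_id:04d}"  # zero-padded statement id (assumed width)
    return name


print(delta_dir(100, 101))     # delta_100_101
print(delta_dir(100, 101, 1))  # delta_100_101_0001
```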



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11083) Make test cbo_windowing robust

2015-06-23 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597965#comment-14597965
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-11083:
--

+1 pending run

> Make test cbo_windowing robust
> --
>
> Key: HIVE-11083
> URL: https://issues.apache.org/jira/browse/HIVE-11083
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 1.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-11083.patch
>
>
> Make result set deterministic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10729) Query failed when select complex columns from joinned table (tez map join only)

2015-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597959#comment-14597959
 ] 

Hive QA commented on HIVE-10729:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741285/HIVE-10729.03.patch

{color:green}SUCCESS:{color} +1 9014 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4351/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4351/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4351/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12741285 - PreCommit-HIVE-TRUNK-Build

> Query failed when select complex columns from joinned table (tez map join 
> only)
> ---
>
> Key: HIVE-10729
> URL: https://issues.apache.org/jira/browse/HIVE-10729
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.0
>Reporter: Selina Zhang
>Assignee: Matt McCline
> Attachments: HIVE-10729.03.patch, HIVE-10729.1.patch, 
> HIVE-10729.2.patch
>
>
> When a map join happens and the projection columns include complex data 
> types, the query will fail. 
> Steps to reproduce:
> {code:sql}
> hive> set hive.auto.convert.join;
> hive.auto.convert.join=true
> hive> desc foo;
> a     array<int>
> hive> select * from foo;
> [1,2]
> hive> desc src_int;
> key   int
> value string
> hive> select * from src_int where key=2;
> 2     val_2
> hive> select * from foo join src_int src on src.key = foo.a[1];
> {code}
> The query fails with the following stack trace:
> {noformat}
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to 
> [Ljava.lang.Object;
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:111)
>   at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:314)
>   at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262)
>   at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
>   at 
> org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:692)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:644)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:676)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:754)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:386)
>   ... 23 more
> {noformat}
> Similar error when projection columns include a map:
> {code:sql}
> hive> CREATE TABLE test (a INT, b MAP<INT, STRING>) STORED AS ORC;
> hive> INSERT OVERWRITE TABLE test SELECT 1, MAP(1, "val_1", 2, "val_2") FROM 
> src LIMIT 1;
> hive> select * from src join test where src.key=test.a;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-23 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-10999:
---
Attachment: HIVE-10999.3-spark.patch

Resubmitting patch to debug test environment.

> Upgrade Spark dependency to 1.4 [Spark Branch]
> --
>
> Key: HIVE-10999
> URL: https://issues.apache.org/jira/browse/HIVE-10999
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
> Attachments: HIVE-10999.1-spark.patch, HIVE-10999.2-spark.patch, 
> HIVE-10999.3-spark.patch, HIVE-10999.3-spark.patch
>
>
> Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 
> 1.4.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11076) Explicitly set hive.cbo.enable=true for some tests

2015-06-23 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-11076:
--
Fix Version/s: (was: 1.3.0)

> Explicitly set hive.cbo.enable=true for some tests
> --
>
> Key: HIVE-11076
> URL: https://issues.apache.org/jira/browse/HIVE-11076
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.0.0
>
> Attachments: HIVE-11076.01.patch, HIVE-11076.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10795) Remove use of PerfLogger from Orc

2015-06-23 Thread Damien Carol (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597871#comment-14597871
 ] 

Damien Carol commented on HIVE-10795:
-

[~owen.omalley] you should remove _CLASS_NAME_ (line:41)

> Remove use of PerfLogger from Orc
> -
>
> Key: HIVE-10795
> URL: https://issues.apache.org/jira/browse/HIVE-10795
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-10795.patch, HIVE-10795.patch
>
>
> PerfLogger is yet another class with a huge dependency set that Orc doesn't 
> need.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11076) Explicitly set hive.cbo.enable=true for some tests

2015-06-23 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-11076:
--
Fix Version/s: 1.3.0

> Explicitly set hive.cbo.enable=true for some tests
> --
>
> Key: HIVE-11076
> URL: https://issues.apache.org/jira/browse/HIVE-11076
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11076.01.patch, HIVE-11076.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11076) Explicitly set hive.cbo.enable=true for some tests

2015-06-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597843#comment-14597843
 ] 

Ashutosh Chauhan commented on HIVE-11076:
-

+1

> Explicitly set hive.cbo.enable=true for some tests
> --
>
> Key: HIVE-11076
> URL: https://issues.apache.org/jira/browse/HIVE-11076
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.0.0
>
> Attachments: HIVE-11076.01.patch, HIVE-11076.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11083) Make test cbo_windowing robust

2015-06-23 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-11083:

Attachment: HIVE-11083.patch

[~hsubramaniyan] can you take a look?

> Make test cbo_windowing robust
> --
>
> Key: HIVE-11083
> URL: https://issues.apache.org/jira/browse/HIVE-11083
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 1.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-11083.patch
>
>
> Make result set deterministic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10795) Remove use of PerfLogger from Orc

2015-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597812#comment-14597812
 ] 

Hive QA commented on HIVE-10795:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741217/HIVE-10795.patch

{color:green}SUCCESS:{color} +1 9013 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4350/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4350/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4350/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12741217 - PreCommit-HIVE-TRUNK-Build

> Remove use of PerfLogger from Orc
> -
>
> Key: HIVE-10795
> URL: https://issues.apache.org/jira/browse/HIVE-10795
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-10795.patch, HIVE-10795.patch
>
>
> PerfLogger is yet another class with a huge dependency set that Orc doesn't 
> need.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597760#comment-14597760
 ] 

Hive QA commented on HIVE-10999:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741286/HIVE-10999.3-spark.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7953 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/902/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/902/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-902/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12741286 - PreCommit-HIVE-SPARK-Build

> Upgrade Spark dependency to 1.4 [Spark Branch]
> --
>
> Key: HIVE-10999
> URL: https://issues.apache.org/jira/browse/HIVE-10999
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
> Attachments: HIVE-10999.1-spark.patch, HIVE-10999.2-spark.patch, 
> HIVE-10999.3-spark.patch
>
>
> Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 
> 1.4.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11076) Explicitly set hive.cbo.enable=true for some tests

2015-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597672#comment-14597672
 ] 

Hive QA commented on HIVE-11076:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741213/HIVE-11076.02.patch

{color:green}SUCCESS:{color} +1 9013 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4349/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4349/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4349/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12741213 - PreCommit-HIVE-TRUNK-Build

> Explicitly set hive.cbo.enable=true for some tests
> --
>
> Key: HIVE-11076
> URL: https://issues.apache.org/jira/browse/HIVE-11076
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11076.01.patch, HIVE-11076.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-23 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597634#comment-14597634
 ] 

Xuefu Zhang commented on HIVE-10999:


FYI, these failures could also be due to the testing environment issue, which 
[~spena] is looking into.

> Upgrade Spark dependency to 1.4 [Spark Branch]
> --
>
> Key: HIVE-10999
> URL: https://issues.apache.org/jira/browse/HIVE-10999
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
> Attachments: HIVE-10999.1-spark.patch, HIVE-10999.2-spark.patch, 
> HIVE-10999.3-spark.patch
>
>
> Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 
> 1.4.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-23 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-10999:
--
Attachment: HIVE-10999.3-spark.patch

{{TestSparkClient.testMetricsCollection}} failed randomly because the executor 
run time isn't guaranteed to be >0. This may be related to SPARK-7058.

{{TestSparkClient.testJobSubmission}} can't be reproduced.

{{lateral_view_explode2}} failed with {{ClassNotFoundException: 
org.apache.hadoop.hive.contrib.udtf.example.GenericUDTFExplode2}}. I'm not sure 
why upgrading Spark exposes this issue. Patch v3 fixes the test on my side; 
let's see what happens to the other tests. [~chengxiang li], would you mind 
taking a look at the changes to class loading? Thanks!

> Upgrade Spark dependency to 1.4 [Spark Branch]
> --
>
> Key: HIVE-10999
> URL: https://issues.apache.org/jira/browse/HIVE-10999
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
> Attachments: HIVE-10999.1-spark.patch, HIVE-10999.2-spark.patch, 
> HIVE-10999.3-spark.patch
>
>
> Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 
> 1.4.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10729) Query failed when select complex columns from joinned table (tez map join only)

2015-06-23 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-10729:

Attachment: HIVE-10729.03.patch

NOTE: We have a possible solution here but *NO TEST*.

There is an attached Q file with array-of-integers and array-of-strings tests, 
but I haven't been able to get the tests to fail on trunk. I'm not sure what 
I'm missing, so it's unclear whether the new code is being exercised yet.

> Query failed when select complex columns from joinned table (tez map join 
> only)
> ---
>
> Key: HIVE-10729
> URL: https://issues.apache.org/jira/browse/HIVE-10729
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.0
>Reporter: Selina Zhang
>Assignee: Matt McCline
> Attachments: HIVE-10729.03.patch, HIVE-10729.1.patch, 
> HIVE-10729.2.patch
>
>
> When a map join happens and the projection columns include complex data 
> types, the query will fail. 
> Steps to reproduce:
> {code:sql}
> hive> set hive.auto.convert.join;
> hive.auto.convert.join=true
> hive> desc foo;
> a     array<int>
> hive> select * from foo;
> [1,2]
> hive> desc src_int;
> key   int
> value string
> hive> select * from src_int where key=2;
> 2     val_2
> hive> select * from foo join src_int src on src.key = foo.a[1];
> {code}
> The query fails with the following stack trace:
> {noformat}
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to 
> [Ljava.lang.Object;
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:111)
>   at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:314)
>   at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262)
>   at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
>   at 
> org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:692)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:644)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:676)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:754)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:386)
>   ... 23 more
> {noformat}
> Similar error when projection columns include a map:
> {code:sql}
> hive> CREATE TABLE test (a INT, b MAP<INT, STRING>) STORED AS ORC;
> hive> INSERT OVERWRITE TABLE test SELECT 1, MAP(1, "val_1", 2, "val_2") FROM 
> src LIMIT 1;
> hive> select * from src join test where src.key=test.a;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11062) Remove Exception stacktrace from Log.info when ACL is not supported.

2015-06-23 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597615#comment-14597615
 ] 

Yongzhi Chen commented on HIVE-11062:
-

Thank you [~spena] for reviewing it. 

> Remove Exception stacktrace from Log.info when ACL is not supported.
> 
>
> Key: HIVE-11062
> URL: https://issues.apache.org/jira/browse/HIVE-11062
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 1.1.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>Priority: Minor
> Attachments: HIVE-11062.1.patch
>
>
> When logging is set to info, Extended ACL is enabled, and the file system 
> does not support ACLs, there are a lot of Exception stack traces in the log 
> file. Although this is benign, it can easily frustrate users. We should log 
> the Exception at debug level instead. 
> Currently, the Exception in the log looks like:
> {noformat}
> 2015-06-19 05:09:59,376 INFO org.apache.hadoop.hive.shims.HadoopShimsSecure: 
> Skipping ACL inheritance: File system for path s3a://yibing/hive does not 
> support ACLs but dfs.namenode.acls.enabled is set to true: 
> java.lang.UnsupportedOperationException: S3AFileSystem doesn't support 
> getAclStatus
> java.lang.UnsupportedOperationException: S3AFileSystem doesn't support 
> getAclStatus
>   at org.apache.hadoop.fs.FileSystem.getAclStatus(FileSystem.java:2429)
>   at 
> org.apache.hadoop.hive.shims.Hadoop23Shims.getFullFileStatus(Hadoop23Shims.java:729)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.inheritFromTable(Hive.java:2786)
>   at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2694)
>   at org.apache.hadoop.hive.ql.metadata.Table.replaceFiles(Table.java:640)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1587)
>   at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:297)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1181)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1042)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:145)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:70)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:197)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:209)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10233) Hive on tez: memory manager for grace hash join

2015-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597572#comment-14597572
 ] 

Hive QA commented on HIVE-10233:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741210/HIVE-10233.11.patch

{color:red}ERROR:{color} -1 due to 43 failed/errored test(s), 9013 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_10
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_gby_empty
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_union
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_hybridgrace_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_script_env_var1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_script_env_var2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_selectDistinctStar
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_temp_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_multi_union
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_smb_main
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union_decimal
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union_dynamic_partition
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union_group_by
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union_multiinsert
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union6
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union7
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union8
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_unionDistinct_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_unionDistinct_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_6
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_leftsemi_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_multi_insert
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_null_projection
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_nested_mapjoin
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4348/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4348/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4348/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 43 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12741210 - PreCommit-HIVE-TRUNK-Build

> Hive on tez: memory manager for grace hash join
> ---
>
> Key: HIVE-10233
> URL: https://issues.apache.org/jira/browse/HIVE-10233
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: llap, 2.0.0
>Reporter: Vikram Dixit K
>Assi

[jira] [Commented] (HIVE-11073) ORC FileDump utility ignores errors when writing output

2015-06-23 Thread Elliot West (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597562#comment-14597562
 ] 

Elliot West commented on HIVE-11073:


This test failure does not appear to be related to this patch.

> ORC FileDump utility ignores errors when writing output
> ---
>
> Key: HIVE-11073
> URL: https://issues.apache.org/jira/browse/HIVE-11073
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0
>Reporter: Elliot West
>Assignee: Elliot West
>Priority: Minor
>  Labels: cli, orc
> Attachments: HIVE-11073.1.patch
>
>
> The Hive command line provides the {{--orcfiledump}} utility for dumping data 
> contained within ORC files, specifically when using the {{-d}} option. 
> Generally, it is useful to be able to pipe the data extracted into other 
> commands and utilities to transform and control the data so that it is more 
> manageable by the CLI user. A classic example is {{less}}.
> When such command pipelines are currently constructed, the underlying 
> implementation in {{org.apache.hadoop.hive.ql.io.orc.FileDump#printJsonData}} 
> is oblivious to errors occurring when writing to its output stream. Such 
> errors are commonplace when a user issues {{Ctrl+C}} to kill the leaf 
> process. In this event the leaf process terminates immediately, but the Hive 
> CLI process continues to execute until the full contents of the ORC file have 
> been read.
> By making {{FileDump}} considerate of output stream errors the process will 
> terminate as soon as the destination process exits (i.e. when the user kills 
> {{less}}) and control will be returned to the user as expected.
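The fix described above hinges on how {{java.io.PrintStream}} reports failures: it never throws, but latches any {{IOException}} in an internal flag exposed via {{checkError()}}, so a dump loop can poll that flag and stop early. A minimal, self-contained sketch of that behaviour (the {{FailingStream}} and {{dumpRows}} names are hypothetical illustrations, not the actual FileDump code):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

public class BrokenPipeDemo {
    /** An OutputStream that fails after a few bytes, simulating a killed `less`. */
    static class FailingStream extends OutputStream {
        private int bytes = 0;
        @Override
        public void write(int b) throws IOException {
            if (++bytes > 3) {
                throw new IOException("Broken pipe");
            }
        }
    }

    /** Writes rows until the stream reports an error, mimicking an error-aware dump loop. */
    static int dumpRows(PrintStream out, int totalRows) {
        int written = 0;
        for (int i = 0; i < totalRows; i++) {
            out.println("{\"row\":" + i + "}");
            // PrintStream never throws; it latches failures in an internal flag instead.
            if (out.checkError()) {
                break; // the destination is gone, so stop reading the source early
            }
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        PrintStream out = new PrintStream(new FailingStream());
        int written = dumpRows(out, 1_000_000);
        System.out.println("rows written before error: " + written);
    }
}
```

Without the {{checkError()}} poll, the loop would spin through all the rows even though every write is silently discarded.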



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.

2015-06-23 Thread Elliot West (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597563#comment-14597563
 ] 

Elliot West commented on HIVE-10165:


This test failure does not appear to be related to this patch.

> Improve hive-hcatalog-streaming extensibility and support updates and deletes.
> --
>
> Key: HIVE-10165
> URL: https://issues.apache.org/jira/browse/HIVE-10165
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 1.2.0
>Reporter: Elliot West
>Assignee: Elliot West
>  Labels: streaming_api
> Attachments: HIVE-10165.0.patch, HIVE-10165.4.patch, 
> HIVE-10165.5.patch, HIVE-10165.6.patch, HIVE-10165.7.patch, 
> HIVE-10165.9.patch, mutate-system-overview.png
>
>
> h3. Overview
> I'd like to extend the 
> [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest]
>  API so that it also supports the writing of record updates and deletes in 
> addition to the already supported inserts.
> h3. Motivation
> We have many Hadoop processes outside of Hive that merge changed facts into 
> existing datasets. Traditionally we achieve this by reading in a 
> ground-truth dataset and a modified dataset, grouping by a key, sorting by a 
> sequence, and then applying a function to determine inserted, updated, and 
> deleted rows. However, in our current scheme we must rewrite all partitions 
> that may potentially contain changes. In practice the number of mutated 
> records is very small when compared with the records contained in a 
> partition. This approach results in a number of operational issues:
> * Excessive amount of write activity required for small data changes.
> * Downstream applications cannot robustly read these datasets while they are 
> being updated.
> * Due to the scale of the updates (hundreds of partitions) the scope for 
> contention is high.
> I believe we can address this problem by instead writing only the changed 
> records to a Hive transactional table. This should drastically reduce the 
> amount of data that we need to write and also provide a means for managing 
> concurrent access to the data. Our existing merge processes can read and 
> retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to 
> an updated form of the hive-hcatalog-streaming API which will then have the 
> required data to perform an update or insert in a transactional manner. 
> h3. Benefits
> * Enables the creation of large-scale dataset merge processes  
> * Opens up Hive transactional functionality in an accessible manner to 
> processes that operate outside of Hive.
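The group/sort/classify step described in the Motivation above can be sketched as follows. This is a minimal illustration, not the hive-hcatalog-streaming API: records are reduced to key/value Strings, the per-key sequence sort is assumed to have already selected the latest version of each modified record, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MutationClassifier {
    enum Mutation { INSERT, UPDATE, DELETE, UNCHANGED }

    /**
     * Classifies each key by comparing a ground-truth snapshot with a
     * modified snapshot. A real merge process would carry full records plus a
     * RecordIdentifier so updates/deletes can be written transactionally.
     */
    static Map<String, Mutation> classify(Map<String, String> groundTruth,
                                          Map<String, String> modified) {
        Map<String, Mutation> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : modified.entrySet()) {
            String old = groundTruth.get(e.getKey());
            if (old == null) {
                result.put(e.getKey(), Mutation.INSERT);    // key is new
            } else if (old.equals(e.getValue())) {
                result.put(e.getKey(), Mutation.UNCHANGED); // fact did not change
            } else {
                result.put(e.getKey(), Mutation.UPDATE);    // fact changed
            }
        }
        for (String key : groundTruth.keySet()) {
            if (!modified.containsKey(key)) {
                result.put(key, Mutation.DELETE);           // key disappeared
            }
        }
        return result;
    }
}
```

Only the INSERT/UPDATE/DELETE rows would then be handed to the streaming API, rather than rewriting every partition that might contain a change.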





[jira] [Commented] (HIVE-11079) Fix qfile tests that fail on Windows due to CR/character escape differences

2015-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597469#comment-14597469
 ] 

Hive QA commented on HIVE-11079:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741205/HIVE-11079.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9018 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_partitioned_date_time
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4347/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4347/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4347/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12741205 - PreCommit-HIVE-TRUNK-Build

> Fix qfile tests that fail on Windows due to CR/character escape differences
> ---
>
> Key: HIVE-11079
> URL: https://issues.apache.org/jira/browse/HIVE-11079
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-11079.1.patch
>
>
> A few qfile tests are failing on Windows due to a couple of Windows-specific 
> issues:
> - The table comment for the test includes a CR character, which is different 
> on Windows compared to Unix.
> - The partition path in the test includes a space character. Unlike Unix, on 
> Windows space characters in Hive paths are escaped.





[jira] [Commented] (HIVE-10844) Combine equivalent Works for HoS[Spark Branch]

2015-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597382#comment-14597382
 ] 

Hive QA commented on HIVE-10844:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741238/HIVE-10844.3-spark.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7968 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/901/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/901/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-901/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12741238 - PreCommit-HIVE-SPARK-Build

> Combine equivalent Works for HoS[Spark Branch]
> --
>
> Key: HIVE-10844
> URL: https://issues.apache.org/jira/browse/HIVE-10844
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
> Attachments: HIVE-10844.1-spark.patch, HIVE-10844.2-spark.patch, 
> HIVE-10844.3-spark.patch
>
>
> Some Hive queries (like [TPCDS 
> Q39|https://github.com/hortonworks/hive-testbench/blob/hive14/sample-queries-tpcds/query39.sql])
>  may share the same subquery, which is translated into separate but equivalent 
> Works in SparkWork. Combining these equivalent Works into a single one would 
> help to benefit from the subsequent dynamic RDD caching optimization.





[jira] [Commented] (HIVE-10438) Architecture for ResultSet Compression via external plugin

2015-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597389#comment-14597389
 ] 

Hive QA commented on HIVE-10438:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741206/HIVE-10438.patch

{color:green}SUCCESS:{color} +1 9013 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4346/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4346/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4346/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12741206 - PreCommit-HIVE-TRUNK-Build

> Architecture for ResultSet Compression via external plugin
> ---
>
> Key: HIVE-10438
> URL: https://issues.apache.org/jira/browse/HIVE-10438
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive, Thrift API
>Affects Versions: 1.2.0
>Reporter: Rohit Dholakia
>Assignee: Rohit Dholakia
>  Labels: patch
> Attachments: HIVE-10438.patch, Proposal-rscompressor.pdf, 
> Results_Snappy_protobuf_TBinary_TCompact.pdf, hs2driver-master.zip, 
> hs2resultSetcompressor.zip, readme.txt
>
>
> This JIRA proposes an architecture for enabling ResultSet compression which 
> uses an external plugin. 
> The patch has three aspects to it: 
> 0. An architecture for enabling ResultSet compression with external plugins
> 1. An example plugin to demonstrate end-to-end functionality 
> 2. A container to allow everyone to write and test ResultSet compressors with 
> a query submitter (https://github.com/xiaom/hs2driver) 
> Also attaching a design document explaining the changes, an experimental 
> results document, and a PDF explaining how to set up the Docker container to 
> observe end-to-end functionality of ResultSet compression.





[jira] [Commented] (HIVE-11043) ORC split strategies should adapt based on number of files

2015-06-23 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597295#comment-14597295
 ] 

Prasanth Jayachandran commented on HIVE-11043:
--

LGTM, +1. I don't think the test failures are related.

> ORC split strategies should adapt based on number of files
> --
>
> Key: HIVE-11043
> URL: https://issues.apache.org/jira/browse/HIVE-11043
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Gopal V
> Fix For: 2.0.0
>
> Attachments: HIVE-11043.1.patch, HIVE-11043.2.patch
>
>
> The ORC split strategy selection added in HIVE-10114 chooses a strategy based 
> on average file size. It would be beneficial to also take the number of files 
> into account when choosing a strategy.
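A strategy chooser of the kind proposed here might look like the following sketch. The BI/ETL/HYBRID names mirror the values of {{hive.exec.orc.split.strategy}}, but the thresholds and selection logic below are hypothetical, not the heuristics in HIVE-10114 or in this patch.

```java
public class SplitStrategyChooser {
    enum Strategy { BI, ETL, HYBRID }

    /**
     * Picks a split strategy from both the average file size and the file
     * count. The 1000-file and 256 MB cutoffs are illustrative assumptions.
     */
    static Strategy choose(long totalSizeBytes, int fileCount) {
        long avgFileSize = fileCount == 0 ? 0 : totalSizeBytes / fileCount;
        if (fileCount > 1000) {
            return Strategy.BI;   // many small files: skip per-file footer reads
        }
        if (avgFileSize > 256L * 1024 * 1024) {
            return Strategy.ETL;  // few large files: read footers, split by stripe
        }
        return Strategy.HYBRID;   // mixed workload: decide per file
    }
}
```

The point of the issue is the first branch: with an average-size-only heuristic, a directory of thousands of small files could still land in the footer-reading path.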





[jira] [Commented] (HIVE-11043) ORC split strategies should adapt based on number of files

2015-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597289#comment-14597289
 ] 

Hive QA commented on HIVE-11043:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741166/HIVE-11043.2.patch

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 9014 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete_own_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update_own_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join4
org.apache.hive.hcatalog.pig.TestHCatStorer.testEmptyStore[3]
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4345/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4345/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4345/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12741166 - PreCommit-HIVE-TRUNK-Build

> ORC split strategies should adapt based on number of files
> --
>
> Key: HIVE-11043
> URL: https://issues.apache.org/jira/browse/HIVE-11043
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Gopal V
> Fix For: 2.0.0
>
> Attachments: HIVE-11043.1.patch, HIVE-11043.2.patch
>
>
> The ORC split strategy selection added in HIVE-10114 chooses a strategy based 
> on average file size. It would be beneficial to also take the number of files 
> into account when choosing a strategy.





[jira] [Updated] (HIVE-10844) Combine equivalent Works for HoS[Spark Branch]

2015-06-23 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-10844:
-
Attachment: HIVE-10844.3-spark.patch

> Combine equivalent Works for HoS[Spark Branch]
> --
>
> Key: HIVE-10844
> URL: https://issues.apache.org/jira/browse/HIVE-10844
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
> Attachments: HIVE-10844.1-spark.patch, HIVE-10844.2-spark.patch, 
> HIVE-10844.3-spark.patch
>
>
> Some Hive queries (like [TPCDS 
> Q39|https://github.com/hortonworks/hive-testbench/blob/hive14/sample-queries-tpcds/query39.sql])
>  may share the same subquery, which is translated into separate but equivalent 
> Works in SparkWork. Combining these equivalent Works into a single one would 
> help to benefit from the subsequent dynamic RDD caching optimization.


