Review Request 28936: Set completer in CliDriver is not working

2014-12-10 Thread Navis Ryu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28936/
---

Review request for hive.


Bugs: HIVE-9077
https://issues.apache.org/jira/browse/HIVE-9077


Repository: hive-git


Description
---

NO PRECOMMIT TESTS

Seemed broken in HIVE-8609
{noformat}
hive (default)> set Exception in thread "main" java.lang.NullPointerException
at 
jline.console.completer.ArgumentCompleter$AbstractArgumentDelimiter.delimit(ArgumentCompleter.java:283)
at 
jline.console.completer.ArgumentCompleter.complete(ArgumentCompleter.java:116)
at 
jline.console.completer.ArgumentCompleter.complete(ArgumentCompleter.java:152)
at org.apache.hadoop.hive.cli.CliDriver$6.complete(CliDriver.java:567)
at jline.console.ConsoleReader.complete(ConsoleReader.java:3261)
at jline.console.ConsoleReader.readLine(ConsoleReader.java:2621)
at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:639)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:578)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
{noformat}
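
For context, the NPE fires inside jline2's ArgumentCompleter when the completer 
chain is driven with input its delimiter logic cannot handle (the trace ends in 
AbstractArgumentDelimiter.delimit). A defensive pattern for this class of 
failure is to wrap the delegate completer so it never sees a null buffer; the 
sketch below only illustrates that pattern (the class name is hypothetical), it 
is not the attached patch:
{noformat}
import java.util.List;
import jline.console.completer.Completer;

// Illustrative only: shields a delegate jline2 Completer from a null buffer,
// which a completer chain's delimiter logic may not tolerate.
public class NullSafeCompleter implements Completer {
  private final Completer delegate;

  public NullSafeCompleter(Completer delegate) {
    this.delegate = delegate;
  }

  @Override
  public int complete(String buffer, int cursor, List<CharSequence> candidates) {
    if (buffer == null) {
      buffer = "";
      cursor = 0;
    }
    return delegate.complete(buffer, cursor, candidates);
  }
}
{noformat}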


Diffs
-

  cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java aec5018 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2e2bf5a 
  common/src/java/org/apache/hadoop/hive/conf/Validator.java bb0f836 

Diff: https://reviews.apache.org/r/28936/diff/


Testing
---


Thanks,

Navis Ryu



[jira] [Updated] (HIVE-9077) Set completer in CliDriver is not working

2014-12-10 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-9077:

Attachment: HIVE-9077.2.patch.txt

> Set completer in CliDriver is not working
> -
>
> Key: HIVE-9077
> URL: https://issues.apache.org/jira/browse/HIVE-9077
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-9077.1.patch.txt, HIVE-9077.2.patch.txt
>
>
> NO PRECOMMIT TESTS
> Seemed broken in HIVE-8609
> {noformat}
> hive (default)> set Exception in thread "main" java.lang.NullPointerException
>   at 
> jline.console.completer.ArgumentCompleter$AbstractArgumentDelimiter.delimit(ArgumentCompleter.java:283)
>   at 
> jline.console.completer.ArgumentCompleter.complete(ArgumentCompleter.java:116)
>   at 
> jline.console.completer.ArgumentCompleter.complete(ArgumentCompleter.java:152)
>   at org.apache.hadoop.hive.cli.CliDriver$6.complete(CliDriver.java:567)
>   at jline.console.ConsoleReader.complete(ConsoleReader.java:3261)
>   at jline.console.ConsoleReader.readLine(ConsoleReader.java:2621)
>   at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:639)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:578)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9078) Hive should not submit second SparkTask while previous one has failed.[Spark Branch]

2014-12-10 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242270#comment-14242270
 ] 

Chengxiang Li commented on HIVE-9078:
-

We may need to move all Spark branch issues to be subtasks of HIVE-7292 to avoid 
something like this happening again; honestly, it's not so easy to check whether 
a duplicate JIRA has already been opened.

> Hive should not submit second SparkTask while previous one has failed.[Spark 
> Branch]
> 
>
> Key: HIVE-9078
> URL: https://issues.apache.org/jira/browse/HIVE-9078
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: Spark-M4
> Attachments: HIVE-9078.1-spark.patch
>
>
> {noformat}
> hive> select n_name, c_name from nation, customer where nation.n_nationkey = 
> customer.c_nationkey limit 10;
> Query ID = root_20141211135050_51e5ae15-49a3-4a46-826f-e27ee314ccb2
> Total jobs = 2
> Launching Job 1 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> Launching Job 2 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> OK
> Time taken: 68.53 seconds
> {noformat}
> There are 2 issues in the above CLI output:
> # For a query that is translated into multiple SparkTasks, if a previous 
> SparkTask failed, Hive should fail right away; the following SparkTasks 
> should not be submitted any more.
> # Print failure info in the Hive console when the query fails.
> The correct CLI output when the query fails:
> {noformat}
> hive> select n_name, c_name from nation, customer where nation.n_nationkey = 
> customer.c_nationkey limit 10;
> Query ID = root_20141211142929_ddb7f205-8422-44b4-96bd-96a1c9291895
> Total jobs = 2
> Launching Job 1 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9078) Hive should not submit second SparkTask while previous one has failed.[Spark Branch]

2014-12-10 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242266#comment-14242266
 ] 

Chengxiang Li commented on HIVE-9078:
-

Oh, yes, [~csun], Brock brought this issue up today; I wasn't aware it had been 
reported previously.

> Hive should not submit second SparkTask while previous one has failed.[Spark 
> Branch]
> 
>
> Key: HIVE-9078
> URL: https://issues.apache.org/jira/browse/HIVE-9078
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: Spark-M4
> Attachments: HIVE-9078.1-spark.patch
>
>
> {noformat}
> hive> select n_name, c_name from nation, customer where nation.n_nationkey = 
> customer.c_nationkey limit 10;
> Query ID = root_20141211135050_51e5ae15-49a3-4a46-826f-e27ee314ccb2
> Total jobs = 2
> Launching Job 1 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> Launching Job 2 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> OK
> Time taken: 68.53 seconds
> {noformat}
> There are 2 issues in the above CLI output:
> # For a query that is translated into multiple SparkTasks, if a previous 
> SparkTask failed, Hive should fail right away; the following SparkTasks 
> should not be submitted any more.
> # Print failure info in the Hive console when the query fails.
> The correct CLI output when the query fails:
> {noformat}
> hive> select n_name, c_name from nation, customer where nation.n_nationkey = 
> customer.c_nationkey limit 10;
> Query ID = root_20141211142929_ddb7f205-8422-44b4-96bd-96a1c9291895
> Total jobs = 2
> Launching Job 1 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9078) Hive should not submit second SparkTask while previous one has failed.[Spark Branch]

2014-12-10 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242249#comment-14242249
 ] 

Chao commented on HIVE-9078:


[~chengxiang li] Does this make HIVE-9046 duplicate?

> Hive should not submit second SparkTask while previous one has failed.[Spark 
> Branch]
> 
>
> Key: HIVE-9078
> URL: https://issues.apache.org/jira/browse/HIVE-9078
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: Spark-M4
> Attachments: HIVE-9078.1-spark.patch
>
>
> {noformat}
> hive> select n_name, c_name from nation, customer where nation.n_nationkey = 
> customer.c_nationkey limit 10;
> Query ID = root_20141211135050_51e5ae15-49a3-4a46-826f-e27ee314ccb2
> Total jobs = 2
> Launching Job 1 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> Launching Job 2 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> OK
> Time taken: 68.53 seconds
> {noformat}
> There are 2 issues in the above CLI output:
> # For a query that is translated into multiple SparkTasks, if a previous 
> SparkTask failed, Hive should fail right away; the following SparkTasks 
> should not be submitted any more.
> # Print failure info in the Hive console when the query fails.
> The correct CLI output when the query fails:
> {noformat}
> hive> select n_name, c_name from nation, customer where nation.n_nationkey = 
> customer.c_nationkey limit 10;
> Query ID = root_20141211142929_ddb7f205-8422-44b4-96bd-96a1c9291895
> Total jobs = 2
> Launching Job 1 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9079) Hive hangs while failed to get executorCount[Spark Branch]

2014-12-10 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-9079:

Status: Patch Available  (was: Open)

> Hive hangs while failed to get executorCount[Spark Branch]
> --
>
> Key: HIVE-9079
> URL: https://issues.apache.org/jira/browse/HIVE-9079
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: Spark-M4
> Attachments: HIVE-9078.1-spark.patch
>
>
> Hive on Spark gets the executor count from the RSC to dynamically set the 
> reducer number; it uses future.get() to wait for the result, which may hang 
> forever if the remote side fails without notification.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9079) Hive hangs while failed to get executorCount[Spark Branch]

2014-12-10 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-9079:

Attachment: HIVE-9078.1-spark.patch

Added a 5s timeout to wait for the executorCount result.
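
For reference, this is the standard bounded-wait pattern on a 
java.util.concurrent.Future; a minimal sketch (the class and method names are 
illustrative, not the actual RSC client code):
{noformat}
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative only: bound the wait so a silent remote failure cannot hang
// the client forever. The Future stands in for the remote executor-count call.
public final class TimedGetSketch {
  static int getExecutorCount(Future<Integer> pending) throws Exception {
    try {
      // was an unbounded pending.get(), which can block forever if the
      // remote side dies without completing the future
      return pending.get(5, TimeUnit.SECONDS);
    } catch (TimeoutException e) {
      throw new RuntimeException("Timed out waiting for executor count", e);
    }
  }
}
{noformat}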

> Hive hangs while failed to get executorCount[Spark Branch]
> --
>
> Key: HIVE-9079
> URL: https://issues.apache.org/jira/browse/HIVE-9079
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: Spark-M4
> Attachments: HIVE-9078.1-spark.patch
>
>
> Hive on Spark gets the executor count from the RSC to dynamically set the 
> reducer number; it uses future.get() to wait for the result, which may hang 
> forever if the remote side fails without notification.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9079) Hive hangs while failed to get executorCount[Spark Branch]

2014-12-10 Thread Chengxiang Li (JIRA)
Chengxiang Li created HIVE-9079:
---

 Summary: Hive hangs while failed to get executorCount[Spark Branch]
 Key: HIVE-9079
 URL: https://issues.apache.org/jira/browse/HIVE-9079
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li


Hive on Spark gets the executor count from the RSC to dynamically set the 
reducer number; it uses future.get() to wait for the result, which may hang 
forever if the remote side fails without notification.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9078) Hive should not submit second SparkTask while previous one has failed.[Spark Branch]

2014-12-10 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-9078:

Status: Patch Available  (was: Open)

> Hive should not submit second SparkTask while previous one has failed.[Spark 
> Branch]
> 
>
> Key: HIVE-9078
> URL: https://issues.apache.org/jira/browse/HIVE-9078
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: Spark-M4
> Attachments: HIVE-9078.1-spark.patch
>
>
> {noformat}
> hive> select n_name, c_name from nation, customer where nation.n_nationkey = 
> customer.c_nationkey limit 10;
> Query ID = root_20141211135050_51e5ae15-49a3-4a46-826f-e27ee314ccb2
> Total jobs = 2
> Launching Job 1 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> Launching Job 2 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> OK
> Time taken: 68.53 seconds
> {noformat}
> There are 2 issues in the above CLI output:
> # For a query that is translated into multiple SparkTasks, if a previous 
> SparkTask failed, Hive should fail right away; the following SparkTasks 
> should not be submitted any more.
> # Print failure info in the Hive console when the query fails.
> The correct CLI output when the query fails:
> {noformat}
> hive> select n_name, c_name from nation, customer where nation.n_nationkey = 
> customer.c_nationkey limit 10;
> Query ID = root_20141211142929_ddb7f205-8422-44b4-96bd-96a1c9291895
> Total jobs = 2
> Launching Job 1 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9078) Hive should not submit second SparkTask while previous one has failed.[Spark Branch]

2014-12-10 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-9078:

Summary: Hive should not submit second SparkTask while previous one has 
failed.[Spark Branch]  (was: Hive should not submit second SparkTask while 
previous one has failed.[Spark Branch]])

> Hive should not submit second SparkTask while previous one has failed.[Spark 
> Branch]
> 
>
> Key: HIVE-9078
> URL: https://issues.apache.org/jira/browse/HIVE-9078
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: Spark-M4
> Attachments: HIVE-9078.1-spark.patch
>
>
> {noformat}
> hive> select n_name, c_name from nation, customer where nation.n_nationkey = 
> customer.c_nationkey limit 10;
> Query ID = root_20141211135050_51e5ae15-49a3-4a46-826f-e27ee314ccb2
> Total jobs = 2
> Launching Job 1 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> Launching Job 2 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> OK
> Time taken: 68.53 seconds
> {noformat}
> There are 2 issues in the above CLI output:
> # For a query that is translated into multiple SparkTasks, if a previous 
> SparkTask failed, Hive should fail right away; the following SparkTasks 
> should not be submitted any more.
> # Print failure info in the Hive console when the query fails.
> The correct CLI output when the query fails:
> {noformat}
> hive> select n_name, c_name from nation, customer where nation.n_nationkey = 
> customer.c_nationkey limit 10;
> Query ID = root_20141211142929_ddb7f205-8422-44b4-96bd-96a1c9291895
> Total jobs = 2
> Launching Job 1 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 28934: HIVE-9078 Hive should not submit second SparkTask while previous one has failed.[Spark Branch]

2014-12-10 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28934/
---

Review request for hive and Xuefu Zhang.


Bugs: HIVE-9078
https://issues.apache.org/jira/browse/HIVE-9078


Repository: hive-git


Description
---

For a query that is translated into multiple SparkTasks, if a previous 
SparkTask failed, Hive should fail right away; the following SparkTasks should 
not be submitted any more.
Print failure info in the Hive console when the query fails.
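
A minimal sketch of the fail-fast submission loop this describes, with 
placeholder types (SparkJob and the return-code convention are illustrative, 
not Hive's actual task interfaces):
{noformat}
import java.util.List;

// Illustrative only: stop submitting jobs as soon as one fails, and surface
// the failure on the console instead of printing OK.
public final class FailFastSketch {
  interface SparkJob { int run(); } // 0 = success, non-zero = failure

  static int runJobs(List<SparkJob> jobs) {
    for (int i = 0; i < jobs.size(); i++) {
      int rc = jobs.get(i).run();
      if (rc != 0) {
        System.err.println("FAILED: Execution Error, return code " + rc
            + " from job " + (i + 1) + " of " + jobs.size());
        return rc; // remaining jobs are never submitted
      }
    }
    return 0;
  }
}
{noformat}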


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 8c075b8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java 
c075c35 

Diff: https://reviews.apache.org/r/28934/diff/


Testing
---


Thanks,

chengxiang li



[jira] [Updated] (HIVE-9078) Hive should not submit second SparkTask while previous one has failed.[Spark Branch]]

2014-12-10 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-9078:

Attachment: HIVE-9078.1-spark.patch

> Hive should not submit second SparkTask while previous one has failed.[Spark 
> Branch]]
> -
>
> Key: HIVE-9078
> URL: https://issues.apache.org/jira/browse/HIVE-9078
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: Spark-M4
> Attachments: HIVE-9078.1-spark.patch
>
>
> {noformat}
> hive> select n_name, c_name from nation, customer where nation.n_nationkey = 
> customer.c_nationkey limit 10;
> Query ID = root_20141211135050_51e5ae15-49a3-4a46-826f-e27ee314ccb2
> Total jobs = 2
> Launching Job 1 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> Launching Job 2 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> OK
> Time taken: 68.53 seconds
> {noformat}
> There are 2 issues in the above CLI output:
> # For a query that is translated into multiple SparkTasks, if a previous 
> SparkTask failed, Hive should fail right away; the following SparkTasks 
> should not be submitted any more.
> # Print failure info in the Hive console when the query fails.
> The correct CLI output when the query fails:
> {noformat}
> hive> select n_name, c_name from nation, customer where nation.n_nationkey = 
> customer.c_nationkey limit 10;
> Query ID = root_20141211142929_ddb7f205-8422-44b4-96bd-96a1c9291895
> Total jobs = 2
> Launching Job 1 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9078) Hive should not submit second SparkTask while previous one has failed.[Spark Branch]]

2014-12-10 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-9078:

Summary: Hive should not submit second SparkTask while previous one has 
failed.[Spark Branch]]  (was: Hive should not submit second SparkTask is 
previous one has failed.[Spark Branch]])

> Hive should not submit second SparkTask while previous one has failed.[Spark 
> Branch]]
> -
>
> Key: HIVE-9078
> URL: https://issues.apache.org/jira/browse/HIVE-9078
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: Spark-M4
>
> {noformat}
> hive> select n_name, c_name from nation, customer where nation.n_nationkey = 
> customer.c_nationkey limit 10;
> Query ID = root_20141211135050_51e5ae15-49a3-4a46-826f-e27ee314ccb2
> Total jobs = 2
> Launching Job 1 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> Launching Job 2 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> OK
> Time taken: 68.53 seconds
> {noformat}
> There are 2 issues in the above CLI output:
> # For a query that is translated into multiple SparkTasks, if a previous 
> SparkTask failed, Hive should fail right away; the following SparkTasks 
> should not be submitted any more.
> # Print failure info in the Hive console when the query fails.
> The correct CLI output when the query fails:
> {noformat}
> hive> select n_name, c_name from nation, customer where nation.n_nationkey = 
> customer.c_nationkey limit 10;
> Query ID = root_20141211142929_ddb7f205-8422-44b4-96bd-96a1c9291895
> Total jobs = 2
> Launching Job 1 out of 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Status: Failed
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9078) Hive should not submit second SparkTask is previous one has failed.[Spark Branch]]

2014-12-10 Thread Chengxiang Li (JIRA)
Chengxiang Li created HIVE-9078:
---

 Summary: Hive should not submit second SparkTask is previous one 
has failed.[Spark Branch]]
 Key: HIVE-9078
 URL: https://issues.apache.org/jira/browse/HIVE-9078
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li


{noformat}
hive> select n_name, c_name from nation, customer where nation.n_nationkey = 
customer.c_nationkey limit 10;
Query ID = root_20141211135050_51e5ae15-49a3-4a46-826f-e27ee314ccb2
Total jobs = 2
Launching Job 1 out of 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Status: Failed
Launching Job 2 out of 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Status: Failed
OK
Time taken: 68.53 seconds
{noformat}

There are 2 issues in the above CLI output:
# For a query that is translated into multiple SparkTasks, if a previous 
SparkTask failed, Hive should fail right away; the following SparkTasks should 
not be submitted any more.
# Print failure info in the Hive console when the query fails.

The correct CLI output when the query fails:
{noformat}
hive> select n_name, c_name from nation, customer where nation.n_nationkey = 
customer.c_nationkey limit 10;
Query ID = root_20141211142929_ddb7f205-8422-44b4-96bd-96a1c9291895
Total jobs = 2
Launching Job 1 out of 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Status: Failed
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.spark.SparkTask
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8639) Convert SMBJoin to MapJoin [Spark Branch]

2014-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242208#comment-14242208
 ] 

Hive QA commented on HIVE-8639:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12686445/HIVE-8639.1-spark.patch

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 7255 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join22
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join32
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join_stats
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_smb_mapjoin_14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_17
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_25
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/515/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/515/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-515/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12686445 - PreCommit-HIVE-SPARK-Build

> Convert SMBJoin to MapJoin [Spark Branch]
> -
>
> Key: HIVE-8639
> URL: https://issues.apache.org/jira/browse/HIVE-8639
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-8639.1-spark.patch
>
>
> HIVE-8202 supports auto-conversion of SMB Join.  However, if the tables are 
> partitioned, there could be a slowdown as each mapper would need to get a 
> very small chunk of a partition which has a single key. Thus, in some 
> scenarios it's beneficial to convert SMB join to map join.
> The task is to research and support the conversion from SMB join to map join 
> for Spark execution engine.  See the equivalent of MapReduce in 
> SortMergeJoinResolver.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-8816) Create unit test join of two encrypted tables with different keys

2014-12-10 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu reassigned HIVE-8816:
--

Assignee: Ferdinand Xu

> Create unit test join of two encrypted tables with different keys
> -
>
> Key: HIVE-8816
> URL: https://issues.apache.org/jira/browse/HIVE-8816
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Ferdinand Xu
>
> The results should be inserted into a third table encrypted with a separate 
> key.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-8820) Create unit test where we read from a read only unencrypted table

2014-12-10 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu reassigned HIVE-8820:
--

Assignee: Ferdinand Xu

> Create unit test where we read from a read only unencrypted table
> -
>
> Key: HIVE-8820
> URL: https://issues.apache.org/jira/browse/HIVE-8820
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Ferdinand Xu
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-8940) add createZones and listZones supports in HdfsEncryptionShim for the test purposes

2014-12-10 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu resolved HIVE-8940.

Resolution: Invalid

Closing this JIRA since it was resolved in HIVE-8900.

> add createZones and listZones supports in HdfsEncryptionShim for the test 
> purposes
> --
>
> Key: HIVE-8940
> URL: https://issues.apache.org/jira/browse/HIVE-8940
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8131) Support timestamp in Avro

2014-12-10 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242197#comment-14242197
 ] 

Ferdinand Xu commented on HIVE-8131:


Hi [~mohitsabharwal], can you please help me review this patch?

> Support timestamp in Avro
> -
>
> Key: HIVE-8131
> URL: https://issues.apache.org/jira/browse/HIVE-8131
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Ferdinand Xu
> Attachments: HIVE-8131.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 28933: HIVE-8131:Support timestamp in Avro

2014-12-10 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28933/
---

Review request for hive.


Repository: hive-git


Description
---

The patch includes:
1. add timestamp support for AvroSerde
2. add related test cases
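
For background, Avro has no native timestamp type at this point, so a serde 
typically maps Hive TIMESTAMP onto an Avro primitive such as a long holding 
epoch milliseconds. A hedged sketch of that mapping (not necessarily the 
encoding this patch chose):
{noformat}
import java.sql.Timestamp;

// Generic illustration: carry a Hive TIMESTAMP through an Avro long field
// as epoch milliseconds. The actual patch's encoding may differ.
public final class AvroTimestampSketch {
  static long toAvro(Timestamp ts) {
    return ts.getTime(); // epoch millis; sub-millisecond nanos are dropped
  }

  static Timestamp fromAvro(long epochMillis) {
    return new Timestamp(epochMillis);
  }
}
{noformat}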


Diffs
-

  data/files/avro_timestamp.txt PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_timestamp.q PRE-CREATION 
  ql/src/test/results/clientpositive/avro_timestamp.q.out PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java 
07c5ecf 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java 7639a2b 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerializer.java c8eac89 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/SchemaToTypeInfo.java 
c84b1a0 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java 
8cb2dc3 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java 
cd5a0fa 

Diff: https://reviews.apache.org/r/28933/diff/


Testing
---

Tests passed for the added cases.


Thanks,

cheng xu



[jira] [Updated] (HIVE-8131) Support timestamp in Avro

2014-12-10 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-8131:
---
Attachment: HIVE-8131.patch

> Support timestamp in Avro
> -
>
> Key: HIVE-8131
> URL: https://issues.apache.org/jira/browse/HIVE-8131
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Ferdinand Xu
> Attachments: HIVE-8131.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8131) Support timestamp in Avro

2014-12-10 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-8131:
---
Status: Patch Available  (was: Open)

> Support timestamp in Avro
> -
>
> Key: HIVE-8131
> URL: https://issues.apache.org/jira/browse/HIVE-8131
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Ferdinand Xu
> Attachments: HIVE-8131.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9066) temporarily disable CBO for non-deterministic functions

2014-12-10 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242175#comment-14242175
 ] 

Laljo John Pullokkaran commented on HIVE-9066:
--

This fix is not needed as HIVE-9035 fixes the original issue.

> temporarily disable CBO for non-deterministic functions
> ---
>
> Key: HIVE-9066
> URL: https://issues.apache.org/jira/browse/HIVE-9066
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 0.15.0
>
> Attachments: HIVE-9066.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9077) Set completer in CliDriver is not working

2014-12-10 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-9077:

Status: Patch Available  (was: Open)

> Set completer in CliDriver is not working
> -
>
> Key: HIVE-9077
> URL: https://issues.apache.org/jira/browse/HIVE-9077
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-9077.1.patch.txt
>
>
> NO PRECOMMIT TESTS
> Seemed broken in HIVE-8609
> {noformat}
> hive (default)> set Exception in thread "main" java.lang.NullPointerException
>   at 
> jline.console.completer.ArgumentCompleter$AbstractArgumentDelimiter.delimit(ArgumentCompleter.java:283)
>   at 
> jline.console.completer.ArgumentCompleter.complete(ArgumentCompleter.java:116)
>   at 
> jline.console.completer.ArgumentCompleter.complete(ArgumentCompleter.java:152)
>   at org.apache.hadoop.hive.cli.CliDriver$6.complete(CliDriver.java:567)
>   at jline.console.ConsoleReader.complete(ConsoleReader.java:3261)
>   at jline.console.ConsoleReader.readLine(ConsoleReader.java:2621)
>   at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:639)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:578)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9077) Set completer in CliDriver is not working

2014-12-10 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-9077:

Description: 
NO PRECOMMIT TESTS

Seemed broken in HIVE-8609
{noformat}
hive (default)> set Exception in thread "main" java.lang.NullPointerException
at 
jline.console.completer.ArgumentCompleter$AbstractArgumentDelimiter.delimit(ArgumentCompleter.java:283)
at 
jline.console.completer.ArgumentCompleter.complete(ArgumentCompleter.java:116)
at 
jline.console.completer.ArgumentCompleter.complete(ArgumentCompleter.java:152)
at org.apache.hadoop.hive.cli.CliDriver$6.complete(CliDriver.java:567)
at jline.console.ConsoleReader.complete(ConsoleReader.java:3261)
at jline.console.ConsoleReader.readLine(ConsoleReader.java:2621)
at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:639)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:578)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
{noformat}

  was:
Seemed broken in HIVE-8609
{noformat}
hive (default)> set Exception in thread "main" java.lang.NullPointerException
at 
jline.console.completer.ArgumentCompleter$AbstractArgumentDelimiter.delimit(ArgumentCompleter.java:283)
at 
jline.console.completer.ArgumentCompleter.complete(ArgumentCompleter.java:116)
at 
jline.console.completer.ArgumentCompleter.complete(ArgumentCompleter.java:152)
at org.apache.hadoop.hive.cli.CliDriver$6.complete(CliDriver.java:567)
at jline.console.ConsoleReader.complete(ConsoleReader.java:3261)
at jline.console.ConsoleReader.readLine(ConsoleReader.java:2621)
at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:639)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:578)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
{noformat}


> Set completer in CliDriver is not working
> -
>
> Key: HIVE-9077
> URL: https://issues.apache.org/jira/browse/HIVE-9077
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-9077.1.patch.txt
>
>
> NO PRECOMMIT TESTS
> Seemed broken in HIVE-8609
> {noformat}
> hive (default)> set Exception in thread "main" java.lang.NullPointerException
>   at 
> jline.console.completer.ArgumentCompleter$AbstractArgumentDelimiter.delimit(ArgumentCompleter.java:283)
>   at 
> jline.console.completer.ArgumentCompleter.complete(ArgumentCompleter.java:116)
>   at 
> jline.console.completer.ArgumentCompleter.complete(ArgumentCompleter.java:152)
>   at org.apache.hadoop.hive.cli.CliDriver$6.complete(CliDriver.java:567)
>   at jline.console.ConsoleReader.complete(ConsoleReader.java:3261)
>   at jline.console.ConsoleReader.readLine(ConsoleReader.java:2621)
>   at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:639)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:578)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9060) Fix child operator references after NonBlockingOpDeDupProc

2014-12-10 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242154#comment-14242154
 ] 

Xuefu Zhang commented on HIVE-9060:
---

+1 pending on tests.

> Fix child operator references after NonBlockingOpDeDupProc
> --
>
> Key: HIVE-9060
> URL: https://issues.apache.org/jira/browse/HIVE-9060
> Project: Hive
>  Issue Type: Bug
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-9060.2.patch, HIVE-9060.3.patch, HIVE-9060.patch
>
>
> The optimizer proc called 'NonBlockingOpDeDupProc' combines Sel-Sel or 
> Fil-Fil into a single operator.  However, some references to the old 
> (removed) child still remain in the optimizer context, and mess up further 
> optimizer procs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9077) Set completer in CliDriver is not working

2014-12-10 Thread Navis (JIRA)
Navis created HIVE-9077:
---

 Summary: Set completer in CliDriver is not working
 Key: HIVE-9077
 URL: https://issues.apache.org/jira/browse/HIVE-9077
 Project: Hive
  Issue Type: Bug
  Components: CLI
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-9077.1.patch.txt

Seemed broken in HIVE-8609
{noformat}
hive (default)> set Exception in thread "main" java.lang.NullPointerException
at 
jline.console.completer.ArgumentCompleter$AbstractArgumentDelimiter.delimit(ArgumentCompleter.java:283)
at 
jline.console.completer.ArgumentCompleter.complete(ArgumentCompleter.java:116)
at 
jline.console.completer.ArgumentCompleter.complete(ArgumentCompleter.java:152)
at org.apache.hadoop.hive.cli.CliDriver$6.complete(CliDriver.java:567)
at jline.console.ConsoleReader.complete(ConsoleReader.java:3261)
at jline.console.ConsoleReader.readLine(ConsoleReader.java:2621)
at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:639)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:578)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9077) Set completer in CliDriver is not working

2014-12-10 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-9077:

Attachment: HIVE-9077.1.patch.txt

> Set completer in CliDriver is not working
> -
>
> Key: HIVE-9077
> URL: https://issues.apache.org/jira/browse/HIVE-9077
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-9077.1.patch.txt
>
>
> Seemed broken in HIVE-8609
> {noformat}
> hive (default)> set Exception in thread "main" java.lang.NullPointerException
>   at 
> jline.console.completer.ArgumentCompleter$AbstractArgumentDelimiter.delimit(ArgumentCompleter.java:283)
>   at 
> jline.console.completer.ArgumentCompleter.complete(ArgumentCompleter.java:116)
>   at 
> jline.console.completer.ArgumentCompleter.complete(ArgumentCompleter.java:152)
>   at org.apache.hadoop.hive.cli.CliDriver$6.complete(CliDriver.java:567)
>   at jline.console.ConsoleReader.complete(ConsoleReader.java:3261)
>   at jline.console.ConsoleReader.readLine(ConsoleReader.java:2621)
>   at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:639)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:578)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9060) Fix child operator references after NonBlockingOpDeDupProc

2014-12-10 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-9060:

Attachment: HIVE-9060.3.patch

Actually, it seems the precommit tests haven't started yet.  I'll just make the fix.

> Fix child operator references after NonBlockingOpDeDupProc
> --
>
> Key: HIVE-9060
> URL: https://issues.apache.org/jira/browse/HIVE-9060
> Project: Hive
>  Issue Type: Bug
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-9060.2.patch, HIVE-9060.3.patch, HIVE-9060.patch
>
>
> The optimizer proc called 'NonBlockingOpDeDupProc' combines Sel-Sel or 
> Fil-Fil into a single operator.  However, some references to the old 
> (removed) child still remain in the optimizer context, and mess up further 
> optimizer procs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9060) Fix child operator references after NonBlockingOpDeDupProc

2014-12-10 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242149#comment-14242149
 ] 

Szehon Ho commented on HIVE-9060:
-

Sorry, I missed that; that would be good, thanks.

> Fix child operator references after NonBlockingOpDeDupProc
> --
>
> Key: HIVE-9060
> URL: https://issues.apache.org/jira/browse/HIVE-9060
> Project: Hive
>  Issue Type: Bug
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-9060.2.patch, HIVE-9060.patch
>
>
> The optimizer proc called 'NonBlockingOpDeDupProc' combines Sel-Sel or 
> Fil-Fil into a single operator.  However, some references to the old 
> (removed) child still remain in the optimizer context, and mess up further 
> optimizer procs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8913) Make SparkMapJoinResolver handle runtime skew join [Spark Branch]

2014-12-10 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242148#comment-14242148
 ] 

Chao commented on HIVE-8913:


Looking at the log, something interesting:

{noformat}
2014-12-10 05:28:26,822 INFO  [Executor task launch worker-0]: 
exec.GroupByOperator (Operator.java:close(613)) - 44 Close done
2014-12-10 05:28:26,822 INFO  [Executor task launch worker-0]: 
exec.FilterOperator (Operator.java:close(613)) - 43 Close done
2014-12-10 05:28:26,822 INFO  [Executor task launch worker-0]: 
exec.ForwardOperator (Operator.java:close(613)) - 38 Close done
2014-12-10 05:28:26,828 INFO  [Executor task launch worker-1]: 
executor.Executor (Logging.scala:logInfo(59)) - Finished task 2.0 in stage 4.0 
(TID 6). 1170 bytes result sent to driver
2014-12-10 05:28:26,828 INFO  [Executor task launch worker-0]: 
executor.Executor (Logging.scala:logInfo(59)) - Finished task 1.0 in stage 4.0 
(TID 5). 1170 bytes result sent to driver

2014-12-10 05:28:26,979 INFO  [Executor task launch worker-0]: log.PerfLogger 
(PerfLogger.java:PerfLogEnd(135)) - 
2014-12-10 05:28:26,980 INFO  [Executor task launch worker-0]: mr.ObjectCache 
(ObjectCache.java:cache(36)) - Ignoring cache key: __MAP_PLAN__
2014-12-10 05:28:26,982 ERROR [Executor task launch worker-0]: 
executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 
2.0 (TID 2)
java.lang.RuntimeException: Map operator initialization failed
{noformat}

At first, worker-0 is running reduce work. After finishing that, it is 
switched to this ShuffleMapTask, which apparently doesn't go through 
{{HadoopRDD#compute}}, and thus the IOContext is left uninitialized.
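
In other words, this is the stale/uninitialized thread-local hazard: per-task 
state that only one code path initializes, on a pooled worker thread that gets 
reused for a different task type. A generic sketch of the pattern (plain Java, 
not Hive's IOContext code):
{noformat}
// Generic illustration of the hazard described above.
public final class ThreadLocalHazardSketch {
  private static final ThreadLocal<String> IO_CONTEXT = new ThreadLocal<>();

  // Path A initializes the context, like a task that goes through
  // HadoopRDD#compute.
  static void taskWithInit(String input) {
    IO_CONTEXT.set(input);
    System.out.println("task sees: " + IO_CONTEXT.get());
    IO_CONTEXT.remove(); // without this, the next task inherits the value
  }

  // Path B assumes someone else initialized the context; on a reused worker
  // thread that skipped Path A, it sees null and fails downstream.
  static void taskWithoutInit() {
    System.out.println("task sees: " + IO_CONTEXT.get()); // may print null
  }
}
{noformat}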

> Make SparkMapJoinResolver handle runtime skew join [Spark Branch]
> -
>
> Key: HIVE-8913
> URL: https://issues.apache.org/jira/browse/HIVE-8913
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-8913.1-spark.patch, HIVE-8913.2-spark.patch
>
>
> Sub-task of HIVE-8406.
> Now we have {{SparkMapJoinResolver}} in place. But at the moment, it doesn't 
> handle the map join task created by upstream SkewJoinResolver, i.e. those 
> wrapped in a ConditionalTask. We have to implement this part for runtime skew 
> join to work on spark. To do so, we can borrow logic from {{MapJoinResolver}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8913) Make SparkMapJoinResolver handle runtime skew join [Spark Branch]

2014-12-10 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242145#comment-14242145
 ] 

Rui Li commented on HIVE-8913:
--

I think the call path should be {{ExecMapperContext.clear -> IOContext.clear}}? 
And maybe it should be done in the reduce record handler as well.

> Make SparkMapJoinResolver handle runtime skew join [Spark Branch]
> -
>
> Key: HIVE-8913
> URL: https://issues.apache.org/jira/browse/HIVE-8913
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-8913.1-spark.patch, HIVE-8913.2-spark.patch
>
>
> Sub-task of HIVE-8406.
> Now we have {{SparkMapJoinResolver}} in place. But at the moment, it doesn't 
> handle the map join task created by upstream SkewJoinResolver, i.e. those 
> wrapped in a ConditionalTask. We have to implement this part for runtime skew 
> join to work on spark. To do so, we can borrow logic from {{MapJoinResolver}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8639) Convert SMBJoin to MapJoin [Spark Branch]

2014-12-10 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-8639:

Affects Version/s: spark-branch
   Status: Patch Available  (was: Open)

I have a patch for this JIRA.

Instead of making a SMB -> MapJoin path, I introduce a new unified join 
processor called 'SparkJoinOptimizer' in the logical layer.  This will call the 
SMB or MapJoin optimizers in a certain order, depending on the flags that are 
set and which one works, so there is no need to write a SMB -> MapJoin path.
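
A rough, hypothetical sketch of that dispatch (the tryConvert() placeholders 
stand in for the existing SMB and map-join conversion logic; this is not the 
actual SparkJoinOptimizer source):
{noformat}
// Illustrative only: try the preferred conversion first, fall back to the
// other, and keep the common (shuffle) join if neither applies.
public final class SparkJoinOptimizerSketch {
  interface JoinConversion { boolean tryConvert(); }

  static void optimize(boolean preferMapJoin,
                       JoinConversion smbJoin, JoinConversion mapJoin) {
    JoinConversion first = preferMapJoin ? mapJoin : smbJoin;
    JoinConversion second = preferMapJoin ? smbJoin : mapJoin;
    if (!first.tryConvert()) {
      second.tryConvert(); // if this also fails, the shuffle join remains
    }
  }
}
{noformat}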

Two issues so far during this refactoring:
1.  The NonBlockingOpDeDupProc optimizer does not update joinContext, which 
prevents any SMB optimizer from running after it. I submitted a patch in 
HIVE-9060 which should be committed to trunk, but I am also including it in 
this spark branch patch.

2.  auto_sortmerge_join_9 failure.  This was passing until yesterday, when 
bucket-map join was enabled in HIVE-8638.  As expected, by choosing MapJoins 
over SMB join, a join may become a BucketMapJoin.  Some of the more complicated 
queries there get converted to BucketMapJoin and fail.  We can probably file a 
new JIRA to fix this test, as it's a BucketMapJoin issue.  Might need the help 
of [~jxiang] on this one.

Exception is below:
{noformat}
2014-12-10 15:31:38,527 WARN  [task-result-getter-3]: scheduler.TaskSetManager 
(Logging.scala:logWarning(71)) - Lost task 1.0 in stage 50.0 (TID 80, 
172.19.8.203): java.lang.RuntimeException: Hive Runtime Error while closing 
operators
at 
org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:207)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
at 
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at 
org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
at 
org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
at 
org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
at 
org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:87)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at 
org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185)
... 15 more
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.plan.BucketMapJoinContext.getMappingBigFile(BucketMapJoinContext.java:187)
at 
org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.flushToFile(SparkHashTableSinkOperator.java:100)
at 
org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:81)
... 21 more
{noformat}
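
As a rough guess at the failure mode, here is a toy, purely illustrative model 
of the lookup that NPEs above (the real BucketMapJoinContext internals may 
differ): mapping a small-table bucket file back to its big-table bucket fails 
when the alias-to-files table was never populated for this code path.
{code}
import java.util.List;
import java.util.Map;

// Toy illustration only -- not the actual BucketMapJoinContext source.
class BucketMappingSketch {
  // alias -> (big-table bucket file -> small-table bucket files)
  Map<String, Map<String, List<String>>> aliasBucketFileNameMapping;

  String getMappingBigFile(String alias, String smallFile) {
    Map<String, List<String>> mapping = aliasBucketFileNameMapping.get(alias);
    // NPE here when the mapping was never filled in for this plan,
    // which is the hypothesized Spark bucket-map-join situation above.
    for (Map.Entry<String, List<String>> e : mapping.entrySet()) {
      if (e.getValue().contains(smallFile)) {
        return e.getKey();        // big-table bucket for this small file
      }
    }
    return null;
  }
}
{code}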

> Convert SMBJoin to MapJoin [Spark Branch]
> -
>
> Key: HIVE-8639
> URL: https://issues.apache.org/jira/browse/HIVE-8639
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-8639.1-spark.patch
>
>
> HIVE-8202 supports auto-conversion of SMB Join.  However, if the tables are 
> partitioned, there could be a slow down as each mapper would need to get a 
> very small chunk of a partition which has a single key. Thus, in some 
> scenarios it's beneficial to convert SMB join to map join.
> The task is to research and support the conversion from SMB join to map join 
> for Spark execution engine.  See the equivalent of MapReduce in 
> SortMergeJoinResolver.

[jira] [Assigned] (HIVE-8639) Convert SMBJoin to MapJoin [Spark Branch]

2014-12-10 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang reassigned HIVE-8639:
-

Assignee: Szehon Ho  (was: Chinna Rao Lalam)

> Convert SMBJoin to MapJoin [Spark Branch]
> -
>
> Key: HIVE-8639
> URL: https://issues.apache.org/jira/browse/HIVE-8639
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-8639.1-spark.patch
>
>
> HIVE-8202 supports auto-conversion of SMB Join.  However, if the tables are 
> partitioned, there could be a slow down as each mapper would need to get a 
> very small chunk of a partition which has a single key. Thus, in some 
> scenarios it's beneficial to convert SMB join to map join.
> The task is to research and support the conversion from SMB join to map join 
> for Spark execution engine.  See the equivalent of MapReduce in 
> SortMergeJoinResolver.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8639) Convert SMBJoin to MapJoin [Spark Branch]

2014-12-10 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-8639:

Attachment: HIVE-8639.1-spark.patch

> Convert SMBJoin to MapJoin [Spark Branch]
> -
>
> Key: HIVE-8639
> URL: https://issues.apache.org/jira/browse/HIVE-8639
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Szehon Ho
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-8639.1-spark.patch
>
>
> HIVE-8202 supports auto-conversion of SMB Join.  However, if the tables are 
> partitioned, there could be a slow down as each mapper would need to get a 
> very small chunk of a partition which has a single key. Thus, in some 
> scenarios it's beneficial to convert SMB join to map join.
> The task is to research and support the conversion from SMB join to map join 
> for Spark execution engine.  See the equivalent of MapReduce in 
> SortMergeJoinResolver.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 28930: HIVE-8639 : Convert SMBJoin to MapJoin [Spark Branch]

2014-12-10 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28930/
---

Review request for hive.


Bugs: HIVE-8639
https://issues.apache.org/jira/browse/HIVE-8639


Repository: hive-git


Description
---

In MapReduce, for auto-SMB joins, SortedMergeJoinProc is run in the earlier 
Optimizer layer to convert a join to an SMB join, and SortMergeJoinResolver is 
run in the later PhysicalOptimizer layer to convert it to a MapJoin.  For Spark, 
we have an opportunity to make this cleaner by putting both the SMB and MapJoin 
conversions in the logical layer and deciding there which one to call.

This patch introduces a new unified join processor called 'SparkJoinOptimizer' 
in the logical layer.  This will call 'SparkMapJoinOptimizer' and 
'SparkSortMergeJoinOptimizer' in a certain order, depending on the flags that 
are set and whichever one works.  Thus there is no need to write an SMB -> 
MapJoin path.

'SparkSortMergeJoinOptimizer' is a new class that wraps the logic of 
SortedMergeJoinProc but for Spark.  To put both the MapJoin and SMB processors 
at the same level, I had to make some fixes.

1.  One fix is in 'NonBlockingOpDeDupProc', to fix the join context state, as 
it now runs before the SMB code that relies on it.  For this I submitted a 
trunk patch at HIVE-9060.
2.  The second fix is that MapReduce's SMB code did two graph walks: one 
processor to calculate all 'rejected' joins, and another processor to change 
the non-rejected ones to SMB joins.  That would have forced us to do multiple 
walks, so I refactored the 'rejected' join logic into the same join-operator 
visit in SparkSortMergeJoinOptimizer.
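
To make the dispatch order concrete, here is a minimal hedged sketch of the 
unified processor described above; the class and optimizer names come from this 
patch, but the interface, flag wiring, and method signatures are illustrative 
assumptions, not the actual implementation:
{code}
// Illustrative sketch only -- real SparkJoinOptimizer signatures differ.
public class SparkJoinOptimizerSketch {
  interface JoinConverter { boolean convert(Object joinOp); } // hypothetical

  private final JoinConverter smbOptimizer;      // SparkSortMergeJoinOptimizer
  private final JoinConverter mapJoinOptimizer;  // SparkMapJoinOptimizer
  private final boolean smbEnabled;              // hive.auto.convert.sortmerge.join
  private final boolean smbToMapJoin;    // hive.auto.convert.sortmerge.join.to.mapjoin

  public SparkJoinOptimizerSketch(JoinConverter smb, JoinConverter mj,
                                  boolean smbEnabled, boolean smbToMapJoin) {
    this.smbOptimizer = smb;
    this.mapJoinOptimizer = mj;
    this.smbEnabled = smbEnabled;
    this.smbToMapJoin = smbToMapJoin;
  }

  /** Try the conversions in order; whichever one works wins. */
  public void process(Object joinOp) {
    if (smbToMapJoin && mapJoinOptimizer.convert(joinOp)) {
      return;                      // prefer map join when the flag is on
    }
    if (smbEnabled && smbOptimizer.convert(joinOp)) {
      return;                      // otherwise try SMB join
    }
    // leave the operator as a common (shuffle) join
  }
}
{code}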


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java 
63862b9 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java a8a3d86 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkJoinOptimizer.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java
 7a716a9 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSortMergeJoinOptimizer.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 24e1460 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_1.q.out 2e35c66 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_13.q.out b2e928f 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_14.q.out 20ee657 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_15.q.out 0a48d00 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_2.q.out 5008a3f 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_3.q.out 3b081af 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_4.q.out 2a11fb2 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_5.q.out 0d971d2 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_6.q.out 9d455dc 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_7.q.out 61eb6ae 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_8.q.out 198d50d 
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_9.q.out f59e57f 

Diff: https://reviews.apache.org/r/28930/diff/


Testing
---

Most of the auto-smb tests give the same output with this change; the only 
difference is that some SMB joins now become MapJoins if 
"hive.auto.convert.sortmerge.join.to.mapjoin" is on, as expected.

One failing test is auto_sortmerge_join_9.  This was passing until yesterday, 
when bucket-map join was enabled in HIVE-8638.  As expected, by choosing 
MapJoins over SMB join if "hive.auto.convert.sortmerge.join.to.mapjoin" is on, 
the MapJoin may become a bucket-mapjoin.  Some of the more complicated queries 
of auto_sortmerge_join_9 get converted to bucket mapjoin and fail.  We can 
probably file a new JIRA to fix this test.


Thanks,

Szehon Ho



[jira] [Commented] (HIVE-9076) incompatFileSet in AbstractFileMergeOperator should be marked to skip task id check

2014-12-10 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242133#comment-14242133
 ] 

Navis commented on HIVE-9076:
-

The test always produces a file composition like the one below:
{noformat}
00_0
00_0_copy_1
00_0_copy_2
00_0_copy_3
00_0_copy_4
00_0_copy_5
{noformat}
The composition in the description is caused by a bug I introduced (I'm working 
on HIVE-8814). But I think this should be fixed first.

> incompatFileSet in AbstractFileMergeOperator should be marked to skip task id 
> check
> ---
>
> Key: HIVE-9076
> URL: https://issues.apache.org/jira/browse/HIVE-9076
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-9076.1.patch.txt
>
>
> In some file composition, AbstractFileMergeOperator removes incompatible 
> files. For example,
> {noformat}
> 00_0 (v12)
> 00_0_copy_1 (v12)
> 00_1 (v11)
> 00_1_copy_1 (v11)
> 00_1_copy_2 (v11)
> 00_2 (v12)
> {noformat}
> 00_1 (v11) will be removed because 00 is assigned to new merged file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9076) incompatFileSet in AbstractFileMergeOperator should be marked to skip task id check

2014-12-10 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242125#comment-14242125
 ] 

Prasanth Jayachandran commented on HIVE-9076:
-

[~navis] I am wondering why the test cases don't show the missing file. Any idea? 
dfs -ls in the test case still shows 4 files, which is expected.

> incompatFileSet in AbstractFileMergeOperator should be marked to skip task id 
> check
> ---
>
> Key: HIVE-9076
> URL: https://issues.apache.org/jira/browse/HIVE-9076
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-9076.1.patch.txt
>
>
> In some file composition, AbstractFileMergeOperator removes incompatible 
> files. For example,
> {noformat}
> 00_0 (v12)
> 00_0_copy_1 (v12)
> 00_1 (v11)
> 00_1_copy_1 (v11)
> 00_1_copy_2 (v11)
> 00_2 (v12)
> {noformat}
> 00_1 (v11) will be removed because 00 is assigned to new merged file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9060) Fix child operator references after NonBlockingOpDeDupProc

2014-12-10 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242124#comment-14242124
 ] 

Xuefu Zhang commented on HIVE-9060:
---

It seems there is still a duplication of 
{code}qbJoinTree.getAliasToOpInfo(){code}, but I think I can fix that when I 
commit it.

> Fix child operator references after NonBlockingOpDeDupProc
> --
>
> Key: HIVE-9060
> URL: https://issues.apache.org/jira/browse/HIVE-9060
> Project: Hive
>  Issue Type: Bug
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-9060.2.patch, HIVE-9060.patch
>
>
> The optimizer proc called 'NonBlockingOpDeDupProc' combines Sel-Sel or 
> Fil-Fil into a single operator.  However, some references to the old 
> (removed) child still remain in the optimizer context, and mess up further 
> optimizer procs.
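
For illustration, a hedged, self-contained sketch (toy types, not actual Hive 
code) of the reference fix-up the dedup needs: after merging a child operator 
into its parent, every grandchild and every shared optimizer map must stop 
pointing at the removed child.
{code}
import java.util.List;

// Toy model only -- the real operators and contexts differ.
class DeDupFixupSketch {
  interface Op {
    List<Op> getParentOperators();
    List<Op> getChildOperators();
    void setChildOperators(List<Op> children);
  }

  static void mergeChildIntoParent(Op parent, Op child,
                                   java.util.Map<String, Op> joinContext) {
    for (Op grandChild : child.getChildOperators()) {
      List<Op> parents = grandChild.getParentOperators();
      parents.set(parents.indexOf(child), parent);  // re-point at survivor
    }
    parent.setChildOperators(child.getChildOperators());
    // Drop stale references in shared maps such as joinContext.
    for (java.util.Iterator<Op> it = joinContext.values().iterator(); it.hasNext();) {
      if (it.next() == child) {
        it.remove();
      }
    }
  }
}
{code}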



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8993) Make sure Spark + HS2 work [Spark Branch]

2014-12-10 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8993:
--
   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Committed to Spark branch. Thanks, Chengxiang.

> Make sure Spark + HS2 work [Spark Branch]
> -
>
> Key: HIVE-8993
> URL: https://issues.apache.org/jira/browse/HIVE-8993
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chengxiang Li
> Fix For: spark-branch
>
> Attachments: HIVE-8993.1-spark.patch, HIVE-8993.2-spark.patch, 
> HIVE-8993.3-spark.patch
>
>
> We haven't formally tested this combination yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9060) Fix child operator references after NonBlockingOpDeDupProc

2014-12-10 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-9060:

Attachment: HIVE-9060.2.patch

Addressing review comments.

> Fix child operator references after NonBlockingOpDeDupProc
> --
>
> Key: HIVE-9060
> URL: https://issues.apache.org/jira/browse/HIVE-9060
> Project: Hive
>  Issue Type: Bug
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-9060.2.patch, HIVE-9060.patch
>
>
> The optimizer proc called 'NonBlockingOpDeDupProc' combines Sel-Sel or 
> Fil-Fil into a single operator.  However, some references to the old 
> (removed) child still remain in the optimizer context, and mess up further 
> optimizer procs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9025) join38.q (without map join) produces incorrect result when testing with multiple reducers

2014-12-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-9025:
---
Status: Patch Available  (was: Open)

> join38.q (without map join) produces incorrect result when testing with 
> multiple reducers
> -
>
> Key: HIVE-9025
> URL: https://issues.apache.org/jira/browse/HIVE-9025
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 0.14.0
>Reporter: Chao
>Assignee: Ted Xu
>Priority: Blocker
> Attachments: HIVE-9025.1.patch, HIVE-9025.patch
>
>
> I have this query from a modified version of {{join38.q}}, which does NOT use 
> map join:
> {code}
> FROM src a JOIN tmp b ON (a.key = b.col11)
> SELECT a.value, b.col5, count(1) as count
> where b.col11 = 111
> group by a.value, b.col5;
> {code}
> If I set {{mapred.reduce.tasks}} to 1, the result is correct. But, if I set 
> it to be a larger number (3 for instance), then the result will be 
> {noformat}
> val_111   105 1
> {noformat}
> which is wrong.
> I think the issue is that, for this case, ConstantPropagationProcFactory will 
> overwrite the partition cols for the reduce sink desc, with an empty list. 
> Then, later on in ReduceSinkOperator#computeHashCode, since partitionEval has 
> length 0, it will use a random number as the hashcode for each separate row. As a 
> result, rows with the same key will be distributed to different reducers, which 
> leads to an incorrect result.
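
For illustration, a hedged toy model of the hashing behavior described above 
(not the actual ReduceSinkOperator source): with zero partition columns, each 
row hashes randomly, so identical keys land on different reducers.
{code}
import java.util.Random;

// Toy model only -- real ReduceSinkOperator#computeHashCode differs.
class HashSketch {
  private final Random random = new Random();

  int computeHashCode(Object[] partitionValues) {
    if (partitionValues.length == 0) {
      return random.nextInt();        // the bug path: per-row randomness
    }
    int hash = 0;
    for (Object v : partitionValues) {
      hash = hash * 31 + (v == null ? 0 : v.hashCode());
    }
    return hash;                      // same key => same reducer
  }
}
{code}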



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9025) join38.q (without map join) produces incorrect result when testing with multiple reducers

2014-12-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-9025:
---
Status: Open  (was: Patch Available)

> join38.q (without map join) produces incorrect result when testing with 
> multiple reducers
> -
>
> Key: HIVE-9025
> URL: https://issues.apache.org/jira/browse/HIVE-9025
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 0.14.0
>Reporter: Chao
>Assignee: Ted Xu
>Priority: Blocker
> Attachments: HIVE-9025.1.patch, HIVE-9025.patch
>
>
> I have this query from a modified version of {{join38.q}}, which does NOT use 
> map join:
> {code}
> FROM src a JOIN tmp b ON (a.key = b.col11)
> SELECT a.value, b.col5, count(1) as count
> where b.col11 = 111
> group by a.value, b.col5;
> {code}
> If I set {{mapred.reduce.tasks}} to 1, the result is correct. But, if I set 
> it to be a larger number (3 for instance), then the result will be 
> {noformat}
> val_111   105 1
> {noformat}
> which is wrong.
> I think the issue is that, for this case, ConstantPropagationProcFactory will 
> overwrite the partition cols for the reduce sink desc, with an empty list. 
> Then, later on in ReduceSinkOperator#computeHashCode, since partitionEval has 
> length 0, it will use a random number as the hashcode for each separate row. As a 
> result, rows with the same key will be distributed to different reducers, which 
> leads to an incorrect result.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8993) Make sure Spark + HS2 work [Spark Branch]

2014-12-10 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242117#comment-14242117
 ] 

Xuefu Zhang commented on HIVE-8993:
---

The test failures are concerning, but I don't think they are related to this 
patch. Will commit this patch shortly.

> Make sure Spark + HS2 work [Spark Branch]
> -
>
> Key: HIVE-8993
> URL: https://issues.apache.org/jira/browse/HIVE-8993
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chengxiang Li
> Attachments: HIVE-8993.1-spark.patch, HIVE-8993.2-spark.patch, 
> HIVE-8993.3-spark.patch
>
>
> We haven't formally tested this combination yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-5538) Turn on vectorization by default.

2014-12-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242111#comment-14242111
 ] 

Sergey Shelukhin commented on HIVE-5538:


We should be thinking about eliminating some q files, because Hive tests already 
take forever, and testing the zoo of outdated configurations on all tests would 
make it even worse.

> Turn on vectorization by default.
> -
>
> Key: HIVE-5538
> URL: https://issues.apache.org/jira/browse/HIVE-5538
> Project: Hive
>  Issue Type: Task
>Reporter: Jitendra Nath Pandey
>Assignee: Matt McCline
> Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch, HIVE-5538.3.patch, 
> HIVE-5538.4.patch, HIVE-5538.5.patch, HIVE-5538.5.patch, HIVE-5538.6.patch, 
> HIVE-5538.61.patch, HIVE-5538.62.patch
>
>
>   Vectorization should be turned on by default, so that users don't have to 
> specifically enable vectorization. 
>   Vectorization code validates and ensures that a query falls back to row 
> mode if it is not supported on vectorized code path. 
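
For illustration, a minimal sketch of the validate-then-fall-back contract 
described above (toy types and method names, not the actual Hive Vectorizer):
{code}
// Illustrative sketch only -- the real validation walks the operator plan.
class VectorizationFallbackSketch {
  interface Op { boolean supportsVectorization(); }

  static boolean useVectorizedPath(boolean enabled, Iterable<Op> plan) {
    if (!enabled) {
      return false;                // user left vectorization off
    }
    for (Op op : plan) {
      if (!op.supportsVectorization()) {
        return false;              // any unsupported operator => row mode
      }
    }
    return true;                   // whole plan can run vectorized
  }
}
{code}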



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8993) Make sure Spark + HS2 work [Spark Branch]

2014-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242108#comment-14242108
 ] 

Hive QA commented on HIVE-8993:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12686425/HIVE-8993.3-spark.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 7255 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join29
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_outer_join5
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/514/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/514/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-514/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12686425 - PreCommit-HIVE-SPARK-Build

> Make sure Spark + HS2 work [Spark Branch]
> -
>
> Key: HIVE-8993
> URL: https://issues.apache.org/jira/browse/HIVE-8993
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chengxiang Li
> Attachments: HIVE-8993.1-spark.patch, HIVE-8993.2-spark.patch, 
> HIVE-8993.3-spark.patch
>
>
> We haven't formally tested this combination yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9067) OrcFileMergeOperator may create merge file that does not match properties of input files

2014-12-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242107#comment-14242107
 ] 

Sergey Shelukhin commented on HIVE-9067:


+1 if tests pass

> OrcFileMergeOperator may create merge file that does not match properties of 
> input files
> 
>
> Key: HIVE-9067
> URL: https://issues.apache.org/jira/browse/HIVE-9067
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 0.15.0, 0.14.1
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
>  Labels: Orc
> Attachments: HIVE-9067.1.patch
>
>
> OrcFileMergeOperator creates a new ORC file and appends the stripes from 
> smaller orc file. This new ORC file creation should retain the same 
> configuration as the small ORC files. Currently it does not set the orc row 
> index stride and file version.
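
For illustration, a hedged, self-contained sketch of the settings that need to 
carry over (toy types, not the ORC writer API):
{code}
// Toy model only -- real ORC writer options are configured differently.
class OrcMergeSketch {
  static final class FileProps {
    final int rowIndexStride;
    final String fileVersion;
    FileProps(int stride, String version) {
      this.rowIndexStride = stride;
      this.fileVersion = version;
    }
  }

  /** Writer settings for the merge output, copied from the first input. */
  static FileProps mergedWriterProps(FileProps firstInput) {
    // The bug under review amounts to returning defaults here instead of
    // copying, so the merged file loses the inputs' stride and version.
    return new FileProps(firstInput.rowIndexStride, firstInput.fileVersion);
  }
}
{code}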



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8913) Make SparkMapJoinResolver handle runtime skew join [Spark Branch]

2014-12-10 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242104#comment-14242104
 ] 

Xuefu Zhang commented on HIVE-8913:
---

{quote}
Just quick thought, maybe IOContext.inputNameIOContextMap should be a 
concurrent hash map?
{quote}
This seems reasonable to try at least.

> Make SparkMapJoinResolver handle runtime skew join [Spark Branch]
> -
>
> Key: HIVE-8913
> URL: https://issues.apache.org/jira/browse/HIVE-8913
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-8913.1-spark.patch, HIVE-8913.2-spark.patch
>
>
> Sub-task of HIVE-8406.
> Now we have {{SparkMapJoinResolver}} in place. But at the moment, it doesn't 
> handle the map join task created by upstream SkewJoinResolver, i.e. those 
> wrapped in a ConditionalTask. We have to implement this part for runtime skew 
> join to work on spark. To do so, we can borrow logic from {{MapJoinResolver}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8913) Make SparkMapJoinResolver handle runtime skew join [Spark Branch]

2014-12-10 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242100#comment-14242100
 ] 

Xuefu Zhang commented on HIVE-8913:
---

Yes, it's possible, but we already have that in SparkMapRecordHandler.java#210.

> Make SparkMapJoinResolver handle runtime skew join [Spark Branch]
> -
>
> Key: HIVE-8913
> URL: https://issues.apache.org/jira/browse/HIVE-8913
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-8913.1-spark.patch, HIVE-8913.2-spark.patch
>
>
> Sub-task of HIVE-8406.
> Now we have {{SparkMapJoinResolver}} in place. But at the moment, it doesn't 
> handle the map join task created by upstream SkewJoinResolver, i.e. those 
> wrapped in a ConditionalTask. We have to implement this part for runtime skew 
> join to work on spark. To do so, we can borrow logic from {{MapJoinResolver}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9055) Tez: union all followed by group by followed by another union all gives error

2014-12-10 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-9055:
-
Attachment: HIVE-9055.WIP.patch

> Tez: union all followed by group by followed by another union all gives error
> -
>
> Key: HIVE-9055
> URL: https://issues.apache.org/jira/browse/HIVE-9055
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Vikram Dixit K
> Attachments: HIVE-9055.WIP.patch
>
>
> Here is the way to produce it:
> in Hive q test setting (with src table)
> set hive.execution.engine=tez;
> select key from 
> (
> select key from src
> union all 
> select key from src
> ) tab group by key
> union all
> select key from src;
> will give you
> ERROR
> 2014-12-09 11:38:48,316 ERROR ql.Driver (SessionState.java:printError(834)) - 
> FAILED: IndexOutOfBoundsException Index: -1, Size: 1
> java.lang.IndexOutOfBoundsException: Index: -1, Size: 1
> at java.util.LinkedList.checkElementIndex(LinkedList.java:553)
> at java.util.LinkedList.get(LinkedList.java:474)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:354)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:87)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:69)
> at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:368)
> at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:202)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:834)
> at 
> org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:136)
> at 
> org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_uniontez(TestMiniTezCliDriver.java:120)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> btw: there is no problem when it is run with MR



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9076) incompatFileSet in AbstractFileMergeOperator should be marked to skip task id check

2014-12-10 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242095#comment-14242095
 ] 

Prasanth Jayachandran commented on HIVE-9076:
-

Thanks [~navis] for the logs and the patch! The patch mostly looks good except 
for the main() method in Utilities.java. +1 pending tests.

> incompatFileSet in AbstractFileMergeOperator should be marked to skip task id 
> check
> ---
>
> Key: HIVE-9076
> URL: https://issues.apache.org/jira/browse/HIVE-9076
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-9076.1.patch.txt
>
>
> In some file composition, AbstractFileMergeOperator removes incompatible 
> files. For example,
> {noformat}
> 00_0 (v12)
> 00_0_copy_1 (v12)
> 00_1 (v11)
> 00_1_copy_1 (v11)
> 00_1_copy_2 (v11)
> 00_2 (v12)
> {noformat}
> 00_1 (v11) will be removed because 00 is assigned to new merged file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9035) CBO: Disable PPD when functions are non-deterministic (ppd_random.q - non-deterministic udf rand() pushed above join)

2014-12-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242094#comment-14242094
 ] 

Sergey Shelukhin commented on HIVE-9035:


can you post RB?

> CBO: Disable PPD when functions are non-deterministic (ppd_random.q  - 
> non-deterministic udf rand() pushed above join)
> --
>
> Key: HIVE-9035
> URL: https://issues.apache.org/jira/browse/HIVE-9035
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Laljo John Pullokkaran
> Fix For: 0.15.0
>
> Attachments: HIVE-9035.patch
>
>
> Not clear if this is a problem. If it is, the issue is probably in Optiq: does it 
> know the UDF is non-deterministic? We could disable Optiq for such UDFs if 
> all else fails.
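
For reference, a hedged sketch of the kind of guard PPD needs here (toy types; 
in Hive the determinism fact comes from the UDFType annotation, surfaced via 
FunctionRegistry):
{code}
// Toy sketch: a pushdown pass must leave predicates containing
// non-deterministic functions (e.g. rand()) where they are, since moving
// them above or below a join changes how often they are evaluated.
class PpdGuardSketch {
  interface Expr { boolean isDeterministic(); Iterable<Expr> children(); }

  static boolean safeToPush(Expr predicate) {
    if (!predicate.isDeterministic()) {
      return false;                 // e.g. rand(): do not push
    }
    for (Expr child : predicate.children()) {
      if (!safeToPush(child)) {
        return false;               // any non-deterministic subexpression
      }
    }
    return true;
  }
}
{code}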



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9055) Tez: union all followed by group by followed by another union all gives error

2014-12-10 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-9055:
-
Status: Patch Available  (was: Open)

> Tez: union all followed by group by followed by another union all gives error
> -
>
> Key: HIVE-9055
> URL: https://issues.apache.org/jira/browse/HIVE-9055
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Vikram Dixit K
> Attachments: HIVE-9055.WIP.patch
>
>
> Here is the way to produce it:
> in Hive q test setting (with src table)
> set hive.execution.engine=tez;
> select key from 
> (
> select key from src
> union all 
> select key from src
> ) tab group by key
> union all
> select key from src;
> will give you
> ERROR
> 2014-12-09 11:38:48,316 ERROR ql.Driver (SessionState.java:printError(834)) - 
> FAILED: IndexOutOfBoundsException Index: -1, Size: 1
> java.lang.IndexOutOfBoundsException: Index: -1, Size: 1
> at java.util.LinkedList.checkElementIndex(LinkedList.java:553)
> at java.util.LinkedList.get(LinkedList.java:474)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:354)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:87)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:103)
> at 
> org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:69)
> at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:368)
> at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:202)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:834)
> at 
> org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:136)
> at 
> org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_uniontez(TestMiniTezCliDriver.java:120)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> btw: there is no problem when it is run with MR



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8913) Make SparkMapJoinResolver handle runtime skew join [Spark Branch]

2014-12-10 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242087#comment-14242087
 ] 

Rui Li commented on HIVE-8913:
--

Another possible issue is that we have to make sure the threadlocal is properly 
cleared before the thread is returned to the thread pool.
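
A hedged sketch of the cleanup pattern being suggested (names are illustrative, 
not actual Hive code): clear the ThreadLocal in a finally block so a pooled 
executor thread never carries stale state into its next task.
{code}
// Illustrative only -- the real state lives in Hive's IOContext.
class ThreadLocalCleanupSketch {
  private static final ThreadLocal<Object> IO_CONTEXT = new ThreadLocal<Object>();

  void runTask(Runnable body) {
    try {
      body.run();
    } finally {
      IO_CONTEXT.remove();  // must happen before the thread is reused
    }
  }
}
{code}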

> Make SparkMapJoinResolver handle runtime skew join [Spark Branch]
> -
>
> Key: HIVE-8913
> URL: https://issues.apache.org/jira/browse/HIVE-8913
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-8913.1-spark.patch, HIVE-8913.2-spark.patch
>
>
> Sub-task of HIVE-8406.
> Now we have {{SparkMapJoinResolver}} in place. But at the moment, it doesn't 
> handle the map join task created by upstream SkewJoinResolver, i.e. those 
> wrapped in a ConditionalTask. We have to implement this part for runtime skew 
> join to work on spark. To do so, we can borrow logic from {{MapJoinResolver}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9076) incompatFileSet in AbstractFileMergeOperator should be marked to skip task id check

2014-12-10 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-9076:

Status: Patch Available  (was: Open)

> incompatFileSet in AbstractFileMergeOperator should be marked to skip task id 
> check
> ---
>
> Key: HIVE-9076
> URL: https://issues.apache.org/jira/browse/HIVE-9076
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-9076.1.patch.txt
>
>
> In some file composition, AbstractFileMergeOperator removes incompatible 
> files. For example,
> {noformat}
> 00_0 (v12)
> 00_0_copy_1 (v12)
> 00_1 (v11)
> 00_1_copy_1 (v11)
> 00_1_copy_2 (v11)
> 00_2 (v12)
> {noformat}
> 00_1 (v11) will be removed because 00 is assigned to new merged file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9076) incompatFileSet in AbstractFileMergeOperator should be marked to skip task id check

2014-12-10 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-9076:

Attachment: HIVE-9076.1.patch.txt

> incompatFileSet in AbstractFileMergeOperator should be marked to skip task id 
> check
> ---
>
> Key: HIVE-9076
> URL: https://issues.apache.org/jira/browse/HIVE-9076
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-9076.1.patch.txt
>
>
> In some file composition, AbstractFileMergeOperator removes incompatible 
> files. For example,
> {noformat}
> 00_0 (v12)
> 00_0_copy_1 (v12)
> 00_1 (v11)
> 00_1_copy_1 (v11)
> 00_1_copy_2 (v11)
> 00_2 (v12)
> {noformat}
> 00_1 (v11) will be removed because 00 is assigned to new merged file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9076) incompatFileSet in AbstractFileMergeOperator should be marked to skip task id check

2014-12-10 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242082#comment-14242082
 ] 

Navis commented on HIVE-9076:
-

Result
{noformat}
hive (default)> dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orc_merge5b/;
Found 3 items
-rwxr-xr-x   1 navis supergroup   1203 2014-12-11 12:42 
/user/hive/warehouse/orc_merge5b/00_0
-rwxr-xr-x   1 navis supergroup600 2014-12-11 12:33 
/user/hive/warehouse/orc_merge5b/00_1_copy1
-rwxr-xr-x   1 navis supergroup600 2014-12-11 12:33 
/user/hive/warehouse/orc_merge5b/00_1_copy2
{noformat}

log
{noformat}
2014-12-11 12:42:14,843 INFO  exec.Task (SessionState.java:printInfo(825)) - 
Starting Job = job_1418171849140_0008, Tracking URL = 
http://localhost:8088/proxy/application_1418171849140_0008/
2014-12-11 12:42:14,843 INFO  exec.Task (SessionState.java:printInfo(825)) - 
Kill Command = /home/navis/hadoop-0.20/bin/hadoop job  -kill 
job_1418171849140_0008
2014-12-11 12:42:17,010 INFO  exec.Task (SessionState.java:printInfo(825)) - 
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2014-12-11 12:42:17,064 WARN  mapreduce.Counters 
(AbstractCounters.java:getGroup(234)) - Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
2014-12-11 12:42:17,065 INFO  exec.Task (SessionState.java:printInfo(825)) - 
2014-12-11 12:42:17,060 null map = 0%,  reduce = 0%
2014-12-11 12:42:21,200 WARN  mapreduce.Counters 
(AbstractCounters.java:getGroup(234)) - Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
2014-12-11 12:42:21,201 INFO  exec.Task (SessionState.java:printInfo(825)) - 
2014-12-11 12:42:21,200 null map = 100%,  reduce = 0%
2014-12-11 12:42:21,206 WARN  mapreduce.Counters 
(AbstractCounters.java:getGroup(234)) - Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
2014-12-11 12:42:21,210 INFO  exec.Task (SessionState.java:printInfo(825)) - 
Ended Job = job_1418171849140_0008
2014-12-11 12:42:21,224 WARN  exec.Utilities 
(Utilities.java:removeTempOrDuplicateFiles(1977)) - Duplicate taskid file 
removed: 
hdfs://localhost:9000/tmp/hive/navis/80443669-b835-4a09-a73f-8b783d104f61/hive_2014-12-11_12-42-12_963_1945093075009681620-1/_tmp.-ext-1/00_1
 with length 600. Existing file: 
hdfs://localhost:9000/tmp/hive/navis/80443669-b835-4a09-a73f-8b783d104f61/hive_2014-12-11_12-42-12_963_1945093075009681620-1/_tmp.-ext-1/00_0
 with length 1203
2014-12-11 12:42:21,225 INFO  exec.AbstractFileMergeOperator 
(Utilities.java:mvFileToFinalPath(1805)) - Moving tmp dir: 
hdfs://localhost:9000/tmp/hive/navis/80443669-b835-4a09-a73f-8b783d104f61/hive_2014-12-11_12-42-12_963_1945093075009681620-1/_tmp.-ext-1
 to: 
hdfs://localhost:9000/tmp/hive/navis/80443669-b835-4a09-a73f-8b783d104f61/hive_2014-12-11_12-42-12_963_1945093075009681620-1/-ext-1
2014-12-11 12:42:21,233 INFO  exec.AbstractFileMergeOperator 
(AbstractFileMergeOperator.java:jobCloseOp(247)) - jobCloseOp moved merged 
files to output dir: 
hdfs://localhost:9000/tmp/hive/navis/80443669-b835-4a09-a73f-8b783d104f61/hive_2014-12-11_12-42-12_963_1945093075009681620-1/-ext-1
{noformat}

> incompatFileSet in AbstractFileMergeOperator should be marked to skip task id 
> check
> ---
>
> Key: HIVE-9076
> URL: https://issues.apache.org/jira/browse/HIVE-9076
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>
> In some file composition, AbstractFileMergeOperator removes incompatible 
> files. For example,
> {noformat}
> 00_0 (v12)
> 00_0_copy_1 (v12)
> 00_1 (v11)
> 00_1_copy_1 (v11)
> 00_1_copy_2 (v11)
> 00_2 (v12)
> {noformat}
> 00_1 (v11) will be removed because 00 is assigned to new merged file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8913) Make SparkMapJoinResolver handle runtime skew join [Spark Branch]

2014-12-10 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242076#comment-14242076
 ] 

Chao commented on HIVE-8913:


(cc [~jxiang]) Not sure - maybe we can try that.
The interesting thing is: the copying of IOContext is supposed to happen only in 
the RDD caching case, but this test doesn't even have caching, so in 
{{SparkMapRecordHandler}}, the input path should not be null in the first 
place...


> Make SparkMapJoinResolver handle runtime skew join [Spark Branch]
> -
>
> Key: HIVE-8913
> URL: https://issues.apache.org/jira/browse/HIVE-8913
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-8913.1-spark.patch, HIVE-8913.2-spark.patch
>
>
> Sub-task of HIVE-8406.
> Now we have {{SparkMapJoinResolver}} in place. But at the moment, it doesn't 
> handle the map join task created by upstream SkewJoinResolver, i.e. those 
> wrapped in a ConditionalTask. We have to implement this part for runtime skew 
> join to work on spark. To do so, we can borrow logic from {{MapJoinResolver}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8993) Make sure Spark + HS2 work [Spark Branch]

2014-12-10 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242050#comment-14242050
 ] 

Brock Noland commented on HIVE-8993:


+1 pending tests

> Make sure Spark + HS2 work [Spark Branch]
> -
>
> Key: HIVE-8993
> URL: https://issues.apache.org/jira/browse/HIVE-8993
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chengxiang Li
> Attachments: HIVE-8993.1-spark.patch, HIVE-8993.2-spark.patch, 
> HIVE-8993.3-spark.patch
>
>
> We haven't formally tested this combination yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9076) incompatFileSet in AbstractFileMergeOperator should be marked to skip task id check

2014-12-10 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242045#comment-14242045
 ] 

Prasanth Jayachandran commented on HIVE-9076:
-

[~navis] The log file should tell which files are renamed, moved, and merged. 
Would it be possible to get the logs from OrcFileMergeOperator, 
AbstractFileMergeOperator and MergeFileMapper?

> incompatFileSet in AbstractFileMergeOperator should be marked to skip task id 
> check
> ---
>
> Key: HIVE-9076
> URL: https://issues.apache.org/jira/browse/HIVE-9076
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>
> In some file composition, AbstractFileMergeOperator removes incompatible 
> files. For example,
> {noformat}
> 00_0 (v12)
> 00_0_copy_1 (v12)
> 00_1 (v11)
> 00_1_copy_1 (v11)
> 00_1_copy_2 (v11)
> 00_2 (v12)
> {noformat}
> 00_1 (v11) will be removed because 00 is assigned to new merged file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 28894: HIVE-8993 Make sure HoS works with HS2[Spark Branch]

2014-12-10 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28894/
---

(Updated Dec. 11, 2014, 2:49 a.m.)


Review request for hive and Xuefu Zhang.


Changes
---

Move the spark download script to itests/pom.xml, which would be used by both 
hive-unit and qtest-spark, so that we only need to maintain one copy of the script.


Bugs: HIVE-8993
https://issues.apache.org/jira/browse/HIVE-8993


Repository: hive-git


Description
---

Enable a unit test which would verify Hive on Spark works well with HiveServer2.
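
For context, a hedged sketch of the kind of end-to-end check this enables (the 
JDBC URL, credentials, and query are assumptions; the actual 
TestJdbcWithLocalClusterSpark in the diff wires up a local cluster instead):
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Illustrative only: connect to HS2 over JDBC, switch the execution
// engine to spark, and run a query end to end.
public class HiveOnSparkJdbcSketch {
  public static void main(String[] args) throws Exception {
    Connection conn = DriverManager.getConnection(
        "jdbc:hive2://localhost:10000/default", "hive", "");
    Statement stmt = conn.createStatement();
    stmt.execute("set hive.execution.engine=spark");
    ResultSet rs = stmt.executeQuery("select count(*) from src");
    while (rs.next()) {
      System.out.println(rs.getLong(1));
    }
    conn.close();
  }
}
{code}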


Diffs (updated)
-

  itests/.gitignore PRE-CREATION 
  itests/hive-unit/pom.xml f9f59c9 
  
itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithLocalClusterSpark.java
 PRE-CREATION 
  itests/pom.xml 53f6c98 
  itests/qtest-spark/.gitignore c2ed135 
  itests/qtest-spark/pom.xml 1b02be5 
  pom.xml 5d03641 

Diff: https://reviews.apache.org/r/28894/diff/


Testing
---


Thanks,

chengxiang li



[jira] [Updated] (HIVE-8993) Make sure Spark + HS2 work [Spark Branch]

2014-12-10 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-8993:

Attachment: HIVE-8993.3-spark.patch

Move the spark download script to itests/pom.xml, which would be used by both 
hive-unit and qtest-spark, so that we only need to maintain one copy of the script.

> Make sure Spark + HS2 work [Spark Branch]
> -
>
> Key: HIVE-8993
> URL: https://issues.apache.org/jira/browse/HIVE-8993
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chengxiang Li
> Attachments: HIVE-8993.1-spark.patch, HIVE-8993.2-spark.patch, 
> HIVE-8993.3-spark.patch
>
>
> We haven't formally tested this combination yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9076) incompatFileSet in AbstractFileMergeOperator should be marked to skip task id check

2014-12-10 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242044#comment-14242044
 ] 

Prasanth Jayachandran commented on HIVE-9076:
-

What is the resultant file set after merging?
I would expect either
{code}
00_0 (v12) (all v12 files merged together)
00_1 (v11)
00_1_copy_1 (v11)
00_1_copy_2 (v11)
{code}
or
{code}
00_0 (v11) (all v11 files merged together)
00_0_copy_1 (v12) (this is the actual 00_0 file, renamed to 
00_0_copy_1 as the merge output file has the same name)
00_0_copy_2 (v12) (this is the actual 00_0_copy_1 file, renamed to 
00_0_copy_2 as 00_0_copy_1 already exists)
00_2 (v12)
{code}

A different order is also possible, as the incompatFileSet iteration order might 
differ.

> incompatFileSet in AbstractFileMergeOperator should be marked to skip task id 
> check
> ---
>
> Key: HIVE-9076
> URL: https://issues.apache.org/jira/browse/HIVE-9076
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>
> In some file composition, AbstractFileMergeOperator removes incompatible 
> files. For example,
> {noformat}
> 00_0 (v12)
> 00_0_copy_1 (v12)
> 00_1 (v11)
> 00_1_copy_1 (v11)
> 00_1_copy_2 (v11)
> 00_2 (v12)
> {noformat}
> 00_1 (v11) will be removed because 00 is assigned to new merged file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 28372: HIVE-8950: Add support in ParquetHiveSerde to create table schema from a parquet file

2014-12-10 Thread Brock Noland

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28372/#review64671
---


This looks great!! I have a few comments below.


ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ParquetSchemaReader.java


We should add a message such as 

"Error reading footer from: " + parquetFilePath



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ParquetToHiveSchemaConverter.java


Isn't this a timestamp?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ParquetToHiveSchemaConverter.java


please comment on why we are not handling MAP_KEY_VALUE



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ParquetToHiveSchemaConverter.java


Why aren't we handling UTF8?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java


constants should be final



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java


LOG should be at the top of this class



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java


this should be error



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java


We should not printStackTrace. Please make the log statement:

LOG.error("Unable to read file from " + loc + ": " + e, e);



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java


I don't think this warrants a warning. It's legal to specify a location for an 
internal table.


- Brock Noland


On Dec. 10, 2014, 8:34 p.m., Ashish Singh wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28372/
> ---
> 
> (Updated Dec. 10, 2014, 8:34 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-8950
> https://issues.apache.org/jira/browse/HIVE-8950
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-8950: Add support in ParquetHiveSerde to create table schema from a 
> parquet file
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> fafd78e63e9b41c9fdb0e017b567dc719d151784 
>   
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 
> 32186391e7e4cfc9b4d06d7376663e82ec08d9e6 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
> b137fcb86e27ac91ed3c733b4d8788228d379a09 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 
> 2db2658fbc57fba01c892c9213baef6c498e659b 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ParquetSchemaReader.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ParquetToHiveSchemaConverter.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 
> 4effe736fcf9d3715f03eed9885c299a7aa040dd 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 
> cd3d349e8bd8785d6cadaf9ed8fa7598f223774a 
>   
> ql/src/test/queries/clientpositive/parquet_array_of_multi_field_struct_gen_schema.q
>  PRE-CREATION 
>   
> ql/src/test/queries/clientpositive/parquet_array_of_optional_elements_gen_schema.q
>  PRE-CREATION 
>   
> ql/src/test/queries/clientpositive/parquet_array_of_required_elements_gen_schema.q
>  PRE-CREATION 
>   
> ql/src/test/queries/clientpositive/parquet_array_of_single_field_struct_gen_schema.q
>  PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_array_of_structs_gen_schema.q 
> PRE-CREATION 
>   
> ql/src/test/queries/clientpositive/parquet_array_of_structs_gen_schema_ext.q 
> PRE-CREATION 
>   
> ql/src/test/queries/clientpositive/parquet_array_of_unannotated_groups_gen_schema.q
>  PRE-CREATION 
>   
> ql/src/test/queries/clientpositive/parquet_array_of_unannotated_primitives_gen_schema.q
>  PRE-CREATION 
>   
> ql/src/test/queries/clientpositive/parquet_avro_array_of_primitives_gen_schema.q
>  PRE-CREATION 
>   
> ql/src/test/queries/clientpositive/parquet_avro_array_of_single_field_struct_gen_schema.q
>  PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_decimal_gen_schema.q 
> PRE-CREATION 
>   
> ql/src/test/queries/clientpositive/parquet_thrift_array_of_primitives_gen_schema.q
>  PRE-CREATION 
>   
> ql/src/test/queries/clientpositive/parquet_thrift_array_of_single_field_struct_gen_schema.q
>  PRE-CREATION 
>   ql/src/test/results/cl

[jira] [Commented] (HIVE-8913) Make SparkMapJoinResolver handle runtime skew join [Spark Branch]

2014-12-10 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242035#comment-14242035
 ] 

Rui Li commented on HIVE-8913:
--

Just quick thought, maybe {{IOContext.inputNameIOContextMap}} should be a 
concurrent hash map?
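
A hedged sketch of the suggested change (toy value type; the real map lives 
inside IOContext): a ConcurrentHashMap lets concurrent Spark task threads in 
one executor register their per-input contexts safely.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only -- not the actual IOContext source.
class IOContextMapSketch {
  private static final Map<String, Object> inputNameIOContextMap =
      new ConcurrentHashMap<String, Object>();

  static Object getOrCreate(String inputName) {
    Object ctx = inputNameIOContextMap.get(inputName);
    if (ctx == null) {
      // putIfAbsent keeps two racing threads from clobbering each other.
      inputNameIOContextMap.putIfAbsent(inputName, new Object());
      ctx = inputNameIOContextMap.get(inputName);
    }
    return ctx;
  }
}
{code}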

> Make SparkMapJoinResolver handle runtime skew join [Spark Branch]
> -
>
> Key: HIVE-8913
> URL: https://issues.apache.org/jira/browse/HIVE-8913
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-8913.1-spark.patch, HIVE-8913.2-spark.patch
>
>
> Sub-task of HIVE-8406.
> Now we have {{SparkMapJoinResolver}} in place. But at the moment, it doesn't 
> handle the map join task created by upstream SkewJoinResolver, i.e. those 
> wrapped in a ConditionalTask. We have to implement this part for runtime skew 
> join to work on spark. To do so, we can borrow logic from {{MapJoinResolver}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8913) Make SparkMapJoinResolver handle runtime skew join [Spark Branch]

2014-12-10 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242026#comment-14242026
 ] 

Chao commented on HIVE-8913:


Looks like there's still some concurrency issue related to IOContext:

{noformat}
java.lang.RuntimeException: Map operator initialization failed
at 
org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:136)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:54)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:29)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:167)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:167)
at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:601)
at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:601)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.io.IOContext.copy(IOContext.java:119)
at 
org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:97)
... 16 more
{noformat}

I suspect they are the same issue as the one tracked by HIVE-8578.

> Make SparkMapJoinResolver handle runtime skew join [Spark Branch]
> -
>
> Key: HIVE-8913
> URL: https://issues.apache.org/jira/browse/HIVE-8913
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-8913.1-spark.patch, HIVE-8913.2-spark.patch
>
>
> Sub-task of HIVE-8406.
> Now we have {{SparkMapJoinResolver}} in place. But at the moment, it doesn't 
> handle the map join task created by upstream SkewJoinResolver, i.e. those 
> wrapped in a ConditionalTask. We have to implement this part for runtime skew 
> join to work on spark. To do so, we can borrow logic from {{MapJoinResolver}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9076) incompatFileSet in AbstractFileMergeOperator should be marked to skip task id check

2014-12-10 Thread Navis (JIRA)
Navis created HIVE-9076:
---

 Summary: incompatFileSet in AbstractFileMergeOperator should be 
marked to skip task id check
 Key: HIVE-9076
 URL: https://issues.apache.org/jira/browse/HIVE-9076
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor


In some file composition, AbstractFileMergeOperator removes incompatible files. 
For example,
{noformat}
00_0 (v12)
00_0_copy_1 (v12)
00_1 (v11)
00_1_copy_1 (v11)
00_1_copy_2 (v11)
00_2 (v12)
{noformat}

00_1 (v11) will be removed because 00 is assigned to new merged file.
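
The failure mode can be illustrated with a toy model of the duplicate-removal 
step (assumed behavior based on this description and the "Duplicate taskid file 
removed" log elsewhere in this thread; not the actual Utilities code): files 
are keyed by task id and only the largest survives, so once the merged output 
claims task id 00, the incompatible 00_1 file looks like a smaller duplicate 
and is deleted.
{code}
import java.util.HashMap;
import java.util.Map;

// Toy model only -- real Utilities.removeTempOrDuplicateFiles differs.
class DuplicateTaskIdSketch {
  static final class F {
    final String name; final long len;
    F(String name, long len) { this.name = name; this.len = len; }
  }

  static Map<String, F> keepLargestPerTaskId(Iterable<F> files) {
    Map<String, F> best = new HashMap<String, F>();
    for (F f : files) {
      String taskId = f.name.split("_")[0];  // crude task-id extraction
      F prev = best.get(taskId);
      if (prev == null || f.len > prev.len) {
        best.put(taskId, f);                 // smaller "duplicate" is removed
      }
    }
    return best;
  }
}
{code}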



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8913) Make SparkMapJoinResolver handle runtime skew join [Spark Branch]

2014-12-10 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242020#comment-14242020
 ] 

Rui Li commented on HIVE-8913:
--

I think the failure is not related, because the test passes on my machine and 
doesn't even involve a join.

> Make SparkMapJoinResolver handle runtime skew join [Spark Branch]
> -
>
> Key: HIVE-8913
> URL: https://issues.apache.org/jira/browse/HIVE-8913
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-8913.1-spark.patch, HIVE-8913.2-spark.patch
>
>
> Sub-task of HIVE-8406.
> Now we have {{SparkMapJoinResolver}} in place. But at the moment, it doesn't 
> handle the map join tasks created by the upstream SkewJoinResolver, i.e. those 
> wrapped in a ConditionalTask. We have to implement this part for runtime skew 
> join to work on Spark. To do so, we can borrow logic from {{MapJoinResolver}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8900) Create encryption testing framework

2014-12-10 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8900:
---
   Resolution: Fixed
Fix Version/s: encryption-branch
   Status: Resolved  (was: Patch Available)

> Create encryption testing framework
> ---
>
> Key: HIVE-8900
> URL: https://issues.apache.org/jira/browse/HIVE-8900
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Ferdinand Xu
> Fix For: encryption-branch
>
> Attachments: HIVE-8065.1.patch, HIVE-8065.patch, HIVE-8900.1.patch
>
>
> As [mentioned by 
> Alan|https://issues.apache.org/jira/browse/HIVE-8821?focusedCommentId=14215318&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14215318]
>  we already have some q-file tests which fit our needs.
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8900) Create encryption testing framework

2014-12-10 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242013#comment-14242013
 ] 

Brock Noland commented on HIVE-8900:


+1

Thank you [~Ferd] and [~spena]! I have committed this to branch.

> Create encryption testing framework
> ---
>
> Key: HIVE-8900
> URL: https://issues.apache.org/jira/browse/HIVE-8900
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Ferdinand Xu
> Attachments: HIVE-8065.1.patch, HIVE-8065.patch, HIVE-8900.1.patch
>
>
> As [mentioned by 
> Alan|https://issues.apache.org/jira/browse/HIVE-8821?focusedCommentId=14215318&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14215318]
>  we already have some q-file tests which fit our needs.
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9075) Allow RPC Configuration [Spark Branch]

2014-12-10 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-9075:
---
Affects Version/s: spark-branch
  Summary: Allow RPC Configuration [Spark Branch]  (was: Allow RPC 
Configuration)

> Allow RPC Configuration [Spark Branch]
> --
>
> Key: HIVE-9075
> URL: https://issues.apache.org/jira/browse/HIVE-9075
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: spark-branch
>Reporter: Brock Noland
>
> [~vanzin] has a bunch of nice config properties in RpcConfiguration:
> https://github.com/apache/hive/blob/spark/spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java#L68
> However, we only load config properties whose names start with "spark.":
> https://github.com/apache/hive/blob/spark/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java#L102
> thus it's not possible to set these on the server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9075) Allow RPC Configuration

2014-12-10 Thread Brock Noland (JIRA)
Brock Noland created HIVE-9075:
--

 Summary: Allow RPC Configuration
 Key: HIVE-9075
 URL: https://issues.apache.org/jira/browse/HIVE-9075
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland


[~vanzin] has a bunch of nice config properties in RpcConfiguration:

https://github.com/apache/hive/blob/spark/spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java#L68

However, we only load config properties whose names start with "spark.":

https://github.com/apache/hive/blob/spark/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java#L102

thus it's not possible to set these on the server.
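
A minimal sketch of the prefix filter being described; the RPC key name shown 
is a placeholder (the real names live in RpcConfiguration):
{code}
import java.util.HashMap;
import java.util.Map;

public class PrefixFilterSketch {
  static Map<String, String> loadSparkProperties(Map<String, String> hiveConf) {
    Map<String, String> sparkConf = new HashMap<String, String>();
    for (Map.Entry<String, String> e : hiveConf.entrySet()) {
      // Only "spark."-prefixed keys survive, so an RPC key such as the
      // hypothetical "hive.spark.client.connect.timeout" never reaches
      // the remote driver unless this filter is extended.
      if (e.getKey().startsWith("spark.")) {
        sparkConf.put(e.getKey(), e.getValue());
      }
    }
    return sparkConf;
  }
}
{code}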



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8993) Make sure Spark + HS2 work [Spark Branch]

2014-12-10 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242011#comment-14242011
 ] 

Brock Noland commented on HIVE-8993:


That sounds good!

> Make sure Spark + HS2 work [Spark Branch]
> -
>
> Key: HIVE-8993
> URL: https://issues.apache.org/jira/browse/HIVE-8993
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chengxiang Li
> Attachments: HIVE-8993.1-spark.patch, HIVE-8993.2-spark.patch
>
>
> We haven't formally tested this combination yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9025) join38.q (without map join) produces incorrect result when testing with multiple reducers

2014-12-10 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242006#comment-14242006
 ] 

Vikram Dixit K commented on HIVE-9025:
--

+1 for 0.14

> join38.q (without map join) produces incorrect result when testing with 
> multiple reducers
> -
>
> Key: HIVE-9025
> URL: https://issues.apache.org/jira/browse/HIVE-9025
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 0.14.0
>Reporter: Chao
>Assignee: Ted Xu
>Priority: Blocker
> Attachments: HIVE-9025.1.patch, HIVE-9025.patch
>
>
> I have this query from a modified version of {{join38.q}}, which does NOT use 
> map join:
> {code}
> FROM src a JOIN tmp b ON (a.key = b.col11)
> SELECT a.value, b.col5, count(1) as count
> where b.col11 = 111
> group by a.value, b.col5;
> {code}
> If I set {{mapred.reduce.tasks}} to 1, the result is correct. But if I set 
> it to a larger number (3 for instance), then the result will be 
> {noformat}
> val_111   105 1
> {noformat}
> which is wrong.
> I think the issue is that, for this case, ConstantPropagationProcFactory will 
> overwrite the partition cols for the reduce sink desc with an empty list. 
> Then, later on in ReduceSinkOperator#computeHashCode, since partitionEval has 
> length 0, it will use a random number as the hashcode for each row. As a 
> result, rows with the same key will be distributed to different reducers, 
> which leads to incorrect results.
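
For illustration, a minimal mirror of the behavior described above (the method 
shape is assumed, not copied from ReduceSinkOperator): with an empty partition 
expression list, equal keys no longer hash equally.
{code}
import java.util.Random;

public class EmptyPartitionHashSketch {
  private static final Random RANDOM = new Random();

  static int computeHashCode(Object[] partitionValues) {
    if (partitionValues.length == 0) {
      // No partition columns left after constant propagation: every row
      // gets an independent random hash, scattering identical keys
      // across reducers.
      return RANDOM.nextInt();
    }
    int hash = 0;
    for (Object v : partitionValues) {
      hash = hash * 31 + (v == null ? 0 : v.hashCode());
    }
    return hash;
  }
}
{code}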



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9001) Ship with log4j.properties file that has a reliable time based rolling policy

2014-12-10 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-9001:
-
Labels:   (was: TODOC15)

> Ship with log4j.properties file that has a reliable time based rolling policy
> -
>
> Key: HIVE-9001
> URL: https://issues.apache.org/jira/browse/HIVE-9001
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Fix For: 0.15.0
>
> Attachments: HIVE-9001.1.patch
>
>
> The hive log gets locked by the hive process and cannot be rolled on Windows.
> Install Hive on Windows, start Hive, and try to rename the hive log while 
> Hive is running.
> When log4j tries to rename it, it will throw the same error, since the file 
> is locked by the process.
> The changes in https://issues.apache.org/bugzilla/show_bug.cgi?id=29726 
> should be integrated into Hive for reliable rollover.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9001) Ship with log4j.properties file that has a reliable time based rolling policy

2014-12-10 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241995#comment-14241995
 ] 

Lefty Leverenz commented on HIVE-9001:
--

Yes, you can remove TODOC15 as soon as it's documented.  We don't use a 
DOC-DONE label, but comments show that the documentation was done.

When I do the doc, sometimes I leave TODOC## pending review (unless I'm sure of 
the information).  But there's no need for developers to wait for review once 
the information is in the doc.

> Ship with log4j.properties file that has a reliable time based rolling policy
> -
>
> Key: HIVE-9001
> URL: https://issues.apache.org/jira/browse/HIVE-9001
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
>  Labels: TODOC15
> Fix For: 0.15.0
>
> Attachments: HIVE-9001.1.patch
>
>
> The hive log gets locked by the hive process and cannot be rolled on Windows.
> Install Hive on Windows, start Hive, and try to rename the hive log while 
> Hive is running.
> When log4j tries to rename it, it will throw the same error, since the file 
> is locked by the process.
> The changes in https://issues.apache.org/bugzilla/show_bug.cgi?id=29726 
> should be integrated into Hive for reliable rollover.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9019) Avoid using SPARK_JAVA_OPTS [Spark Branch]

2014-12-10 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241993#comment-14241993
 ] 

Rui Li commented on HIVE-9019:
--

OK... let me know if you find other problems with this :-)

> Avoid using SPARK_JAVA_OPTS [Spark Branch]
> --
>
> Key: HIVE-9019
> URL: https://issues.apache.org/jira/browse/HIVE-9019
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Fix For: spark-branch
>
> Attachments: HIVE-9019.1-spark.patch, HIVE-9019.1-spark.patch
>
>
> SPARK_JAVA_OPTS has been deprecated, see {{SparkConf.validateSettings}}.
> Using it together with {{spark.driver.extraJavaOptions}} will cause 
> SparkContext to fail to start.
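
For reference, the supported route is to put the JVM options on SparkConf per 
role rather than exporting SPARK_JAVA_OPTS; a small sketch (the app name and GC 
flag are placeholders):
{code}
import org.apache.spark.SparkConf;

public class DriverOptsExample {
  public static void main(String[] args) {
    // Mixing SPARK_JAVA_OPTS with these properties makes
    // SparkConf.validateSettings throw, so only the properties are used.
    SparkConf conf = new SparkConf()
        .setAppName("hive-on-spark-example")
        .set("spark.driver.extraJavaOptions", "-XX:+UseG1GC")
        .set("spark.executor.extraJavaOptions", "-XX:+UseG1GC");
    System.out.println(conf.toDebugString());
  }
}
{code}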



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9062) Explain plan doesn't print join keys for Tez shuffle join

2014-12-10 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241987#comment-14241987
 ] 

Vikram Dixit K commented on HIVE-9062:
--

LGTM +1. +1 for 0.14 as well.

> Explain plan doesn't print join keys for Tez shuffle join
> -
>
> Key: HIVE-9062
> URL: https://issues.apache.org/jira/browse/HIVE-9062
> Project: Hive
>  Issue Type: Improvement
>  Components: Diagnosability, Tez
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-9062.patch
>
>
> For map join, it already prints the keys, but not for shuffle join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9063) NPE in RemoteSparkJobStatus.getSparkStatistics [Spark Branch]

2014-12-10 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241984#comment-14241984
 ] 

Rui Li commented on HIVE-9063:
--

[~xuefuz] - that seems to be an auto-optimization by IntelliJ. I'll disable it :-)

> NPE in RemoteSparkJobStatus.getSparkStatistics [Spark Branch]
> -
>
> Key: HIVE-9063
> URL: https://issues.apache.org/jira/browse/HIVE-9063
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Fix For: spark-branch
>
> Attachments: HIVE-9063.1-spark.patch
>
>
> SparkCounters may be null, in which case an NPE is thrown:
> {noformat}
> 2014-12-09 22:37:15,268 ERROR [HiveServer2-Background-Pool: Thread-44]: 
> exec.Task (SparkTask.java:execute(119)) - Failed to execute spark task.
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.spark.Statistic.SparkStatisticsBuilder.add(SparkStatisticsBuilder.java:49)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getSparkStatistics(RemoteSparkJobStatus.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:110)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1646)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1406)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1218)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1040)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:145)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:70)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:197)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:209)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
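
The natural shape of a fix is a null guard before the statistics builder 
consumes the counters. A self-contained sketch with stand-in types (not the 
actual RemoteSparkJobStatus or SparkStatisticsBuilder classes):
{code}
public class NullGuardSketch {
  /** Stand-in for SparkCounters. */
  static class Counters { }

  /** Stand-in for SparkStatisticsBuilder. */
  static class StatisticsBuilder {
    void add(Counters c) { /* merge counter values into the statistics */ }
  }

  static void collect(StatisticsBuilder builder, Counters counters) {
    // A job that registered no counters yields null here; skipping it
    // avoids the NPE instead of propagating it up through SparkTask.
    if (counters != null) {
      builder.add(counters);
    }
  }
}
{code}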



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 28791: HIVE-9025 join38.q (without map join) produces incorrect result when testing with multiple reducers

2014-12-10 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28791/#review64666
---

Ship it!


Ship It!

- Chao Sun


On Dec. 11, 2014, 1:36 a.m., Ted Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28791/
> ---
> 
> (Updated Dec. 11, 2014, 1:36 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Chao Sun.
> 
> 
> Bugs: HIVE-9025
> https://issues.apache.org/jira/browse/HIVE-9025
> 
> 
> Repository: hive
> 
> 
> Description
> ---
> 
> HIVE-5771 introduced a bug: when all partition columns are constants, the 
> partitioning is transformed into a random dispatch, which is not expected.
> 
> This patch adds a constant column in that case to avoid random 
> partitioning.
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/hive/trunk/itests/src/test/resources/testconfiguration.properties
>  1644497 
>   
> http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java
>  1644497 
>   
> http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/constprog_partitioner.q
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/cluster.q.out
>  1644497 
>   
> http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/constprog2.q.out
>  1644497 
>   
> http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/constprog_partitioner.q.out
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/join_nullsafe.q.out
>  1644497 
>   
> http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd2.q.out
>  1644497 
>   
> http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_clusterby.q.out
>  1644497 
>   
> http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_join4.q.out
>  1644497 
>   
> http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join5.q.out
>  1644497 
>   
> http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/quotedid_basic.q.out
>  1644497 
>   
> http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/smb_mapjoin_25.q.out
>  1644497 
>   
> http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/dynamic_partition_pruning.q.out
>  1644497 
>   
> http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/dynamic_partition_pruning_2.q.out
>  1644497 
>   
> http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/join_nullsafe.q.out
>  1644497 
>   
> http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/vector_decimal_mapjoin.q.out
>  1644497 
>   
> http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/vectorized_dynamic_partition_pruning.q.out
>  1644497 
>   
> http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/union27.q.out
>  1644497 
> 
> Diff: https://reviews.apache.org/r/28791/diff/
> 
> 
> Testing
> ---
> 
> TestCliDriver passed.
> 
> 
> Thanks,
> 
> Ted Xu
> 
>



[jira] [Updated] (HIVE-8049) Transparent column level encryption using kms

2014-12-10 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-8049:
-
Status: Patch Available  (was: In Progress)

> Transparent column level encryption using kms
> -
>
> Key: HIVE-8049
> URL: https://issues.apache.org/jira/browse/HIVE-8049
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Xiaomeng Huang
>Assignee: Xiaomeng Huang
> Attachments: HIVE-8049.001.patch, HIVE-8049.002.patch
>
>
> This patch implements transparent column-level encryption. Users don't need to 
> set anything when they query tables.
> # set up KMS and kms-acls.xml (e.g. user1 and root have permission to get 
> the key)
> {code}
> <property>
>   <name>hadoop.kms.acl.GET</name>
>   <value>user1 root</value>
>   <description>
>     ACL for get-key-version and get-current-key operations.
>   </description>
> </property>
> {code}
> # set hive-site.xml 
> {code}
> <property>
>   <name>hadoop.security.key.provider.path</name>
>   <value>kms://http@localhost:16000/kms</value>
> </property>
> {code}
> # create an encrypted table
> {code}
> drop table student_column_encrypt;
> create table student_column_encrypt (s_key INT, s_name STRING, s_country 
> STRING, s_age INT) ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
>   WITH SERDEPROPERTIES ('column.encode.columns'='s_country,s_age', 
> 'column.encode.classname'='org.apache.hadoop.hive.serde2.crypto.CryptoRewriter')
>  
>   STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1');
> insert overwrite table student_column_encrypt 
> select 
>   s_key, s_name, s_country, s_age
> from student;
>  
> select * from student_column_encrypt; 
> {code}
> # query the table as different users; this is transparent to them. It is very 
> convenient and they don't need to set anything.
> {code}
> [root@huang1 hive_data]# hive
> hive> select * from student_column_encrypt;   
> OK
> 0 Armon   China   20
> 1 JackUSA 21
> 2 LucyEngland 22
> 3 LilyFrance  23
> 4 Yom Spain   24
> Time taken: 0.759 seconds, Fetched: 5 row(s)
> [root@huang1 hive_data]# su user2
> [user2@huang1 hive_data]$ hive
> hive> select * from student_column_encrypt;
> OK
> 0 Armon   dqyb188=        NULL
> 1 Jack    YJez            NULL
> 2 Lucy    cKqV1c8MTw==    NULL
> 3 Lily    c7aT180H        NULL
> 4 Yom     ZrST0MA=        NULL
> Time taken: 0.77 seconds, Fetched: 5 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-8969) getChildPrivileges should in one transaction with revoke

2014-12-10 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang resolved HIVE-8969.
--
Resolution: Invalid

Sorry, I meant to create this JIRA in the Sentry project, but I created it in 
Hive by mistake.
Closing this JIRA as Invalid.

> getChildPrivileges should in one transaction with revoke
> 
>
> Key: HIVE-8969
> URL: https://issues.apache.org/jira/browse/HIVE-8969
> Project: Hive
>  Issue Type: Bug
>Reporter: Xiaomeng Huang
>Assignee: Xiaomeng Huang
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8900) Create encryption testing framework

2014-12-10 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-8900:
---
Description: 
As [mentioned by 
Alan|https://issues.apache.org/jira/browse/HIVE-8821?focusedCommentId=14215318&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14215318]
 we already have some q-file tests which fit our needs.
NO PRECOMMIT TESTS

  was:As [mentioned by 
Alan|https://issues.apache.org/jira/browse/HIVE-8821?focusedCommentId=14215318&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14215318]
 we already have some q-file tests which fit our needs.


> Create encryption testing framework
> ---
>
> Key: HIVE-8900
> URL: https://issues.apache.org/jira/browse/HIVE-8900
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Ferdinand Xu
> Attachments: HIVE-8065.1.patch, HIVE-8065.patch, HIVE-8900.1.patch
>
>
> As [mentioned by 
> Alan|https://issues.apache.org/jira/browse/HIVE-8821?focusedCommentId=14215318&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14215318]
>  we already have some q-file tests which fit our needs.
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8993) Make sure Spark + HS2 work [Spark Branch]

2014-12-10 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241977#comment-14241977
 ] 

Chengxiang Li commented on HIVE-8993:
-

Should we put the Spark download logic in the itests pom, since we only need 
to download Spark for unit tests?

> Make sure Spark + HS2 work [Spark Branch]
> -
>
> Key: HIVE-8993
> URL: https://issues.apache.org/jira/browse/HIVE-8993
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chengxiang Li
> Attachments: HIVE-8993.1-spark.patch, HIVE-8993.2-spark.patch
>
>
> We haven't formally tested this combination yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 28791: HIVE-9025 join38.q (without map join) produces incorrect result when testing with multiple reducers

2014-12-10 Thread Ted Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28791/
---

(Updated Dec. 11, 2014, 1:36 a.m.)


Review request for hive, Ashutosh Chauhan and Chao Sun.


Changes
---

Format: removed trailing spaces.


Bugs: HIVE-9025
https://issues.apache.org/jira/browse/HIVE-9025


Repository: hive


Description
---

HIVE-5771 introduced a bug: when all partition columns are constants, the 
partitioning is transformed into a random dispatch, which is not expected.

This patch adds a constant column in that case to avoid random 
partitioning.


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/hive/trunk/itests/src/test/resources/testconfiguration.properties
 1644497 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java
 1644497 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/constprog_partitioner.q
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/cluster.q.out
 1644497 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/constprog2.q.out
 1644497 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/constprog_partitioner.q.out
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/join_nullsafe.q.out
 1644497 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd2.q.out
 1644497 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_clusterby.q.out
 1644497 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_join4.q.out
 1644497 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join5.q.out
 1644497 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/quotedid_basic.q.out
 1644497 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/smb_mapjoin_25.q.out
 1644497 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/dynamic_partition_pruning.q.out
 1644497 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/dynamic_partition_pruning_2.q.out
 1644497 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/join_nullsafe.q.out
 1644497 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/vector_decimal_mapjoin.q.out
 1644497 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/vectorized_dynamic_partition_pruning.q.out
 1644497 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/union27.q.out
 1644497 

Diff: https://reviews.apache.org/r/28791/diff/


Testing
---

TestCliDriver passed.


Thanks,

Ted Xu



[jira] [Updated] (HIVE-9025) join38.q (without map join) produces incorrect result when testing with multiple reducers

2014-12-10 Thread Ted Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Xu updated HIVE-9025:
-
Attachment: HIVE-9025.1.patch

> join38.q (without map join) produces incorrect result when testing with 
> multiple reducers
> -
>
> Key: HIVE-9025
> URL: https://issues.apache.org/jira/browse/HIVE-9025
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 0.14.0
>Reporter: Chao
>Assignee: Ted Xu
>Priority: Blocker
> Attachments: HIVE-9025.1.patch, HIVE-9025.patch
>
>
> I have this query from a modified version of {{join38.q}}, which does NOT use 
> map join:
> {code}
> FROM src a JOIN tmp b ON (a.key = b.col11)
> SELECT a.value, b.col5, count(1) as count
> where b.col11 = 111
> group by a.value, b.col5;
> {code}
> If I set {{mapred.reduce.tasks}} to 1, the result is correct. But if I set 
> it to a larger number (3 for instance), then the result will be 
> {noformat}
> val_111   105 1
> {noformat}
> which is wrong.
> I think the issue is that, for this case, ConstantPropagationProcFactory will 
> overwrite the partition cols for the reduce sink desc with an empty list. 
> Then, later on in ReduceSinkOperator#computeHashCode, since partitionEval has 
> length 0, it will use a random number as the hashcode for each row. As a 
> result, rows with the same key will be distributed to different reducers, 
> which leads to incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-8932) Enable n-way join on CBO

2014-12-10 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran reassigned HIVE-8932:


Assignee: Laljo John Pullokkaran

> Enable n-way join on CBO 
> -
>
> Key: HIVE-8932
> URL: https://issues.apache.org/jira/browse/HIVE-8932
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Affects Versions: 0.14.0
>Reporter: Ashutosh Chauhan
>Assignee: Laljo John Pullokkaran
>
> In some cases (though not always) with CBO, we don't generate an n-way join 
> when we could.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8900) Create encryption testing framework

2014-12-10 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-8900:
---
Attachment: HIVE-8900.1.patch

> Create encryption testing framework
> ---
>
> Key: HIVE-8900
> URL: https://issues.apache.org/jira/browse/HIVE-8900
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Ferdinand Xu
> Attachments: HIVE-8065.1.patch, HIVE-8065.patch, HIVE-8900.1.patch
>
>
> As [mentioned by 
> Alan|https://issues.apache.org/jira/browse/HIVE-8821?focusedCommentId=14215318&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14215318]
>  we already have some q-file tests which fit our needs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9035) CBO: Disable PPD when functions are non-deterministic (ppd_random.q - non-deterministic udf rand() pushed above join)

2014-12-10 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-9035:
-
Status: Patch Available  (was: Open)

> CBO: Disable PPD when functions are non-deterministic (ppd_random.q  - 
> non-deterministic udf rand() pushed above join)
> --
>
> Key: HIVE-9035
> URL: https://issues.apache.org/jira/browse/HIVE-9035
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Laljo John Pullokkaran
> Fix For: 0.15.0
>
> Attachments: HIVE-9035.patch
>
>
> It's not clear if this is a problem. If it is, the issue is probably in 
> Optiq: does it know the UDF is non-deterministic? We could disable Optiq for 
> such UDFs if all else fails.
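
The check the optimizer needs is whether the UDF class advertises determinism. 
A sketch against Hive's {{UDFType}} annotation (the helper itself is 
hypothetical; absent annotations are treated as deterministic, matching the 
registry's default):
{code}
import org.apache.hadoop.hive.ql.udf.UDFType;

public class DeterminismCheckSketch {
  static boolean isDeterministic(Class<?> udfClass) {
    UDFType type = udfClass.getAnnotation(UDFType.class);
    // rand() is annotated deterministic = false, so a predicate using it
    // must not be pushed past a join; unannotated UDFs default to
    // deterministic.
    return type == null || type.deterministic();
  }
}
{code}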



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9035) CBO: Disable PPD when functions are non-deterministic (ppd_random.q - non-deterministic udf rand() pushed above join)

2014-12-10 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-9035:
-
Attachment: HIVE-9035.patch

> CBO: Disable PPD when functions are non-deterministic (ppd_random.q  - 
> non-deterministic udf rand() pushed above join)
> --
>
> Key: HIVE-9035
> URL: https://issues.apache.org/jira/browse/HIVE-9035
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Laljo John Pullokkaran
> Fix For: 0.15.0
>
> Attachments: HIVE-9035.patch
>
>
> It's not clear if this is a problem. If it is, the issue is probably in 
> Optiq: does it know the UDF is non-deterministic? We could disable Optiq for 
> such UDFs if all else fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9067) OrcFileMergeOperator may create merge file that does not match properties of input files

2014-12-10 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241966#comment-14241966
 ] 

Prasanth Jayachandran commented on HIVE-9067:
-

[~gopalv]/[~sershe] Can someone please review this patch? It's a minor fix.

> OrcFileMergeOperator may create merge file that does not match properties of 
> input files
> 
>
> Key: HIVE-9067
> URL: https://issues.apache.org/jira/browse/HIVE-9067
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 0.15.0, 0.14.1
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
>  Labels: Orc
> Attachments: HIVE-9067.1.patch
>
>
> OrcFileMergeOperator creates a new ORC file and appends the stripes from the 
> smaller ORC files. The new file should retain the same configuration as the 
> small ORC files, but currently it does not set the ORC row index stride and 
> file version.
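
A sketch of carrying the input file's properties onto the merged writer. The 
wiring is an assumption modeled on ORC's reader and writer options, not the 
actual OrcFileMergeOperator change, and the getFileVersion() accessor is 
assumed:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.Reader;

public class OrcMergeOptionsSketch {
  static OrcFile.WriterOptions optionsFromInput(Configuration conf, Reader reader) {
    return OrcFile.writerOptions(conf)
        .compress(reader.getCompression())
        // The two settings the description says were being dropped:
        .rowIndexStride(reader.getRowIndexStride())
        .version(reader.getFileVersion());  // assumes a getFileVersion() accessor
  }
}
{code}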



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9074) add ability to force direct sql usage for perf reasons

2014-12-10 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-9074:
--

 Summary: add ability to force direct sql usage for perf reasons
 Key: HIVE-9074
 URL: https://issues.apache.org/jira/browse/HIVE-9074
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin


Some people run direct SQL and hit failures (e.g. due to Oracle's 
1000-element IN-expression limit, illegal cast "optimizations" in Derby and 
Oracle, or other Hive and DB bugs). Currently, the metastore falls back to 
ORM in such cases; however, that can have a huge impact on perf, and some 
people would rather have it fail so they can see the problem.
In addition to the "off" and "on+fallback" modes, an "on or fail" mode needs 
to be added. The default will remain the same.
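
A minimal sketch of the proposed tri-state setting; the names are illustrative, 
not actual HiveConf values:
{code}
public enum DirectSqlMode {
  OFF,               // never use direct SQL; always go through ORM/JDO
  ON_WITH_FALLBACK,  // try direct SQL, silently fall back to ORM on failure
  ON_OR_FAIL;        // try direct SQL and surface the failure to the caller

  boolean shouldFallBack() {
    return this == ON_WITH_FALLBACK;
  }
}
{code}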



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9035) CBO: Disable PPD when functions are non-deterministic (ppd_random.q - non-deterministic udf rand() pushed above join)

2014-12-10 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-9035:
-
Summary: CBO: Disable PPD when functions are non-deterministic 
(ppd_random.q  - non-deterministic udf rand() pushed above join)  (was: CBO: 
ppd_random.q  - non-deterministic udf rand() pushed above join)

> CBO: Disable PPD when functions are non-deterministic (ppd_random.q  - 
> non-deterministic udf rand() pushed above join)
> --
>
> Key: HIVE-9035
> URL: https://issues.apache.org/jira/browse/HIVE-9035
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Laljo John Pullokkaran
> Fix For: 0.15.0
>
>
> It's not clear if this is a problem. If it is, the issue is probably in 
> Optiq: does it know the UDF is non-deterministic? We could disable Optiq for 
> such UDFs if all else fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9067) OrcFileMergeOperator may create merge file that does not match properties of input files

2014-12-10 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-9067:

Priority: Minor  (was: Major)

> OrcFileMergeOperator may create merge file that does not match properties of 
> input files
> 
>
> Key: HIVE-9067
> URL: https://issues.apache.org/jira/browse/HIVE-9067
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 0.15.0, 0.14.1
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
>  Labels: Orc
> Attachments: HIVE-9067.1.patch
>
>
> OrcFileMergeOperator creates a new ORC file and appends the stripes from the 
> smaller ORC files. The new file should retain the same configuration as the 
> small ORC files, but currently it does not set the ORC row index stride and 
> file version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-860) Persistent distributed cache

2014-12-10 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-860:
--
Attachment: HIVE-860.4.patch

> Persistent distributed cache
> 
>
> Key: HIVE-860
> URL: https://issues.apache.org/jira/browse/HIVE-860
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.12.0
>Reporter: Zheng Shao
>Assignee: Dong Chen
> Fix For: 0.15.0
>
> Attachments: HIVE-860-debug.4.patch, HIVE-860.1.patch, 
> HIVE-860.2.patch, HIVE-860.2.patch, HIVE-860.3.patch, HIVE-860.4.patch, 
> HIVE-860.4.patch, HIVE-860.4.patch, HIVE-860.4.patch, HIVE-860.patch, 
> HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, 
> HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, 
> HIVE-860.patch, HIVE-860.patch
>
>
> DistributedCache is shared across multiple jobs if the HDFS file name is the 
> same.
> We need to make sure Hive puts the same file into the same location every 
> time and does not overwrite it if the file content is the same.
> We can achieve 2 different results:
> A1. Files added with the same name, timestamp, and md5 in the same session 
> will have a single copy in the distributed cache.
> A2. Files added with the same name, timestamp, and md5 will have a single 
> copy in the distributed cache.
> A2 has a bigger benefit in sharing but raises the question of when Hive 
> should clean the files up in HDFS.
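
One way to get a stable, reusable location is to derive the cache path from the 
file's digest. A sketch under the assumption that the md5 of the content plus 
the original file name is the dedup key:
{code}
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;

public class ContentAddressedPathSketch {
  /**
   * Derive a stable location: identical content always maps to the same
   * path, so a re-upload can be skipped and the DistributedCache entry
   * reused across jobs.
   */
  static Path cachePath(Path cacheRoot, Path localFile) throws Exception {
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    try (InputStream in = Files.newInputStream(localFile)) {
      byte[] buf = new byte[8192];
      for (int n; (n = in.read(buf)) > 0; ) {
        md5.update(buf, 0, n);
      }
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md5.digest()) {
      hex.append(String.format("%02x", b));
    }
    return cacheRoot.resolve(hex.toString()).resolve(localFile.getFileName());
  }
}
{code}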



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9067) OrcFileMergeOperator may create merge file that does not match properties of input files

2014-12-10 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-9067:

Labels: Orc  (was: )

> OrcFileMergeOperator may create merge file that does not match properties of 
> input files
> 
>
> Key: HIVE-9067
> URL: https://issues.apache.org/jira/browse/HIVE-9067
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 0.15.0, 0.14.1
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>  Labels: Orc
> Attachments: HIVE-9067.1.patch
>
>
> OrcFileMergeOperator creates a new ORC file and appends the stripes from the 
> smaller ORC files. The new file should retain the same configuration as the 
> small ORC files, but currently it does not set the ORC row index stride and 
> file version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9067) OrcFileMergeOperator may create merge file that does not match properties of input files

2014-12-10 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-9067:

Status: Patch Available  (was: Open)

> OrcFileMergeOperator may create merge file that does not match properties of 
> input files
> 
>
> Key: HIVE-9067
> URL: https://issues.apache.org/jira/browse/HIVE-9067
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 0.15.0, 0.14.1
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-9067.1.patch
>
>
> OrcFileMergeOperator creates a new ORC file and appends the stripes from the 
> smaller ORC files. The new file should retain the same configuration as the 
> small ORC files, but currently it does not set the ORC row index stride and 
> file version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9067) OrcFileMergeOperator may create merge file that does not match properties of input files

2014-12-10 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-9067:

Attachment: HIVE-9067.1.patch

> OrcFileMergeOperator may create merge file that does not match properties of 
> input files
> 
>
> Key: HIVE-9067
> URL: https://issues.apache.org/jira/browse/HIVE-9067
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 0.15.0, 0.14.1
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-9067.1.patch
>
>
> OrcFileMergeOperator creates a new ORC file and appends the stripes from the 
> smaller ORC files. The new file should retain the same configuration as the 
> small ORC files, but currently it does not set the ORC row index stride and 
> file version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9067) OrcFileMergeOperator may create merge file that does not match properties of input files

2014-12-10 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-9067:

Summary: OrcFileMergeOperator may create merge file that does not match 
properties of input files  (was: ORC stripe merge does not work properly when 
orc index is disabled)

> OrcFileMergeOperator may create merge file that does not match properties of 
> input files
> 
>
> Key: HIVE-9067
> URL: https://issues.apache.org/jira/browse/HIVE-9067
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 0.15.0, 0.14.1
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> OrcFileMergeOperator creates a new ORC file and appends the stripes from the 
> smaller ORC files. The new file should retain the same configuration as the 
> small ORC files, but currently it does not set the ORC row index stride and 
> file version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9067) ORC stripe merge does not work properly when orc index is disabled

2014-12-10 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-9067:

Description: OrcFileMergeOperator creates a new ORC file and appends the 
stripes from the smaller ORC files. The new file should retain the same 
configuration as the small ORC files, but currently it does not set the ORC 
row index stride and file version.  (was: OrcFileMergeOperator creates a new 
ORC file and appends the stripes from the smaller ORC files. The new file 
retains the same properties as the small ORC files except the index stride 
size, which can result in a runtime exception while flushing the stripe.)

> ORC stripe merge does not work properly when orc index is disabled
> --
>
> Key: HIVE-9067
> URL: https://issues.apache.org/jira/browse/HIVE-9067
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 0.15.0, 0.14.1
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> OrcFileMergeOperator creates a new ORC file and appends the stripes from the 
> smaller ORC files. The new file should retain the same configuration as the 
> small ORC files, but currently it does not set the ORC row index stride and 
> file version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

