[jira] [Updated] (HIVE-9216) Avoid redundant clone of JobConf [Spark Branch]

2014-12-25 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9216:
-
Status: Patch Available  (was: Open)

> Avoid redundant clone of JobConf [Spark Branch]
> ---
>
> Key: HIVE-9216
> URL: https://issues.apache.org/jira/browse/HIVE-9216
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Minor
> Attachments: HIVE-9216.1-spark.patch
>
>
> Currently in SparkPlanGenerator, we clone the job conf twice for each MapWork. 
> We should avoid this, as cloning the job conf involves writing to HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9216) Avoid redundant clone of JobConf [Spark Branch]

2014-12-25 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9216:
-
Attachment: HIVE-9216.1-spark.patch

> Avoid redundant clone of JobConf [Spark Branch]
> ---
>
> Key: HIVE-9216
> URL: https://issues.apache.org/jira/browse/HIVE-9216
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Minor
> Attachments: HIVE-9216.1-spark.patch
>
>
> Currently in SparkPlanGenerator, we clone the job conf twice for each MapWork. 
> We should avoid this, as cloning the job conf involves writing to HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9216) Avoid redundant clone of JobConf [Spark Branch]

2014-12-25 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9216:
-
Priority: Minor  (was: Major)

> Avoid redundant clone of JobConf [Spark Branch]
> ---
>
> Key: HIVE-9216
> URL: https://issues.apache.org/jira/browse/HIVE-9216
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Minor
>
> Currently in SparkPlanGenerator, we clone the job conf twice for each MapWork. 
> We should avoid this, as cloning the job conf involves writing to HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9216) Avoid redundant clone of JobConf [Spark Branch]

2014-12-25 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9216:
-
Summary: Avoid redundant clone of JobConf [Spark Branch]  (was: Avoid 
redundant clone of JobConf)

> Avoid redundant clone of JobConf [Spark Branch]
> ---
>
> Key: HIVE-9216
> URL: https://issues.apache.org/jira/browse/HIVE-9216
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
>
> Currently in SparkPlanGenerator, we clone the job conf twice for each MapWork. 
> We should avoid this, as cloning the job conf involves writing to HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9216) Avoid redundant clone of JobConf

2014-12-25 Thread Rui Li (JIRA)
Rui Li created HIVE-9216:


 Summary: Avoid redundant clone of JobConf
 Key: HIVE-9216
 URL: https://issues.apache.org/jira/browse/HIVE-9216
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li


Currently in SparkPlanGenerator, we clone the job conf twice for each MapWork. 
We should avoid this, as cloning the job conf involves writing to HDFS.
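A minimal sketch of the idea (editorial illustration, not the actual SparkPlanGenerator code): keep one cloned JobConf per work object and hand the same clone back on later requests, so the expensive clone, which uploads the plan to HDFS, runs only once per MapWork. The cache class and the {{cloneJobConf}} helper below are hypothetical stand-ins.
{code}
import java.util.IdentityHashMap;
import java.util.Map;

import org.apache.hadoop.mapred.JobConf;

public class ClonedConfCache {
  private final Map<Object, JobConf> clones = new IdentityHashMap<Object, JobConf>();

  // 'work' stands in for a MapWork/ReduceWork instance.
  public synchronized JobConf getOrClone(Object work, JobConf base) {
    JobConf cloned = clones.get(work);
    if (cloned == null) {
      cloned = cloneJobConf(base);   // the expensive step we want to run only once
      clones.put(work, cloned);
    }
    return cloned;
  }

  // Placeholder for the real cloning logic (copy conf, serialize plan, etc.).
  private JobConf cloneJobConf(JobConf base) {
    return new JobConf(base);
  }
}
{code}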



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9167) Enhance encryption testing framework to allow create keys & zones inside .q files

2014-12-25 Thread Dong Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258960#comment-14258960
 ] 

Dong Chen commented on HIVE-9167:
-

The approach looks good!

About not exposing the command to the end user, maybe we can leave the value of 
{{hive.security.command.whitelist}} in HiveConf as it is, and add the command to 
the whitelist in the conf during encryption test initialization. How does this 
sound? It is a simple way, although it does not really hide the command from the 
user.
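For illustration, a rough sketch of that suggestion (assumed, not the framework's actual code): leave {{hive.security.command.whitelist}} untouched in HiveConf and only append the test-only command to the conf used by the encryption q-tests during initialization. The command name "crypto" here is an assumption.
{code}
import org.apache.hadoop.hive.conf.HiveConf;

public class EncryptionTestInit {
  // Append the test command to the whitelist of the test conf only.
  static void allowTestCommand(HiveConf conf) {
    String key = "hive.security.command.whitelist";
    String current = conf.get(key, "");
    if (!current.contains("crypto")) {
      conf.set(key, current.isEmpty() ? "crypto" : current + ",crypto");
    }
  }
}
{code}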

> Enhance encryption testing framework to allow create keys & zones inside .q 
> files
> -
>
> Key: HIVE-9167
> URL: https://issues.apache.org/jira/browse/HIVE-9167
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>
> The current implementation of the encryption testing framework on HIVE-8900 
> initializes a couple of encrypted databases to be used in .q test files. This 
> is useful in order to keep tests small, but it does not test all details 
> found in the encryption implementation, such as encrypted tables with 
> different encryption strengths in the same database.
> We need to allow this kind of encryption, as that is how it will be used in the 
> real world, where a database will have a few encrypted tables (not the whole DB).
> Also, we need to make this encryption framework flexible so that we can 
> create/delete keys & zones on demand when running the .q files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat

2014-12-25 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9153:
--
Affects Version/s: (was: spark-branch)

> Evaluate CombineHiveInputFormat versus HiveInputFormat
> --
>
> Key: HIVE-9153
> URL: https://issues.apache.org/jira/browse/HIVE-9153
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Rui Li
> Attachments: HIVE-9153.1-spark.patch, HIVE-9153.1-spark.patch, 
> HIVE-9153.2.patch, HIVE-9153.3.patch, screenshot.PNG
>
>
> The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. 
> However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in 
> Spark, it might make sense for us to use {{HiveInputFormat}} as well. We 
> should evaluate this on a query which has many input splits such as {{select 
> count(\*) from store_sales where something is not null}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat

2014-12-25 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9153:
--
Summary: Evaluate CombineHiveInputFormat versus HiveInputFormat  (was: 
Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch])

> Evaluate CombineHiveInputFormat versus HiveInputFormat
> --
>
> Key: HIVE-9153
> URL: https://issues.apache.org/jira/browse/HIVE-9153
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Rui Li
> Attachments: HIVE-9153.1-spark.patch, HIVE-9153.1-spark.patch, 
> HIVE-9153.2.patch, HIVE-9153.3.patch, screenshot.PNG
>
>
> The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. 
> However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in 
> Spark, it might make sense for us to use {{HiveInputFormat}} as well. We 
> should evaluate this on a query which has many input splits such as {{select 
> count(\*) from store_sales where something is not null}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9213) Improve the mask pattern in QTestUtil to save partial directory info in test result

2014-12-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258949#comment-14258949
 ] 

Hive QA commented on HIVE-9213:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12689150/HIVE-9213.1.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2199/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2199/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2199/

Messages:
{noformat}
 This message was trimmed, see log for full details 
 [copy] Copying 8 files to 
/data/hive-ptest/working/apache-svn-trunk-source/itests/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
hive-it ---
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-it ---
[INFO] Installing 
/data/hive-ptest/working/apache-svn-trunk-source/itests/pom.xml to 
/data/hive-ptest/working/maven/org/apache/hive/hive-it/0.15.0-SNAPSHOT/hive-it-0.15.0-SNAPSHOT.pom
[INFO] 
[INFO] 
[INFO] Building Hive Integration - Custom Serde 0.15.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-it-custom-serde 
---
[INFO] Deleting 
/data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde (includes 
= [datanucleus.log, derby.log], excludes = [])
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ 
hive-it-custom-serde ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ 
hive-it-custom-serde ---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ 
hive-it-custom-serde ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ 
hive-it-custom-serde ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ 
hive-it-custom-serde ---
[INFO] Compiling 10 source files to 
/data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/classes
[WARNING] 
/data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/src/main/java/org/apache/hadoop/hive/serde2/CustomSerDe2.java:
 Some input files use or override a deprecated API.
[WARNING] 
/data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/src/main/java/org/apache/hadoop/hive/serde2/CustomSerDe2.java:
 Recompile with -Xlint:deprecation for details.
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ 
hive-it-custom-serde ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/src/test/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-it-custom-serde 
---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/tmp
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/tmp/conf
 [copy] Copying 8 files to 
/data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
hive-it-custom-serde ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ 
hive-it-custom-serde ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-it-custom-serde ---
[INFO] Building jar: 
/data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/hive-it-custom-serde-0.15.0-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
hive-it-custom-serde ---
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ 
hive-it-custom-serde ---
[INFO] Installing 
/data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/hive-it-custom-serde-0.15.0-SNAPSHOT.jar
 to 
/data/hive-ptest/working/maven/org/apache/hive/hive-it-custom-serde/0.15.0-SNAPSHOT/hive-it-custom-serde-0.15.0-SNAPSHOT.jar
[INFO] Installing 
/data/hive-ptest/working/apache-svn-trunk-source/itests/custom-

[jira] [Updated] (HIVE-9213) Improve the mask pattern in QTestUtil to save partial directory info in test result

2014-12-25 Thread Dong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Chen updated HIVE-9213:

Attachment: HIVE-9213.1.patch

Updated patch V1 with a small change to the regex.

[~brocknoland], [~spena], [~Ferd], could you please help review this patch 
when you have time? Thanks!

> Improve the mask pattern in QTestUtil to save partial directory info in test 
> result
> ---
>
> Key: HIVE-9213
> URL: https://issues.apache.org/jira/browse/HIVE-9213
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Dong Chen
>Assignee: Dong Chen
> Fix For: encryption-branch
>
> Attachments: HIVE-9213.1.patch, HIVE-9213.patch
>
>
> The mask pattern in QTestUtil masks directories in the test result, since the 
> directory varies across test environments.
> However, in the encryption tests, the directory info is needed to verify that 
> the intermediate files are put in the proper table. The whole directory is not 
> necessary; part of it is enough.
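As an editorial illustration of the partial-masking idea (not the actual QTestUtil pattern), a regex can replace the variable prefix of an absolute path while keeping the last two components, so the warehouse/table part can still be verified:
{code}
import java.util.regex.Pattern;

public class PartialPathMask {
  // Turns "/data/.../warehouse/encrypted_table" into "### MASKED ###/warehouse/encrypted_table".
  private static final Pattern PREFIX = Pattern.compile("\\S+/(\\S+/\\S+)$");

  static String mask(String path) {
    return PREFIX.matcher(path).replaceAll("### MASKED ###/$1");
  }
}
{code}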



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8821) Create unit test where we insert into dynamically partitioned table

2014-12-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258937#comment-14258937
 ] 

Hive QA commented on HIVE-8821:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12689149/HIVE-8821.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2198/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2198/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2198/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-2198/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveRecordReader.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java'
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20S/target 
shims/0.23/target shims/aggregator/target shims/common/target 
shims/scheduler/target packaging/target hbase-handler/target testutils/target 
jdbc/target metastore/target itests/target itests/hcatalog-unit/target 
itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target 
itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target 
itests/util/target hcatalog/target hcatalog/core/target 
hcatalog/streaming/target hcatalog/server-extensions/target 
hcatalog/hcatalog-pig-adapter/target hcatalog/webhcat/svr/target 
hcatalog/webhcat/java-client/target accumulo-handler/target hwi/target 
common/target common/src/gen contrib/target service/target serde/target 
beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1647934.

At revision 1647934.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12689149 - PreCommit-HIVE-TRUNK-Build

> Create unit test where we insert into dynamically partitioned table
> ---
>
> Key: HIVE-8821
> URL: https://issues.apache.org/jira/browse/HIVE-8821
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Dong Chen
> Fix For: encryption-branch
>
> Attachments: HIVE-8821.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8821) Create unit test where we insert into dynamically partitioned table

2014-12-25 Thread Dong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Chen updated HIVE-8821:

Attachment: HIVE-8821.patch

Patch attached. It is similar to HIVE-8822 for the static partitioned table, and 
verifies 3 scenarios.

> Create unit test where we insert into dynamically partitioned table
> ---
>
> Key: HIVE-8821
> URL: https://issues.apache.org/jira/browse/HIVE-8821
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Dong Chen
> Fix For: encryption-branch
>
> Attachments: HIVE-8821.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8821) Create unit test where we insert into dynamically partitioned table

2014-12-25 Thread Dong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Chen updated HIVE-8821:

Fix Version/s: encryption-branch
 Assignee: Dong Chen
   Status: Patch Available  (was: Open)

> Create unit test where we insert into dynamically partitioned table
> ---
>
> Key: HIVE-8821
> URL: https://issues.apache.org/jira/browse/HIVE-8821
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Dong Chen
> Fix For: encryption-branch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]

2014-12-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258928#comment-14258928
 ] 

Hive QA commented on HIVE-9153:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12689146/HIVE-9153.3.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6722 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_list_bucket_dml_10
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2197/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2197/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2197/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12689146 - PreCommit-HIVE-TRUNK-Build

> Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
> -
>
> Key: HIVE-9153
> URL: https://issues.apache.org/jira/browse/HIVE-9153
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Brock Noland
>Assignee: Rui Li
> Attachments: HIVE-9153.1-spark.patch, HIVE-9153.1-spark.patch, 
> HIVE-9153.2.patch, HIVE-9153.3.patch, screenshot.PNG
>
>
> The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. 
> However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in 
> Spark, it might make sense for us to use {{HiveInputFormat}} as well. We 
> should evaluate this on a query which has many input splits such as {{select 
> count(\*) from store_sales where something is not null}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9039) Support Union Distinct

2014-12-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258913#comment-14258913
 ] 

Hive QA commented on HIVE-9039:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12689141/HIVE-9039.06.patch

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 6728 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_complex_alias
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_join_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_union_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union27
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union34
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_unionDistinct_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_top_level
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_view
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_smb_main
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_unionDistinct_1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2196/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2196/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2196/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12689141 - PreCommit-HIVE-TRUNK-Build

> Support Union Distinct
> --
>
> Key: HIVE-9039
> URL: https://issues.apache.org/jira/browse/HIVE-9039
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, 
> HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch, HIVE-9039.06.patch
>
>
> Current version (Hive 0.14) does not support union (or union distinct). It 
> only supports union all. In this patch, we try to add this new feature by 
> rewriting union distinct to union all followed by group by.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]

2014-12-25 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258909#comment-14258909
 ] 

Rui Li commented on HIVE-9153:
--

The strange thing is that {{Utilities}} is different in trunk and the spark branch, 
even though it seems we have merged all the commits from trunk.

> Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
> -
>
> Key: HIVE-9153
> URL: https://issues.apache.org/jira/browse/HIVE-9153
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Brock Noland
>Assignee: Rui Li
> Attachments: HIVE-9153.1-spark.patch, HIVE-9153.1-spark.patch, 
> HIVE-9153.2.patch, HIVE-9153.3.patch, screenshot.PNG
>
>
> The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. 
> However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in 
> Spark, it might make sense for us to use {{HiveInputFormat}} as well. We 
> should evaluate this on a query which has many input splits such as {{select 
> count(\*) from store_sales where something is not null}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]

2014-12-25 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9153:
-
Attachment: HIVE-9153.3.patch

It seems the redundant code in {{Utilities.getBaseWork}} has already been taken care 
of in trunk. Reverted that part for the trunk patch.

> Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
> -
>
> Key: HIVE-9153
> URL: https://issues.apache.org/jira/browse/HIVE-9153
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Brock Noland
>Assignee: Rui Li
> Attachments: HIVE-9153.1-spark.patch, HIVE-9153.1-spark.patch, 
> HIVE-9153.2.patch, HIVE-9153.3.patch, screenshot.PNG
>
>
> The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. 
> However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in 
> Spark, it might make sense for us to use {{HiveInputFormat}} as well. We 
> should evaluate this on a query which has many input splits such as {{select 
> count(\*) from store_sales where something is not null}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]

2014-12-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258897#comment-14258897
 ] 

Hive QA commented on HIVE-9153:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12689137/HIVE-9153.2.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2195/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2195/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2195/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-2195/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'ql/src/test/results/clientpositive/list_bucket_dml_9.q.out'
Reverted 'ql/src/test/results/clientpositive/list_bucket_dml_4.q.out'
Reverted 'ql/src/test/results/clientpositive/list_bucket_dml_12.q.out'
Reverted 'ql/src/test/results/clientpositive/stats_list_bucket.q.out'
Reverted 'ql/src/test/results/clientpositive/list_bucket_dml_8.q.out'
Reverted 'ql/src/test/results/clientpositive/list_bucket_dml_11.q.out'
Reverted 'ql/src/test/results/clientpositive/list_bucket_dml_5.q.out'
Reverted 'ql/src/test/results/clientpositive/list_bucket_dml_13.q.out'
Reverted 'ql/src/test/results/clientpositive/partitions_json.q.out'
Reverted 'ql/src/test/results/clientpositive/list_bucket_dml_2.q.out'
Reverted 'ql/src/test/results/clientpositive/list_bucket_dml_10.q.out'
Reverted 'ql/src/test/queries/clientpositive/list_bucket_dml_5.q'
Reverted 'ql/src/test/queries/clientpositive/list_bucket_dml_11.q'
Reverted 'ql/src/test/queries/clientpositive/list_bucket_dml_13.q'
Reverted 'ql/src/test/queries/clientpositive/list_bucket_dml_9.q'
Reverted 'ql/src/test/queries/clientpositive/list_bucket_dml_2.q'
Reverted 'ql/src/test/queries/clientpositive/list_bucket_dml_4.q'
Reverted 'ql/src/test/queries/clientpositive/list_bucket_dml_10.q'
Reverted 'ql/src/test/queries/clientpositive/list_bucket_dml_12.q'
Reverted 'ql/src/test/queries/clientpositive/list_bucket_dml_8.q'
Reverted 'ql/src/test/queries/clientpositive/stats_list_bucket.q'
Reverted 
'ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MapBuilder.java'
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20S/target 
shims/0.23/target shims/aggregator/target shims/common/target 
shims/scheduler/target packaging/target hbase-handler/target testutils/target 
jdbc/target metastore/target itests/target itests/hcatalog-unit/target 
itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target 
itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target 
itests/util/target hcatalog/target hcatalog/core/target 
hcatalog/streaming/target hcatalog/server-extensions/target 
hcatalog/hcatalog-pig-adapter/target hcatalog/webhcat/svr/target 
hcatalog/webhcat/java-client/target accumulo-handler/target hwi/target 
common/target common/src/gen service/target contrib/target serde/target 
beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target 
ql/src/test/results/clientpositive/list_bucket_dml_9.q.java1.8.out 
ql/src/test/results/clientpositive/list_bucket_dml_13.q.java1.8.out 
ql/src/test/results/clientpositive/list_bucket_dml

[jira] [Updated] (HIVE-9039) Support Union Distinct

2014-12-25 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9039:
--
Status: Patch Available  (was: Open)

> Support Union Distinct
> --
>
> Key: HIVE-9039
> URL: https://issues.apache.org/jira/browse/HIVE-9039
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, 
> HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch, HIVE-9039.06.patch
>
>
> Current version (Hive 0.14) does not support union (or union distinct). It 
> only supports union all. In this patch, we try to add this new feature by 
> rewriting union distinct to union all followed by group by.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9039) Support Union Distinct

2014-12-25 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9039:
--
Attachment: HIVE-9039.06.patch

> Support Union Distinct
> --
>
> Key: HIVE-9039
> URL: https://issues.apache.org/jira/browse/HIVE-9039
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, 
> HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch, HIVE-9039.06.patch
>
>
> Current version (Hive 0.14) does not support union (or union distinct). It 
> only supports union all. In this patch, we try to add this new feature by 
> rewriting union distinct to union all followed by group by.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9039) Support Union Distinct

2014-12-25 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9039:
--
Attachment: (was: HIVE-9039.06.patch)

> Support Union Distinct
> --
>
> Key: HIVE-9039
> URL: https://issues.apache.org/jira/browse/HIVE-9039
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, 
> HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch
>
>
> Current version (Hive 0.14) does not support union (or union distinct). It 
> only supports union all. In this patch, we try to add this new feature by 
> rewriting union distinct to union all followed by group by.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9039) Support Union Distinct

2014-12-25 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9039:
--
Status: Open  (was: Patch Available)

> Support Union Distinct
> --
>
> Key: HIVE-9039
> URL: https://issues.apache.org/jira/browse/HIVE-9039
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, 
> HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch
>
>
> Current version (Hive 0.14) does not support union (or union distinct). It 
> only supports union all. In this patch, we try to add this new feature by 
> rewriting union distinct to union all followed by group by.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9039) Support Union Distinct

2014-12-25 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9039:
--
Status: Patch Available  (was: Open)

> Support Union Distinct
> --
>
> Key: HIVE-9039
> URL: https://issues.apache.org/jira/browse/HIVE-9039
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, 
> HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch, HIVE-9039.06.patch
>
>
> Current version (Hive 0.14) does not support union (or union distinct). It 
> only supports union all. In this patch, we try to add this new feature by 
> rewriting union distinct to union all followed by group by.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9039) Support Union Distinct

2014-12-25 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9039:
--
Attachment: HIVE-9039.06.patch

> Support Union Distinct
> --
>
> Key: HIVE-9039
> URL: https://issues.apache.org/jira/browse/HIVE-9039
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, 
> HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch, HIVE-9039.06.patch
>
>
> Current version (Hive 0.14) does not support union (or union distinct). It 
> only supports union all. In this patch, we try to add this new feature by 
> rewriting union distinct to union all followed by group by.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9039) Support Union Distinct

2014-12-25 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9039:
--
Status: Open  (was: Patch Available)

> Support Union Distinct
> --
>
> Key: HIVE-9039
> URL: https://issues.apache.org/jira/browse/HIVE-9039
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, 
> HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch
>
>
> Current version (Hive 0.14) does not support union (or union distinct). It 
> only supports union all. In this patch, we try to add this new feature by 
> rewriting union distinct to union all followed by group by.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9039) Support Union Distinct

2014-12-25 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9039:
--
Attachment: (was: HIVE-9039.06.patch)

> Support Union Distinct
> --
>
> Key: HIVE-9039
> URL: https://issues.apache.org/jira/browse/HIVE-9039
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, 
> HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch
>
>
> Current version (Hive 0.14) does not support union (or union distinct). It 
> only supports union all. In this patch, we try to add this new feature by 
> rewriting union distinct to union all followed by group by.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9039) Support Union Distinct

2014-12-25 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9039:
--
Status: Patch Available  (was: Open)

> Support Union Distinct
> --
>
> Key: HIVE-9039
> URL: https://issues.apache.org/jira/browse/HIVE-9039
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, 
> HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch, HIVE-9039.06.patch
>
>
> Current version (Hive 0.14) does not support union (or union distinct). It 
> only supports union all. In this patch, we try to add this new feature by 
> rewriting union distinct to union all followed by group by.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]

2014-12-25 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9153:
-
Attachment: HIVE-9153.2.patch

Uploaded the trunk patch.

> Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
> -
>
> Key: HIVE-9153
> URL: https://issues.apache.org/jira/browse/HIVE-9153
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Brock Noland
>Assignee: Rui Li
> Attachments: HIVE-9153.1-spark.patch, HIVE-9153.1-spark.patch, 
> HIVE-9153.2.patch, screenshot.PNG
>
>
> The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. 
> However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in 
> Spark, it might make sense for us to use {{HiveInputFormat}} as well. We 
> should evaluate this on a query which has many input splits such as {{select 
> count(\*) from store_sales where something is not null}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9039) Support Union Distinct

2014-12-25 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9039:
--
Attachment: HIVE-9039.06.patch

(1) support select distinct *
(2) use select distinct * to rewrite union distinct to union all with group by


> Support Union Distinct
> --
>
> Key: HIVE-9039
> URL: https://issues.apache.org/jira/browse/HIVE-9039
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, 
> HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch, HIVE-9039.06.patch
>
>
> Current version (Hive 0.14) does not support union (or union distinct). It 
> only supports union all. In this patch, we try to add this new feature by 
> rewriting union distinct to union all followed by group by.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9039) Support Union Distinct

2014-12-25 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9039:
--
Status: Open  (was: Patch Available)

> Support Union Distinct
> --
>
> Key: HIVE-9039
> URL: https://issues.apache.org/jira/browse/HIVE-9039
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, 
> HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch
>
>
> Current version (Hive 0.14) does not support union (or union distinct). It 
> only supports union all. In this patch, we try to add this new feature by 
> rewriting union distinct to union all followed by group by.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]

2014-12-25 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258885#comment-14258885
 ] 

Rui Li commented on HIVE-9153:
--

Hi [~brocknoland] and [~xuefuz],

Sorry, maybe I was being confusing. The patch here is to reduce the calls to 
{{Utilities.getBaseWork()}}, which is quite similar to HIVE-9127. The change to 
{{Utilities.getBaseWork()}} just removes redundant code:
{code}
Path localPath;
if (conf.getBoolean("mapreduce.task.uberized", false)
    && name.equals(REDUCE_PLAN_NAME)) {
  localPath = new Path(name);
} else if (ShimLoader.getHadoopShims().isLocalMode(conf)) {
  localPath = path;
} else {
  LOG.info("***non-local mode***");
  localPath = new Path(name);
}
localPath = path;
LOG.info("local path = " + localPath);
{code}
It seems the if-else is unnecessary because localPath = path anyway, which makes 
localPath redundant too. But I can revert this change if you feel uncertain 
about it.
BTW, the patch should be a trunk patch; I'll upload a trunk version to test 
again.
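For reference, a minimal sketch of what the block reduces to once the dead branches are dropped (assuming nothing else in the method reads the intermediate value):
{code}
Path localPath = path;
LOG.info("local path = " + localPath);
{code}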

> Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
> -
>
> Key: HIVE-9153
> URL: https://issues.apache.org/jira/browse/HIVE-9153
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Brock Noland
>Assignee: Rui Li
> Attachments: HIVE-9153.1-spark.patch, HIVE-9153.1-spark.patch, 
> screenshot.PNG
>
>
> The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. 
> However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in 
> Spark, it might make sense for us to use {{HiveInputFormat}} as well. We 
> should evaluate this on a query which has many input splits such as {{select 
> count(\*) from store_sales where something is not null}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9119) ZooKeeperHiveLockManager does not use zookeeper in the proper way

2014-12-25 Thread Na Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258873#comment-14258873
 ] 

Na Yang commented on HIVE-9119:
---

[~leftylev], thank you very much for reviewing the patch. I will take your 
suggestions. Regarding your questions, please see my answers below:

2.What are the units for hive.zookeeper.connection.basesleeptime? A 
TimeValidator could be used here – see comment on HIVE-6679 for an example.

- [Na]: The unit is milliseconds. I will follow the example when I upload a new 
patch.

3. Is the omission of an "E" for ZOOKEEPR deliberate in 
HIVE_ZOOKEEPR_CONNECTION_BASESLEEPTIME? It occurs once later in the code, also 
without the E.

-[Na]: It is a typo. I will correct it in the new patch.

4. Just curious: What's initial about the basesleeptime?

-[Na]: CuratorFramework uses ExponentialBackoffRetryPolicy to reconnect to the 
ZooKeeper server. This retry policy retries a set number of times with 
increasing sleep time between retries. The basesleeptime is the sleep time for 
the first retry. I will explain it more clearly in the new patch.
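
For illustration only (assumed values, not the actual patch), the behavior described above corresponds to building a single shared Curator client with an exponential backoff retry policy, where the base sleep time is the wait before the first retry:
{code}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class SharedZkClient {
  private static CuratorFramework client;

  // One client shared by all lock managers instead of one per getlock/releaselock.
  public static synchronized CuratorFramework get(String connectString,
                                                  int baseSleepTimeMs,
                                                  int maxRetries) {
    if (client == null) {
      client = CuratorFrameworkFactory.newClient(
          connectString, new ExponentialBackoffRetry(baseSleepTimeMs, maxRetries));
      client.start();
    }
    return client;
  }
}
{code}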

Currently, the qtests do not run properly with the CuratorFramework change. So 
I need to work on that and upload a new patch with these doc changes later on.  

> ZooKeeperHiveLockManager does not use zookeeper in the proper way
> -
>
> Key: HIVE-9119
> URL: https://issues.apache.org/jira/browse/HIVE-9119
> Project: Hive
>  Issue Type: Improvement
>  Components: Locking
>Affects Versions: 0.13.0, 0.14.0, 0.13.1
>Reporter: Na Yang
>Assignee: Na Yang
> Attachments: HIVE-9119.1.patch
>
>
> ZooKeeperHiveLockManager does not use ZooKeeper in the proper way. 
> Currently a new ZooKeeper client instance is created for each 
> getlock/releaselock query, which sometimes causes the number of open 
> connections between HiveServer2 and ZooKeeper to exceed the maximum 
> connection number that the ZooKeeper server allows. 
> To use ZooKeeper as a distributed lock, there is no need to create a new 
> ZooKeeper instance for every getlock try. A single ZooKeeper instance could 
> be reused and shared by ZooKeeperHiveLockManagers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]

2014-12-25 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258841#comment-14258841
 ] 

Xuefu Zhang commented on HIVE-9153:
---

Re: Utilities.getBaseWork() changes, I suppose Rui is probably trying to clean 
up some redundant (useless) code. The changed code would be equivalent to the 
old one if "name" is the full path of the plan file on HDFS for non-local mode, 
which is very possible but needs to be confirmed.

> Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
> -
>
> Key: HIVE-9153
> URL: https://issues.apache.org/jira/browse/HIVE-9153
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Brock Noland
>Assignee: Rui Li
> Attachments: HIVE-9153.1-spark.patch, HIVE-9153.1-spark.patch, 
> screenshot.PNG
>
>
> The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. 
> However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in 
> Spark, it might make sense for us to use {{HiveInputFormat}} as well. We 
> should evaluate this on a query which has many input splits such as {{select 
> count(\*) from store_sales where something is not null}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]

2014-12-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258840#comment-14258840
 ] 

Hive QA commented on HIVE-9153:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12689126/HIVE-9153.1-spark.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7255 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_windowing
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/590/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/590/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-590/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12689126 - PreCommit-HIVE-SPARK-Build

> Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
> -
>
> Key: HIVE-9153
> URL: https://issues.apache.org/jira/browse/HIVE-9153
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Brock Noland
>Assignee: Rui Li
> Attachments: HIVE-9153.1-spark.patch, HIVE-9153.1-spark.patch, 
> screenshot.PNG
>
>
> The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. 
> However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in 
> Spark, it might make sense for us to use {{HiveInputFormat}} as well. We 
> should evaluate this on a query which has many input splits such as {{select 
> count(\*) from store_sales where something is not null}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9176) Delegation token interval should be configurable in HadoopThriftAuthBridge

2014-12-25 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258830#comment-14258830
 ] 

Brock Noland commented on HIVE-9176:


You too :)

> Delegation token interval should be configurable in HadoopThriftAuthBridge
> --
>
> Key: HIVE-9176
> URL: https://issues.apache.org/jira/browse/HIVE-9176
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0
>Reporter: Brock Noland
>Assignee: Brock Noland
> Fix For: 0.15.0
>
> Attachments: HIVE-9176.1.patch, HIVE-9176.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]

2014-12-25 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-9153:
---
Attachment: HIVE-9153.1-spark.patch

Uploading the patch again to test some change I made to ptest.

> Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
> -
>
> Key: HIVE-9153
> URL: https://issues.apache.org/jira/browse/HIVE-9153
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Brock Noland
>Assignee: Rui Li
> Attachments: HIVE-9153.1-spark.patch, HIVE-9153.1-spark.patch, 
> screenshot.PNG
>
>
> The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. 
> However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in 
> Spark, it might make sense for us to use {{HiveInputFormat}} as well. We 
> should evaluate this on a query which has many input splits such as {{select 
> count(\*) from store_sales where something is not null}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]

2014-12-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258760#comment-14258760
 ] 

Hive QA commented on HIVE-9153:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12689107/HIVE-9153.1-spark.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7255 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_single_reducer
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_windowing
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/589/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/589/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-589/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12689107 - PreCommit-HIVE-SPARK-Build

> Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
> -
>
> Key: HIVE-9153
> URL: https://issues.apache.org/jira/browse/HIVE-9153
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Brock Noland
>Assignee: Rui Li
> Attachments: HIVE-9153.1-spark.patch, screenshot.PNG
>
>
> The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. 
> However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in 
> Spark, it might make sense for us to use {{HiveInputFormat}} as well. We 
> should evaluate this on a query which has many input splits such as {{select 
> count(\*) from store_sales where something is not null}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]

2014-12-25 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258751#comment-14258751
 ] 

Brock Noland commented on HIVE-9153:


Nice, I see the perf improvement but I don't get the changes to 
{{Utilities.getBaseWork}}?

> Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
> -
>
> Key: HIVE-9153
> URL: https://issues.apache.org/jira/browse/HIVE-9153
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Brock Noland
>Assignee: Rui Li
> Attachments: HIVE-9153.1-spark.patch, screenshot.PNG
>
>
> The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. 
> However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in 
> Spark, it might make sense for us to use {{HiveInputFormat}} as well. We 
> should evaluate this on a query which has many input splits such as {{select 
> count(\*) from store_sales where something is not null}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9135) Cache Map and Reduce works in RSC [Spark Branch]

2014-12-25 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258741#comment-14258741
 ] 

Rui Li commented on HIVE-9135:
--

I'm not sure if this is correct: we clone the JobConf in 
{{SparkPlanGenerator.cloneJobConf}} and set a different plan path for each 
BaseWork. These BaseWorks shouldn't be cached because each task needs to have 
its own BaseWork. Currently, when we set a different plan path, we just wipe 
out the original value and rely on Utilities to set a random one for us:
{code}
// Make sure we'll use a different plan path from the original one
HiveConf.setVar(cloned, HiveConf.ConfVars.PLAN, "");
{code}
Maybe we could set our own plan path with some special prefix/suffix so Utilities 
can tell which BaseWork should be cached and which should not.
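A rough sketch of that idea (hypothetical, not actual Hive code): instead of clearing {{HiveConf.ConfVars.PLAN}} and letting Utilities pick a random path, set one explicitly with a marker suffix so Utilities could tell that the corresponding BaseWork must not be cached. The "-uncached" suffix is an assumed convention.
{code}
import java.util.UUID;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.mapred.JobConf;

public class PlanPathMarker {
  // Mark the cloned conf's plan path so the caching layer can skip it.
  static void markUncached(JobConf cloned, Path scratchDir) {
    Path planPath = new Path(scratchDir, UUID.randomUUID().toString() + "-uncached");
    HiveConf.setVar(cloned, HiveConf.ConfVars.PLAN, planPath.toUri().toString());
  }
}
{code}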

> Cache Map and Reduce works in RSC [Spark Branch]
> 
>
> Key: HIVE-9135
> URL: https://issues.apache.org/jira/browse/HIVE-9135
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Brock Noland
>Assignee: Jimmy Xiang
> Attachments: HIVE-9135.1-spark.patch, HIVE-9135.1-spark.patch
>
>
> HIVE-9127 works around the fact that we don't cache Map/Reduce works in 
> Spark. However, other input formats such as HiveInputFormat will not benefit 
> from that fix. We should investigate how to allow caching on the RSC while 
> not on tasks (see HIVE-7431).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]

2014-12-25 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9153:
-
Status: Patch Available  (was: Open)

> Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
> -
>
> Key: HIVE-9153
> URL: https://issues.apache.org/jira/browse/HIVE-9153
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Brock Noland
>Assignee: Rui Li
> Attachments: HIVE-9153.1-spark.patch, screenshot.PNG
>
>
> The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. 
> However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in 
> Spark, it might make sense for us to use {{HiveInputFormat}} as well. We 
> should evaluate this on a query which has many input splits such as {{select 
> count(\*) from store_sales where something is not null}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]

2014-12-25 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9153:
-
Attachment: HIVE-9153.1-spark.patch

This patch should further improve Spark performance by avoiding retrieving the 
MapWork from the plan file.

> Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
> -
>
> Key: HIVE-9153
> URL: https://issues.apache.org/jira/browse/HIVE-9153
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Brock Noland
>Assignee: Rui Li
> Attachments: HIVE-9153.1-spark.patch, screenshot.PNG
>
>
> The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. 
> However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in 
> Spark, it might make sense for us to use {{HiveInputFormat}} as well. We 
> should evaluate this on a query which has many input splits such as {{select 
> count(\*) from store_sales where something is not null}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)