[jira] [Updated] (HIVE-9216) Avoid redundant clone of JobConf [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9216: - Status: Patch Available (was: Open) > Avoid redundant clone of JobConf [Spark Branch] > --- > > Key: HIVE-9216 > URL: https://issues.apache.org/jira/browse/HIVE-9216 > Project: Hive > Issue Type: Sub-task > Components: Spark > Reporter: Rui Li > Assignee: Rui Li > Priority: Minor > Attachments: HIVE-9216.1-spark.patch > > > Currently, SparkPlanGenerator clones the JobConf twice for each MapWork. > We should avoid this, because cloning the JobConf involves writing to HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
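A minimal sketch of the idea behind the fix, not the attached patch: the conf for each MapWork is cloned at most once, and the copy is reused whenever the plan generator asks for it again. `Conf` is a hypothetical stand-in for `JobConf`. Its clone only copies a map and bumps a counter, while in Hive the cost comes from the HDFS write described above. `ConfCache` and its work-name key are likewise hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for JobConf: a property map plus a static counter
// so the sketch can show how many (expensive) clones actually happen.
final class Conf {
    static int cloneCount = 0;
    final Map<String, String> props;

    Conf(Map<String, String> props) {
        this.props = new HashMap<>(props);
    }

    // In Hive, cloning the conf for a MapWork involves writing to HDFS; here we only count it.
    Conf cloneConf() {
        cloneCount++;
        return new Conf(props);
    }
}

// Hypothetical cache: clones the base conf at most once per work name and reuses the copy.
final class ConfCache {
    private final Conf base;
    private final Map<String, Conf> cloned = new HashMap<>();

    ConfCache(Conf base) {
        this.base = base;
    }

    Conf forWork(String workName) {
        return cloned.computeIfAbsent(workName, name -> base.cloneConf());
    }
}
```

Calling `forWork("Map 1")` twice returns the same object and leaves the clone count at one. A second, distinct work name triggers exactly one more clone.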
[jira] [Updated] (HIVE-9216) Avoid redundant clone of JobConf [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9216: - Attachment: HIVE-9216.1-spark.patch
[jira] [Updated] (HIVE-9216) Avoid redundant clone of JobConf [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9216: - Priority: Minor (was: Major)
[jira] [Updated] (HIVE-9216) Avoid redundant clone of JobConf [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9216: - Summary: Avoid redundant clone of JobConf [Spark Branch] (was: Avoid redundant clone of JobConf)
[jira] [Created] (HIVE-9216) Avoid redundant clone of JobConf
Rui Li created HIVE-9216: Summary: Avoid redundant clone of JobConf Key: HIVE-9216 URL: https://issues.apache.org/jira/browse/HIVE-9216 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Currently, SparkPlanGenerator clones the JobConf twice for each MapWork. We should avoid this, because cloning the JobConf involves writing to HDFS.
[jira] [Commented] (HIVE-9167) Enhance encryption testing framework to allow create keys & zones inside .q files
[ https://issues.apache.org/jira/browse/HIVE-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258960#comment-14258960 ] Dong Chen commented on HIVE-9167: - The approach looks good! Regarding not exposing the command to the end user, maybe we can leave the default value of {{hive.security.command.whitelist}} in HiveConf unchanged, and add the command to the whitelist in the conf during encryption test initialization. How does this sound? It is a simple approach, although it does not really hide the command from the user. > Enhance encryption testing framework to allow create keys & zones inside .q > files > - > > Key: HIVE-9167 > URL: https://issues.apache.org/jira/browse/HIVE-9167 > Project: Hive > Issue Type: Sub-task > Reporter: Sergio Peña > Assignee: Sergio Peña > > The current implementation of the encryption testing framework in HIVE-8900 > initializes a couple of encrypted databases to be used in .q test files. This > is useful for keeping tests small, but it does not test all details of the > encryption implementation, such as encrypted tables with different encryption > strengths in the same database. > We need to allow this kind of setup, as it is how encryption will be used in the > real world, where a database will have a few encrypted tables (not the whole DB). > Also, we need to make this encryption framework flexible so that we can > create/delete keys & zones on demand when running the .q files.
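A minimal sketch of the suggestion above, assuming the whitelist is a comma-separated list of command names and using a plain `Map` as a hypothetical stand-in for the test's HiveConf. The command name `crypto` in the usage below is an illustrative assumption. The default value shipped in HiveConf stays unchanged; only the test-initialization path appends the extra command.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class WhitelistSetup {
    static final String WHITELIST_KEY = "hive.security.command.whitelist";

    // Appends `command` to the comma-separated whitelist in `conf` unless it is already present.
    // Intended to run only during (hypothetical) encryption test initialization.
    static void allowCommand(Map<String, String> conf, String command) {
        String current = conf.getOrDefault(WHITELIST_KEY, "");
        // LinkedHashSet keeps the existing order and drops duplicates.
        Set<String> commands = new LinkedHashSet<>();
        for (String c : current.split(",")) {
            String trimmed = c.trim();
            if (!trimmed.isEmpty()) {
                commands.add(trimmed);
            }
        }
        commands.add(command);
        conf.put(WHITELIST_KEY, String.join(",", commands));
    }
}
```

For example, starting from a whitelist of `set,reset,dfs`, calling `allowCommand(conf, "crypto")` yields `set,reset,dfs,crypto`, and calling it again leaves the value unchanged. As noted in the comment, users of the test conf could still see the command; this only keeps it out of the shipped default.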
[jira] [Updated] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat
[ https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9153: -- Affects Version/s: (was: spark-branch) > Evaluate CombineHiveInputFormat versus HiveInputFormat > -- > > Key: HIVE-9153 > URL: https://issues.apache.org/jira/browse/HIVE-9153 > Project: Hive > Issue Type: Sub-task > Components: Spark > Reporter: Brock Noland > Assignee: Rui Li > Attachments: HIVE-9153.1-spark.patch, HIVE-9153.1-spark.patch, > HIVE-9153.2.patch, HIVE-9153.3.patch, screenshot.PNG > > > The default InputFormat is {{CombineHiveInputFormat}}, and thus HOS (Hive on Spark) uses it. > However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in > Spark, it might make sense for us to use {{HiveInputFormat}} as well. We > should evaluate this on a query that has many input splits, such as {{select > count(\*) from store_sales where something is not null}}.
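To make the trade-off concrete, here is a simplified sketch that counts tasks under two strategies. The first is one task per input split, roughly the {{HiveInputFormat}} behaviour. The second greedily packs consecutive splits into combined splits up to a maximum size. That packing is a deliberate simplification of {{CombineHiveInputFormat}}; the real implementation also considers locality and file pools, which this sketch ignores. The split sizes and the size limit are made-up parameters.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitPlanner {
    // One task per input split.
    static int tasksWithoutCombining(long[] splitSizes) {
        return splitSizes.length;
    }

    // Greedily packs consecutive splits into groups whose total size stays <= maxCombinedSize.
    // A split larger than the limit ends up alone in its own group.
    static List<List<Long>> combine(long[] splitSizes, long maxCombinedSize) {
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0;
        for (long size : splitSizes) {
            if (!current.isEmpty() && currentSize + size > maxCombinedSize) {
                groups.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(size);
            currentSize += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }
}
```

With split sizes `{64, 64, 64, 64, 200}` and a limit of 128, the first strategy yields 5 tasks and the second yields 3 (`[64, 64]`, `[64, 64]`, `[200]`). Fewer, larger tasks help when task launch is expensive. The premise of this issue is that in Spark it may be cheap enough that the extra parallelism wins.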
[jira] [Updated] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat
[ https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9153: -- Summary: Evaluate CombineHiveInputFormat versus HiveInputFormat (was: Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch])
[jira] [Commented] (HIVE-9213) Improve the mask pattern in QTestUtil to save partial directory info in test result
[ https://issues.apache.org/jira/browse/HIVE-9213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258949#comment-14258949 ] Hive QA commented on HIVE-9213: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12689150/HIVE-9213.1.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2199/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2199/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2199/ Messages: {noformat} This message was trimmed, see log for full details [copy] Copying 8 files to /data/hive-ptest/working/apache-svn-trunk-source/itests/target/tmp/conf [INFO] Executed tasks [INFO] [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-it --- [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-it --- [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/itests/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-it/0.15.0-SNAPSHOT/hive-it-0.15.0-SNAPSHOT.pom [INFO] [INFO] [INFO] Building Hive Integration - Custom Serde 0.15.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-it-custom-serde --- [INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ hive-it-custom-serde --- [INFO] [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-it-custom-serde --- [INFO] [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ hive-it-custom-serde --- [INFO] Using 'UTF-8' encoding to copy filtered resources. 
[INFO] Copying 1 resource [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-it-custom-serde --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-it-custom-serde --- [INFO] Compiling 10 source files to /data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/classes [WARNING] /data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/src/main/java/org/apache/hadoop/hive/serde2/CustomSerDe2.java: Some input files use or override a deprecated API. [WARNING] /data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/src/main/java/org/apache/hadoop/hive/serde2/CustomSerDe2.java: Recompile with -Xlint:deprecation for details. [INFO] [INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ hive-it-custom-serde --- [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/src/test/resources [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-it-custom-serde --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/tmp [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/warehouse [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/tmp/conf [copy] Copying 8 files to /data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/tmp/conf [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-it-custom-serde --- [INFO] No sources to compile [INFO] [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-it-custom-serde --- [INFO] Tests are skipped. 
[INFO] [INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-it-custom-serde --- [INFO] Building jar: /data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/hive-it-custom-serde-0.15.0-SNAPSHOT.jar [INFO] [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-it-custom-serde --- [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-it-custom-serde --- [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/hive-it-custom-serde-0.15.0-SNAPSHOT.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-it-custom-serde/0.15.0-SNAPSHOT/hive-it-custom-serde-0.15.0-SNAPSHOT.jar [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/itests/custom-
[jira] [Updated] (HIVE-9213) Improve the mask pattern in QTestUtil to save partial directory info in test result
[ https://issues.apache.org/jira/browse/HIVE-9213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Chen updated HIVE-9213: Attachment: HIVE-9213.1.patch Updated the patch to V1 with a small change to the regex. [~brocknoland], [~spena], [~Ferd], could you please help review this patch when you have time? Thanks! > Improve the mask pattern in QTestUtil to save partial directory info in test > result > --- > > Key: HIVE-9213 > URL: https://issues.apache.org/jira/browse/HIVE-9213 > Project: Hive > Issue Type: Sub-task > Reporter: Dong Chen > Assignee: Dong Chen > Fix For: encryption-branch > > Attachments: HIVE-9213.1.patch, HIVE-9213.patch > > > The mask pattern in QTestUtil masks directories in test results, since the > directory varies across test environments. > However, in the encryption tests, the directory info is needed to verify that the > intermediate files are put in the proper table. The whole directory is not > necessary; part of it is enough.
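One way to keep partial directory info is to mask only the environment-specific prefix of a path and keep its trailing components. Below is a sketch using `java.util.regex`. It assumes, hypothetically, that the part worth keeping is everything after a `/warehouse/` segment, and it uses an illustrative mask string. This is not the regex from the attached patch, and `PathMasker` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PathMasker {
    // Hypothetical mask text; QTestUtil uses its own marker in .q.out files.
    static final String MASK = "### MASKED ###";

    // Matches a URI-style path (e.g. hdfs://host:port/.../warehouse/) up to and including
    // the first /warehouse/ segment; group 1 captures the rest of the path, which is kept.
    private static final Pattern WAREHOUSE_PATH =
            Pattern.compile("\\b(?:hdfs|file|pfile):/\\S*?/warehouse/(\\S*)");

    static String mask(String line) {
        Matcher m = WAREHOUSE_PATH.matcher(line);
        // quoteReplacement guards against '$' or '\' in the mask text; $1 re-inserts the kept suffix.
        return m.replaceAll(Matcher.quoteReplacement(MASK) + "/warehouse/$1");
    }
}
```

A line such as `Moving data to: hdfs://localhost:54321/build/ql/test/data/warehouse/db.db/t1/.hive-staging_1` becomes `Moving data to: ### MASKED ###/warehouse/db.db/t1/.hive-staging_1`. The host, port, and build directory are hidden, but the database and table stay visible, so a test can still check which table the intermediate files landed in.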
[jira] [Commented] (HIVE-8821) Create unit test where we insert into dynamically partitioned table
[ https://issues.apache.org/jira/browse/HIVE-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258937#comment-14258937 ] Hive QA commented on HIVE-8821: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12689149/HIVE-8821.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2198/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2198/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2198/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee 
/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-2198/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveRecordReader.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java' ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/scheduler/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target accumulo-handler/target hwi/target common/target common/src/gen contrib/target service/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1647934. At revision 1647934. 
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12689149 - PreCommit-HIVE-TRUNK-Build > Create unit test where we insert into dynamically partitioned table > --- > > Key: HIVE-8821 > URL: https://issues.apache.org/jira/browse/HIVE-8821 > Project: Hive > Issue Type: Sub-task >Reporter: Brock Noland >Assignee: Dong Chen > Fix For: encryption-branch > > Attachments: HIVE-8821.patch > >
[jira] [Updated] (HIVE-8821) Create unit test where we insert into dynamically partitioned table
[ https://issues.apache.org/jira/browse/HIVE-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Chen updated HIVE-8821: Attachment: HIVE-8821.patch Patch attached. It is similar to HIVE-8822 (which covers a statically partitioned table) and verifies 3 scenarios.
[jira] [Updated] (HIVE-8821) Create unit test where we insert into dynamically partitioned table
[ https://issues.apache.org/jira/browse/HIVE-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Chen updated HIVE-8821: Fix Version/s: encryption-branch Assignee: Dong Chen Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258928#comment-14258928 ] Hive QA commented on HIVE-9153: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12689146/HIVE-9153.3.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6722 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_list_bucket_dml_10 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2197/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2197/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2197/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12689146 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-9039) Support Union Distinct
[ https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258913#comment-14258913 ] Hive QA commented on HIVE-9039: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12689141/HIVE-9039.06.patch {color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 6728 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_complex_alias org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_join_union org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_union_view org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union27 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union34 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_unionDistinct_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_top_level org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_view org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_smb_main org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_unionDistinct_1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_list_bucket_dml_10 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx {noformat} Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2196/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2196/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2196/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 20 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12689141 - PreCommit-HIVE-TRUNK-Build > Support Union Distinct > -- > > Key: HIVE-9039 > URL: https://issues.apache.org/jira/browse/HIVE-9039 > Project: Hive > Issue Type: New Feature > Reporter: Pengcheng Xiong > Assignee: Pengcheng Xiong > Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, > HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch, HIVE-9039.06.patch > > > The current version (Hive 0.14) does not support UNION (or UNION DISTINCT); it > only supports UNION ALL. In this patch, we add the feature by > rewriting UNION DISTINCT as UNION ALL followed by GROUP BY.
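The rewrite relies on UNION DISTINCT being equivalent to UNION ALL followed by a GROUP BY over all output columns. Below is a small illustration of that equivalence on in-memory rows. As an assumption of this sketch, rows are modelled as `List<String>`. It shows the semantics only, not Hive's operator-tree rewrite, and `UnionRewrite` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class UnionRewrite {
    // UNION ALL: concatenates both inputs and keeps duplicates.
    static List<List<String>> unionAll(List<List<String>> left, List<List<String>> right) {
        List<List<String>> out = new ArrayList<>(left);
        out.addAll(right);
        return out;
    }

    // GROUP BY over all columns with no aggregates: one output row per distinct key.
    // LinkedHashMap keeps first-seen order so this sketch is deterministic (SQL guarantees no order).
    static List<List<String>> groupByAllColumns(List<List<String>> rows) {
        Map<List<String>, Boolean> groups = new LinkedHashMap<>();
        for (List<String> row : rows) {
            groups.putIfAbsent(row, Boolean.TRUE);
        }
        return new ArrayList<>(groups.keySet());
    }

    // UNION DISTINCT rewritten as UNION ALL followed by GROUP BY.
    static List<List<String>> unionDistinct(List<List<String>> left, List<List<String>> right) {
        return groupByAllColumns(unionAll(left, right));
    }
}
```

With left rows `(1, x), (2, y)` and right rows `(2, y), (3, z), (3, z)`, UNION ALL returns 5 rows, while the rewritten UNION DISTINCT returns the 3 distinct rows `(1, x), (2, y), (3, z)`. Grouping on the full row is what removes duplicates both across and within the inputs.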
[jira] [Commented] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258909#comment-14258909 ] Rui Li commented on HIVE-9153: -- The strange thing is that {{Utilities}} differs between trunk and the spark branch, even though it seems we have merged all the commits from trunk.
[jira] [Updated] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9153: - Attachment: HIVE-9153.3.patch It seems the redundant code in {{Utilities.getBaseWork}} has already been taken care of in trunk, so that part is reverted in the trunk patch.
[jira] [Commented] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258897#comment-14258897 ] Hive QA commented on HIVE-9153: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12689137/HIVE-9153.2.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2195/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2195/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2195/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee 
/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-2195/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . Reverted 'ql/src/test/results/clientpositive/list_bucket_dml_9.q.out' Reverted 'ql/src/test/results/clientpositive/list_bucket_dml_4.q.out' Reverted 'ql/src/test/results/clientpositive/list_bucket_dml_12.q.out' Reverted 'ql/src/test/results/clientpositive/stats_list_bucket.q.out' Reverted 'ql/src/test/results/clientpositive/list_bucket_dml_8.q.out' Reverted 'ql/src/test/results/clientpositive/list_bucket_dml_11.q.out' Reverted 'ql/src/test/results/clientpositive/list_bucket_dml_5.q.out' Reverted 'ql/src/test/results/clientpositive/list_bucket_dml_13.q.out' Reverted 'ql/src/test/results/clientpositive/partitions_json.q.out' Reverted 'ql/src/test/results/clientpositive/list_bucket_dml_2.q.out' Reverted 'ql/src/test/results/clientpositive/list_bucket_dml_10.q.out' Reverted 'ql/src/test/queries/clientpositive/list_bucket_dml_5.q' Reverted 'ql/src/test/queries/clientpositive/list_bucket_dml_11.q' Reverted 'ql/src/test/queries/clientpositive/list_bucket_dml_13.q' Reverted 'ql/src/test/queries/clientpositive/list_bucket_dml_9.q' Reverted 'ql/src/test/queries/clientpositive/list_bucket_dml_2.q' Reverted 'ql/src/test/queries/clientpositive/list_bucket_dml_4.q' Reverted 'ql/src/test/queries/clientpositive/list_bucket_dml_10.q' Reverted 'ql/src/test/queries/clientpositive/list_bucket_dml_12.q' Reverted 'ql/src/test/queries/clientpositive/list_bucket_dml_8.q' Reverted 'ql/src/test/queries/clientpositive/stats_list_bucket.q' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MapBuilder.java' ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target 
shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/scheduler/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target accumulo-handler/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target ql/src/test/results/clientpositive/list_bucket_dml_9.q.java1.8.out ql/src/test/results/clientpositive/list_bucket_dml_13.q.java1.8.out ql/src/test/results/clientpositive/list_bucket_dml
[jira] [Updated] (HIVE-9039) Support Union Distinct
[ https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9039: -- Status: Patch Available (was: Open) > Support Union Distinct > -- > > Key: HIVE-9039 > URL: https://issues.apache.org/jira/browse/HIVE-9039 > Project: Hive > Issue Type: New Feature >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, > HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch, HIVE-9039.06.patch > > > Current version (Hive 0.14) does not support union (or union distinct). It > only supports union all. In this patch, we try to add this new feature by > rewriting union distinct to union all followed by group by. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9039) Support Union Distinct
[ https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9039: -- Attachment: HIVE-9039.06.patch > Support Union Distinct > -- > > Key: HIVE-9039 > URL: https://issues.apache.org/jira/browse/HIVE-9039 > Project: Hive > Issue Type: New Feature >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, > HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch, HIVE-9039.06.patch > > > Current version (Hive 0.14) does not support union (or union distinct). It > only supports union all. In this patch, we try to add this new feature by > rewriting union distinct to union all followed by group by. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9039) Support Union Distinct
[ https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9039: -- Attachment: (was: HIVE-9039.06.patch) > Support Union Distinct > -- > > Key: HIVE-9039 > URL: https://issues.apache.org/jira/browse/HIVE-9039 > Project: Hive > Issue Type: New Feature >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, > HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch > > > Current version (Hive 0.14) does not support union (or union distinct). It > only supports union all. In this patch, we try to add this new feature by > rewriting union distinct to union all followed by group by. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9039) Support Union Distinct
[ https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9039: -- Status: Open (was: Patch Available) > Support Union Distinct > -- > > Key: HIVE-9039 > URL: https://issues.apache.org/jira/browse/HIVE-9039 > Project: Hive > Issue Type: New Feature >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, > HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch > > > Current version (Hive 0.14) does not support union (or union distinct). It > only supports union all. In this patch, we try to add this new feature by > rewriting union distinct to union all followed by group by. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9039) Support Union Distinct
[ https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9039: -- Status: Patch Available (was: Open) > Support Union Distinct > -- > > Key: HIVE-9039 > URL: https://issues.apache.org/jira/browse/HIVE-9039 > Project: Hive > Issue Type: New Feature >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, > HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch, HIVE-9039.06.patch > > > Current version (Hive 0.14) does not support union (or union distinct). It > only supports union all. In this patch, we try to add this new feature by > rewriting union distinct to union all followed by group by. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9039) Support Union Distinct
[ https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9039: -- Attachment: HIVE-9039.06.patch > Support Union Distinct > -- > > Key: HIVE-9039 > URL: https://issues.apache.org/jira/browse/HIVE-9039 > Project: Hive > Issue Type: New Feature >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, > HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch, HIVE-9039.06.patch > > > Current version (Hive 0.14) does not support union (or union distinct). It > only supports union all. In this patch, we try to add this new feature by > rewriting union distinct to union all followed by group by. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9039) Support Union Distinct
[ https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9039: -- Status: Open (was: Patch Available) > Support Union Distinct > -- > > Key: HIVE-9039 > URL: https://issues.apache.org/jira/browse/HIVE-9039 > Project: Hive > Issue Type: New Feature >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, > HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch > > > Current version (Hive 0.14) does not support union (or union distinct). It > only supports union all. In this patch, we try to add this new feature by > rewriting union distinct to union all followed by group by. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9039) Support Union Distinct
[ https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9039: -- Attachment: (was: HIVE-9039.06.patch) > Support Union Distinct > -- > > Key: HIVE-9039 > URL: https://issues.apache.org/jira/browse/HIVE-9039 > Project: Hive > Issue Type: New Feature >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, > HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch > > > Current version (Hive 0.14) does not support union (or union distinct). It > only supports union all. In this patch, we try to add this new feature by > rewriting union distinct to union all followed by group by. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9039) Support Union Distinct
[ https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9039: -- Status: Patch Available (was: Open) > Support Union Distinct > -- > > Key: HIVE-9039 > URL: https://issues.apache.org/jira/browse/HIVE-9039 > Project: Hive > Issue Type: New Feature >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, > HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch, HIVE-9039.06.patch > > > Current version (Hive 0.14) does not support union (or union distinct). It > only supports union all. In this patch, we try to add this new feature by > rewriting union distinct to union all followed by group by. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9153: - Attachment: HIVE-9153.2.patch Upload trunk patch > Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch] > - > > Key: HIVE-9153 > URL: https://issues.apache.org/jira/browse/HIVE-9153 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: spark-branch >Reporter: Brock Noland >Assignee: Rui Li > Attachments: HIVE-9153.1-spark.patch, HIVE-9153.1-spark.patch, > HIVE-9153.2.patch, screenshot.PNG > > > The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. > However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in > Spark, it might make sense for us to use {{HiveInputFormat}} as well. We > should evaluate this on a query which has many input splits such as {{select > count(\*) from store_sales where something is not null}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
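A rough model of the task-count trade-off behind this evaluation may help. The sketch below assumes two things. First, HiveInputFormat launches one task per input split. Second, CombineHiveInputFormat greedily packs small splits into combined splits up to a size limit. The class name `SplitCombineSketch`, the greedy packing, and the sizes are illustrative assumptions. The real combine logic also groups splits by node and rack locality, which this sketch ignores. In Hive, the input format itself is selected through the `hive.input.format` property.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitCombineSketch {

    // HiveInputFormat-style model (assumption): one task per input split.
    static int tasksWithoutCombining(List<Long> splitSizes) {
        return splitSizes.size();
    }

    // Simplified CombineHiveInputFormat-style model (assumption): pack consecutive
    // splits into groups whose total size stays within maxCombinedSize.
    // Locality-aware grouping done by the real implementation is not modelled.
    static List<List<Long>> combine(List<Long> splitSizes, long maxCombinedSize) {
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0;
        for (long size : splitSizes) {
            if (!current.isEmpty() && currentSize + size > maxCombinedSize) {
                groups.add(current);          // close the current combined split
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(size);
            currentSize += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Long> splits = List.of(10L, 10L, 10L, 10L, 10L);
        System.out.println(tasksWithoutCombining(splits)); // 5
        System.out.println(combine(splits, 25L).size());   // 3
    }
}
```

Fewer, larger tasks save scheduling overhead. That is the reason combining is the default. If tasks are cheap, as the description suggests for Spark, the extra parallelism of one task per split may win instead.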
[jira] [Updated] (HIVE-9039) Support Union Distinct
[ https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9039: -- Attachment: HIVE-9039.06.patch (1) support select distinct * (2) use select distinct * to rewrite union distinct to union all with group by > Support Union Distinct > -- > > Key: HIVE-9039 > URL: https://issues.apache.org/jira/browse/HIVE-9039 > Project: Hive > Issue Type: New Feature >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, > HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch, HIVE-9039.06.patch > > > Current version (Hive 0.14) does not support union (or union distinct). It > only supports union all. In this patch, we try to add this new feature by > rewriting union distinct to union all followed by group by. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
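The rewrite relies on one equivalence: UNION DISTINCT returns the same rows as UNION ALL followed by a GROUP BY over all columns, which is the same as SELECT DISTINCT *. The sketch below is a minimal Java model of that equivalence. It represents rows as lists of strings. The class and method names are hypothetical, and this is not the Hive planner code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class UnionDistinctSketch {

    // UNION ALL: concatenate both inputs and keep every duplicate row.
    static <T> List<T> unionAll(List<T> left, List<T> right) {
        List<T> out = new ArrayList<>(left);
        out.addAll(right);
        return out;
    }

    // GROUP BY over all columns (SELECT DISTINCT *): keep one copy of each row.
    // LinkedHashSet keeps first-seen order only to make the output readable;
    // SQL itself guarantees no particular order.
    static <T> List<T> distinct(List<T> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    // UNION DISTINCT expressed as UNION ALL followed by the distinct step.
    static <T> List<T> unionDistinct(List<T> left, List<T> right) {
        return distinct(unionAll(left, right));
    }

    public static void main(String[] args) {
        List<List<String>> a = List.of(List.of("1", "x"), List.of("2", "y"));
        List<List<String>> b = List.of(List.of("2", "y"), List.of("3", "z"), List.of("3", "z"));
        System.out.println(unionAll(a, b).size()); // 5
        System.out.println(unionDistinct(a, b));   // [[1, x], [2, y], [3, z]]
    }
}
```

Duplicates within a single input are removed as well as duplicates across inputs. That is why the distinct step must run after the UNION ALL rather than on each branch separately.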
[jira] [Updated] (HIVE-9039) Support Union Distinct
[ https://issues.apache.org/jira/browse/HIVE-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9039: -- Status: Open (was: Patch Available) > Support Union Distinct > -- > > Key: HIVE-9039 > URL: https://issues.apache.org/jira/browse/HIVE-9039 > Project: Hive > Issue Type: New Feature >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-9039.01.patch, HIVE-9039.02.patch, > HIVE-9039.03.patch, HIVE-9039.04.patch, HIVE-9039.05.patch > > > Current version (Hive 0.14) does not support union (or union distinct). It > only supports union all. In this patch, we try to add this new feature by > rewriting union distinct to union all followed by group by. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258885#comment-14258885 ] Rui Li commented on HIVE-9153: -- Hi [~brocknoland] and [~xuefuz], Sorry, maybe I was being confusing. The patch here is to reduce the calls to {{Utilities.getBaseWork()}}, which is quite similar to HIVE-9127. The changes to {{Utilities.getBaseWork()}} are just to remove redundant code: {code} Path localPath; if (conf.getBoolean("mapreduce.task.uberized", false) && name.equals(REDUCE_PLAN_NAME)) { localPath = new Path(name); } else if (ShimLoader.getHadoopShims().isLocalMode(conf)) { localPath = path; } else { LOG.info("***non-local mode***"); localPath = new Path(name); } localPath = path; LOG.info("local path = " + localPath); {code} It seems the if-else is unnecessary because localPath is reassigned to path afterwards anyway, which makes localPath itself redundant too. But I can revert this change if you feel uncertain about it. BTW, the patch should be a trunk patch; I'll upload a trunk version to test again. > Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch] > - > > Key: HIVE-9153 > URL: https://issues.apache.org/jira/browse/HIVE-9153 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: spark-branch >Reporter: Brock Noland >Assignee: Rui Li > Attachments: HIVE-9153.1-spark.patch, HIVE-9153.1-spark.patch, > screenshot.PNG > > > The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. > However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in > Spark, it might make sense for us to use {{HiveInputFormat}} as well. We > should evaluate this on a query which has many input splits such as {{select > count(\*) from store_sales where something is not null}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
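The dead-store argument in the comment can be checked against a small model of the quoted fragment. The sketch makes three simplifications: Hadoop's `Path` is replaced by `String`, the configuration and shim lookups are replaced by boolean parameters, and the logging is dropped. `LocalPathSketch` and its methods are hypothetical names. The sketch models only the quoted code, not the actual patch.

```java
public class LocalPathSketch {

    // Model of the quoted getBaseWork() fragment, with Path replaced by String
    // and the conf/shim checks replaced by boolean parameters (assumption).
    static String originalLocalPath(boolean uberized, boolean isReducePlan,
                                    boolean localMode, String name, String path) {
        String localPath;
        if (uberized && isReducePlan) {
            localPath = name;
        } else if (localMode) {
            localPath = path;
        } else {
            localPath = name;
        }
        localPath = path; // unconditional overwrite: every branch above is a dead store
        return localPath;
    }

    // Equivalent simplified form: the branches never influence the result.
    static String simplifiedLocalPath(String path) {
        return path;
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        for (boolean uber : flags) {
            for (boolean reduce : flags) {
                for (boolean local : flags) {
                    String a = originalLocalPath(uber, reduce, local, "reduce.xml", "/tmp/plan/reduce.xml");
                    String b = simplifiedLocalPath("/tmp/plan/reduce.xml");
                    System.out.println(a.equals(b)); // true for every combination
                }
            }
        }
    }
}
```

In the original, the branches affect only side effects: the `***non-local mode***` log line and the calls made while evaluating the conditions. They never change the path that is returned.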
[jira] [Commented] (HIVE-9119) ZooKeeperHiveLockManager does not use zookeeper in the proper way
[ https://issues.apache.org/jira/browse/HIVE-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258873#comment-14258873 ] Na Yang commented on HIVE-9119: --- [~leftylev], thank you very much for reviewing the patch. I will take your suggestions. Regarding your questions, please see my answers below: 2. What are the units for hive.zookeeper.connection.basesleeptime? A TimeValidator could be used here – see comment on HIVE-6679 for an example. - [Na]: The unit is milliseconds. I will follow the example when I upload a new patch. 3. Is the omission of an "E" for ZOOKEEPR deliberate in HIVE_ZOOKEEPR_CONNECTION_BASESLEEPTIME? It occurs once later in the code, also without the E. - [Na]: It is a typo. I will correct it in the new patch. 4. Just curious: What's initial about the basesleeptime? - [Na]: CuratorFramework uses the ExponentialBackoffRetry policy to reconnect to the ZooKeeper server. This retry policy retries a set number of times with increasing sleep time between retries. The basesleeptime is the sleep time before the first retry. I will explain it more clearly in the new patch. Currently, the qtests do not run properly with the CuratorFramework change, so I need to work on that and upload a new patch with these doc changes later on. > ZooKeeperHiveLockManager does not use zookeeper in the proper way > - > > Key: HIVE-9119 > URL: https://issues.apache.org/jira/browse/HIVE-9119 > Project: Hive > Issue Type: Improvement > Components: Locking >Affects Versions: 0.13.0, 0.14.0, 0.13.1 >Reporter: Na Yang >Assignee: Na Yang > Attachments: HIVE-9119.1.patch > > > ZooKeeperHiveLockManager does not use zookeeper in the proper way. > Currently a new zookeeper client instance is created for each > getlock/releaselock query, which sometimes causes the number of open > connections between > HiveServer2 and ZooKeeper to exceed the maximum number of connections that the zookeeper > server allows.
> To use zookeeper as a distributed lock, there is no need to create a new > zookeeper instance for every getlock try. A single zookeeper instance could > be reused and shared by ZooKeeperHiveLockManagers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
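This sketch illustrates two ideas from this issue. The first is reusing one lazily created client instead of opening a new one per getlock/releaselock call. The second is a backoff schedule that starts at basesleeptime (in milliseconds, per the comment) and grows with each retry. `FakeZkClient`, `getSharedClient`, and `sleepSchedule` are hypothetical names; the fake client only counts connections. The deterministic doubling schedule is a simplification. Curator's ExponentialBackoffRetry randomizes its sleep times, so its actual values differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedZkClientSketch {

    // Hypothetical stand-in for a ZooKeeper/Curator client; it only counts
    // how many connections have been opened.
    static final class FakeZkClient {
        static final AtomicInteger OPENED = new AtomicInteger();

        FakeZkClient() {
            OPENED.incrementAndGet();
        }
    }

    // One lazily created client shared by every lock manager in the process,
    // instead of a new client for each getlock/releaselock call.
    private static FakeZkClient shared;

    static synchronized FakeZkClient getSharedClient() {
        if (shared == null) {
            shared = new FakeZkClient();
        }
        return shared;
    }

    // Deterministic exponential backoff (simplified): baseSleepTimeMs before the
    // first retry, doubling on each later retry, capped at maxSleepMs.
    static List<Long> sleepSchedule(long baseSleepTimeMs, int maxRetries, long maxSleepMs) {
        List<Long> schedule = new ArrayList<>();
        for (int retry = 0; retry < maxRetries; retry++) {
            long sleep = baseSleepTimeMs << retry;
            schedule.add(Math.min(sleep, maxSleepMs));
        }
        return schedule;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            getSharedClient(); // simulate 100 lock requests
        }
        System.out.println(FakeZkClient.OPENED.get());    // 1
        System.out.println(sleepSchedule(1000, 4, 5000)); // [1000, 2000, 4000, 5000]
    }
}
```

With a shared client, the number of connections from HiveServer2 stays constant no matter how many lock requests arrive. The ZooKeeper server's connection limit is then no longer exceeded by lock traffic alone.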
[jira] [Commented] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258841#comment-14258841 ] Xuefu Zhang commented on HIVE-9153: --- Re: the Utilities.getBaseWork() changes, I suppose Rui is trying to clean up some redundant (useless) code. The changed code would be equivalent to the old code if "name" is the full path of the plan file on HDFS in non-local mode, which is quite possible but needs to be confirmed. > Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch] > - > > Key: HIVE-9153 > URL: https://issues.apache.org/jira/browse/HIVE-9153 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: spark-branch >Reporter: Brock Noland >Assignee: Rui Li > Attachments: HIVE-9153.1-spark.patch, HIVE-9153.1-spark.patch, > screenshot.PNG > > > The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. > However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in > Spark, it might make sense for us to use {{HiveInputFormat}} as well. We > should evaluate this on a query which has many input splits such as {{select > count(\*) from store_sales where something is not null}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258840#comment-14258840 ] Hive QA commented on HIVE-9153: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12689126/HIVE-9153.1-spark.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7255 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_windowing {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/590/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/590/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-590/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12689126 - PreCommit-HIVE-SPARK-Build > Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch] > - > > Key: HIVE-9153 > URL: https://issues.apache.org/jira/browse/HIVE-9153 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: spark-branch >Reporter: Brock Noland >Assignee: Rui Li > Attachments: HIVE-9153.1-spark.patch, HIVE-9153.1-spark.patch, > screenshot.PNG > > > The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. > However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in > Spark, it might make sense for us to use {{HiveInputFormat}} as well. We > should evaluate this on a query which has many input splits such as {{select > count(\*) from store_sales where something is not null}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9176) Delegation token interval should be configurable in HadoopThriftAuthBridge
[ https://issues.apache.org/jira/browse/HIVE-9176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258830#comment-14258830 ] Brock Noland commented on HIVE-9176: You too :) > Delegation token interval should be configurable in HadoopThriftAuthBridge > -- > > Key: HIVE-9176 > URL: https://issues.apache.org/jira/browse/HIVE-9176 > Project: Hive > Issue Type: Improvement >Affects Versions: 0.14.0 >Reporter: Brock Noland >Assignee: Brock Noland > Fix For: 0.15.0 > > Attachments: HIVE-9176.1.patch, HIVE-9176.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9153: --- Attachment: HIVE-9153.1-spark.patch Uploading the patch again to test some changes I made to ptest. > Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch] > - > > Key: HIVE-9153 > URL: https://issues.apache.org/jira/browse/HIVE-9153 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: spark-branch >Reporter: Brock Noland >Assignee: Rui Li > Attachments: HIVE-9153.1-spark.patch, HIVE-9153.1-spark.patch, > screenshot.PNG > > > The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. > However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in > Spark, it might make sense for us to use {{HiveInputFormat}} as well. We > should evaluate this on a query which has many input splits such as {{select > count(\*) from store_sales where something is not null}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258760#comment-14258760 ] Hive QA commented on HIVE-9153: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12689107/HIVE-9153.1-spark.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7255 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_single_reducer org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_windowing {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/589/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/589/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-589/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12689107 - PreCommit-HIVE-SPARK-Build > Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch] > - > > Key: HIVE-9153 > URL: https://issues.apache.org/jira/browse/HIVE-9153 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: spark-branch >Reporter: Brock Noland >Assignee: Rui Li > Attachments: HIVE-9153.1-spark.patch, screenshot.PNG > > > The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. > However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in > Spark, it might make sense for us to use {{HiveInputFormat}} as well. We > should evaluate this on a query which has many input splits such as {{select > count(\*) from store_sales where something is not null}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258751#comment-14258751 ] Brock Noland commented on HIVE-9153: Nice, I see the perf improvement, but I don't get the changes to {{Utilities.getBaseWork}}. > Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch] > - > > Key: HIVE-9153 > URL: https://issues.apache.org/jira/browse/HIVE-9153 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: spark-branch >Reporter: Brock Noland >Assignee: Rui Li > Attachments: HIVE-9153.1-spark.patch, screenshot.PNG > > > The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. > However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in > Spark, it might make sense for us to use {{HiveInputFormat}} as well. We > should evaluate this on a query which has many input splits such as {{select > count(\*) from store_sales where something is not null}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9135) Cache Map and Reduce works in RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258741#comment-14258741 ] Rui Li commented on HIVE-9135: -- I'm not sure if this is correct: we clone the JobConf in {{SparkPlanGenerator.cloneJobConf}} and set a different plan path for each BaseWork. These BaseWorks shouldn't be cached, because each task needs to have its own BaseWork. Currently, when we set a different plan path, we just wipe out the original value and rely on Utilities to set a random one for us: {code} // Make sure we'll use a different plan path from the original one HiveConf.setVar(cloned, HiveConf.ConfVars.PLAN, ""); {code} Maybe we could set our own plan path with a special prefix/postfix so that Utilities can tell which BaseWork should be cached and which should not. > Cache Map and Reduce works in RSC [Spark Branch] > > > Key: HIVE-9135 > URL: https://issues.apache.org/jira/browse/HIVE-9135 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: spark-branch >Reporter: Brock Noland >Assignee: Jimmy Xiang > Attachments: HIVE-9135.1-spark.patch, HIVE-9135.1-spark.patch > > > HIVE-9127 works around the fact that we don't cache Map/Reduce works in > Spark. However, other input formats such as HiveInputFormat will not benefit > from that fix. We should investigate how to allow caching on the RSC while > not on tasks (see HIVE-7431). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
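The prefix/postfix idea suggested in the HIVE-9135 comment could look roughly like this. The sketch tags plan paths generated for cloned JobConfs so that a cache can skip them. The suffix `-per-task`, the class `PlanPathTagSketch`, and both helper methods are hypothetical; neither the names nor the convention exist in Hive. The sketch assumes plan paths are plain strings.

```java
import java.util.UUID;

public class PlanPathTagSketch {

    // Hypothetical marker appended to plan paths of cloned, per-task JobConfs.
    static final String PER_TASK_SUFFIX = "-per-task";

    // Build a fresh, recognisable plan path instead of clearing the value
    // and letting a random one be chosen later.
    static String newPerTaskPlanPath(String scratchDir) {
        return scratchDir + "/" + UUID.randomUUID() + PER_TASK_SUFFIX;
    }

    // Only plan paths that were not generated for a cloned JobConf are cacheable.
    static boolean isCacheable(String planPath) {
        return !planPath.endsWith(PER_TASK_SUFFIX);
    }

    public static void main(String[] args) {
        String sharedPlan = "/tmp/hive/plan-1234";
        String perTaskPlan = newPerTaskPlanPath("/tmp/hive");
        System.out.println(isCacheable(sharedPlan));  // true
        System.out.println(isCacheable(perTaskPlan)); // false
    }
}
```

A suffix check is cheap and requires no extra configuration key. The trade-off is that the convention must be applied everywhere plan paths are generated.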
[jira] [Updated] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9153: - Status: Patch Available (was: Open) > Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch] > - > > Key: HIVE-9153 > URL: https://issues.apache.org/jira/browse/HIVE-9153 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: spark-branch >Reporter: Brock Noland >Assignee: Rui Li > Attachments: HIVE-9153.1-spark.patch, screenshot.PNG > > > The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. > However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in > Spark, it might make sense for us to use {{HiveInputFormat}} as well. We > should evaluate this on a query which has many input splits such as {{select > count(\*) from store_sales where something is not null}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9153: - Attachment: HIVE-9153.1-spark.patch This patch should further improve Spark performance by avoiding retrieving MapWork from the plan file. > Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch] > - > > Key: HIVE-9153 > URL: https://issues.apache.org/jira/browse/HIVE-9153 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: spark-branch >Reporter: Brock Noland >Assignee: Rui Li > Attachments: HIVE-9153.1-spark.patch, screenshot.PNG > > > The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. > However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in > Spark, it might make sense for us to use {{HiveInputFormat}} as well. We > should evaluate this on a query which has many input splits such as {{select > count(\*) from store_sales where something is not null}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
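One generic way to avoid repeatedly reading a plan file is to memoize the deserialized object per plan path. The sketch below shows that technique. It is not necessarily what the patch does. `loadPlanFromFile`, `getPlan`, and the string stand-in for MapWork are hypothetical. As the HIVE-9135 comment points out, such a cache must not hold plans that are meant to be task-specific.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class PlanCacheSketch {

    static final AtomicInteger DESERIALIZATIONS = new AtomicInteger();

    // Hypothetical stand-in for reading and deserializing a MapWork from a plan file.
    static String loadPlanFromFile(String planPath) {
        DESERIALIZATIONS.incrementAndGet();
        return "MapWork@" + planPath;
    }

    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    // Deserialize each plan path at most once; later callers reuse the cached object.
    static String getPlan(String planPath) {
        return CACHE.computeIfAbsent(planPath, PlanCacheSketch::loadPlanFromFile);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 50; i++) {
            getPlan("/tmp/hive/map.xml"); // e.g. one lookup per split or record reader
        }
        System.out.println(DESERIALIZATIONS.get()); // 1
    }
}
```

`ConcurrentHashMap.computeIfAbsent` runs the loader at most once per key even under concurrent access. That matters when many record readers in the same executor ask for the same plan.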