[jira] [Commented] (HIVE-15104) Hive on Spark generate more shuffle data than hive on mr
    [ https://issues.apache.org/jira/browse/HIVE-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085276#comment-16085276 ]

Rui Li commented on HIVE-15104:
-------------------------------

I also ran another round of TPC-DS. The overall shuffle data is reduced by 12%. The query time improvement, however, is negligible: about 1.5%.
[~xuefuz] do you think it's worth the effort?

> Hive on Spark generate more shuffle data than hive on mr
> --------------------------------------------------------
>
> Key: HIVE-15104
> URL: https://issues.apache.org/jira/browse/HIVE-15104
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Affects Versions: 1.2.1
> Reporter: wangwenli
> Assignee: Rui Li
> Attachments: HIVE-15104.1.patch, HIVE-15104.2.patch, HIVE-15104.3.patch, HIVE-15104.4.patch, TPC-H 100G.xlsx
>
>
> The same SQL, run on the Spark and MR engines, generates different sizes of shuffle data.
> I think this is because Hive on MR serializes only part of the HiveKey, whereas Hive on Spark, which uses Kryo, serializes the full HiveKey object.
> What is your opinion?

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (HIVE-15898) add Type2 SCD merge tests
[ https://issues.apache.org/jira/browse/HIVE-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085275#comment-16085275 ] Hive QA commented on HIVE-15898: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12877014/HIVE-15898.08.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10890 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=237) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge] (batchId=159) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge_type2_scd] (batchId=144) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout (batchId=226) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6001/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6001/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6001/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12877014 - PreCommit-HIVE-Build > add Type2 SCD merge tests > - > > Key: HIVE-15898 > URL: https://issues.apache.org/jira/browse/HIVE-15898 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-15898.01.patch, HIVE-15898.02.patch, > HIVE-15898.03.patch, HIVE-15898.04.patch, HIVE-15898.05.patch, > HIVE-15898.06.patch, HIVE-15898.07.patch, HIVE-15898.08.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-15104) Hive on Spark generate more shuffle data than hive on mr
    [ https://issues.apache.org/jira/browse/HIVE-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-15104:
--------------------------
    Attachment: HIVE-15104.4.patch

Update patch v4:
1. Moved the registrator code to a resource file. Hopefully this makes the patch more readable.
2. To be safe, we still have to store the hash code. But that's still better than the generic serializer.

> Hive on Spark generate more shuffle data than hive on mr
> --------------------------------------------------------
>
> Key: HIVE-15104
> URL: https://issues.apache.org/jira/browse/HIVE-15104
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Affects Versions: 1.2.1
> Reporter: wangwenli
> Assignee: Rui Li
> Attachments: HIVE-15104.1.patch, HIVE-15104.2.patch, HIVE-15104.3.patch, HIVE-15104.4.patch, TPC-H 100G.xlsx
>
>
> The same SQL, run on the Spark and MR engines, generates different sizes of shuffle data.
> I think this is because Hive on MR serializes only part of the HiveKey, whereas Hive on Spark, which uses Kryo, serializes the full HiveKey object.
> What is your opinion?
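To illustrate the size difference discussed in HIVE-15104, here is a minimal sketch in plain Java. It is not the code from the patch: CompactKey is a hypothetical stand-in for HiveKey (a buffer of which only the first bytes are valid, plus a hash code), and java.io object serialization stands in for a generic serializer. It only shows why writing the valid bytes and the hash code yields less data than serializing the whole object.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a shuffle key: a reusable buffer of which only
// the first `length` bytes are valid, plus a precomputed hash code.
class CompactKey implements Serializable {
    private static final long serialVersionUID = 1L;
    final byte[] buffer;
    final int length;
    final int hashCode;

    CompactKey(byte[] buffer, int length, int hashCode) {
        this.buffer = buffer;
        this.length = length;
        this.hashCode = hashCode;
    }
}

class KeySerializationSketch {
    // Writes the valid length, the valid bytes only, and the hash code.
    static byte[] writeCompact(CompactKey key) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(key.length);
            out.write(key.buffer, 0, key.length);
            out.writeInt(key.hashCode);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Generic object serialization: class metadata plus the whole buffer,
    // including the bytes past `length`.
    static byte[] writeGeneric(CompactKey key) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

For a key with a 64-byte buffer of which 10 bytes are valid, writeCompact emits 4 + 10 + 4 bytes, while writeGeneric emits the full buffer plus class metadata. How much this matters for a real query depends on the key sizes involved, which this sketch does not model.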
[jira] [Updated] (HIVE-17078) Add more logs to MapredLocalTask
    [ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi updated HIVE-17078:
------------------------------
    Attachment: HIVE-17078.4.PATCH

> Add more logs to MapredLocalTask
> --------------------------------
>
> Key: HIVE-17078
> URL: https://issues.apache.org/jira/browse/HIVE-17078
> Project: Hive
> Issue Type: Improvement
> Reporter: Yibing Shi
> Assignee: Yibing Shi
> Priority: Minor
> Attachments: HIVE-17078.1.patch, HIVE-17078.2.patch, HIVE-17078.3.patch, HIVE-17078.4.PATCH
>
>
> By default, {{MapredLocalTask}} is executed in a child process of Hive, in case the local task uses so many resources that it affects Hive. Currently, the stdout and stderr output of the child process is printed in Hive's stdout/stderr log, which has no timestamp information and is separate from the Hive service logs. This makes it hard to troubleshoot problems in MapredLocalTasks.
[jira] [Commented] (HIVE-17078) Add more logs to MapredLocalTask
    [ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085252#comment-16085252 ]

Yibing Shi commented on HIVE-17078:
-----------------------------------

Checked the failed tests.
# org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] fails for a reason unrelated to this patch.
# org.apache.hive.hcatalog.api.TestHCatClient: the failure also has nothing to do with our patch.
# org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14/23] fails because the order of the output has changed. Nothing serious. We need to update the .out files somehow, but maybe in a separate JIRA.
# The other tests fail because we now have more logs in local tasks. Will update the .out files.

> Add more logs to MapredLocalTask
> --------------------------------
>
> Key: HIVE-17078
> URL: https://issues.apache.org/jira/browse/HIVE-17078
> Project: Hive
> Issue Type: Improvement
> Reporter: Yibing Shi
> Assignee: Yibing Shi
> Priority: Minor
> Attachments: HIVE-17078.1.patch, HIVE-17078.2.patch, HIVE-17078.3.patch
>
>
> By default, {{MapredLocalTask}} is executed in a child process of Hive, in case the local task uses so many resources that it affects Hive. Currently, the stdout and stderr output of the child process is printed in Hive's stdout/stderr log, which has no timestamp information and is separate from the Hive service logs. This makes it hard to troubleshoot problems in MapredLocalTasks.
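The problem described in HIVE-17078 (child-process output that reaches the log without timestamps) can be sketched as follows. This is an illustration only, not the code from the patch: ChildOutputRelay, its parameters, and the line format are hypothetical names chosen for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.function.Consumer;
import java.util.function.Supplier;

class ChildOutputRelay {
    // Reads the child's output line by line and hands each line to `log`,
    // prefixed with a timestamp and a stream tag such as "stdout" or "stderr".
    static void relay(InputStream childOutput, String tag,
                      Supplier<Instant> clock, Consumer<String> log) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(childOutput, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                log.accept(clock.get() + " [" + tag + "] " + line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real service the stream would presumably come from Process.getInputStream() or Process.getErrorStream(), the consumer would be the service logger, and each stream would be drained on its own thread so that the child never blocks on a full pipe. The clock is passed in only to make the sketch easy to test.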
[jira] [Commented] (HIVE-16960) Hive throws an ugly error exception when HDFS sticky bit is set
[ https://issues.apache.org/jira/browse/HIVE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085218#comment-16085218 ] Hive QA commented on HIVE-16960: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12877008/HIVE16960.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10890 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=233) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6000/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6000/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6000/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12877008 - PreCommit-HIVE-Build

> Hive throws an ugly error exception when HDFS sticky bit is set
> ---------------------------------------------------------------
>
> Key: HIVE-16960
> URL: https://issues.apache.org/jira/browse/HIVE-16960
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Janaki Lahorani
> Assignee: Janaki Lahorani
> Priority: Critical
> Labels: newbie
> Fix For: 3.0.0
>
> Attachments: HIVE16960.1.patch, HIVE16960.2.patch, HIVE16960.2.patch, HIVE16960.3.patch
>
>
> When LOAD DATA INPATH ... OVERWRITE INTO TABLE ... is called by a Hive user other than the HDFS file owner, and the HDFS sticky bit is set, Hive throws an error saying that the file cannot be moved due to permission issues:
> Caused by: org.apache.hadoop.security.AccessControlException: Permission denied by sticky bit setting: user=hive, inode=sasdata-2016-04-20-17-13-43-630-e-1.dlv.bk
> The permission denial is expected, but the error message does not make sense to users, and the stack trace displayed is huge. We should display a better error message to users, and maybe provide help information about how to fix it.
[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats
[ https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16996: --- Status: Open (was: Patch Available) > Add HLL as an alternative to FM sketch to compute stats > --- > > Key: HIVE-16996 > URL: https://issues.apache.org/jira/browse/HIVE-16996 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: Accuracy and performance comparison between HyperLogLog > and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, > HIVE-16966.03.patch, HIVE-16966.04.patch, HIVE-16966.05.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
    [ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085203#comment-16085203 ]

Vaibhav Gumashta commented on HIVE-4577:
----------------------------------------

[~libing] Looks like patch v7 didn't apply on master.

> hive CLI can't handle hadoop dfs command with space and quotes.
> ---------------------------------------------------------------
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
> Reporter: Bing Li
> Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, HIVE-4577.6.patch, HIVE-4577.7.patch
>
>
> By design, Hive supports hadoop dfs commands in the Hive shell, for example:
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from hadoop if the path contains spaces or quotes:
> hive> dfs -mkdir "hello";
> drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/"bei
> drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/jing"
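The behavior reported in HIVE-4577 is what a plain whitespace split produces: quotes stay in the argument, and a quoted path with a space becomes two arguments. A minimal sketch of quote-aware splitting follows. It is an assumption about one possible approach, not the code in the attached patches; QuotedArgSplitter is a hypothetical name, and the input is assumed to be the argument text after "dfs" with the trailing ";" already removed.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedArgSplitter {
    // Splits on whitespace, treating text inside matching single or double
    // quotes as part of one argument and dropping the quotes themselves.
    // An unterminated quote runs to the end of the input.
    static List<String> split(String command) {
        List<String> args = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0;          // 0 means "not inside quotes"
        boolean inArg = false;   // true once the current argument has started
        for (char c : command.toCharArray()) {
            if (quote != 0) {
                if (c == quote) {
                    quote = 0;           // closing quote
                } else {
                    current.append(c);   // spaces inside quotes are kept
                }
            } else if (c == '"' || c == '\'') {
                quote = c;               // opening quote
                inArg = true;            // allows empty quoted arguments
            } else if (Character.isWhitespace(c)) {
                if (inArg) {
                    args.add(current.toString());
                    current.setLength(0);
                    inArg = false;
                }
            } else {
                current.append(c);
                inArg = true;
            }
        }
        if (inArg) {
            args.add(current.toString());
        }
        return args;
    }
}
```

With this splitting, -mkdir "bei jing" yields the two arguments -mkdir and bei jing, rather than the two quoted fragments shown in the report. Escape sequences such as backslash-quoted characters are deliberately left out of the sketch.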
[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats
[ https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16996: --- Status: Patch Available (was: Open) > Add HLL as an alternative to FM sketch to compute stats > --- > > Key: HIVE-16996 > URL: https://issues.apache.org/jira/browse/HIVE-16996 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: Accuracy and performance comparison between HyperLogLog > and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, > HIVE-16966.03.patch, HIVE-16966.04.patch, HIVE-16966.05.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats
[ https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16996: --- Attachment: HIVE-16966.05.patch rebase to master > Add HLL as an alternative to FM sketch to compute stats > --- > > Key: HIVE-16996 > URL: https://issues.apache.org/jira/browse/HIVE-16996 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: Accuracy and performance comparison between HyperLogLog > and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, > HIVE-16966.03.patch, HIVE-16966.04.patch, HIVE-16966.05.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17069) Refactor OrcRawRecrodMerger.ReaderPair
[ https://issues.apache.org/jira/browse/HIVE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085178#comment-16085178 ] Hive QA commented on HIVE-17069: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12877003/HIVE-17069.02.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5999/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5999/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5999/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-07-13 04:50:59.904 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-5999/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2017-07-13 04:50:59.906 + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at 31a7987 HIVE-16975: Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is now used (Teddy Choi, reviewed by Matt McCline) + git clean -f -d + git checkout master Already on 'master' Your branch is up-to-date with 'origin/master'. + git reset --hard origin/master HEAD is now at 31a7987 HIVE-16975: Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is now used (Teddy Choi, reviewed by Matt McCline) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2017-07-13 04:51:00.740 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch error: patch failed: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java:290 error: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java: patch does not apply error: patch failed: ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcRawRecordMerger.java:35 error: ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcRawRecordMerger.java: patch does not apply The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12877003 - PreCommit-HIVE-Build

> Refactor OrcRawRecrodMerger.ReaderPair
> --------------------------------------
>
> Key: HIVE-17069
> URL: https://issues.apache.org/jira/browse/HIVE-17069
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Attachments: HIVE-17069.01.patch, HIVE-17069.02.patch
>
>
> This should be done after HIVE-16177 so as not to obscure the functional changes completely.
> Make ReaderPair an interface.
> ReaderPairImpl - will do what ReaderPair currently does, i.e. handle the "normal" code path.
> OriginalReaderPair - same as now but w/o the incomprehensible override/variable shadowing logic.
> Perhaps split it into 2 - 1 for compaction, 1 for "normal" read, with a common base class.
> Push discoverKeyBounds() into the appropriate implementation.
[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085176#comment-16085176 ] Hive QA commented on HIVE-4577: --- Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12877002/HIVE-4577.7.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5998/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5998/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5998/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-07-13 04:50:25.104 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-5998/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2017-07-13 04:50:25.107 + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at 31a7987 HIVE-16975: Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is now used (Teddy Choi, reviewed by Matt McCline) + git clean -f -d Removing ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java.orig Removing ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderAdaptor.java Removing ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java.orig Removing ql/src/test/queries/clientpositive/llap_acid_fast.q Removing ql/src/test/results/clientpositive/llap/llap_acid.q.out Removing ql/src/test/results/clientpositive/llap/llap_acid_fast.q.out Removing ql/src/test/results/clientpositive/llap_acid_fast.q.out + git checkout master Already on 'master' Your branch is up-to-date with 'origin/master'. + git reset --hard origin/master HEAD is now at 31a7987 HIVE-16975: Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is now used (Teddy Choi, reviewed by Matt McCline) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2017-07-13 04:50:30.983 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch error: a/ql/src/java/org/apache/hadoop/hive/ql/processors/DfsProcessor.java: No such file or directory error: a/ql/src/test/results/clientpositive/perf/query14.q.out: No such file or directory The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12877002 - PreCommit-HIVE-Build

> hive CLI can't handle hadoop dfs command with space and quotes.
> ---------------------------------------------------------------
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
> Reporter: Bing Li
> Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, HIVE-4577.6.patch, HIVE-4577.7.patch
>
>
> By design, Hive supports hadoop dfs commands in the Hive shell, for example:
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from hadoop if the path contains spaces or quotes:
> hive> dfs -mkdir "hello";
> drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/"bei
> drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/jing
[jira] [Commented] (HIVE-12631) LLAP: support ORC ACID tables
[ https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085173#comment-16085173 ] Hive QA commented on HIVE-12631: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876993/HIVE-12631.18.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10892 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid_fast] (batchId=38) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_windowing2] (batchId=10) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters1] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=141) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=151) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5997/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5997/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5997/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing 
org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12876993 - PreCommit-HIVE-Build > LLAP: support ORC ACID tables > - > > Key: HIVE-12631 > URL: https://issues.apache.org/jira/browse/HIVE-12631 > Project: Hive > Issue Type: Bug > Components: llap, Transactions >Reporter: Sergey Shelukhin >Assignee: Teddy Choi > Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, > HIVE-12631.11.patch, HIVE-12631.11.patch, HIVE-12631.12.patch, > HIVE-12631.13.patch, HIVE-12631.15.patch, HIVE-12631.16.patch, > HIVE-12631.17.patch, HIVE-12631.18.patch, HIVE-12631.1.patch, > HIVE-12631.2.patch, HIVE-12631.3.patch, HIVE-12631.4.patch, > HIVE-12631.5.patch, HIVE-12631.6.patch, HIVE-12631.7.patch, > HIVE-12631.8.patch, HIVE-12631.8.patch, HIVE-12631.9.patch > > > LLAP uses a completely separate read path in ORC to allow for caching and > parallelization of reads and processing. This path does not support ACID. As > far as I remember ACID logic is embedded inside ORC format; we need to > refactor it to be on top of some interface, if practical; or just port it to > LLAP read path. > Another consideration is how the logic will work with cache. The cache is > currently low-level (CB-level in ORC), so we could just use it to read bases > and deltas (deltas should be cached with higher priority) and merge as usual. > We could also cache merged representation in future. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-15898) add Type2 SCD merge tests
[ https://issues.apache.org/jira/browse/HIVE-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-15898: -- Attachment: HIVE-15898.08.patch > add Type2 SCD merge tests > - > > Key: HIVE-15898 > URL: https://issues.apache.org/jira/browse/HIVE-15898 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-15898.01.patch, HIVE-15898.02.patch, > HIVE-15898.03.patch, HIVE-15898.04.patch, HIVE-15898.05.patch, > HIVE-15898.06.patch, HIVE-15898.07.patch, HIVE-15898.08.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats
[ https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085146#comment-16085146 ] Hive QA commented on HIVE-16996: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876982/HIVE-16966.04.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5996/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5996/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5996/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-07-13 03:48:51.176 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-5996/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2017-07-13 03:48:51.178 + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at 31a7987 HIVE-16975: Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is now used (Teddy Choi, reviewed by Matt McCline) + git clean -f -d + git checkout master Already on 'master' Your branch is up-to-date with 'origin/master'. + git reset --hard origin/master HEAD is now at 31a7987 HIVE-16975: Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is now used (Teddy Choi, reviewed by Matt McCline) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2017-07-13 03:48:57.351 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch fatal: git apply: bad git-diff - inconsistent old filename on line 33 The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12876982 - PreCommit-HIVE-Build > Add HLL as an alternative to FM sketch to compute stats > --- > > Key: HIVE-16996 > URL: https://issues.apache.org/jira/browse/HIVE-16996 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: Accuracy and performance comparison between HyperLogLog > and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, > HIVE-16966.03.patch, HIVE-16966.04.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request
[ https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085144#comment-16085144 ] Hive QA commented on HIVE-16926: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876981/HIVE-16926.5.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10889 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=237) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5995/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5995/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5995/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876981 - PreCommit-HIVE-Build > LlapTaskUmbilicalExternalClient should not start new umbilical server for > every fragment request > > > Key: HIVE-16926 > URL: https://issues.apache.org/jira/browse/HIVE-16926 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch, > HIVE-16926.3.patch, HIVE-16926.4.patch, HIVE-16926.5.patch > > > Followup task from [~sseth] and [~sershe] after HIVE-16777. > LlapTaskUmbilicalExternalClient currently creates a new umbilical server for > every fragment request, but this is not necessary and the umbilical can be > shared. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
[ https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085090#comment-16085090 ] Rui Li commented on HIVE-16922: --- Hi [~libing], I guess we also need metastore upgrade scripts to update the values that are already stored in the DB, so that users can continue to use the data when they upgrade to the new Hive version. > Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim" > --- > > Key: HIVE-16922 > URL: https://issues.apache.org/jira/browse/HIVE-16922 > Project: Hive > Issue Type: Bug > Components: Thrift API >Reporter: Dudu Markovitz >Assignee: Bing Li > Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch > > > https://github.com/apache/hive/blob/master/serde/if/serde.thrift > Typo in serde.thrift: > COLLECTION_DELIM = "colelction.delim" > (*colelction* instead of *collection*) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
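The upgrade concern raised for HIVE-16922 can be sketched roughly as follows. This is a minimal Python illustration of the renaming logic only, not the actual metastore upgrade script (which would be SQL against the metastore schema). The function names `migrate_params` and `collection_delim` are hypothetical, and the rule that a value already stored under the corrected key takes precedence is an assumption.

```python
OLD_KEY = "colelction.delim"   # misspelled key quoted in the issue
NEW_KEY = "collection.delim"   # corrected spelling


def migrate_params(params):
    """Return a copy of params with the misspelled key renamed.

    Assumption: a value already stored under the corrected key wins.
    """
    migrated = dict(params)
    if OLD_KEY in migrated:
        old_value = migrated.pop(OLD_KEY)
        migrated.setdefault(NEW_KEY, old_value)
    return migrated


def collection_delim(params, default=None):
    """Read the delimiter, falling back to the misspelled key for old data."""
    if NEW_KEY in params:
        return params[NEW_KEY]
    return params.get(OLD_KEY, default)


stored = {"colelction.delim": ",", "field.delim": "|"}
print(migrate_params(stored))    # {'field.delim': '|', 'collection.delim': ','}
print(collection_delim(stored))  # ,
```

A fallback read like `collection_delim` would let un-migrated data keep working, while a one-time migration keeps stored parameters consistent with the corrected constant.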
[jira] [Commented] (HIVE-17078) Add more logs to MapredLocalTask
[ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085078#comment-16085078 ] Hive QA commented on HIVE-17078: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876980/HIVE-17078.3.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10888 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=69) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_without_localtask] (batchId=1) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_8] (batchId=4) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[infer_bucket_sort_convert_join] (batchId=51) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] (batchId=12) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5994/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5994/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5994/ Messages: {noformat} Executing 
org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12876980 - PreCommit-HIVE-Build > Add more logs to MapredLocalTask > > > Key: HIVE-17078 > URL: https://issues.apache.org/jira/browse/HIVE-17078 > Project: Hive > Issue Type: Improvement >Reporter: Yibing Shi >Assignee: Yibing Shi >Priority: Minor > Attachments: HIVE-17078.1.patch, HIVE-17078.2.patch, > HIVE-17078.3.patch > > > By default, {{MapredLocalTask}} is executed in a child process of Hive, in > case the local task uses too much resources that may affect Hive. Currently, > the stdout and stderr information of the child process is printed in Hive's > stdout/stderr log, which doesn't have a timestamp information, and is > separated from Hive service logs. This makes it hard to troubleshoot problems > in MapredLocalTasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
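The problem described in HIVE-17078 (child-process output that lacks timestamps and is separated from the service log) can be illustrated with a small Python sketch. It is not the Hive patch: `run_and_log` is a hypothetical helper, and merging stderr into stdout is a simplification made here to keep the example short.

```python
import logging
import subprocess
import sys


def run_and_log(cmd, logger):
    """Run cmd and forward each output line to logger.

    The logger's formatter adds the timestamp that raw stdout/stderr lacks.
    stderr is merged into stdout (a simplification for this sketch).
    """
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    ) as proc:
        for line in proc.stdout:
            logger.info(line.rstrip("\n"))
    # Popen's context manager waits for the child, so returncode is set here.
    return proc.returncode


logging.basicConfig(
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    level=logging.INFO,
)
exit_code = run_and_log(
    [sys.executable, "-c", "print('hello from child')"],
    logging.getLogger("local-task"),
)
```

Routing the lines through the same logging setup as the parent service means they land in one log, each with a timestamp, which is the troubleshooting aid the issue asks for.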
[jira] [Updated] (HIVE-16960) Hive throws an ugly error exception when HDFS sticky bit is set
[ https://issues.apache.org/jira/browse/HIVE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Janaki Lahorani updated HIVE-16960: --- Attachment: HIVE16960.3.patch > Hive throws an ugly error exception when HDFS sticky bit is set > --- > > Key: HIVE-16960 > URL: https://issues.apache.org/jira/browse/HIVE-16960 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Janaki Lahorani >Assignee: Janaki Lahorani >Priority: Critical > Labels: newbie > Fix For: 3.0.0 > > Attachments: HIVE16960.1.patch, HIVE16960.2.patch, HIVE16960.2.patch, > HIVE16960.3.patch > > > When LOAD DATA INPATH ... OVERWRITE INTO TABLE ... is called by a Hive user > other than the HDFS file owner and the HDFS sticky bit is set, Hive > throws an exception stating that the file cannot be moved due to > permission issues. > Caused by: org.apache.hadoop.security.AccessControlException: Permission > denied by sticky bit setting: user=hive, > inode=sasdata-2016-04-20-17-13-43-630-e-1.dlv.bk > The permission denial is expected, but the error message does not make sense > to users and the displayed stack trace is huge. We should display a better > error message to users, and perhaps provide help on how to > fix it. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
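As a rough illustration of the kind of message HIVE-16960 asks for, the Python sketch below turns the raw exception text quoted in the issue into a one-line explanation. It is only a sketch under stated assumptions: `friendly_message` is a hypothetical helper, the wording of the friendlier message is invented here, and the actual patch may take a different approach.

```python
import re

# Matches the exception text quoted in the issue description.
STICKY_RE = re.compile(
    r"Permission denied by sticky bit setting: "
    r"user=(?P<user>[^,\s]+),\s*inode=(?P<inode>\S+)"
)


def friendly_message(raw):
    """Return a short explanation for sticky-bit failures, else raw unchanged."""
    match = STICKY_RE.search(raw)
    if match is None:
        return raw
    return (
        "Cannot move file '{inode}': the sticky bit is set and "
        "user '{user}' is not the file owner.".format(**match.groupdict())
    )


raw = (
    "org.apache.hadoop.security.AccessControlException: Permission denied "
    "by sticky bit setting: user=hive, "
    "inode=sasdata-2016-04-20-17-13-43-630-e-1.dlv.bk"
)
print(friendly_message(raw))
```

The full stack trace could still go to the log at debug level, leaving only the short message for the user.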
[jira] [Commented] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
[ https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085069#comment-16085069 ] Bing Li commented on HIVE-16922: [~lirui], I can't reproduce TestMiniLlapLocalCliDriver[vector_if_expr] in my environment, so I don't think it is caused by this patch. > Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim" > --- > > Key: HIVE-16922 > URL: https://issues.apache.org/jira/browse/HIVE-16922 > Project: Hive > Issue Type: Bug > Components: Thrift API >Reporter: Dudu Markovitz >Assignee: Bing Li > Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch > > > https://github.com/apache/hive/blob/master/serde/if/serde.thrift > Typo in serde.thrift: > COLLECTION_DELIM = "colelction.delim" > (*colelction* instead of *collection*) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17069) Refactor OrcRawRecrodMerger.ReaderPair
[ https://issues.apache.org/jira/browse/HIVE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-17069: -- Status: Patch Available (was: Open) > Refactor OrcRawRecrodMerger.ReaderPair > -- > > Key: HIVE-17069 > URL: https://issues.apache.org/jira/browse/HIVE-17069 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-17069.01.patch, HIVE-17069.02.patch > > > this should be done post HIVE-16177 so as not to obscure the functional > changes completely > Make ReaderPair an interface > ReaderPairImpl - will do what ReaderPair currently does, i.e. handle "normal" > code path > OriginalReaderPair - same as now but w/o incomprehensible override/variable > shadowing logic. > Perhaps split it into 2 - 1 for compaction 1 for "normal" read with common > base class. > Push discoverKeyBounds() into appropriate implementation -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-4577: -- Attachment: HIVE-4577.7.patch Add the golden file for dfscmd.q > hive CLI can't handle hadoop dfs command with space and quotes. > > > Key: HIVE-4577 > URL: https://issues.apache.org/jira/browse/HIVE-4577 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0 >Reporter: Bing Li >Assignee: Bing Li > Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, > HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, > HIVE-4577.6.patch, HIVE-4577.7.patch > > > By design, Hive supports hadoop dfs commands in the Hive shell, such as > hive> dfs -mkdir /user/biadmin/mydir; > but it behaves differently from hadoop if the path contains spaces or quotes > hive> dfs -mkdir "hello"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 > /user/biadmin/"hello" > hive> dfs -mkdir 'world'; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 > /user/biadmin/'world' > hive> dfs -mkdir "bei jing"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/"bei > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/jing" -- This message was sent by Atlassian JIRA (v6.4.14#64029)
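The directory listings in HIVE-4577 are what one would expect if the command line is split on whitespace without interpreting quotes. The Python sketch below contrasts the two behaviours. It is not Hive's CLI code: the function names are hypothetical, and `shlex.split` from the Python standard library stands in for whatever quote-aware tokenizer the patch actually uses.

```python
import shlex


def naive_tokens(command):
    """Split on whitespace only, so quote characters stay in the arguments."""
    return command.split()


def quoted_tokens(command):
    """Split the way a POSIX shell would, honoring single and double quotes."""
    return shlex.split(command)


cmd = '-mkdir "bei jing"'
print(naive_tokens(cmd))   # ['-mkdir', '"bei', 'jing"']
print(quoted_tokens(cmd))  # ['-mkdir', 'bei jing']
```

The naive split yields the two arguments `"bei` and `jing"`, matching the two directories shown in the report, whereas the quote-aware split yields the single path `bei jing`.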
[jira] [Updated] (HIVE-17069) Refactor OrcRawRecrodMerger.ReaderPair
[ https://issues.apache.org/jira/browse/HIVE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-17069: -- Attachment: HIVE-17069.02.patch > Refactor OrcRawRecrodMerger.ReaderPair > -- > > Key: HIVE-17069 > URL: https://issues.apache.org/jira/browse/HIVE-17069 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-17069.01.patch, HIVE-17069.02.patch > > > this should be done post HIVE-16177 so as not to obscure the functional > changes completely > Make ReaderPair an interface > ReaderPairImpl - will do what ReaderPair currently does, i.e. handle "normal" > code path > OriginalReaderPair - same as now but w/o incomprehensible override/variable > shadowing logic. > Perhaps split it into 2 - 1 for compaction 1 for "normal" read with common > base class. > Push discoverKeyBounds() into appropriate implementation -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16975) Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is now used
[ https://issues.apache.org/jira/browse/HIVE-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085040#comment-16085040 ] Matt McCline commented on HIVE-16975: - I don't see test failures related to this change. > Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is > now used > - > > Key: HIVE-16975 > URL: https://issues.apache.org/jira/browse/HIVE-16975 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Teddy Choi >Priority: Critical > Attachments: HIVE-16975.1.patch > > > Fix VectorUDFAdaptor(CAST(d_date as TIMESTAMP)) to be native. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant
[ https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085036#comment-16085036 ] Hive QA commented on HIVE-16793: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876977/HIVE-16793.6.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10876 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) org.apache.hive.minikdc.TestJdbcNonKrbSASLWithMiniKdc.org.apache.hive.minikdc.TestJdbcNonKrbSASLWithMiniKdc (batchId=238) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5993/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5993/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5993/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876977 - PreCommit-HIVE-Build > Scalar sub-query: sq_count_check not required if gby keys are constant > -- > > Key: HIVE-16793 > URL: https://issues.apache.org/jira/browse/HIVE-16793 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Vineet Garg > Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, > HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch, HIVE-16793.6.patch > > > This query has an sq_count check, though is useless on a constant key. > {code} > hive> explain select * from part where p_size > (select max(p_size) from part > where p_type = '1' group by p_type); > Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product > Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product > OK > Plan optimized by CBO. > Vertex dependency in root stage > Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE) > Reducer 3 <- Map 2 (SIMPLE_EDGE) > Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE) > Reducer 6 <- Map 5 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Map 1 vectorized, llap > File Output Operator [FS_64] > Select Operator [SEL_63] (rows= width=621) > > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > Filter Operator [FIL_62] (rows= width=625) > predicate:(_col5 > _col10) > Map Join Operator [MAPJOIN_61] (rows=2 width=625) > > Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"] > <-Reducer 6 [BROADCAST_EDGE] vectorized, llap > BROADCAST [RS_58] > Select Operator [SEL_57] (rows=1 width=4) > Output:["_col0"] > Group By Operator [GBY_56] (rows=1 width=89) > > Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0 > <-Map 5 [SIMPLE_EDGE] vectorized, llap > SHUFFLE [RS_55] > PartitionCols:_col0 > Group By Operator [GBY_54] (rows=86 width=89) > > Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1' > Select 
Operator [SEL_53] (rows=1212121 width=109) > Output:["_col1"] > Filter Operator [FIL_52] (rows=1212121 width=109) > predicate:(p_type = '1') > TableScan [TS_17] (rows=2 width=109) > > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"] > <-Map Join Operator [MAPJOIN_60]
[jira] [Updated] (HIVE-12631) LLAP: support ORC ACID tables
[ https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-12631: -- Attachment: HIVE-12631.18.patch > LLAP: support ORC ACID tables > - > > Key: HIVE-12631 > URL: https://issues.apache.org/jira/browse/HIVE-12631 > Project: Hive > Issue Type: Bug > Components: llap, Transactions >Reporter: Sergey Shelukhin >Assignee: Teddy Choi > Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, > HIVE-12631.11.patch, HIVE-12631.11.patch, HIVE-12631.12.patch, > HIVE-12631.13.patch, HIVE-12631.15.patch, HIVE-12631.16.patch, > HIVE-12631.17.patch, HIVE-12631.18.patch, HIVE-12631.1.patch, > HIVE-12631.2.patch, HIVE-12631.3.patch, HIVE-12631.4.patch, > HIVE-12631.5.patch, HIVE-12631.6.patch, HIVE-12631.7.patch, > HIVE-12631.8.patch, HIVE-12631.8.patch, HIVE-12631.9.patch > > > LLAP uses a completely separate read path in ORC to allow for caching and > parallelization of reads and processing. This path does not support ACID. As > far as I remember ACID logic is embedded inside ORC format; we need to > refactor it to be on top of some interface, if practical; or just port it to > LLAP read path. > Another consideration is how the logic will work with cache. The cache is > currently low-level (CB-level in ORC), so we could just use it to read bases > and deltas (deltas should be cached with higher priority) and merge as usual. > We could also cache merged representation in future. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16541) PTF: Avoid shuffling constant keys for empty OVER()
[ https://issues.apache.org/jira/browse/HIVE-16541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085004#comment-16085004 ] Gopal V commented on HIVE-16541: Thanks [~ashutoshc], the window non-streaming codepath seems to be the problem, I'll debug a bit more. > PTF: Avoid shuffling constant keys for empty OVER() > --- > > Key: HIVE-16541 > URL: https://issues.apache.org/jira/browse/HIVE-16541 > Project: Hive > Issue Type: Bug > Components: PTF-Windowing >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-16541.1.patch, HIVE-16541.2.patch > > > Generating surrogate keys with > {code} > select row_number() over() as p_key, * from table; > {code} > uses a sorted edge with "0 ASC NULLS FIRST" as the sort order. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16100) Dynamic Sorted Partition optimizer loses sibling operators
[ https://issues.apache.org/jira/browse/HIVE-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084986#comment-16084986 ] Hive QA commented on HIVE-16100: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876961/HIVE-16100.5.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10888 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_gby_empty] (batchId=77) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[multi_insert_move_tasks_share_dependencies] (batchId=52) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_windowing2] (batchId=10) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[reducesink_dedup] (batchId=22) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_lineage2] (batchId=145) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_windowing_2] (batchId=148) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit] (batchId=151) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5992/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5992/console Test 
logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5992/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12876961 - PreCommit-HIVE-Build > Dynamic Sorted Partition optimizer loses sibling operators > -- > > Key: HIVE-16100 > URL: https://issues.apache.org/jira/browse/HIVE-16100 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 1.2.1, 2.1.1, 2.2.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-16100.1.patch, HIVE-16100.2.patch, > HIVE-16100.2.patch, HIVE-16100.3.patch, HIVE-16100.4.patch, HIVE-16100.5.patch > > > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java#L173 > {code} > // unlink connection between FS and its parent > fsParent = fsOp.getParentOperators().get(0); > fsParent.getChildOperators().clear(); > {code} > The optimizer discards any cases where the fsParent has another SEL child -- This message was sent by Atlassian JIRA (v6.4.14#64029)
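The effect of the quoted `getChildOperators().clear()` call can be modelled with a toy operator tree. The Python sketch below is not Hive code and does not claim to reproduce the attached patch: the `Operator` class and both helper functions are hypothetical, and `unlink_only` simply shows one way to detach a single child while keeping its siblings.

```python
class Operator:
    """Toy stand-in for a Hive operator; it only tracks parent/child links."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.parents = []

    def add_child(self, child):
        self.children.append(child)
        child.parents.append(self)


def unlink_all(fs_op):
    """Mirror the quoted snippet: clear every child of the FS parent."""
    fs_parent = fs_op.parents[0]
    fs_parent.children.clear()
    return fs_parent


def unlink_only(fs_op):
    """Detach just the FS operator and leave its siblings attached."""
    fs_parent = fs_op.parents[0]
    fs_parent.children.remove(fs_op)
    fs_op.parents.remove(fs_parent)
    return fs_parent


def build():
    """Build a parent with two children: an FS operator and a SEL sibling."""
    parent, fs, sel = Operator("parent"), Operator("FS"), Operator("SEL")
    parent.add_child(fs)
    parent.add_child(sel)
    return parent, fs, sel


parent, fs, sel = build()
unlink_all(fs)
print([c.name for c in parent.children])  # []

parent, fs, sel = build()
unlink_only(fs)
print([c.name for c in parent.children])  # ['SEL']
```

With `unlink_all` the SEL sibling disappears from the parent along with the FS operator, which is the loss the issue title describes.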
[jira] [Updated] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request
[ https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-16926: -- Attachment: HIVE-16926.5.patch Patch v5, with changes per feedback. > LlapTaskUmbilicalExternalClient should not start new umbilical server for > every fragment request > > > Key: HIVE-16926 > URL: https://issues.apache.org/jira/browse/HIVE-16926 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch, > HIVE-16926.3.patch, HIVE-16926.4.patch, HIVE-16926.5.patch > > > Followup task from [~sseth] and [~sershe] after HIVE-16777. > LlapTaskUmbilicalExternalClient currently creates a new umbilical server for > every fragment request, but this is not necessary and the umbilical can be > shared. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
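The idea in HIVE-16926 of sharing one umbilical server across fragment requests, rather than starting one per request, can be sketched as a reference-counted shared resource. This Python sketch is an assumption about one possible shape of such sharing, not the approach taken in the attached patches. `UmbilicalServer` and `SharedUmbilical` are hypothetical names.

```python
import threading


class UmbilicalServer:
    """Hypothetical stand-in for a server that is expensive to start."""

    started = 0  # counts how many servers were ever created

    def __init__(self):
        UmbilicalServer.started += 1
        self.running = True

    def stop(self):
        self.running = False


class SharedUmbilical:
    """Hands out one shared server; stops it when the last user releases it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._server = None
        self._refs = 0

    def acquire(self):
        with self._lock:
            if self._server is None:
                self._server = UmbilicalServer()
            self._refs += 1
            return self._server

    def release(self):
        with self._lock:
            if self._refs == 0:
                raise RuntimeError("release() without matching acquire()")
            self._refs -= 1
            if self._refs == 0:
                self._server.stop()
                self._server = None


shared = SharedUmbilical()
first = shared.acquire()
second = shared.acquire()
print(first is second)  # True
shared.release()
shared.release()
print(first.running)    # False
```

Each fragment request would call `acquire()` when it starts and `release()` when it finishes, so only one server exists while any request is in flight.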
[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats
[ https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16996: --- Status: Patch Available (was: Open) > Add HLL as an alternative to FM sketch to compute stats > --- > > Key: HIVE-16996 > URL: https://issues.apache.org/jira/browse/HIVE-16996 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: Accuracy and performance comparison between HyperLogLog > and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, > HIVE-16966.03.patch, HIVE-16966.04.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17021) Support replication of concatenate operation.
[ https://issues.apache.org/jira/browse/HIVE-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084950#comment-16084950 ] Daniel Dai commented on HIVE-17021: --- +1. Will commit shortly. > Support replication of concatenate operation. > - > > Key: HIVE-17021 > URL: https://issues.apache.org/jira/browse/HIVE-17021 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > Attachments: HIVE-17021.01.patch > > > We need to handle cases like ALTER TABLE ... CONCATENATE that also change the > files on disk, and potentially treat them similar to INSERT OVERWRITE, as it > does something equivalent to a compaction. > Note that a ConditionalTask might also be fired at the end of inserts at the > end of a tez task (or other exec engine) if appropriate HiveConf settings are > set, to automatically do this operation - these also need to be taken care of > for replication. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats
[ https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16996: --- Attachment: (was: HIVE-16966.04.patch) > Add HLL as an alternative to FM sketch to compute stats > --- > > Key: HIVE-16996 > URL: https://issues.apache.org/jira/browse/HIVE-16996 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: Accuracy and performance comparison between HyperLogLog > and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, > HIVE-16966.03.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats
[ https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16996: --- Status: Open (was: Patch Available) > Add HLL as an alternative to FM sketch to compute stats > --- > > Key: HIVE-16996 > URL: https://issues.apache.org/jira/browse/HIVE-16996 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: Accuracy and performance comparison between HyperLogLog > and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, > HIVE-16966.03.patch, HIVE-16966.04.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats
[ https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16996: --- Attachment: HIVE-16966.04.patch > Add HLL as an alternative to FM sketch to compute stats > --- > > Key: HIVE-16996 > URL: https://issues.apache.org/jira/browse/HIVE-16996 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: Accuracy and performance comparison between HyperLogLog > and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, > HIVE-16966.03.patch, HIVE-16966.04.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant
[ https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084918#comment-16084918 ] Hive QA commented on HIVE-16793: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876952/HIVE-16793.5.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5991/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5991/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5991/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-07-12 23:46:50.471 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-5991/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2017-07-12 23:46:50.474 + cd apache-github-source-source + git fetch origin From https://github.com/apache/hive 353781c..6af30bf master -> origin/master + git reset --hard HEAD HEAD is now at 353781c HIVE-17079: LLAP: Use FQDN by default for work submission (Prasanth Jayachandran reviewed by Gopal V) + git clean -f -d Removing ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderAdaptor.java Removing ql/src/test/queries/clientpositive/llap_acid_fast.q Removing ql/src/test/results/clientpositive/llap/llap_acid.q.out Removing ql/src/test/results/clientpositive/llap/llap_acid_fast.q.out Removing ql/src/test/results/clientpositive/llap_acid_fast.q.out + git checkout master Already on 'master' Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded. (use "git pull" to update your local branch) + git reset --hard origin/master HEAD is now at 6af30bf HIVE-16832 duplicate ROW__ID possible in multi insert into transactional table (Eugene Koifman, reviewed by Gopal V) + git merge --ff-only origin/master Already up-to-date. 
+ date '+%Y-%m-%d %T.%3N' 2017-07-12 23:46:56.393 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch error: a/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: No such file or directory error: a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java: No such file or directory error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java: No such file or directory error: a/ql/src/test/queries/clientpositive/subquery_scalar.q: No such file or directory error: a/ql/src/test/results/clientpositive/llap/subquery_scalar.q.out: No such file or directory error: a/ql/src/test/results/clientpositive/perf/query14.q.out: No such file or directory error: a/ql/src/test/results/clientpositive/perf/query23.q.out: No such file or directory The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12876952 - PreCommit-HIVE-Build > Scalar sub-query: sq_count_check not required if gby keys are constant > -- > > Key: HIVE-16793 > URL: https://issues.apache.org/jira/browse/HIVE-16793 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Vineet Garg > Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, > HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch > > > This query has an sq_count check, though is useless on a constant key. > {code} > hive> explain select * from part where p_size > (select max(p_size) from part > where p_type = '1' group by p_type); > Warning: Map Join MAPJOIN[37][bigTable=?] in task
[jira] [Commented] (HIVE-12631) LLAP: support ORC ACID tables
[ https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084914#comment-16084914 ] Hive QA commented on HIVE-12631: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876915/HIVE-12631.17.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 10871 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid_fast] (batchId=38) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_reader] (batchId=7) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_uncompressed] (batchId=56) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters1] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=141) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=151) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver (batchId=239) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) 
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout (batchId=226) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5990/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5990/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5990/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 17 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12876915 - PreCommit-HIVE-Build > LLAP: support ORC ACID tables > - > > Key: HIVE-12631 > URL: https://issues.apache.org/jira/browse/HIVE-12631 > Project: Hive > Issue Type: Bug > Components: llap, Transactions >Reporter: Sergey Shelukhin >Assignee: Teddy Choi > Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, > HIVE-12631.11.patch, HIVE-12631.11.patch, HIVE-12631.12.patch, > HIVE-12631.13.patch, HIVE-12631.15.patch, HIVE-12631.16.patch, > HIVE-12631.17.patch, HIVE-12631.1.patch, HIVE-12631.2.patch, > HIVE-12631.3.patch, HIVE-12631.4.patch, HIVE-12631.5.patch, > HIVE-12631.6.patch, HIVE-12631.7.patch, HIVE-12631.8.patch, > HIVE-12631.8.patch, HIVE-12631.9.patch > > > LLAP uses a completely separate read path in ORC to allow for caching and > parallelization of reads and processing. This path does not support ACID. As > far as I remember ACID logic is embedded inside ORC format; we need to > refactor it to be on top of some interface, if practical; or just port it to > LLAP read path. > Another consideration is how the logic will work with cache. 
The cache is > currently low-level (CB-level in ORC), so we could just use it to read bases > and deltas (deltas should be cached with higher priority) and merge as usual. > We could also cache merged representation in future. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16541) PTF: Avoid shuffling constant keys for empty OVER()
[ https://issues.apache.org/jira/browse/HIVE-16541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084941#comment-16084941 ] Ashutosh Chauhan commented on HIVE-16541: - In some of the golden files the result set has changed, which looks incorrect. > PTF: Avoid shuffling constant keys for empty OVER() > --- > > Key: HIVE-16541 > URL: https://issues.apache.org/jira/browse/HIVE-16541 > Project: Hive > Issue Type: Bug > Components: PTF-Windowing >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-16541.1.patch, HIVE-16541.2.patch > > > Generating surrogate keys with > {code} > select row_number() over() as p_key, * from table; > {code} > uses a sorted edge with "0 ASC NULLS FIRST" as the sort order. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17078) Add more logs to MapredLocalTask
[ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-17078: -- Attachment: HIVE-17078.3.patch Add a bit more logs > Add more logs to MapredLocalTask > > > Key: HIVE-17078 > URL: https://issues.apache.org/jira/browse/HIVE-17078 > Project: Hive > Issue Type: Improvement >Reporter: Yibing Shi >Assignee: Yibing Shi >Priority: Minor > Attachments: HIVE-17078.1.patch, HIVE-17078.2.patch, > HIVE-17078.3.patch > > > By default, {{MapredLocalTask}} is executed in a child process of Hive, in > case the local task uses so many resources that it affects Hive. Currently, > the stdout and stderr information of the child process is printed in Hive's > stdout/stderr log, which has no timestamp information and is > separate from the Hive service logs. This makes it hard to troubleshoot problems > in MapredLocalTasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant
[ https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-16793: --- Attachment: HIVE-16793.6.patch > Scalar sub-query: sq_count_check not required if gby keys are constant > -- > > Key: HIVE-16793 > URL: https://issues.apache.org/jira/browse/HIVE-16793 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Vineet Garg > Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, > HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch, HIVE-16793.6.patch > > > This query has an sq_count check, though is useless on a constant key. > {code} > hive> explain select * from part where p_size > (select max(p_size) from part > where p_type = '1' group by p_type); > Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product > Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product > OK > Plan optimized by CBO. > Vertex dependency in root stage > Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE) > Reducer 3 <- Map 2 (SIMPLE_EDGE) > Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE) > Reducer 6 <- Map 5 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Map 1 vectorized, llap > File Output Operator [FS_64] > Select Operator [SEL_63] (rows= width=621) > > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > Filter Operator [FIL_62] (rows= width=625) > predicate:(_col5 > _col10) > Map Join Operator [MAPJOIN_61] (rows=2 width=625) > > Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"] > <-Reducer 6 [BROADCAST_EDGE] vectorized, llap > BROADCAST [RS_58] > Select Operator [SEL_57] (rows=1 width=4) > Output:["_col0"] > Group By Operator [GBY_56] (rows=1 width=89) > > Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0 > <-Map 5 [SIMPLE_EDGE] vectorized, llap > SHUFFLE [RS_55] > 
PartitionCols:_col0 > Group By Operator [GBY_54] (rows=86 width=89) > > Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1' > Select Operator [SEL_53] (rows=1212121 width=109) > Output:["_col1"] > Filter Operator [FIL_52] (rows=1212121 width=109) > predicate:(p_type = '1') > TableScan [TS_17] (rows=2 width=109) > > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"] > <-Map Join Operator [MAPJOIN_60] (rows=2 width=621) > > Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > <-Reducer 4 [BROADCAST_EDGE] vectorized, llap > BROADCAST [RS_51] > Select Operator [SEL_50] (rows=1 width=8) > Filter Operator [FIL_49] (rows=1 width=8) > predicate:(sq_count_check(_col0) <= 1) > Group By Operator [GBY_48] (rows=1 width=8) > Output:["_col0"],aggregations:["count(VALUE._col0)"] > <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap > PARTITION_ONLY_SHUFFLE [RS_47] > Group By Operator [GBY_46] (rows=1 width=8) > Output:["_col0"],aggregations:["count()"] > Select Operator [SEL_45] (rows=1 width=85) > Group By Operator [GBY_44] (rows=1 width=85) > Output:["_col0"],keys:KEY._col0 > <-Map 2 [SIMPLE_EDGE] vectorized, llap > SHUFFLE [RS_43] > PartitionCols:_col0 > Group By Operator [GBY_42] (rows=83 > width=85) > Output:["_col0"],keys:'1' > Select Operator [SEL_41] (rows=1212121 > width=105) > Filter Operator [FIL_40] (rows=1212121 > width=105) > predicate:(p_type = '1') > TableScan [TS_2] (rows=2 > width=105) > > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"]
[jira] [Updated] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant
[ https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-16793: --- Status: Open (was: Patch Available) > Scalar sub-query: sq_count_check not required if gby keys are constant > -- > > Key: HIVE-16793 > URL: https://issues.apache.org/jira/browse/HIVE-16793 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Vineet Garg > Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, > HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch
[jira] [Updated] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant
[ https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-16793: --- Status: Patch Available (was: Open) > Scalar sub-query: sq_count_check not required if gby keys are constant > -- > > Key: HIVE-16793 > URL: https://issues.apache.org/jira/browse/HIVE-16793 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Vineet Garg > Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, > HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch, HIVE-16793.6.patch
[jira] [Updated] (HIVE-17013) Delete request with a subquery based on select over a view
[ https://issues.apache.org/jira/browse/HIVE-17013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-17013: -- Component/s: Transactions > Delete request with a subquery based on select over a view > -- > > Key: HIVE-17013 > URL: https://issues.apache.org/jira/browse/HIVE-17013 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Frédéric ESCANDELL >Priority: Blocker > > Hi, > I based my DDL on this example > https://fr.hortonworks.com/tutorial/using-hive-acid-transactions-to-insert-update-and-delete-data/. > In a delete request, the use of a view in a subquery throws an exception: > FAILED: IllegalStateException Expected 'insert into table default.mydim > select ROW__ID from default.mydim sort by ROW__ID' to be in sub-query or set > operation. > {code} > {code:sql} > drop table if exists mydim; > create table mydim (key int, name string, zip string, is_current boolean) > clustered by(key) into 3 buckets > stored as orc tblproperties ('transactional'='true'); > insert into mydim values > (1, 'bob', '95136', true), > (2, 'joe', '70068', true), > (3, 'steve', '22150', true); > drop table if exists updates_staging_table; > create table updates_staging_table (key int, newzip string); > insert into updates_staging_table values (1, 87102), (3, 45220); > drop view if exists updates_staging_view; > create view updates_staging_view (key, newzip) as select key, newzip from > updates_staging_table; > delete from mydim > where mydim.key in (select key from updates_staging_view); > FAILED: IllegalStateException Expected 'insert into table default.mydim > select ROW__ID from default.mydim sort by ROW__ID' to be in sub-query or set > operation. > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17078) Add more logs to MapredLocalTask
[ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084891#comment-16084891 ] Yibing Shi commented on HIVE-17078: --- I am trying to keep the current behaviour. With Hive CLI, by default Hive logs are not printed. Some users may rely on the stdout/stderr information. I don't want to surprise them. If you still think it is unnecessary to print child stdout/stderr to Hive stdout/stderr, I can remove the corresponding code. > Add more logs to MapredLocalTask > > > Key: HIVE-17078 > URL: https://issues.apache.org/jira/browse/HIVE-17078 > Project: Hive > Issue Type: Improvement >Reporter: Yibing Shi >Assignee: Yibing Shi >Priority: Minor > Attachments: HIVE-17078.1.patch, HIVE-17078.2.patch > > > By default, {{MapredLocalTask}} is executed in a child process of Hive, in > case the local task uses so many resources that it affects Hive. Currently, > the stdout and stderr information of the child process is printed in Hive's > stdout/stderr log, which has no timestamp information and is > separate from the Hive service logs. This makes it hard to troubleshoot problems > in MapredLocalTasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table
[ https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084878#comment-16084878 ] Eugene Koifman commented on HIVE-16832: --- no related failures (see builds 5985,5984 for same failures) HIVE-16832.22.patch committed to master (3.0) thanks Gopal for the review > duplicate ROW__ID possible in multi insert into transactional table > --- > > Key: HIVE-16832 > URL: https://issues.apache.org/jira/browse/HIVE-16832 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, > HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, > HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, > HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, > HIVE-16832.16.patch, HIVE-16832.17.patch, HIVE-16832.18.patch, > HIVE-16832.19.patch, HIVE-16832.20.patch, HIVE-16832.20.patch, > HIVE-16832.21.patch, HIVE-16832.22.patch > > > {noformat} > create table AcidTablePart(a int, b int) partitioned by (p string) clustered > by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true'); > create temporary table if not exists data1 (x int); > insert into data1 values (1); > from data1 >insert into AcidTablePart partition(p) select 0, 0, 'p' || x >insert into AcidTablePart partition(p='p1') select 0, 1 > {noformat} > Each branch of this multi-insert create a row in partition p1/bucket0 with > ROW__ID=(1,0,0). > The same can happen when running SQL Merge (HIVE-10924) statement that has > both Insert and Update clauses when target table has > _'transactional'='true','transactional_properties'='default'_ (see > HIVE-14035). This is so because Merge is internally run as a multi-insert > statement. > The solution relies on statement ID introduced in HIVE-11030. Each Insert > clause of a multi-insert is gets a unique ID. 
> The ROW__ID.bucketId now becomes a bit-packed triplet (format version, > bucketId, statementId). > (Since ORC stores field names in the data file we can't rename > ROW__ID.bucketId). > This ensures that there are no collisions and retains desired sort properties > of ROW__ID. > In particular _SortedDynPartitionOptimizer_ works w/o any changes even in > cases where there are fewer reducers than buckets. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
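The bit-packed triplet described above can be illustrated with a short sketch. The message only states that the field holds (format version, bucketId, statementId); the field widths and positions used here (3 bits version, 1 reserved, 12 bits bucket, 4 reserved, 12 bits statement) are an assumption for illustration and should be checked against the committed patch. The function names are hypothetical.

```python
# Assumed layout of the 32-bit value, most significant bits first:
# [version:3][reserved:1][bucket:12][reserved:4][statement:12]
VERSION_BITS = 3
BUCKET_BITS = 12
STATEMENT_BITS = 12

VERSION_SHIFT = 29
BUCKET_SHIFT = 16
STATEMENT_SHIFT = 0


def encode(version, bucket_id, statement_id):
    """Pack the three fields into one integer, rejecting out-of-range values."""
    for name, value, bits in (
        ("version", version, VERSION_BITS),
        ("bucket_id", bucket_id, BUCKET_BITS),
        ("statement_id", statement_id, STATEMENT_BITS),
    ):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return (
        (version << VERSION_SHIFT)
        | (bucket_id << BUCKET_SHIFT)
        | (statement_id << STATEMENT_SHIFT)
    )


def decode(packed):
    """Unpack an encoded value back into (version, bucket_id, statement_id)."""
    return (
        (packed >> VERSION_SHIFT) & ((1 << VERSION_BITS) - 1),
        (packed >> BUCKET_SHIFT) & ((1 << BUCKET_BITS) - 1),
        (packed >> STATEMENT_SHIFT) & ((1 << STATEMENT_BITS) - 1),
    )
```

Under this assumed layout, two insert branches writing to the same bucket get different packed values because their statement IDs differ, and because the bucket bits sit above the statement bits, packed values still sort by bucket first.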
[jira] [Updated] (HIVE-16953) OrcRawRecordMerger.discoverOriginalKeyBounds issue if both split start and end are in the same stripe
[ https://issues.apache.org/jira/browse/HIVE-16953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16953: -- Summary: OrcRawRecordMerger.discoverOriginalKeyBounds issue if both split start and end are in the same stripe (was: OrcRawRecordMerger.discoverOriginalKeyBounds) > OrcRawRecordMerger.discoverOriginalKeyBounds issue if both split start and > end are in the same stripe > - > > Key: HIVE-16953 > URL: https://issues.apache.org/jira/browse/HIVE-16953 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman > > if getOffset() and getMaxOffset() are inside > * the same stripe - in this case we have minKey & isTail=false but > rowLength is never set. > I don't know if we can ever have a split like that -- This message was sent by Atlassian JIRA (v6.4.14#64029)
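The edge case reported here can be reproduced with a small model. The loop below is an assumed shape for a stripe scan of this kind, written in Python for illustration; it is not copied from OrcRawRecordMerger, and `Stripe` and `discover_bounds` are hypothetical names. Stripes that start before the split are skipped, stripes that start inside the split are counted, and the first stripe that starts at or past the split end marks the split as not being the file tail.

```python
from collections import namedtuple

# Hypothetical stand-in for stripe metadata: byte offset and row count.
Stripe = namedtuple("Stripe", ["offset", "num_rows"])


def discover_bounds(stripes, split_start, split_end):
    """Return (row_offset, row_length, is_tail) for a byte-range split."""
    row_offset = 0   # rows in stripes that begin before the split
    row_length = 0   # rows in stripes that begin inside the split
    is_tail = True   # stays True only if no stripe begins at or after split_end
    for stripe in stripes:
        if split_start > stripe.offset:
            row_offset += stripe.num_rows
        elif split_end > stripe.offset:
            row_length += stripe.num_rows
        else:
            is_tail = False
            break
    return row_offset, row_length, is_tail
```

With stripes starting at bytes 0, 100 and 200 and a split covering bytes 120 to 150, no stripe begins inside the split, so this model yields a non-zero row offset, `is_tail` false, and a row length of 0, which matches the situation described in the report.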
[jira] [Resolved] (HIVE-14947) Add support for Acid 2 in Merge
[ https://issues.apache.org/jira/browse/HIVE-14947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman resolved HIVE-14947. --- Resolution: Fixed Fix Version/s: 3.0.0 > Add support for Acid 2 in Merge > --- > > Key: HIVE-14947 > URL: https://issues.apache.org/jira/browse/HIVE-14947 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Fix For: 3.0.0 > > > HIVE-14035 etc introduced a more efficient data layout for acid tables > Additional work is needed to support Merge for these tables > Need to make sure we generate unique ROW__IDs in each branch of the > multi-insert statement. StatementId was introduced in HIVE-11030 but it's > not surfaced from storage layer. It needs to be made part of ROW__ID to > ensure unique ROW__ID from concurrent writes from the same query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-14947) Add support for Acid 2 in Merge
[ https://issues.apache.org/jira/browse/HIVE-14947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084884#comment-16084884 ] Eugene Koifman commented on HIVE-14947: --- fixed via HIVE-16832 > Add support for Acid 2 in Merge > --- > > Key: HIVE-14947 > URL: https://issues.apache.org/jira/browse/HIVE-14947 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Fix For: 3.0.0 > > > HIVE-14035 etc introduced a more efficient data layout for acid tables > Additional work is needed to support Merge for these tables > Need to make sure we generate unique ROW__IDs in each branch of the > multi-insert statement. StatementId was introduced in HIVE-11030 but it's > not surfaced from storage layer. It needs to be made part of ROW__ID to > ensure unique ROW__ID from concurrent writes from the same query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table
[ https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16832: -- Resolution: Fixed Fix Version/s: 3.0.0 Target Version/s: 3.0.0 (was: 3.0.0, 2.4.0) Status: Resolved (was: Patch Available) > duplicate ROW__ID possible in multi insert into transactional table > --- > > Key: HIVE-16832 > URL: https://issues.apache.org/jira/browse/HIVE-16832 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Fix For: 3.0.0 > > Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, > HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, > HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, > HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, > HIVE-16832.16.patch, HIVE-16832.17.patch, HIVE-16832.18.patch, > HIVE-16832.19.patch, HIVE-16832.20.patch, HIVE-16832.20.patch, > HIVE-16832.21.patch, HIVE-16832.22.patch > > > {noformat} > create table AcidTablePart(a int, b int) partitioned by (p string) clustered > by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true'); > create temporary table if not exists data1 (x int); > insert into data1 values (1); > from data1 >insert into AcidTablePart partition(p) select 0, 0, 'p' || x >insert into AcidTablePart partition(p='p1') select 0, 1 > {noformat} > Each branch of this multi-insert create a row in partition p1/bucket0 with > ROW__ID=(1,0,0). > The same can happen when running SQL Merge (HIVE-10924) statement that has > both Insert and Update clauses when target table has > _'transactional'='true','transactional_properties'='default'_ (see > HIVE-14035). This is so because Merge is internally run as a multi-insert > statement. > The solution relies on statement ID introduced in HIVE-11030. Each Insert > clause of a multi-insert is gets a unique ID. 
> The ROW__ID.bucketId now becomes a bit-packed triplet (format version, > bucketId, statementId). > (Since ORC stores field names in the data file we can't rename > ROW__ID.bucketId). > This ensures that there are no collisions and retains desired sort properties > of ROW__ID. > In particular _SortedDynPartitionOptimizer_ works w/o any changes even in > cases where there are fewer reducers than buckets. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16979) Cache UGI for metastore
[ https://issues.apache.org/jira/browse/HIVE-16979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084860#comment-16084860 ] Tao Li commented on HIVE-16979: --- [~gopalv] Can you please take a look at the patch? Thanks! > Cache UGI for metastore > --- > > Key: HIVE-16979 > URL: https://issues.apache.org/jira/browse/HIVE-16979 > Project: Hive > Issue Type: Improvement >Reporter: Tao Li >Assignee: Tao Li > Attachments: HIVE-16979.1.patch, HIVE-16979.2.patch, > HIVE-16979.3.patch > > > FileSystem.closeAllForUGI is called per request against metastore to dispose > UGI, which involves talking to HDFS name node and is time consuming. So the > perf improvement would be caching and reusing the UGI. > Each FileSystem.closeAllForUGI call could take up to 20 ms as E2E latency > against HDFS. Usually a Hive query could result in several calls against > metastore, so we can save up to 50-100 ms per hive query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
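The caching idea in this issue can be sketched generically. The code below is not the attached patch and does not use Hadoop APIs; it is a minimal Python illustration in which `HandleCache`, `create`, and `close` are hypothetical stand-ins for creating a per-user UGI and for the expensive dispose step. The point is that the costly close runs once per cached handle rather than once per metastore request.

```python
import threading


class HandleCache:
    """Reuse one expensive per-user handle instead of creating and closing it per request."""

    def __init__(self, create, close):
        self._create = create      # hypothetical factory, called on a cache miss
        self._close = close        # hypothetical expensive dispose step
        self._handles = {}
        self._lock = threading.Lock()

    def get(self, user):
        # Return the cached handle for this user, creating it on first use.
        with self._lock:
            handle = self._handles.get(user)
            if handle is None:
                handle = self._create(user)
                self._handles[user] = handle
            return handle

    def close_all(self):
        # Called at shutdown or eviction, not after every request.
        with self._lock:
            handles, self._handles = self._handles, {}
        for handle in handles.values():
            self._close(handle)
```

A real implementation would also need an eviction policy and a decision about when cached credentials become stale, neither of which is modelled here.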
[jira] [Commented] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table
[ https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084858#comment-16084858 ] Hive QA commented on HIVE-16832: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876914/HIVE-16832.22.patch {color:green}SUCCESS:{color} +1 due to 12 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10888 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=237) org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_queries] (batchId=94) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5989/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5989/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5989/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests failed 
{noformat} This message is automatically generated. ATTACHMENT ID: 12876914 - PreCommit-HIVE-Build > duplicate ROW__ID possible in multi insert into transactional table > --- > > Key: HIVE-16832 > URL: https://issues.apache.org/jira/browse/HIVE-16832 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, > HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, > HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, > HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, > HIVE-16832.16.patch, HIVE-16832.17.patch, HIVE-16832.18.patch, > HIVE-16832.19.patch, HIVE-16832.20.patch, HIVE-16832.20.patch, > HIVE-16832.21.patch, HIVE-16832.22.patch > > > {noformat} > create table AcidTablePart(a int, b int) partitioned by (p string) clustered > by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true'); > create temporary table if not exists data1 (x int); > insert into data1 values (1); > from data1 >insert into AcidTablePart partition(p) select 0, 0, 'p' || x >insert into AcidTablePart partition(p='p1') select 0, 1 > {noformat} > Each branch of this multi-insert create a row in partition p1/bucket0 with > ROW__ID=(1,0,0). > The same can happen when running SQL Merge (HIVE-10924) statement that has > both Insert and Update clauses when target table has > _'transactional'='true','transactional_properties'='default'_ (see > HIVE-14035). This is so because Merge is internally run as a multi-insert > statement. > The solution relies on statement ID introduced in HIVE-11030. Each Insert > clause of a multi-insert is gets a unique ID. > The ROW__ID.bucketId now becomes a bit packed triplet (format version, > bucketId, statementId). > (Since ORC stores field names in the data file we can't rename > ROW__ID.bucketId). 
> This ensures that there are no collisions and retains desired sort properties > of ROW__ID. > In particular _SortedDynPartitionOptimizer_ works w/o any changes even in > cases where there are fewer reducers than buckets. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
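The bit-packed triplet described in HIVE-16832 can be illustrated with a small Java sketch. The field widths used here (3 bits of format version, 12 bits of bucketId, 12 bits of statementId, with reserved bits in between), the version code 1, and the class and method names are assumptions made for illustration only; the patch itself defines the authoritative layout.

```java
// Illustrative sketch only: bit widths, version code and names are assumptions,
// not taken from the HIVE-16832 patch.
final class BucketPropertySketch {
    static final int FORMAT_VERSION = 1;  // assumed version code
    static final int MAX_FIELD = 0xFFF;   // 12 bits each for bucketId and statementId

    // Assumed layout, high to low bits:
    // 3 bits version | 1 reserved | 12 bits bucketId | 4 reserved | 12 bits statementId
    static int encode(int bucketId, int statementId) {
        if (bucketId < 0 || bucketId > MAX_FIELD) {
            throw new IllegalArgumentException("bucketId out of range: " + bucketId);
        }
        if (statementId < 0 || statementId > MAX_FIELD) {
            throw new IllegalArgumentException("statementId out of range: " + statementId);
        }
        return (FORMAT_VERSION << 29) | (bucketId << 16) | statementId;
    }

    static int version(int packed)     { return packed >>> 29; }
    static int bucketId(int packed)    { return (packed >>> 16) & MAX_FIELD; }
    static int statementId(int packed) { return packed & MAX_FIELD; }

    public static void main(String[] args) {
        // Two insert branches writing to bucket 0 within the same transaction.
        int firstBranch = encode(0, 0);
        int secondBranch = encode(0, 1);
        System.out.println("distinct: " + (firstBranch != secondBranch));
        System.out.println("bucketId=" + bucketId(secondBranch)
            + " statementId=" + statementId(secondBranch));
    }
}
```

Under this assumed layout the two branches of the multi-insert no longer share a value for the packed field, and because bucketId occupies higher bits than statementId, packed values with the same version still sort by bucket first.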
[jira] [Updated] (HIVE-17079) LLAP: Use FQDN by default for work submission
[ https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-17079: - Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Committed to master. Thanks for the review! > LLAP: Use FQDN by default for work submission > - > > Key: HIVE-17079 > URL: https://issues.apache.org/jira/browse/HIVE-17079 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 3.0.0 > > Attachments: HIVE-17079.1.patch > > > HIVE-14624 added FQDN for work submission. We should enable it by default to > avoid DNS issues. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request
[ https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084817#comment-16084817 ] Siddharth Seth commented on HIVE-16926: --- bq. Is there any action needed on this part? I don't think there is, unless you see this as a problem for the running spark task. The number of threads created etc. is quite small afaik. bq. Maybe I can just replace pendingClients/registeredClients with a single list and the RequestInfo can keep a state to show if the request is pending/running/etc. That'll work as well. I think there are still 2 places which have similar code related to heartbeats - heartbeat / nodePinged. > LlapTaskUmbilicalExternalClient should not start new umbilical server for > every fragment request > > > Key: HIVE-16926 > URL: https://issues.apache.org/jira/browse/HIVE-16926 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch, > HIVE-16926.3.patch, HIVE-16926.4.patch > > > Followup task from [~sseth] and [~sershe] after HIVE-16777. > LlapTaskUmbilicalExternalClient currently creates a new umbilical server for > every fragment request, but this is not necessary and the umbilical can be > shared. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
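The idea discussed in this comment, a single collection in which each RequestInfo carries its own state, with one shared code path for the two heartbeat-style callbacks, might look roughly like the Java sketch below. Everything in it (the class name, the enum values, the `recordHeartbeat` helper, the use of a fragment ID string as key) is hypothetical and not taken from LlapTaskUmbilicalExternalClient or from any of the attached patches.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one registry instead of separate pending/registered collections.
final class RequestRegistrySketch {
    enum RequestState { PENDING, RUNNING, COMPLETED }

    static final class RequestInfo {
        final String fragmentId;
        volatile RequestState state = RequestState.PENDING;
        volatile long lastHeartbeatMs;

        RequestInfo(String fragmentId) {
            this.fragmentId = fragmentId;
        }
    }

    private final Map<String, RequestInfo> requests = new ConcurrentHashMap<>();

    void submit(String fragmentId) {
        requests.putIfAbsent(fragmentId, new RequestInfo(fragmentId));
    }

    // Single helper that both heartbeat-style callbacks could delegate to,
    // so the PENDING -> RUNNING transition lives in one place.
    boolean recordHeartbeat(String fragmentId, long nowMs) {
        RequestInfo info = requests.get(fragmentId);
        if (info == null) {
            return false; // unknown fragment
        }
        if (info.state == RequestState.PENDING) {
            info.state = RequestState.RUNNING;
        }
        info.lastHeartbeatMs = nowMs;
        return true;
    }

    void complete(String fragmentId) {
        RequestInfo info = requests.get(fragmentId);
        if (info != null) {
            info.state = RequestState.COMPLETED;
        }
    }

    RequestState stateOf(String fragmentId) {
        RequestInfo info = requests.get(fragmentId);
        return info == null ? null : info.state;
    }
}
```

The sketch only shows the bookkeeping; timeouts, communicator wiring and thread management are left out.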
[jira] [Updated] (HIVE-16100) Dynamic Sorted Partition optimizer loses sibling operators
[ https://issues.apache.org/jira/browse/HIVE-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-16100: --- Attachment: HIVE-16100.5.patch > Dynamic Sorted Partition optimizer loses sibling operators > -- > > Key: HIVE-16100 > URL: https://issues.apache.org/jira/browse/HIVE-16100 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 1.2.1, 2.1.1, 2.2.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-16100.1.patch, HIVE-16100.2.patch, > HIVE-16100.2.patch, HIVE-16100.3.patch, HIVE-16100.4.patch, HIVE-16100.5.patch > > > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java#L173 > {code} > // unlink connection between FS and its parent > fsParent = fsOp.getParentOperators().get(0); > fsParent.getChildOperators().clear(); > {code} > The optimizer discards any cases where the fsParent has another SEL child -- This message was sent by Atlassian JIRA (v6.4.14#64029)
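The effect of the quoted `clear()` call can be reproduced with a toy operator tree. The `Op` class below is a stand-in invented for this sketch, not Hive's Operator API, and the `unlinkOnlyFs` variant is just one plausible way to keep siblings; it is not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a parent operator with a FileSink child and a sibling Select child.
final class SiblingUnlinkSketch {
    static final class Op {
        final String name;
        final List<Op> parents = new ArrayList<>();
        final List<Op> children = new ArrayList<>();

        Op(String name) {
            this.name = name;
        }

        void addChild(Op child) {
            children.add(child);
            child.parents.add(this);
        }
    }

    // Mirrors the quoted snippet: every child of the parent is dropped, siblings included.
    static void unlinkWithClear(Op fsOp) {
        Op fsParent = fsOp.parents.get(0);
        fsParent.children.clear();
    }

    // Possible alternative (assumption): detach only the FileSink operator itself.
    static void unlinkOnlyFs(Op fsOp) {
        Op fsParent = fsOp.parents.get(0);
        fsParent.children.remove(fsOp);
    }

    static Op buildParentWithSibling(Op fsOp, Op selOp) {
        Op parent = new Op("PARENT");
        parent.addChild(fsOp);
        parent.addChild(selOp);
        return parent;
    }

    public static void main(String[] args) {
        Op fs1 = new Op("FS");
        Op p1 = buildParentWithSibling(fs1, new Op("SEL"));
        unlinkWithClear(fs1);
        System.out.println("after clear(): " + p1.children.size() + " children");

        Op fs2 = new Op("FS");
        Op p2 = buildParentWithSibling(fs2, new Op("SEL"));
        unlinkOnlyFs(fs2);
        System.out.println("after remove(fs): " + p2.children.size() + " children");
    }
}
```

With `clear()` the sibling SEL branch is silently disconnected from the plan, which is the loss the issue title refers to.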
[jira] [Commented] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant
[ https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084770#comment-16084770 ] Vineet Garg commented on HIVE-16793: It does if gby keys are constant > Scalar sub-query: sq_count_check not required if gby keys are constant > -- > > Key: HIVE-16793 > URL: https://issues.apache.org/jira/browse/HIVE-16793 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Vineet Garg > Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, > HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch > > > This query has an sq_count_check, though it is useless on a constant key. > {code} > hive> explain select * from part where p_size > (select max(p_size) from part > where p_type = '1' group by p_type); > Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product > Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product > OK > Plan optimized by CBO. 
> Vertex dependency in root stage > Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE) > Reducer 3 <- Map 2 (SIMPLE_EDGE) > Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE) > Reducer 6 <- Map 5 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Map 1 vectorized, llap > File Output Operator [FS_64] > Select Operator [SEL_63] (rows= width=621) > > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > Filter Operator [FIL_62] (rows= width=625) > predicate:(_col5 > _col10) > Map Join Operator [MAPJOIN_61] (rows=2 width=625) > > Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"] > <-Reducer 6 [BROADCAST_EDGE] vectorized, llap > BROADCAST [RS_58] > Select Operator [SEL_57] (rows=1 width=4) > Output:["_col0"] > Group By Operator [GBY_56] (rows=1 width=89) > > Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0 > <-Map 5 [SIMPLE_EDGE] vectorized, llap > SHUFFLE [RS_55] > PartitionCols:_col0 > Group By Operator [GBY_54] (rows=86 width=89) > > Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1' > Select Operator [SEL_53] (rows=1212121 width=109) > Output:["_col1"] > Filter Operator [FIL_52] (rows=1212121 width=109) > predicate:(p_type = '1') > TableScan [TS_17] (rows=2 width=109) > > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"] > <-Map Join Operator [MAPJOIN_60] (rows=2 width=621) > > Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > <-Reducer 4 [BROADCAST_EDGE] vectorized, llap > BROADCAST [RS_51] > Select Operator [SEL_50] (rows=1 width=8) > Filter Operator [FIL_49] (rows=1 width=8) > predicate:(sq_count_check(_col0) <= 1) > Group By Operator [GBY_48] (rows=1 width=8) > Output:["_col0"],aggregations:["count(VALUE._col0)"] > <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap > PARTITION_ONLY_SHUFFLE [RS_47] > Group By Operator [GBY_46] (rows=1 width=8) > 
Output:["_col0"],aggregations:["count()"] > Select Operator [SEL_45] (rows=1 width=85) > Group By Operator [GBY_44] (rows=1 width=85) > Output:["_col0"],keys:KEY._col0 > <-Map 2 [SIMPLE_EDGE] vectorized, llap > SHUFFLE [RS_43] > PartitionCols:_col0 > Group By Operator [GBY_42] (rows=83 > width=85) > Output:["_col0"],keys:'1' > Select Operator [SEL_41] (rows=1212121 > width=105) > Filter Operator [FIL_40] (rows=1212121 > width=105) > predicate:(p_type = '1') > TableScan [TS_2] (rows=2 > width=105) > > tpch_flat_orc_1000@part,part,Tbl:COM
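The reasoning behind the issue title can be checked with a small Java sketch: when rows are filtered to a single value of the grouping column, the group-by can produce at most one group, so a check that the scalar sub-query returns at most one row cannot fail. The `Part` class and `maxSizeByType` method are invented for this illustration and only emulate the sub-query `select max(p_size) from part where p_type = '1' group by p_type` in plain Java.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Emulates grouping on a column that the filter has pinned to a constant.
final class ConstantGroupKeySketch {
    static final class Part {
        final String type;
        final int size;

        Part(String type, int size) {
            this.type = type;
            this.size = size;
        }
    }

    // Filter on p_type = wantedType, then group by p_type and take max(p_size).
    static Map<String, Integer> maxSizeByType(List<Part> parts, String wantedType) {
        Map<String, Integer> groups = new HashMap<>();
        for (Part p : parts) {
            if (p.type.equals(wantedType)) {
                groups.merge(p.type, p.size, Integer::max);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Part> parts = new ArrayList<>();
        parts.add(new Part("1", 7));
        parts.add(new Part("1", 42));
        parts.add(new Part("2", 99));
        Map<String, Integer> result = maxSizeByType(parts, "1");
        // Every surviving row has the same key, so there is never more than one group.
        System.out.println("groups: " + result.size() + ", max: " + result.get("1"));
    }
}
```

If no row matches the filter the map is empty, which corresponds to the sub-query returning zero rows rather than more than one.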
[jira] [Commented] (HIVE-17018) Small table is converted to map join even the total size of small tables exceeds the threshold(hive.auto.convert.join.noconditionaltask.size)
[ https://issues.apache.org/jira/browse/HIVE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084774#comment-16084774 ] liyunzhang_intel commented on HIVE-17018: - [~csun]: {quote} Yes. I think we don't need to change the existing behavior. I'm just suggesting that we might need a HoS specific config to replace hive.auto.convert.join.noconditionaltask.size {quote} Should we rename {{hive.auto.convert.join.noconditionaltask.size}} to {{hive.auto.convert.join.within.sparktask.size}}? The description of the configuration would then change from {noformat} the sum of size for n-1 of the tables/partitions for a n-way join is smaller than it {noformat} to {noformat} the sum of size for n-1 of the tables/partitions for a n-way join is smaller than it in 1 MapTask or ReduceTask {noformat} Do you have any suggestions? > Small table is converted to map join even the total size of small tables > exceeds the threshold(hive.auto.convert.join.noconditionaltask.size) > - > > Key: HIVE-17018 > URL: https://issues.apache.org/jira/browse/HIVE-17018 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-17018_data_init.q, HIVE-17018.q, t3.txt > > > We use "hive.auto.convert.join.noconditionaltask.size" as the threshold: if > the sum of the sizes of n-1 of the tables/partitions in an n-way join is > smaller than it, the join is converted to a map join. For example, take A join B > join C join D join E. The big table is A(100M) and the small tables are > B(10M),C(10M),D(10M),E(10M). Suppose we set > hive.auto.convert.join.noconditionaltask.size=20M. In the current code, E,D,B > will be converted to map join but C will not. In my > understanding, because hive.auto.convert.join.noconditionaltask.size can only > accommodate E and D, C and B should not be converted to map join. > Let's explain in more detail why E can be converted to map join. 
> In the current code, > [SparkMapJoinOptimizer#getConnectedMapJoinSize|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L364] > calculates the size of all the map joins in the parent path and child path. The search > stops when encountering [UnionOperator or > ReduceOperator|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L381]. > C is not converted to map join because {{(connectedMapJoinSize + > totalSize) > maxSize}} [see > code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L330]. The > RS before the join of C remains. When calculating whether B will be > converted to map join, {{getConnectedMapJoinSize}} returns 0 on encountering the > [RS > |https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#409] > so {{(connectedMapJoinSize + totalSize) < maxSize}} holds. > [~xuefuz] or [~jxiang]: can you help check whether this is a bug, as you > are more familiar with SparkJoinOptimizer? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
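The behaviour described in this report can be replayed with a simplified model. The sketch below is not SparkMapJoinOptimizer code: the class, the method, the `resetAtReduceSink` flag and the processing order E, D, C, B are assumptions chosen to match the narrative above, with sizes expressed in MB.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the threshold check described in HIVE-17018.
final class MapJoinThresholdSketch {
    // Returns the small tables that would be converted to map joins.
    // resetAtReduceSink = true models the reported behaviour: a join that is not
    // converted keeps its RS, and the connected size seen by later joins drops to 0.
    static List<String> convertedTables(Map<String, Long> smallTables,
                                        long maxSize,
                                        boolean resetAtReduceSink) {
        List<String> converted = new ArrayList<>();
        long connectedMapJoinSize = 0;
        for (Map.Entry<String, Long> entry : smallTables.entrySet()) {
            long totalSize = entry.getValue();
            if (connectedMapJoinSize + totalSize > maxSize) {
                // Not converted; the reduce sink in front of this join remains.
                if (resetAtReduceSink) {
                    connectedMapJoinSize = 0;
                }
            } else {
                converted.add(entry.getKey());
                connectedMapJoinSize += totalSize;
            }
        }
        return converted;
    }

    static Map<String, Long> exampleTables() {
        Map<String, Long> tables = new LinkedHashMap<>();
        tables.put("E", 10L);
        tables.put("D", 10L);
        tables.put("C", 10L);
        tables.put("B", 10L);
        return tables;
    }

    public static void main(String[] args) {
        System.out.println(convertedTables(exampleTables(), 20L, true));   // [E, D, B]
        System.out.println(convertedTables(exampleTables(), 20L, false));  // [E, D]
    }
}
```

With the reset enabled the model converts E, D and B, as reported; without it only E and D fit under the 20M threshold, which is the outcome the reporter expected.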
[jira] [Commented] (HIVE-17079) LLAP: Use FQDN by default for work submission
[ https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084767#comment-16084767 ] Hive QA commented on HIVE-17079: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876908/HIVE-17079.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10874 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout (batchId=226) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5988/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5988/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5988/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876908 - PreCommit-HIVE-Build > LLAP: Use FQDN by default for work submission > - > > Key: HIVE-17079 > URL: https://issues.apache.org/jira/browse/HIVE-17079 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17079.1.patch > > > HIVE-14624 added FQDN for work submission. We should enable it by default to > avoid DNS issues. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant
[ https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084760#comment-16084760 ] Gopal V commented on HIVE-16793: Does enabling this optimization remove the cross-products triggered by the scalar sub-query? > Scalar sub-query: sq_count_check not required if gby keys are constant > -- > > Key: HIVE-16793 > URL: https://issues.apache.org/jira/browse/HIVE-16793 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Vineet Garg > Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, > HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch > > > This query has an sq_count_check, though it is useless on a constant key. > {code} > hive> explain select * from part where p_size > (select max(p_size) from part > where p_type = '1' group by p_type); > Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product > Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product > OK > Plan optimized by CBO. 
[jira] [Updated] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant
[ https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-16793: --- Attachment: HIVE-16793.5.patch The latest patch adds a config param {{hive.optimize.remove.sq_count_check}} to enable this optimization. Since this optimization caters to a very specific case but could have adverse effects (join reordering, joins not merging), we have decided to disable it by default. > Scalar sub-query: sq_count_check not required if gby keys are constant > -- > > Key: HIVE-16793 > URL: https://issues.apache.org/jira/browse/HIVE-16793 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Vineet Garg > Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, > HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch > > > This query has an sq_count_check, though it is useless on a constant key. > {code} > hive> explain select * from part where p_size > (select max(p_size) from part > where p_type = '1' group by p_type); > Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product > Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product > OK > Plan optimized by CBO. 
[jira] [Updated] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant
[ https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-16793: --- Status: Open (was: Patch Available) > Scalar sub-query: sq_count_check not required if gby keys are constant > -- > > Key: HIVE-16793 > URL: https://issues.apache.org/jira/browse/HIVE-16793 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Vineet Garg > Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, > HIVE-16793.3.patch, HIVE-16793.4.patch > > > This query has an sq_count check, though is useless on a constant key. > {code} > hive> explain select * from part where p_size > (select max(p_size) from part > where p_type = '1' group by p_type); > Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product > Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product > OK > Plan optimized by CBO. > Vertex dependency in root stage > Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE) > Reducer 3 <- Map 2 (SIMPLE_EDGE) > Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE) > Reducer 6 <- Map 5 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Map 1 vectorized, llap > File Output Operator [FS_64] > Select Operator [SEL_63] (rows= width=621) > > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > Filter Operator [FIL_62] (rows= width=625) > predicate:(_col5 > _col10) > Map Join Operator [MAPJOIN_61] (rows=2 width=625) > > Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"] > <-Reducer 6 [BROADCAST_EDGE] vectorized, llap > BROADCAST [RS_58] > Select Operator [SEL_57] (rows=1 width=4) > Output:["_col0"] > Group By Operator [GBY_56] (rows=1 width=89) > > Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0 > <-Map 5 [SIMPLE_EDGE] vectorized, llap > SHUFFLE [RS_55] > PartitionCols:_col0 > Group By Operator 
[GBY_54] (rows=86 width=89) > > Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1' > Select Operator [SEL_53] (rows=1212121 width=109) > Output:["_col1"] > Filter Operator [FIL_52] (rows=1212121 width=109) > predicate:(p_type = '1') > TableScan [TS_17] (rows=2 width=109) > > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"] > <-Map Join Operator [MAPJOIN_60] (rows=2 width=621) > > Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > <-Reducer 4 [BROADCAST_EDGE] vectorized, llap > BROADCAST [RS_51] > Select Operator [SEL_50] (rows=1 width=8) > Filter Operator [FIL_49] (rows=1 width=8) > predicate:(sq_count_check(_col0) <= 1) > Group By Operator [GBY_48] (rows=1 width=8) > Output:["_col0"],aggregations:["count(VALUE._col0)"] > <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap > PARTITION_ONLY_SHUFFLE [RS_47] > Group By Operator [GBY_46] (rows=1 width=8) > Output:["_col0"],aggregations:["count()"] > Select Operator [SEL_45] (rows=1 width=85) > Group By Operator [GBY_44] (rows=1 width=85) > Output:["_col0"],keys:KEY._col0 > <-Map 2 [SIMPLE_EDGE] vectorized, llap > SHUFFLE [RS_43] > PartitionCols:_col0 > Group By Operator [GBY_42] (rows=83 > width=85) > Output:["_col0"],keys:'1' > Select Operator [SEL_41] (rows=1212121 > width=105) > Filter Operator [FIL_40] (rows=1212121 > width=105) > predicate:(p_type = '1') > TableScan [TS_2] (rows=2 > width=105) > > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"] > <-Select Operator
[jira] [Updated] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant
[ https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-16793: --- Status: Patch Available (was: Open) > Scalar sub-query: sq_count_check not required if gby keys are constant > -- > > Key: HIVE-16793 > URL: https://issues.apache.org/jira/browse/HIVE-16793 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Vineet Garg > Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, > HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch > > > This query has an sq_count check, though is useless on a constant key. > {code} > hive> explain select * from part where p_size > (select max(p_size) from part > where p_type = '1' group by p_type); > Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product > Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product > OK > Plan optimized by CBO. > Vertex dependency in root stage > Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE) > Reducer 3 <- Map 2 (SIMPLE_EDGE) > Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE) > Reducer 6 <- Map 5 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Map 1 vectorized, llap > File Output Operator [FS_64] > Select Operator [SEL_63] (rows= width=621) > > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > Filter Operator [FIL_62] (rows= width=625) > predicate:(_col5 > _col10) > Map Join Operator [MAPJOIN_61] (rows=2 width=625) > > Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"] > <-Reducer 6 [BROADCAST_EDGE] vectorized, llap > BROADCAST [RS_58] > Select Operator [SEL_57] (rows=1 width=4) > Output:["_col0"] > Group By Operator [GBY_56] (rows=1 width=89) > > Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0 > <-Map 5 [SIMPLE_EDGE] vectorized, llap > SHUFFLE [RS_55] > PartitionCols:_col0 > 
Group By Operator [GBY_54] (rows=86 width=89) > > Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1' > Select Operator [SEL_53] (rows=1212121 width=109) > Output:["_col1"] > Filter Operator [FIL_52] (rows=1212121 width=109) > predicate:(p_type = '1') > TableScan [TS_17] (rows=2 width=109) > > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"] > <-Map Join Operator [MAPJOIN_60] (rows=2 width=621) > > Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > <-Reducer 4 [BROADCAST_EDGE] vectorized, llap > BROADCAST [RS_51] > Select Operator [SEL_50] (rows=1 width=8) > Filter Operator [FIL_49] (rows=1 width=8) > predicate:(sq_count_check(_col0) <= 1) > Group By Operator [GBY_48] (rows=1 width=8) > Output:["_col0"],aggregations:["count(VALUE._col0)"] > <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap > PARTITION_ONLY_SHUFFLE [RS_47] > Group By Operator [GBY_46] (rows=1 width=8) > Output:["_col0"],aggregations:["count()"] > Select Operator [SEL_45] (rows=1 width=85) > Group By Operator [GBY_44] (rows=1 width=85) > Output:["_col0"],keys:KEY._col0 > <-Map 2 [SIMPLE_EDGE] vectorized, llap > SHUFFLE [RS_43] > PartitionCols:_col0 > Group By Operator [GBY_42] (rows=83 > width=85) > Output:["_col0"],keys:'1' > Select Operator [SEL_41] (rows=1212121 > width=105) > Filter Operator [FIL_40] (rows=1212121 > width=105) > predicate:(p_type = '1') > TableScan [TS_2] (rows=2 > width=105) > > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"] >
[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084708#comment-16084708 ] Adam Szita commented on HIVE-8838: -- Thanks for reviewing [~spena], [~sushanth], [~aihuaxu] and committing! > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: New Feature >Reporter: Brock Noland >Assignee: Adam Szita > Fix For: 3.0.0 > > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch, HIVE-8838.4.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16732) Transactional tables should block LOAD DATA
[ https://issues.apache.org/jira/browse/HIVE-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16732: -- Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Status: Resolved (was: Patch Available) HIVE-16732.03-branch-2.patch committed to branch-2 (2.x) thanks Wei for the review > Transactional tables should block LOAD DATA > > > Key: HIVE-16732 > URL: https://issues.apache.org/jira/browse/HIVE-16732 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Fix For: 3.0.0, 2.4.0 > > Attachments: HIVE-16732.01.patch, HIVE-16732.02.patch, > HIVE-16732.03-branch-2.patch, HIVE-16732.03.patch > > > This has always been the design. > see LoadSemanticAnalyzer.analyzeInternal() > StrictChecks.checkBucketing(conf); > Some examples (this is exposed by HIVE-16177) > insert_values_orig_table.q > insert_orig_table.q > insert_values_orig_table_use_metadata.q -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16732) Transactional tables should block LOAD DATA
[ https://issues.apache.org/jira/browse/HIVE-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084644#comment-16084644 ] Hive QA commented on HIVE-16732: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876901/HIVE-16732.03-branch-2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10585 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs] (batchId=38) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=142) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=139) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[explaindenpendencydiffengs] (batchId=115) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] (batchId=125) org.apache.hadoop.hive.ql.security.TestExtendedAcls.testPartition (batchId=228) org.apache.hadoop.hive.ql.security.TestFolderPermissions.testPartition (batchId=217) org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=176) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5987/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5987/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5987/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876901 - PreCommit-HIVE-Build > Transactional tables should block LOAD DATA > > > Key: HIVE-16732 > URL: https://issues.apache.org/jira/browse/HIVE-16732 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-16732.01.patch, HIVE-16732.02.patch, > HIVE-16732.03-branch-2.patch, HIVE-16732.03.patch > > > This has always been the design. > see LoadSemanticAnalyzer.analyzeInternal() > StrictChecks.checkBucketing(conf); > Some examples (this is exposed by HIVE-16177) > insert_values_orig_table.q > insert_orig_table.q > insert_values_orig_table_use_metadata.q -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote
[ https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084614#comment-16084614 ] Pengcheng Xiong edited comment on HIVE-16907 at 7/12/17 8:18 PM: - That is exactly what I am worrying about : Hive may not well support table name with ".". Could u estimate the work that we need to do if we want to support this? Thanks. was (Author: pxiong): That is exactly what I am worrying about : Hive may not well support table name with ".". Could u evaluate the work that we need to do if we want to support this? Thanks. > "INSERT INTO" overwrite old data when destination table encapsulated by > backquote > > > Key: HIVE-16907 > URL: https://issues.apache.org/jira/browse/HIVE-16907 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 1.1.0, 2.1.1 >Reporter: Nemon Lou >Assignee: Bing Li > Attachments: HIVE-16907.1.patch > > > A way to reproduce: > {noformat} > create database tdb; > use tdb; > create table t1(id int); > create table t2(id int); > explain insert into `tdb.t1` select * from t2; > {noformat} > {noformat} > +---+ > | > Explain | > +---+ > | STAGE DEPENDENCIES: > | > | Stage-1 is a root stage > | > | Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, > Stage-4 | > | Stage-3 > | > | Stage-0 depends on stages: Stage-3, Stage-2, Stage-5 > | > | Stage-2 > | > | Stage-4 > | > | Stage-5 depends on stages: Stage-4 > | > | > | > | STAGE PLANS: > | > | Stage: Stage-1 > | > | Map Reduce > | > | Map Operator Tree: > | > | TableScan > | > | alias: t2 > | > | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE | > | Select Operator > | > | expressions: id (type: int) > | > | outputColumnNames: _col0 > | > | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE | > | File Output Operator
[jira] [Commented] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote
[ https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084614#comment-16084614 ] Pengcheng Xiong commented on HIVE-16907: That is exactly what I am worrying about : Hive may not well support table name with ".". Could u evaluate the work that we need to do if we want to support this? Thanks. > "INSERT INTO" overwrite old data when destination table encapsulated by > backquote > > > Key: HIVE-16907 > URL: https://issues.apache.org/jira/browse/HIVE-16907 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 1.1.0, 2.1.1 >Reporter: Nemon Lou >Assignee: Bing Li > Attachments: HIVE-16907.1.patch > > > A way to reproduce: > {noformat} > create database tdb; > use tdb; > create table t1(id int); > create table t2(id int); > explain insert into `tdb.t1` select * from t2; > {noformat} > {noformat} > +---+ > | > Explain | > +---+ > | STAGE DEPENDENCIES: > | > | Stage-1 is a root stage > | > | Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, > Stage-4 | > | Stage-3 > | > | Stage-0 depends on stages: Stage-3, Stage-2, Stage-5 > | > | Stage-2 > | > | Stage-4 > | > | Stage-5 depends on stages: Stage-4 > | > | > | > | STAGE PLANS: > | > | Stage: Stage-1 > | > | Map Reduce > | > | Map Operator Tree: > | > | TableScan > | > | alias: t2 > | > | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE | > | Select Operator > | > | expressions: id (type: int) > | > | outputColumnNames: _col0 > | > | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE | > | File Output Operator > | > | compressed: false >
[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084585#comment-16084585 ] Sergio Peña commented on HIVE-8838: --- Aaa, that's what happened haha, I tried to push it when I got an error that I had to update my local repo, and when I updated it, I saw the patch was already there, then I got confused. Anyway, no worries, thanks for the heads up. > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: New Feature >Reporter: Brock Noland >Assignee: Adam Szita > Fix For: 3.0.0 > > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch, HIVE-8838.4.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
[ https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084566#comment-16084566 ] Hive QA commented on HIVE-16922: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876876/HIVE-16922.2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10874 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5986/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5986/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5986/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876876 - PreCommit-HIVE-Build > Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim" > --- > > Key: HIVE-16922 > URL: https://issues.apache.org/jira/browse/HIVE-16922 > Project: Hive > Issue Type: Bug > Components: Thrift API >Reporter: Dudu Markovitz >Assignee: Bing Li > Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch > > > https://github.com/apache/hive/blob/master/serde/if/serde.thrift > Typo in serde.thrift: > COLLECTION_DELIM = "colelction.delim" > (*colelction* instead of *collection*) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17072) Make the parallelized timeout configurable in BeeLine tests
[ https://issues.apache.org/jira/browse/HIVE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084479#comment-16084479 ] Hive QA commented on HIVE-17072: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876874/HIVE-17072.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10840 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_queries] (batchId=94) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout (batchId=226) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5985/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5985/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5985/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876874 - PreCommit-HIVE-Build > Make the parallelized timeout configurable in BeeLine tests > --- > > Key: HIVE-17072 > URL: https://issues.apache.org/jira/browse/HIVE-17072 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Minor > Attachments: HIVE-17072.1.patch > > > When running the BeeLine tests in parallel, the timeout is hardcoded in > Parallelized.java: > {noformat} > @Override > public void finished() { > executor.shutdown(); > try { > executor.awaitTermination(10, TimeUnit.MINUTES); > } catch (InterruptedException exc) { > throw new RuntimeException(exc); > } > } > {noformat} > It would be better to make it configurable. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
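One possible way to make the timeout in the snippet above configurable is to read it from a system property and fall back to the current 10 minutes. This is only a sketch and not the attached patch: the class name ConfigurableTimeout and the property name test.beeline.parallel.timeout.minutes are hypothetical and do not come from the Hive code base.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch only: the property name below is hypothetical, not an existing Hive setting.
final class ConfigurableTimeout {
  static final String TIMEOUT_PROPERTY = "test.beeline.parallel.timeout.minutes";
  static final long DEFAULT_TIMEOUT_MINUTES = 10L;

  // Returns the configured timeout, or the default when the property is
  // missing, blank, not a number, or not positive.
  static long timeoutMinutes() {
    String raw = System.getProperty(TIMEOUT_PROPERTY);
    if (raw == null || raw.trim().isEmpty()) {
      return DEFAULT_TIMEOUT_MINUTES;
    }
    try {
      long value = Long.parseLong(raw.trim());
      return value > 0 ? value : DEFAULT_TIMEOUT_MINUTES;
    } catch (NumberFormatException e) {
      return DEFAULT_TIMEOUT_MINUTES;
    }
  }

  // Same shape as finished() in the issue description, with the literal 10
  // replaced by the configurable value. Returns true if all tasks completed.
  static boolean shutdownAndWait(ExecutorService executor) {
    executor.shutdown();
    try {
      return executor.awaitTermination(timeoutMinutes(), TimeUnit.MINUTES);
    } catch (InterruptedException exc) {
      Thread.currentThread().interrupt(); // keep the interrupt flag visible to callers
      throw new RuntimeException(exc);
    }
  }
}
```

With this shape the value could be passed on the test command line as a -D system property; whether the build forwards such properties to the test JVM is an assumption that would need checking.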
[jira] [Commented] (HIVE-17078) Add more logs to MapredLocalTask
[ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084462#comment-16084462 ] Sahil Takiar commented on HIVE-17078: - If we are printing the child stdout / stderr to the Hive logs then do we need to also print them to Hive stdout / stderr too? > Add more logs to MapredLocalTask > > > Key: HIVE-17078 > URL: https://issues.apache.org/jira/browse/HIVE-17078 > Project: Hive > Issue Type: Improvement >Reporter: Yibing Shi >Assignee: Yibing Shi >Priority: Minor > Attachments: HIVE-17078.1.patch, HIVE-17078.2.patch > > > By default, {{MapredLocalTask}} is executed in a child process of Hive, in > case the local task uses too many resources, which may affect Hive. Currently, > the stdout and stderr information of the child process is printed in Hive's > stdout/stderr log, which doesn't have timestamp information, and is > separated from Hive service logs. This makes it hard to troubleshoot problems > in MapredLocalTasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-12631) LLAP: support ORC ACID tables
[ https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-12631: -- Attachment: HIVE-12631.17.patch > LLAP: support ORC ACID tables > - > > Key: HIVE-12631 > URL: https://issues.apache.org/jira/browse/HIVE-12631 > Project: Hive > Issue Type: Bug > Components: llap, Transactions >Reporter: Sergey Shelukhin >Assignee: Teddy Choi > Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, > HIVE-12631.11.patch, HIVE-12631.11.patch, HIVE-12631.12.patch, > HIVE-12631.13.patch, HIVE-12631.15.patch, HIVE-12631.16.patch, > HIVE-12631.17.patch, HIVE-12631.1.patch, HIVE-12631.2.patch, > HIVE-12631.3.patch, HIVE-12631.4.patch, HIVE-12631.5.patch, > HIVE-12631.6.patch, HIVE-12631.7.patch, HIVE-12631.8.patch, > HIVE-12631.8.patch, HIVE-12631.9.patch > > > LLAP uses a completely separate read path in ORC to allow for caching and > parallelization of reads and processing. This path does not support ACID. As > far as I remember ACID logic is embedded inside ORC format; we need to > refactor it to be on top of some interface, if practical; or just port it to > LLAP read path. > Another consideration is how the logic will work with cache. The cache is > currently low-level (CB-level in ORC), so we could just use it to read bases > and deltas (deltas should be cached with higher priority) and merge as usual. > We could also cache merged representation in future. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16821) Vectorization: support Explain Analyze in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-16821: --- Status: Open (was: Patch Available) Temporarily obsoleted by HIVE-17073 > Vectorization: support Explain Analyze in vectorized mode > - > > Key: HIVE-16821 > URL: https://issues.apache.org/jira/browse/HIVE-16821 > Project: Hive > Issue Type: Bug > Components: Diagnosability, Vectorization >Affects Versions: 2.1.1, 3.0.0 >Reporter: Gopal V >Assignee: Gopal V >Priority: Minor > Attachments: HIVE-16821.1.patch, HIVE-16821.2.patch, > HIVE-16821.2.patch, HIVE-16821.3.patch > > > Currently, to avoid a branch in the operator inner loop - the runtime stats > are only available in non-vector mode. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17079) LLAP: Use FQDN by default for work submission
[ https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084438#comment-16084438 ] Gopal V commented on HIVE-17079: LGTM - +1 > LLAP: Use FQDN by default for work submission > - > > Key: HIVE-17079 > URL: https://issues.apache.org/jira/browse/HIVE-17079 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17079.1.patch > > > HIVE-14624 added FQDN for work submission. We should enable it by default to > avoid DNS issues. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17079) LLAP: Use FQDN by default for work submission
[ https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-17079: - Attachment: HIVE-17079.1.patch [~sseth] can you please take a look? small change > LLAP: Use FQDN by default for work submission > - > > Key: HIVE-17079 > URL: https://issues.apache.org/jira/browse/HIVE-17079 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17079.1.patch > > > HIVE-14624 added FQDN for work submission. We should enable it by default to > avoid DNS issues. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17079) LLAP: Use FQDN by default for work submission
[ https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-17079: - Status: Patch Available (was: Open) > LLAP: Use FQDN by default for work submission > - > > Key: HIVE-17079 > URL: https://issues.apache.org/jira/browse/HIVE-17079 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17079.1.patch > > > HIVE-14624 added FQDN for work submission. We should enable it by default to > avoid DNS issues. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table
[ https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16832: -- Attachment: HIVE-16832.22.patch > duplicate ROW__ID possible in multi insert into transactional table > --- > > Key: HIVE-16832 > URL: https://issues.apache.org/jira/browse/HIVE-16832 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, > HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, > HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, > HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, > HIVE-16832.16.patch, HIVE-16832.17.patch, HIVE-16832.18.patch, > HIVE-16832.19.patch, HIVE-16832.20.patch, HIVE-16832.20.patch, > HIVE-16832.21.patch, HIVE-16832.22.patch > > > {noformat} > create table AcidTablePart(a int, b int) partitioned by (p string) clustered > by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true'); > create temporary table if not exists data1 (x int); > insert into data1 values (1); > from data1 >insert into AcidTablePart partition(p) select 0, 0, 'p' || x >insert into AcidTablePart partition(p='p1') select 0, 1 > {noformat} > Each branch of this multi-insert creates a row in partition p1/bucket0 with > ROW__ID=(1,0,0). > The same can happen when running a SQL Merge (HIVE-10924) statement that has > both Insert and Update clauses when the target table has > _'transactional'='true','transactional_properties'='default'_ (see > HIVE-14035). This is so because Merge is internally run as a multi-insert > statement. > The solution relies on the statement ID introduced in HIVE-11030. Each Insert > clause of a multi-insert gets a unique ID. > The ROW__ID.bucketId now becomes a bit packed triplet (format version, > bucketId, statementId). 
> (Since ORC stores field names in the data file we can't rename > ROW__ID.bucketId). > This ensures that there are no collisions and retains desired sort properties > of ROW__ID. > In particular _SortedDynPartitionOptimizer_ works w/o any changes even in > cases where there are fewer reducers than buckets. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
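To illustrate the bit packed triplet (format version, bucketId, statementId) described in HIVE-16832, here is a minimal sketch of one way to pack the three values into a single int. The field widths (3 bits for the version, 12 bits each for bucketId and statementId, remaining bits unused) and the class name BucketCodecSketch are assumptions made for this example; the description above does not state the exact layout.

```java
// Sketch only: the bit layout is an assumption, not taken from the patch.
final class BucketCodecSketch {
  static final int VERSION = 1;        // assumed format version
  static final int VERSION_SHIFT = 29; // top 3 bits hold the version
  static final int BUCKET_SHIFT = 16;  // bucketId sits above statementId
  static final int FIELD_MASK = 0xFFF; // 12 bits per field

  // Packs bucketId and statementId; two insert branches writing to the same
  // bucket get different encoded values because their statementIds differ.
  static int encode(int bucketId, int statementId) {
    if (bucketId < 0 || bucketId > FIELD_MASK) {
      throw new IllegalArgumentException("bucketId out of range: " + bucketId);
    }
    if (statementId < 0 || statementId > FIELD_MASK) {
      throw new IllegalArgumentException("statementId out of range: " + statementId);
    }
    return (VERSION << VERSION_SHIFT) | (bucketId << BUCKET_SHIFT) | statementId;
  }

  static int version(int encoded) {
    return encoded >>> VERSION_SHIFT;
  }

  static int bucketId(int encoded) {
    return (encoded >>> BUCKET_SHIFT) & FIELD_MASK;
  }

  static int statementId(int encoded) {
    return encoded & FIELD_MASK;
  }
}
```

Placing bucketId in higher bits than statementId means that ordering by the encoded value still orders by bucket first, which is one way to retain the sort properties the description mentions.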
[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-17066: --- Fix Version/s: 3.0.0 > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Fix For: 3.0.0 > > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, > HIVE-17066.3.patch, HIVE-17066.4.patch, HIVE-17066.5.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8838: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Thanks [~szita] for your contribution. I committed to master. > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: New Feature >Reporter: Brock Noland >Assignee: Adam Szita > Fix For: 3.0.0 > > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch, HIVE-8838.4.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084425#comment-16084425 ] Vineet Garg commented on HIVE-17066: Failures are not reproducible/un-related > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, > HIVE-17066.3.patch, HIVE-17066.4.patch, HIVE-17066.5.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-17066: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Fix For: 3.0.0 > > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, > HIVE-17066.3.patch, HIVE-17066.4.patch, HIVE-17066.5.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084434#comment-16084434 ] Sushanth Sowmyan commented on HIVE-8838: ([~spena], I just pushed to master too, hopefully our pushes don't conflict :D ) > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: New Feature >Reporter: Brock Noland >Assignee: Adam Szita > Fix For: 3.0.0 > > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch, HIVE-8838.4.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084432#comment-16084432 ] Vineet Garg commented on HIVE-17066: Pushed to master > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, > HIVE-17066.3.patch, HIVE-17066.4.patch, HIVE-17066.5.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17079) LLAP: Use FQDN by default for work submission
[ https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran reassigned HIVE-17079: > LLAP: Use FQDN by default for work submission > - > > Key: HIVE-17079 > URL: https://issues.apache.org/jira/browse/HIVE-17079 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > > HIVE-14624 added FQDN for work submission. We should enable it by default to > avoid DNS issues. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8838: -- Issue Type: New Feature (was: Bug) > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: New Feature >Reporter: Brock Noland >Assignee: Adam Szita > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch, HIVE-8838.4.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files
[ https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16177: -- Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Status: Resolved (was: Patch Available) > non Acid to acid conversion doesn't handle _copy_N files > > > Key: HIVE-16177 > URL: https://issues.apache.org/jira/browse/HIVE-16177 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 0.14.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Fix For: 3.0.0, 2.4.0 > > Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, > HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, > HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, > HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, > HIVE-16177.17.patch, HIVE-16177.18-branch-2.patch, HIVE-16177.18.patch, > HIVE-16177.19-branch-2.patch, HIVE-16177.20-branch-2.patch > > > {noformat} > create table T(a int, b int) clustered by (a) into 2 buckets stored as orc > TBLPROPERTIES('transactional'='false') > insert into T(a,b) values(1,2) > insert into T(a,b) values(1,3) > alter table T SET TBLPROPERTIES ('transactional'='true') > {noformat} > //we should now have bucket files 01_0 and 01_0_copy_1 > but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can > be copy_N files and numbers rows in each bucket from 0 thus generating > duplicate IDs > {noformat} > select ROW__ID, INPUT__FILE__NAME, a, b from T > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2 > {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3 > {noformat} > [~owen.omalley], do you have any thoughts on a good way to handle this? 
> attached patch has a few changes to make Acid even recognize copy_N but this > is just a pre-requisite. The new UT demonstrates the issue. > Furthermore, > {noformat} > alter table T compact 'major' > select ROW__ID, INPUT__FILE__NAME, a, b from T order by b > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0} > file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1 > 1 2 > {noformat} > HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() > demonstrating this > This is because compactor doesn't handle copy_N files either (skips them) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
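The duplicate IDs in HIVE-16177 come from numbering rows from 0 in every file of a bucket. One way to avoid that, sketched below, is to give each file a starting row id equal to the total row count of the files that precede it in a fixed order. This is an illustration of the idea only, not the committed fix; the class CopyFileRowOffsets and its method are hypothetical, and the caller is assumed to supply the files of one bucket in a deterministic order together with their row counts.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: hypothetical helper, not part of OrcRawRecordMerger.
final class CopyFileRowOffsets {
  // filesInOrder: the original file of a bucket followed by its copy_N files.
  // rowCounts: the number of rows in each of those files, in the same order.
  // Returns the first row id to assign within each file.
  static Map<String, Long> firstRowIds(List<String> filesInOrder, List<Long> rowCounts) {
    if (filesInOrder.size() != rowCounts.size()) {
      throw new IllegalArgumentException("one row count is required per file");
    }
    Map<String, Long> firstRowId = new LinkedHashMap<>();
    long next = 0L;
    for (int i = 0; i < filesInOrder.size(); i++) {
      firstRowId.put(filesInOrder.get(i), next);
      next += rowCounts.get(i); // rows of this file occupy [next, next + count)
    }
    return firstRowId;
  }
}
```

For the two single-row files in the example above, the original file would start at row id 0 and the copy_1 file at row id 1, so the two rows would no longer share rowid 0.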
[jira] [Commented] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files
[ https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084360#comment-16084360 ] Eugene Koifman commented on HIVE-16177: --- HIVE-16177.20-branch-2.patch committed to branch-2 (2.x) > non Acid to acid conversion doesn't handle _copy_N files > > > Key: HIVE-16177 > URL: https://issues.apache.org/jira/browse/HIVE-16177 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 0.14.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Fix For: 3.0.0, 2.4.0 > > Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, > HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, > HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, > HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, > HIVE-16177.17.patch, HIVE-16177.18-branch-2.patch, HIVE-16177.18.patch, > HIVE-16177.19-branch-2.patch, HIVE-16177.20-branch-2.patch > > > {noformat} > create table T(a int, b int) clustered by (a) into 2 buckets stored as orc > TBLPROPERTIES('transactional'='false') > insert into T(a,b) values(1,2) > insert into T(a,b) values(1,3) > alter table T SET TBLPROPERTIES ('transactional'='true') > {noformat} > //we should now have bucket files 01_0 and 01_0_copy_1 > but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can > be copy_N files and numbers rows in each bucket from 0 thus generating > duplicate IDs > {noformat} > select ROW__ID, INPUT__FILE__NAME, a, b from T > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2 > {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3 > {noformat} > [~owen.omalley], do you have any thoughts on a good way to 
handle this? > The attached patch has a few changes to make Acid even recognize copy_N but this > is just a prerequisite. The new UT demonstrates the issue. > Furthermore, > {noformat} > alter table T compact 'major' > select ROW__ID, INPUT__FILE__NAME, a, b from T order by b > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0} > file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1 > 1 2 > {noformat} > HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() > demonstrating this. > This is because the compactor doesn't handle copy_N files either (it skips them). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
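One way to picture the duplicate-ID problem described above: if every file in a bucket is numbered from 0, the first row of 01_0 and the first row of 01_0_copy_1 both get rowid 0. The sketch below is only an illustration of a possible numbering scheme, not the approach taken in any HIVE-16177 patch. The class `CopyFileRowIds` and its method are hypothetical names, and it assumes the files of one bucket are visited in a fixed order with known row counts.

```java
/** Hypothetical helper: assigns each file of a bucket a starting row ID. */
final class CopyFileRowIds {
    /**
     * rowCounts[i] is the number of rows in the i-th file of the bucket
     * (for example 01_0, then 01_0_copy_1, then 01_0_copy_2).
     * The result holds the first row ID to use for each file, so that
     * IDs continue across files instead of restarting at 0.
     */
    static long[] startingRowIds(long[] rowCounts) {
        long[] starts = new long[rowCounts.length];
        long next = 0;
        for (int i = 0; i < rowCounts.length; i++) {
            starts[i] = next;      // first ID for file i
            next += rowCounts[i];  // skip past all rows of file i
        }
        return starts;
    }

    public static void main(String[] args) {
        // Two files with one row each, as in the example from the issue.
        long[] starts = startingRowIds(new long[] {1, 1});
        System.out.println(java.util.Arrays.toString(starts));
    }
}
```

With the two single-row files from the issue, the second file would start at rowid 1 rather than 0, so the two rows no longer collide.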
[jira] [Updated] (HIVE-16812) VectorizedOrcAcidRowBatchReader doesn't filter delete events
[ https://issues.apache.org/jira/browse/HIVE-16812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16812: -- Priority: Critical (was: Major) > VectorizedOrcAcidRowBatchReader doesn't filter delete events > > > Key: HIVE-16812 > URL: https://issues.apache.org/jira/browse/HIVE-16812 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 2.3.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > > The constructor of VectorizedOrcAcidRowBatchReader has > {noformat} > // Clone readerOptions for deleteEvents. > Reader.Options deleteEventReaderOptions = readerOptions.clone(); > // Set the range on the deleteEventReaderOptions to 0 to INTEGER_MAX > because > // we always want to read all the delete delta files. > deleteEventReaderOptions.range(0, Long.MAX_VALUE); > {noformat} > This is suboptimal since base and deltas are sorted by ROW__ID. So for each > split of base we can find the min/max ROW__ID and only load events from delta that > are in the [min,max] range. This will reduce the number of delete events we load > in memory (to no more than there are rows in the split). > When we support sorting on PK, the same should apply but we'd need to make > sure to store PKs in the ORC index. > See OrcRawRecordMerger.discoverKeyBounds() -- This message was sent by Atlassian JIRA (v6.4.14#64029)
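The [min,max] idea in the description above can be sketched in a few lines. This is an illustration under simplifying assumptions, not Hive code: the class `DeleteEventFilter` is a hypothetical name, and ROW__ID is reduced to a single `long` key, whereas the real ROW__ID has several fields. It also assumes the delete keys arrive sorted, as the issue states for base and delta files.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: keep only delete events whose key lies in a split's range. */
final class DeleteEventFilter {
    /**
     * sortedDeleteKeys must be in ascending order.
     * Returns the keys k with min <= k <= max, in the same order.
     */
    static List<Long> inRange(List<Long> sortedDeleteKeys, long min, long max) {
        List<Long> kept = new ArrayList<>();
        for (long key : sortedDeleteKeys) {
            if (key > max) {
                break;          // input is sorted, nothing later can match
            }
            if (key >= min) {
                kept.add(key);  // inside the split's [min,max] range
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> deletes = java.util.Arrays.asList(1L, 5L, 9L, 12L, 40L);
        System.out.println(inRange(deletes, 5, 12));
    }
}
```

Because the loop stops at the first key above `max`, the number of events held in memory is bounded by the size of the split's key range rather than by the size of all delete deltas.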
[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084376#comment-16084376 ] Hive QA commented on HIVE-4577: --- Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876871/HIVE-4577.6.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10841 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=237) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_12] (batchId=237) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dfscmd] (batchId=33) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mergejoin] (batchId=156) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5984/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5984/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5984/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests failed {noformat} 
This message is automatically generated. ATTACHMENT ID: 12876871 - PreCommit-HIVE-Build > hive CLI can't handle hadoop dfs command with space and quotes. > > > Key: HIVE-4577 > URL: https://issues.apache.org/jira/browse/HIVE-4577 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0 >Reporter: Bing Li >Assignee: Bing Li > Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, > HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, HIVE-4577.6.patch > > > By design, Hive supports hadoop dfs commands in the Hive shell, like > hive> dfs -mkdir /user/biadmin/mydir; > but it behaves differently from hadoop if the path contains spaces or quotes > hive> dfs -mkdir "hello"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 > /user/biadmin/"hello" > hive> dfs -mkdir 'world'; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 > /user/biadmin/'world' > hive> dfs -mkdir "bei jing"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/"bei > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/jing" -- This message was sent by Atlassian JIRA (v6.4.14#64029)
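The listing in the issue suggests the dfs arguments are split on whitespace with the quote characters left in place, which is how `"bei jing"` ends up as two directories. A quote-aware split could look roughly like the sketch below. It is not taken from any HIVE-4577 patch: `DfsArgSplitter` is a hypothetical name, and the sketch deliberately ignores backslash escapes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split a dfs command line, honoring single and double quotes. */
final class DfsArgSplitter {
    static List<String> split(String command) {
        List<String> args = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0;           // 0 means "not inside a quoted section"
        boolean inToken = false;  // true once the current argument has started

        for (int i = 0; i < command.length(); i++) {
            char c = command.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0;            // closing quote: drop it
                } else {
                    current.append(c);    // whitespace inside quotes is kept
                }
            } else if (c == '"' || c == '\'') {
                quote = c;                // opening quote: drop it
                inToken = true;
            } else if (Character.isWhitespace(c)) {
                if (inToken) {            // whitespace outside quotes ends an argument
                    args.add(current.toString());
                    current.setLength(0);
                    inToken = false;
                }
            } else {
                current.append(c);
                inToken = true;
            }
        }
        if (quote != 0) {
            throw new IllegalArgumentException("unterminated quote in: " + command);
        }
        if (inToken) {
            args.add(current.toString());
        }
        return args;
    }

    public static void main(String[] args) {
        System.out.println(split("-mkdir \"bei jing\""));
    }
}
```

With this split, `dfs -mkdir "bei jing"` yields the two arguments `-mkdir` and `bei jing`, so a single directory would be created, matching what a shell passes to hadoop.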
[jira] [Updated] (HIVE-16732) Transactional tables should block LOAD DATA
[ https://issues.apache.org/jira/browse/HIVE-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16732: -- Attachment: HIVE-16732.03-branch-2.patch > Transactional tables should block LOAD DATA > > > Key: HIVE-16732 > URL: https://issues.apache.org/jira/browse/HIVE-16732 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-16732.01.patch, HIVE-16732.02.patch, > HIVE-16732.03-branch-2.patch, HIVE-16732.03.patch > > > This has always been the design. > see LoadSemanticAnalyzer.analyzeInternal() > StrictChecks.checkBucketing(conf); > Some examples (this is exposed by HIVE-16177) > insert_values_orig_table.q > insert_orig_table.q > insert_values_orig_table_use_metadata.q -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17078) Add more logs to MapredLocalTask
[ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084287#comment-16084287 ] Hive QA commented on HIVE-17078: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876861/HIVE-17078.2.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10840 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) org.apache.hive.jdbc.TestJdbcWithMiniHS2.testConcurrentStatements (batchId=226) org.apache.hive.jdbc.TestSSL.testMetastoreWithSSL (batchId=223) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5983/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5983/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5983/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876861 - PreCommit-HIVE-Build > Add more logs to MapredLocalTask > > > Key: HIVE-17078 > URL: https://issues.apache.org/jira/browse/HIVE-17078 > Project: Hive > Issue Type: Improvement >Reporter: Yibing Shi >Assignee: Yibing Shi >Priority: Minor > Attachments: HIVE-17078.1.patch, HIVE-17078.2.patch > > > By default, {{MapredLocalTask}} is executed in a child process of Hive, in > case the local task uses so many resources that it affects Hive. Currently, > the stdout and stderr output of the child process is printed in Hive's > stdout/stderr log, which has no timestamp information and is > separate from the Hive service logs. This makes it hard to troubleshoot problems > in MapredLocalTasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-16999) Performance bottleneck in the ADD FILE/ARCHIVE commands for an HDFS resource
[ https://issues.apache.org/jira/browse/HIVE-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li reassigned HIVE-16999: -- Assignee: Bing Li > Performance bottleneck in the ADD FILE/ARCHIVE commands for an HDFS resource > > > Key: HIVE-16999 > URL: https://issues.apache.org/jira/browse/HIVE-16999 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Sailee Jain >Assignee: Bing Li >Priority: Critical > > Performance bottleneck is found in adding resource[which is lying on HDFS] to > the distributed cache. > Commands used are :- > {code:java} > 1. ADD ARCHIVE "hdfs://some_dir/archive.tar" > 2. ADD FILE "hdfs://some_dir/file.txt" > {code} > Here is the log corresponding to the archive adding operation:- > {noformat} > converting to local hdfs://some_dir/archive.tar > Added resources: [hdfs://some_dir/archive.tar > {noformat} > Hive is downloading the resource to the local filesystem [shown in log by > "converting to local"]. > {color:#d04437}Ideally there is no need to bring the file to the local > filesystem when this operation is all about copying the file from one > location on HDFS to other location on HDFS[distributed cache].{color} > This adds lot of performance bottleneck when the the resource is a big file > and all commands need the same resource. > After debugging around the impacted piece of code is found to be :- > {code:java} > public List add_resources(ResourceType t, Collection values, > boolean convertToUnix) > throws RuntimeException { > Set resourceSet = resourceMaps.getResourceSet(t); > Map> resourcePathMap = > resourceMaps.getResourcePathMap(t); > Map> reverseResourcePathMap = > resourceMaps.getReverseResourcePathMap(t); > List localized = new ArrayList(); > try { > for (String value : values) { > String key; > {color:#d04437}//get the local path of downloaded jars{color} > List downloadedURLs = resolveAndDownload(t, value, > convertToUnix); > ; > . 
> {code} > {code:java} > List resolveAndDownload(ResourceType t, String value, boolean > convertToUnix) throws URISyntaxException, > IOException { > URI uri = createURI(value); > if (getURLType(value).equals("file")) { > return Arrays.asList(uri); > } else if (getURLType(value).equals("ivy")) { > return dependencyResolver.downloadDependencies(uri); > } else { // goes here for HDFS > return Arrays.asList(createURI(downloadResource(value, > convertToUnix))); // Here when the resource is not local it will download it > to the local machine. > } > } > {code} > Here, the function resolveAndDownload() always calls the downloadResource() > api in case of external filesystem. It should take into consideration the > fact that - when the resource is on same HDFS then bringing it on local > machine is not a needed step and can be skipped for better performance. > Thanks, > Sailee -- This message was sent by Atlassian JIRA (v6.4.14#64029)
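The suggestion above amounts to a check before calling downloadResource(): is the resource already on the cluster's own filesystem? The sketch below shows one way such a check might look using only `java.net.URI`. It is an assumption-laden illustration, not Hive code: `ResourceLocality` and `needsLocalDownload` are hypothetical names, the default filesystem URI is passed in by the caller, and `ivy` URLs are left out because the quoted code hands them to a separate dependency resolver.

```java
import java.net.URI;
import java.util.Objects;

/** Hypothetical sketch: decide whether a resource must be copied to local disk. */
final class ResourceLocality {
    /**
     * Returns false when the resource is a local file or already lives on the
     * same filesystem (same scheme and authority) as defaultFs, true otherwise.
     */
    static boolean needsLocalDownload(URI resource, URI defaultFs) {
        String scheme = resource.getScheme();
        if (scheme == null || scheme.equalsIgnoreCase("file")) {
            return false; // already local, nothing to fetch
        }
        boolean sameScheme = scheme.equalsIgnoreCase(defaultFs.getScheme());
        boolean sameAuthority =
            Objects.equals(resource.getAuthority(), defaultFs.getAuthority());
        // Same filesystem: the file could be referenced in place.
        return !(sameScheme && sameAuthority);
    }

    public static void main(String[] args) {
        // "namenode:8020" is a placeholder authority for this example.
        URI defaultFs = URI.create("hdfs://namenode:8020");
        URI archive = URI.create("hdfs://namenode:8020/some_dir/archive.tar");
        System.out.println(needsLocalDownload(archive, defaultFs));
    }
}
```

Whether skipping the download is safe also depends on how later stages consume the resource path, which this sketch does not address.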
[jira] [Commented] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote
[ https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084245#comment-16084245 ] Bing Li commented on HIVE-16907: [~pxiong] and [~lirui], thank you for your comments. I tried CREATE TABLE statement in MySQL, and found that it treats the `db.tbl` as the table name. And "dot" is allowed in the table name. e.g. {code:java} mysql> create table xxx (col int); mysql> create table test.yyy (col int); mysql> create table `test.zzz` (col int); mysql> create table `test.test.tbl` (col int); mysql> show tables; ++ | Tables_in_test | ++ | test.test.tbl | | test.zzz | | xxx| | yyy| ++ {code} Back to Hive, if we would like to make it having the same behavior as MySQL, we should change the logic of processing it. My previous patch is NOT enough and can't handle `db.db.tbl` neither. > "INSERT INTO" overwrite old data when destination table encapsulated by > backquote > > > Key: HIVE-16907 > URL: https://issues.apache.org/jira/browse/HIVE-16907 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 1.1.0, 2.1.1 >Reporter: Nemon Lou >Assignee: Bing Li > Attachments: HIVE-16907.1.patch > > > A way to reproduce: > {noformat} > create database tdb; > use tdb; > create table t1(id int); > create table t2(id int); > explain insert into `tdb.t1` select * from t2; > {noformat} > {noformat} > +---+ > | > Explain | > +---+ > | STAGE DEPENDENCIES: > | > | Stage-1 is a root stage > | > | Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, > Stage-4 | > | Stage-3 > | > | Stage-0 depends on stages: Stage-3, Stage-2, Stage-5 > | > | Stage-2 > | > | Stage-4 > | > | Stage-5 depends on stages: Stage-4 > | > | > | > | STAGE PLANS: > | > | Stage: Stage-1 > | > | Map Reduce > | > | Map Operator Tree: > | > | TableScan > | > | alias: t2 > | > | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE | > | Select Operator > | > | expressions: id (type: int) > | > | 
outputColumnNames: _col0
[jira] [Updated] (HIVE-13384) Failed to create HiveMetaStoreClient object with proxy user when Kerberos enabled
[ https://issues.apache.org/jira/browse/HIVE-13384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-13384: --- Description: I wrote a Java client to talk with HiveMetaStore. (Hive 1.2.0) But found that it can't new a HiveMetaStoreClient object successfully via a proxy user in Kerberos env. === 15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) == When I debugging on Hive, I found that the error came from open() method in HiveMetaStoreClient class. Around line 406, transport = UserGroupInformation.getCurrentUser().doAs(new PrivilegedExceptionAction() { //FAILED, because the current user doesn't have the cridential But it will work if I change above line to transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new PrivilegedExceptionAction() { //PASS I found DRILL-3413 fixes this error in Drill side as a workaround. But if I submit a mapreduce job via Pig/HCatalog, it runs into the same issue again when initialize the object via HCatalog. It would be better to fix this issue in Hive side. was: I wrote a Java client to talk with HiveMetaStore. (Hive 1.2.0) But found that it can't new a HiveMetaStoreClient object successfully via a proxy using in Kerberos env. 
=== 15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) == When I debugging on Hive, I found that the error came from open() method in HiveMetaStoreClient class. Around line 406, transport = UserGroupInformation.getCurrentUser().doAs(new PrivilegedExceptionAction() { //FAILED, because the current user doesn't have the cridential But it will work if I change above line to transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new PrivilegedExceptionAction() { //PASS I found DRILL-3413 fixes this error in Drill side as a workaround. But if I submit a mapreduce job via Pig/HCatalog, it runs into the same issue again when initialize the object via HCatalog. It would be better to fix this issue in Hive side. > Failed to create HiveMetaStoreClient object with proxy user when Kerberos > enabled > - > > Key: HIVE-13384 > URL: https://issues.apache.org/jira/browse/HIVE-13384 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 1.2.0, 1.2.1 >Reporter: Bing Li >Assignee: Bing Li > > I wrote a Java client to talk with HiveMetaStore. (Hive 1.2.0) > But found that it can't new a HiveMetaStoreClient object successfully via a > proxy user in Kerberos env. 
> === > 15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) > at > org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) > at > org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) > == > When I debugging on Hive, I found that the error came from open() method in > HiveMetaStoreClient class. > Around line 406, > transport = UserGroupInformation.getCurrentUser().doAs(new > PrivilegedExceptionAction() { //FAILED, because the current user > doesn't have the cridential > But it will work if I change above line to > transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new > PrivilegedExceptionAction() { //PASS > I found DRILL-3413 fixes this error in Drill side as a workaround. But if I > submit a mapreduce job via Pig/HCatalog, it runs into the same issue again > when initialize the object
[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084226#comment-16084226 ] Adam Szita commented on HIVE-8838: -- Test results above are irrelevant again - I think this is ready for commit > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: Bug >Reporter: Brock Noland >Assignee: Adam Szita > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch, HIVE-8838.4.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084197#comment-16084197 ] Hive QA commented on HIVE-8838: --- Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876860/HIVE-8838.4.patch {color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10873 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5982/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5982/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5982/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876860 - PreCommit-HIVE-Build > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: Bug >Reporter: Brock Noland >Assignee: Adam Szita > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch, HIVE-8838.4.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer
[ https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-17073: --- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Fixed TestVectorSelectOperator and pushed to master, thanks for reviewing [~mmccline]! > Incorrect result with vectorization and SharedWorkOptimizer > --- > > Key: HIVE-17073 > URL: https://issues.apache.org/jira/browse/HIVE-17073 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Fix For: 3.0.0 > > Attachments: HIVE-17073.01.patch, HIVE-17073.02.patch, > HIVE-17073.03.patch, HIVE-17073.patch > > > We get incorrect result with vectorization and multi-output Select operator > created by SharedWorkOptimizer. It can be reproduced in the following way. > {code:title=Correct} > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278"; > OK > 2 > {code} > {code:title=Correct} > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255"; > OK > 2 > {code} > {code:title=Incorrect} > select * from ( > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278") s1 > join ( > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255") s2; > OK > 2 0 > {code} > Problem seems to be that some ds in the batch row need to be re-initialized > after they have been forwarded to each output. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17072) Make the parallelized timeout configurable in BeeLine tests
[ https://issues.apache.org/jira/browse/HIVE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marta Kuczora updated HIVE-17072: - Status: Patch Available (was: Open) > Make the parallelized timeout configurable in BeeLine tests > --- > > Key: HIVE-17072 > URL: https://issues.apache.org/jira/browse/HIVE-17072 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Minor > Attachments: HIVE-17072.1.patch > > > When running the BeeLine tests parallel, the timeout is hardcoded in the > Parallelized.java: > {noformat} > @Override > public void finished() { > executor.shutdown(); > try { > executor.awaitTermination(10, TimeUnit.MINUTES); > } catch (InterruptedException exc) { > throw new RuntimeException(exc); > } > } > {noformat} > It would be better to make it configurable. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
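A minimal sketch of what "configurable" could mean for the snippet above, assuming the value is read from a JVM system property. The property name `test.parallel.timeout.minutes` and the class `ConfigurableTimeout` are invented for this example and are not taken from HIVE-17072.1.patch. The default of 10 minutes mirrors the hardcoded value quoted in the issue.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: read the parallel-test timeout from a system property. */
final class ConfigurableTimeout {
    // Assumed property name, chosen for this sketch only.
    static final String TIMEOUT_PROPERTY = "test.parallel.timeout.minutes";
    static final long DEFAULT_TIMEOUT_MINUTES = 10;

    /** Returns the configured timeout, or the default if unset or invalid. */
    static long timeoutMinutes() {
        String raw = System.getProperty(TIMEOUT_PROPERTY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_TIMEOUT_MINUTES;
        }
        try {
            long value = Long.parseLong(raw.trim());
            return value > 0 ? value : DEFAULT_TIMEOUT_MINUTES;
        } catch (NumberFormatException e) {
            return DEFAULT_TIMEOUT_MINUTES;
        }
    }

    /** Same shape as the quoted finished() method, with the timeout looked up. */
    static void finished(ExecutorService executor) {
        executor.shutdown();
        try {
            executor.awaitTermination(timeoutMinutes(), TimeUnit.MINUTES);
        } catch (InterruptedException exc) {
            Thread.currentThread().interrupt(); // added here: preserve interrupt status
            throw new RuntimeException(exc);
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        executor.submit(() -> System.out.println("test batch done"));
        finished(executor);
        System.out.println("timeout minutes: " + timeoutMinutes());
    }
}
```

A run could then override the value with `-Dtest.parallel.timeout.minutes=40` on the JVM command line, while existing runs keep the 10 minute behavior.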
[jira] [Commented] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
[ https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084183#comment-16084183 ] Bing Li commented on HIVE-16922: Thank you, [~lirui]. It seems that the result page has expired. I just re-submitted the patch to check. > Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim" > --- > > Key: HIVE-16922 > URL: https://issues.apache.org/jira/browse/HIVE-16922 > Project: Hive > Issue Type: Bug > Components: Thrift API >Reporter: Dudu Markovitz >Assignee: Bing Li > Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch > > > https://github.com/apache/hive/blob/master/serde/if/serde.thrift > Typo in serde.thrift: > COLLECTION_DELIM = "colelction.delim" > (*colelction* instead of *collection*) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
[ https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-16922: --- Attachment: HIVE-16922.2.patch > Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim" > --- > > Key: HIVE-16922 > URL: https://issues.apache.org/jira/browse/HIVE-16922 > Project: Hive > Issue Type: Bug > Components: Thrift API >Reporter: Dudu Markovitz >Assignee: Bing Li > Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch > > > https://github.com/apache/hive/blob/master/serde/if/serde.thrift > Typo in serde.thrift: > COLLECTION_DELIM = "colelction.delim" > (*colelction* instead of *collection*) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
[ https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-16922: --- Attachment: (was: HIVE-16922.2.patch) > Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim" > --- > > Key: HIVE-16922 > URL: https://issues.apache.org/jira/browse/HIVE-16922 > Project: Hive > Issue Type: Bug > Components: Thrift API >Reporter: Dudu Markovitz >Assignee: Bing Li > Attachments: HIVE-16922.1.patch > > > https://github.com/apache/hive/blob/master/serde/if/serde.thrift > Typo in serde.thrift: > COLLECTION_DELIM = "colelction.delim" > (*colelction* instead of *collection*) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084174#comment-16084174 ] Bing Li commented on HIVE-4577: --- Thank you, [~vgumashta]. I could reproduce TestPerfCliDriver[query14] in my env, and updated its golden file. The failures of TestMiniLlapLocalCliDriver[vector_if_expr] and TestBeeLineDriver[materialized_view_create_rewrite] should not be caused by this patch. > hive CLI can't handle hadoop dfs command with space and quotes. > > > Key: HIVE-4577 > URL: https://issues.apache.org/jira/browse/HIVE-4577 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0 >Reporter: Bing Li >Assignee: Bing Li > Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, > HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, HIVE-4577.6.patch > > > By design, Hive supports hadoop dfs commands in the Hive shell, like > hive> dfs -mkdir /user/biadmin/mydir; > but it behaves differently from hadoop if the path contains spaces or quotes > hive> dfs -mkdir "hello"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 > /user/biadmin/"hello" > hive> dfs -mkdir 'world'; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 > /user/biadmin/'world' > hive> dfs -mkdir "bei jing"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/"bei > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/jing" -- This message was sent by Atlassian JIRA (v6.4.14#64029)