[jira] [Commented] (HIVE-18611) Avoid memory allocation of aggregation buffer during stats computation
[ https://issues.apache.org/jira/browse/HIVE-18611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349915#comment-16349915 ]

Gopal V commented on HIVE-18611:
--------------------------------
LGTM - +1, tests pending.

> Avoid memory allocation of aggregation buffer during stats computation
> ----------------------------------------------------------------------
>
>                 Key: HIVE-18611
>                 URL: https://issues.apache.org/jira/browse/HIVE-18611
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer, Statistics
>    Affects Versions: 3.0.0, 2.3.2
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>            Priority: Major
>              Labels: performance
>         Attachments: HIVE-18611.patch
>
> The bloom filter aggregation buffer may result in the allocation of an array
> of up to ~594 MB, which is unnecessary during stats computation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
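The description above points at a large, eagerly allocated bloom-filter buffer that is never fed rows during stats-only computation. A minimal sketch of the lazy-allocation idea, with illustrative names (this is not Hive's actual aggregation-buffer code):

```java
// Hypothetical sketch: defer allocating the large bloom-filter backing
// array until the first value is actually aggregated. During stats-only
// planning the buffer object is created but never fed rows, so the big
// array is never allocated at all.
public class LazyBloomFilterBuffer {
    private final int numBits;   // requested filter size in bits
    private long[] bits;         // backing array, allocated on first use

    public LazyBloomFilterBuffer(int numBits) {
        this.numBits = numBits;  // note: no allocation here
    }

    public void add(long hash) {
        if (bits == null) {
            bits = new long[(numBits + 63) / 64];  // allocate lazily
        }
        int bit = (int) (Math.floorMod(hash, numBits));
        bits[bit >>> 6] |= 1L << (bit & 63);
    }

    public boolean isAllocated() {
        return bits != null;
    }
}
```

The design point is that the cost moves from buffer construction (which stats computation always performs) to the first `add()` (which stats computation never performs).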
[jira] [Commented] (HIVE-18546) Remove unnecessary code introduced in HIVE-14498
[ https://issues.apache.org/jira/browse/HIVE-18546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349914#comment-16349914 ]

Ashutosh Chauhan commented on HIVE-18546:
-----------------------------------------
+1

> Remove unnecessary code introduced in HIVE-14498
> ------------------------------------------------
>
>                 Key: HIVE-18546
>                 URL: https://issues.apache.org/jira/browse/HIVE-18546
>             Project: Hive
>          Issue Type: Bug
>          Components: Materialized views
>    Affects Versions: 3.0.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>         Attachments: HIVE-18546.02.patch, HIVE-18546.03.patch, HIVE-18546.04.patch
>
> HIVE-14498 introduced some code to check the invalidation of materialized
> views that can be simplified, relying instead on existing transaction ids.
[jira] [Updated] (HIVE-18611) Avoid memory allocation of aggregation buffer during stats computation
[ https://issues.apache.org/jira/browse/HIVE-18611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-18611:
---------------------------
    Affects Version/s: 3.0.0
                       2.3.2
[jira] [Updated] (HIVE-18611) Avoid memory allocation of aggregation buffer during stats computation
[ https://issues.apache.org/jira/browse/HIVE-18611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-18611:
---------------------------
    Labels: performance  (was: )
[jira] [Updated] (HIVE-18611) Avoid memory allocation of aggregation buffer during stats computation
[ https://issues.apache.org/jira/browse/HIVE-18611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-18611:
------------------------------------
    Attachment: HIVE-18611.patch
[jira] [Updated] (HIVE-18611) Avoid memory allocation of aggregation buffer during stats computation
[ https://issues.apache.org/jira/browse/HIVE-18611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-18611:
------------------------------------
    Status: Patch Available  (was: Open)
[jira] [Assigned] (HIVE-18611) Avoid memory allocation of aggregation buffer during stats computation
[ https://issues.apache.org/jira/browse/HIVE-18611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan reassigned HIVE-18611:
---------------------------------------
[jira] [Commented] (HIVE-18599) CREATE TEMPORARY TABLE AS SELECT(CTTAS) on Micromanaged table does not write data
[ https://issues.apache.org/jira/browse/HIVE-18599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349888#comment-16349888 ]

Hive QA commented on HIVE-18599:
--------------------------------
-1 overall

|| Vote || Subsystem || Runtime || Comment ||
|| Prechecks ||
|  0 | findbugs   | 0m 0s  | Findbugs executables are not available. |
| +1 | @author    | 0m 0s  | The patch does not contain any @author tags. |
|| master Compile Tests ||
| +1 | mvninstall | 6m 44s | master passed |
| +1 | compile    | 0m 58s | master passed |
| +1 | checkstyle | 0m 41s | master passed |
| +1 | javadoc    | 0m 50s | master passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 1m 16s | the patch passed |
| +1 | compile    | 1m 1s  | the patch passed |
| +1 | javac      | 1m 1s  | the patch passed |
| -1 | checkstyle | 0m 43s | ql: The patch generated 7 new + 632 unchanged - 1 fixed = 639 total (was 633) |
| +1 | whitespace | 0m 0s  | The patch has no whitespace issues. |
| +1 | javadoc    | 0m 53s | the patch passed |
|| Other Tests ||
| +1 | asflicense | 0m 12s | The patch does not generate ASF License warnings. |
|    |            | 13m 33s | |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 32b8994 |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8983/yetus/diff-checkstyle-ql.txt |
| modules | C: ql  U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8983/yetus.txt |
| Powered by | Apache Yetus  http://yetus.apache.org |

This message was automatically generated.

> CREATE TEMPORARY TABLE AS SELECT(CTTAS) on Micromanaged table does not write data
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-18599
>                 URL: https://issues.apache.org/jira/browse/HIVE-18599
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.0.0
>            Reporter: Steve Yeom
>            Assignee: Steve Yeom
>            Priority: Major
>         Attachments: HIVE-18599.01.patch, HIVE-18599.02.patch, HIVE-18599.03.patch
>
> CTTAS on a temporary micromanaged table does not write data.
> I.e., "SELECT * FROM ctas0_mm;" does not return any rows from the script below:
>
> set hive.mapred.mode=nonstrict;
> set hive.explain.user=false;
> set hive.fetch.task.conversion=none;
> set tez.grouping.min-size=1;
> set tez.grouping.max-size=2;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
>
> drop table intermediate;
> create table intermediate(key int) partitioned by (p int) stored as orc;
> insert into table intermediate partition(p='455') select distinct key from src where key >= 0 order by key desc limit 2;
> insert into table intermediate partition(p='456') select distinct key from src where key is not null order by key asc limit 2;
> insert into table intermediate partition(p='457') select distinct key from src where key >= 100 order by key asc limit 2;
>
> drop table ctas0_mm;
> explain create temporary table ctas0_mm tblproperties ("transactional"="true", "transactional_properties"="insert_only") as select * from intermediate;
> cre
[jira] [Updated] (HIVE-18610) Performance: ListKeyWrapper does not check for hashcode equals, before comparing members
[ https://issues.apache.org/jira/browse/HIVE-18610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-18610:
---------------------------
    Attachment: HIVE-18610.1.patch

> Performance: ListKeyWrapper does not check for hashcode equals, before comparing members
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-18610
>                 URL: https://issues.apache.org/jira/browse/HIVE-18610
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Major
>         Attachments: HIVE-18610.1.patch
>
> ListKeyWrapper::equals()
> {code}
> @Override
> public boolean equals(Object obj) {
>   if (!(obj instanceof ListKeyWrapper)) {
>     return false;
>   }
>   Object[] copied_in_hashmap = ((ListKeyWrapper) obj).keys;
>   return equalComparer.areEqual(copied_in_hashmap, keys);
> }
> {code}
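The issue title suggests comparing cached hash codes before the member-wise equality check, so most unequal keys are rejected with a single int comparison. A simplified sketch of that pattern (names and structure are illustrative, not Hive's actual ListKeyWrapper):

```java
import java.util.Arrays;

// Simplified sketch: cache the hash code at construction and compare it
// first in equals(), so the expensive element-wise comparison only runs
// for keys whose hash codes already match.
public class KeyWrapper {
    private final Object[] keys;
    private final int hashcode;   // computed once, reused in equals()

    public KeyWrapper(Object[] keys) {
        this.keys = keys;
        this.hashcode = Arrays.deepHashCode(keys);
    }

    @Override
    public int hashCode() {
        return hashcode;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof KeyWrapper)) {
            return false;
        }
        KeyWrapper other = (KeyWrapper) obj;
        if (hashcode != other.hashcode) {
            return false;   // cheap rejection before comparing members
        }
        return Arrays.deepEquals(keys, other.keys);
    }
}
```

This preserves the equals/hashCode contract: equal contents always produce equal cached hash codes, so the shortcut can only reject genuinely unequal keys.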
[jira] [Updated] (HIVE-18610) Performance: ListKeyWrapper does not check for hashcode equals, before comparing members
[ https://issues.apache.org/jira/browse/HIVE-18610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-18610:
---------------------------
    Status: Patch Available  (was: Open)
[jira] [Assigned] (HIVE-18610) Performance: ListKeyWrapper does not check for hashcode equals, before comparing members
[ https://issues.apache.org/jira/browse/HIVE-18610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V reassigned HIVE-18610:
------------------------------
    Assignee: Gopal V
[jira] [Commented] (HIVE-18581) Replication events should use lower case db object names
[ https://issues.apache.org/jira/browse/HIVE-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349878#comment-16349878 ]

ASF GitHub Bot commented on HIVE-18581:
---------------------------------------
Github user anishek closed the pull request at:

    https://github.com/apache/hive/pull/304

> Replication events should use lower case db object names
> --------------------------------------------------------
>
>                 Key: HIVE-18581
>                 URL: https://issues.apache.org/jira/browse/HIVE-18581
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>            Reporter: anishek
>            Assignee: anishek
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: HIVE-18581.0.patch, HIVE-18581.1.patch
>
> Events generated by replication should include the database / table / partition / function names in lower case. This will save other applications from having to explicitly do case-insensitive matching of objects by name. In Hive, all db object names as specified above are explicitly converted to lower case when comparing objects of the same type.
[jira] [Commented] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
[ https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349875#comment-16349875 ]

Hive QA commented on HIVE-18606:
--------------------------------
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12908896/HIVE-18606.02.patch

SUCCESS: +1 due to 1 test(s) being added or modified.

ERROR: -1 due to 26 failed/errored test(s), 12967 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl] (batchId=175)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1] (batchId=172)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=171)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_input_format_excludes] (batchId=163)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] (batchId=122)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=221)
org.apache.hadoop.hive.metastore.TestMarkPartition.testMarkingPartitionSet (batchId=215)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableNullStorageDescriptorInNew[Embedded] (batchId=206)
org.apache.hadoop.hive.metastore.client.TestTablesList.testListTableNamesByFilterNullDatabase[Embedded] (batchId=206)
org.apache.hadoop.hive.ql.TestTxnNoBuckets.testCtasEmpty (batchId=280)
org.apache.hadoop.hive.ql.TestTxnNoBucketsVectorized.testCtasEmpty (batchId=280)
org.apache.hadoop.hive.ql.exec.TestOperators.testNoConditionalTaskSizeForLlap (batchId=282)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=256)
org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB (batchId=188)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=234)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=234)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8982/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8982/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8982/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 26 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12908896 - PreCommit-HIVE-Build

> CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-18606
>                 URL: https://issues.apache.org/jira/browse/HIVE-18606
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Major
>         Attachments: HIVE-18606.01.patch, HIVE-18606.02.patch
>
> {noformat}
> @Test
> public void testCtasEmpty() throws Exception {
>   MetastoreConf.setBoolVar(hiveConf, MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
>   runStatementOnDriver("create table myctas stored as ORC as" +
>       " select a, b from " + Table.NONACIDORCTBL);
>   List rs = runStatementOnDriver("select ROW__ID, a, b, INPUT__FILE__NAME" +
>       " from myctas order by ROW__ID");
> }
> {noformat}
> {noformat}
> 2018-02-01T19:08:52,813 INFO [HiveServer2-Background-Pool: Thread-463]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,813 INFO [HiveServer2-Background-Pool: Thread-463]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive ip=unknown-ip-addr cmd=Done cleaning up
[jira] [Commented] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349863#comment-16349863 ]

Ke Jia commented on HIVE-17139:
-------------------------------
[~mmccline], [~Ferd], [~colinma], I have updated the patch to fix the HIVE-18524 issue and uploaded it to RB. Please help review it. Thanks for your help.

> Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-17139
>                 URL: https://issues.apache.org/jira/browse/HIVE-17139
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ke Jia
>            Assignee: Ke Jia
>            Priority: Major
>             Fix For: 3.0.0
>
>         Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, HIVE-17139.3.patch, HIVE-17139.4.patch, HIVE-17139.5.patch, HIVE-17139.6.patch, HIVE-17139.7.patch, HIVE-17139.8.patch, HIVE-17139.9.patch, HIVE-17139.10.patch, HIVE-17139.11.patch, HIVE-17139.12.patch, HIVE-17139.13.patch, HIVE-17139.13.patch, HIVE-17139.14.patch, HIVE-17139.15.patch, HIVE-17139.16.patch, HIVE-17139.17.patch, HIVE-17139.18.patch, HIVE-17139.18.patch, HIVE-17139.19.patch, HIVE-17139.20.patch
>
> The CASE WHEN and IF statement execution in Hive vectorization is not optimal: in the current implementation, all of the conditional and else expressions are evaluated for every row. The optimized approach is to update the selected array of the batch parameter after the conditional expression is executed, so that the else expression is evaluated only for the selected rows instead of all of them.
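The selected-array idea in the description can be sketched as partitioning the currently selected rows by the condition, so each branch expression only visits its own rows. This is an illustrative sketch of the technique, not Hive's actual vectorization code:

```java
// Illustrative sketch: given a batch's "selected" row indices and a
// boolean condition column, split the selection so the THEN expression
// evaluates only rows where the condition holds and the ELSE expression
// evaluates only the remainder, instead of both running over all rows.
public class ConditionalSketch {
    // Partitions selected[0..size) by cond; fills thenSel/elseSel and
    // returns how many rows satisfied the condition (prefix of thenSel).
    public static int partition(boolean[] cond, int[] selected, int size,
                                int[] thenSel, int[] elseSel) {
        int t = 0, e = 0;
        for (int i = 0; i < size; i++) {
            int row = selected[i];
            if (cond[row]) {
                thenSel[t++] = row;   // THEN branch visits these rows only
            } else {
                elseSel[e++] = row;   // ELSE branch visits the remainder
            }
        }
        return t;
    }
}
```

After the split, each branch expression is invoked with its own selected array and count, which is what makes the "skip the expression evaluation if the condition is not satisfied" optimization possible.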
[jira] [Updated] (HIVE-18442) HoS: No FileSystem for scheme: nullscan
[ https://issues.apache.org/jira/browse/HIVE-18442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-18442:
--------------------------
    Attachment: HIVE-18442.2.patch

> HoS: No FileSystem for scheme: nullscan
> ---------------------------------------
>
>                 Key: HIVE-18442
>                 URL: https://issues.apache.org/jira/browse/HIVE-18442
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>            Priority: Major
>         Attachments: HIVE-18442.1.patch, HIVE-18442.2.patch
>
> Hit the issue when running the following query in yarn-cluster mode:
> {code}
> select * from (select key from src where false) a left outer join (select key from srcpart limit 0) b on a.key=b.key;
> {code}
> Stack trace:
> {noformat}
> Job failed with java.io.IOException: No FileSystem for scheme: nullscan
>   at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2799)
>   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
>   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
>   at org.apache.hadoop.hive.ql.exec.Utilities.isEmptyPath(Utilities.java:2605)
>   at org.apache.hadoop.hive.ql.exec.Utilities.isEmptyPath(Utilities.java:2601)
>   at org.apache.hadoop.hive.ql.exec.Utilities$GetInputPathsCallable.call(Utilities.java:3409)
>   at org.apache.hadoop.hive.ql.exec.Utilities.getInputPaths(Utilities.java:3347)
>   at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.cloneJobConf(SparkPlanGenerator.java:299)
>   at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:222)
>   at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:109)
>   at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:354)
>   at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:358)
>   at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
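Hadoop resolves a path scheme to a FileSystem implementation via the `fs.<scheme>.impl` configuration key, and the error above indicates that mapping is missing on the remote side in yarn-cluster mode. A sketch of that lookup, with a plain `Map` standing in for Hadoop's `Configuration` (the implementation class name is illustrative, not necessarily the actual one):

```java
import java.util.Map;

// Sketch of Hadoop's scheme-to-class resolution: FileSystem consults
// "fs.<scheme>.impl" in the job configuration. If the key is absent on
// the node doing the lookup, the result is the familiar
// "No FileSystem for scheme: ..." failure. A plain Map stands in for
// org.apache.hadoop.conf.Configuration here.
public class NullScanConfSketch {
    public static String resolveScheme(Map<String, String> conf, String scheme) {
        String impl = conf.get("fs." + scheme + ".impl");
        if (impl == null) {
            throw new IllegalStateException("No FileSystem for scheme: " + scheme);
        }
        return impl;
    }
}
```

The implication for a fix is that the nullscan mapping has to be present in the configuration actually shipped to the cluster, not only in the client-side HiveConf.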
[jira] [Updated] (HIVE-18581) Replication events should use lower case db object names
[ https://issues.apache.org/jira/browse/HIVE-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

anishek updated HIVE-18581:
---------------------------
    Status: Open  (was: Patch Available)
[jira] [Resolved] (HIVE-18581) Replication events should use lower case db object names
[ https://issues.apache.org/jira/browse/HIVE-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

anishek resolved HIVE-18581.
----------------------------
    Resolution: Won't Fix
[jira] [Commented] (HIVE-18581) Replication events should use lower case db object names
[ https://issues.apache.org/jira/browse/HIVE-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349844#comment-16349844 ]

anishek commented on HIVE-18581:
--------------------------------
The replication subsystem is case-insensitive and hence does not require these changes. This was primarily for external systems consuming repl events, but there is no requirement / dependency there as of now, so pausing this effort.
[jira] [Commented] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
[ https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349825#comment-16349825 ]

Hive QA commented on HIVE-18606:
--------------------------------
-1 overall

|| Vote || Subsystem || Runtime || Comment ||
|| Prechecks ||
|  0 | findbugs   | 0m 0s  | Findbugs executables are not available. |
| +1 | @author    | 0m 0s  | The patch does not contain any @author tags. |
|| master Compile Tests ||
| +1 | mvninstall | 6m 51s | master passed |
| +1 | compile    | 0m 57s | master passed |
| +1 | checkstyle | 0m 40s | master passed |
| +1 | javadoc    | 0m 51s | master passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 1m 14s | the patch passed |
| +1 | compile    | 0m 59s | the patch passed |
| +1 | javac      | 0m 59s | the patch passed |
| -1 | checkstyle | 0m 41s | ql: The patch generated 1 new + 322 unchanged - 0 fixed = 323 total (was 322) |
| +1 | whitespace | 0m 0s  | The patch has no whitespace issues. |
| +1 | javadoc    | 0m 51s | the patch passed |
|| Other Tests ||
| +1 | asflicense | 0m 12s | The patch does not generate ASF License warnings. |
|    |            | 13m 30s | |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 32b8994 |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8982/yetus/diff-checkstyle-ql.txt |
| modules | C: ql  U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8982/yetus.txt |
| Powered by | Apache Yetus  http://yetus.apache.org |

This message was automatically generated.

> CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-18606
>                 URL: https://issues.apache.org/jira/browse/HIVE-18606
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Major
>         Attachments: HIVE-18606.01.patch, HIVE-18606.02.patch
>
> {noformat}
> @Test
> public void testCtasEmpty() throws Exception {
>   MetastoreConf.setBoolVar(hiveConf, MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
>   runStatementOnDriver("create table myctas stored as ORC as" +
>       " select a, b from " + Table.NONACIDORCTBL);
>   List rs = runStatementOnDriver("select ROW__ID, a, b, INPUT__FILE__NAME" +
>       " from myctas order by ROW__ID");
> }
> {noformat}
> {noformat}
> 2018-02-01T19:08:52,813 INFO [HiveServer2-Background-Pool: Thread-463]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,813 INFO [HiveServer2-Background-Pool: Thread-463]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive ip=unknown-ip-addr cmd=Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: exec.Task (SessionState.java:printError(1228)) - Failed with exception null
> java.lang.NullPointerException
>   at org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816)
>   at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298)
>   at org.apache.h
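One common shape of this failure class is iterating a directory listing that comes back null because the CTAS produced no output files. A hedged sketch of the defensive null guard, purely illustrative of the pattern (not necessarily Hive's actual fix in moveAcidFiles):

```java
import java.io.File;

// Illustrative null guard: File.listFiles() returns null when the
// directory does not exist or is not a directory, and iterating that
// result blindly throws a NullPointerException. Normalizing to an empty
// array lets a move step skip the work when there is nothing to move.
public class SafeListSketch {
    public static File[] listOrEmpty(File dir) {
        File[] files = dir.listFiles();   // may be null
        return (files == null) ? new File[0] : files;
    }
}
```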
[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive
[ https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349813#comment-16349813 ] Hive QA commented on HIVE-18410: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12908852/HIVE-18410_2.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 21 failed/errored test(s), 12967 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=240) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_gby_empty] (batchId=82) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] (batchId=13) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36) org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl] (batchId=175) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=152) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1] (batchId=172) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=167) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=171) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=161) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan] (batchId=164) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=161) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_input_format_excludes] (batchId=163) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] (batchId=122) org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=221) 
org.apache.hadoop.hive.ql.exec.TestOperators.testNoConditionalTaskSizeForLlap (batchId=282) org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=256) org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB (batchId=188) org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=234) org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=234) org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=234) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8981/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8981/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8981/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 21 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12908852 - PreCommit-HIVE-Build > [Performance][Avro] Reading flat Avro tables is very expensive in Hive > -- > > Key: HIVE-18410 > URL: https://issues.apache.org/jira/browse/HIVE-18410 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.0.0, 2.3.2 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti >Priority: Major > Fix For: 3.0.0, 2.3.2 > > Attachments: HIVE-18410.patch, HIVE-18410_1.patch, > HIVE-18410_2.patch, profiling_with_patch.nps, profiling_with_patch.png, > profiling_without_patch.nps, profiling_without_patch.png > > > There's a performance penalty when reading flat [no nested fields] Avro > tables. When reading the same flat dataset in Pig, it takes half the time. > On profiling, a lot of time is spent in > {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. 
The bulk of the > time is spent in GenericData.get().resolveUnion(), which calls > GenericData.getSchemaName(Object datum), which does a lot of instanceof > checks. This could be simplified, with performance benefits. An approach is > described in this patch which almost halves the runtime. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
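The shortcut described above can be sketched as follows. This is a hedged illustration, not the actual HIVE-18410 patch: the class name `UnionBranchCache` and its shape are invented for this example, and plain `Class` objects stand in for Avro schemas. The idea is that a nullable union needs no type inspection for the null branch, and the instanceof scan for non-null data can be memoized per runtime class instead of repeated for every record.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the optimization idea (not the HIVE-18410 patch):
// resolve a union branch with a null check plus a cached class -> branch-index
// map, rather than re-running a long instanceof chain per record.
public class UnionBranchCache {
    private final Map<Class<?>, Integer> branchCache = new ConcurrentHashMap<>();
    private final List<Class<?>> branchTypes; // stand-in for the union's schemas

    public UnionBranchCache(List<Class<?>> branchTypes) {
        this.branchTypes = branchTypes;
    }

    /** Returns the index of the union branch the datum belongs to. */
    public int resolve(Object datum) {
        if (datum == null) {
            return 0; // nullable union: the null branch, no type inspection needed
        }
        return branchCache.computeIfAbsent(datum.getClass(), c -> {
            for (int i = 0; i < branchTypes.size(); i++) {
                Class<?> branch = branchTypes.get(i);
                if (branch != null && branch.isInstance(datum)) {
                    return i; // the instanceof scan runs once per runtime class
                }
            }
            throw new IllegalArgumentException("datum matches no union branch: " + c);
        });
    }

    public static void main(String[] args) {
        // Models a ["null", "long"] union.
        UnionBranchCache u = new UnionBranchCache(Arrays.<Class<?>>asList(null, Long.class));
        System.out.println(u.resolve(null)); // 0
        System.out.println(u.resolve(42L));  // 1 (scan, then cached)
        System.out.println(u.resolve(7L));   // 1 (cache hit)
    }
}
```

Subsequent records of the same runtime class hit the cache, which is where the bulk of the per-record instanceof cost goes away.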
[jira] [Updated] (HIVE-18573) Use proper Calcite operator instead of UDFs
[ https://issues.apache.org/jira/browse/HIVE-18573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-18573: -- Attachment: HIVE-18573.6.patch > Use proper Calcite operator instead of UDFs > --- > > Key: HIVE-18573 > URL: https://issues.apache.org/jira/browse/HIVE-18573 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: slim bouguerra >Priority: Major > Attachments: HIVE-18573.2.patch, HIVE-18573.3.patch, > HIVE-18573.4.patch, HIVE-18573.5.patch, HIVE-18573.6.patch, HIVE-18573.patch > > > Currently, Hive mostly uses user-defined black-box SQL operators during > query planning. It would be more beneficial to use proper Calcite operators. > Also, use a single name for the Extract operator instead of a different name for > every unit, and > the same for the Floor function. This will allow unifying the treatment per operator. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive
[ https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349791#comment-16349791 ] Hive QA commented on HIVE-18410: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 52s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 37s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 49s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 35s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 48s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 37m 8s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh | | git revision | master / 32b8994 | | Default Java | 1.8.0_111 | | modules | C: . U: . | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8981/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > [Performance][Avro] Reading flat Avro tables is very expensive in Hive > -- > > Key: HIVE-18410 > URL: https://issues.apache.org/jira/browse/HIVE-18410 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.0.0, 2.3.2 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti >Priority: Major > Fix For: 3.0.0, 2.3.2 > > Attachments: HIVE-18410.patch, HIVE-18410_1.patch, > HIVE-18410_2.patch, profiling_with_patch.nps, profiling_with_patch.png, > profiling_without_patch.nps, profiling_without_patch.png > > > There's a performance penalty when reading flat [no nested fields] Avro > tables. When reading the same flat dataset in Pig, it takes half the time. > On profiling, a lot of time is spent in > {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the > time is spent in GenericData.get().resolveUnion(), which calls > GenericData.getSchemaName(Object datum), which does a lot of instanceof > checks. 
This could be simplified, with performance benefits. An approach is > described in this patch which almost halves the runtime. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18573) Use proper Calcite operator instead of UDFs
[ https://issues.apache.org/jira/browse/HIVE-18573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-18573: -- Attachment: HIVE-18573.5.patch > Use proper Calcite operator instead of UDFs > --- > > Key: HIVE-18573 > URL: https://issues.apache.org/jira/browse/HIVE-18573 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: slim bouguerra >Priority: Major > Attachments: HIVE-18573.2.patch, HIVE-18573.3.patch, > HIVE-18573.4.patch, HIVE-18573.5.patch, HIVE-18573.patch > > > Currently, Hive mostly uses user-defined black-box SQL operators during > query planning. It would be more beneficial to use proper Calcite operators. > Also, use a single name for the Extract operator instead of a different name for > every unit, and > the same for the Floor function. This will allow unifying the treatment per operator. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia reopened HIVE-17139: --- > Conditional expressions optimization: skip the expression evaluation if the > condition is not satisfied for vectorization engine. > > > Key: HIVE-17139 > URL: https://issues.apache.org/jira/browse/HIVE-17139 > Project: Hive > Issue Type: Improvement >Reporter: Ke Jia >Assignee: Ke Jia >Priority: Major > Fix For: 3.0.0 > > Attachments: HIVE-17139.1.patch, HIVE-17139.10.patch, > HIVE-17139.11.patch, HIVE-17139.12.patch, HIVE-17139.13.patch, > HIVE-17139.13.patch, HIVE-17139.14.patch, HIVE-17139.15.patch, > HIVE-17139.16.patch, HIVE-17139.17.patch, HIVE-17139.18.patch, > HIVE-17139.18.patch, HIVE-17139.19.patch, HIVE-17139.2.patch, > HIVE-17139.20.patch, HIVE-17139.3.patch, HIVE-17139.4.patch, > HIVE-17139.5.patch, HIVE-17139.6.patch, HIVE-17139.7.patch, > HIVE-17139.8.patch, HIVE-17139.9.patch > > > CASE WHEN and IF statement execution under Hive vectorization is not > optimal: in the current implementation, all of the conditional and else expressions are > evaluated for every row. The optimized approach is to update the selected > array of the batch parameter after the conditional expression is executed. Then > the else expression will only process the selected rows instead of all of them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
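The selected-array technique the description outlines can be sketched like this. It is a simplified illustration rather than Hive's actual `VectorExpression` code: the class `SelectiveIfDemo` and its method signature are invented, and precomputed value arrays stand in for the branch expressions. The point is that after the condition is evaluated, each branch only touches its own subset of rows.

```java
// Hedged sketch (not the HIVE-17139 patch): split the batch's selected rows
// by the evaluated condition, so the THEN work covers only matching rows and
// the ELSE work only the remainder, instead of both scanning every row.
public class SelectiveIfDemo {

    /** out[i] = cond[i] ? thenVal[i] : elseVal[i], evaluated selectively. */
    public static void evaluateIf(boolean[] cond, long[] thenVal, long[] elseVal,
                                  long[] out, int[] selected, int selectedSize) {
        int[] thenSel = new int[selectedSize];
        int[] elseSel = new int[selectedSize];
        int nThen = 0, nElse = 0;
        // Partition the selected-row indices by the already-evaluated condition.
        for (int j = 0; j < selectedSize; j++) {
            int i = selected[j];
            if (cond[i]) thenSel[nThen++] = i; else elseSel[nElse++] = i;
        }
        // THEN branch runs only over its own rows...
        for (int j = 0; j < nThen; j++) out[thenSel[j]] = thenVal[thenSel[j]];
        // ...and ELSE only over the rest.
        for (int j = 0; j < nElse; j++) out[elseSel[j]] = elseVal[elseSel[j]];
    }

    public static void main(String[] args) {
        boolean[] cond = {true, false, true, false};
        long[] thenVal = {10, 11, 12, 13};
        long[] elseVal = {20, 21, 22, 23};
        long[] out = new long[4];
        evaluateIf(cond, thenVal, elseVal, out, new int[]{0, 1, 2, 3}, 4);
        System.out.println(java.util.Arrays.toString(out)); // [10, 21, 12, 23]
    }
}
```

In the real vectorized engine the branch "work" is an arbitrary child expression, so restricting its selected array saves the full cost of evaluating that expression on rows whose result would be discarded.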
[jira] [Commented] (HIVE-17935) Turn on hive.optimize.sort.dynamic.partition by default
[ https://issues.apache.org/jira/browse/HIVE-17935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349763#comment-16349763 ] Hive QA commented on HIVE-17935: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12908851/HIVE-17935.8.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 92 failed/errored test(s), 12965 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=240) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_part] (batchId=16) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_1] (batchId=22) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_2] (batchId=84) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_6] (batchId=66) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_8] (batchId=14) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[delete_all_partitioned] (batchId=28) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[extrapolate_part_stats_partial] (batchId=48) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[implicit_cast_during_insert] (batchId=51) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_into6] (batchId=72) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid] (batchId=81) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid_fast] (batchId=40) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[load_dyn_part10] (batchId=21) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[load_dyn_part14] (batchId=90) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[load_dyn_part1] (batchId=85) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[load_dyn_part3] (batchId=12) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[load_dyn_part4] 
(batchId=63) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[load_dyn_part8] (batchId=66) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[load_dyn_part9] (batchId=40) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] (batchId=13) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[merge3] (batchId=59) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[merge4] (batchId=12) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[merge_dynamic_partition3] (batchId=70) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[merge_dynamic_partition4] (batchId=34) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[merge_dynamic_partition5] (batchId=33) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_int_type_promotion] (batchId=42) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_merge2] (batchId=91) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[stats2] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[stats4] (batchId=81) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[stats_empty_dyn_part] (batchId=33) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_15] (batchId=87) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_16] (batchId=74) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_17] (batchId=70) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_18] (batchId=7) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_25] (batchId=89) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_all_partitioned] (batchId=52) org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl] (batchId=175) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=152) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_stats] (batchId=148) 
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_merge2] (batchId=152) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[acid_no_buckets] (batchId=168) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1] (batchId=172) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_all_partitioned] (batchId=158) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dp_counter_mm] (batchId=154) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dp_counter_non_mm] (batchId=157) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[extrapolate_part_stats_partial_ndv] (batchId=168) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=167) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage3] (batchId=163) org.apa
[jira] [Updated] (HIVE-18595) UNIX_TIMESTAMP UDF fails when type is Timestamp with local timezone
[ https://issues.apache.org/jira/browse/HIVE-18595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-18595: -- Status: Patch Available (was: Open) > UNIX_TIMESTAMP UDF fails when type is Timestamp with local timezone > > > Key: HIVE-18595 > URL: https://issues.apache.org/jira/browse/HIVE-18595 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Priority: Major > Attachments: HIVE-18595.patch > > > {code} > 2018-01-31T12:59:45,464 ERROR [10e97c86-7f90-406b-a8fa-38be5d3529cc main] > ql.Driver: FAILED: SemanticException [Error 10014]: Line 3:456 Wrong > arguments ''-MM-dd HH:mm:ss'': The function UNIX_TIMESTAMP takes only > string/date/timestamp types > org.apache.hadoop.hive.ql.parse.SemanticException: Line 3:456 Wrong arguments > ''-MM-dd HH:mm:ss'': The function UNIX_TIMESTAMP takes only > string/date/timestamp types > at > org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1394) > at > org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) > at > org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:76) > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120) > at > org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:235) > at > org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:181) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11847) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11780) > at > 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genGBLogicalPlan(CalcitePlanner.java:3140) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:4330) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1407) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1354) > at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118) > at > org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1052) > at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154) > at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1159) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1175) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:422) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11393) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:304) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:268) > at > org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:163) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:268) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:639) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1504) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1632) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1382) > at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:240) > at 
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:343) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1331) > at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1305) > at > org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:173) > at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104) > at > org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver(TestMiniDruidCliDriver.java:59) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMetho
[jira] [Updated] (HIVE-18595) UNIX_TIMESTAMP UDF fails when type is Timestamp with local timezone
[ https://issues.apache.org/jira/browse/HIVE-18595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-18595: -- Attachment: HIVE-18595.patch > UNIX_TIMESTAMP UDF fails when type is Timestamp with local timezone > > > Key: HIVE-18595 > URL: https://issues.apache.org/jira/browse/HIVE-18595 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Priority: Major > Attachments: HIVE-18595.patch > > > {code} > 2018-01-31T12:59:45,464 ERROR [10e97c86-7f90-406b-a8fa-38be5d3529cc main] > ql.Driver: FAILED: SemanticException [Error 10014]: Line 3:456 Wrong > arguments ''-MM-dd HH:mm:ss'': The function UNIX_TIMESTAMP takes only > string/date/timestamp types > org.apache.hadoop.hive.ql.parse.SemanticException: Line 3:456 Wrong arguments > ''-MM-dd HH:mm:ss'': The function UNIX_TIMESTAMP takes only > string/date/timestamp types > at > org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1394) > at > org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) > at > org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:76) > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120) > at > org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:235) > at > org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:181) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11847) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11780) > at > 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genGBLogicalPlan(CalcitePlanner.java:3140) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:4330) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1407) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1354) > at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118) > at > org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1052) > at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154) > at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1159) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1175) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:422) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11393) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:304) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:268) > at > org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:163) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:268) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:639) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1504) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1632) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1382) > at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:240) > at 
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:343) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1331) > at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1305) > at > org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:173) > at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104) > at > org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver(TestMiniDruidCliDriver.java:59) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccesso
[jira] [Updated] (HIVE-18573) Use proper Calcite operator instead of UDFs
[ https://issues.apache.org/jira/browse/HIVE-18573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-18573: -- Attachment: HIVE-18573.4.patch > Use proper Calcite operator instead of UDFs > --- > > Key: HIVE-18573 > URL: https://issues.apache.org/jira/browse/HIVE-18573 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: slim bouguerra >Priority: Major > Attachments: HIVE-18573.2.patch, HIVE-18573.3.patch, > HIVE-18573.4.patch, HIVE-18573.patch > > > Currently, Hive mostly uses user-defined black-box SQL operators during > query planning. It would be more beneficial to use proper Calcite operators. > Also, use a single name for the Extract operator instead of a different name for > every unit, and > the same for the Floor function. This will allow unifying the treatment per operator. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-17935) Turn on hive.optimize.sort.dynamic.partition by default
[ https://issues.apache.org/jira/browse/HIVE-17935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349723#comment-16349723 ] Hive QA commented on HIVE-17935: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 31s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 52s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 52s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 8s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 30s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} 
checkstyle {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 19m 17s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh | | git revision | master / 32b8994 | | Default Java | 1.8.0_111 | | modules | C: common ql itests/hive-unit U: . | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8980/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Turn on hive.optimize.sort.dynamic.partition by default > --- > > Key: HIVE-17935 > URL: https://issues.apache.org/jira/browse/HIVE-17935 > Project: Hive > Issue Type: Bug >Reporter: Andrew Sherman >Assignee: Andrew Sherman >Priority: Major > Attachments: HIVE-17935.1.patch, HIVE-17935.2.patch, > HIVE-17935.3.patch, HIVE-17935.4.patch, HIVE-17935.5.patch, > HIVE-17935.6.patch, HIVE-17935.7.patch, HIVE-17935.8.patch > > > The config option hive.optimize.sort.dynamic.partition is an optimization for > Hive’s dynamic partitioning feature. It was originally implemented in > [HIVE-6455|https://issues.apache.org/jira/browse/HIVE-6455]. 
With this > optimization, the dynamic partition columns and bucketing columns (in case of > bucketed tables) are sorted before being fed to the reducers. Since the > partitioning and bucketing columns are sorted, each reducer can keep only one > record writer open at any time, thereby reducing the memory pressure on the > reducers. There were some early problems with this optimization and it was > disabled by default in HiveConf in > [HIVE-8151|https://issues.apache.org/jira/browse/HIVE-8151]. Since then > setting hive.optimize.sort.dynamic.partition=true has been used to solve > problems where dynamic partitioning produces (1) too many small files on > HDFS, which is bad for the cluster and can increase overhead for future Hive > queries over those partitions, and (2) O
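The writer-count argument above can be made concrete with a small sketch. This is a hedged toy model, not Hive's actual FileSinkOperator: the class `DynamicPartitionWriters` and its counting scheme are invented for illustration. With sorted input, a partition key never reappears once it changes, so the previous writer can be closed before the next opens; with unsorted input, a writer per distinct partition must stay open.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hedged sketch (not Hive code): counts how many record writers a reducer
// must hold open simultaneously for a stream of dynamic-partition keys.
public class DynamicPartitionWriters {

    public static int maxOpenWriters(List<String> partitionKeys, boolean sortedInput) {
        Set<String> open = new HashSet<>();
        int max = 0;
        String prev = null;
        for (String key : partitionKeys) {
            if (sortedInput && prev != null && !key.equals(prev)) {
                open.clear(); // sorted: the previous partition is finished, close its writer
            }
            open.add(key);
            max = Math.max(max, open.size());
            prev = key;
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> unsorted = Arrays.asList("p=2018-01", "p=2018-02", "p=2018-01", "p=2018-03");
        List<String> sorted = Arrays.asList("p=2018-01", "p=2018-01", "p=2018-02", "p=2018-03");
        System.out.println(maxOpenWriters(unsorted, false)); // 3 writers held open at once
        System.out.println(maxOpenWriters(sorted, true));    // 1 writer at a time
    }
}
```

Scaled up to thousands of partitions per reducer, the difference between one open writer and one per partition is exactly the memory-pressure problem the optimization addresses.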
[jira] [Commented] (HIVE-18516) load data should rename files consistent with insert statements for ACID Tables
[ https://issues.apache.org/jira/browse/HIVE-18516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349717#comment-16349717 ] Deepak Jaiswal commented on HIVE-18516: --- The test failures are unrelated. > load data should rename files consistent with insert statements for ACID > Tables > --- > > Key: HIVE-18516 > URL: https://issues.apache.org/jira/browse/HIVE-18516 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal >Priority: Major > Attachments: HIVE-18516.1.patch, HIVE-18516.10.patch, > HIVE-18516.2.patch, HIVE-18516.3.patch, HIVE-18516.4.patch, > HIVE-18516.5.patch, HIVE-18516.6.patch, HIVE-18516.7.patch, > HIVE-18516.8.patch, HIVE-18516.9.patch > > > h1. load data should rename files consistent with insert statements for ACID > Tables. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HIVE-18513) Query results caching
[ https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349716#comment-16349716 ] Xuefu Zhang edited comment on HIVE-18513 at 2/2/18 3:16 AM: Thanks, [~jcamachorodriguez] and [~jdere]. I went thru the doc and it is still unclear what mechanism was decided on to invalidate the cache. Is this doc still a work in progress? was (Author: xuefuz): Thanks, [~jcamachorodriguez] and [~jdere]. I went thru the doc and it is clear what mechanism was decided on to invalidate the cache. Is this doc still a work in progress? > Query results caching > - > > Key: HIVE-18513 > URL: https://issues.apache.org/jira/browse/HIVE-18513 > Project: Hive > Issue Type: Bug > Components: Query Planning >Reporter: Jason Dere >Assignee: Jason Dere >Priority: Major > Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, > HIVE-18513.3.patch, HIVE-18513.4.patch, HIVE-18513.5.patch > > > Add a query results cache that can save the results of an executed Hive query > for reuse on subsequent queries. This may be useful in cases where the same > query is issued many times, since Hive can return the results of a > cached query rather than having to execute the full query on the cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18513) Query results caching
[ https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349716#comment-16349716 ] Xuefu Zhang commented on HIVE-18513: Thanks, [~jcamachorodriguez] and [~jdere]. I went thru the doc and is clear what mechanism was decided to invalidate cache. Is this doc still work in progress? > Query results caching > - > > Key: HIVE-18513 > URL: https://issues.apache.org/jira/browse/HIVE-18513 > Project: Hive > Issue Type: Bug > Components: Query Planning >Reporter: Jason Dere >Assignee: Jason Dere >Priority: Major > Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, > HIVE-18513.3.patch, HIVE-18513.4.patch, HIVE-18513.5.patch > > > Add a query results cache that can save the results of an executed Hive query > for reuse on subsequent queries. This may be useful in cases where the same > query is issued many times, since Hive can return back the results of a > cached query rather than having to execute the full query on the cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
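The idea discussed in HIVE-18513 — caching the results of an executed query so an identical later query can be answered without re-running on the cluster — can be sketched in a few lines. This is an illustrative toy only, not Hive's implementation: the class and method names are hypothetical, and Hive's actual design (see the linked wiki doc) handles persistence, ACLs, and transaction-based invalidation that are simplified here to a per-table invalidation hook.

```python
# Toy query-results cache keyed by normalized query text.
# Hypothetical sketch; NOT the HIVE-18513 implementation.

class QueryResultsCache:
    def __init__(self):
        # normalized query -> (tables the query read, cached results)
        self._entries = {}

    @staticmethod
    def _normalize(query):
        # Collapse whitespace and lowercase so trivially different
        # spellings of the same query share one cache entry.
        return " ".join(query.split()).lower()

    def put(self, query, tables_read, results):
        self._entries[self._normalize(query)] = (frozenset(tables_read), results)

    def get(self, query):
        entry = self._entries.get(self._normalize(query))
        return entry[1] if entry else None

    def invalidate_table(self, table):
        # Drop every cached result that read the modified table; this is
        # the invalidation question raised in the comments above.
        self._entries = {q: e for q, e in self._entries.items()
                         if table not in e[0]}

cache = QueryResultsCache()
cache.put("SELECT COUNT(*) FROM sales", ["sales"], [(42,)])
assert cache.get("select count(*)  from sales") == [(42,)]
cache.invalidate_table("sales")
assert cache.get("SELECT COUNT(*) FROM sales") is None
```

The normalization step is the weakest part of any such cache — Hive matches at the plan level rather than on raw query text — but it is enough to show why cache invalidation, the open question in the comment thread, is the hard part of the design.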
[jira] [Commented] (HIVE-18516) load data should rename files consistent with insert statements for ACID Tables
[ https://issues.apache.org/jira/browse/HIVE-18516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349707#comment-16349707 ] Hive QA commented on HIVE-18516: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12908848/HIVE-18516.10.patch {color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 23 failed/errored test(s), 12965 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=240) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_update_status_disable_bitvector] (batchId=80) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] (batchId=13) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36) org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl] (batchId=175) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=152) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1] (batchId=172) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=167) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=171) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=161) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan] (batchId=164) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=161) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_input_format_excludes] (batchId=163) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] (batchId=122) org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=221) 
org.apache.hadoop.hive.metastore.client.TestGetTableMeta.testGetTableMetaNullOrEmptyDb[Embedded] (batchId=206) org.apache.hadoop.hive.metastore.client.TestTablesGetExists.testGetAllTablesCaseInsensitive[Embedded] (batchId=206) org.apache.hadoop.hive.ql.exec.TestOperators.testNoConditionalTaskSizeForLlap (batchId=282) org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=256) org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB (batchId=188) org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=234) org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=234) org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=234) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8979/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8979/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8979/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 23 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12908848 - PreCommit-HIVE-Build > load data should rename files consistent with insert statements for ACID > Tables > --- > > Key: HIVE-18516 > URL: https://issues.apache.org/jira/browse/HIVE-18516 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal >Priority: Major > Attachments: HIVE-18516.1.patch, HIVE-18516.10.patch, > HIVE-18516.2.patch, HIVE-18516.3.patch, HIVE-18516.4.patch, > HIVE-18516.5.patch, HIVE-18516.6.patch, HIVE-18516.7.patch, > HIVE-18516.8.patch, HIVE-18516.9.patch > > > h1. load data should rename files consistent with insert statements for ACID > Tables. 
[jira] [Commented] (HIVE-18513) Query results caching
[ https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349704#comment-16349704 ] Jesus Camacho Rodriguez commented on HIVE-18513: [~xuefuz], [~jdere] already posted the link to the doc above: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75963441 > Query results caching > - > > Key: HIVE-18513 > URL: https://issues.apache.org/jira/browse/HIVE-18513 > Project: Hive > Issue Type: Bug > Components: Query Planning >Reporter: Jason Dere >Assignee: Jason Dere >Priority: Major > Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, > HIVE-18513.3.patch, HIVE-18513.4.patch, HIVE-18513.5.patch > > > Add a query results cache that can save the results of an executed Hive query > for reuse on subsequent queries. This may be useful in cases where the same > query is issued many times, since Hive can return back the results of a > cached query rather than having to execute the full query on the cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18513) Query results caching
[ https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349703#comment-16349703 ] Xuefu Zhang commented on HIVE-18513: Could we have the high-level doc linked here so others interested can do a high-level review at least? Thanks. > Query results caching > - > > Key: HIVE-18513 > URL: https://issues.apache.org/jira/browse/HIVE-18513 > Project: Hive > Issue Type: Bug > Components: Query Planning >Reporter: Jason Dere >Assignee: Jason Dere >Priority: Major > Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, > HIVE-18513.3.patch, HIVE-18513.4.patch, HIVE-18513.5.patch > > > Add a query results cache that can save the results of an executed Hive query > for reuse on subsequent queries. This may be useful in cases where the same > query is issued many times, since Hive can return back the results of a > cached query rather than having to execute the full query on the cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18513) Query results caching
[ https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349685#comment-16349685 ] Jesus Camacho Rodriguez commented on HIVE-18513: +1 (pending tests) > Query results caching > - > > Key: HIVE-18513 > URL: https://issues.apache.org/jira/browse/HIVE-18513 > Project: Hive > Issue Type: Bug > Components: Query Planning >Reporter: Jason Dere >Assignee: Jason Dere >Priority: Major > Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, > HIVE-18513.3.patch, HIVE-18513.4.patch, HIVE-18513.5.patch > > > Add a query results cache that can save the results of an executed Hive query > for reuse on subsequent queries. This may be useful in cases where the same > query is issued many times, since Hive can return back the results of a > cached query rather than having to execute the full query on the cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18513) Query results caching
[ https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-18513: -- Attachment: HIVE-18513.5.patch > Query results caching > - > > Key: HIVE-18513 > URL: https://issues.apache.org/jira/browse/HIVE-18513 > Project: Hive > Issue Type: Bug > Components: Query Planning >Reporter: Jason Dere >Assignee: Jason Dere >Priority: Major > Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, > HIVE-18513.3.patch, HIVE-18513.4.patch, HIVE-18513.5.patch > > > Add a query results cache that can save the results of an executed Hive query > for reuse on subsequent queries. This may be useful in cases where the same > query is issued many times, since Hive can return back the results of a > cached query rather than having to execute the full query on the cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18513) Query results caching
[ https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-18513: -- Attachment: (was: HIVE-18513.5.patch) > Query results caching > - > > Key: HIVE-18513 > URL: https://issues.apache.org/jira/browse/HIVE-18513 > Project: Hive > Issue Type: Bug > Components: Query Planning >Reporter: Jason Dere >Assignee: Jason Dere >Priority: Major > Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, > HIVE-18513.3.patch, HIVE-18513.4.patch, HIVE-18513.5.patch > > > Add a query results cache that can save the results of an executed Hive query > for reuse on subsequent queries. This may be useful in cases where the same > query is issued many times, since Hive can return back the results of a > cached query rather than having to execute the full query on the cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18513) Query results caching
[ https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349677#comment-16349677 ] Jason Dere commented on HIVE-18513: --- patch didn't include the additional changes, re-attaching > Query results caching > - > > Key: HIVE-18513 > URL: https://issues.apache.org/jira/browse/HIVE-18513 > Project: Hive > Issue Type: Bug > Components: Query Planning >Reporter: Jason Dere >Assignee: Jason Dere >Priority: Major > Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, > HIVE-18513.3.patch, HIVE-18513.4.patch, HIVE-18513.5.patch > > > Add a query results cache that can save the results of an executed Hive query > for reuse on subsequent queries. This may be useful in cases where the same > query is issued many times, since Hive can return back the results of a > cached query rather than having to execute the full query on the cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18513) Query results caching
[ https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-18513: -- Attachment: HIVE-18513.5.patch > Query results caching > - > > Key: HIVE-18513 > URL: https://issues.apache.org/jira/browse/HIVE-18513 > Project: Hive > Issue Type: Bug > Components: Query Planning >Reporter: Jason Dere >Assignee: Jason Dere >Priority: Major > Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, > HIVE-18513.3.patch, HIVE-18513.4.patch, HIVE-18513.5.patch > > > Add a query results cache that can save the results of an executed Hive query > for reuse on subsequent queries. This may be useful in cases where the same > query is issued many times, since Hive can return back the results of a > cached query rather than having to execute the full query on the cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18513) Query results caching
[ https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-18513: -- Attachment: (was: HIVE-18513.5.patch) > Query results caching > - > > Key: HIVE-18513 > URL: https://issues.apache.org/jira/browse/HIVE-18513 > Project: Hive > Issue Type: Bug > Components: Query Planning >Reporter: Jason Dere >Assignee: Jason Dere >Priority: Major > Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, > HIVE-18513.3.patch, HIVE-18513.4.patch > > > Add a query results cache that can save the results of an executed Hive query > for reuse on subsequent queries. This may be useful in cases where the same > query is issued many times, since Hive can return back the results of a > cached query rather than having to execute the full query on the cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18516) load data should rename files consistent with insert statements for ACID Tables
[ https://issues.apache.org/jira/browse/HIVE-18516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349665#comment-16349665 ] Hive QA commented on HIVE-18516: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 28s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 44s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle 
{color} | {color:red} 0m 40s{color} | {color:red} ql: The patch generated 10 new + 340 unchanged - 14 fixed = 350 total (was 354) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 14m 39s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh | | git revision | master / 32b8994 | | Default Java | 1.8.0_111 | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8979/yetus/diff-checkstyle-ql.txt | | modules | C: ql itests U: . | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8979/yetus.txt | | Powered by | Apache Yetus http://yetus.apache.org | This message was automatically generated. > load data should rename files consistent with insert statements for ACID > Tables > --- > > Key: HIVE-18516 > URL: https://issues.apache.org/jira/browse/HIVE-18516 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal >Priority: Major > Attachments: HIVE-18516.1.patch, HIVE-18516.10.patch, > HIVE-18516.2.patch, HIVE-18516.3.patch, HIVE-18516.4.patch, > HIVE-18516.5.patch, HIVE-18516.6.patch, HIVE-18516.7.patch, > HIVE-18516.8.patch, HIVE-18516.9.patch > > > h1. 
load data should rename files consistent with insert statements for ACID > Tables. 
[jira] [Updated] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns
[ https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-18608: Status: Patch Available (was: Open) (Submitting for tests.) > ORC should allow selectively disabling dictionary-encoding on specified > columns > --- > > Key: HIVE-18608 > URL: https://issues.apache.org/jira/browse/HIVE-18608 > Project: Hive > Issue Type: New Feature > Components: ORC >Affects Versions: 3.0.0, 2.4.0, 2.2.1 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan >Priority: Major > Attachments: HIVE-18608.1-branch-2.2.patch > > > Just as ORC allows the choice of columns to enable bloom-filters on, it would > be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding > should be disabled on. > Currently, the choice of dictionary-encoding depends on the results of > sampling the first row-stride within a stripe. If the user knows that a > column's cardinality is bound to prevent an effective dictionary, she might > choose to simply disable it on just that column, and avoid the cost of > sampling in the first row-stride. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns
[ https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349660#comment-16349660 ] Mithun Radhakrishnan edited comment on HIVE-18608 at 2/2/18 2:03 AM: - I've attached an initial implementation, where dictionary encoding might be disabled via a table-property ({{'orc.skip.dictionary.for.columns'}}. Note: I've only added support for top-level columns. Specifying this on a {{STRUCT}} will disable dictionary encoding for the entire sub-tree (i.e. all members of the STRUCT), recursively. It might be good to support selection at an arbitrary depth, at a later date. E.g. {{myInfoArray.__elem__.emailBody}}. was (Author: mithun): I've attached an initial implementation, where dictionary encoding might be disabled via a table-property ({{'orc.skip.dictionary.for.columns'}}. Note: I've only added support for top-level columns. Specifying this on a {{STRUCT}} will disable dictionary encoding for the entire sub-tree (i.e. all members of the STRUCT), recursively. It might be good to support selection at an arbitrary depth. E.g. {{myInfoArray.__elem__.emailBody}}. > ORC should allow selectively disabling dictionary-encoding on specified > columns > --- > > Key: HIVE-18608 > URL: https://issues.apache.org/jira/browse/HIVE-18608 > Project: Hive > Issue Type: New Feature > Components: ORC >Affects Versions: 3.0.0, 2.4.0, 2.2.1 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan >Priority: Major > Attachments: HIVE-18608.1-branch-2.2.patch > > > Just as ORC allows the choice of columns to enable bloom-filters on, it would > be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding > should be disabled on. > Currently, the choice of dictionary-encoding depends on the results of > sampling the first row-stride within a stripe. 
If the user knows that a > column's cardinality is bound to prevent an effective dictionary, she might > choose to simply disable it on just that column, and avoid the cost of > sampling in the first row-stride. 
[jira] [Comment Edited] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns
[ https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349660#comment-16349660 ] Mithun Radhakrishnan edited comment on HIVE-18608 at 2/2/18 2:03 AM: - I've attached an initial implementation, where dictionary encoding might be disabled via a table-property ({{'orc.skip.dictionary.for.columns'}}). Note: I've only added support for top-level columns. Specifying this on a {{STRUCT}} will disable dictionary encoding for the entire sub-tree (i.e. all members of the STRUCT), recursively. It might be good to support selection at an arbitrary depth, at a later date. E.g. {{myInfoArray.__elem__.emailBody}}. was (Author: mithun): I've attached an initial implementation, where dictionary encoding might be disabled via a table-property ({{'orc.skip.dictionary.for.columns'}}. Note: I've only added support for top-level columns. Specifying this on a {{STRUCT}} will disable dictionary encoding for the entire sub-tree (i.e. all members of the STRUCT), recursively. It might be good to support selection at an arbitrary depth, at a later date. E.g. {{myInfoArray.__elem__.emailBody}}. > ORC should allow selectively disabling dictionary-encoding on specified > columns > --- > > Key: HIVE-18608 > URL: https://issues.apache.org/jira/browse/HIVE-18608 > Project: Hive > Issue Type: New Feature > Components: ORC >Affects Versions: 3.0.0, 2.4.0, 2.2.1 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan >Priority: Major > Attachments: HIVE-18608.1-branch-2.2.patch > > > Just as ORC allows the choice of columns to enable bloom-filters on, it would > be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding > should be disabled on. > Currently, the choice of dictionary-encoding depends on the results of > sampling the first row-stride within a stripe. 
If the user knows that a > column's cardinality is bound to prevent an effective dictionary, she might > choose to simply disable it on just that column, and avoid the cost of > sampling in the first row-stride. 
[jira] [Comment Edited] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns
[ https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349660#comment-16349660 ] Mithun Radhakrishnan edited comment on HIVE-18608 at 2/2/18 2:03 AM: - I've attached an initial implementation, where dictionary encoding might be disabled via a table-property ({{'orc.skip.dictionary.for.columns'}}). Note: I've only added support for top-level columns. Specifying this on a {{STRUCT}} will disable dictionary encoding for the entire sub-tree (i.e. all members of the {{STRUCT}}, recursively). It might be good to support selection at an arbitrary depth, at a later date. E.g. {{myInfoArray.__elem__.emailBody}}. was (Author: mithun): I've attached an initial implementation, where dictionary encoding might be disabled via a table-property ({{'orc.skip.dictionary.for.columns'}}). Note: I've only added support for top-level columns. Specifying this on a {{STRUCT}} will disable dictionary encoding for the entire sub-tree (i.e. all members of the STRUCT), recursively. It might be good to support selection at an arbitrary depth, at a later date. E.g. {{myInfoArray.__elem__.emailBody}}. > ORC should allow selectively disabling dictionary-encoding on specified > columns > --- > > Key: HIVE-18608 > URL: https://issues.apache.org/jira/browse/HIVE-18608 > Project: Hive > Issue Type: New Feature > Components: ORC >Affects Versions: 3.0.0, 2.4.0, 2.2.1 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan >Priority: Major > Attachments: HIVE-18608.1-branch-2.2.patch > > > Just as ORC allows the choice of columns to enable bloom-filters on, it would > be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding > should be disabled on. > Currently, the choice of dictionary-encoding depends on the results of > sampling the first row-stride within a stripe. 
If the user knows that a > column's cardinality is bound to prevent an effective dictionary, she might > choose to simply disable it on just that column, and avoid the cost of > sampling in the first row-stride. 
[jira] [Commented] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns
[ https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349660#comment-16349660 ] Mithun Radhakrishnan commented on HIVE-18608: - I've attached an initial implementation, where dictionary encoding might be disabled via a table-property ({{'orc.skip.dictionary.for.columns'}}. Note: I've only added support for top-level columns. Specifying this on a {{STRUCT}} will disable dictionary encoding for the entire sub-tree (i.e. all members of the STRUCT), recursively. It might be good to support selection at an arbitrary depth. E.g. {{myInfoArray.__elem__.emailBody}}. > ORC should allow selectively disabling dictionary-encoding on specified > columns > --- > > Key: HIVE-18608 > URL: https://issues.apache.org/jira/browse/HIVE-18608 > Project: Hive > Issue Type: New Feature > Components: ORC >Affects Versions: 3.0.0, 2.4.0, 2.2.1 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan >Priority: Major > Attachments: HIVE-18608.1-branch-2.2.patch > > > Just as ORC allows the choice of columns to enable bloom-filters on, it would > be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding > should be disabled on. > Currently, the choice of dictionary-encoding depends on the results of > sampling the first row-stride within a stripe. If the user knows that a > column's cardinality is bound to prevent an effective dictionary, she might > choose to simply disable it on just that column, and avoid the cost of > sampling in the first row-stride. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
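The column-selection logic described in the comment above — a comma-separated list of top-level column names in the {{'orc.skip.dictionary.for.columns'}} table property, with listed columns skipping dictionary sampling and all others keeping the default first-row-stride sampling decision — can be sketched as follows. The function names here are hypothetical for illustration; only the property name comes from the comment, and this is not the writer code in the attached patch.

```python
# Sketch of per-column dictionary-encoding selection driven by a
# comma-separated table property. Function names are illustrative,
# not Hive/ORC API; only the property key is taken from the comment.

SKIP_DICT_PROP = "orc.skip.dictionary.for.columns"

def parse_skip_columns(table_properties):
    # Split the comma-separated list, tolerating whitespace and
    # empty segments; compare case-insensitively like Hive column names.
    raw = table_properties.get(SKIP_DICT_PROP, "")
    return {c.strip().lower() for c in raw.split(",") if c.strip()}

def use_dictionary_encoding(column, table_properties):
    # Listed columns bypass dictionary sampling entirely; everyone else
    # falls back to the usual first-row-stride sampling decision.
    return column.lower() not in parse_skip_columns(table_properties)

props = {SKIP_DICT_PROP: "user_id, email_body"}
assert not use_dictionary_encoding("Email_Body", props)
assert use_dictionary_encoding("country", props)
```

The sketch also makes the limitation in the comment concrete: the set holds flat top-level names, so naming a STRUCT column would have to apply to its whole subtree unless the parser were extended to dotted paths like {{myInfoArray.__elem__.emailBody}}.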
[jira] [Updated] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns
[ https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-18608: Attachment: (was: HIVE-18608.1-branch-2.2.patch) > ORC should allow selectively disabling dictionary-encoding on specified > columns > --- > > Key: HIVE-18608 > URL: https://issues.apache.org/jira/browse/HIVE-18608 > Project: Hive > Issue Type: New Feature > Components: ORC >Affects Versions: 3.0.0, 2.4.0, 2.2.1 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan >Priority: Major > Attachments: HIVE-18608.1-branch-2.2.patch > > > Just as ORC allows the choice of columns to enable bloom-filters on, it would > be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding > should be disabled on. > Currently, the choice of dictionary-encoding depends on the results of > sampling the first row-stride within a stripe. If the user knows that a > column's cardinality is bound to prevent an effective dictionary, she might > choose to simply disable it on just that column, and avoid the cost of > sampling in the first row-stride. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns
[ https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-18608: Attachment: HIVE-18608.1-branch-2.2.patch > ORC should allow selectively disabling dictionary-encoding on specified > columns > --- > > Key: HIVE-18608 > URL: https://issues.apache.org/jira/browse/HIVE-18608 > Project: Hive > Issue Type: New Feature > Components: ORC >Affects Versions: 3.0.0, 2.4.0, 2.2.1 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan >Priority: Major > Attachments: HIVE-18608.1-branch-2.2.patch > > > Just as ORC allows the choice of columns to enable bloom-filters on, it would > be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding > should be disabled on. > Currently, the choice of dictionary-encoding depends on the results of > sampling the first row-stride within a stripe. If the user knows that a > column's cardinality is bound to prevent an effective dictionary, she might > choose to simply disable it on just that column, and avoid the cost of > sampling in the first row-stride. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns
[ https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-18608: Attachment: HIVE-18608.1-branch-2.2.patch > ORC should allow selectively disabling dictionary-encoding on specified > columns > --- > > Key: HIVE-18608 > URL: https://issues.apache.org/jira/browse/HIVE-18608 > Project: Hive > Issue Type: New Feature > Components: ORC >Affects Versions: 3.0.0, 2.4.0, 2.2.1 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan >Priority: Major > Attachments: HIVE-18608.1-branch-2.2.patch > > > Just as ORC allows the choice of columns to enable bloom-filters on, it would > be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding > should be disabled on. > Currently, the choice of dictionary-encoding depends on the results of > sampling the first row-stride within a stripe. If the user knows that a > column's cardinality is bound to prevent an effective dictionary, she might > choose to simply disable it on just that column, and avoid the cost of > sampling in the first row-stride. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns
[ https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan reassigned HIVE-18608: --- > ORC should allow selectively disabling dictionary-encoding on specified > columns > --- > > Key: HIVE-18608 > URL: https://issues.apache.org/jira/browse/HIVE-18608 > Project: Hive > Issue Type: New Feature > Components: ORC >Affects Versions: 3.0.0, 2.4.0, 2.2.1 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan >Priority: Major > > Just as ORC allows the choice of columns to enable bloom-filters on, it would > be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding > should be disabled on. > Currently, the choice of dictionary-encoding depends on the results of > sampling the first row-stride within a stripe. If the user knows that a > column's cardinality is bound to prevent an effective dictionary, she might > choose to simply disable it on just that column, and avoid the cost of > sampling in the first row-stride. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18513) Query results caching
[ https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-18513: -- Attachment: HIVE-18513.5.patch > Query results caching > - > > Key: HIVE-18513 > URL: https://issues.apache.org/jira/browse/HIVE-18513 > Project: Hive > Issue Type: Bug > Components: Query Planning >Reporter: Jason Dere >Assignee: Jason Dere >Priority: Major > Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, > HIVE-18513.3.patch, HIVE-18513.4.patch, HIVE-18513.5.patch > > > Add a query results cache that can save the results of an executed Hive query > for reuse on subsequent queries. This may be useful in cases where the same > query is issued many times, since Hive can return back the results of a > cached query rather than having to execute the full query on the cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18513) Query results caching
[ https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349642#comment-16349642 ] Jason Dere commented on HIVE-18513: --- Additional changes per review from [~jcamachorodriguez]. > Query results caching > - > > Key: HIVE-18513 > URL: https://issues.apache.org/jira/browse/HIVE-18513 > Project: Hive > Issue Type: Bug > Components: Query Planning >Reporter: Jason Dere >Assignee: Jason Dere >Priority: Major > Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, > HIVE-18513.3.patch, HIVE-18513.4.patch, HIVE-18513.5.patch > > > Add a query results cache that can save the results of an executed Hive query > for reuse on subsequent queries. This may be useful in cases where the same > query is issued many times, since Hive can return back the results of a > cached query rather than having to execute the full query on the cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
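The caching idea in the description can be sketched as a minimal in-memory map keyed by normalized query text. This is illustrative only: Hive's actual cache (HIVE-18513) also validates cached entries against the current state of the underlying tables, and naive lowercasing would be wrong for string literals in real SQL.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy sketch of a query results cache: results of an executed query are
// stored under a normalized form of the query text, so a repeated query
// can be answered without re-executing on the cluster.
public class ResultsCache {
  private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

  private static String normalize(String query) {
    // Collapse whitespace and case so trivially different spellings hit
    // the same entry (toy normalization; unsafe for string literals).
    return query.trim().replaceAll("\\s+", " ").toLowerCase();
  }

  public List<String> lookup(String query) {
    return cache.get(normalize(query)); // null on cache miss
  }

  public void store(String query, List<String> results) {
    cache.put(normalize(query), results);
  }

  public static void main(String[] args) {
    ResultsCache rc = new ResultsCache();
    rc.store("SELECT count(*) FROM t", List.of("42"));
    // Whitespace and case differences still hit the cache.
    System.out.println(rc.lookup("select  count(*) from T")); // [42]
  }
}
```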
[jira] [Commented] (HIVE-18575) ACID properties usage in jobconf is ambiguous for MM tables
[ https://issues.apache.org/jira/browse/HIVE-18575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349639#comment-16349639 ] Sergey Shelukhin commented on HIVE-18575: - [~ekoifman] can you take a look at this one? :) > ACID properties usage in jobconf is ambiguous for MM tables > --- > > Key: HIVE-18575 > URL: https://issues.apache.org/jira/browse/HIVE-18575 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: HIVE-18575.patch > > > Vectorization checks for ACID table trigger for MM tables where they don't > apply. Other places seem to set the setting for transactional case while most > of the code seems to assume it implies full acid. > Overall, many places in the code use the settings directly or set the ACID > flag without setting the ACID properties. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
[ https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349637#comment-16349637 ] Sergey Shelukhin commented on HIVE-18606: - nit: else should be on the same line according to Hive coding standard :) +1 > CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask > --- > > Key: HIVE-18606 > URL: https://issues.apache.org/jira/browse/HIVE-18606 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18606.01.patch, HIVE-18606.02.patch > > > {noformat} > @Test > public void testCtasEmpty() throws Exception { > MetastoreConf.setBoolVar(hiveConf, > MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true); > runStatementOnDriver("create table myctas stored as ORC as" + > " select a, b from " + Table.NONACIDORCTBL); > List rs = runStatementOnDriver("select ROW__ID, a, b, > INPUT__FILE__NAME" + > " from myctas order by ROW__ID"); > } > {noformat} > {noformat} > 2018-02-01T19:08:52,813 INFO [HiveServer2-Background-Pool: Thread-463]: > metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done > cleaning up thread local RawStore > 2018-02-01T19:08:52,813 INFO [HiveServer2-Background-Pool: Thread-463]: > HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive > ip=unknown-ip-addr cmd=Done cleaning up thread local RawStore > 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: > exec.Task (SessionState.java:printError(1228)) - Failed with exception null > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816) > at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) > at 
org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: > ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, > return code 1 from {noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18588) Add 'checkin' profile that runs slower tests in standalone-metastore
[ https://issues.apache.org/jira/browse/HIVE-18588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349635#comment-16349635 ] Hive QA commented on HIVE-18588: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12908876/HIVE-18588.2.patch {color:green}SUCCESS:{color} +1 due to 70 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 51 failed/errored test(s), 11450 tests executed *Failed tests:* {noformat} TestAddAlterDropIndexes - did not produce a TEST-*.xml file (likely timed out) (batchId=206) TestAddPartitions - did not produce a TEST-*.xml file (likely timed out) (batchId=206) TestAddPartitionsFromPartSpec - did not produce a TEST-*.xml file (likely timed out) (batchId=206) TestAlterPartitions - did not produce a TEST-*.xml file (likely timed out) (batchId=213) TestAppendPartitions - did not produce a TEST-*.xml file (likely timed out) (batchId=206) TestCachedStore - did not produce a TEST-*.xml file (likely timed out) (batchId=213) TestDatabases - did not produce a TEST-*.xml file (likely timed out) (batchId=213) TestDropPartitions - did not produce a TEST-*.xml file (likely timed out) (batchId=206) TestFunctions - did not produce a TEST-*.xml file (likely timed out) (batchId=206) TestGetListIndexes - did not produce a TEST-*.xml file (likely timed out) (batchId=206) TestGetPartitions - did not produce a TEST-*.xml file (likely timed out) (batchId=206) TestGetTableMeta - did not produce a TEST-*.xml file (likely timed out) (batchId=206) TestHiveMetaStorePartitionSpecs - did not produce a TEST-*.xml file (likely timed out) (batchId=213) TestHiveMetaStoreTimeout - did not produce a TEST-*.xml file (likely timed out) (batchId=215) TestListPartitions - did not produce a TEST-*.xml file (likely timed out) (batchId=206) TestMarkPartition - did not produce a TEST-*.xml file (likely timed out) (batchId=215) TestMarkPartitionRemote - did not produce a TEST-*.xml 
file (likely timed out) (batchId=215) TestMetaStoreInitListener - did not produce a TEST-*.xml file (likely timed out) (batchId=215) TestObjectStoreInitRetry - did not produce a TEST-*.xml file (likely timed out) (batchId=213) TestRawStoreProxy - did not produce a TEST-*.xml file (likely timed out) (batchId=206) TestRemoteHiveMetaStore - did not produce a TEST-*.xml file (likely timed out) (batchId=209) TestRemoteHiveMetaStoreIpAddress - did not produce a TEST-*.xml file (likely timed out) (batchId=204) TestRemoteUGIHiveMetaStoreIpAddress - did not produce a TEST-*.xml file (likely timed out) (batchId=212) TestRetriesInRetryingHMSHandler - did not produce a TEST-*.xml file (likely timed out) (batchId=215) TestRetryingHMSHandler - did not produce a TEST-*.xml file (likely timed out) (batchId=213) TestSetUGIOnBothClientServer - did not produce a TEST-*.xml file (likely timed out) (batchId=207) TestSetUGIOnOnlyClient - did not produce a TEST-*.xml file (likely timed out) (batchId=205) TestSetUGIOnOnlyServer - did not produce a TEST-*.xml file (likely timed out) (batchId=214) TestTablesCreateDropAlterTruncate - did not produce a TEST-*.xml file (likely timed out) (batchId=206) TestTablesGetExists - did not produce a TEST-*.xml file (likely timed out) (batchId=206) TestTablesList - did not produce a TEST-*.xml file (likely timed out) (batchId=206) org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=240) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36) org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl] (batchId=175) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=152) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1] (batchId=172) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=167) 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=171) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=161) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan] (batchId=164) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=161) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_bmj_schema_evolution] (batchId=153) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_input_format_excludes] (batchId=163) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] (batchId=122) org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=221) org.apache.hadoop.hive.ql.exec.TestOperators.testNoConditionalTaskSizeForLlap (batchId=282) org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=256) org.apache.hi
[jira] [Commented] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
[ https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349633#comment-16349633 ] Eugene Koifman commented on HIVE-18606: --- yes, you are right. fixed in patch2 > CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask > --- > > Key: HIVE-18606 > URL: https://issues.apache.org/jira/browse/HIVE-18606 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18606.01.patch, HIVE-18606.02.patch > > > {noformat} > @Test > public void testCtasEmpty() throws Exception { > MetastoreConf.setBoolVar(hiveConf, > MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true); > runStatementOnDriver("create table myctas stored as ORC as" + > " select a, b from " + Table.NONACIDORCTBL); > List rs = runStatementOnDriver("select ROW__ID, a, b, > INPUT__FILE__NAME" + > " from myctas order by ROW__ID"); > } > {noformat} > {noformat} > 2018-02-01T19:08:52,813 INFO [HiveServer2-Background-Pool: Thread-463]: > metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done > cleaning up thread local RawStore > 2018-02-01T19:08:52,813 INFO [HiveServer2-Background-Pool: Thread-463]: > HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive > ip=unknown-ip-addr cmd=Done cleaning up thread local RawStore > 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: > exec.Task (SessionState.java:printError(1228)) - Failed with exception null > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816) > at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267) > at 
org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: > ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, > return code 1 from {noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
[ https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18606: -- Attachment: HIVE-18606.02.patch > CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask > --- > > Key: HIVE-18606 > URL: https://issues.apache.org/jira/browse/HIVE-18606 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18606.01.patch, HIVE-18606.02.patch > > > {noformat} > @Test > public void testCtasEmpty() throws Exception { > MetastoreConf.setBoolVar(hiveConf, > MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true); > runStatementOnDriver("create table myctas stored as ORC as" + > " select a, b from " + Table.NONACIDORCTBL); > List rs = runStatementOnDriver("select ROW__ID, a, b, > INPUT__FILE__NAME" + > " from myctas order by ROW__ID"); > } > {noformat} > {noformat} > 2018-02-01T19:08:52,813 INFO [HiveServer2-Background-Pool: Thread-463]: > metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done > cleaning up thread local RawStore > 2018-02-01T19:08:52,813 INFO [HiveServer2-Background-Pool: Thread-463]: > HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive > ip=unknown-ip-addr cmd=Done cleaning up thread local RawStore > 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: > exec.Task (SessionState.java:printError(1228)) - Failed with exception null > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816) > at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919) > at 
org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: > ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, > return code 1 from {noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
[ https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349629#comment-16349629 ] Sergey Shelukhin commented on HIVE-18606: - {noformat} +if(srcs != null) { + LOG.debug("No files found to move from " + sourcePath + " to " + targetPath); {noformat} Seems like this debug statement should be in the else. Looks good otherwise > CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask > --- > > Key: HIVE-18606 > URL: https://issues.apache.org/jira/browse/HIVE-18606 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18606.01.patch > > > {noformat} > @Test > public void testCtasEmpty() throws Exception { > MetastoreConf.setBoolVar(hiveConf, > MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true); > runStatementOnDriver("create table myctas stored as ORC as" + > " select a, b from " + Table.NONACIDORCTBL); > List rs = runStatementOnDriver("select ROW__ID, a, b, > INPUT__FILE__NAME" + > " from myctas order by ROW__ID"); > } > {noformat} > {noformat} > 2018-02-01T19:08:52,813 INFO [HiveServer2-Background-Pool: Thread-463]: > metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done > cleaning up thread local RawStore > 2018-02-01T19:08:52,813 INFO [HiveServer2-Background-Pool: Thread-463]: > HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive > ip=unknown-ip-addr cmd=Done cleaning up thread local RawStore > 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: > exec.Task (SessionState.java:printError(1228)) - Failed with exception null > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816) > at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) > at > 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: > ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, > return code 1 from {noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
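The shape of the fix under discussion — guard against a null file listing, with the "no files found" debug message in the else branch — can be shown standalone. This sketch uses `java.io.File` rather than Hive's `FileSystem` API, so it is an analogy, not the patch itself: `File.listFiles()` returns null for a nonexistent directory, just as the empty-CTAS case produced a null source listing in `moveAcidFiles`.

```java
import java.io.File;
import java.util.logging.Logger;

// Sketch of the null-guard shape from the review: a directory listing
// can be null (e.g. when an empty CTAS produced no files), and the
// "nothing to move" message belongs in the else branch, not inside the
// non-null branch.
public class MoveGuard {
  private static final Logger LOG = Logger.getLogger("MoveGuard");

  // Returns the number of files that would be moved; 0 when there is
  // nothing to do, instead of throwing a NullPointerException.
  public static int moveFiles(File sourceDir, File targetDir) {
    File[] srcs = sourceDir.listFiles(); // null if sourceDir does not exist
    if (srcs != null) {
      for (File src : srcs) {
        LOG.fine("would move " + src + " to " + targetDir);
      }
      return srcs.length;
    } else {
      LOG.fine("No files found to move from " + sourceDir + " to " + targetDir);
      return 0;
    }
  }

  public static void main(String[] args) {
    // A path that does not exist: listFiles() returns null, no NPE.
    System.out.println(moveFiles(new File("/nonexistent-dir"), new File("/tmp"))); // 0
  }
}
```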
[jira] [Commented] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
[ https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349627#comment-16349627 ] Eugene Koifman commented on HIVE-18606: --- [~sershe] could you review please > CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask > --- > > Key: HIVE-18606 > URL: https://issues.apache.org/jira/browse/HIVE-18606 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18606.01.patch > > > {noformat} > @Test > public void testCtasEmpty() throws Exception { > MetastoreConf.setBoolVar(hiveConf, > MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true); > runStatementOnDriver("create table myctas stored as ORC as" + > " select a, b from " + Table.NONACIDORCTBL); > List rs = runStatementOnDriver("select ROW__ID, a, b, > INPUT__FILE__NAME" + > " from myctas order by ROW__ID"); > } > {noformat} > {noformat} > 2018-02-01T19:08:52,813 INFO [HiveServer2-Background-Pool: Thread-463]: > metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done > cleaning up thread local RawStore > 2018-02-01T19:08:52,813 INFO [HiveServer2-Background-Pool: Thread-463]: > HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive > ip=unknown-ip-addr cmd=Done cleaning up thread local RawStore > 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: > exec.Task (SessionState.java:printError(1228)) - Failed with exception null > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816) > at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267) > at 
org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: > ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, > return code 1 from {noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349622#comment-16349622 ] Alexander Kolbasov commented on HIVE-16886: --- [~anishek] I am fine with updating branch 3 as well. Do you want to do it yourself, or would you like me to do it? I think we need to do the following: * Make sure you are OK with the suggested fix * Revert the original fix * Apply the new fix. So this will be two commits on branch-3 and one commit on branch-2. Please let me know what you think about it. > HMS log notifications may have duplicated event IDs if multiple HMS are > running concurrently > > > Key: HIVE-16886 > URL: https://issues.apache.org/jira/browse/HIVE-16886 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore >Affects Versions: 3.0.0, 2.3.2, 2.3.3 >Reporter: Sergio Peña >Assignee: anishek >Priority: Major > Labels: TODOC3.0 > Fix For: 3.0.0 > > Attachments: HIVE-16886.1.patch, HIVE-16886.2.patch, > HIVE-16886.3.patch, HIVE-16886.4.patch, HIVE-16886.5.patch, > HIVE-16886.6.patch, HIVE-16886.7.patch, HIVE-16886.8.patch, > datastore-identity-holes.diff > > > When running multiple Hive Metastore servers and DB notifications are > enabled, I could see that notifications can be persisted with a duplicated > event ID. > This does not happen when running multiple threads in a single HMS node due > to the locking acquired on the DbNotificationsLog class, but multiple HMS > could cause conflicts. > The issue is in the ObjectStore#addNotificationEvent() method. The event ID > fetched from the datastore is used for the new notification, incremented in > the server itself, then persisted or updated back to the datastore. If 2 > servers read the same ID, then these 2 servers write a new notification with > the same ID. > The event ID is neither unique nor a primary key. 
> Here's a test case using the TestObjectStore class that confirms this issue: > {noformat} > @Test > public void testConcurrentAddNotifications() throws ExecutionException, > InterruptedException { > final int NUM_THREADS = 2; > CountDownLatch countIn = new CountDownLatch(NUM_THREADS); > CountDownLatch countOut = new CountDownLatch(1); > HiveConf conf = new HiveConf(); > conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, > MockPartitionExpressionProxy.class.getName()); > ExecutorService executorService = > Executors.newFixedThreadPool(NUM_THREADS); > FutureTask<Void> tasks[] = new FutureTask[NUM_THREADS]; > for (int i = 0; i < NUM_THREADS; ++i) { > final int n = i; > tasks[i] = new FutureTask<Void>(new Callable<Void>() { > @Override > public Void call() throws Exception { > ObjectStore store = new ObjectStore(); > store.setConf(conf); > NotificationEvent dbEvent = > new NotificationEvent(0, 0, > EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n); > System.out.println("ADDING NOTIFICATION"); > countIn.countDown(); > countOut.await(); > store.addNotificationEvent(dbEvent); > System.out.println("FINISH NOTIFICATION"); > return null; > } > }); > executorService.execute(tasks[i]); > } > countIn.await(); > countOut.countDown(); > for (int i = 0; i < NUM_THREADS; ++i) { > tasks[i].get(); > } > NotificationEventResponse eventResponse = > objectStore.getNextNotification(new NotificationEventRequest()); > Assert.assertEquals(2, eventResponse.getEventsSize()); > Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId()); > // This fails because the next notification has an event ID = 1 > Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId()); > } > {noformat} > The last assertion fails: it expects event ID 2 but gets 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
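The read-increment-write race described above can be demonstrated in miniature. This sketch is an analogy within one JVM: an `AtomicLong` makes the increment atomic across threads, just as the real fix must make it atomic across metastore servers at the database level (e.g. row locking or a database-generated sequence). It is not the committed patch.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the duplicated-event-ID race: each "server" reads the last
// event ID, increments it, and writes it back. If that step is not
// atomic, two servers can hand out the same ID. incrementAndGet makes
// the whole read-increment-write a single atomic operation.
public class EventIds {
  private final AtomicLong lastEventId = new AtomicLong(0);

  public long nextEventId() {
    return lastEventId.incrementAndGet(); // atomic read-increment-write
  }

  public static void main(String[] args) throws InterruptedException {
    EventIds ids = new EventIds();
    Set<Long> seen = new ConcurrentSkipListSet<>();
    Thread[] threads = new Thread[4];
    for (int t = 0; t < threads.length; t++) {
      threads[t] = new Thread(() -> {
        for (int i = 0; i < 1000; i++) {
          seen.add(ids.nextEventId());
        }
      });
      threads[t].start();
    }
    for (Thread th : threads) {
      th.join();
    }
    // 4 threads x 1000 events: every ID handed out is unique.
    System.out.println(seen.size()); // 4000
  }
}
```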
[jira] [Commented] (HIVE-18588) Add 'checkin' profile that runs slower tests in standalone-metastore
[ https://issues.apache.org/jira/browse/HIVE-18588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349600#comment-16349600 ] Hive QA commented on HIVE-18588: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 58s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 20s{color} | {color:red} standalone-metastore: The patch generated 36 new + 437 unchanged - 32 fixed = 473 total (was 469) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no 
whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 11m 34s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc xml compile findbugs checkstyle | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh | | git revision | master / 32b8994 | | Default Java | 1.8.0_111 | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8978/yetus/diff-checkstyle-standalone-metastore.txt | | modules | C: standalone-metastore U: standalone-metastore | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8978/yetus.txt | | Powered by | Apache Yetus http://yetus.apache.org | This message was automatically generated. > Add 'checkin' profile that runs slower tests in standalone-metastore > > > Key: HIVE-18588 > URL: https://issues.apache.org/jira/browse/HIVE-18588 > Project: Hive > Issue Type: Test > Components: Standalone Metastore >Affects Versions: 3.0.0 >Reporter: Alan Gates >Assignee: Alan Gates >Priority: Major > Attachments: HIVE-18588.2.patch, HIVE-18588.patch > > > Runtime for unit tests in standalone-metastore is now exceeding 25 minutes. > Ideally unit tests should finish within 2-3 minutes so users will run them > frequently. 
To solve this I propose to carve off many of the slower tests to > run in a new 'checkin' profile. This profile should be run before checkin > and by the ptest infrastructure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
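A 'checkin' profile of the kind proposed above might look roughly like the sketch below in the standalone-metastore pom.xml. The property name and the exclusion pattern are illustrative assumptions, not the contents of the attached patch:

```xml
<!-- Sketch only: by default surefire skips tests matching a "slow" pattern;
     activating the profile with -Pcheckin clears the exclusion so the
     carved-off slow tests run too. -->
<properties>
  <!-- hypothetical pattern for the carved-off slow tests -->
  <test.excludes>**/Test*Slow*.java</test.excludes>
</properties>
<profiles>
  <profile>
    <id>checkin</id>
    <properties>
      <!-- pattern matching nothing: run everything before checkin -->
      <test.excludes>no-such-test</test.excludes>
    </properties>
  </profile>
</profiles>
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <configuration>
        <excludes>
          <exclude>${test.excludes}</exclude>
        </excludes>
      </configuration>
    </plugin>
  </plugins>
</build>
```

The ptest infrastructure would then invoke the module with -Pcheckin, while a plain mvn test stays fast.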
[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements
[ https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-18350: -- Attachment: HIVE-18350.10.patch > load data should rename files consistent with insert statements > --- > > Key: HIVE-18350 > URL: https://issues.apache.org/jira/browse/HIVE-18350 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal >Priority: Major > Attachments: HIVE-18350.1.patch, HIVE-18350.10.patch, > HIVE-18350.2.patch, HIVE-18350.3.patch, HIVE-18350.4.patch, > HIVE-18350.5.patch, HIVE-18350.6.patch, HIVE-18350.7.patch, > HIVE-18350.8.patch, HIVE-18350.9.patch > > > Insert statements create files of format ending with _0, 0001_0 etc. > However, the load data uses the input file name. That results in inconsistent > naming convention which makes SMB joins difficult in some scenarios and may > cause trouble for other types of queries in future. > We need consistent naming convention. > For non-bucketed table, hive renames all the files regardless of how they > were named by the user. > For bucketed table, hive relies on user to name the files matching the > bucket in non-strict mode. Hive assumes that the data belongs to same bucket > in a file. In strict mode, loading bucketed table is disabled. > This will likely affect most of the tests which load data which is pretty > significant due to which it is further divided into two subtasks for smoother > merge. > For existing tables in customer database, it is recommended to reload > bucketed tables otherwise if customer tries to run SMB join and there is a > bucket for which there is no split, then there is a possibility of getting > incorrect results. However, this is not a regression as it would happen even > without the patch. > With this patch however, and reloading data, the results should be correct. > For non-bucketed tables and external tables, there is no difference in > behavior and reloading data is not needed. 
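For context on the naming the issue refers to, insert statements name their output files by task/bucket id with a zero-padded prefix. A minimal sketch of that convention (the helper name and the exact format string are illustrative assumptions, not Hive's actual code):

```java
public class BucketFileName {
    // Hypothetical helper: formats a bucket id the way insert statements
    // name their output files, zero-padded id plus a "_0" suffix.
    static String bucketFileName(int bucketId) {
        return String.format("%06d_0", bucketId);
    }

    public static void main(String[] args) {
        System.out.println(bucketFileName(0)); // 000000_0
        System.out.println(bucketFileName(1)); // 000001_0
    }
}
```

With load data renaming input files into this same scheme, SMB joins can rely on the file name to identify the bucket regardless of how the user named the loaded files.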
[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements
[ https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-18350: -- Attachment: (was: HIVE-18350.10.patch) > load data should rename files consistent with insert statements > --- > > Key: HIVE-18350 > URL: https://issues.apache.org/jira/browse/HIVE-18350 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal >Priority: Major > Attachments: HIVE-18350.1.patch, HIVE-18350.2.patch, > HIVE-18350.3.patch, HIVE-18350.4.patch, HIVE-18350.5.patch, > HIVE-18350.6.patch, HIVE-18350.7.patch, HIVE-18350.8.patch, HIVE-18350.9.patch > > > Insert statements create files of format ending with _0, 0001_0 etc. > However, the load data uses the input file name. That results in inconsistent > naming convention which makes SMB joins difficult in some scenarios and may > cause trouble for other types of queries in future. > We need consistent naming convention. > For non-bucketed table, hive renames all the files regardless of how they > were named by the user. > For bucketed table, hive relies on user to name the files matching the > bucket in non-strict mode. Hive assumes that the data belongs to same bucket > in a file. In strict mode, loading bucketed table is disabled. > This will likely affect most of the tests which load data which is pretty > significant due to which it is further divided into two subtasks for smoother > merge. > For existing tables in customer database, it is recommended to reload > bucketed tables otherwise if customer tries to run SMB join and there is a > bucket for which there is no split, then there is a possibility of getting > incorrect results. However, this is not a regression as it would happen even > without the patch. > With this patch however, and reloading data, the results should be correct. > For non-bucketed tables and external tables, there is no difference in > behavior and reloading data is not needed. 
[jira] [Commented] (HIVE-18600) Vectorization: Top-Level Vector Expression Scratch Column Deallocation
[ https://issues.apache.org/jira/browse/HIVE-18600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349579#comment-16349579 ] Hive QA commented on HIVE-18600: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12908816/HIVE-18600.01.patch {color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 44 failed/errored test(s), 12965 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=240) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] (batchId=13) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_10] (batchId=23) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_11] (batchId=38) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_13] (batchId=52) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_14] (batchId=39) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_17] (batchId=29) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_2] (batchId=3) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_3] (batchId=77) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_7] (batchId=85) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_8] (batchId=14) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_div0] (batchId=77) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36) org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl] (batchId=175) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=152) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1] 
(batchId=172) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=167) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=171) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=161) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan] (batchId=164) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=161) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_part_project] (batchId=157) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=179) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_inner_join] (batchId=181) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join0] (batchId=181) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join1] (batchId=180) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join2] (batchId=179) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join3] (batchId=180) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join4] (batchId=182) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join5] (batchId=182) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] (batchId=122) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_case] (batchId=132) org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=221) org.apache.hadoop.hive.ql.exec.TestOperators.testNoConditionalTaskSizeForLlap (batchId=282) org.apache.hadoop.hive.ql.exec.vector.TestVectorizationContext.testArithmeticExpressionVectorization (batchId=283) 
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=256) org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB (batchId=188) org.apache.hive.hcatalog.common.TestHiveClientCache.testCacheExpiry (batchId=200) org.apache.hive.hcatalog.common.TestHiveClientCache.testCacheHit (batchId=200) org.apache.hive.hcatalog.common.TestHiveClientCache.testCacheMiss (batchId=200) org.apache.hive.hcatalog.common.TestHiveClientCache.testCloseAllClients (batchId=200) org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=234) org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=234) org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=234) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8977/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8977/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8977/ Messages: {noformat} Executing org.apache.hive.ptest.execut
[jira] [Updated] (HIVE-18607) HBase HFile write does strange things
[ https://issues.apache.org/jira/browse/HIVE-18607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-18607: Attachment: HIVE-18607.patch > HBase HFile write does strange things > - > > Key: HIVE-18607 > URL: https://issues.apache.org/jira/browse/HIVE-18607 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: HIVE-18607.patch > > > I cannot get the HBaseCliDriver to run locally, so first I'll use HiveQA to > check smth. > There's some strange code in the output handler that changes output directory > into a file because Hive supposedly wants that. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements
[ https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-18350: -- Attachment: HIVE-18350.10.patch > load data should rename files consistent with insert statements > --- > > Key: HIVE-18350 > URL: https://issues.apache.org/jira/browse/HIVE-18350 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal >Priority: Major > Attachments: HIVE-18350.1.patch, HIVE-18350.10.patch, > HIVE-18350.2.patch, HIVE-18350.3.patch, HIVE-18350.4.patch, > HIVE-18350.5.patch, HIVE-18350.6.patch, HIVE-18350.7.patch, > HIVE-18350.8.patch, HIVE-18350.9.patch > > > Insert statements create files of format ending with _0, 0001_0 etc. > However, the load data uses the input file name. That results in inconsistent > naming convention which makes SMB joins difficult in some scenarios and may > cause trouble for other types of queries in future. > We need consistent naming convention. > For non-bucketed table, hive renames all the files regardless of how they > were named by the user. > For bucketed table, hive relies on user to name the files matching the > bucket in non-strict mode. Hive assumes that the data belongs to same bucket > in a file. In strict mode, loading bucketed table is disabled. > This will likely affect most of the tests which load data which is pretty > significant due to which it is further divided into two subtasks for smoother > merge. > For existing tables in customer database, it is recommended to reload > bucketed tables otherwise if customer tries to run SMB join and there is a > bucket for which there is no split, then there is a possibility of getting > incorrect results. However, this is not a regression as it would happen even > without the patch. > With this patch however, and reloading data, the results should be correct. > For non-bucketed tables and external tables, there is no difference in > behavior and reloading data is not needed. 
[jira] [Updated] (HIVE-18607) HBase HFile write does strange things
[ https://issues.apache.org/jira/browse/HIVE-18607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-18607: Status: Patch Available (was: Open) > HBase HFile write does strange things > - > > Key: HIVE-18607 > URL: https://issues.apache.org/jira/browse/HIVE-18607 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: HIVE-18607.patch > > > I cannot get the HBaseCliDriver to run locally, so first I'll use HiveQA to > check smth. > There's some strange code in the output handler that changes output directory > into a file because Hive supposedly wants that. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements
[ https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-18350: -- Attachment: (was: HIVE-18350.10.patch) > load data should rename files consistent with insert statements > --- > > Key: HIVE-18350 > URL: https://issues.apache.org/jira/browse/HIVE-18350 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal >Priority: Major > Attachments: HIVE-18350.1.patch, HIVE-18350.10.patch, > HIVE-18350.2.patch, HIVE-18350.3.patch, HIVE-18350.4.patch, > HIVE-18350.5.patch, HIVE-18350.6.patch, HIVE-18350.7.patch, > HIVE-18350.8.patch, HIVE-18350.9.patch > > > Insert statements create files of format ending with _0, 0001_0 etc. > However, the load data uses the input file name. That results in inconsistent > naming convention which makes SMB joins difficult in some scenarios and may > cause trouble for other types of queries in future. > We need consistent naming convention. > For non-bucketed table, hive renames all the files regardless of how they > were named by the user. > For bucketed table, hive relies on user to name the files matching the > bucket in non-strict mode. Hive assumes that the data belongs to same bucket > in a file. In strict mode, loading bucketed table is disabled. > This will likely affect most of the tests which load data which is pretty > significant due to which it is further divided into two subtasks for smoother > merge. > For existing tables in customer database, it is recommended to reload > bucketed tables otherwise if customer tries to run SMB join and there is a > bucket for which there is no split, then there is a possibility of getting > incorrect results. However, this is not a regression as it would happen even > without the patch. > With this patch however, and reloading data, the results should be correct. 
> For non-bucketed tables and external tables, there is no difference in > behavior and reloading data is not needed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HIVE-18607) HBase HFile write does strange things
[ https://issues.apache.org/jira/browse/HIVE-18607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-18607: --- > HBase HFile write does strange things > - > > Key: HIVE-18607 > URL: https://issues.apache.org/jira/browse/HIVE-18607 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > > I cannot get the HBaseCliDriver to run locally, so first I'll use HiveQA to > check smth. > There's some strange code in the output handler that changes output directory > into a file because Hive supposedly wants that. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18350) load data should rename files consistent with insert statements
[ https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349540#comment-16349540 ] Deepak Jaiswal commented on HIVE-18350: --- Updated the patch based on Jason's comments. Adding [~thejas] to review. > load data should rename files consistent with insert statements > --- > > Key: HIVE-18350 > URL: https://issues.apache.org/jira/browse/HIVE-18350 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal >Priority: Major > Attachments: HIVE-18350.1.patch, HIVE-18350.10.patch, > HIVE-18350.2.patch, HIVE-18350.3.patch, HIVE-18350.4.patch, > HIVE-18350.5.patch, HIVE-18350.6.patch, HIVE-18350.7.patch, > HIVE-18350.8.patch, HIVE-18350.9.patch > > > Insert statements create files of format ending with _0, 0001_0 etc. > However, the load data uses the input file name. That results in inconsistent > naming convention which makes SMB joins difficult in some scenarios and may > cause trouble for other types of queries in future. > We need consistent naming convention. > For non-bucketed table, hive renames all the files regardless of how they > were named by the user. > For bucketed table, hive relies on user to name the files matching the > bucket in non-strict mode. Hive assumes that the data belongs to same bucket > in a file. In strict mode, loading bucketed table is disabled. > This will likely affect most of the tests which load data which is pretty > significant due to which it is further divided into two subtasks for smoother > merge. > For existing tables in customer database, it is recommended to reload > bucketed tables otherwise if customer tries to run SMB join and there is a > bucket for which there is no split, then there is a possibility of getting > incorrect results. However, this is not a regression as it would happen even > without the patch. > With this patch however, and reloading data, the results should be correct. 
> For non-bucketed tables and external tables, there is no difference in > behavior and reloading data is not needed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements
[ https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-18350: -- Attachment: HIVE-18350.10.patch > load data should rename files consistent with insert statements > --- > > Key: HIVE-18350 > URL: https://issues.apache.org/jira/browse/HIVE-18350 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal >Priority: Major > Attachments: HIVE-18350.1.patch, HIVE-18350.10.patch, > HIVE-18350.2.patch, HIVE-18350.3.patch, HIVE-18350.4.patch, > HIVE-18350.5.patch, HIVE-18350.6.patch, HIVE-18350.7.patch, > HIVE-18350.8.patch, HIVE-18350.9.patch > > > Insert statements create files of format ending with _0, 0001_0 etc. > However, the load data uses the input file name. That results in inconsistent > naming convention which makes SMB joins difficult in some scenarios and may > cause trouble for other types of queries in future. > We need consistent naming convention. > For non-bucketed table, hive renames all the files regardless of how they > were named by the user. > For bucketed table, hive relies on user to name the files matching the > bucket in non-strict mode. Hive assumes that the data belongs to same bucket > in a file. In strict mode, loading bucketed table is disabled. > This will likely affect most of the tests which load data which is pretty > significant due to which it is further divided into two subtasks for smoother > merge. > For existing tables in customer database, it is recommended to reload > bucketed tables otherwise if customer tries to run SMB join and there is a > bucket for which there is no split, then there is a possibility of getting > incorrect results. However, this is not a regression as it would happen even > without the patch. > With this patch however, and reloading data, the results should be correct. > For non-bucketed tables and external tables, there is no difference in > behavior and reloading data is not needed. 
[jira] [Commented] (HIVE-17751) Separate HMS Client and HMS server into separate sub-modules
[ https://issues.apache.org/jira/browse/HIVE-17751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349526#comment-16349526 ] Alexander Kolbasov commented on HIVE-17751: --- [~alangates] [~thejas] Would you be able to review the changes? > Separate HMS Client and HMS server into separate sub-modules > > > Key: HIVE-17751 > URL: https://issues.apache.org/jira/browse/HIVE-17751 > Project: Hive > Issue Type: Sub-task > Components: Standalone Metastore >Reporter: Vihang Karajgaonkar >Assignee: Alexander Kolbasov >Priority: Major > Attachments: HIVE-17751.06-standalone-metastore.patch > > > external applications which are interfacing with HMS should ideally only > include HMSClient library instead of one big library containing server as > well. We should ideally have a thin client library so that cross version > support for external applications is easier. We should sub-divide the > standalone module into possibly 3 modules (one for common classes, one for > client classes and one for server) or 2 sub-modules (one for client and one > for server) so that we can generate separate jars for HMS client and server. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18600) Vectorization: Top-Level Vector Expression Scratch Column Deallocation
[ https://issues.apache.org/jira/browse/HIVE-18600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349497#comment-16349497 ] Hive QA commented on HIVE-18600: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 43s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 44s{color} | {color:red} ql: The patch generated 1 new + 878 unchanged - 1 fixed = 879 total (was 879) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 13m 52s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh | | git revision | master / 32b8994 | | Default Java | 1.8.0_111 | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8977/yetus/diff-checkstyle-ql.txt | | modules | C: ql U: ql | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8977/yetus.txt | | Powered by | Apache Yetus http://yetus.apache.org | This message was automatically generated. > Vectorization: Top-Level Vector Expression Scratch Column Deallocation > -- > > Key: HIVE-18600 > URL: https://issues.apache.org/jira/browse/HIVE-18600 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Fix For: 3.0.0 > > Attachments: HIVE-18600.01.patch > > > The operators create various vector expression *arrays* for predicates, > SELECT clauses, key expressions, etc. We could have those be marked as > special "top level" vector expressions, then we could defer deallocation until > the top level expression is complete. This could be a simple solution that > avoids trying to fix our current eager deallocation that tries to reuse scratch > columns as soon as possible. It *isn't optimal*, but it *shouldn't be too > bad*. 
This solution is much better than not deallocating at all - especially > for queries that SELECT a large number of columns or have a lot of > expressions in the operator tree. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
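The deferred-deallocation idea described above, releasing scratch columns only when the top-level expression finishes instead of eagerly per sub-expression, can be sketched as below. The class and method names are hypothetical, not Hive's actual VectorizationContext API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ScratchColumnPool {
    // Columns allocated while evaluating a top-level expression are only
    // returned to the free pool once the whole expression completes.
    private final Deque<Integer> free = new ArrayDeque<>();
    private final Deque<Integer> inUse = new ArrayDeque<>();
    private int next = 0;

    int allocate() {
        Integer col = free.pollFirst();   // reuse a freed column if any
        if (col == null) {
            col = next++;                 // otherwise grow the batch
        }
        inUse.push(col);
        return col;
    }

    // Called when the top-level expression completes: release everything
    // at once, rather than tracking each sub-expression's lifetime.
    void releaseAll() {
        while (!inUse.isEmpty()) {
            free.push(inUse.pop());
        }
    }

    public static void main(String[] args) {
        ScratchColumnPool pool = new ScratchColumnPool();
        int a = pool.allocate();
        int b = pool.allocate();
        pool.releaseAll();            // deferred: both freed together
        int c = pool.allocate();      // reuses a previously freed column
        System.out.println(a + " " + b + " " + c);
    }
}
```

The trade-off is exactly the one the issue names: peak scratch-column usage is higher than with eager reuse, but no column can be reclaimed while a sub-expression still references it.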
[jira] [Commented] (HIVE-18567) ObjectStore.getPartitionNamesNoTxn doesn't handle max param properly
[ https://issues.apache.org/jira/browse/HIVE-18567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349473#comment-16349473 ] Hive QA commented on HIVE-18567: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12908804/HIVE-18567.0.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 23 failed/errored test(s), 12967 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=240) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=49) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] (batchId=13) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36) org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl] (batchId=175) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1] (batchId=172) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=167) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=171) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=161) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan] (batchId=164) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=161) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_input_format_excludes] (batchId=163) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] (batchId=122) org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=221) org.apache.hadoop.hive.metastore.client.TestAddPartitions.testAddPartitionsNullColTypeInSd[Embedded] (batchId=206) 
org.apache.hadoop.hive.metastore.client.TestDatabases.testGetAllDatabases[Embedded] (batchId=213) org.apache.hadoop.hive.ql.exec.TestOperators.testNoConditionalTaskSizeForLlap (batchId=282) org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=256) org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB (batchId=188) org.apache.hive.hcatalog.listener.TestDbNotificationListener.dropDatabase (batchId=242) org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=234) org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=234) org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=234) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8976/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8976/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8976/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 23 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12908804 - PreCommit-HIVE-Build > ObjectStore.getPartitionNamesNoTxn doesn't handle max param properly > > > Key: HIVE-18567 > URL: https://issues.apache.org/jira/browse/HIVE-18567 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Adam Szita >Assignee: Adam Szita >Priority: Major > Attachments: HIVE-18567.0.patch > > > As per [this HMS API test > case|https://github.com/apache/hive/commit/fa0a8d27d4149cc5cc2dbb49d8eb6b03f46bc279#diff-25c67d898000b53e623a6df9221aad5dR1044] > listing partition names doesn't check the max param against > MetaStoreConf.LIMIT_PARTITION_REQUEST (as other methods do by > checkLimitNumberOfPartitionsByFilter), and also behaves differently on the max=0 > setting compared to other methods. > We should bring this into consistency. 
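The kind of consistency check the issue asks for, clamping a caller-supplied max against a configured limit such as MetaStoreConf.LIMIT_PARTITION_REQUEST, can be sketched as follows. The method name and the negative-means-unlimited convention are assumptions for illustration, not the actual ObjectStore code:

```java
public class PartitionLimit {
    // Hypothetical sketch: derive the effective result-size cap from the
    // caller's max and the server-side configured limit. A negative max is
    // treated as "no caller limit", so only the configured limit applies.
    static int effectiveMax(int requestedMax, int configuredLimit) {
        if (requestedMax < 0) {
            return configuredLimit;
        }
        return Math.min(requestedMax, configuredLimit);
    }

    public static void main(String[] args) {
        System.out.println(effectiveMax(-1, 1000));   // no caller limit
        System.out.println(effectiveMax(50, 1000));   // caller stricter
        System.out.println(effectiveMax(5000, 1000)); // server stricter
    }
}
```

Applying one shared helper like this across the listing methods would also resolve the divergent max=0 behavior the test case exposed.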
[jira] [Commented] (HIVE-18596) Synchronize value of hive.spark.client.connect.timeout across unit tests
[ https://issues.apache.org/jira/browse/HIVE-18596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349442#comment-16349442 ] Sahil Takiar commented on HIVE-18596: - [~pvary] could you take a look? > Synchronize value of hive.spark.client.connect.timeout across unit tests > > > Key: HIVE-18596 > URL: https://issues.apache.org/jira/browse/HIVE-18596 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HIVE-18596.1.patch > > > {{hive.spark.client.connect.timeout}} is set to 30 seconds for > {{TestMiniSparkOnYarnCliDriver}} but it left at the default value for all > other tests. We should use the same value (30 seconds) for all other tests. > We have seen flaky tests due to failure to establish the remote within the > allotted timeout. This could be due to the fact that we run our tests in the > cloud so maybe occasional network delays are more common. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
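Synchronizing the value across tests would presumably mean one shared entry in the test configuration; a minimal sketch of what that could look like in a test hive-site.xml, assuming the 30-second value the issue proposes (the exact duration syntax is an assumption):

```xml
<!-- Sketch: pin the Spark client connect timeout for all unit tests,
     matching the value TestMiniSparkOnYarnCliDriver already uses. -->
<property>
  <name>hive.spark.client.connect.timeout</name>
  <value>30000ms</value>
</property>
```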
[jira] [Commented] (HIVE-18599) CREATE TEMPORARY TABLE AS SELECT(CTTAS) on Micromanaged table does not write data
[ https://issues.apache.org/jira/browse/HIVE-18599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349390#comment-16349390 ] Steve Yeom commented on HIVE-18599: --- Hi [~gopalv], Can you take a look at patch 3? The test failures by patch 1 are related to null value, which is fixed at patch 2. patch 3 has q file test. > CREATE TEMPORARY TABLE AS SELECT(CTTAS) on Micromanaged table does not write > data > - > > Key: HIVE-18599 > URL: https://issues.apache.org/jira/browse/HIVE-18599 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Steve Yeom >Assignee: Steve Yeom >Priority: Major > Attachments: HIVE-18599.01.patch, HIVE-18599.02.patch, > HIVE-18599.03.patch > > > CTTAS on temporary micromanaged table does not write data. > I.e., "SELECT * FROM ctas0_mm;" does not return any rows from the below > script: > > set hive.mapred.mode=nonstrict; > set hive.explain.user=false; > set hive.fetch.task.conversion=none; > set tez.grouping.min-size=1; > set tez.grouping.max-size=2; > set hive.exec.dynamic.partition.mode=nonstrict; > set hive.support.concurrency=true; > set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; > > drop table intermediate; > create table intermediate(key int) partitioned by (p int) stored as orc; > insert into table intermediate partition(p='455') select distinct key from > src where key >= 0 order by key desc limit 2; > insert into table intermediate partition(p='456') select distinct key from > src where key is not null order by key asc limit 2; > insert into table intermediate partition(p='457') select distinct key from > src where key >= 100 order by key asc limit 2; > > drop table ctas0_mm; > explain create temporary table ctas0_mm tblproperties > ("transactional"="true", "transactional_properties"="insert_only") as select > * from intermediate; > create temporary table ctas0_mm tblproperties ("transactional"="true", > 
"transactional_properties"="insert_only") as select * from intermediate; > > select * from ctas0_mm; > drop table ctas0_mm; > drop table intermediate; -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18599) CREATE TEMPORARY TABLE AS SELECT(CTTAS) on Micromanaged table does not write data
[ https://issues.apache.org/jira/browse/HIVE-18599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Yeom updated HIVE-18599: -- Attachment: HIVE-18599.03.patch > CREATE TEMPORARY TABLE AS SELECT(CTTAS) on Micromanaged table does not write > data > - > > Key: HIVE-18599 > URL: https://issues.apache.org/jira/browse/HIVE-18599 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Steve Yeom >Assignee: Steve Yeom >Priority: Major > Attachments: HIVE-18599.01.patch, HIVE-18599.02.patch, > HIVE-18599.03.patch > > > CTTAS on temporary micromanaged table does not write data. > I.e., "SELECT * FROM ctas0_mm;" does not return any rows from the below > script: > > set hive.mapred.mode=nonstrict; > set hive.explain.user=false; > set hive.fetch.task.conversion=none; > set tez.grouping.min-size=1; > set tez.grouping.max-size=2; > set hive.exec.dynamic.partition.mode=nonstrict; > set hive.support.concurrency=true; > set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; > > drop table intermediate; > create table intermediate(key int) partitioned by (p int) stored as orc; > insert into table intermediate partition(p='455') select distinct key from > src where key >= 0 order by key desc limit 2; > insert into table intermediate partition(p='456') select distinct key from > src where key is not null order by key asc limit 2; > insert into table intermediate partition(p='457') select distinct key from > src where key >= 100 order by key asc limit 2; > > drop table ctas0_mm; > explain create temporary table ctas0_mm tblproperties > ("transactional"="true", "transactional_properties"="insert_only") as select > * from intermediate; > create temporary table ctas0_mm tblproperties ("transactional"="true", > "transactional_properties"="insert_only") as select * from intermediate; > > select * from ctas0_mm; > drop table ctas0_mm; > drop table intermediate; -- This message was sent by Atlassian JIRA 
(v7.6.3#76005)
[jira] [Commented] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.
[ https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349380#comment-16349380 ] Aihua Xu commented on HIVE-14792: - Thanks [~mithun] That's also what I guess. :) > AvroSerde reads the remote schema-file at least once per mapper, per table > reference. > - > > Key: HIVE-14792 > URL: https://issues.apache.org/jira/browse/HIVE-14792 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1, 2.1.0 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan >Priority: Major > Labels: TODOC2.2, TODOC2.4 > Fix For: 3.0.0, 2.4.0, 2.2.1 > > Attachments: HIVE-14792.1.patch, HIVE-14792.3.patch > > > Avro tables that use "external" schema files stored on HDFS can cause > excessive calls to {{FileSystem::open()}}, especially for queries that spawn > large numbers of mappers. > This is because of the following code in {{AvroSerDe::initialize()}}: > {code:title=AvroSerDe.java|borderStyle=solid} > public void initialize(Configuration configuration, Properties properties) > throws SerDeException { > // ... > if (hasExternalSchema(properties) > || columnNameProperty == null || columnNameProperty.isEmpty() > || columnTypeProperty == null || columnTypeProperty.isEmpty()) { > schema = determineSchemaOrReturnErrorSchema(configuration, properties); > } else { > // Get column names and sort order > columnNames = Arrays.asList(columnNameProperty.split(",")); > columnTypes = > TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty); > schema = getSchemaFromCols(properties, columnNames, columnTypes, > columnCommentProperty); > > properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(), > schema.toString()); > } > // ... > } > {code} > For tables using {{avro.schema.url}}, every time the SerDe is initialized > (i.e. at least once per mapper), the schema file is read remotely. For > queries with thousands of mappers, this leads to a stampede to the handful > (3?) 
datanodes that host the schema-file. In the best case, this causes > slowdowns. > It would be preferable to distribute the Avro-schema to all mappers as part > of the job-conf. The alternatives aren't exactly appealing: > # One can't rely solely on the {{column.list.types}} stored in the Hive > metastore. (HIVE-14789). > # {{avro.schema.literal}} might not always be usable, because of the > size-limit on table-parameters. The typical size of the Avro-schema file is > between 0.5-3MB, in my limited experience. Bumping the max table-parameter > size isn't a great solution. > If the {{avro.schema.file}} were read during query-planning, and made > available as part of table-properties (but not serialized into the > metastore), the downstream logic will remain largely intact. I have a patch > that does this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
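The approach proposed in the description — resolve the remote schema once, then carry it forward as a literal property so later SerDe initializations never touch the remote file — can be illustrated with a standalone sketch. This is not Hive's actual patch: the class and helper names are hypothetical, a plain String stands in for the Avro Schema object, and a local file stands in for the HDFS URL.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

public class SchemaPropagationSketch {
    // Hypothetical property keys, mirroring avro.schema.url / avro.schema.literal.
    static final String SCHEMA_URL = "avro.schema.url";
    static final String SCHEMA_LITERAL = "avro.schema.literal";

    /** Reads the schema file only if no literal has been cached yet. */
    static String resolveSchema(Properties tableProps) throws IOException {
        String literal = tableProps.getProperty(SCHEMA_LITERAL);
        if (literal != null) {
            return literal; // cheap path: no filesystem access at all
        }
        String url = tableProps.getProperty(SCHEMA_URL);
        literal = new String(Files.readAllBytes(Paths.get(url)), "UTF-8");
        // Pin the resolved schema into the properties, so every later
        // initialization (e.g. one per mapper) takes the cheap path above.
        tableProps.setProperty(SCHEMA_LITERAL, literal);
        return literal;
    }
}
```

If the resolution happens once at query-planning time and the enriched properties ride along in the job-conf, the per-mapper stampede on the schema-file's datanodes disappears.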
[jira] [Commented] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.
[ https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349374#comment-16349374 ] Mithun Radhakrishnan commented on HIVE-14792: - Terribly sorry for the delay, [~aihuaxu]. Yes, of course. The default should remain false. I'd changed it to true to see if the tests were affected by it. This patch seems to cause test failures on trunk. I'll sort that out and post an update. Thanks for reviewing, [~aihuaxu]. > AvroSerde reads the remote schema-file at least once per mapper, per table > reference. > - > > Key: HIVE-14792 > URL: https://issues.apache.org/jira/browse/HIVE-14792 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1, 2.1.0 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan >Priority: Major > Labels: TODOC2.2, TODOC2.4 > Fix For: 3.0.0, 2.4.0, 2.2.1 > > Attachments: HIVE-14792.1.patch, HIVE-14792.3.patch > > > Avro tables that use "external" schema files stored on HDFS can cause > excessive calls to {{FileSystem::open()}}, especially for queries that spawn > large numbers of mappers. > This is because of the following code in {{AvroSerDe::initialize()}}: > {code:title=AvroSerDe.java|borderStyle=solid} > public void initialize(Configuration configuration, Properties properties) > throws SerDeException { > // ... > if (hasExternalSchema(properties) > || columnNameProperty == null || columnNameProperty.isEmpty() > || columnTypeProperty == null || columnTypeProperty.isEmpty()) { > schema = determineSchemaOrReturnErrorSchema(configuration, properties); > } else { > // Get column names and sort order > columnNames = Arrays.asList(columnNameProperty.split(",")); > columnTypes = > TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty); > schema = getSchemaFromCols(properties, columnNames, columnTypes, > columnCommentProperty); > > properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(), > schema.toString()); > } > // ... 
> } > {code} > For tables using {{avro.schema.url}}, every time the SerDe is initialized > (i.e. at least once per mapper), the schema file is read remotely. For > queries with thousands of mappers, this leads to a stampede to the handful > (3?) datanodes that host the schema-file. In the best case, this causes > slowdowns. > It would be preferable to distribute the Avro-schema to all mappers as part > of the job-conf. The alternatives aren't exactly appealing: > # One can't rely solely on the {{column.list.types}} stored in the Hive > metastore. (HIVE-14789). > # {{avro.schema.literal}} might not always be usable, because of the > size-limit on table-parameters. The typical size of the Avro-schema file is > between 0.5-3MB, in my limited experience. Bumping the max table-parameter > size isn't a great solution. > If the {{avro.schema.file}} were read during query-planning, and made > available as part of table-properties (but not serialized into the > metastore), the downstream logic will remain largely intact. I have a patch > that does this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18567) ObjectStore.getPartitionNamesNoTxn doesn't handle max param properly
[ https://issues.apache.org/jira/browse/HIVE-18567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349365#comment-16349365 ] Hive QA commented on HIVE-18567: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 1s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 17s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 23s{color} | {color:red} standalone-metastore: The patch generated 1 new + 761 unchanged - 2 fixed = 762 total (was 763) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace 
issues. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 11m 55s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh | | git revision | master / 32b8994 | | Default Java | 1.8.0_111 | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8976/yetus/diff-checkstyle-standalone-metastore.txt | | modules | C: standalone-metastore U: standalone-metastore | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8976/yetus.txt | | Powered by | Apache Yetus http://yetus.apache.org | This message was automatically generated. > ObjectStore.getPartitionNamesNoTxn doesn't handle max param properly > > > Key: HIVE-18567 > URL: https://issues.apache.org/jira/browse/HIVE-18567 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Adam Szita >Assignee: Adam Szita >Priority: Major > Attachments: HIVE-18567.0.patch > > > As per [this HMS API test > case|https://github.com/apache/hive/commit/fa0a8d27d4149cc5cc2dbb49d8eb6b03f46bc279#diff-25c67d898000b53e623a6df9221aad5dR1044] > listing partition names doesn't check the max param against > MetaStoreConf.LIMIT_PARTITION_REQUEST (as other methods do by > checkLimitNumberOfPartitionsByFilter), and also behaves differently on the max=0 > setting compared to other methods. > We should make this consistent. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
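The consistency rule described in the issue — validate the caller's max against a configured request limit, and treat max=0 the same way everywhere — might look roughly like this. All names here are hypothetical; this is not the actual ObjectStore code.

```java
import java.util.List;

public class PartitionLimitSketch {
    /**
     * Resolves the max the caller asked for against a server-side limit
     * (e.g. a metastore.limit.partition.request-style config; -1 means
     * "no limit configured"). Oversized requests are refused, the way
     * checkLimitNumberOfPartitionsByFilter-style checks do.
     */
    static int effectiveMax(int requestedMax, int configuredLimit) {
        if (requestedMax < 0) {
            // caller asked for "all": fall back to the configured limit
            return configuredLimit;
        }
        if (configuredLimit >= 0 && requestedMax > configuredLimit) {
            throw new IllegalArgumentException(
                "Requested " + requestedMax + " partitions, limit is " + configuredLimit);
        }
        return requestedMax; // note: max == 0 uniformly means "return none"
    }

    static List<String> listPartitionNames(List<String> all, int requestedMax,
                                           int configuredLimit) {
        int max = effectiveMax(requestedMax, configuredLimit);
        if (max < 0 || max >= all.size()) {
            return all; // no effective cap, or cap exceeds result size
        }
        return all.subList(0, max);
    }
}
```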
[jira] [Commented] (HIVE-18588) Add 'checkin' profile that runs slower tests in standalone-metastore
[ https://issues.apache.org/jira/browse/HIVE-18588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349359#comment-16349359 ] Alan Gates commented on HIVE-18588: --- Attached a new version of the patch that removes the commented out profile in the pom and sets the surefire version to 2.20.1. > Add 'checkin' profile that runs slower tests in standalone-metastore > > > Key: HIVE-18588 > URL: https://issues.apache.org/jira/browse/HIVE-18588 > Project: Hive > Issue Type: Test > Components: Standalone Metastore >Affects Versions: 3.0.0 >Reporter: Alan Gates >Assignee: Alan Gates >Priority: Major > Attachments: HIVE-18588.2.patch, HIVE-18588.patch > > > Runtime for unit tests in standalone-metastore are now exceeding 25 minutes. > Ideally unit tests should finish within 2-3 minutes so users will run them > frequently. To solve this I propose to carve off many of the slower tests to > run in a new 'checkin' profile. This profile should be run before checkin > and by the ptest infrastructure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18588) Add 'checkin' profile that runs slower tests in standalone-metastore
[ https://issues.apache.org/jira/browse/HIVE-18588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-18588: -- Attachment: HIVE-18588.2.patch > Add 'checkin' profile that runs slower tests in standalone-metastore > > > Key: HIVE-18588 > URL: https://issues.apache.org/jira/browse/HIVE-18588 > Project: Hive > Issue Type: Test > Components: Standalone Metastore >Affects Versions: 3.0.0 >Reporter: Alan Gates >Assignee: Alan Gates >Priority: Major > Attachments: HIVE-18588.2.patch, HIVE-18588.patch > > > Runtime for unit tests in standalone-metastore are now exceeding 25 minutes. > Ideally unit tests should finish within 2-3 minutes so users will run them > frequently. To solve this I propose to carve off many of the slower tests to > run in a new 'checkin' profile. This profile should be run before checkin > and by the ptest infrastructure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.
[ https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349351#comment-16349351 ] Aihua Xu commented on HIVE-14792: - [~mithun] Ping. Can we keep the default behavior to false still? Otherwise the change looks good. Basically your patch is to read the properties from SerDe as well in addition to the table level properties, right? > AvroSerde reads the remote schema-file at least once per mapper, per table > reference. > - > > Key: HIVE-14792 > URL: https://issues.apache.org/jira/browse/HIVE-14792 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1, 2.1.0 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan >Priority: Major > Labels: TODOC2.2, TODOC2.4 > Fix For: 3.0.0, 2.4.0, 2.2.1 > > Attachments: HIVE-14792.1.patch, HIVE-14792.3.patch > > > Avro tables that use "external" schema files stored on HDFS can cause > excessive calls to {{FileSystem::open()}}, especially for queries that spawn > large numbers of mappers. > This is because of the following code in {{AvroSerDe::initialize()}}: > {code:title=AvroSerDe.java|borderStyle=solid} > public void initialize(Configuration configuration, Properties properties) > throws SerDeException { > // ... > if (hasExternalSchema(properties) > || columnNameProperty == null || columnNameProperty.isEmpty() > || columnTypeProperty == null || columnTypeProperty.isEmpty()) { > schema = determineSchemaOrReturnErrorSchema(configuration, properties); > } else { > // Get column names and sort order > columnNames = Arrays.asList(columnNameProperty.split(",")); > columnTypes = > TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty); > schema = getSchemaFromCols(properties, columnNames, columnTypes, > columnCommentProperty); > > properties.setProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(), > schema.toString()); > } > // ... 
> } > {code} > For tables using {{avro.schema.url}}, every time the SerDe is initialized > (i.e. at least once per mapper), the schema file is read remotely. For > queries with thousands of mappers, this leads to a stampede to the handful > (3?) datanodes that host the schema-file. In the best case, this causes > slowdowns. > It would be preferable to distribute the Avro-schema to all mappers as part > of the job-conf. The alternatives aren't exactly appealing: > # One can't rely solely on the {{column.list.types}} stored in the Hive > metastore. (HIVE-14789). > # {{avro.schema.literal}} might not always be usable, because of the > size-limit on table-parameters. The typical size of the Avro-schema file is > between 0.5-3MB, in my limited experience. Bumping the max table-parameter > size isn't a great solution. > If the {{avro.schema.file}} were read during query-planning, and made > available as part of table-properties (but not serialized into the > metastore), the downstream logic will remain largely intact. I have a patch > that does this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18581) Replication events should use lower case db object names
[ https://issues.apache.org/jira/browse/HIVE-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349335#comment-16349335 ] Hive QA commented on HIVE-18581: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12908768/HIVE-18581.1.patch {color:green}SUCCESS:{color} +1 due to 13 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 12965 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=240) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] (batchId=13) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36) org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl] (batchId=175) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=152) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1] (batchId=172) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=167) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=171) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=161) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan] (batchId=164) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=161) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_input_format_excludes] (batchId=163) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] (batchId=122) org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=221) org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableNullStorageDescriptorInNew[Embedded] (batchId=206) 
org.apache.hadoop.hive.ql.exec.TestOperators.testNoConditionalTaskSizeForLlap (batchId=282) org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=256) org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB (batchId=188) org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapNullKey[3] (batchId=193) org.apache.hive.hcatalog.pig.TestSequenceFileHCatStorer.testWriteDate3 (batchId=193) org.apache.hive.hcatalog.pig.TestSequenceFileHCatStorer.testWriteSmallint (batchId=193) org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=234) org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=234) org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=234) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8974/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8974/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8974/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 24 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12908768 - PreCommit-HIVE-Build > Replication events should use lower case db object names > > > Key: HIVE-18581 > URL: https://issues.apache.org/jira/browse/HIVE-18581 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: anishek >Assignee: anishek >Priority: Minor > Labels: pull-request-available > Attachments: HIVE-18581.0.patch, HIVE-18581.1.patch > > > events generated by replication should include the database / table / > partition / function names in lower case. this will save other > applications from having to do explicit case-insensitive matching of objects by name. 
> in Hive all db object names as specified above are explicitly converted to > lower case when comparing objects of the same type. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
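The convention described in the issue — normalize db object names to lower case once, at the event-generation boundary, instead of relying on every consumer to case-fold — is simple to sketch. The helper below is hypothetical, not the actual replication-event code.

```java
import java.util.Locale;

public class NameNormalizerSketch {
    /** Normalizes a database/table/partition/function name for emission in events. */
    static String normalize(String dbObjectName) {
        if (dbObjectName == null) {
            return null;
        }
        // Locale.ROOT avoids locale surprises such as the Turkish dotless-i mapping.
        return dbObjectName.trim().toLowerCase(Locale.ROOT);
    }
}
```

With this applied at the producer, consumers can compare event names with plain equality.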
[jira] [Commented] (HIVE-18581) Replication events should use lower case db object names
[ https://issues.apache.org/jira/browse/HIVE-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349267#comment-16349267 ] Hive QA commented on HIVE-18581: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 27s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 37s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 35s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 55s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 24s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 34s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle 
{color} | {color:red} 0m 37s{color} | {color:red} ql: The patch generated 36 new + 12 unchanged - 0 fixed = 48 total (was 12) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} itests/hive-unit: The patch generated 0 new + 643 unchanged - 2 fixed = 643 total (was 645) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 13s{color} | {color:red} The patch generated 11 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 17m 24s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh | | git revision | master / 32b8994 | | Default Java | 1.8.0_111 | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8974/yetus/diff-checkstyle-ql.txt | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-8974/yetus/patch-asflicense-problems.txt | | modules | C: ql itests/hive-unit U: . | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8974/yetus.txt | | Powered by | Apache Yetus http://yetus.apache.org | This message was automatically generated. 
> Replication events should use lower case db object names > > > Key: HIVE-18581 > URL: https://issues.apache.org/jira/browse/HIVE-18581 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: anishek >Assignee: anishek >Priority: Minor > Labels: pull-request-available > Attachments: HIVE-18581.0.patch, HIVE-18581.1.patch > > > events generated by replication should include the database / table / > partition / function names in lower case. this will save other > applications from having to do explicit case-insensitive matching of objects by name. > in Hive all db object names as specified above are explicitly converted to > lower case when comparing objects of the same type. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-13981) Operation.toSQLException eats full exception stack
[ https://issues.apache.org/jira/browse/HIVE-13981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-13981: -- Attachment: HIVE-13981.2.patch > Operation.toSQLException eats full exception stack > -- > > Key: HIVE-13981 > URL: https://issues.apache.org/jira/browse/HIVE-13981 > Project: Hive > Issue Type: Bug >Reporter: Daniel Dai >Assignee: Daniel Dai >Priority: Major > Attachments: HIVE-13981.1.patch, HIVE-13981.2.patch > > > Operation.toSQLException eats half of the exception stack and makes debugging > hard. For example, we saw an exception: > {code} > org.apache.hive.service.cli.HiveSQLException: Error while compiling > statement: FAILED: NullPointerException null > at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:336) > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:113) > at > org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:182) > at org.apache.hive.service.cli.operation.Operation.run(Operation.java:278) > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:421) > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:408) > at > org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:276) > at > org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:505) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1317) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1302) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:562) > at > 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException > {code} > The real stack causing the NPE is lost. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
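The bug described above comes down to wrapping an internal Throwable without chaining it as the cause, so the original stack trace never reaches the logs. A minimal standalone sketch of the lossy pattern and the fix (hypothetical helper names, not Hive's actual Operation code):

```java
import java.sql.SQLException;

public class ToSqlExceptionSketch {
    // Lossy version: only the message survives; the NPE's own stack is gone.
    static SQLException lossy(Throwable t) {
        return new SQLException("Error while compiling statement: " + t);
    }

    // Fixed version: pass the original throwable as the cause, so
    // callers logging the SQLException see the full "Caused by" chain.
    static SQLException withCause(Throwable t) {
        return new SQLException("Error while compiling statement: " + t, t);
    }
}
```

Anything that later calls `printStackTrace()` or logs the exception will print the chained cause automatically, which is exactly the stack that is missing in the report above.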
[jira] [Commented] (HIVE-13981) Operation.toSQLException eats full exception stack
[ https://issues.apache.org/jira/browse/HIVE-13981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349263#comment-16349263 ] Daniel Dai commented on HIVE-13981: --- Trying to get this in; rerunning ptest. > Operation.toSQLException eats full exception stack > -- > > Key: HIVE-13981 > URL: https://issues.apache.org/jira/browse/HIVE-13981 > Project: Hive > Issue Type: Bug >Reporter: Daniel Dai >Assignee: Daniel Dai >Priority: Major > Attachments: HIVE-13981.1.patch, HIVE-13981.2.patch > > > Operation.toSQLException eats half of the exception stack and makes debugging > hard. For example, we saw an exception: > {code} > org.apache.hive.service.cli.HiveSQLException: Error while compiling > statement: FAILED: NullPointerException null > at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:336) > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:113) > at > org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:182) > at org.apache.hive.service.cli.operation.Operation.run(Operation.java:278) > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:421) > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:408) > at > org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:276) > at > org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:505) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1317) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1302) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:562) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException > {code} > The real stack causing the NPE is lost. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18350) load data should rename files consistent with insert statements
[ https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349250#comment-16349250 ] Deepak Jaiswal commented on HIVE-18350: --- [~jdere] [~gopalv] [~ekoifman] can you please review? > load data should rename files consistent with insert statements > --- > > Key: HIVE-18350 > URL: https://issues.apache.org/jira/browse/HIVE-18350 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal >Priority: Major > Attachments: HIVE-18350.1.patch, HIVE-18350.2.patch, > HIVE-18350.3.patch, HIVE-18350.4.patch, HIVE-18350.5.patch, > HIVE-18350.6.patch, HIVE-18350.7.patch, HIVE-18350.8.patch, HIVE-18350.9.patch > > > Insert statements create files with names ending in _0, 0001_0, etc. > However, load data uses the input file name. That results in an inconsistent > naming convention, which makes SMB joins difficult in some scenarios and may > cause trouble for other types of queries in the future. > We need a consistent naming convention. > For non-bucketed tables, Hive renames all the files regardless of how they > were named by the user. > For bucketed tables, Hive relies on the user to name the files matching the > bucket in non-strict mode. Hive assumes that the data in a file belongs to the > same bucket. In strict mode, loading a bucketed table is disabled. > This will likely affect most of the tests that load data, which is significant > enough that the work is further divided into two subtasks for a smoother > merge. > For existing tables in a customer database, it is recommended to reload > bucketed tables; otherwise, if a customer runs an SMB join and there is a > bucket for which there is no split, there is a possibility of getting > incorrect results. However, this is not a regression, as it would happen even > without the patch. > With this patch and reloaded data, however, the results should be correct. > For non-bucketed tables and external tables, there is no difference in > behavior, and reloading data is not needed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
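The naming convention discussed above can be sketched as follows: insert statements produce file names of the form bucket-id plus copy suffix (e.g. 000001_0), so a loader that renames arbitrary input files to this pattern keeps bucket inference (as SMB joins need) working. This is an illustrative sketch only; the regex, zero-padding width, and helper names are assumptions, not Hive's actual implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BucketFileName {
    // Assumed bucket file pattern: digits, underscore, copy index.
    private static final Pattern BUCKET = Pattern.compile("^(\\d+)_(\\d+)$");

    // Render a bucket id + copy index in insert-statement style
    // (6-digit zero padding is an illustrative choice).
    static String rename(int bucketId, int copy) {
        return String.format("%06d_%d", bucketId, copy);
    }

    // Recover the bucket id from a conforming file name, or -1 if the
    // name does not follow the convention (e.g. a user-supplied file).
    static int bucketOf(String fileName) {
        Matcher m = BUCKET.matcher(fileName);
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        if (!rename(1, 0).equals("000001_0")) throw new AssertionError();
        if (bucketOf("0001_0") != 1) throw new AssertionError();
        if (bucketOf("user_data.txt") != -1) throw new AssertionError();
        System.out.println("bucket naming round-trips");
    }
}
```

A name like user_data.txt carries no bucket id, which is why renaming on load (rather than trusting the user's file names) makes bucket membership recoverable.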
[jira] [Commented] (HIVE-18516) load data should rename files consistent with insert statements for ACID Tables
[ https://issues.apache.org/jira/browse/HIVE-18516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349249#comment-16349249 ] Deepak Jaiswal commented on HIVE-18516: --- [~jdere] [~ekoifman] can you please review? > load data should rename files consistent with insert statements for ACID > Tables > --- > > Key: HIVE-18516 > URL: https://issues.apache.org/jira/browse/HIVE-18516 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal >Priority: Major > Attachments: HIVE-18516.1.patch, HIVE-18516.10.patch, > HIVE-18516.2.patch, HIVE-18516.3.patch, HIVE-18516.4.patch, > HIVE-18516.5.patch, HIVE-18516.6.patch, HIVE-18516.7.patch, > HIVE-18516.8.patch, HIVE-18516.9.patch > > > h1. load data should rename files consistent with insert statements for ACID > Tables. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HIVE-18350) load data should rename files consistent with insert statements
[ https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349133#comment-16349133 ] Deepak Jaiswal edited comment on HIVE-18350 at 2/1/18 8:48 PM: --- Please ignore the failure of the smb_mapjoin_7 test; its fix is coming in HIVE-18516. Updated the results for the bucket_mapjoin_mismatch1 test. was (Author: djaiswal): Please ignore the failure of the smb_mapjoin_7 test; its fix is coming in HIVE-18516. > load data should rename files consistent with insert statements > --- > > Key: HIVE-18350 > URL: https://issues.apache.org/jira/browse/HIVE-18350 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal >Priority: Major > Attachments: HIVE-18350.1.patch, HIVE-18350.2.patch, > HIVE-18350.3.patch, HIVE-18350.4.patch, HIVE-18350.5.patch, > HIVE-18350.6.patch, HIVE-18350.7.patch, HIVE-18350.8.patch, HIVE-18350.9.patch > > > Insert statements create files with names ending in _0, 0001_0, etc. > However, load data uses the input file name. That results in an inconsistent > naming convention, which makes SMB joins difficult in some scenarios and may > cause trouble for other types of queries in the future. > We need a consistent naming convention. > For non-bucketed tables, Hive renames all the files regardless of how they > were named by the user. > For bucketed tables, Hive relies on the user to name the files matching the > bucket in non-strict mode. Hive assumes that the data in a file belongs to the > same bucket. In strict mode, loading a bucketed table is disabled. > This will likely affect most of the tests that load data, which is significant > enough that the work is further divided into two subtasks for a smoother > merge. > For existing tables in a customer database, it is recommended to reload > bucketed tables; otherwise, if a customer runs an SMB join and there is a > bucket for which there is no split, there is a possibility of getting > incorrect results. However, this is not a regression, as it would happen even > without the patch. > With this patch and reloaded data, however, the results should be correct. > For non-bucketed tables and external tables, there is no difference in > behavior, and reloading data is not needed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements
[ https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-18350: -- Attachment: HIVE-18350.9.patch > load data should rename files consistent with insert statements > --- > > Key: HIVE-18350 > URL: https://issues.apache.org/jira/browse/HIVE-18350 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal >Priority: Major > Attachments: HIVE-18350.1.patch, HIVE-18350.2.patch, > HIVE-18350.3.patch, HIVE-18350.4.patch, HIVE-18350.5.patch, > HIVE-18350.6.patch, HIVE-18350.7.patch, HIVE-18350.8.patch, HIVE-18350.9.patch > > > Insert statements create files with names ending in _0, 0001_0, etc. > However, load data uses the input file name. That results in an inconsistent > naming convention, which makes SMB joins difficult in some scenarios and may > cause trouble for other types of queries in the future. > We need a consistent naming convention. > For non-bucketed tables, Hive renames all the files regardless of how they > were named by the user. > For bucketed tables, Hive relies on the user to name the files matching the > bucket in non-strict mode. Hive assumes that the data in a file belongs to the > same bucket. In strict mode, loading a bucketed table is disabled. > This will likely affect most of the tests that load data, which is significant > enough that the work is further divided into two subtasks for a smoother > merge. > For existing tables in a customer database, it is recommended to reload > bucketed tables; otherwise, if a customer runs an SMB join and there is a > bucket for which there is no split, there is a possibility of getting > incorrect results. However, this is not a regression, as it would happen even > without the patch. > With this patch and reloaded data, however, the results should be correct. > For non-bucketed tables and external tables, there is no difference in > behavior, and reloading data is not needed. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18601) Support Power platform by updating protoc-jar-maven-plugin version
[ https://issues.apache.org/jira/browse/HIVE-18601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349214#comment-16349214 ] Hive QA commented on HIVE-18601: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12908759/HIVE-18601.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 22 failed/errored test(s), 12965 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=240) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36) org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl] (batchId=175) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=152) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1] (batchId=172) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=167) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=171) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=161) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan] (batchId=164) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=161) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_input_format_excludes] (batchId=163) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucketizedhiveinputformat] (batchId=180) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] (batchId=122) org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=221) org.apache.hadoop.hive.metastore.client.TestGetPartitions.testGetPartitionWithAuthInfoNoDbName[Embedded] (batchId=206) 
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testFailureOnAlteringTransactionalProperties (batchId=290) org.apache.hadoop.hive.ql.exec.TestOperators.testNoConditionalTaskSizeForLlap (batchId=282) org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=256) org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB (batchId=188) org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=234) org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=234) org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=234) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8973/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8973/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8973/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 22 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12908759 - PreCommit-HIVE-Build > Support Power platform by updating protoc-jar-maven-plugin version > -- > > Key: HIVE-18601 > URL: https://issues.apache.org/jira/browse/HIVE-18601 > Project: Hive > Issue Type: Bug > Components: Standalone Metastore >Affects Versions: 3.0.0 > Environment: # uname -a > Linux pts00607-vm16 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:05:18 UTC > 2016 ppc64le ppc64le ppc64le GNU/Linux > # cat /etc/lsb-release > DISTRIB_ID=Ubuntu > DISTRIB_RELEASE=16.04 > DISTRIB_CODENAME=xenial > DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS" >Reporter: Pravin Dsilva >Assignee: Pravin Dsilva >Priority: Major > Attachments: HIVE-18601.patch > > > The below error is seen while building the standalone-metastore project: > {code:java} > [INFO] --- protoc-jar-maven-plugin:3.0.0-a3:run (default) @ hive-standalone-metastore --- > [INFO] Protoc version: 2.5.0 > [INFO] Input directories: > [INFO] /var/lib/jenkins/workspace/hive/standalone-metastore/src/main/protobuf/org/apache/hadoop/hive/metastore > [INFO] Output targets: > [INFO] java: /var/lib/jenkins/workspace/hive/standalone-metastore/target/generated-sources (add: none, clean: false) > [INFO] /var/lib/jenkins/workspace/hive/standalone-metastore/target/generated-sources does not exist. Creating... > [INFO] Processing (java): metastore.proto > protoc-jar: protoc version: 250, detected platform: linux/ppc64
[jira] [Updated] (HIVE-18599) CREATE TEMPORARY TABLE AS SELECT(CTTAS) on Micromanaged table does not write data
[ https://issues.apache.org/jira/browse/HIVE-18599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Yeom updated HIVE-18599: -- Attachment: HIVE-18599.02.patch > CREATE TEMPORARY TABLE AS SELECT(CTTAS) on Micromanaged table does not write > data > - > > Key: HIVE-18599 > URL: https://issues.apache.org/jira/browse/HIVE-18599 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Steve Yeom >Assignee: Steve Yeom >Priority: Major > Attachments: HIVE-18599.01.patch, HIVE-18599.02.patch > > > CTTAS on temporary micromanaged table does not write data. > I.e., "SELECT * FROM ctas0_mm;" does not return any rows from the below > script: > > set hive.mapred.mode=nonstrict; > set hive.explain.user=false; > set hive.fetch.task.conversion=none; > set tez.grouping.min-size=1; > set tez.grouping.max-size=2; > set hive.exec.dynamic.partition.mode=nonstrict; > set hive.support.concurrency=true; > set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; > > drop table intermediate; > create table intermediate(key int) partitioned by (p int) stored as orc; > insert into table intermediate partition(p='455') select distinct key from > src where key >= 0 order by key desc limit 2; > insert into table intermediate partition(p='456') select distinct key from > src where key is not null order by key asc limit 2; > insert into table intermediate partition(p='457') select distinct key from > src where key >= 100 order by key asc limit 2; > > drop table ctas0_mm; > explain create temporary table ctas0_mm tblproperties > ("transactional"="true", "transactional_properties"="insert_only") as select > * from intermediate; > create temporary table ctas0_mm tblproperties ("transactional"="true", > "transactional_properties"="insert_only") as select * from intermediate; > > select * from ctas0_mm; > drop table ctas0_mm; > drop table intermediate; -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
[ https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18606: -- Status: Patch Available (was: Open) > CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask > --- > > Key: HIVE-18606 > URL: https://issues.apache.org/jira/browse/HIVE-18606 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18606.01.patch > > > {noformat} > @Test > public void testCtasEmpty() throws Exception { > MetastoreConf.setBoolVar(hiveConf, > MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true); > runStatementOnDriver("create table myctas stored as ORC as" + > " select a, b from " + Table.NONACIDORCTBL); > List rs = runStatementOnDriver("select ROW__ID, a, b, > INPUT__FILE__NAME" + > " from myctas order by ROW__ID"); > } > {noformat} > {noformat} > 2018-02-01T19:08:52,813 INFO [HiveServer2-Background-Pool: Thread-463]: > metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done > cleaning up thread local RawStore > 2018-02-01T19:08:52,813 INFO [HiveServer2-Background-Pool: Thread-463]: > HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive > ip=unknown-ip-addr cmd=Done cleaning up thread local RawStore > 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: > exec.Task (SessionState.java:printError(1228)) - Failed with exception null > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816) > at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919) > at 
org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: > ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, > return code 1 from {noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
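The NPE above is consistent with a directory listing returning null when a CTAS source produces no files to move. The guard can be sketched in plain Java with java.io.File, whose listFiles() returns null for a nonexistent directory, much like an unchecked listing in a move task; the class and method names here are hypothetical, not Hive's moveAcidFiles code.

```java
import java.io.File;

public class SafeMove {
    // Count files eligible for moving, treating a missing or empty
    // source directory as "nothing to move" instead of dereferencing
    // a null listing (the sketched failure mode).
    static int countMovable(File dir) {
        File[] children = dir.listFiles(); // null if dir does not exist
        if (children == null) {
            return 0; // empty CTAS result: skip quietly, no NPE
        }
        return children.length;
    }

    public static void main(String[] args) {
        if (countMovable(new File("/no/such/dir")) != 0) throw new AssertionError();
        System.out.println("empty source handled without NPE");
    }
}
```

The same null-check discipline applies to any filesystem listing API that signals "not found" with null rather than an exception.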
[jira] [Updated] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
[ https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18606: -- Attachment: HIVE-18606.01.patch > CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask > --- > > Key: HIVE-18606 > URL: https://issues.apache.org/jira/browse/HIVE-18606 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18606.01.patch > > > {noformat} > @Test > public void testCtasEmpty() throws Exception { > MetastoreConf.setBoolVar(hiveConf, > MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true); > runStatementOnDriver("create table myctas stored as ORC as" + > " select a, b from " + Table.NONACIDORCTBL); > List rs = runStatementOnDriver("select ROW__ID, a, b, > INPUT__FILE__NAME" + > " from myctas order by ROW__ID"); > } > {noformat} > {noformat} > 2018-02-01T19:08:52,813 INFO [HiveServer2-Background-Pool: Thread-463]: > metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done > cleaning up thread local RawStore > 2018-02-01T19:08:52,813 INFO [HiveServer2-Background-Pool: Thread-463]: > HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive > ip=unknown-ip-addr cmd=Done cleaning up thread local RawStore > 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: > exec.Task (SessionState.java:printError(1228)) - Failed with exception null > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816) > at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919) > at 
org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: > ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, > return code 1 from {noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18528) Stats: In the bitvector codepath, when extrapolating column stats for String type columnStringColumnStatsAggregator uses the min value instead of max
[ https://issues.apache.org/jira/browse/HIVE-18528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-18528: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to master. Thanks [~ashutoshc] > Stats: In the bitvector codepath, when extrapolating column stats for String > type columnStringColumnStatsAggregator uses the min value instead of max > - > > Key: HIVE-18528 > URL: https://issues.apache.org/jira/browse/HIVE-18528 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 3.0.0 >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta >Priority: Major > Attachments: HIVE-18528.1.patch, HIVE-18528.1.patch > > > This line: > [https://github.com/apache/hive/blob/456a65180dcb84f69f26b4c9b9265165ad16dfe4/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/StringColumnStatsAggregator.java#L181] > Should be: > aggregateData.setAvgColLen(Math.max(aggregateData.getAvgColLen(), > newData.getAvgColLen())); -- This message was sent by Atlassian JIRA (v7.6.3#76005)
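The one-line fix above can be illustrated in isolation: when extrapolating string column stats across partitions, the aggregated average column length should take the max of the partial averages (a safe over-estimate for planning), not the min. The class and method names below are hypothetical stand-ins for the aggregator's setAvgColLen/getAvgColLen fields.

```java
public class AvgColLenMerge {
    // Merge one partition's avgColLen into the running aggregate.
    // The bug used Math.min here, systematically under-estimating.
    static double merge(double aggregated, double incoming) {
        return Math.max(aggregated, incoming);
    }

    public static void main(String[] args) {
        double agg = 0.0;
        for (double partitionAvg : new double[] {4.2, 9.5, 7.0}) {
            agg = merge(agg, partitionAvg);
        }
        if (agg != 9.5) throw new AssertionError();
        System.out.println("aggregated avgColLen = " + agg);
    }
}
```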
[jira] [Updated] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
[ https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18606: -- Component/s: Transactions > CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask > --- > > Key: HIVE-18606 > URL: https://issues.apache.org/jira/browse/HIVE-18606 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Priority: Major > > {noformat} > @Test > public void testCtasEmpty() throws Exception { > MetastoreConf.setBoolVar(hiveConf, > MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true); > runStatementOnDriver("create table myctas stored as ORC as" + > " select a, b from " + Table.NONACIDORCTBL); > List rs = runStatementOnDriver("select ROW__ID, a, b, > INPUT__FILE__NAME" + > " from myctas order by ROW__ID"); > } > {noformat} > {noformat} > 2018-02-01T19:08:52,813 INFO [HiveServer2-Background-Pool: Thread-463]: > metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done > cleaning up thread local RawStore > 2018-02-01T19:08:52,813 INFO [HiveServer2-Background-Pool: Thread-463]: > HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive > ip=unknown-ip-addr cmd=Done cleaning up thread local RawStore > 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: > exec.Task (SessionState.java:printError(1228)) - Failed with exception null > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816) > at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651) > at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: > ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, > return code 1 from {noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
[ https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reassigned HIVE-18606: - Assignee: Eugene Koifman > CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask > --- > > Key: HIVE-18606 > URL: https://issues.apache.org/jira/browse/HIVE-18606 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > > {noformat} > @Test > public void testCtasEmpty() throws Exception { > MetastoreConf.setBoolVar(hiveConf, > MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true); > runStatementOnDriver("create table myctas stored as ORC as" + > " select a, b from " + Table.NONACIDORCTBL); > List rs = runStatementOnDriver("select ROW__ID, a, b, > INPUT__FILE__NAME" + > " from myctas order by ROW__ID"); > } > {noformat} > {noformat} > 2018-02-01T19:08:52,813 INFO [HiveServer2-Background-Pool: Thread-463]: > metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done > cleaning up thread local RawStore > 2018-02-01T19:08:52,813 INFO [HiveServer2-Background-Pool: Thread-463]: > HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive > ip=unknown-ip-addr cmd=Done cleaning up thread local RawStore > 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: > exec.Task (SessionState.java:printError(1228)) - Failed with exception null > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816) > at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651) > at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: > ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, > return code 1 from {noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive
[ https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ratandeep Ratti updated HIVE-18410: --- Attachment: HIVE-18410_2.patch > [Performance][Avro] Reading flat Avro tables is very expensive in Hive > -- > > Key: HIVE-18410 > URL: https://issues.apache.org/jira/browse/HIVE-18410 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.0.0, 2.3.2 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti >Priority: Major > Fix For: 3.0.0, 2.3.2 > > Attachments: HIVE-18410.patch, HIVE-18410_1.patch, > HIVE-18410_2.patch, profiling_with_patch.nps, profiling_with_patch.png, > profiling_without_patch.nps, profiling_without_patch.png > > > There's a performance penalty when reading flat [no nested fields] Avro > tables. Reading the same flat dataset in Pig takes half the time. > On profiling, a lot of time is spent in > {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the > time is spent in GenericData.get().resolveUnion(), which calls > GenericData.getSchemaName(Object datum), which does a lot of instanceof > checks. This could be simplified, with performance benefits. An approach > described in this patch almost halves the runtime. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
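The optimization direction described above can be sketched without Avro on the classpath: for a two-branch nullable union like [null, T], the branch index is fully determined by a single null check, so a generic resolveUnion() walk over instanceof tests is unnecessary. This is an illustrative model of the idea, under the assumption that the union is known to be of the nullable two-branch form; it is not the patch's actual code.

```java
public class NullableUnion {
    enum Branch { NULL, VALUE }

    // For a [null, T] union, picking the branch is O(1): no schema-name
    // lookup, no instanceof chain over every possible runtime type.
    static Branch resolve(Object datum) {
        return datum == null ? Branch.NULL : Branch.VALUE;
    }

    public static void main(String[] args) {
        if (resolve(null) != Branch.NULL) throw new AssertionError();
        if (resolve("abc") != Branch.VALUE) throw new AssertionError();
        System.out.println("nullable-union branch resolved without resolveUnion");
    }
}
```

The savings compound because this decision is made once per field per row, which dominates deserialization of flat tables.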