[jira] [Updated] (HIVE-1478) Non-boolean expression in WHERE clause throws exception
[ https://issues.apache.org/jira/browse/HIVE-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-1478: --- Attachment: HIVE-1478.1.patch report this problem more clearly - before the query starts executing > Non-boolean expression in WHERE clause throws exception > --- > > Key: HIVE-1478 > URL: https://issues.apache.org/jira/browse/HIVE-1478 > Project: Hive > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Paul Yang >Assignee: Zoltan Haindrich >Priority: Minor > Attachments: HIVE-1478.1.patch > > > If the expression in the where clause does not evaluate to a boolean, the job > will fail with the following exception in the task logs: > Query: > SELECT key FROM src WHERE 1; > Exception in mapper: > 2010-07-21 17:00:31,460 FATAL ExecMapper: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row {"key":"238","value":"val_238"} > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:417) > at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:180) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) > at org.apache.hadoop.mapred.Child.main(Child.java:159) > Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to > java.lang.Boolean > at > org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:84) > at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:45) > at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697) > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:400) > ... 
5 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-1478) Non-boolean expression in WHERE clause throws exception
[ https://issues.apache.org/jira/browse/HIVE-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-1478: --- Status: Patch Available (was: Open) > Non-boolean expression in WHERE clause throws exception > --- > > Key: HIVE-1478 > URL: https://issues.apache.org/jira/browse/HIVE-1478 > Project: Hive > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Paul Yang >Assignee: Zoltan Haindrich >Priority: Minor > Attachments: HIVE-1478.1.patch > > > If the expression in the where clause does not evaluate to a boolean, the job > will fail with the following exception in the task logs: > Query: > SELECT key FROM src WHERE 1; > Exception in mapper: > 2010-07-21 17:00:31,460 FATAL ExecMapper: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row {"key":"238","value":"val_238"} > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:417) > at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:180) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) > at org.apache.hadoop.mapred.Child.main(Child.java:159) > Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to > java.lang.Boolean > at > org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:84) > at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:45) > at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697) > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:400) > ... 5 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
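The patch's goal of reporting the problem before the query starts executing can be illustrated with a small compile-time guard. This is only a sketch; the class and method names below are hypothetical and do not reflect Hive's actual semantic-analyzer API.

```java
// Hypothetical sketch for HIVE-1478: reject a non-boolean WHERE expression
// during semantic analysis, instead of letting FilterOperator hit a
// ClassCastException (Integer -> Boolean) at runtime in the mapper.
public class FilterTypeCheck {

    /** Returns true when the filter expression's resolved type is boolean. */
    public static boolean isValidFilterType(String typeName) {
        return "boolean".equalsIgnoreCase(typeName);
    }

    /** Fails fast with a clear message, before any task is launched. */
    public static void checkFilter(String typeName) {
        if (!isValidFilterType(typeName)) {
            throw new IllegalArgumentException(
                "FILTER expression must evaluate to boolean, found: " + typeName);
        }
    }
}
```

With this kind of check, a query such as `SELECT key FROM src WHERE 1;` would be rejected at compile time with a readable error rather than failing mid-job in the task logs.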
[jira] [Updated] (HIVE-15221) Improvement for MapJoin checkMemoryStatus, adding gc before throwing Exception
[ https://issues.apache.org/jira/browse/HIVE-15221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Hui updated HIVE-15221: --- Attachment: HIVE-15221.1.patch Patch uploaded. Hi [~alangates], could you please review it? > Improvement for MapJoin checkMemoryStatus, adding gc before throwing Exception > -- > > Key: HIVE-15221 > URL: https://issues.apache.org/jira/browse/HIVE-15221 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 2.1.0, 2.0.1 >Reporter: Fei Hui >Assignee: Fei Hui > Attachments: HIVE-15221.1.patch > > > In the current master version: > percentage = (double) usedMemory / (double) maxHeapSize; > if percentage > maxMemoryUsage, then a MapJoinMemoryExhaustionException is thrown. > In my opinion, running is better than failing; calling System.gc first and only then applying > 'if percentage > maxMemoryUsage, then throw MapJoinMemoryExhaustionException' may be better. > The original check also has a problem: 1) heavy memory use triggers a gc > (e.g. a young gc), so the check after adding a row passes. 2) heavy memory use > does not trigger a gc, so the check after adding rows throws the exception. > Sometimes case 2) occurs even though it has accumulated fewer rows than case 1). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15221) Improvement for MapJoin checkMemoryStatus, adding gc before throwing Exception
[ https://issues.apache.org/jira/browse/HIVE-15221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Hui updated HIVE-15221: --- Description: i see in the current master version percentage = (double) usedMemory / (double) maxHeapSize; if percentage > maxMemoryUsage, then throw MapJoinMemoryExhaustionException in my opinion, running is better than fail. after System.gc, ' if percentage > maxMemoryUsage, then throw MapJoinMemoryExhaustionException' maybe better And original checking way has a problem: 1) consuming much memory cause gc (e.g young gc), then check after adding row and pass. 2) consuming much memory does not cause gc, then check after adding rows but throw Exception sometimes 2) occurs, but it contians less rows than 1). was: i see in the current master version percentage = (double) usedMemory / (double) maxHeapSize; if percentage > maxMemoryUsage, then throw MapJoinMemoryExhaustionException in my opinion, running is better than fail. after System.gc, ' if percentage > maxMemoryUsage, then throw MapJoinMemoryExhaustionException' maybe better And original checking way has a problem: a) consuming much memory cause gc (e.g young gc), then check after adding row and pass. 2) consuming much memory does not cause gc, then check after adding rows but throw Exception sometimes 2) occurs, but it contians less rows than 1). > Improvement for MapJoin checkMemoryStatus, adding gc before throwing Exception > -- > > Key: HIVE-15221 > URL: https://issues.apache.org/jira/browse/HIVE-15221 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 2.1.0, 2.0.1 >Reporter: Fei Hui >Assignee: Fei Hui > > i see in the current master version > percentage = (double) usedMemory / (double) maxHeapSize; > if percentage > maxMemoryUsage, then throw MapJoinMemoryExhaustionException > in my opinion, running is better than fail. 
after System.gc, 'if percentage > > maxMemoryUsage, then throw MapJoinMemoryExhaustionException' may be better. > The original check also has a problem: 1) heavy memory use triggers a gc > (e.g. a young gc), so the check after adding a row passes. 2) heavy memory use > does not trigger a gc, so the check after adding rows throws the exception. > Sometimes case 2) occurs even though it has accumulated fewer rows than case 1). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
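The proposed change can be sketched as follows. `MemoryCheckSketch` is an illustrative stand-in, not Hive's actual MapJoinMemoryExhaustionHandler; it assumes the intent is to re-check the heap fraction once after a best-effort System.gc() before giving up.

```java
// Sketch of HIVE-15221's proposal: when the mapjoin memory check trips,
// request a gc and re-check before throwing. System.gc() is only a hint to
// the JVM, so this can at best reduce spurious failures, not eliminate them.
public class MemoryCheckSketch {

    /** percentage = (double) usedMemory / (double) maxHeapSize, as in the issue. */
    public static double usedFraction(long usedMemory, long maxHeapSize) {
        return (double) usedMemory / (double) maxHeapSize;
    }

    /** Returns true only if the threshold is still exceeded after a best-effort gc. */
    public static boolean exceedsAfterGc(double maxMemoryUsage) {
        Runtime rt = Runtime.getRuntime();
        double pct = usedFraction(rt.totalMemory() - rt.freeMemory(), rt.maxMemory());
        if (pct <= maxMemoryUsage) {
            return false;                 // fast path: under the limit, no gc needed
        }
        System.gc();                      // hint only; the JVM may ignore it
        pct = usedFraction(rt.totalMemory() - rt.freeMemory(), rt.maxMemory());
        return pct > maxMemoryUsage;      // caller throws the exception if still over
    }
}
```

This also addresses the asymmetry described in the issue: case 2) (no gc happened before the check) gets one explicit collection before the MapJoinMemoryExhaustionException is thrown, putting it on the same footing as case 1).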
[jira] [Commented] (HIVE-15202) Concurrent compactions for the same partition may generate malformed folder structure
[ https://issues.apache.org/jira/browse/HIVE-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669690#comment-15669690 ] Rui Li commented on HIVE-15202: --- [~ekoifman], is there any plan to implement this on Hive side? Or do you mean users have to avoid such concurrent compactions themselves? > Concurrent compactions for the same partition may generate malformed folder > structure > - > > Key: HIVE-15202 > URL: https://issues.apache.org/jira/browse/HIVE-15202 > Project: Hive > Issue Type: Bug >Reporter: Rui Li > > If two compactions run concurrently on a single partition, it may generate > folder structure like this: (nested base dir) > {noformat} > drwxr-xr-x - root supergroup 0 2016-11-14 22:23 > /user/hive/warehouse/test/z=1/base_007/base_007 > -rw-r--r-- 3 root supergroup201 2016-11-14 21:46 > /user/hive/warehouse/test/z=1/base_007/bucket_0 > -rw-r--r-- 3 root supergroup611 2016-11-14 21:46 > /user/hive/warehouse/test/z=1/base_007/bucket_1 > -rw-r--r-- 3 root supergroup614 2016-11-14 21:46 > /user/hive/warehouse/test/z=1/base_007/bucket_2 > -rw-r--r-- 3 root supergroup621 2016-11-14 21:46 > /user/hive/warehouse/test/z=1/base_007/bucket_3 > -rw-r--r-- 3 root supergroup621 2016-11-14 21:46 > /user/hive/warehouse/test/z=1/base_007/bucket_4 > -rw-r--r-- 3 root supergroup201 2016-11-14 21:46 > /user/hive/warehouse/test/z=1/base_007/bucket_5 > -rw-r--r-- 3 root supergroup201 2016-11-14 21:46 > /user/hive/warehouse/test/z=1/base_007/bucket_6 > -rw-r--r-- 3 root supergroup201 2016-11-14 21:46 > /user/hive/warehouse/test/z=1/base_007/bucket_7 > -rw-r--r-- 3 root supergroup201 2016-11-14 21:46 > /user/hive/warehouse/test/z=1/base_007/bucket_8 > -rw-r--r-- 3 root supergroup201 2016-11-14 21:46 > /user/hive/warehouse/test/z=1/base_007/bucket_9 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15202) Concurrent compactions for the same partition may generate malformed folder structure
[ https://issues.apache.org/jira/browse/HIVE-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669657#comment-15669657 ] Eugene Koifman commented on HIVE-15202: --- The right solution would be not to allow 2 concurrent compactions on the same partition. > Concurrent compactions for the same partition may generate malformed folder > structure > - > > Key: HIVE-15202 > URL: https://issues.apache.org/jira/browse/HIVE-15202 > Project: Hive > Issue Type: Bug >Reporter: Rui Li > > If two compactions run concurrently on a single partition, it may generate > folder structure like this: (nested base dir) > {noformat} > drwxr-xr-x - root supergroup 0 2016-11-14 22:23 > /user/hive/warehouse/test/z=1/base_007/base_007 > -rw-r--r-- 3 root supergroup201 2016-11-14 21:46 > /user/hive/warehouse/test/z=1/base_007/bucket_0 > -rw-r--r-- 3 root supergroup611 2016-11-14 21:46 > /user/hive/warehouse/test/z=1/base_007/bucket_1 > -rw-r--r-- 3 root supergroup614 2016-11-14 21:46 > /user/hive/warehouse/test/z=1/base_007/bucket_2 > -rw-r--r-- 3 root supergroup621 2016-11-14 21:46 > /user/hive/warehouse/test/z=1/base_007/bucket_3 > -rw-r--r-- 3 root supergroup621 2016-11-14 21:46 > /user/hive/warehouse/test/z=1/base_007/bucket_4 > -rw-r--r-- 3 root supergroup201 2016-11-14 21:46 > /user/hive/warehouse/test/z=1/base_007/bucket_5 > -rw-r--r-- 3 root supergroup201 2016-11-14 21:46 > /user/hive/warehouse/test/z=1/base_007/bucket_6 > -rw-r--r-- 3 root supergroup201 2016-11-14 21:46 > /user/hive/warehouse/test/z=1/base_007/bucket_7 > -rw-r--r-- 3 root supergroup201 2016-11-14 21:46 > /user/hive/warehouse/test/z=1/base_007/bucket_8 > -rw-r--r-- 3 root supergroup201 2016-11-14 21:46 > /user/hive/warehouse/test/z=1/base_007/bucket_9 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
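The suggestion above (do not allow two concurrent compactions on the same partition) could be sketched as a simple per-partition guard. The class below is hypothetical; Hive's real fix would belong in the compactor's scheduling and locking logic, not in a standalone helper.

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative mutual-exclusion guard for HIVE-15202: at most one compaction
// per partition at a time, so a second compactor cannot write a nested
// base_N directory inside an in-progress base_N.
public class CompactionGuard {
    private final ConcurrentHashMap<String, Boolean> inProgress = new ConcurrentHashMap<>();

    /** Returns true if the caller won the right to compact this partition. */
    public boolean tryStart(String partition) {
        return inProgress.putIfAbsent(partition, Boolean.TRUE) == null;
    }

    /** Must be called when the compaction finishes or fails. */
    public void finish(String partition) {
        inProgress.remove(partition);
    }
}
```

A second worker calling `tryStart("test/z=1")` while the first is still running would get `false` and skip the partition instead of producing a structure like `base_007/base_007`.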
[jira] [Commented] (HIVE-15204) Hive-Hbase integration throws "java.lang.ClassNotFoundException: NULL::character varying" (Postgres)
[ https://issues.apache.org/jira/browse/HIVE-15204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669627#comment-15669627 ] Anshuman commented on HIVE-15204: - Tested, setting datanucleus.rdbms.initializeColumnInfo as NULL is not resolving the issue. PropConfigured: datanucleus.rdbms.initializeColumnInfo NONE DN default on Postgres Bug Result: FAILED: RuntimeException java.lang.ClassNotFoundException: NULL::character varying > Hive-Hbase integration thorws "java.lang.ClassNotFoundException: > NULL::character varying" (Postgres) > > > Key: HIVE-15204 > URL: https://issues.apache.org/jira/browse/HIVE-15204 > Project: Hive > Issue Type: Bug > Components: HBase Handler >Affects Versions: 2.1.0 > Environment: apache-hive-2.1.0-bin > hbase-1.1.1 >Reporter: Anshuman > Labels: Postgres > > When doing hive to hbase integration, we have observed that current Apache > Hive 2.x is not able to recognise 'NULL::character varying' (Variant data > type of NULL in prostgres) properly and throws the > java.lang.ClassNotFoundException exception. > Exception: > ERROR ql.Driver: FAILED: RuntimeException java.lang.ClassNotFoundException: > NULL::character varying > java.lang.RuntimeException: java.lang.ClassNotFoundException: NULL::character > varying > > Caused by: java.lang.ClassNotFoundException: NULL::character varying > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > Reason: > org.apache.hadoop.hive.ql.metadata.Table.java > final public Class getInputFormatClass() { > if (inputFormatClass == null) { > try { > String className = tTable.getSd().getInputFormat(); > if (className == null) { /*If the className is one of the postgres > variant of NULL i.e. 
'NULL::character varying' control is going to else block > and throwing error.*/ > if (getStorageHandler() == null) { > return null; > } > inputFormatClass = getStorageHandler().getInputFormatClass(); > } else { > inputFormatClass = (Class) > Class.forName(className, true, > Utilities.getSessionSpecifiedClassLoader()); > } > } catch (ClassNotFoundException e) { > throw new RuntimeException(e); > } > } > return inputFormatClass; > } > Steps to reproduce: > Hive 2.x (e.g. apache-hive-2.1.0-bin) and HBase (e.g. hbase-1.1.1) > 1. Install and configure Hive, if it is not already installed. > 2. Install and configure HBase, if it is not already installed. > 3. Configure the hive-site.xml File (as per recommended steps) > 4. Provide necessary jars to Hive (as per recommended steps) > 4. Create table in HBase as shown below - > create 'hivehbase', 'ratings' > put 'hivehbase', 'row1', 'ratings:userid', 'user1' > put 'hivehbase', 'row1', 'ratings:bookid', 'book1' > put 'hivehbase', 'row1', 'ratings:rating', '1' > > put 'hivehbase', 'row2', 'ratings:userid', 'user2' > put 'hivehbase', 'row2', 'ratings:bookid', 'book1' > put 'hivehbase', 'row2', 'ratings:rating', '3' > > put 'hivehbase', 'row3', 'ratings:userid', 'user2' > put 'hivehbase', 'row3', 'ratings:bookid', 'book2' > put 'hivehbase', 'row3', 'ratings:rating', '3' > > put 'hivehbase', 'row4', 'ratings:userid', 'user2' > put 'hivehbase', 'row4', 'ratings:bookid', 'book4' > put 'hivehbase', 'row4', 'ratings:rating', '1' > 5. Create external table as shown below > CREATE EXTERNAL TABLE hbasehive_table > (key string, userid string,bookid string,rating int) > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' > WITH SERDEPROPERTIES > ("hbase.columns.mapping" = > ":key,ratings:userid,ratings:bookid,ratings:rating") > TBLPROPERTIES ("hbase.table.name" = "hivehbase"); > 6. 
select * from hbasehive_table; > FAILED: RuntimeException java.lang.ClassNotFoundException: NULL::character > varying -- This message was sent by Atlassian JIRA (v6.3.4#6332)
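One possible workaround, assuming (as the report suggests) that the Postgres-backed metastore hands back the literal string instead of a real null: normalize the "NULL::character varying" value before `Class.forName` is ever attempted. The class and method names below are illustrative, not part of Hive.

```java
// Sketch for HIVE-15204: map Postgres NULL variants to a real Java null so
// Table.getInputFormatClass() falls through to the storage-handler branch
// instead of calling Class.forName("NULL::character varying").
public class NullLiteralNormalizer {

    public static String normalize(String className) {
        if (className == null) {
            return null;
        }
        String trimmed = className.trim();
        if (trimmed.isEmpty()
                || trimmed.equalsIgnoreCase("null")
                || trimmed.toUpperCase().startsWith("NULL::")) {
            return null;  // treat the Postgres NULL variant as "no input format set"
        }
        return className;  // a genuine class name: leave it untouched
    }
}
```

With this in place, `String className = normalize(tTable.getSd().getInputFormat());` would make the existing `if (className == null)` branch take effect for HBase-backed tables, so `getStorageHandler().getInputFormatClass()` is used as intended.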
[jira] [Updated] (HIVE-15129) LLAP : Enhance cache hits for stripe metadata across queries
[ https://issues.apache.org/jira/browse/HIVE-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-15129: - Assignee: Rajesh Balamohan > LLAP : Enhance cache hits for stripe metadata across queries > > > Key: HIVE-15129 > URL: https://issues.apache.org/jira/browse/HIVE-15129 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 2.2.0 >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-15129.1.patch, HIVE-15129.2.patch, > HIVE-15129.3.patch > > > When multiple queries are run in LLAP, stripe metadata cache misses were > observed even though enough memory was available. > https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java#L655. > Even in cases when data was found in cache, it wasn't getting used as > {{globalInc}} changed from query to query. Creating a superset of existing > indexes with {{globalInc}} would be helpful. > This would be a lot more beneficial in cloud storage where opening and reading > small amounts of data can be expensive compared to HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15129) LLAP : Enhance cache hits for stripe metadata across queries
[ https://issues.apache.org/jira/browse/HIVE-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-15129: - Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) Committed to master. Thanks [~rajesh.balamohan] for the patch! > LLAP : Enhance cache hits for stripe metadata across queries > > > Key: HIVE-15129 > URL: https://issues.apache.org/jira/browse/HIVE-15129 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 2.2.0 >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-15129.1.patch, HIVE-15129.2.patch, > HIVE-15129.3.patch > > > When multiple queries are run in LLAP, stripe metadata cache misses were > observed even though enough memory was available. > https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java#L655. > Even in cases when data was found in cache, it wasn't getting used as > {{globalnc}} changed from query to query. Creating a superset of existing > indexes with {{globalInc}} would be helpful. > This would be lot more beneficial in cloud storage where opening and reading > small of data can be expensive compared to HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15129) LLAP : Enhance cache hits for stripe metadata across queries
[ https://issues.apache.org/jira/browse/HIVE-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-15129: - Affects Version/s: 2.2.0 > LLAP : Enhance cache hits for stripe metadata across queries > > > Key: HIVE-15129 > URL: https://issues.apache.org/jira/browse/HIVE-15129 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 2.2.0 >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-15129.1.patch, HIVE-15129.2.patch, > HIVE-15129.3.patch > > > When multiple queries are run in LLAP, stripe metadata cache misses were > observed even though enough memory was available. > https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java#L655. > Even in cases when data was found in cache, it wasn't getting used as > {{globalnc}} changed from query to query. Creating a superset of existing > indexes with {{globalInc}} would be helpful. > This would be lot more beneficial in cloud storage where opening and reading > small of data can be expensive compared to HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
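The "superset of existing indexes" idea can be illustrated with a small sketch: a cached stripe-metadata entry can serve any query whose included-column set ({{globalInc}}) is covered by the set that was cached, instead of requiring an exact match. BitSet stands in here for Hive's boolean[] of included columns, and the class name is hypothetical.

```java
import java.util.BitSet;

// Illustrative coverage check for HIVE-15129: a cache hit should only require
// that the cached column set is a superset of the query's requested columns.
public class StripeCacheKey {

    /** True when every column the query needs is present in the cached entry. */
    public static boolean covers(BitSet cachedColumns, BitSet requestedColumns) {
        BitSet missing = (BitSet) requestedColumns.clone();
        missing.andNot(cachedColumns);   // columns requested but not cached
        return missing.isEmpty();
    }
}
```

Under this rule, a query reading columns {1, 2} hits an entry cached for {0, 1, 2}, whereas an exact-match key would miss and re-read the stripe metadata, which matters most on cloud storage where small reads are expensive.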
[jira] [Commented] (HIVE-15219) LLAP: Allow additional slider global parameters to be set while creating the LLAP package
[ https://issues.apache.org/jira/browse/HIVE-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669588#comment-15669588 ] Hive QA commented on HIVE-15219: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12839105/HIVE-15219.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10680 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=133) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145) org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver (batchId=92) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2146/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2146/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2146/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12839105 - PreCommit-HIVE-Build > LLAP: Allow additional slider global parameters to be set while creating the > LLAP package > - > > Key: HIVE-15219 > URL: https://issues.apache.org/jira/browse/HIVE-15219 > Project: Hive > Issue Type: Task > Components: llap >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-15219.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15220) WebHCat test driver not capturing end time of test accurately
[ https://issues.apache.org/jira/browse/HIVE-15220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepesh Khandelwal updated HIVE-15220: -- Status: Patch Available (was: Open) > WebHCat test driver not capturing end time of test accurately > - > > Key: HIVE-15220 > URL: https://issues.apache.org/jira/browse/HIVE-15220 > Project: Hive > Issue Type: Bug > Components: Tests >Reporter: Deepesh Khandelwal >Assignee: Deepesh Khandelwal >Priority: Trivial > Attachments: HIVE-15220.1.patch > > > Webhcat e2e testsuite prints message while ending test run: > {noformat} > Ending test at 1479264720 > {noformat} > Currently it is not capturing the end time correctly. > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15220) WebHCat test driver not capturing end time of test accurately
[ https://issues.apache.org/jira/browse/HIVE-15220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepesh Khandelwal updated HIVE-15220: -- Attachment: HIVE-15220.1.patch Attaching the patch with the change. [~thejas] [~daijy] can you please review? > WebHCat test driver not capturing end time of test accurately > - > > Key: HIVE-15220 > URL: https://issues.apache.org/jira/browse/HIVE-15220 > Project: Hive > Issue Type: Bug > Components: Tests >Reporter: Deepesh Khandelwal >Assignee: Deepesh Khandelwal >Priority: Trivial > Attachments: HIVE-15220.1.patch > > > Webhcat e2e testsuite prints message while ending test run: > {noformat} > Ending test at 1479264720 > {noformat} > Currently it is not capturing the end time correctly. > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15220) WebHCat test driver not capturing end time of test accurately
[ https://issues.apache.org/jira/browse/HIVE-15220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepesh Khandelwal updated HIVE-15220: -- Summary: WebHCat test driver not capturing end time of test accurately (was: WebHCat test not capturing end time of test accurately) > WebHCat test driver not capturing end time of test accurately > - > > Key: HIVE-15220 > URL: https://issues.apache.org/jira/browse/HIVE-15220 > Project: Hive > Issue Type: Bug > Components: Tests >Reporter: Deepesh Khandelwal >Assignee: Deepesh Khandelwal >Priority: Trivial > > Webhcat e2e testsuite prints message while ending test run: > {noformat} > Ending test at 1479264720 > {noformat} > Currently it is not capturing the end time correctly. > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15218) Kryo Exception on subsequent run of a query in LLAP mode
[ https://issues.apache.org/jira/browse/HIVE-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669501#comment-15669501 ] Hive QA commented on HIVE-15218: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12839101/HIVE-15218.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 189 failed/errored test(s), 10694 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_9] (batchId=33) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join0] (batchId=78) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join31] (batchId=40) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_stats] (batchId=43) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_without_localtask] (batchId=1) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_14] (batchId=11) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_15] (batchId=11) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_1] (batchId=41) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=43) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_3] (batchId=1) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_4] (batchId=56) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_5] (batchId=78) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_7] (batchId=80) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_4] (batchId=22) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_5] (batchId=52) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_cross_product_check_2] (batchId=18) 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer5] (batchId=63) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cross_product_check_2] (batchId=79) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[identity_project_remove_skip] (batchId=44) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join29] (batchId=39) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join31] (batchId=80) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[runtime_skewjoin_mapjoin_spark] (batchId=49) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[skewjoin] (batchId=21) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[skewjoin_onesideskew] (batchId=64) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_25] (batchId=7) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subq_where_serialization] (batchId=77) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_in_having] (batchId=52) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_multiinsert] (batchId=74) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[tez_join_hash] (batchId=47) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union22] (batchId=12) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_mapjoin_reduce] (batchId=70) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[dynamic_partition_pruning_2] (batchId=133) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[except_distinct] (batchId=134) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_2] (batchId=134) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[intersect_all] (batchId=133) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[intersect_distinct] (batchId=134) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[intersect_merge] (batchId=133) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_nullscan] (batchId=131) 
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=133) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[tez_union_dynamic_partition] (batchId=133) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=133) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] (batchId=133) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_sortmerge_join_10] (batchId=148) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cluster] (batchId=141) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[column_access_stats] (batchId=146) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[constprog_dpp] (batchId=138) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_partition_pruning] (batchId=141) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]
[jira] [Commented] (HIVE-15204) Hive-Hbase integration throws "java.lang.ClassNotFoundException: NULL::character varying" (Postgres)
[ https://issues.apache.org/jira/browse/HIVE-15204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669397#comment-15669397 ] Anshuman commented on HIVE-15204: - This is almost a show-stopper for 2.1.0 hive-hbase users. Can we plan for 2.1.1 ? > Hive-Hbase integration thorws "java.lang.ClassNotFoundException: > NULL::character varying" (Postgres) > > > Key: HIVE-15204 > URL: https://issues.apache.org/jira/browse/HIVE-15204 > Project: Hive > Issue Type: Bug > Components: HBase Handler >Affects Versions: 2.1.0 > Environment: apache-hive-2.1.0-bin > hbase-1.1.1 >Reporter: Anshuman > Labels: Postgres > > When doing hive to hbase integration, we have observed that current Apache > Hive 2.x is not able to recognise 'NULL::character varying' (Variant data > type of NULL in prostgres) properly and throws the > java.lang.ClassNotFoundException exception. > Exception: > ERROR ql.Driver: FAILED: RuntimeException java.lang.ClassNotFoundException: > NULL::character varying > java.lang.RuntimeException: java.lang.ClassNotFoundException: NULL::character > varying > > Caused by: java.lang.ClassNotFoundException: NULL::character varying > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > Reason: > org.apache.hadoop.hive.ql.metadata.Table.java > final public Class getInputFormatClass() { > if (inputFormatClass == null) { > try { > String className = tTable.getSd().getInputFormat(); > if (className == null) { /*If the className is one of the postgres > variant of NULL i.e. 'NULL::character varying' control is going to else block > and throwing error.*/ > if (getStorageHandler() == null) { > return null; > } > inputFormatClass = getStorageHandler().getInputFormatClass(); > } else { > inputFormatClass = (Class) > Class.forName(className, true, > Utilities.getSessionSpecifiedClassLoader()); > } > } catch (ClassNotFoundException e) { > throw new RuntimeException(e); > } > } > return inputFormatClass; > } > Steps to reproduce: > Hive 2.x (e.g. 
apache-hive-2.1.0-bin) and HBase (e.g. hbase-1.1.1) > 1. Install and configure Hive, if it is not already installed. > 2. Install and configure HBase, if it is not already installed. > 3. Configure the hive-site.xml File (as per recommended steps) > 4. Provide necessary jars to Hive (as per recommended steps) > 4. Create table in HBase as shown below - > create 'hivehbase', 'ratings' > put 'hivehbase', 'row1', 'ratings:userid', 'user1' > put 'hivehbase', 'row1', 'ratings:bookid', 'book1' > put 'hivehbase', 'row1', 'ratings:rating', '1' > > put 'hivehbase', 'row2', 'ratings:userid', 'user2' > put 'hivehbase', 'row2', 'ratings:bookid', 'book1' > put 'hivehbase', 'row2', 'ratings:rating', '3' > > put 'hivehbase', 'row3', 'ratings:userid', 'user2' > put 'hivehbase', 'row3', 'ratings:bookid', 'book2' > put 'hivehbase', 'row3', 'ratings:rating', '3' > > put 'hivehbase', 'row4', 'ratings:userid', 'user2' > put 'hivehbase', 'row4', 'ratings:bookid', 'book4' > put 'hivehbase', 'row4', 'ratings:rating', '1' > 5. Create external table as shown below > CREATE EXTERNAL TABLE hbasehive_table > (key string, userid string,bookid string,rating int) > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' > WITH SERDEPROPERTIES > ("hbase.columns.mapping" = > ":key,ratings:userid,ratings:bookid,ratings:rating") > TBLPROPERTIES ("hbase.table.name" = "hivehbase"); > 6. select * from hbasehive_table; > FAILED: RuntimeException java.lang.ClassNotFoundException: NULL::character > varying -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15217) Add watch mode to llap status tool
[ https://issues.apache.org/jira/browse/HIVE-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669392#comment-15669392 ] Hive QA commented on HIVE-15217: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12839100/HIVE-15217.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10679 tests executed *Failed tests:* {noformat} TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=108) [tez_joins_explain.q,transform2.q,groupby5.q,cbo_semijoin.q,bucketmapjoin13.q,union_remove_6_subq.q,groupby2_map_multi_distinct.q,load_dyn_part9.q,multi_insert_gby2.q,vectorization_11.q,groupby_position.q,avro_compression_enabled_native.q,smb_mapjoin_8.q,join21.q,auto_join16.q] org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_bulk] (batchId=89) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=133) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=91) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2144/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2144/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2144/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12839100 - PreCommit-HIVE-Build > Add watch mode to llap status tool > -- > > Key: HIVE-15217 > URL: https://issues.apache.org/jira/browse/HIVE-15217 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Minor > Attachments: HIVE-15217.1.patch > > > There is a few seconds of overhead for launching the llap status command. To avoid > this, we can add a "watch" mode to the llap status tool that refreshes the status after > a configured interval. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
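The proposed watch mode is essentially a polling loop that pays the launch overhead once and then re-runs the status check on an interval. A minimal sketch, using a stand-in status supplier rather than the actual llap status tool API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class LlapStatusWatch {
    // Poll the status supplier every intervalMillis for maxIterations
    // rounds, collecting each snapshot. A real tool would print each
    // snapshot and loop until interrupted instead of taking a count.
    public static List<String> watch(Supplier<String> status,
                                     long intervalMillis,
                                     int maxIterations) {
        List<String> snapshots = new ArrayList<>();
        for (int i = 0; i < maxIterations; i++) {
            snapshots.add(status.get());
            if (i < maxIterations - 1) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break; // stop watching if interrupted
                }
            }
        }
        return snapshots;
    }
}
```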
[jira] [Updated] (HIVE-15197) count and sum query on empty table, returning empty output
[ https://issues.apache.org/jira/browse/HIVE-15197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vishal.rajan updated HIVE-15197: Description: When the below query is run in hive 1.2.0 it returns 'NULLNULL0' on empty table but when the same query is run on hive 2.1.0, nothing is returned on empty table.(both tables are ORC external tables) hive 1.2.0- hive> SELECT sum(destination_pincode),sum(length(source_city)),count(*) from test_stage.geo_zone; MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 4.79 sec HDFS Read: 7354 HDFS Write: 114 SUCCESS Total MapReduce CPU Time Spent: 4 seconds 790 msec OK NULL NULL0 Time taken: 38.168 seconds, Fetched: 1 row(s) -hive 2.1.0- hive> SELECT sum(destination_pincode),sum(length(source_city)),count(*) from test_stage.geo_zone WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator 2016-11-14 19:06:15,421 WARN [Thread-215] mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(64)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this. 
2016-11-14 19:06:19,222 INFO [Thread-215] input.FileInputFormat (FileInputFormat.java:listStatus(283)) - Total input paths to process : 1 2016-11-14 19:06:20,000 INFO [Thread-215] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(198)) - number of splits:0 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0 2016-11-14 19:06:39,405 Stage-1 map = 0%, reduce = 0% Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS Total MapReduce CPU Time Spent: 0 msec OK Time taken: 28.302 seconds was: When the below query is run in hive 1.2.0 it returns 'NULLNULL0' on empty table but when the same query is run on hive 2.1.0, nothing is returned on empty table. hive 1.2.0(ORC)- hive> SELECT sum(destination_pincode),sum(length(source_city)),count(*) from test_stage.geo_zone; MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 4.79 sec HDFS Read: 7354 HDFS Write: 114 SUCCESS Total MapReduce CPU Time Spent: 4 seconds 790 msec OK NULL NULL0 Time taken: 38.168 seconds, Fetched: 1 row(s) -hive 2.1.0(ORC)- hive> SELECT sum(destination_pincode),sum(length(source_city)),count(*) from test_stage.geo_zone WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator 2016-11-14 19:06:15,421 WARN [Thread-215] mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(64)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this. 
2016-11-14 19:06:19,222 INFO [Thread-215] input.FileInputFormat (FileInputFormat.java:listStatus(283)) - Total input paths to process : 1 2016-11-14 19:06:20,000 INFO [Thread-215] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(198)) - number of splits:0 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0 2016-11-14 19:06:39,405 Stage-1 map = 0%, reduce = 0% Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS Total MapReduce CPU Time Spent: 0 msec OK Time taken: 28.302 seconds > count and sum query on empty table, returning empty output > --- > > Key: HIVE-15197 > URL: https://issues.apache.org/jira/browse/HIVE-15197 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.0.0, 2.1.0, 2.0.1 >Reporter: vishal.rajan > > When the below query is run in hive 1.2.0 it returns 'NULL NULL0' on > empty table but when the same query is run on hive 2.1.0, nothing is returned > on empty table.(both tables are ORC external tables) > hive 1.2.0- > hive> SELECT sum(destination_pincode),sum(length(source_city)),count(*) > from test_stage.geo_zone; > MapReduce Jobs Launched: > Stage-Stage-1: Map: 1 Cumulative CPU: 4.79 sec HDFS Read: 7354 HDFS > Write: 114 SUCCESS > Total MapReduce CPU Time Spent: 4 seconds 790 msec > OK > NULL NULL0 > Time taken: 38.168 seconds, Fetched: 1 row(s) > -hive 2.1.0- >
[jira] [Updated] (HIVE-15197) count and sum query on empty table, returning empty output
[ https://issues.apache.org/jira/browse/HIVE-15197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vishal.rajan updated HIVE-15197: Description: When the below query is run in hive 1.2.0 it returns 'NULLNULL0' on empty table but when the same query is run on hive 2.1.0, nothing is returned on empty table. hive 1.2.0(ORC)- hive> SELECT sum(destination_pincode),sum(length(source_city)),count(*) from test_stage.geo_zone; MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 4.79 sec HDFS Read: 7354 HDFS Write: 114 SUCCESS Total MapReduce CPU Time Spent: 4 seconds 790 msec OK NULL NULL0 Time taken: 38.168 seconds, Fetched: 1 row(s) -hive 2.1.0(ORC)- hive> SELECT sum(destination_pincode),sum(length(source_city)),count(*) from test_stage.geo_zone WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator 2016-11-14 19:06:15,421 WARN [Thread-215] mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(64)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this. 
2016-11-14 19:06:19,222 INFO [Thread-215] input.FileInputFormat (FileInputFormat.java:listStatus(283)) - Total input paths to process : 1 2016-11-14 19:06:20,000 INFO [Thread-215] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(198)) - number of splits:0 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0 2016-11-14 19:06:39,405 Stage-1 map = 0%, reduce = 0% Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS Total MapReduce CPU Time Spent: 0 msec OK Time taken: 28.302 seconds was: When the below query is run in hive 1.2.0 it returns 'NULLNULL0' on empty table but when the same query is run on hive 2.1.0, nothing is returned on empty table. hive 1.2.0 (avro table)- hive> SELECT sum(destination_pincode),sum(length(source_city)),count(*) from test_stage.geo_zone; MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 4.79 sec HDFS Read: 7354 HDFS Write: 114 SUCCESS Total MapReduce CPU Time Spent: 4 seconds 790 msec OK NULL NULL0 Time taken: 38.168 seconds, Fetched: 1 row(s) -hive 2.1.0(ORC)- hive> SELECT sum(destination_pincode),sum(length(source_city)),count(*) from test_stage.geo_zone WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator 2016-11-14 19:06:15,421 WARN [Thread-215] mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(64)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this. 
2016-11-14 19:06:19,222 INFO [Thread-215] input.FileInputFormat (FileInputFormat.java:listStatus(283)) - Total input paths to process : 1 2016-11-14 19:06:20,000 INFO [Thread-215] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(198)) - number of splits:0 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0 2016-11-14 19:06:39,405 Stage-1 map = 0%, reduce = 0% Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS Total MapReduce CPU Time Spent: 0 msec OK Time taken: 28.302 seconds > count and sum query on empty table, returning empty output > --- > > Key: HIVE-15197 > URL: https://issues.apache.org/jira/browse/HIVE-15197 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.0.0, 2.1.0, 2.0.1 >Reporter: vishal.rajan > > When the below query is run in hive 1.2.0 it returns 'NULL NULL0' on > empty table but when the same query is run on hive 2.1.0, nothing is returned > on empty table. > hive 1.2.0(ORC)- > hive> SELECT sum(destination_pincode),sum(length(source_city)),count(*) > from test_stage.geo_zone; > MapReduce Jobs Launched: > Stage-Stage-1: Map: 1 Cumulative CPU: 4.79 sec HDFS Read: 7354 HDFS > Write: 114 SUCCESS > Total MapReduce CPU Time Spent: 4 seconds 790 msec > OK > NULL NULL0 > Time taken: 38.168 seconds, Fetched: 1 row(s) > -hive 2.1.0(ORC)- > hive> SELECT
[jira] [Commented] (HIVE-10901) Optimize multi column distinct queries
[ https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669316#comment-15669316 ] Pengcheng Xiong commented on HIVE-10901: [~gopalv], I am not sure how many reducers were used in Jenkins, but it may be related to what you described last time. > Optimize multi column distinct queries > > > Key: HIVE-10901 > URL: https://issues.apache.org/jira/browse/HIVE-10901 > Project: Hive > Issue Type: New Feature > Components: CBO, Logical Optimizer >Affects Versions: 1.2.0 >Reporter: Mostafa Mokhtar >Assignee: Pengcheng Xiong > Attachments: HIVE-10901.02.patch, HIVE-10901.03.patch, > HIVE-10901.patch > > > HIVE-10568 is useful only when there is a distinct on one column. It can be > expanded for multiple column cases too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15202) Concurrent compactions for the same partition may generate malformed folder structure
[ https://issues.apache.org/jira/browse/HIVE-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669292#comment-15669292 ] Rui Li commented on HIVE-15202: --- Also pinging [~ekoifman]. Seems you're quite knowledgeable about transactions :)
> Concurrent compactions for the same partition may generate malformed folder
> structure
> -
>
> Key: HIVE-15202
> URL: https://issues.apache.org/jira/browse/HIVE-15202
> Project: Hive
> Issue Type: Bug
> Reporter: Rui Li
>
> If two compactions run concurrently on a single partition, they may generate
> a folder structure like this (nested base dir):
> {noformat}
> drwxr-xr-x - root supergroup 0 2016-11-14 22:23 /user/hive/warehouse/test/z=1/base_007/base_007
> -rw-r--r-- 3 root supergroup 201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_0
> -rw-r--r-- 3 root supergroup 611 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_1
> -rw-r--r-- 3 root supergroup 614 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_2
> -rw-r--r-- 3 root supergroup 621 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_3
> -rw-r--r-- 3 root supergroup 621 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_4
> -rw-r--r-- 3 root supergroup 201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_5
> -rw-r--r-- 3 root supergroup 201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_6
> -rw-r--r-- 3 root supergroup 201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_7
> -rw-r--r-- 3 root supergroup 201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_8
> -rw-r--r-- 3 root supergroup 201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_9
> {noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
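The race can be modeled abstractly: two compactors both produce base_N, and whichever installs its copy second must discard it rather than move it underneath the winner's directory, which is how the nested base_007/base_007 layout above arises. The guard below is a toy model, not Hive's actual compactor code:

```java
import java.util.HashSet;
import java.util.Set;

public class CompactionInstallGuard {
    // Records which base directory names have already been installed
    // for a partition. The second compactor to finish sees that base_N
    // already exists and must throw its result away.
    private final Set<String> installed = new HashSet<>();

    public synchronized boolean tryInstall(String baseDirName) {
        // Set.add returns false if the name was already claimed,
        // signalling the loser of the race.
        return installed.add(baseDirName);
    }
}
```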
[jira] [Commented] (HIVE-14982) Remove some reserved keywords in 2.2
[ https://issues.apache.org/jira/browse/HIVE-14982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669264#comment-15669264 ] Hive QA commented on HIVE-14982: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12839090/HIVE-14982.01.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10680 tests executed *Failed tests:* {noformat} TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=108) [tez_joins_explain.q,transform2.q,groupby5.q,cbo_semijoin.q,bucketmapjoin13.q,union_remove_6_subq.q,groupby2_map_multi_distinct.q,load_dyn_part9.q,multi_insert_gby2.q,vectorization_11.q,groupby_position.q,avro_compression_enabled_native.q,smb_mapjoin_8.q,join21.q,auto_join16.q] org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=133) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145) org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService.testTaskStatus (batchId=207) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2143/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2143/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2143/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12839090 - PreCommit-HIVE-Build > Remove some reserved keywords in 2.2 > > > Key: HIVE-14982 > URL: https://issues.apache.org/jira/browse/HIVE-14982 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14982.01.patch > > > It seems that CACHE, DAYOFWEEK, VIEWS are reserved keywords in master. This > conflicts with SQL2011 standard. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15218) Kryo Exception on subsequent run of a query in LLAP mode
[ https://issues.apache.org/jira/browse/HIVE-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669260#comment-15669260 ] Gopal V commented on HIVE-15218: LGTM +1 - tests pending. The Statistics object doesn't seem to be referenced on the execution side at all. The question remains about why the {{removeField(kryo, AbstractOperatorDesc.class, "statistics");}} doesn't remove this field when serializing in the first place. > Kryo Exception on subsequent run of a query in LLAP mode > > > Key: HIVE-15218 > URL: https://issues.apache.org/jira/browse/HIVE-15218 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Prasanth Jayachandran > Attachments: HIVE-15218.1.patch > > > Following exception is observed when running TPCDS query19 during concurrency > test > {code} > Vertex failed, vertexName=Map 3, vertexId=vertex_1477340478603_0610_9_05, > diagnostics=[Task failed, taskId=task_1477340478603_0610_9_05_06, > diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed, info=[Error: Error > while running task ( failure ) : > attempt_1477340478603_0610_9_05_06_1:java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: > Failed to load plan: > hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi... 
> at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.RuntimeException: Failed to load plan: > hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi... > at > org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:57) > at > org.apache.hadoop.hive.ql.exec.ObjectCacheWrapper.retrieve(ObjectCacheWrapper.java:40) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:129) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:184) > ... 
15 more > Caused by: java.lang.RuntimeException: Failed to load plan: > hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi... > at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:469) > at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:305) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor$1.call(MapRecordProcessor.java:132) > at > org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:55) > ... 18 more > Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: > Encountered unregistered class ID: 63 > Serialization trace: > statistics (org.apache.hadoop.hive.ql.plan.TableScanDesc) > conf (org.apache.hadoop.hive.ql.exec.TableScanOperator) > aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) > at > org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137) > at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670) > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:182) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551) > at
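The fix direction discussed in this thread (the statistics field having lost its transient modifier in HIVE-8769) can be illustrated with plain java.io serialization; Kryo's FieldSerializer likewise skips transient fields by default. The Desc class below is a toy stand-in, not Hive's AbstractOperatorDesc or TableScanDesc:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientFieldDemo {
    // Toy operator descriptor: the transient field is skipped when the
    // object is serialized, so it comes back null after a round trip.
    public static class Desc implements Serializable {
        private static final long serialVersionUID = 1L;
        public String name = "table_scan";
        public transient String statistics = "numRows=100";
    }

    public static Desc roundTrip(Desc d) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(d);
            }
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (Desc) ois.readObject();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

A non-transient statistics field, by contrast, gets written into the plan, which is how the unregistered Statistics class ID ends up in the Kryo stream above.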
[jira] [Commented] (HIVE-15219) LLAP: Allow additional slider global parameters to be set while creating the LLAP package
[ https://issues.apache.org/jira/browse/HIVE-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669236#comment-15669236 ] Gopal V commented on HIVE-15219: Does the user have to provide properly escaped json or some form of serialized string for this to work? > LLAP: Allow additional slider global parameters to be set while creating the > LLAP package > - > > Key: HIVE-15219 > URL: https://issues.apache.org/jira/browse/HIVE-15219 > Project: Hive > Issue Type: Task > Components: llap >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-15219.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15219) LLAP: Allow additional slider global parameters to be set while creating the LLAP package
[ https://issues.apache.org/jira/browse/HIVE-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-15219: -- Status: Patch Available (was: Open) > LLAP: Allow additional slider global parameters to be set while creating the > LLAP package > - > > Key: HIVE-15219 > URL: https://issues.apache.org/jira/browse/HIVE-15219 > Project: Hive > Issue Type: Task > Components: llap >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-15219.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15219) LLAP: Allow additional slider global parameters to be set while creating the LLAP package
[ https://issues.apache.org/jira/browse/HIVE-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-15219: -- Attachment: HIVE-15219.patch [~gopalv] - could you please take a look? The initial requirement was for the UI port to be specified. I think it's better to allow a free-form string to set additional parameters, instead of creating a new parameter in the llap cli for every parameter. > LLAP: Allow additional slider global parameters to be set while creating the > LLAP package > - > > Key: HIVE-15219 > URL: https://issues.apache.org/jira/browse/HIVE-15219 > Project: Hive > Issue Type: Task > Components: llap >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-15219.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
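One answer to the escaping question raised above is to keep the free-form string as comma-separated key=value pairs rather than serialized JSON. The parser below is hypothetical (including the example parameter name), not the attached patch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SliderGlobalParams {
    // Parse a free-form "k1=v1,k2=v2" string into an ordered map of
    // extra slider global parameters. Values containing commas would
    // need escaping, which is exactly the design point under discussion.
    public static Map<String, String> parse(String spec) {
        Map<String, String> params = new LinkedHashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return params;
        }
        for (String pair : spec.split(",")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected key=value, got: " + pair);
            }
            params.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return params;
    }
}
```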
[jira] [Updated] (HIVE-15218) Kryo Exception on subsequent run of a query in LLAP mode
[ https://issues.apache.org/jira/browse/HIVE-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-15218: - Status: Patch Available (was: Open) > Kyro Exception on subsequent run of a query in LLAP mode > > > Key: HIVE-15218 > URL: https://issues.apache.org/jira/browse/HIVE-15218 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Prasanth Jayachandran > Attachments: HIVE-15218.1.patch > > > Following exception is observed when running TPCDS query19 during concurrency > test > {code} > Vertex failed, vertexName=Map 3, vertexId=vertex_1477340478603_0610_9_05, > diagnostics=[Task failed, taskId=task_1477340478603_0610_9_05_06, > diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed, info=[Error: Error > while running task ( failure ) : > attempt_1477340478603_0610_9_05_06_1:java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: > Failed to load plan: > hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi... 
> at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.RuntimeException: Failed to load plan: > hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi... > at > org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:57) > at > org.apache.hadoop.hive.ql.exec.ObjectCacheWrapper.retrieve(ObjectCacheWrapper.java:40) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:129) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:184) > ... 
15 more > Caused by: java.lang.RuntimeException: Failed to load plan: > hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi... > at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:469) > at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:305) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor$1.call(MapRecordProcessor.java:132) > at > org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:55) > ... 18 more > Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: > Encountered unregistered class ID: 63 > Serialization trace: > statistics (org.apache.hadoop.hive.ql.plan.TableScanDesc) > conf (org.apache.hadoop.hive.ql.exec.TableScanOperator) > aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) > at > org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137) > at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670) > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:182) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551) > at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708) > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:215) > at >
[jira] [Commented] (HIVE-15218) Kryo Exception on subsequent run of a query in LLAP mode
[ https://issues.apache.org/jira/browse/HIVE-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669145#comment-15669145 ] Prasanth Jayachandran commented on HIVE-15218: -- [~gopalv] found that HIVE-8769 changed statistics object from being transient to non-transient. > Kyro Exception on subsequent run of a query in LLAP mode > > > Key: HIVE-15218 > URL: https://issues.apache.org/jira/browse/HIVE-15218 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Prasanth Jayachandran > Attachments: HIVE-15218.1.patch > > > Following exception is observed when running TPCDS query19 during concurrency > test > {code} > Vertex failed, vertexName=Map 3, vertexId=vertex_1477340478603_0610_9_05, > diagnostics=[Task failed, taskId=task_1477340478603_0610_9_05_06, > diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed, info=[Error: Error > while running task ( failure ) : > attempt_1477340478603_0610_9_05_06_1:java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: > Failed to load plan: > hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi... 
> at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.RuntimeException: Failed to load plan: > hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi... > at > org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:57) > at > org.apache.hadoop.hive.ql.exec.ObjectCacheWrapper.retrieve(ObjectCacheWrapper.java:40) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:129) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:184) > ... 
15 more > Caused by: java.lang.RuntimeException: Failed to load plan: > hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi... > at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:469) > at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:305) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor$1.call(MapRecordProcessor.java:132) > at > org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:55) > ... 18 more > Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: > Encountered unregistered class ID: 63 > Serialization trace: > statistics (org.apache.hadoop.hive.ql.plan.TableScanDesc) > conf (org.apache.hadoop.hive.ql.exec.TableScanOperator) > aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) > at > org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137) > at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670) > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:182) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551) > at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708) > at >
[jira] [Updated] (HIVE-15218) Kryo Exception on subsequent run of a query in LLAP mode
[ https://issues.apache.org/jira/browse/HIVE-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-15218: - Attachment: HIVE-15218.1.patch [~gopalv] can you please review this change? > Kryo Exception on subsequent run of a query in LLAP mode > > > Key: HIVE-15218 > URL: https://issues.apache.org/jira/browse/HIVE-15218 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Prasanth Jayachandran > Attachments: HIVE-15218.1.patch > > > The following exception is observed when running TPCDS query19 during concurrency > test > {code} > Vertex failed, vertexName=Map 3, vertexId=vertex_1477340478603_0610_9_05, > diagnostics=[Task failed, taskId=task_1477340478603_0610_9_05_06, > diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed, info=[Error: Error > while running task ( failure ) : > attempt_1477340478603_0610_9_05_06_1:java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: > Failed to load plan: > hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi... 
> at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.RuntimeException: Failed to load plan: > hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi... > at > org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:57) > at > org.apache.hadoop.hive.ql.exec.ObjectCacheWrapper.retrieve(ObjectCacheWrapper.java:40) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:129) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:184) > ... 
15 more > Caused by: java.lang.RuntimeException: Failed to load plan: > hdfs://cn105-10.l42scl.hortonworks.com:8020/tmp/hive/ndembla/0559ce24-663e-482a-a0ea-106d220b53be/hi... > at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:469) > at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:305) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor$1.call(MapRecordProcessor.java:132) > at > org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:55) > ... 18 more > Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: > Encountered unregistered class ID: 63 > Serialization trace: > statistics (org.apache.hadoop.hive.ql.plan.TableScanDesc) > conf (org.apache.hadoop.hive.ql.exec.TableScanOperator) > aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) > at > org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137) > at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670) > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:182) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551) > at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708) > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:215) > at >
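[Editor's note: the "Encountered unregistered class ID: 63" failure above is the typical symptom of a Kryo reader whose class registrations do not line up with the writer's. A toy sketch of that failure mode follows — the class names and numeric IDs are illustrative only, not Hive's actual Kryo registrations:]

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of ID-based class registration: a reader that lacks one of the
// writer's registrations cannot resolve the ID it finds in the stream.
public class RegistrationMismatch {
    // The writer registered three classes, assigning them numeric IDs...
    static Map<Integer, String> writerIds() {
        Map<Integer, String> m = new HashMap<>();
        m.put(61, "TableScanDesc");
        m.put(62, "Statistics");
        m.put(63, "MapWork");
        return m;
    }

    // ...but the reader's Kryo instance only registered two of them.
    static Map<Integer, String> readerIds() {
        Map<Integer, String> m = new HashMap<>();
        m.put(61, "TableScanDesc");
        m.put(62, "Statistics");
        return m;
    }

    // Look up a class ID read from the serialized stream.
    static String resolve(Map<Integer, String> registry, int classId) {
        String cls = registry.get(classId);
        if (cls == null) {
            throw new IllegalStateException("Encountered unregistered class ID: " + classId);
        }
        return cls;
    }

    public static void main(String[] args) {
        // The writer serialized class ID 63; the reader cannot resolve it.
        try {
            resolve(readerIds(), 63);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

[The same plan bytes deserialize fine on a Kryo instance registered identically to the writer's, which is why the error appears only on some runs.]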
[jira] [Commented] (HIVE-10901) Optimize multi-column distinct queries
[ https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669134#comment-15669134 ] Hive QA commented on HIVE-10901: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12839085/HIVE-10901.03.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10680 tests executed *Failed tests:* {noformat} TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=102) [skewjoinopt3.q,smb_mapjoin_4.q,timestamp_comparison.q,union_remove_10.q,mapreduce2.q,bucketmapjoin_negative.q,udf_in_file.q,auto_join12.q,skewjoin.q,vector_left_outer_join.q,semijoin.q,skewjoinopt9.q,smb_mapjoin_3.q,stats10.q,nullgroup4.q] org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=133) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=91) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] (batchId=91) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[multi_count_distinct] (batchId=90) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2142/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2142/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2142/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12839085 - PreCommit-HIVE-Build > Optimize multi-column distinct queries > > > Key: HIVE-10901 > URL: https://issues.apache.org/jira/browse/HIVE-10901 > Project: Hive > Issue Type: New Feature > Components: CBO, Logical Optimizer >Affects Versions: 1.2.0 >Reporter: Mostafa Mokhtar >Assignee: Pengcheng Xiong > Attachments: HIVE-10901.02.patch, HIVE-10901.03.patch, > HIVE-10901.patch > > > HIVE-10568 is useful only when there is a distinct on one column. It can be > expanded to the multiple-column case too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15217) Add watch mode to llap status tool
[ https://issues.apache.org/jira/browse/HIVE-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-15217: - Status: Patch Available (was: Open) > Add watch mode to llap status tool > -- > > Key: HIVE-15217 > URL: https://issues.apache.org/jira/browse/HIVE-15217 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Minor > Attachments: HIVE-15217.1.patch > > > There is a few seconds of overhead for launching the llap status command. To avoid > this, we can add a "watch" mode to the llap status tool that refreshes the status after > a configured interval. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15217) Add watch mode to llap status tool
[ https://issues.apache.org/jira/browse/HIVE-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-15217: - Attachment: HIVE-15217.1.patch [~sseth] Can you please review the patch? > Add watch mode to llap status tool > -- > > Key: HIVE-15217 > URL: https://issues.apache.org/jira/browse/HIVE-15217 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Minor > Attachments: HIVE-15217.1.patch > > > There is a few seconds of overhead for launching the llap status command. To avoid > this, we can add a "watch" mode to the llap status tool that refreshes the status after > a configured interval. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
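[Editor's note: the "watch" mode described in HIVE-15217 amounts to keeping one process alive and polling the status at a fixed interval, so the launch overhead is paid once. A minimal sketch, with illustrative names only — not the actual llapstatus implementation:]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Minimal "watch" loop: poll a status source every intervalMs for a fixed
// number of refreshes inside a single long-lived process.
public class WatchMode {
    static List<String> watch(Supplier<String> status, long intervalMs, int refreshes)
            throws InterruptedException {
        List<String> snapshots = new ArrayList<>();
        for (int i = 0; i < refreshes; i++) {
            snapshots.add(status.get());      // fetch and record the current status
            if (i < refreshes - 1) {
                Thread.sleep(intervalMs);     // wait out the configured interval
            }
        }
        return snapshots;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] counter = {0};
        List<String> out = watch(() -> "RUNNING (refresh " + (++counter[0]) + ")", 10, 3);
        System.out.println(out);
    }
}
```

[A real tool would loop until interrupted rather than for a fixed count; the bounded loop here just keeps the sketch testable.]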
[jira] [Updated] (HIVE-15216) Files on S3 are deleted one by one in INSERT OVERWRITE queries
[ https://issues.apache.org/jira/browse/HIVE-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-15216: Issue Type: Bug (was: Sub-task) Parent: (was: HIVE-14269) > Files on S3 are deleted one by one in INSERT OVERWRITE queries > -- > > Key: HIVE-15216 > URL: https://issues.apache.org/jira/browse/HIVE-15216 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Sahil Takiar > > When running {{INSERT OVERWRITE}} queries the files to overwrite are deleted > one by one. The reason is that, by default, hive.exec.stagingdir is inside > the target table directory. > Ideally Hive would just delete the entire table directory, but it can't do > that since the staging data is also inside the directory. Instead it deletes > each file one-by-one, which is very slow. > There are a few ways to fix this: > 1: Move the staging directory outside the table location. This can be done by > setting hive.exec.stagingdir to a different location when running on S3. It > would be nice if users didn't have to explicitly set this when running on S3 > and things just worked out-of-the-box. My understanding is that > hive.exec.stagingdir was only added to support HDFS encryption zones. Since > S3 doesn't have encryption zones, there should be no problem with using the > value of hive.exec.scratchdir to store all intermediate data instead. > 2: Multi-thread the delete operations > 3: See if the {{S3AFileSystem}} can expose some type of bulk delete op -- This message was sent by Atlassian JIRA (v6.3.4#6332)
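[Editor's note: option 2 above — multi-threading the delete operations — can be sketched with a plain thread pool so the slow per-object S3 DELETE calls overlap. Shown here against the local filesystem via java.nio; a real fix would go through the Hadoop FileSystem/S3AFileSystem API:]

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Delete a list of files concurrently instead of one by one.
public class ParallelDelete {
    static void deleteAll(List<Path> files, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (Path f : files) {
            pool.submit(() -> {
                try {
                    Files.deleteIfExists(f);   // each delete runs on a pool thread
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);  // block until all deletes finish
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("overwrite");
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            files.add(Files.createFile(dir.resolve("part-" + i)));
        }
        deleteAll(files, 4);
        System.out.println(Files.list(dir).count()); // 0
    }
}
```

[Error handling is elided: a production version would collect per-file failures instead of throwing from pool threads, and option 3 (a bulk-delete call on the filesystem) would still beat this by reducing the number of round trips.]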
[jira] [Comment Edited] (HIVE-15114) Remove extra MoveTask operators
[ https://issues.apache.org/jira/browse/HIVE-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669061#comment-15669061 ] Sahil Takiar edited comment on HIVE-15114 at 11/16/16 2:01 AM: --- [~spena] can we add some sets for other execution engines as well? I haven't tested the 2nd patch, but the first patch doesn't seem to take effect for HoS. was (Author: stakiar): [~spena] can we add some sets for other execution engines as well? I haven't test the 2nd patch, but the first patch doesn't seem to take affect for HoS. > Remove extra MoveTask operators > --- > > Key: HIVE-15114 > URL: https://issues.apache.org/jira/browse/HIVE-15114 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 2.1.0 >Reporter: Sahil Takiar >Assignee: Sergio Peña > Attachments: HIVE-15114.WIP.1.patch, HIVE-15114.WIP.2.patch > > > When running simple insert queries (e.g. {{INSERT INTO TABLE ... VALUES > ...}}) an extraneous {{MoveTask}} is created. > This is problematic when the scratch directory is on S3 since renames require > copying the entire dataset. > For simple queries (like the one above), there are two MoveTasks. The first > one moves the output data from one file in the scratch directory to another > file in the scratch directory. The second MoveTask moves the data from the > scratch directory to its final table location. > The first MoveTask should not be necessary. The goal of this JIRA is to > remove it. This should help improve performance when running on S3. > It seems that the first Move might be caused by a dependency resolution > problem in the optimizer, where a dependent task doesn't get properly removed > when the task it depends on is filtered by a condition resolver. > A dummy {{MoveTask}} is added in the > {{GenMapRedUtils.createMRWorkForMergingFiles}} method. This method creates a > conditional task which launches a job to merge the output files at the end of > the query. > At the end of the conditional job there is a MoveTask. 
> Even though Hive decides that the conditional merge job is not needed, it > seems the MoveTask is still added to the plan. > It seems this extra {{MoveTask}} may have been added intentionally; not sure why > yet. The {{ConditionalResolverMergeFiles}} says that one of three tasks will > be returned: move task only, merge task only, or merge task followed by a move > task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15114) Remove extra MoveTask operators
[ https://issues.apache.org/jira/browse/HIVE-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669061#comment-15669061 ] Sahil Takiar commented on HIVE-15114: - [~spena] can we add some sets for other execution engines as well? I haven't tested the 2nd patch, but the first patch doesn't seem to take effect for HoS. > Remove extra MoveTask operators > --- > > Key: HIVE-15114 > URL: https://issues.apache.org/jira/browse/HIVE-15114 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 2.1.0 >Reporter: Sahil Takiar >Assignee: Sergio Peña > Attachments: HIVE-15114.WIP.1.patch, HIVE-15114.WIP.2.patch > > > When running simple insert queries (e.g. {{INSERT INTO TABLE ... VALUES > ...}}) an extraneous {{MoveTask}} is created. > This is problematic when the scratch directory is on S3 since renames require > copying the entire dataset. > For simple queries (like the one above), there are two MoveTasks. The first > one moves the output data from one file in the scratch directory to another > file in the scratch directory. The second MoveTask moves the data from the > scratch directory to its final table location. > The first MoveTask should not be necessary. The goal of this JIRA is to > remove it. This should help improve performance when running on S3. > It seems that the first Move might be caused by a dependency resolution > problem in the optimizer, where a dependent task doesn't get properly removed > when the task it depends on is filtered by a condition resolver. > A dummy {{MoveTask}} is added in the > {{GenMapRedUtils.createMRWorkForMergingFiles}} method. This method creates a > conditional task which launches a job to merge the output files at the end of > the query. > At the end of the conditional job there is a MoveTask. > Even though Hive decides that the conditional merge job is not needed, it > seems the MoveTask is still added to the plan. > Seems this extra {{MoveTask}} may have been added intentionally. 
Not sure why > yet. The {{ConditionalResolverMergeFiles}} says that one of three tasks will > be returned: move task only, merge task only, merge task followed by a move > task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones
[ https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-15199: Comment: was deleted (was: [~spena] this actually affects any {{INSERT INTO}} query that needs to insert multiple files into the target table location. Each rename operation will basically overwrite the same file again and again, so all data will be lost except data from the last rename op. Note this seems to be a regression of HIVE-12988, which first checked if the destination file existed before renaming it.) > INSERT INTO data on S3 is replacing the old rows with the new ones > -- > > Key: HIVE-15199 > URL: https://issues.apache.org/jira/browse/HIVE-15199 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Sergio Peña >Assignee: Sergio Peña >Priority: Critical > > Any INSERT INTO statement run on S3 tables and when the scratch directory is > saved on S3 is deleting old rows of the table. > {noformat} > hive> set hive.blobstore.use.blobstore.as.scratchdir=true; > hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1'; > hive> insert into table t1 values (1,'name1'); > hive> select * from t1; > 1 name1 > hive> insert into table t1 values (2,'name2'); > hive> select * from t1; > 2 name2 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones
[ https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-15199: Comment: was deleted (was: Jumped the gun on this. This isn't true, only happens when multiple insert intos are run.) > INSERT INTO data on S3 is replacing the old rows with the new ones > -- > > Key: HIVE-15199 > URL: https://issues.apache.org/jira/browse/HIVE-15199 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Sergio Peña >Assignee: Sergio Peña >Priority: Critical > > Any INSERT INTO statement run on S3 tables and when the scratch directory is > saved on S3 is deleting old rows of the table. > {noformat} > hive> set hive.blobstore.use.blobstore.as.scratchdir=true; > hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1'; > hive> insert into table t1 values (1,'name1'); > hive> select * from t1; > 1 name1 > hive> insert into table t1 values (2,'name2'); > hive> select * from t1; > 2 name2 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones
[ https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669041#comment-15669041 ] Sahil Takiar commented on HIVE-15199: - Jumped the gun on this. This isn't true, only happens when multiple insert intos are run. > INSERT INTO data on S3 is replacing the old rows with the new ones > -- > > Key: HIVE-15199 > URL: https://issues.apache.org/jira/browse/HIVE-15199 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Sergio Peña >Assignee: Sergio Peña >Priority: Critical > > Any INSERT INTO statement run on S3 tables and when the scratch directory is > saved on S3 is deleting old rows of the table. > {noformat} > hive> set hive.blobstore.use.blobstore.as.scratchdir=true; > hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1'; > hive> insert into table t1 values (1,'name1'); > hive> select * from t1; > 1 name1 > hive> insert into table t1 values (2,'name2'); > hive> select * from t1; > 2 name2 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
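[Editor's note: the deleted comment above attributes the data loss to renames clobbering an existing destination file, a guard that HIVE-12988 had added. A minimal sketch of that guard — probing for a free name before moving a file into the table directory — follows; the "_copy_N" suffix convention is illustrative, and java.nio stands in for the Hadoop FileSystem API:]

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Pick a destination name that does not collide with an existing file,
// so a second INSERT INTO appends new files instead of overwriting old ones.
public class SafeRename {
    static Path nonClobberingTarget(Path dir, String name) {
        Path candidate = dir.resolve(name);
        int copy = 1;
        while (Files.exists(candidate)) {        // destination taken: try the next suffix
            candidate = dir.resolve(name + "_copy_" + copy++);
        }
        return candidate;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("t1");
        Files.createFile(dir.resolve("000000_0"));   // file left by the first INSERT
        Path src = Files.createTempFile("staging", null);
        Path dst = nonClobberingTarget(dir, "000000_0");
        Files.move(src, dst);                        // lands at 000000_0_copy_1
        System.out.println(Files.list(dir).count());
    }
}
```

[The exists-then-move pair is not atomic, so a real implementation would also need to handle a concurrent writer taking the candidate name between the check and the move.]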
[jira] [Comment Edited] (HIVE-14990) run all tests for MM tables and fix the issues that are found
[ https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668996#comment-15668996 ] Sergey Shelukhin edited comment on HIVE-14990 at 11/16/16 1:36 AM: --- Updated test list to fix/declare irrelevant before closing this. Only updated the CliDriver list actually, haven't made my way thru it yet {noformat} TestCliDriver: stats_list_bucket show_tablestatus vector_udf2 list_bucket_dml_14 autoColumnStats_9 stats_noscan_2 symlink_text_input_format temp_table_precedence offset_limit_global_optimizer rand_partitionpruner2 materialized_view_authorization_sqlstd,materialized_* merge_dynamic_partition, merge_dynamic_partition* orc_vectorization_ppd parquet_join2 repl_3_exim_metadata sample6 sample_islocalmode_hook smb_mapjoin_2,smb_mapjoin_3,smb_mapjoin_7 orc_createas1 exim_16_part_external,exim_17_part_managed, TestEncryptedHDFSCliDriver: encryption_ctas encryption_drop_partition encryption_insert_values encryption_join_unencrypted_tbl encryption_load_data_to_encrypted_tables MiniLlapLocal: exchgpartition2lel cbo_rp_lineage2 create_merge_compressed deleteAnalyze delete_where_no_match delete_where_non_partitioned dynpart_sort_optimization escape2 insert1 lineage2 lineage3 orc_llap schema_evol_orc_nonvec_part schema_evol_orc_vec_part schema_evol_text_nonvec_part schema_evol_text_vec_part schema_evol_text_vecrow_part smb_mapjoin_6 tez_dml union_fast_stats update_all_types update_tmp_table update_where_no_match update_where_non_partitioned vector_outer_join1 vector_outer_join4 MiniLlap: load_fs2 orc_ppd_basic external_table_with_space_in_location_path file_with_header_footer import_exported_table schemeAuthority,schemeAuthority2 table_nonprintable Minimr: infer_bucket_sort_map_operators infer_bucket_sort_merge infer_bucket_sort_reducers_power_two root_dir_external_table scriptfile1 TestSymlinkTextInputFormat#testCombine TestJdbcWithLocalClusterSpark, etc. 
{noformat} was (Author: sershe): Updated test list to fix/declare irrelevant before closing this {noformat} TestCliDriver: stats_list_bucket show_tablestatus vector_udf2 list_bucket_dml_14 autoColumnStats_9 stats_noscan_2 symlink_text_input_format temp_table_precedence offset_limit_global_optimizer rand_partitionpruner2 materialized_view_authorization_sqlstd,materialized_* merge_dynamic_partition, merge_dynamic_partition* orc_vectorization_ppd parquet_join2 repl_3_exim_metadata sample6 sample_islocalmode_hook smb_mapjoin_2,smb_mapjoin_3,smb_mapjoin_7 orc_createas1 exim_16_part_external,exim_17_part_managed, TestEncryptedHDFSCliDriver: encryption_ctas encryption_drop_partition encryption_insert_values encryption_join_unencrypted_tbl encryption_load_data_to_encrypted_tables MiniLlapLocal: exchgpartition2lel cbo_rp_lineage2 create_merge_compressed deleteAnalyze delete_where_no_match delete_where_non_partitioned dynpart_sort_optimization escape2 insert1 lineage2 lineage3 orc_llap schema_evol_orc_nonvec_part schema_evol_orc_vec_part schema_evol_text_nonvec_part schema_evol_text_vec_part schema_evol_text_vecrow_part smb_mapjoin_6 tez_dml union_fast_stats update_all_types update_tmp_table update_where_no_match update_where_non_partitioned vector_outer_join1 vector_outer_join4 MiniLlap: load_fs2 orc_ppd_basic external_table_with_space_in_location_path file_with_header_footer import_exported_table schemeAuthority,schemeAuthority2 table_nonprintable Minimr: infer_bucket_sort_map_operators infer_bucket_sort_merge infer_bucket_sort_reducers_power_two root_dir_external_table scriptfile1 TestSymlinkTextInputFormat#testCombine TestJdbcWithLocalClusterSpark, etc. 
{noformat} > run all tests for MM tables and fix the issues that are found > - > > Key: HIVE-14990 > URL: https://issues.apache.org/jira/browse/HIVE-14990 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, > HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, > HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, > HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.08.patch, > HIVE-14990.09.patch, HIVE-14990.10.patch, HIVE-14990.10.patch, > HIVE-14990.10.patch, HIVE-14990.patch > > > Expected failures > 1) All HCat tests (cannot write MM tables via the HCat writer) > 2) Almost all merge tests (alter .. concat is not supported). > 3) Tests that run dfs commands with specific paths (path changes). > 4) Truncate column (not supported). > 5) Describe formatted will have the new table fields in the output (before > merging MM with ACID). > 6) Many tests w/explain extended - diff in partition "base file name" (path > changes). > 7) TestTxnCommands - all the conversion tests, as they check for bucket
[jira] [Commented] (HIVE-14990) run all tests for MM tables and fix the issues that are found
[ https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668996#comment-15668996 ] Sergey Shelukhin commented on HIVE-14990: - Updated test list to fix/declare irrelevant before closing this {noformat} TestCliDriver: stats_list_bucket show_tablestatus vector_udf2 list_bucket_dml_14 autoColumnStats_9 stats_noscan_2 symlink_text_input_format temp_table_precedence offset_limit_global_optimizer rand_partitionpruner2 materialized_view_authorization_sqlstd,materialized_* merge_dynamic_partition, merge_dynamic_partition* orc_vectorization_ppd parquet_join2 repl_3_exim_metadata sample6 sample_islocalmode_hook smb_mapjoin_2,smb_mapjoin_3,smb_mapjoin_7 orc_createas1 exim_16_part_external,exim_17_part_managed, TestEncryptedHDFSCliDriver: encryption_ctas encryption_drop_partition encryption_insert_values encryption_join_unencrypted_tbl encryption_load_data_to_encrypted_tables MiniLlapLocal: exchgpartition2lel cbo_rp_lineage2 create_merge_compressed deleteAnalyze delete_where_no_match delete_where_non_partitioned dynpart_sort_optimization escape2 insert1 lineage2 lineage3 orc_llap schema_evol_orc_nonvec_part schema_evol_orc_vec_part schema_evol_text_nonvec_part schema_evol_text_vec_part schema_evol_text_vecrow_part smb_mapjoin_6 tez_dml union_fast_stats update_all_types update_tmp_table update_where_no_match update_where_non_partitioned vector_outer_join1 vector_outer_join4 MiniLlap: load_fs2 orc_ppd_basic external_table_with_space_in_location_path file_with_header_footer import_exported_table schemeAuthority,schemeAuthority2 table_nonprintable Minimr: infer_bucket_sort_map_operators infer_bucket_sort_merge infer_bucket_sort_reducers_power_two root_dir_external_table scriptfile1 TestSymlinkTextInputFormat#testCombine TestJdbcWithLocalClusterSpark, etc. 
{noformat} > run all tests for MM tables and fix the issues that are found > - > > Key: HIVE-14990 > URL: https://issues.apache.org/jira/browse/HIVE-14990 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, > HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, > HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, > HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.08.patch, > HIVE-14990.09.patch, HIVE-14990.10.patch, HIVE-14990.10.patch, > HIVE-14990.10.patch, HIVE-14990.patch > > > Expected failures > 1) All HCat tests (cannot write MM tables via the HCat writer) > 2) Almost all merge tests (alter .. concat is not supported). > 3) Tests that run dfs commands with specific paths (path changes). > 4) Truncate column (not supported). > 5) Describe formatted will have the new table fields in the output (before > merging MM with ACID). > 6) Many tests w/explain extended - diff in partition "base file name" (path > changes). > 7) TestTxnCommands - all the conversion tests, as they check for bucket count > using file lists (path changes). > 8) HBase metastore tests cause methods are not implemented. > 9) Some load and ExIm tests that export a table and then rely on specific > path for load (path changes). > 10) Bucket map join/etc. - diffs; disabled the optimization for MM tables due > to how it accounts for buckets -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14990) run all tests for MM tables and fix the issues that are found
[ https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668974#comment-15668974 ] Hive QA commented on HIVE-14990: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12839083/HIVE-14990.10.patch {color:green}SUCCESS:{color} +1 due to 14 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 682 failed/errored test(s), 10025 tests executed *Failed tests:* {noformat} TestDbNotificationListener - did not produce a TEST-*.xml file (likely timed out) (batchId=217) TestHCatClientNotification - did not produce a TEST-*.xml file (likely timed out) (batchId=217) TestHCatHiveCompatibility - did not produce a TEST-*.xml file (likely timed out) (batchId=217) TestHCatHiveThriftCompatibility - did not produce a TEST-*.xml file (likely timed out) (batchId=217) TestPigHBaseStorageHandler - did not produce a TEST-*.xml file (likely timed out) (batchId=217) TestSequenceFileReadWrite - did not produce a TEST-*.xml file (likely timed out) (batchId=217) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_concatenate_indexed_table] (batchId=41) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_merge] (batchId=24) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_merge_2] (batchId=37) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_merge_2_orc] (batchId=67) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_merge_3] (batchId=71) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_merge_stats] (batchId=54) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_8] (batchId=13) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_9] (batchId=33) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join32] (batchId=76) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_11] (batchId=77) 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_13] (batchId=58) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_14] (batchId=11) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_15] (batchId=11) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_1] (batchId=41) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=43) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_3] (batchId=1) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_4] (batchId=56) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_5] (batchId=78) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_7] (batchId=80) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[binary_output_format] (batchId=78) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark1] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark2] (batchId=2) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark3] (batchId=40) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_1] (batchId=29) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_2] (batchId=59) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_3] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_4] (batchId=37) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_5] (batchId=21) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_6] (batchId=74) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_7] (batchId=34) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_8] (batchId=33) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin11] (batchId=64) 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin12] (batchId=32) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin13] (batchId=36) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin5] (batchId=75) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin8] (batchId=11) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin_negative2] (batchId=62) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin_negative3] (batchId=26) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin_negative] (batchId=21) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_1] (batchId=54) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_3] (batchId=70) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_4] (batchId=22) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_5] (batchId=52)
[jira] [Updated] (HIVE-14982) Remove some reserved keywords in 2.2
[ https://issues.apache.org/jira/browse/HIVE-14982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14982: --- Status: Patch Available (was: Open) > Remove some reserved keywords in 2.2 > > > Key: HIVE-14982 > URL: https://issues.apache.org/jira/browse/HIVE-14982 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14982.01.patch > > > It seems that CACHE, DAYOFWEEK, VIEWS are reserved keywords in master. This > conflicts with the SQL:2011 standard. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14982) Remove some reserved keywords in 2.2
[ https://issues.apache.org/jira/browse/HIVE-14982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668971#comment-15668971 ] Pengcheng Xiong commented on HIVE-14982: [~ashutoshc] or [~sershe], could you take a quick look? Thanks. > Remove some reserved keywords in 2.2 > > > Key: HIVE-14982 > URL: https://issues.apache.org/jira/browse/HIVE-14982 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14982.01.patch > > > It seems that CACHE, DAYOFWEEK, VIEWS are reserved keywords in master. This > conflicts with the SQL:2011 standard. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14982) Remove some reserved keywords in 2.2
[ https://issues.apache.org/jira/browse/HIVE-14982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14982: --- Attachment: HIVE-14982.01.patch > Remove some reserved keywords in 2.2 > > > Key: HIVE-14982 > URL: https://issues.apache.org/jira/browse/HIVE-14982 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14982.01.patch > > > It seems that CACHE, DAYOFWEEK, VIEWS are reserved keywords in master. This > conflicts with the SQL:2011 standard. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10901) Optimize multi column distinct queries
[ https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-10901: --- Status: Patch Available (was: Open) > Optimize multi column distinct queries > > > Key: HIVE-10901 > URL: https://issues.apache.org/jira/browse/HIVE-10901 > Project: Hive > Issue Type: New Feature > Components: CBO, Logical Optimizer >Affects Versions: 1.2.0 >Reporter: Mostafa Mokhtar >Assignee: Pengcheng Xiong > Attachments: HIVE-10901.02.patch, HIVE-10901.03.patch, > HIVE-10901.patch > > > HIVE-10568 is useful only when there is a distinct on one column. It can be > expanded for multiple column cases too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10901) Optimize multi column distinct queries
[ https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-10901: --- Status: Open (was: Patch Available) > Optimize multi column distinct queries > > > Key: HIVE-10901 > URL: https://issues.apache.org/jira/browse/HIVE-10901 > Project: Hive > Issue Type: New Feature > Components: CBO, Logical Optimizer >Affects Versions: 1.2.0 >Reporter: Mostafa Mokhtar >Assignee: Pengcheng Xiong > Attachments: HIVE-10901.02.patch, HIVE-10901.03.patch, > HIVE-10901.patch > > > HIVE-10568 is useful only when there is a distinct on one column. It can be > expanded for multiple column cases too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10901) Optimize multi column distinct queries
[ https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-10901: --- Attachment: HIVE-10901.03.patch > Optimize multi column distinct queries > > > Key: HIVE-10901 > URL: https://issues.apache.org/jira/browse/HIVE-10901 > Project: Hive > Issue Type: New Feature > Components: CBO, Logical Optimizer >Affects Versions: 1.2.0 >Reporter: Mostafa Mokhtar >Assignee: Pengcheng Xiong > Attachments: HIVE-10901.02.patch, HIVE-10901.03.patch, > HIVE-10901.patch > > > HIVE-10568 is useful only when there is a distinct on one column. It can be > expanded for multiple column cases too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14990) run all tests for MM tables and fix the issues that are found
[ https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14990: Attachment: HIVE-14990.10.patch > run all tests for MM tables and fix the issues that are found > - > > Key: HIVE-14990 > URL: https://issues.apache.org/jira/browse/HIVE-14990 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, > HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, > HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, > HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.08.patch, > HIVE-14990.09.patch, HIVE-14990.10.patch, HIVE-14990.10.patch, > HIVE-14990.10.patch, HIVE-14990.patch > > > Expected failures > 1) All HCat tests (cannot write MM tables via the HCat writer) > 2) Almost all merge tests (alter .. concat is not supported). > 3) Tests that run dfs commands with specific paths (path changes). > 4) Truncate column (not supported). > 5) Describe formatted will have the new table fields in the output (before > merging MM with ACID). > 6) Many tests w/explain extended - diff in partition "base file name" (path > changes). > 7) TestTxnCommands - all the conversion tests, as they check for bucket count > using file lists (path changes). > 8) HBase metastore tests, because methods are not implemented. > 9) Some load and ExIm tests that export a table and then rely on specific > path for load (path changes). > 10) Bucket map join/etc. - diffs; disabled the optimization for MM tables due > to how it accounts for buckets -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones
[ https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668755#comment-15668755 ] Sahil Takiar edited comment on HIVE-15199 at 11/15/16 11:39 PM: [~spena] this actually affects any {{INSERT INTO}} query that needs to insert multiple files into the target table location. Each rename operation will basically overwrite the same file again and again, so all data will be lost except data from the last rename op. Note this seems to be a regression of HIVE-12988, which first checked if the destination file existed before renaming it. was (Author: stakiar): [~spena] this actually affects any {{INSERT INTO}} query that needs to insert multiple files into the target table location will lose data. Each rename operation will basically overwrite the same file again and again. Note this seems to be a regression of HIVE-12988, which first checked if the destination file existed before renaming it. > INSERT INTO data on S3 is replacing the old rows with the new ones > -- > > Key: HIVE-15199 > URL: https://issues.apache.org/jira/browse/HIVE-15199 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Sergio Peña >Assignee: Sergio Peña >Priority: Critical > > Any INSERT INTO statement run on S3 tables and when the scratch directory is > saved on S3 is deleting old rows of the table. > {noformat} > hive> set hive.blobstore.use.blobstore.as.scratchdir=true; > hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1'; > hive> insert into table t1 values (1,'name1'); > hive> select * from t1; > 1 name1 > hive> insert into table t1 values (2,'name2'); > hive> select * from t1; > 2 name2 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones
[ https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668755#comment-15668755 ] Sahil Takiar commented on HIVE-15199: - [~spena] this is actually much worse than I thought. Any {{INSERT INTO}} query that needs to insert multiple files into the target table location will lose data; each rename operation will basically overwrite the same file again and again. Note this seems to be a regression of HIVE-12988, which first checked if the destination file existed before renaming it. > INSERT INTO data on S3 is replacing the old rows with the new ones > -- > > Key: HIVE-15199 > URL: https://issues.apache.org/jira/browse/HIVE-15199 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Sergio Peña >Assignee: Sergio Peña >Priority: Critical > > Any INSERT INTO statement run on S3 tables and when the scratch directory is > saved on S3 is deleting old rows of the table. > {noformat} > hive> set hive.blobstore.use.blobstore.as.scratchdir=true; > hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1'; > hive> insert into table t1 values (1,'name1'); > hive> select * from t1; > 1 name1 > hive> insert into table t1 values (2,'name2'); > hive> select * from t1; > 2 name2 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones
[ https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668755#comment-15668755 ] Sahil Takiar edited comment on HIVE-15199 at 11/15/16 11:38 PM: [~spena] this actually affects any {{INSERT INTO}} query that needs to insert multiple files into the target table location will lose data. Each rename operation will basically overwrite the same file again and again. Note this seems to be a regression of HIVE-12988, which first checked if the destination file existed before renaming it. was (Author: stakiar): [~spena] this is actually much worse that I thought. Any {{INSERT INTO}} query that needs to insert multiple files into the target table location will lose data, each rename operation will basically overwrite the same file again and again. Note this seems to be a regression of HIVE-12988, which first checked if the destination file existed before renaming it. > INSERT INTO data on S3 is replacing the old rows with the new ones > -- > > Key: HIVE-15199 > URL: https://issues.apache.org/jira/browse/HIVE-15199 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Sergio Peña >Assignee: Sergio Peña >Priority: Critical > > Any INSERT INTO statement run on S3 tables and when the scratch directory is > saved on S3 is deleting old rows of the table. > {noformat} > hive> set hive.blobstore.use.blobstore.as.scratchdir=true; > hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1'; > hive> insert into table t1 values (1,'name1'); > hive> select * from t1; > 1 name1 > hive> insert into table t1 values (2,'name2'); > hive> select * from t1; > 2 name2 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
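The failure mode described in this thread can be reproduced with plain {{java.nio.file}} moves. The sketch below only illustrates the rename semantics: Hive itself goes through the Hadoop FileSystem API, and the {{_copy_N}} naming here is just a stand-in for the kind of existence check HIVE-12988 performed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class RenameCollisionDemo {

    /** Create n task-output files, each with distinct content. */
    static List<Path> makeSources(int n) {
        try {
            Path srcDir = Files.createTempDirectory("src");
            List<Path> out = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                out.add(Files.write(srcDir.resolve("part-" + i), ("row" + i + "\n").getBytes()));
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Buggy pattern: every rename targets the same final name, so each move
     *  clobbers the previous one and only the last file's rows survive. */
    static long naiveMove(List<Path> sources, String finalName) {
        try {
            Path destDir = Files.createTempDirectory("dest");
            for (Path src : sources) {
                Files.move(src, destDir.resolve(finalName), StandardCopyOption.REPLACE_EXISTING);
            }
            try (Stream<Path> s = Files.list(destDir)) {
                return s.count();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Guarded pattern: probe for an existing destination first and pick a
     *  fresh name instead of overwriting, so no file is lost. */
    static long guardedMove(List<Path> sources, String finalName) {
        try {
            Path destDir = Files.createTempDirectory("dest");
            for (Path src : sources) {
                Path dest = destDir.resolve(finalName);
                int copy = 1;
                while (Files.exists(dest)) {
                    dest = destDir.resolve(finalName + "_copy_" + copy++);
                }
                Files.move(src, dest);
            }
            try (Stream<Path> s = Files.list(destDir)) {
                return s.count();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that on S3 the existence probe is itself an extra round trip per file, which is part of why these per-file renames are so expensive on blobstores.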
[jira] [Updated] (HIVE-13557) Make interval keyword optional while specifying DAY in interval arithmetic
[ https://issues.apache.org/jira/browse/HIVE-13557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-13557: Attachment: HIVE-13557.3.patch patch #3 (#2 on reviewboard): qtest updates and other minor improvements, like fixing the reported argument in exceptions. Valid intervals: {{1 day}}, {{(1+x) day}}, {{'1' year}}, {{('1') year}}, optionally with the interval keyword > Make interval keyword optional while specifying DAY in interval arithmetic > -- > > Key: HIVE-13557 > URL: https://issues.apache.org/jira/browse/HIVE-13557 > Project: Hive > Issue Type: Sub-task > Components: Types >Reporter: Ashutosh Chauhan >Assignee: Zoltan Haindrich > Attachments: HIVE-13557.1.patch, HIVE-13557.1.patch, > HIVE-13557.1.patch, HIVE-13557.2.patch, HIVE-13557.3.patch > > > Currently we support expressions like: {code} > WHERE SOLD_DATE BETWEEN ((DATE('2000-01-31')) - INTERVAL '30' DAY) AND > DATE('2000-01-31') > {code} > We should support: > {code} > WHERE SOLD_DATE BETWEEN ((DATE('2000-01-31')) + (-30) DAY) AND > DATE('2000-01-31') > {code} > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15215) Files on S3 are deleted one by one in INSERT OVERWRITE queries
[ https://issues.apache.org/jira/browse/HIVE-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668735#comment-15668735 ] Sahil Takiar commented on HIVE-15215: - Here is the code that triggers the file-by-file delete (inside the {{Hive.java}} class): {code} replaceFiles(...) { ... FileSystem fs2 = oldPath.getFileSystem(conf); if (fs2.exists(oldPath)) { // Do not delete oldPath if: // - destf is subdir of oldPath //if ( !(fs2.equals(destf.getFileSystem(conf)) && FileUtils.isSubDir(oldPath, destf, fs2))) isOldPathUnderDestf = FileUtils.isSubDir(oldPath, destf, fs2); if (isOldPathUnderDestf) { // if oldPath is destf or its subdir, its should definitely be deleted, otherwise its // existing content might result in incorrect (extra) data. // But not sure why we changed not to delete the oldPath in HIVE-8750 if it is // not the destf or its subdir? oldPathDeleted = FileUtils.trashFilesUnderDir(fs2, oldPath, conf); } } ... } {code} > Files on S3 are deleted one by one in INSERT OVERWRITE queries > -- > > Key: HIVE-15215 > URL: https://issues.apache.org/jira/browse/HIVE-15215 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar > > When running {{INSERT OVERWRITE}} queries the files to overwrite are deleted > one by one. The reason is that, by default, hive.exec.stagingdir is inside > the target table directory. > Ideally Hive would just delete the entire table directory, but it can't do > that since the staging data is also inside the directory. Instead it deletes > each file one-by-one, which is very slow. > There are a few ways to fix this: > 1: Move the staging directory outside the table location. This can be done by > setting hive.exec.stagingdir to a different location when running on S3. It > would be nice if users didn't have to explicitly set this when running on S3 > and things just worked out-of-the-box. My understanding is that > hive.exec.stagingdir was only added to support HDFS encryption zones.
Since > S3 doesn't have encryption zones, there should be no problem with using the > value of hive.exec.scratchdir to store all intermediate data instead. > 2: Multi-thread the delete operations > 3: See if the {{S3AFileSystem}} can expose some type of bulk delete op -- This message was sent by Atlassian JIRA (v6.3.4#6332)
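Option 2 above (multi-threading the delete operations) is easy to prototype. The sketch below uses plain {{java.nio.file}} and an executor rather than Hive's actual FileSystem/Trash plumbing, so it only shows the shape of the change, not a drop-in patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParallelDelete {

    /** Create a directory holding n empty files, standing in for the old table contents. */
    static Path makeDir(int n) {
        try {
            Path dir = Files.createTempDirectory("old-table");
            for (int i = 0; i < n; i++) {
                Files.createFile(dir.resolve("f" + i));
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Count the files directly under dir. */
    static long count(Path dir) {
        try (Stream<Path> s = Files.list(dir)) {
            return s.count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Delete every file directly under dir from a fixed-size pool, instead of
     *  issuing one blocking delete call at a time. */
    static void deleteAll(Path dir, int threads) {
        List<Path> files;
        try (Stream<Path> s = Files.list(dir)) {
            files = s.collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (Path p : files) {
            pool.submit(() -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

On a local filesystem this buys nothing, but against S3, where each delete is an HTTP round trip, issuing deletes from a small pool hides most of the per-request latency (and option 3's bulk delete would cut the request count outright).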
[jira] [Resolved] (HIVE-15216) Files on S3 are deleted one by one in INSERT OVERWRITE queries
[ https://issues.apache.org/jira/browse/HIVE-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved HIVE-15216. - Resolution: Duplicate > Files on S3 are deleted one by one in INSERT OVERWRITE queries > -- > > Key: HIVE-15216 > URL: https://issues.apache.org/jira/browse/HIVE-15216 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar > > When running {{INSERT OVERWRITE}} queries the files to overwrite are deleted > one by one. The reason is that, by default, hive.exec.stagingdir is inside > the target table directory. > Ideally Hive would just delete the entire table directory, but it can't do > that since the staging data is also inside the directory. Instead it deletes > each file one-by-one, which is very slow. > There are a few ways to fix this: > 1: Move the staging directory outside the table location. This can be done by > setting hive.exec.stagingdir to a different location when running on S3. It > would be nice if users didn't have to explicitly set this when running on S3 > and things just worked out-of-the-box. My understanding is that > hive.exec.stagingdir was only added to support HDFS encryption zones. Since > S3 doesn't have encryption zones, there should be no problem with using the > value of hive.exec.scratchdir to store all intermediate data instead. > 2: Multi-thread the delete operations > 3: See if the {{S3AFileSystem}} can expose some type of bulk delete op -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-15214) LLAP: Offer a "slow" mode to debug race conditions in package builder
[ https://issues.apache.org/jira/browse/HIVE-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V resolved HIVE-15214. Resolution: Duplicate > LLAP: Offer a "slow" mode to debug race conditions in package builder > -- > > Key: HIVE-15214 > URL: https://issues.apache.org/jira/browse/HIVE-15214 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.2.0 >Reporter: Gopal V >Assignee: Gopal V > > HIVE-15125 is enabled by default, add an option to disable parallel > generation of data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15211) Provide support for complex expressions in ON clauses for INNER joins
[ https://issues.apache.org/jira/browse/HIVE-15211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668674#comment-15668674 ] Hive QA commented on HIVE-15211: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12839043/HIVE-15211.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10696 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=133) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2131/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2131/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2131/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12839043 - PreCommit-HIVE-Build > Provide support for complex expressions in ON clauses for INNER joins > - > > Key: HIVE-15211 > URL: https://issues.apache.org/jira/browse/HIVE-15211 > Project: Hive > Issue Type: Bug > Components: CBO, Parser >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-15211.patch > > > Currently, we have some restrictions on the predicates that we can use in ON > clauses for inner joins (we have those restrictions for outer joins too, but > we will tackle that in a follow-up). 
Semantically equivalent queries can be > expressed if the predicate is introduced in the WHERE clause, but we would > like the user to be able to express it in both the ON and WHERE clauses, as in standard SQL. > This patch is an extension to overcome these restrictions for inner joins. > It will allow writing queries that currently fail in Hive, such as: > {code:sql} > -- Disjunctions > SELECT * > FROM src1 JOIN src > ON (src1.key=src.key > OR src1.value between 100 and 102 > OR src.value between 100 and 102) > LIMIT 10; > -- Conjunction with multiple input references on one side > SELECT * > FROM src1 JOIN src > ON (src1.key+src.key >= 100 > AND src1.key+src.key <= 102) > LIMIT 10; > -- Conjunct with no references > SELECT * > FROM src1 JOIN src > ON (src1.value between 100 and 102 > AND src.value between 100 and 102 > AND true) > LIMIT 10; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15125) LLAP: Parallelize slider package generator
[ https://issues.apache.org/jira/browse/HIVE-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-15125: --- Resolution: Fixed Fix Version/s: 2.2.0 Release Note: LLAP: Parallelize slider package generator (Gopal V, reviewed by Sergey Shelukhin) Status: Resolved (was: Patch Available) > LLAP: Parallelize slider package generator > -- > > Key: HIVE-15125 > URL: https://issues.apache.org/jira/browse/HIVE-15125 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 2.1.0, 2.2.0 >Reporter: Gopal V >Assignee: Gopal V > Fix For: 2.2.0 > > Attachments: HIVE-15125.1.patch, HIVE-15125.1.patch > > > The metastore init + download of functions takes approx 4 seconds. > This is enough time to complete all the other operations in parallel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14089) complex type support in LLAP IO is broken
[ https://issues.apache.org/jira/browse/HIVE-14089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14089: Attachment: HIVE-14089.13.patch Timeouts look spurious; some of my other jiras also had timeouts. Trying again with the same patch > complex type support in LLAP IO is broken > -- > > Key: HIVE-14089 > URL: https://issues.apache.org/jira/browse/HIVE-14089 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Sergey Shelukhin > Attachments: HIVE-14089.04.patch, HIVE-14089.05.patch, > HIVE-14089.06.patch, HIVE-14089.07.patch, HIVE-14089.08.patch, > HIVE-14089.09.patch, HIVE-14089.10.patch, HIVE-14089.10.patch, > HIVE-14089.10.patch, HIVE-14089.11.patch, HIVE-14089.12.patch, > HIVE-14089.13.patch, HIVE-14089.WIP.2.patch, HIVE-14089.WIP.3.patch, > HIVE-14089.WIP.patch > > > HIVE-13617 is causing the following MiniLlapCliDriver test failures > {code} > org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all > org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15057) Support other types of operators (other than SELECT)
[ https://issues.apache.org/jira/browse/HIVE-15057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-15057: Attachment: HIVE-15057.wip.patch > Support other types of operators (other than SELECT) > > > Key: HIVE-15057 > URL: https://issues.apache.org/jira/browse/HIVE-15057 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Physical Optimizer >Reporter: Chao Sun >Assignee: Chao Sun > Attachments: HIVE-15057.wip.patch > > > Currently only SELECT operators are supported for nested column pruning. We > should add support for other types of operators so the optimization can work > for complex queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15057) Support other types of operators (other than SELECT)
[ https://issues.apache.org/jira/browse/HIVE-15057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-15057: Attachment: (was: HIVE-15057.wip.patch) > Support other types of operators (other than SELECT) > > > Key: HIVE-15057 > URL: https://issues.apache.org/jira/browse/HIVE-15057 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Physical Optimizer >Reporter: Chao Sun >Assignee: Chao Sun > > Currently only SELECT operators are supported for nested column pruning. We > should add support for other types of operators so the optimization can work > for complex queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15148) disallow loading data into bucketed tables (by default)
[ https://issues.apache.org/jira/browse/HIVE-15148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-15148: Attachment: HIVE-15148.03.patch Retrying again... Spark failures look spurious > disallow loading data into bucketed tables (by default) > --- > > Key: HIVE-15148 > URL: https://issues.apache.org/jira/browse/HIVE-15148 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-15148.01.patch, HIVE-15148.02.patch, > HIVE-15148.03.patch, HIVE-15148.patch > > > A few q file tests still use the following, allowed, pattern: > {noformat} > CREATE TABLE bucket_small (key string, value string) partitioned by (ds > string) CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE; > load data local inpath '../../data/files/smallsrcsortbucket1outof4.txt' INTO > TABLE bucket_small partition(ds='2008-04-08'); > load data local inpath '../../data/files/smallsrcsortbucket2outof4.txt' INTO > TABLE bucket_small partition(ds='2008-04-08'); > {noformat} > This relies on the user to load the correct number of files with correctly > hashed data and the correct order of file names; if there's some discrepancy > in any of the above, the queries will fail or may produce incorrect results > if some bucket-based optimizations kick in. > Additionally, even if the user does everything correctly, as far as I know > some code derives bucket number from file name, which won't work in this case > (as opposed to getting buckets based on the order of files, which will work > here but won't work as per HIVE-14970... sigh). > Hive enforces bucketing in other cases (the check cannot even be disabled > these days), so I suggest that we either prohibit the above outright, or at > least add a safety config setting that would disallow it by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11072) Add data validation between Hive metastore upgrades tests
[ https://issues.apache.org/jira/browse/HIVE-11072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668596#comment-15668596 ] Aihua Xu commented on HIVE-11072: - The new patch looks good to me. [~ctang.ma] do you have more comments on it? > Add data validation between Hive metastore upgrades tests > - > > Key: HIVE-11072 > URL: https://issues.apache.org/jira/browse/HIVE-11072 > Project: Hive > Issue Type: New Feature > Components: Tests >Reporter: Sergio Peña >Assignee: Naveen Gangam > Attachments: HIVE-11072.1.patch, HIVE-11072.2.patch, > HIVE-11072.3.patch, HIVE-11072.4.patch, HIVE-11072.5.patch, > HIVE-11072.to-be-committed.patch > > > An existing Hive metastore upgrade test is running on Hive jenkins. However, > these scripts test only the database schema upgrade, not data validation > between upgrades. > We should validate data between metastore version upgrades. Using data > validation, we may ensure that data won't be damaged or corrupted when > upgrading the Hive metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14990) run all tests for MM tables and fix the issues that are found
[ https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14990: Attachment: HIVE-14990.10.patch > run all tests for MM tables and fix the issues that are found > - > > Key: HIVE-14990 > URL: https://issues.apache.org/jira/browse/HIVE-14990 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, > HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, > HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, > HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.08.patch, > HIVE-14990.09.patch, HIVE-14990.10.patch, HIVE-14990.10.patch, > HIVE-14990.patch > > > Expected failures > 1) All HCat tests (cannot write MM tables via the HCat writer) > 2) Almost all merge tests (alter .. concat is not supported). > 3) Tests that run dfs commands with specific paths (path changes). > 4) Truncate column (not supported). > 5) Describe formatted will have the new table fields in the output (before > merging MM with ACID). > 6) Many tests w/explain extended - diff in partition "base file name" (path > changes). > 7) TestTxnCommands - all the conversion tests, as they check for bucket count > using file lists (path changes). > 8) HBase metastore tests, because methods are not implemented. > 9) Some load and ExIm tests that export a table and then rely on specific > path for load (path changes). > 10) Bucket map join/etc. - diffs; disabled the optimization for MM tables due > to how it accounts for buckets -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15207) Implement a capability to detect incorrect sequence numbers
[ https://issues.apache.org/jira/browse/HIVE-15207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-15207: Status: Patch Available (was: Open) Initial patch: added a 'validate' option to hiveSchemaTool and the logic to detect invalid sequence numbers. > Implement a capability to detect incorrect sequence numbers > --- > > Key: HIVE-15207 > URL: https://issues.apache.org/jira/browse/HIVE-15207 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-15207.1.patch > > > We have seen the next sequence number be smaller than the max(id) for certain > tables. Seems it's caused by a thread-safety issue in HMS, but still not sure if > it has been fully fixed. Try to detect such issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15207) Implement a capability to detect incorrect sequence numbers
[ https://issues.apache.org/jira/browse/HIVE-15207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-15207: Attachment: HIVE-15207.1.patch > Implement a capability to detect incorrect sequence numbers > --- > > Key: HIVE-15207 > URL: https://issues.apache.org/jira/browse/HIVE-15207 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-15207.1.patch > > > We have seen the next sequence number be smaller than the max(id) for certain > tables. Seems it's caused by a thread-safety issue in HMS, but still not sure if > it has been fully fixed. Try to detect such issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
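The invariant such a 'validate' option would enforce is simple once the next-value/max-id pairs have been read. The sketch below works on an in-memory map, which is hypothetical; the real tool would presumably read NEXT_VAL from the metastore's SEQUENCE_TABLE and compare it against the max id of each backing table.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class SequenceValidator {

    /** A sequence is consistent only if its stored next value is strictly
     *  greater than the largest id already handed out for that table;
     *  nextVal <= max(id) means a future insert can collide with an existing row.
     *  Input: table name -> {nextVal, maxId}. Output: sorted names of broken sequences. */
    static List<String> findInvalid(Map<String, long[]> nextValAndMaxId) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, long[]> e : nextValAndMaxId.entrySet()) {
            long nextVal = e.getValue()[0];
            long maxId = e.getValue()[1];
            if (nextVal <= maxId) {
                bad.add(e.getKey());
            }
        }
        Collections.sort(bad);
        return bad;
    }
}
```

Note that nextVal == max(id) is also flagged, since the next allocation would reuse an id that already exists.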
[jira] [Commented] (HIVE-14990) run all tests for MM tables and fix the issues that are found
[ https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668582#comment-15668582 ] Sergey Shelukhin commented on HIVE-14990: - Looks like spark failures may be caused by a spurious pom file change > run all tests for MM tables and fix the issues that are found > - > > Key: HIVE-14990 > URL: https://issues.apache.org/jira/browse/HIVE-14990 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, > HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, > HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, > HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.08.patch, > HIVE-14990.09.patch, HIVE-14990.10.patch, HIVE-14990.patch > > > Expected failures > 1) All HCat tests (cannot write MM tables via the HCat writer) > 2) Almost all merge tests (alter .. concat is not supported). > 3) Tests that run dfs commands with specific paths (path changes). > 4) Truncate column (not supported). > 5) Describe formatted will have the new table fields in the output (before > merging MM with ACID). > 6) Many tests w/explain extended - diff in partition "base file name" (path > changes). > 7) TestTxnCommands - all the conversion tests, as they check for bucket count > using file lists (path changes). > 8) HBase metastore tests, because methods are not implemented. > 9) Some load and ExIm tests that export a table and then rely on specific > path for load (path changes). > 10) Bucket map join/etc. - diffs; disabled the optimization for MM tables due > to how it accounts for buckets -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14990) run all tests for MM tables and fix the issues that are found
[ https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14990: Attachment: HIVE-14990.10.patch After the merge.
[jira] [Updated] (HIVE-14990) run all tests for MM tables and fix the issues that are found
[ https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14990: Attachment: (was: HIVE-14990.09.patch)
[jira] [Commented] (HIVE-14990) run all tests for MM tables and fix the issues that are found
[ https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668562#comment-15668562 ] Sergey Shelukhin commented on HIVE-14990: - ditto for TestHive, etc. failures - caused by output difference due to MM id fields that would go away.
[jira] [Commented] (HIVE-14990) run all tests for MM tables and fix the issues that are found
[ https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668554#comment-15668554 ] Sergey Shelukhin commented on HIVE-14990: - Relevant q test failures for the future branch merge (w/ non-MM tables, after the MM patch): load_dyn_part1, autoColumnStats_2 and _1, escape2, load_dyn_part2, dynpart_sort_opt_vectorization, orc_createas1, combine3, update_tmp_table, delete_where_non_partitioned, delete_where_no_match, update_where_no_match, update_where_non_partitioned, update_all_types. I suspect many of the ACID failures are due to the incomplete ACID type patch. The rest are out-file changes that are either correct or will go away after ACID integration.
[jira] [Commented] (HIVE-15057) Support other types of operators (other than SELECT)
[ https://issues.apache.org/jira/browse/HIVE-15057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668553#comment-15668553 ] Hive QA commented on HIVE-15057: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12839024/HIVE-15057.wip.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 478 failed/errored test(s), 10399 tests executed *Failed tests:* {noformat} TestCBOMaxNumToCNF - did not produce a TEST-*.xml file (likely timed out) (batchId=255) TestCBORuleFiredOnlyOnce - did not produce a TEST-*.xml file (likely timed out) (batchId=255) TestCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=11) [auto_join18.q,input1_limit.q,load_dyn_part3.q,autoColumnStats_4.q,auto_sortmerge_join_14.q,drop_table.q,bucket_map_join_tez2.q,auto_join33.q,merge4.q,parquet_external_time.q,storage_format_descriptor.q,mapjoin_hook.q,multi_column_in_single.q,schema_evol_orc_nonvec_table.q,cbo_rp_subq_in.q,authorization_view_disable_cbo_4.q,list_bucket_dml_2.q,cbo_rp_semijoin.q,char_2.q,union_remove_14.q,non_ascii_literal2.q,load_part_authsuccess.q,auto_sortmerge_join_15.q,explain_rearrange.q,varchar_union1.q,input21.q,vector_udf2.q,groupby_cube_multi_gby.q,bucketmapjoin8.q,union34.q] TestCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=14) [push_or.q,skewjoinopt16.q,bucket3.q,acid_join.q,drop_partitions_filter3.q,schema_evol_text_nonvec_table.q,mrr.q,auto_join15.q,orc_ppd_schema_evol_2b.q,having2.q,regex_col.q,udf_tinyint.q,vector_interval_1.q,semijoin5.q,constprog_dpp.q,skewjoinopt13.q,cbo_rp_auto_join0.q,udf_reflect2.q,udf_div.q,auto_sortmerge_join_6.q,vector_groupby4.q,cbo_SortUnionTransposeRule.q,union_remove_24.q,update_where_non_partitioned.q,annotate_stats_part.q,list_bucket_dml_4.q,join22.q,udf_xpath_short.q,merge_join_1.q,join33.q] TestCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=4) [join_vc.q,varchar_join1.q,join7.q,insert_values_tmp_table.q,json_serde_tsformat.q,tez_union2.q,script_env_var1.q,bucketsortoptimize_insert_8.q,stats16.q,union20.q,inputddl5.q,select_transform_hint.q,parallel_join1.q,compute_stats_string.q,union_remove_7.q,union27.q,optional_outer.q,vector_include_no_sel.q,insert0.q,folder_predicate.q,groupby_cube1.q,groupby7_map_multi_single_reducer.q,join_reorder4.q,vector_interval_arithmetic.q,smb_mapjoin_17.q,groupby7_map.q,input_part10.q,udf_mask_show_first_n.q,union.q,cbo_udf_udaf.q] TestCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=5) [ptf_general_queries.q,correlationoptimizer9.q,auto_join_reordering_values.q,sample2.q,decimal_join.q,mapjoin_subquery2.q,join43.q,bucket_if_with_path_filter.q,udf_month.q,mapjoin1.q,avro_partitioned_native.q,join25.q,nullformatdir.q,authorization_admin_almighty1.q,udf_avg.q,cte_mat_4.q,groupby3.q,cbo_rp_union.q,udaf_covar_samp.q,exim_03_nonpart_over_compat.q,udf_logged_in_user.q,index_stale.q,union12.q,skewjoinopt2.q,skewjoinopt18.q,colstats_all_nulls.q,bucketsortoptimize_insert_2.q,quote2.q,udf_classloader.q,authorization_owner_actions.q] TestCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=58) [touch.q,auto_sortmerge_join_13.q,join4.q,join35.q,filter_cond_pushdown2.q,except_distinct.q,vector_left_outer_join2.q,udf_ucase.q,udf_ceil.q,vectorized_ptf.q,exim_25_export_parentpath_has_inaccessible_children.q,udf_array.q,join_filters.q,udf_current_user.q,acid_vectorization.q,join_reorder3.q,auto_join19.q,distinct_windowing_no_cbo.q,vectorization_15.q,union7.q,vectorization_nested_udf.q,database_properties.q,partition_varchar1.q,vector_groupby_3.q,udf_sort_array.q,cte_6.q,vector_mr_diff_schema_alias.q,rcfile_union.q,explain_logical.q,interval_3.q] TestColumnPrunerProcCtx - did not produce a TEST-*.xml file (likely timed out) (batchId=255) TestGenMapRedUtilsUsePartitionColumnsNegative - did not produce a TEST-*.xml file (likely 
timed out) (batchId=255) TestHiveMetaStoreChecker - did not produce a TEST-*.xml file (likely timed out) (batchId=255) TestNegativePartitionPrunerCompactExpr - did not produce a TEST-*.xml file (likely timed out) (batchId=255) TestPositivePartitionPrunerCompactExpr - did not produce a TEST-*.xml file (likely timed out) (batchId=255) TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=116) [load_dyn_part2.q,smb_mapjoin_7.q,vectorization_5.q,smb_mapjoin_2.q,ppd_join_filter.q,column_access_stats.q,vector_between_in.q,vectorized_string_funcs.q,vectorization_1.q,bucket_map_join_2.q,groupby4_map_skew.q,groupby_ppr_multi_distinct.q,temp_table_join1.q,vectorized_case.q,stats_noscan_1.q] TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=117)
[jira] [Updated] (HIVE-15200) Support setOp in subQuery with parentheses
[ https://issues.apache.org/jira/browse/HIVE-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-15200: --- Status: Patch Available (was: Open)
> Support setOp in subQuery with parentheses
> --
> Key: HIVE-15200
> URL: https://issues.apache.org/jira/browse/HIVE-15200
> Project: Hive
> Issue Type: Sub-task
> Reporter: Pengcheng Xiong
> Assignee: Pengcheng Xiong
> Attachments: HIVE-15200.01.patch
>
> {code}
> explain select key from ((select key from src) union (select key from src))subq;
> {code}
> will throw
> {code}
> FAILED: ParseException line 1:47 cannot recognize input near 'union' '(' 'select' in subquery source
> {code}
[jira] [Updated] (HIVE-15200) Support setOp in subQuery with parentheses
[ https://issues.apache.org/jira/browse/HIVE-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-15200: --- Attachment: HIVE-15200.01.patch
[jira] [Comment Edited] (HIVE-15194) Hive on Tez - Hive Runtime Error while closing operators
[ https://issues.apache.org/jira/browse/HIVE-15194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664624#comment-15664624 ] Wei Zheng edited comment on HIVE-15194 at 11/15/16 10:03 PM:
-
Thanks [~gopalv] for the quick analysis. But I think either isHashMapOnDisk() is true or we have an in-memory hashmap:
{code}
/* It may happen that there's not enough memory to instantiate a hashmap for the partition.
 * In that case, we don't create the hashmap, but pretend the hashmap is directly "spilled". */
public HashPartition(int initialCapacity, float loadFactor, int wbSize, long maxProbeSize,
    boolean createHashMap, String spillLocalDirs) {
  if (createHashMap) {
    // Probe space should be at least equal to the size of our designated wbSize
    maxProbeSize = Math.max(maxProbeSize, wbSize);
    hashMap = new BytesBytesMultiHashMap(initialCapacity, loadFactor, wbSize, maxProbeSize);
  } else {
    hashMapSpilledOnCreation = true;
    hashMapOnDisk = true;
  }
  this.spillLocalDirs = spillLocalDirs;
  this.initialCapacity = initialCapacity;
  this.loadFactor = loadFactor;
  this.wbSize = wbSize;
}
{code}
[~ssmane3.tech] It will be helpful if you can attach the hive.log. Thanks.

was (Author: wzheng):
Thanks [~gopalv] for the quick analysis. But I think either isHashMapOnDisk() is false or we have an in-memory hashmap:
{code}
/* It may happen that there's not enough memory to instantiate a hashmap for the partition.
 * In that case, we don't create the hashmap, but pretend the hashmap is directly "spilled". */
public HashPartition(int initialCapacity, float loadFactor, int wbSize, long maxProbeSize,
    boolean createHashMap, String spillLocalDirs) {
  if (createHashMap) {
    // Probe space should be at least equal to the size of our designated wbSize
    maxProbeSize = Math.max(maxProbeSize, wbSize);
    hashMap = new BytesBytesMultiHashMap(initialCapacity, loadFactor, wbSize, maxProbeSize);
  } else {
    hashMapSpilledOnCreation = true;
    hashMapOnDisk = true;
  }
  this.spillLocalDirs = spillLocalDirs;
  this.initialCapacity = initialCapacity;
  this.loadFactor = loadFactor;
  this.wbSize = wbSize;
}
{code}
[~ssmane3.tech] It will be helpful if you can attach the hive.log. Thanks.

> Hive on Tez - Hive Runtime Error while closing operators
>
> Key: HIVE-15194
> URL: https://issues.apache.org/jira/browse/HIVE-15194
> Project: Hive
> Issue Type: Bug
> Components: Hive, Tez
> Affects Versions: 2.1.0
> Environment: Hive 2.1.0
> Tez 0.8.4
> 4 Nodes x CentOS-6 x64 (32GB Memory, 8 CPUs)
> Hadoop 2.7.1
> Reporter: Shankar M
>
> Please help me solve the issue below.
> I am setting the following commands in the Hive CLI:
> set hive.execution.engine=tez;
> set hive.vectorized.execution.enabled = true;
> set hive.vectorized.execution.reduce.enabled = true;
> set hive.cbo.enable=true;
> set hive.compute.query.using.stats=true;
> set hive.stats.fetch.column.stats=true;
> set hive.stats.fetch.partition.stats=true;
> SET hive.tez.container.size=4096;
> SET hive.tez.java.opts=-Xmx3072m;
> {code}
> hive> CREATE TABLE tmp_parquet_newtable STORED AS PARQUET AS
> > select a.* from orc_very_large_table a where a.event = 1 and EXISTS (SELECT 1 FROM tmp_small_parquet_table b WHERE b.session_id = a.session_id ) ;
> Query ID = hadoop_20161114132930_65843cb3-557c-4b42-b662-2901caf5be2d
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id application_1479059955967_0049)
> VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
> Map 1 .
containerFAILED384 440 340 > 26 0 > Map 2 .. container SUCCEEDED 1 100 > 0 0 > -- > VERTICES: 01/02 [===>>---] 11% ELAPSED TIME: 43.76 s > > -- > Status: Failed > Vertex failed, vertexName=Map 1, vertexId=vertex_1479059955967_0049_2_01, > diagnostics=[Task failed,
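[Editor's note] The invariant under discussion in Wei Zheng's comment above can be stated compactly: after HashPartition's constructor runs, either the partition is flagged as spilled to disk or it holds an in-memory hashmap, never neither. A simplified stand-in illustrating just that branching — a plain Map replaces Hive's BytesBytesMultiHashMap, and everything beyond the quoted fields is an assumption for the sketch:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the HashPartition constructor quoted above, showing
// the invariant being discussed: after construction, either hashMapOnDisk is
// true or the in-memory hashmap exists -- never neither. This is a sketch,
// not the real Hive class.
class HashPartitionSketch {
    final Map<String, String> hashMap;
    final boolean hashMapOnDisk;

    HashPartitionSketch(boolean createHashMap) {
        if (createHashMap) {
            hashMap = new HashMap<>();
            hashMapOnDisk = false;
        } else {
            // Not enough memory: pretend the hashmap was directly "spilled".
            hashMap = null;
            hashMapOnDisk = true;
        }
    }

    // Either on disk or in memory -- the either/or condition the comment relies on.
    boolean invariantHolds() {
        return hashMapOnDisk || hashMap != null;
    }

    public static void main(String[] args) {
        System.out.println(new HashPartitionSketch(true).invariantHolds());  // true
        System.out.println(new HashPartitionSketch(false).invariantHolds()); // true
    }
}
```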
[jira] [Updated] (HIVE-15180) Extend JSONMessageFactory to store additional information about metadata objects on different table events
[ https://issues.apache.org/jira/browse/HIVE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-15180: Attachment: HIVE-15180.3.patch > Extend JSONMessageFactory to store additional information about metadata > objects on different table events > -- > > Key: HIVE-15180 > URL: https://issues.apache.org/jira/browse/HIVE-15180 > Project: Hive > Issue Type: Sub-task > Components: repl >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > Attachments: HIVE-15180.1.patch, HIVE-15180.2.patch, > HIVE-15180.3.patch > > > We want the {{NOTIFICATION_LOG}} table to capture additional information > about the metadata objects when {{DbNotificationListener}} captures different > events for a table (create/drop/alter). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15205) Create ReplDumpTask/ReplDumpWork for dumping out metadata
[ https://issues.apache.org/jira/browse/HIVE-15205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668294#comment-15668294 ] Sergey Shelukhin commented on HIVE-15205: - There's no patch, so I cannot tell :)
> Create ReplDumpTask/ReplDumpWork for dumping out metadata
> -
> Key: HIVE-15205
> URL: https://issues.apache.org/jira/browse/HIVE-15205
> Project: Hive
> Issue Type: Sub-task
> Components: repl
> Reporter: Vaibhav Gumashta
> Assignee: Vaibhav Gumashta
>
> The current bootstrap code generates dump metadata during semantic analysis, which breaks security and the task/work abstraction. It also uses existing classes (from the Export/Import world) for code-reuse purposes, but as a result ends up dealing with a lot of if-then-elses. It makes sense to have a cleaner abstraction that uses ReplDumpTask and ReplDumpWork (to configure the Task). It is also perhaps worth evaluating ReplLoadTask/ReplLoadWork for the load side.
[jira] [Commented] (HIVE-15205) Create ReplDumpTask/ReplDumpWork for dumping out metadata
[ https://issues.apache.org/jira/browse/HIVE-15205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668286#comment-15668286 ] Vaibhav Gumashta commented on HIVE-15205: - [~sershe] This jira doesn't touch ImportSemanticAnalyzer etc. It is resolving some of the problems you are pointing to in the original jira. All the classes here are new (unless you have created ones with similar names).
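[Editor's note] The Task/Work separation argued for in the HIVE-15205 description follows a common Hive pattern: a serializable Work object carries only configuration produced at analysis time, and all side effects happen in the Task at execution time. A generic sketch of that pattern — the class, field, and method names here are illustrative assumptions, not the actual ReplDumpTask/ReplDumpWork API:

```java
import java.io.Serializable;

// Generic sketch of the Task/Work split: the Work object is pure, serializable
// configuration built during semantic analysis; the Task consumes it later.
// Names are illustrative, not Hive's real repl classes.
class ReplDumpWorkSketch implements Serializable {
    final String dbPattern;
    final String dumpRoot;

    ReplDumpWorkSketch(String dbPattern, String dumpRoot) {
        this.dbPattern = dbPattern;
        this.dumpRoot = dumpRoot;
    }
}

class ReplDumpTaskSketch {
    private final ReplDumpWorkSketch work;

    ReplDumpTaskSketch(ReplDumpWorkSketch work) {
        this.work = work;
    }

    // All side effects happen here, at execution time -- not during analysis,
    // which only builds the Work object. Returns 0 on success, mirroring the
    // usual Task convention.
    int execute() {
        System.out.println("dumping metadata for " + work.dbPattern
            + " under " + work.dumpRoot);
        return 0;
    }

    public static void main(String[] args) {
        ReplDumpTaskSketch task =
            new ReplDumpTaskSketch(new ReplDumpWorkSketch("default.*", "/tmp/repl"));
        System.out.println(task.execute()); // prints the dump line, then 0
    }
}
```

The point of the split is that the analyzer can be unit-tested and authorized without touching the filesystem, since nothing is dumped until the Task runs.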
[jira] [Updated] (HIVE-15211) Provide support for complex expressions in ON clauses for INNER joins
[ https://issues.apache.org/jira/browse/HIVE-15211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-15211: --- Status: Patch Available (was: In Progress)
> Provide support for complex expressions in ON clauses for INNER joins
> -
> Key: HIVE-15211
> URL: https://issues.apache.org/jira/browse/HIVE-15211
> Project: Hive
> Issue Type: Bug
> Components: CBO, Parser
> Affects Versions: 2.2.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15211.patch
>
> Currently, we have some restrictions on the predicates that can be used in ON clauses for inner joins (we have those restrictions for outer joins too, but we will tackle that in a follow-up). Semantically equivalent queries can be expressed by introducing the predicate in the WHERE clause, but we would like users to be able to express it in both the ON and the WHERE clause, as in standard SQL.
> This patch is an extension that overcomes these restrictions for inner joins. It will allow writing queries that currently fail in Hive, such as:
> {code:sql}
> -- Disjunctions
> SELECT *
> FROM src1 JOIN src
> ON (src1.key=src.key
>   OR src1.value between 100 and 102
>   OR src.value between 100 and 102)
> LIMIT 10;
> -- Conjunction with multiple input references on one side
> SELECT *
> FROM src1 JOIN src
> ON (src1.key+src.key >= 100
>   AND src1.key+src.key <= 102)
> LIMIT 10;
> -- Conjunct with no references
> SELECT *
> FROM src1 JOIN src
> ON (src1.value between 100 and 102
>   AND src.value between 100 and 102
>   AND true)
> LIMIT 10;
> {code}
[jira] [Updated] (HIVE-15211) Provide support for complex expressions in ON clauses for INNER joins
[ https://issues.apache.org/jira/browse/HIVE-15211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-15211: --- Attachment: HIVE-15211.patch
[jira] [Work started] (HIVE-15211) Provide support for complex expressions in ON clauses for INNER joins
[ https://issues.apache.org/jira/browse/HIVE-15211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-15211 started by Jesus Camacho Rodriguez. --
[jira] [Commented] (HIVE-15208) Query string should be HTML encoded for Web UI
[ https://issues.apache.org/jira/browse/HIVE-15208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668235#comment-15668235 ] Hive QA commented on HIVE-15208: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12839019/HIVE-15208.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10679 tests executed *Failed tests:* {noformat} TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=92) [bucketmapjoin4.q,bucket_map_join_spark4.q,union21.q,groupby2_noskew.q,timestamp_2.q,date_join1.q,mergejoins.q,smb_mapjoin_11.q,auto_sortmerge_join_3.q,mapjoin_test_outer.q,vectorization_9.q,merge2.q,groupby6_noskew.q,auto_join_without_localtask.q,multi_join_union.q] org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=133) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] (batchId=90) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2129/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2129/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2129/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12839019 - PreCommit-HIVE-Build > Query string should be HTML encoded for Web UI > -- > > Key: HIVE-15208 > URL: https://issues.apache.org/jira/browse/HIVE-15208 > Project: Hive > Issue Type: Bug > Components: Web UI >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang >Priority: Minor > Attachments: HIVE-15208.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
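[Editor's note] The fix HIVE-15208's title describes amounts to escaping HTML metacharacters in the user's query text before it is embedded in a Web UI page, so a query containing markup (e.g. a string literal with a script tag) cannot inject HTML. A hand-rolled sketch of such an escaper — the actual patch may well use a library utility instead:

```java
// Minimal sketch of HTML-encoding a query string for display in a web page.
// Illustrative only; the real patch may delegate to a library escaper.
class HtmlEscapeSketch {
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  out.append("&amp;");  break; // must come first conceptually
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String query = "select '<script>alert(1)</script>' from t";
        // The escaped form renders as literal text instead of executing.
        System.out.println(escapeHtml(query));
    }
}
```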
[jira] [Updated] (HIVE-15211) Provide support for complex expressions in ON clauses for INNER joins
[ https://issues.apache.org/jira/browse/HIVE-15211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-15211: --- Summary: Provide support for complex expressions in ON clauses for INNER joins (was: Extends support for complex expressions in inner joins ON clauses)
[jira] [Commented] (HIVE-14990) run all tests for MM tables and fix the issues that are found
[ https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668141#comment-15668141 ] Sergey Shelukhin commented on HIVE-14990: - Hmm, this seems to have excluded the test code that makes all tables MM
[jira] [Commented] (HIVE-14990) run all tests for MM tables and fix the issues that are found
[ https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668051#comment-15668051 ] Hive QA commented on HIVE-14990: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12839017/HIVE-14990.09.patch {color:green}SUCCESS:{color} +1 due to 55 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 100 failed/errored test(s), 9991 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[combine3] (batchId=7) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=36) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view_partitioned] (batchId=33) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cteViews] (batchId=69) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_04_all_part] (batchId=26) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_05_some_part] (batchId=66) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_16_part_external] (batchId=53) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_17_part_managed] (batchId=40) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_18_part_external] (batchId=64) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_19_00_part_external_location] (batchId=60) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[merge_dynamic_partition2] (batchId=15) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[merge_dynamic_partition3] (batchId=62) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_all2] (batchId=78) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_all] (batchId=60) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_conversions] (batchId=67) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_insertonly_acid] (batchId=75) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_createas1] (batchId=78) 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat3] (batchId=46) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat] (batchId=16) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[repl_1_drop] (batchId=10) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[show_tablestatus] (batchId=69) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters1] (batchId=131) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters] (batchId=135) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=131) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=133) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[autoColumnStats_1] (batchId=139) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[autoColumnStats_2] (batchId=150) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_unionDistinct_2] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_where_no_match] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_where_non_partitioned] (batchId=142) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_opt_vectorization] (batchId=145) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_optimization] (batchId=145) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape1] (batchId=136) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape2] (batchId=150) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_into_with_schema] (batchId=137) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[load_dyn_part1] (batchId=150) 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[load_dyn_part2] (batchId=146) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_views] (batchId=138) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[unionDistinct_2] (batchId=137) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_all_types] (batchId=138) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_tmp_table] (batchId=142) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_where_no_match] (batchId=138) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_where_non_partitioned] (batchId=138) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver (batchId=156)
[jira] [Commented] (HIVE-15151) Bootstrap support for replv2
[ https://issues.apache.org/jira/browse/HIVE-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668032#comment-15668032 ] Sergey Shelukhin commented on HIVE-15151: - Please see comment in parent JIRA... making sweeping code changes and moves with FIXME comments that intend to make more sweeping changes and moves repeatedly breaks anyone who does any work in parallel. Changes like that should be made with one commit or on the branch. > Bootstrap support for replv2 > > > Key: HIVE-15151 > URL: https://issues.apache.org/jira/browse/HIVE-15151 > Project: Hive > Issue Type: Sub-task > Components: repl >Affects Versions: 2.1.0 >Reporter: Sushanth Sowmyan >Assignee: Sushanth Sowmyan > Labels: TODOC2.2 > Fix For: 2.2.0 > > Attachments: HIVE-15151.2.patch, HIVE-15151.3.patch, > HIVE-15151.3.patch, HIVE-15151.4.patch, HIVE-15151.addendum.patch, > HIVE-15151.patch > > > We need to support the ability to bootstrap an initial state, dumping out > currently existing dbs/tables, etc, so that incremental replication can take > over from that point. To this end, we should implement commands such as REPL > DUMP, REPL LOAD, REPL STATUS, as described over at > https://cwiki.apache.org/confluence/display/Hive/HiveReplicationv2Development -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15205) Create ReplDumpTask/ReplDumpWork for dumping out metadata
[ https://issues.apache.org/jira/browse/HIVE-15205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668018#comment-15668018 ] Sergey Shelukhin commented on HIVE-15205: - Temporary -1 See the comment in parent JIRA "Is it possible to do work in the branch? This causes immense conflicts with hive-14535 branch, and I see tons of comments that purport with FIXMEs and stuff to move code around and refactor this and that. I think this should be done on the branch and merged once " > Create ReplDumpTask/ReplDumpWork for dumping out metadata > - > > Key: HIVE-15205 > URL: https://issues.apache.org/jira/browse/HIVE-15205 > Project: Hive > Issue Type: Sub-task > Components: repl >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > > The current bootstrap code generates dump metadata during semantic analysis > which breaks security and task/work abstraction. It also uses existing > classes (from the Export/Import world) for code reuse purposes, but as a result > ends up dealing with a lot of if-then-elses. It makes sense to have a cleaner > abstraction which uses ReplDumpTask and ReplDumpWork (to configure the Task). > Also perhaps worth evaluating ReplLoadTask/ReplLoadWork for the load side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-15205) Create ReplDumpTask/ReplDumpWork for dumping out metadata
[ https://issues.apache.org/jira/browse/HIVE-15205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668018#comment-15668018 ] Sergey Shelukhin edited comment on HIVE-15205 at 11/15/16 7:27 PM: --- Temporary -1 See the comment in parent JIRA "Is it possible to do work in the branch? This causes immense conflicts with hive-14535 branch, and I see tons of comments that purport with FIXMEs and stuff to move code around and refactor this and that. I think this should be done on the branch and merged once when it's ready" was (Author: sershe): Temporary -1 See the comment in parent JIRA "Is it possible to do work in the branch? This causes immense conflicts with hive-14535 branch, and I see tons of comments that purport with FIXMEs and stuff to move code around and refactor this and that. I think this should be done on the branch and merged once " > Create ReplDumpTask/ReplDumpWork for dumping out metadata > - > > Key: HIVE-15205 > URL: https://issues.apache.org/jira/browse/HIVE-15205 > Project: Hive > Issue Type: Sub-task > Components: repl >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > > The current bootstrap code generates dump metadata during semantic analysis > which breaks security and task/work abstraction. It also uses existing > classes (from the Export/Import world) for code reuse purposes, but as a result > ends up dealing with a lot of if-then-elses. It makes sense to have a cleaner > abstraction which uses ReplDumpTask and ReplDumpWork (to configure the Task). > Also perhaps worth evaluating ReplLoadTask/ReplLoadWork for the load side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14841) Replication - Phase 2
[ https://issues.apache.org/jira/browse/HIVE-14841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668011#comment-15668011 ] Sergey Shelukhin commented on HIVE-14841: - Is it possible to do work in the branch? This causes immense conflicts with hive-14535 branch, and I see tons of comments that purport with FIXMEs and stuff to move code around and refactor this and that. I think this should be done on the branch and merged once when ready, so that conflicts with parallel changes to the code affected by the moves are minimized. > Replication - Phase 2 > - > > Key: HIVE-14841 > URL: https://issues.apache.org/jira/browse/HIVE-14841 > Project: Hive > Issue Type: New Feature > Components: repl >Affects Versions: 2.1.0 >Reporter: Sushanth Sowmyan >Assignee: Sushanth Sowmyan > > Per email sent out to the dev list, the current implementation of replication > in hive has certain drawbacks, for instance : > * Replication follows a rubberbanding pattern, wherein different tables/ptns > can be in a different/mixed state on the destination, so that unless all > events are caught up on, we do not have an equivalent warehouse. Thus, this > only satisfies DR cases, not load balancing usecases, and the secondary > warehouse is really only seen as a backup, rather than as a live warehouse > that trails the primary. > * The base implementation is a naive implementation, and has several > performance problems, including a large amount of duplication of data for > subsequent events, as mentioned in HIVE-13348, having to copy out entire > partitions/tables when just a delta of files might be sufficient/etc. Also, > using EXPORT/IMPORT allows us a simple implementation, but at the cost of > tons of temporary space, much of which is not actually applied at the > destination. > Thus, to track this, we now create a new branch (repl2) and a uber-jira(this > one) to track experimental development towards improvement of this situation. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15057) Support other types of operators (other than SELECT)
[ https://issues.apache.org/jira/browse/HIVE-15057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-15057: Attachment: (was: HIVE-15057.wip.patch) > Support other types of operators (other than SELECT) > > > Key: HIVE-15057 > URL: https://issues.apache.org/jira/browse/HIVE-15057 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Physical Optimizer >Reporter: Chao Sun >Assignee: Chao Sun > Attachments: HIVE-15057.wip.patch > > > Currently only SELECT operators are supported for nested column pruning. We > should add support for other types of operators so the optimization can work > for complex queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15057) Support other types of operators (other than SELECT)
[ https://issues.apache.org/jira/browse/HIVE-15057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-15057: Attachment: HIVE-15057.wip.patch > Support other types of operators (other than SELECT) > > > Key: HIVE-15057 > URL: https://issues.apache.org/jira/browse/HIVE-15057 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Physical Optimizer >Reporter: Chao Sun >Assignee: Chao Sun > Attachments: HIVE-15057.wip.patch > > > Currently only SELECT operators are supported for nested column pruning. We > should add support for other types of operators so the optimization can work > for complex queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15208) Query string should be HTML encoded for Web UI
[ https://issues.apache.org/jira/browse/HIVE-15208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667957#comment-15667957 ] Xuefu Zhang commented on HIVE-15208: +1 > Query string should be HTML encoded for Web UI > -- > > Key: HIVE-15208 > URL: https://issues.apache.org/jira/browse/HIVE-15208 > Project: Hive > Issue Type: Bug > Components: Web UI >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang >Priority: Minor > Attachments: HIVE-15208.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones
[ https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667931#comment-15667931 ] Steve Loughran commented on HIVE-15199: --- sounds related to HADOOP-13402. I am not going to express any opinion about what is "the correct" behaviour we should expect from rename, as I don't think anyone knows that. If you look at the [FS Specification|https://hadoop.apache.org/docs/stable2/hadoop-project-dist/hadoop-common/filesystem/filesystem.html] we're pretty explicit that rename is hard, and that different filesystems behave differently. I'm not defending S3A here, just noting I'm not 100% sure of what HDFS does itself here, and how that compares to the semantics of POSIX's rename call (which is different from the Unix command line {{mv}} operation). > INSERT INTO data on S3 is replacing the old rows with the new ones > -- > > Key: HIVE-15199 > URL: https://issues.apache.org/jira/browse/HIVE-15199 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Sergio Peña >Assignee: Sergio Peña >Priority: Critical > > Any INSERT INTO statement run on S3 tables and when the scratch directory is > saved on S3 is deleting old rows of the table. > {noformat} > hive> set hive.blobstore.use.blobstore.as.scratchdir=true; > hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1'; > hive> insert into table t1 values (1,'name1'); > hive> select * from t1; > 1 name1 > hive> insert into table t1 values (2,'name2'); > hive> select * from t1; > 2 name2 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15208) Query string should be HTML encoded for Web UI
[ https://issues.apache.org/jira/browse/HIVE-15208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-15208: --- Status: Patch Available (was: Open) Submitted a patch that HTML-escapes the query string. > Query string should be HTML encoded for Web UI > -- > > Key: HIVE-15208 > URL: https://issues.apache.org/jira/browse/HIVE-15208 > Project: Hive > Issue Type: Bug > Components: Web UI >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang >Priority: Minor > Attachments: HIVE-15208.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
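The idea behind the fix can be sketched briefly. The actual patch is Java in the Hive Web UI, but Python's standard `html.escape` demonstrates the same principle: escaping the user-supplied query string before embedding it in a page makes any markup inside the query render as inert text instead of executing in the browser.

```python
from html import escape

# Illustrative only -- not the Hive patch itself. A query string that
# happens to contain markup is neutralized by HTML-escaping it before
# it is placed in the Web UI page.
query = "SELECT '<script>alert(1)</script>' FROM src"
safe = escape(query)

# '<', '>', '&' and quotes become character entities, so the browser
# displays the query verbatim rather than interpreting the tags.
print(safe)
```

Without this step, a crafted query shown on the query-history page could inject script into the session of anyone viewing the UI.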
[jira] [Updated] (HIVE-15208) Query string should be HTML encoded for Web UI
[ https://issues.apache.org/jira/browse/HIVE-15208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-15208: --- Attachment: HIVE-15208.1.patch > Query string should be HTML encoded for Web UI > -- > > Key: HIVE-15208 > URL: https://issues.apache.org/jira/browse/HIVE-15208 > Project: Hive > Issue Type: Bug > Components: Web UI >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang >Priority: Minor > Attachments: HIVE-15208.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14990) run all tests for MM tables and fix the issues that are found
[ https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14990: Attachment: HIVE-14990.09.patch There are massive conflicts with master due to the replication patch. Will attach the diff for now before the merge (that basically undoes some master patches) > run all tests for MM tables and fix the issues that are found > - > > Key: HIVE-14990 > URL: https://issues.apache.org/jira/browse/HIVE-14990 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, > HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, > HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, > HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.08.patch, > HIVE-14990.09.patch, HIVE-14990.09.patch, HIVE-14990.patch > > > Expected failures > 1) All HCat tests (cannot write MM tables via the HCat writer) > 2) Almost all merge tests (alter .. concat is not supported). > 3) Tests that run dfs commands with specific paths (path changes). > 4) Truncate column (not supported). > 5) Describe formatted will have the new table fields in the output (before > merging MM with ACID). > 6) Many tests w/explain extended - diff in partition "base file name" (path > changes). > 7) TestTxnCommands - all the conversion tests, as they check for bucket count > using file lists (path changes). > 8) HBase metastore tests, because methods are not implemented. > 9) Some load and ExIm tests that export a table and then rely on specific > path for load (path changes). > 10) Bucket map join/etc. - diffs; disabled the optimization for MM tables due > to how it accounts for buckets -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones
[ https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667862#comment-15667862 ] Sahil Takiar commented on HIVE-15199: - The code block Sergio posted looks like it could have some major inefficiencies, even for HDFS. If my understanding is correct, the code basically tries to rename the data with the suffix {{... + "_copy_" + counter}}; if it fails (because the file already exists), it increments the counter and then tries again. This doesn't sound like a scalable solution: what happens if there are 1000 files under the directory? Any insert will require explicitly checking for the existence of files from {{... + "_copy_0"}} to {{... + "_copy_1000"}}. On HDFS, and especially on S3, this doesn't seem to be a very efficient approach (it would be good to confirm this behavior). If the logic above is indeed what happens, there could be a few different ways to fix this. 1: Append a UUID to the end of the file name rather than using a counter; since UUIDs are globally unique, there should be no chance of conflict. 2: Append the query_id + a synchronized counter ({{private synchronized long counter}}) to the file name. > INSERT INTO data on S3 is replacing the old rows with the new ones > -- > > Key: HIVE-15199 > URL: https://issues.apache.org/jira/browse/HIVE-15199 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Sergio Peña >Assignee: Sergio Peña >Priority: Critical > > Any INSERT INTO statement run on S3 tables and when the scratch directory is > saved on S3 is deleting old rows of the table. > {noformat} > hive> set hive.blobstore.use.blobstore.as.scratchdir=true; > hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1'; > hive> insert into table t1 values (1,'name1'); > hive> select * from t1; > 1 name1 > hive> insert into table t1 values (2,'name2'); > hive> select * from t1; > 2 name2 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
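The cost difference between the two naming schemes discussed in the comment can be sketched in a few lines. This is an illustrative Python model, not Hive code; function names like `probe_copy_name` are hypothetical, and a set stands in for the filesystem directory listing:

```python
import uuid

def probe_copy_name(base, existing):
    # Counter scheme: probe base_copy_1, base_copy_2, ... until a free
    # name is found -- one existence check per copy already present.
    counter = 1
    checks = 0
    while True:
        candidate = f"{base}_copy_{counter}"
        checks += 1
        if candidate not in existing:
            return candidate, checks
        counter += 1

def uuid_copy_name(base):
    # UUID scheme: a globally unique suffix needs no existence checks.
    return f"{base}_{uuid.uuid4().hex}"

# Directory already holding 1000 copies of the same base file name.
existing = {f"000000_0_copy_{i}" for i in range(1, 1001)}
name, checks = probe_copy_name("000000_0", existing)
# The counter scheme performs 1001 checks before finding a free name;
# on S3 each check is a remote metadata call.
```

On a set in memory the probing is cheap, but when each `in existing` check is a filesystem `exists()` call, particularly a round trip to S3, the linear probing the comment describes becomes the dominant cost, which is the motivation for option 1 (UUID suffix) above.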
[jira] [Commented] (HIVE-15209) Set hive.strict.checks.cartesian.product to false by default
[ https://issues.apache.org/jira/browse/HIVE-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667850#comment-15667850 ] Jesus Camacho Rodriguez commented on HIVE-15209: [~xuefuz], OK, then it does not violate compliance... but I would argue it is better to disable it by default and let the admin decide whether users should be able to execute cartesian products or not. That would give us better test coverage and the possibility of randomly running a cartesian product if we want to. Btw, a cartesian product does not need to be specified explicitly; it might also be produced by the Calcite optimizer, e.g. if we can prune both inputs of a join with a filter on a constant equality. > Set hive.strict.checks.cartesian.product to false by default > > > Key: HIVE-15209 > URL: https://issues.apache.org/jira/browse/HIVE-15209 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-15209.patch > > > If we aim to make Hive compliant with SQL, we should disable this property by > default, as expressing a cartesian product, though inefficient, is perfectly > valid in SQL. > Further, if we express complex predicates in the ON clause of a SQL query, we > might not be able to push these predicates to the join operator; however, we > should still be able to execute the query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15204) Hive-Hbase integration throws "java.lang.ClassNotFoundException: NULL::character varying" (Postgres)
[ https://issues.apache.org/jira/browse/HIVE-15204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667842#comment-15667842 ] Sergey Shelukhin commented on HIVE-15204: - Probably dup of HIVE-14322 > Hive-Hbase integration throws "java.lang.ClassNotFoundException: > NULL::character varying" (Postgres) > > > Key: HIVE-15204 > URL: https://issues.apache.org/jira/browse/HIVE-15204 > Project: Hive > Issue Type: Bug > Components: HBase Handler >Affects Versions: 2.1.0 > Environment: apache-hive-2.1.0-bin > hbase-1.1.1 >Reporter: Anshuman > Labels: Postgres > > When doing Hive to HBase integration, we have observed that current Apache > Hive 2.x is not able to recognise 'NULL::character varying' (a variant data > type of NULL in Postgres) properly and throws a > java.lang.ClassNotFoundException. > Exception: > ERROR ql.Driver: FAILED: RuntimeException java.lang.ClassNotFoundException: > NULL::character varying > java.lang.RuntimeException: java.lang.ClassNotFoundException: NULL::character > varying > > Caused by: java.lang.ClassNotFoundException: NULL::character varying > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > Reason: > org.apache.hadoop.hive.ql.metadata.Table.java > final public Class getInputFormatClass() { > if (inputFormatClass == null) { > try { > String className = tTable.getSd().getInputFormat(); > if (className == null) { /* If the className is one of the Postgres > variants of NULL, i.e. 'NULL::character varying', control goes to the else block > and an error is thrown. */ > if (getStorageHandler() == null) { > return null; > } > inputFormatClass = getStorageHandler().getInputFormatClass(); > } else { > inputFormatClass = (Class) > Class.forName(className, true, > Utilities.getSessionSpecifiedClassLoader()); > } > } catch (ClassNotFoundException e) { > throw new RuntimeException(e); > } > } > return inputFormatClass; > } > Steps to reproduce: > Hive 2.x (e.g. apache-hive-2.1.0-bin) and HBase (e.g. 
hbase-1.1.1) > 1. Install and configure Hive, if it is not already installed. > 2. Install and configure HBase, if it is not already installed. > 3. Configure the hive-site.xml file (as per recommended steps) > 4. Provide necessary jars to Hive (as per recommended steps) > 5. Create table in HBase as shown below - > create 'hivehbase', 'ratings' > put 'hivehbase', 'row1', 'ratings:userid', 'user1' > put 'hivehbase', 'row1', 'ratings:bookid', 'book1' > put 'hivehbase', 'row1', 'ratings:rating', '1' > > put 'hivehbase', 'row2', 'ratings:userid', 'user2' > put 'hivehbase', 'row2', 'ratings:bookid', 'book1' > put 'hivehbase', 'row2', 'ratings:rating', '3' > > put 'hivehbase', 'row3', 'ratings:userid', 'user2' > put 'hivehbase', 'row3', 'ratings:bookid', 'book2' > put 'hivehbase', 'row3', 'ratings:rating', '3' > > put 'hivehbase', 'row4', 'ratings:userid', 'user2' > put 'hivehbase', 'row4', 'ratings:bookid', 'book4' > put 'hivehbase', 'row4', 'ratings:rating', '1' > 6. Create external table as shown below > CREATE EXTERNAL TABLE hbasehive_table > (key string, userid string,bookid string,rating int) > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' > WITH SERDEPROPERTIES > ("hbase.columns.mapping" = > ":key,ratings:userid,ratings:bookid,ratings:rating") > TBLPROPERTIES ("hbase.table.name" = "hivehbase"); > 7. select * from hbasehive_table; > FAILED: RuntimeException java.lang.ClassNotFoundException: NULL::character > varying -- This message was sent by Atlassian JIRA (v6.3.4#6332)
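The defensive check the report implies can be sketched compactly. The real code is the Java `getInputFormatClass()` shown in the report; the Python below is a hypothetical illustration (names like `resolve_input_format` and `NULL_VARIANTS` are not Hive identifiers) of the idea: treat Postgres textual NULL variants stored in the metastore the same as a real null, instead of handing them to the class loader.

```python
# Hedged sketch: map metastore values that are textual NULL variants to
# "no input format set", so they never reach Class.forName().
NULL_VARIANTS = {None, "NULL", "NULL::character varying"}

def resolve_input_format(class_name):
    if class_name in NULL_VARIANTS:
        return None  # caller falls back to the storage handler path
    # In the Java code this is where Class.forName(className, ...) runs,
    # which is what raises ClassNotFoundException for the NULL variant.
    return class_name
```

The fix location matters: the `if (className == null)` branch in the Java snippet only catches a real null, so a non-null string like `'NULL::character varying'` falls through to `Class.forName` and fails.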
[jira] [Commented] (HIVE-15209) Set hive.strict.checks.cartesian.product to false by default
[ https://issues.apache.org/jira/browse/HIVE-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667839#comment-15667839 ] Jesus Camacho Rodriguez commented on HIVE-15209: [~sershe], thank you for pointing that out; I did not remember that. Then I agree with [~xuefuz] that a conscious decision should be made so we do not flip this back and forth, as he said. Probably it all depends on the direction that we want to give to the project... > Set hive.strict.checks.cartesian.product to false by default > > > Key: HIVE-15209 > URL: https://issues.apache.org/jira/browse/HIVE-15209 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-15209.patch > > > If we aim to make Hive compliant with SQL, we should disable this property by > default, as expressing a cartesian product, though inefficient, is perfectly > valid in SQL. > Further, if we express complex predicates in the ON clause of a SQL query, we > might not be able to push these predicates to the join operator; however, we > should still be able to execute the query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15209) Set hive.strict.checks.cartesian.product to false by default
[ https://issues.apache.org/jira/browse/HIVE-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667837#comment-15667837 ] Xuefu Zhang commented on HIVE-15209: I don't think erring out and saying that "Cartesian products are disabled for safety reasons" violates compliance. Further, I'd argue that being compliant is not our ultimate goal. As to b/c, the default has been true since HIVE-12727, which was released in 2.0. I don't think we should make existing users suffer just to make new users happier. > Set hive.strict.checks.cartesian.product to false by default > > > Key: HIVE-15209 > URL: https://issues.apache.org/jira/browse/HIVE-15209 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-15209.patch > > > If we aim to make Hive compliant with SQL, we should disable this property by > default, as expressing a cartesian product, though inefficient, is perfectly > valid in SQL. > Further, if we express complex predicates in the ON clause of a SQL query, we > might not be able to push these predicates to the join operator; however, we > should still be able to execute the query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)