[jira] [Commented] (HIVE-11611) A bad performance regression issue with Parquet happens if Hive does not select any columns
[ https://issues.apache.org/jira/browse/HIVE-11611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709468#comment-14709468 ] Sergio Peña commented on HIVE-11611: Thanks [~rdblue] [~Ferd] As you told me offline, we should close this ticket as "won't fix". We will wait until Parquet releases a new version, and then move to that one. A bad performance regression issue with Parquet happens if Hive does not select any columns --- Key: HIVE-11611 URL: https://issues.apache.org/jira/browse/HIVE-11611 Project: Hive Issue Type: Sub-task Affects Versions: 2.0.0 Reporter: Sergio Peña Assignee: Ferdinand Xu Attachments: HIVE-11611.patch A possible performance issue may happen with the code below when using a query like {{SELECT count(1) FROM parquetTable}}.
{code}
if (!ColumnProjectionUtils.isReadAllColumns(configuration) && !indexColumnsWanted.isEmpty()) {
  MessageType requestedSchemaByUser = getSchemaByIndex(tableSchema, columnNamesList, indexColumnsWanted);
  return new ReadContext(requestedSchemaByUser, contextMetadata);
} else {
  return new ReadContext(tableSchema, contextMetadata);
}
{code}
If no columns or indexes are selected, the code above reads the full schema from Parquet even though Hive does nothing with those values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
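The intent of the guard can be sketched as a standalone predicate; this is an illustration only, and the names below (shouldProjectSchema, wantedIndexes) are invented, not Hive's actual API:

```java
import java.util.Collections;
import java.util.List;

// Minimal sketch of the projection guard discussed above.
public class ProjectionGuard {
    // Only build a pruned schema when we are NOT reading all columns
    // AND the caller actually asked for at least one column index.
    static boolean shouldProjectSchema(boolean readAllColumns, List<Integer> wantedIndexes) {
        return !readAllColumns && !wantedIndexes.isEmpty();
    }

    public static void main(String[] args) {
        // SELECT count(1): no columns wanted -> falls into the else branch,
        // which is exactly the case the ticket flags as reading the full schema.
        System.out.println(shouldProjectSchema(false, Collections.emptyList())); // false
        System.out.println(shouldProjectSchema(false, List.of(0, 2)));           // true
    }
}
```

A `SELECT count(1)` wants no columns, so the predicate is false and the full table schema is requested, which is the regression the ticket describes.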
[jira] [Commented] (HIVE-11504) Predicate pushing down doesn't work for float type for Parquet
[ https://issues.apache.org/jira/browse/HIVE-11504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709482#comment-14709482 ] Owen O'Malley commented on HIVE-11504: -- Ok, I wrote the patch that addresses Parquet's problem without needlessly complicating the SARG API by splitting out the integer or float types. Please see HIVE-11618. Predicate pushing down doesn't work for float type for Parquet -- Key: HIVE-11504 URL: https://issues.apache.org/jira/browse/HIVE-11504 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-11504.1.patch, HIVE-11504.2.patch, HIVE-11504.2.patch, HIVE-11504.3.patch, HIVE-11504.patch The predicate builder should use the PrimitiveTypeName type on the Parquet side to construct the predicate leaf, instead of the type provided by PredicateLeaf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11573) PointLookupOptimizer can be pessimistic at a low nDV
[ https://issues.apache.org/jira/browse/HIVE-11573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709525#comment-14709525 ] Ashutosh Chauhan commented on HIVE-11573: - I agree there is no good reason for storing original predicate in FilterDesc any more. PointLookupOptimizer can be pessimistic at a low nDV Key: HIVE-11573 URL: https://issues.apache.org/jira/browse/HIVE-11573 Project: Hive Issue Type: Bug Affects Versions: 1.3.0, 2.0.0 Reporter: Gopal V Assignee: Gopal V Labels: TODOC2.0 Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11573.1.patch, HIVE-11573.2.patch, HIVE-11573.3.patch, HIVE-11573.4.patch, HIVE-11573.5.patch The PointLookupOptimizer can turn off some of the optimizations due to its use of tuple IN() clauses. Limit the application of the optimizer for very low nDV cases and extract the sub-clause as a pre-condition during runtime, to trigger the simple column predicate index lookups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-11622) Creating an Avro table with a complex map-typed column leads to incorrect column type.
[ https://issues.apache.org/jira/browse/HIVE-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HIVE-11622: -- Assignee: Jimmy Xiang Creating an Avro table with a complex map-typed column leads to incorrect column type. -- Key: HIVE-11622 URL: https://issues.apache.org/jira/browse/HIVE-11622 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 1.1.0 Reporter: Alexander Behm Assignee: Jimmy Xiang Labels: AvroSerde In the following CREATE TABLE, the map-typed column leads to the wrong type. I suspect some problem with inferring the Avro schema from the column definitions, but I am not sure. Reproduction:
{code}
hive> create table t (c map<string,array<int>>) stored as avro;
OK
Time taken: 0.101 seconds
hive> desc t;
OK
c	array<map<string,int>>	from deserializer
Time taken: 0.135 seconds, Fetched: 1 row(s)
{code}
Note how the type shown in DESCRIBE is not the type originally passed in the CREATE TABLE. However, *sometimes* the DESCRIBE shows the correct output. You may also try these steps, which produce a similar problem, to increase the chance of hitting this issue:
{code}
hive> create table t (c array<map<string,int>>) stored as avro;
OK
Time taken: 0.063 seconds
hive> desc t;
OK
c	map<string,array<int>>	from deserializer
Time taken: 0.152 seconds, Fetched: 1 row(s)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11620) Fix several qtest output order
[ https://issues.apache.org/jira/browse/HIVE-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709583#comment-14709583 ] Chao Sun commented on HIVE-11620: - +1 Fix several qtest output order -- Key: HIVE-11620 URL: https://issues.apache.org/jira/browse/HIVE-11620 Project: Hive Issue Type: Test Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11620.1.patch selectDistinctStar.q unionall_unbalancedppd.q vector_cast_constant.q -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11623) CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix the tableAlias for ReduceSink operator
[ https://issues.apache.org/jira/browse/HIVE-11623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-11623: --- Summary: CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix the tableAlias for ReduceSink operator (was: CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix the tableAlias for PTF operator) CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix the tableAlias for ReduceSink operator Key: HIVE-11623 URL: https://issues.apache.org/jira/browse/HIVE-11623 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11623.01.patch, HIVE-11623.02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11605) Incorrect results with bucket map join in tez.
[ https://issues.apache.org/jira/browse/HIVE-11605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709681#comment-14709681 ] Sergey Shelukhin commented on HIVE-11605: - Nit: can this be replaced with a boolean expression?
{noformat}
if (strict) {
  if (colCount == listBucketCols.size()) {
    return true;
  } else {
    return false;
  }
} else {
  return true;
}
{noformat}
Incorrect results with bucket map join in tez. -- Key: HIVE-11605 URL: https://issues.apache.org/jira/browse/HIVE-11605 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 1.0.0, 1.2.0, 1.0.1 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Priority: Critical Attachments: HIVE-11605.1.patch In some cases, we aggressively try to convert to a bucket map join and this ends up producing incorrect results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
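The nit applies mechanically: the nested if/else collapses into one boolean expression. A sketch under assumed names (matches and listBucketCols are illustrative, not the patch's actual identifiers):

```java
import java.util.Arrays;
import java.util.List;

public class BucketColsCheck {
    // Collapsed form of the snippet in the comment: in strict mode the
    // column count must match exactly; otherwise the check always passes.
    static boolean matches(boolean strict, int colCount, List<String> listBucketCols) {
        return !strict || colCount == listBucketCols.size();
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("k1", "k2");
        System.out.println(matches(true, 2, cols));  // true
        System.out.println(matches(true, 1, cols));  // false
        System.out.println(matches(false, 1, cols)); // true
    }
}
```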
[jira] [Commented] (HIVE-11544) LazyInteger should avoid throwing NumberFormatException
[ https://issues.apache.org/jira/browse/HIVE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709691#comment-14709691 ] Sergey Shelukhin commented on HIVE-11544: - In our test data there were also strings such as "null" and "NULL"... Not sure how frequent that is in real data. LazyInteger should avoid throwing NumberFormatException --- Key: HIVE-11544 URL: https://issues.apache.org/jira/browse/HIVE-11544 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.14.0, 1.2.0, 1.3.0, 2.0.0 Reporter: William Slacum Assignee: Gopal V Priority: Minor Labels: Performance Attachments: HIVE-11544.1.patch {{LazyInteger#parseInt}} will throw a {{NumberFormatException}} under these conditions:
# bytes are null
# radix is invalid
# length is 0
# the string is '+' or '-'
# {{LazyInteger#parse}} throws a {{NumberFormatException}}
Most of the time, such as in {{LazyInteger#init}} and {{LazyByte#init}}, the exception is caught, swallowed, and {{isNull}} is set to {{true}}. This is generally a bad workflow: exception creation is a performance bottleneck, and repeating it for many rows in a query can have drastic performance consequences. It would be better if this method returned an {{OptionalInteger}}, which would provide similar functionality with a higher throughput rate. I've tested against 0.14.0, and saw that the logic is unchanged in 1.2.0, so I've marked those as affected. Any version in between would also suffer from this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
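The exception-free parse argued for here can be sketched with an explicit success flag (an alternative to the proposed {{OptionalInteger}} that avoids allocation). This is an illustration with invented names (tryParseInt, parseOk), not Hive's actual LazyInteger code, and it ignores radix and overflow handling:

```java
// Illustrative sketch of an allocation-free integer parse that reports
// failure via a flag instead of throwing NumberFormatException.
public class CheapParse {
    static long result;       // valid only when parseOk is true
    static boolean parseOk;

    static void tryParseInt(byte[] bytes, int start, int length) {
        parseOk = false;
        if (bytes == null || length == 0) return;   // null bytes / empty span
        int i = start, end = start + length;
        boolean negative = false;
        if (bytes[i] == '+' || bytes[i] == '-') {
            negative = bytes[i] == '-';
            if (++i == end) return;                 // lone '+' or '-'
        }
        long value = 0;
        for (; i < end; i++) {
            int d = bytes[i] - '0';
            if (d < 0 || d > 9) return;             // non-digit: bail, no throw
            value = value * 10 + d;
        }
        result = negative ? -value : value;
        parseOk = true;
    }

    public static void main(String[] args) {
        tryParseInt("-123".getBytes(), 0, 4);
        System.out.println(parseOk + " " + result); // true -123
        tryParseInt("NULL".getBytes(), 0, 4);       // the string case Sergey mentions
        System.out.println(parseOk);                // false
    }
}
```

Both the success and failure paths run the same digit loop; the failure path simply returns early, so the worst case no longer pays for exception construction.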
[jira] [Commented] (HIVE-11526) LLAP: implement LLAP UI as a separate service
[ https://issues.apache.org/jira/browse/HIVE-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709726#comment-14709726 ] Sergey Shelukhin commented on HIVE-11526: - I think the app name is the argument to the $for method. So, the monitor can use a different name. The standard jmx, stack and conf pages do not require any resources in that directory LLAP: implement LLAP UI as a separate service - Key: HIVE-11526 URL: https://issues.apache.org/jira/browse/HIVE-11526 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Kai Sasaki Attachments: llap_monitor_design.pdf The specifics are vague at this point. Hadoop metrics can be output, as well as metrics we collect and output in jmx, as well as those we collect per fragment and log right now. This service can do LLAP-specific views, and per-query aggregation. [~gopalv] may have some information on how to reuse existing solutions for part of the work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11544) LazyInteger should avoid throwing NumberFormatException
[ https://issues.apache.org/jira/browse/HIVE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709699#comment-14709699 ] William Slacum commented on HIVE-11544: --- The common cases still hit a similar code path; it's just that the checks would throw an NFE. I'd envision that this patch should keep the same best-case scenario but reduce the worst-case scenario. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11544) LazyInteger should avoid throwing NumberFormatException
[ https://issues.apache.org/jira/browse/HIVE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709700#comment-14709700 ] William Slacum commented on HIVE-11544: --- Would that apply to numeric/integer columns? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11469) Update doc for InstanceCache to clearly define the contract on the SeedObject
[ https://issues.apache.org/jira/browse/HIVE-11469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-11469: Summary: Update doc for InstanceCache to clearly define the contract on the SeedObject (was: InstanceCache does not have proper implementation of equals or hashcode) Update doc for InstanceCache to clearly define the contract on the SeedObject - Key: HIVE-11469 URL: https://issues.apache.org/jira/browse/HIVE-11469 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-11469.1.patch.txt With HIVE-11288, we started using InstanceCache as a key. However it doesn't seem like the class actually implements the equals or hashcode methods which can potentially lead to inaccurate results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11606) Bucket map joins fail at hash table construction time
[ https://issues.apache.org/jira/browse/HIVE-11606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709678#comment-14709678 ] Sergey Shelukhin commented on HIVE-11606: - +1 Bucket map joins fail at hash table construction time - Key: HIVE-11606 URL: https://issues.apache.org/jira/browse/HIVE-11606 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 1.0.1, 1.2.1 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-11606.1.patch
{code}
info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: java.lang.AssertionError: Capacity must be a power of two
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: java.lang.AssertionError: Capacity must be a power of two
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:294)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11595) refactor ORC footer reading to make it usable from outside
[ https://issues.apache.org/jira/browse/HIVE-11595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709751#comment-14709751 ] Sergey Shelukhin commented on HIVE-11595: - 1-2: done, although it is misleading because the buffer contains the entire footer structure, incl. metadata and PS, not just OrcProto.Footer. 3-4: sure. 5: there's a comment in the class. It could be changed to expand FileMetaInfo, but FileMetaInfo is serialized in splits, so it would be confusing because the newly added fields would be missing on the other side (they are only used during split generation). refactor ORC footer reading to make it usable from outside -- Key: HIVE-11595 URL: https://issues.apache.org/jira/browse/HIVE-11595 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-10595.patch, HIVE-11595.01.patch If the ORC footer is read from cache, we want to parse it without having the reader, opening a file, etc. I thought it would be as simple as protobuf parseFrom bytes, but apparently there's a bunch of stuff going on there. It needs to be accessible via something like parseFrom(ByteBuffer) or similar. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11552) implement basic methods for getting/putting file metadata
[ https://issues.apache.org/jira/browse/HIVE-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709803#comment-14709803 ] Sergey Shelukhin commented on HIVE-11552: - The way the code is currently implemented, the method already bails out fast:
{noformat}
List<Long> fileIds = req.getFileIds();
ByteBuffer[] metadatas = getMS().getFileMetadata(fileIds);
GetFileMetadataResult result = new GetFileMetadataResult();
result.setIsSupported(metadatas != null);
if (metadatas != null) {
  [snip]
}
return result;
{noformat}
It would be the same path with an extra call otherwise. For put and clear, the methods in ObjectStore are just no-ops. implement basic methods for getting/putting file metadata - Key: HIVE-11552 URL: https://issues.apache.org/jira/browse/HIVE-11552 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: hbase-metastore-branch Attachments: HIVE-11552.01.patch, HIVE-11552.nogen.patch, HIVE-11552.nogen.patch, HIVE-11552.patch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11422) Join an ACID table with a non-ACID table fails with MR
[ https://issues.apache.org/jira/browse/HIVE-11422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-11422: -- Attachment: HIVE-11422.1.patch Retested with the latest tip of trunk/branch-1; it seems to be fixed. I didn't chase down what actually fixed it. The only thing left is to add the test case of HIVE-11438 to make sure this will not break going forward. Join an ACID table with a non-ACID table fails with MR -- Key: HIVE-11422 URL: https://issues.apache.org/jira/browse/HIVE-11422 Project: Hive Issue Type: Bug Components: Query Processor, Transactions Affects Versions: 1.3.0 Reporter: Daniel Dai Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11422.1.patch The following script fails in MR mode:
{code}
CREATE TABLE orc_update_table (k1 INT, f1 STRING, op_code STRING)
CLUSTERED BY (k1) INTO 2 BUCKETS
STORED AS ORC TBLPROPERTIES('transactional'='true');
INSERT INTO TABLE orc_update_table VALUES (1, 'a', 'I');
CREATE TABLE orc_table (k1 INT, f1 STRING)
CLUSTERED BY (k1) SORTED BY (k1) INTO 2 BUCKETS
STORED AS ORC;
INSERT OVERWRITE TABLE orc_table VALUES (1, 'x');
SET hive.execution.engine=mr;
SET hive.auto.convert.join=false;
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
SELECT t1.*, t2.* FROM orc_table t1 JOIN orc_update_table t2 ON t1.k1=t2.k1 ORDER BY t1.k1;
{code}
Stack:
{code}
Error: java.io.IOException: java.lang.NullPointerException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:251)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:701)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.io.AcidUtils.deserializeDeltas(AcidUtils.java:368)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1211)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1129)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)
    ... 9 more
{code}
The script passes in the 1.2.0 release, however. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11595) refactor ORC footer reading to make it usable from outside
[ https://issues.apache.org/jira/browse/HIVE-11595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709836#comment-14709836 ] Hive QA commented on HIVE-11595: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12752057/HIVE-11595.02.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5053/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5053/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5053/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-5053/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + 
[[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! -d apache-github-source-source ]] + cd apache-github-source-source + git fetch origin From https://github.com/apache/hive 92bd50e..f4aac7e branch-1 - origin/branch-1 9d9dd72..5e16d53 hbase-metastore - origin/hbase-metastore a16bbd4..dd2bdfc master - origin/master + git reset --hard HEAD HEAD is now at a16bbd4 HIVE-11176 : Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct cannot be cast to [Ljava.lang.Object; (Navis via Ashutosh Chauhan) + git clean -f -d Removing ql/src/test/queries/clientpositive/pointlookup.q Removing ql/src/test/queries/clientpositive/pointlookup2.q Removing ql/src/test/results/clientpositive/pointlookup.q.out Removing ql/src/test/results/clientpositive/pointlookup2.q.out + git checkout master Already on 'master' Your branch is behind 'origin/master' by 4 commits, and can be fast-forwarded. + git reset --hard origin/master HEAD is now at dd2bdfc HIVE-11469 : Update doc for InstanceCache to clearly define the contract on the SeedObject (Swarnim Kulkarni via Ashutosh Chauhan) + git merge --ff-only origin/master Already up-to-date. + git gc + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12752057 - PreCommit-HIVE-TRUNK-Build -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10296) Cast exception observed when hive runs a multi join query on metastore (postgres), since postgres pushes the filter into the join, and ignores the condition before applying cast
[ https://issues.apache.org/jira/browse/HIVE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709783#comment-14709783 ] Sergey Shelukhin commented on HIVE-10296: - This case was incomplete and was recently improved to include other join filters (see dbHasJoinCastBug there). However, we didn't realize Postgres also has this bug, so it needs to be added to the list. With the full case filter it should not try to cast non-numerics unless they are actually stored in the numeric column. Can you check where the __DEFAULT_BINSRC__ value is coming from? Cast exception observed when hive runs a multi join query on metastore (postgres), since postgres pushes the filter into the join, and ignores the condition before applying cast - Key: HIVE-10296 URL: https://issues.apache.org/jira/browse/HIVE-10296 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Yash Datta Try to drop a partition from hive: ALTER TABLE f___edr_bin_source___900_sub_id DROP IF EXISTS PARTITION (exporttimestamp=1427824800, timestamp=1427824800) This triggers a query on the metastore like this: select PARTITIONS.PART_ID from PARTITIONS inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID and TBLS.TBL_NAME = ? inner join DBS on TBLS.DB_ID = DBS.DB_ID and DBS.NAME = ? inner join PARTITION_KEY_VALS FILTER0 on FILTER0.PART_ID = PARTITIONS.PART_ID and FILTER0.INTEGER_IDX = 0 inner join PARTITION_KEY_VALS FILTER1 on FILTER1.PART_ID = PARTITIONS.PART_ID and FILTER1.INTEGER_IDX = 1 where ( (((case when TBLS.TBL_NAME = ? and DBS.NAME = ? then cast(FILTER0.PART_KEY_VAL as decimal(21,0)) else null end) = ?) and ((case when TBLS.TBL_NAME = ? and DBS.NAME = ? then cast(FILTER1.PART_KEY_VAL as decimal(21,0)) else null end) = ?)) ) In some cases, when the internal tables in postgres (metastore) have some amount of data, the query plan pushes the condition down into the join.
Because of DERBY-6358, a CASE WHEN clause is used to guard the cast, but in this case the cast is evaluated before the condition. So when tables are partitioned on string and integer columns, a cast exception is observed! 15/04/06 08:41:20 ERROR metastore.ObjectStore: Direct SQL failed, falling back to ORM javax.jdo.JDODataStoreException: Error executing SQL query select PARTITIONS.PART_ID from PARTITIONS inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID and TBLS.TBL_NAME = ? inner join DBS on TBLS.DB_ID = DBS.DB_ID and DBS.NAME = ? inner join PARTITION_KEY_VALS FILTER0 on FILTER0.PART_ID = PARTITIONS.PART_ID and FILTER0.INTEGER_IDX = 0 inner join PARTITION_KEY_VALS FILTER1 on FILTER1.PART_ID = PARTITIONS.PART_ID and FILTER1.INTEGER_IDX = 1 where ( (((case when TBLS.TBL_NAME = ? and DBS.NAME = ? then cast(FILTER0.PART_KEY_VAL as decimal(21,0)) else null end) = ?) and ((case when TBLS.TBL_NAME = ? and DBS.NAME = ? then cast(FILTER1.PART_KEY_VAL as decimal(21,0)) else null end) = ?)) ).
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451) at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:300) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211) at org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1915) at org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1909) at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2208) at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1909) at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1882) org.postgresql.util.PSQLException: ERROR: invalid input syntax for type numeric: __DEFAULT_BINSRC__ 15/04/06 08:41:20 INFO metastore.ObjectStore: JDO filter pushdown cannot be used: Filtering is supported only on partition keys of type string 15/04/06 08:41:20 ERROR metastore.ObjectStore: javax.jdo.JDOException: Exception thrown when executing query at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:596) at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:275) at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionNamesNoTxn(ObjectStore.java:1700) at
[jira] [Resolved] (HIVE-10289) Support filter on non-first partition key and non-string partition key
[ https://issues.apache.org/jira/browse/HIVE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai resolved HIVE-10289. --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: hbase-metastore-branch Patch committed to hbase-metastore branch. Thanks Alan for the review! Support filter on non-first partition key and non-string partition key -- Key: HIVE-10289 URL: https://issues.apache.org/jira/browse/HIVE-10289 Project: Hive Issue Type: Sub-task Components: HBase Metastore, Metastore Affects Versions: hbase-metastore-branch Reporter: Daniel Dai Assignee: Daniel Dai Fix For: hbase-metastore-branch Attachments: HIVE-10289.1.patch, HIVE-10289.2.patch, HIVE-10289.3.patch Currently, partition filtering only handles the first partition key, and that key's type must be string. To remove this limitation, two improvements are required: 1. Change the serialization format for partition keys. Currently partition keys are serialized into a delimited string, which sorts in string order rather than according to the actual type of the partition key. We use BinarySortableSerDe for this purpose. 2. For filter conditions not on the initial partition keys, push them into an HBase RowFilter. The RowFilter will deserialize the partition key and evaluate the filter condition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
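Point 1 can be illustrated: keys serialized as delimited strings sort lexicographically, which disagrees with numeric order for integer partition keys; hence the move to a sort-preserving binary encoding such as BinarySortableSerDe. A minimal demonstration:

```java
import java.util.Arrays;

public class SortOrderDemo {
    public static void main(String[] args) {
        // Why delimited-string keys break range scans on int partition keys:
        // byte/lexicographic order (how HBase row keys sort) disagrees
        // with numeric order.
        String[] keys = {"9", "10", "100", "2"};
        Arrays.sort(keys); // lexicographic sort
        System.out.println(Arrays.toString(keys)); // [10, 100, 2, 9]
    }
}
```

Numerically the order should be 2, 9, 10, 100, so a filter like `key > 9` cannot be evaluated as a contiguous row-key range under the string encoding.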
[jira] [Updated] (HIVE-11595) refactor ORC footer reading to make it usable from outside
[ https://issues.apache.org/jira/browse/HIVE-11595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11595: Attachment: HIVE-11595.02.patch Updated patch refactor ORC footer reading to make it usable from outside -- Key: HIVE-11595 URL: https://issues.apache.org/jira/browse/HIVE-11595 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-10595.patch, HIVE-11595.01.patch, HIVE-11595.02.patch If the ORC footer is read from cache, we want to parse it without having the reader, opening a file, etc. I thought it would be as simple as protobuf parseFrom bytes, but apparently there's a bunch of stuff going on there. It needs to be accessible via something like parseFrom(ByteBuffer), or similar. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9005) HiveServer2 error with Illegal Operation state transition from CLOSED to ERROR
[ https://issues.apache.org/jira/browse/HIVE-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709851#comment-14709851 ] Ashutosh Chauhan commented on HIVE-9005: [~vgumashta] Would you like to review this? HiveServer2 error with Illegal Operation state transition from CLOSED to ERROR --- Key: HIVE-9005 URL: https://issues.apache.org/jira/browse/HIVE-9005 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Binglin Chang Attachments: HIVE-9005.1.patch {noformat} 2014-12-02 11:25:40,855 WARN [HiveServer2-Background-Pool: Thread-17]: ql.Driver (DriverContext.java:shutdown(137)) - Shutting down task : Stage-1:MAPRED 2014-12-02 11:25:41,898 INFO [HiveServer2-Background-Pool: Thread-30]: exec.Task (SessionState.java:printInfo(536)) - Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0 2014-12-02 11:25:41,942 WARN [HiveServer2-Background-Pool: Thread-30]: mapreduce.Counters (AbstractCounters.java:getGroup(234)) - Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead 2014-12-02 11:25:41,942 INFO [HiveServer2-Background-Pool: Thread-30]: exec.Task (SessionState.java:printInfo(536)) - 2014-12-02 11:25:41,939 Stage-1 map = 0%, reduce = 0% 2014-12-02 11:25:41,945 WARN [HiveServer2-Background-Pool: Thread-30]: mapreduce.Counters (AbstractCounters.java:getGroup(234)) - Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead 2014-12-02 11:25:41,952 ERROR [HiveServer2-Background-Pool: Thread-30]: exec.Task (SessionState.java:printError(545)) - Ended Job = job_1413717733669_207982 with errors 2014-12-02 11:25:41,954 ERROR [Thread-39]: exec.Task (SessionState.java:printError(545)) - Error during job, obtaining debugging information... 
2014-12-02 11:25:41,957 ERROR [HiveServer2-Background-Pool: Thread-30]: ql.Driver (SessionState.java:printError(545)) - FAILED: Operation cancelled 2014-12-02 11:25:41,957 INFO [HiveServer2-Background-Pool: Thread-30]: ql.Driver (SessionState.java:printInfo(536)) - MapReduce Jobs Launched: 2014-12-02 11:25:41,960 WARN [HiveServer2-Background-Pool: Thread-30]: mapreduce.Counters (AbstractCounters.java:getGroup(234)) - Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead 2014-12-02 11:25:41,961 INFO [HiveServer2-Background-Pool: Thread-30]: ql.Driver (SessionState.java:printInfo(536)) - Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 FAIL 2014-12-02 11:25:41,961 INFO [HiveServer2-Background-Pool: Thread-30]: ql.Driver (SessionState.java:printInfo(536)) - Total MapReduce CPU Time Spent: 0 msec 2014-12-02 11:25:41,965 ERROR [HiveServer2-Background-Pool: Thread-30]: operation.Operation (SQLOperation.java:run(205)) - Error running hive query: org.apache.hive.service.cli.HiveSQLException: Illegal Operation state transition from CLOSED to ERROR at org.apache.hive.service.cli.OperationState.validateTransition(OperationState.java:91) at org.apache.hive.service.cli.OperationState.validateTransition(OperationState.java:97) at org.apache.hive.service.cli.operation.Operation.setState(Operation.java:116) at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:161) at org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:71) at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:202) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1589) at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:504) at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:215) at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11547) beeline does not continue running the script after an error occurs while beeline --force=true is already set.
[ https://issues.apache.org/jira/browse/HIVE-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709865#comment-14709865 ] Wei Huang commented on HIVE-11547: -- Tried the HIVE-11203 patch and it works only in interactive mode. It does not work if we use the -f option to run the script from a file. beeline does not continue running the script after an error occurs while beeline --force=true is already set. --- Key: HIVE-11547 URL: https://issues.apache.org/jira/browse/HIVE-11547 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 1.2.0 Environment: HDP 2.3 on Virtual box Reporter: Wei Huang If you execute beeline to run a SQL script file using the command {{beeline -f <query file name>}}, beeline exits after the first error, i.e. when a query fails beeline quits to the CLI. The {{--force=true}} option seems to have a bug: beeline does not continue running the script after an error occurs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11599) Add metastore command to dump its configs
[ https://issues.apache.org/jira/browse/HIVE-11599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709871#comment-14709871 ] Sushanth Sowmyan commented on HIVE-11599: - +1 to intent, this would be most useful. Add metastore command to dump its configs -- Key: HIVE-11599 URL: https://issues.apache.org/jira/browse/HIVE-11599 Project: Hive Issue Type: Bug Components: HiveServer2, Metastore Affects Versions: 1.0.0 Reporter: Eugene Koifman We should have an equivalent of the Hive CLI {{set}} command on the Metastore (and likely HS2) which can dump out all properties this particular process is running with. cc [~thejas] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11552) implement basic methods for getting/putting file metadata
[ https://issues.apache.org/jira/browse/HIVE-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709873#comment-14709873 ] Sergey Shelukhin commented on HIVE-11552: - There's an issue with the ASF LDAP server somewhere; will commit when I can... implement basic methods for getting/putting file metadata - Key: HIVE-11552 URL: https://issues.apache.org/jira/browse/HIVE-11552 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: hbase-metastore-branch Attachments: HIVE-11552.01.patch, HIVE-11552.nogen.patch, HIVE-11552.nogen.patch, HIVE-11552.patch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11595) refactor ORC footer reading to make it usable from outside
[ https://issues.apache.org/jira/browse/HIVE-11595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11595: Attachment: HIVE-11595.03.patch rebased the patch refactor ORC footer reading to make it usable from outside -- Key: HIVE-11595 URL: https://issues.apache.org/jira/browse/HIVE-11595 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-10595.patch, HIVE-11595.01.patch, HIVE-11595.02.patch, HIVE-11595.03.patch If the ORC footer is read from cache, we want to parse it without having the reader, opening a file, etc. I thought it would be as simple as protobuf parseFrom bytes, but apparently there's a bunch of stuff going on there. It needs to be accessible via something like parseFrom(ByteBuffer), or similar. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11581) HiveServer2 should store connection params in ZK when using dynamic service discovery for simpler client connection string.
[ https://issues.apache.org/jira/browse/HIVE-11581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-11581: Attachment: HIVE-11581.4.patch HiveServer2 should store connection params in ZK when using dynamic service discovery for simpler client connection string. --- Key: HIVE-11581 URL: https://issues.apache.org/jira/browse/HIVE-11581 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 1.3.0, 2.0.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Attachments: HIVE-11581.1.patch, HIVE-11581.2.patch, HIVE-11581.3.patch, HIVE-11581.3.patch, HIVE-11581.4.patch Currently, the client needs to specify several parameters based on which an appropriate connection is created with the server. In case of dynamic service discovery, when multiple HS2 instances are running, it is much more usable for the server to add its config parameters to ZK, which the driver can use to configure the connection, instead of the JDBC/ODBC user adding those to the connection string. However, at a minimum, the client will need to specify the ZooKeeper ensemble and indicate that the JDBC driver should use ZooKeeper: {noformat} beeline> !connect jdbc:hive2://vgumashta.local:2181,vgumashta.local:2182,vgumashta.local:2183/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2 vgumashta vgumashta org.apache.hive.jdbc.HiveDriver {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
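To make the minimal client configuration concrete, here is a rough sketch of splitting such a URL into the ZooKeeper ensemble and session variables (illustrative only; `parse_hive2_url` and the `zk1`/`zk2`/`zk3` hosts are hypothetical, and the real org.apache.hive.jdbc.HiveDriver parsing is considerably more involved):

```python
def parse_hive2_url(url):
    # Simplified sketch: split a jdbc:hive2:// URL into its host list and
    # the session variables that follow the database name. Hypothetical
    # helper, not the actual Hive JDBC driver parser.
    prefix = "jdbc:hive2://"
    if not url.startswith(prefix):
        raise ValueError("not a hive2 JDBC URL")
    authority, _, tail = url[len(prefix):].partition("/")
    _db, _, params = tail.partition(";")  # session vars follow the first ';'
    session = dict(p.split("=", 1) for p in params.split(";") if "=" in p)
    return authority.split(","), session

# Hypothetical hosts; mirrors the shape of the URL quoted above.
hosts, session = parse_hive2_url(
    "jdbc:hive2://zk1:2181,zk2:2182,zk3:2183/;"
    "serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2")
assert hosts == ["zk1:2181", "zk2:2182", "zk3:2183"]
assert session == {"serviceDiscoveryMode": "zooKeeper",
                   "zooKeeperNamespace": "hiveserver2"}
```

With {{serviceDiscoveryMode=zooKeeper}}, the host list names the ZooKeeper ensemble rather than an HS2 instance; everything else about the connection would then come from the znodes under the configured namespace.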
[jira] [Commented] (HIVE-11217) CTAS statements throw an error when the table is stored as ORC File format and the select clause has a NULL/VOID type column
[ https://issues.apache.org/jira/browse/HIVE-11217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708894#comment-14708894 ] Ashutosh Chauhan commented on HIVE-11217: - Patch seems reasonable to me. [~prasanth_j] what do you think? CTAS statements throw an error when the table is stored as ORC File format and the select clause has a NULL/VOID type column -- Key: HIVE-11217 URL: https://issues.apache.org/jira/browse/HIVE-11217 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.13.1 Reporter: Gaurav Kohli Assignee: Yongzhi Chen Priority: Minor Attachments: HIVE-11217.1.patch, HIVE-11217.2.patch If you try to use a create-table-as-select (CTAS) statement to create an ORC File format based table, then you can't use NULL as a column value in the select clause: CREATE TABLE empty (x int); CREATE TABLE orc_table_with_null STORED AS ORC AS SELECT x, null FROM empty; Error: {quote} 347084 [main] ERROR hive.ql.exec.DDLTask - org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: Unknown primitive type VOID at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:643) at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4242) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:285) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:952) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:269) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:221) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:431) at 
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:367) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:464) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:474) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:756) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633) at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:323) at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:284) at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:39) at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:66) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:227) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.IllegalArgumentException: Unknown primitive type VOID at org.apache.hadoop.hive.ql.io.orc.OrcStruct.createObjectInspector(OrcStruct.java:530) at org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcStructInspector.init(OrcStruct.java:195) at org.apache.hadoop.hive.ql.io.orc.OrcStruct.createObjectInspector(OrcStruct.java:534) at org.apache.hadoop.hive.ql.io.orc.OrcSerde.initialize(OrcSerde.java:106) at 
org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:519) at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:345) at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:292) at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:194) at
[jira] [Updated] (HIVE-11625) Map instances with null keys are not written to Parquet tables
[ https://issues.apache.org/jira/browse/HIVE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-11625: Issue Type: Sub-task (was: Bug) Parent: HIVE-8120 Map instances with null keys are not written to Parquet tables -- Key: HIVE-11625 URL: https://issues.apache.org/jira/browse/HIVE-11625 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0, 0.13.1, 1.0.1, 1.1.1, 1.2.1 Reporter: Cheng Lian Hive allows maps with null keys: {code:sql} hive> select map(null, 'foo', 1, 'bar', null, 'baz'); {null:baz,1:bar} {code} However, when written into Parquet tables, map entries with null as keys are dropped: {code:sql} hive> CREATE TABLE map_test STORED AS PARQUET AS SELECT MAP(null, 'foo', 1, 'bar', null, 'baz'); ... hive> SELECT * from map_test; {1:bar} {code} This is because entries with null keys are explicitly skipped in {{DataWritableWriter}}, [see here|https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java#L223-L237]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
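The skipping behavior linked above amounts to filtering entries by key before handing them to the Parquet writer; a minimal sketch (hypothetical `write_map` helper, not the actual DataWritableWriter code) that reproduces the observed {{\{1:bar\}}} result:

```python
def write_map(entries, skip_null_keys=True):
    # Sketch of the two possible policies: silently drop entries whose key
    # is null (what the linked DataWritableWriter code effectively does),
    # or fail fast. Hypothetical helper, not the actual Hive implementation.
    out = {}
    for key, value in entries:
        if key is None:
            if skip_null_keys:
                continue  # entry silently dropped
            raise ValueError("Parquet map keys must not be null")
        out[key] = value
    return out

# MAP(null, 'foo', 1, 'bar', null, 'baz'): only the non-null key survives,
# matching the {1:bar} result observed in the SELECT above.
assert write_map([(None, "foo"), (1, "bar"), (None, "baz")]) == {1: "bar"}
```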
[jira] [Updated] (HIVE-11383) Upgrade Hive to Calcite 1.4
[ https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11383: --- Attachment: HIVE-11383.11.patch Upgrade Hive to Calcite 1.4 --- Key: HIVE-11383 URL: https://issues.apache.org/jira/browse/HIVE-11383 Project: Hive Issue Type: Bug Reporter: Julian Hyde Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11383.1.patch, HIVE-11383.10.patch, HIVE-11383.11.patch, HIVE-11383.2.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, HIVE-11383.4.patch, HIVE-11383.5.patch, HIVE-11383.6.patch, HIVE-11383.7.patch, HIVE-11383.8.patch, HIVE-11383.8.patch, HIVE-11383.9.patch CLEAR LIBRARY CACHE Upgrade Hive to Calcite 1.4.0-incubating. There is currently a snapshot release, which is close to what will be in 1.4. I have checked that Hive compiles against the new snapshot, fixing one issue. The patch is attached. Next step is to validate that Hive runs against the new Calcite, and post any issues to the Calcite list or log Calcite Jira cases. [~jcamachorodriguez], can you please do that. [~pxiong], I gather you are dependent on CALCITE-814, which will be fixed in the new Calcite version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11450) Resources are not cleaned up properly at multiple places
[ https://issues.apache.org/jira/browse/HIVE-11450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708905#comment-14708905 ] Hive QA commented on HIVE-11450: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12751973/HIVE-11450.4.patch {color:green}SUCCESS:{color} +1 9377 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5049/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5049/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5049/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12751973 - PreCommit-HIVE-TRUNK-Build Resources are not cleaned up properly at multiple places Key: HIVE-11450 URL: https://issues.apache.org/jira/browse/HIVE-11450 Project: Hive Issue Type: Bug Components: JDBC Reporter: Nezih Yigitbasi Assignee: Nezih Yigitbasi Attachments: HIVE-11450.2.patch, HIVE-11450.3.patch, HIVE-11450.4.patch, HIVE-11450.patch I noticed that various resources aren't properly cleaned up in various classes. To be specific, * Some streams aren't properly cleaned up in {{beeline/src/java/org/apache/hive/beeline/BeeLine.java}} and {{beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java}} * {{Statement}}, {{ResultSet}}, and {{Connection}} aren't properly cleaned up in {{beeline/src/java/org/apache/hive/beeline/HiveSchemaTool.java}} * {{Statement}} and {{ResultSet}} aren't properly cleaned up in {{jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
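The underlying pattern for this kind of fix is scoped cleanup, so that close() runs on both success and error paths; a minimal sketch using a Python context manager as a stand-in for Java's try-with-resources or finally blocks ({{FakeStatement}} is a hypothetical stand-in, not a Hive class):

```python
from contextlib import closing

class FakeStatement:
    # Hypothetical stand-in for a JDBC Statement; real code holds java.sql
    # objects and closes them in finally blocks or try-with-resources.
    def __init__(self, log):
        self.log = log
    def execute(self, sql):
        self.log.append(("exec", sql))
    def close(self):
        self.log.append(("close", "statement"))

log = []
# closing() guarantees close() runs whether execute() succeeds or raises,
# which is exactly the guarantee leaking call sites are missing.
with closing(FakeStatement(log)) as st:
    st.execute("SELECT 1")
assert log == [("exec", "SELECT 1"), ("close", "statement")]
```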
[jira] [Updated] (HIVE-11625) Map instances with null keys are not properly handled for Parquet tables
[ https://issues.apache.org/jira/browse/HIVE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated HIVE-11625: -- Description: Hive allows maps with null keys: {code:sql} hive> select map(null, 'foo', 1, 'bar', null, 'baz'); {null:baz,1:bar} {code} However, when written into Parquet tables, map entries with null as keys are either dropped or cause exceptions. Below is the result on Hive 0.14.0 and 0.13.1: {code:sql} hive> CREATE TABLE map_test STORED AS PARQUET AS SELECT MAP(null, 'foo', 1, 'bar', null, 'baz'); ... hive> SELECT * from map_test; {1:bar} {code} And Hive 1.2.1 throws an exception: {noformat} java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:516) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163) ... 
8 more Caused by: java.lang.RuntimeException: Parquet record is malformed: empty fields are illegal, the field should be ommited completely instead at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:64) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31) at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121) at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123) at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:111) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:124) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:753) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508) ... 
9 more Caused by: parquet.io.ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead at parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endField(MessageColumnIO.java:244) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeMap(DataWritableWriter.java:228) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeValue(DataWritableWriter.java:116) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeGroupFields(DataWritableWriter.java:89) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:60) ... 23 more java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable
[jira] [Commented] (HIVE-11625) Map instances with null keys are not properly handled for Parquet tables
[ https://issues.apache.org/jira/browse/HIVE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709076#comment-14709076 ] Cheng Lian commented on HIVE-11625: --- Sorry, according to the following statement in the parquet-format spec: {quote} The {{key}} field encodes the map's key type. This field must have repetition {{required}} and must always be present. {quote} Map keys written to Parquet must not be null. Then I think the problem here is whether we should silently ignore null keys when writing a map to a Parquet table, as Hive 0.14.0 does, or throw an exception (probably a more descriptive one than the one mentioned in the ticket description), as Hive 1.2.1 does. Map instances with null keys are not properly handled for Parquet tables Key: HIVE-11625 URL: https://issues.apache.org/jira/browse/HIVE-11625 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0, 0.13.1, 1.0.1, 1.1.1, 1.2.1 Reporter: Cheng Lian Hive allows maps with null keys: {code:sql} hive> select map(null, 'foo', 1, 'bar', null, 'baz'); {null:baz,1:bar} {code} However, when written into Parquet tables, map entries with null as keys are either dropped or cause exceptions. Below is the result on Hive 0.14.0 and 0.13.1: {code:sql} hive> CREATE TABLE map_test STORED AS PARQUET AS SELECT MAP(null, 'foo', 1, 'bar', null, 'baz'); ... 
hive> SELECT * from map_test; {1:bar} {code} And Hive 1.2.1 throws an exception: {noformat} java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:516) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163) ... 
8 more Caused by: java.lang.RuntimeException: Parquet record is malformed: empty fields are illegal, the field should be ommited completely instead at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:64) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31) at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121) at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123) at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:111) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:124) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:753) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508) ... 
9 more Caused by: parquet.io.ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead at parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endField(MessageColumnIO.java:244) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeMap(DataWritableWriter.java:228) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeValue(DataWritableWriter.java:116) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeGroupFields(DataWritableWriter.java:89) at
[jira] [Commented] (HIVE-11625) Map instances with null keys are not properly handled for Parquet tables
[ https://issues.apache.org/jira/browse/HIVE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709081#comment-14709081 ] Cheng Lian commented on HIVE-11625: --- Updated ticket description according to my comment above. Map instances with null keys are not properly handled for Parquet tables Key: HIVE-11625 URL: https://issues.apache.org/jira/browse/HIVE-11625 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0, 0.13.1, 1.0.1, 1.1.1, 1.2.1 Reporter: Cheng Lian Hive allows maps with null keys: {code:sql} hive> select map(null, 'foo', 1, 'bar', null, 'baz'); {null:baz,1:bar} {code} However, when written into Parquet tables, map entries with null as keys are either dropped or cause exceptions. Below is the result on Hive 0.14.0 and 0.13.1: {code:sql} hive> CREATE TABLE map_test STORED AS PARQUET AS SELECT MAP(null, 'foo', 1, 'bar', null, 'baz'); ... hive> SELECT * from map_test; {1:bar} {code} And Hive 1.2.1 throws an exception: {noformat} java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:516) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163) ... 
8 more Caused by: java.lang.RuntimeException: Parquet record is malformed: empty fields are illegal, the field should be ommited completely instead at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:64) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31) at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121) at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123) at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:111) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:124) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:753) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508) ... 
9 more Caused by: parquet.io.ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead at parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endField(MessageColumnIO.java:244) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeMap(DataWritableWriter.java:228) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeValue(DataWritableWriter.java:116) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeGroupFields(DataWritableWriter.java:89) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:60) ... 23 more java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at
[jira] [Commented] (HIVE-11625) Map instances with null keys are not written to Parquet tables
[ https://issues.apache.org/jira/browse/HIVE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708885#comment-14708885 ] Cheng Lian commented on HIVE-11625: --- I meant to open this issue as a Parquet bug, but it seems that the Parquet support in Hive code base diverges a lot from parquet-hive. Fixes made in Hive were not backported to parquet-hive. For example, [the most recent master version of parquet-hive|https://github.com/apache/parquet-mr/blob/04f524d5ad91b1cdda66dfde4089f2f83f4528aa/parquet-hive/parquet-hive-storage-handler/src/main/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java] doesn't support writing maps, decimals, or timestamps, while all these data types are [supported in Hive 1.2.1|https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java#L223-L237] and earlier versions. Map instances with null keys are not written to Parquet tables -- Key: HIVE-11625 URL: https://issues.apache.org/jira/browse/HIVE-11625 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1, 1.0.1, 1.1.1, 1.2.1 Reporter: Cheng Lian Hive allows maps with null keys: {code:sql} hive select map(null, 'foo', 1, 'bar', null, 'baz'); {null:baz,1:bar} {code} However, when written into Parquet tables, map entries with null as keys are dropped: {code:sql} hive CREATE TABLE map_test STORED AS PARQUET AS SELECT MAP(null, 'foo', 1, 'bar', null, 'baz'); ... hive SELECT * from map_test; {1:bar} {code} This is because entries with null keys are explicitly skipped in {{DataWritableWriter}}, [see here|https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java#L223-L237]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11176) Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct cannot be cast to [Ljava.lang.Object;
[ https://issues.apache.org/jira/browse/HIVE-11176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-11176: Summary: Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct cannot be cast to [Ljava.lang.Object; (was: aused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct cannot be cast to [Ljava.lang.Object;) Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct cannot be cast to [Ljava.lang.Object; Key: HIVE-11176 URL: https://issues.apache.org/jira/browse/HIVE-11176 Project: Hive Issue Type: Bug Components: Hive, Tez Affects Versions: 1.0.0, 1.2.0 Environment: Hive 1.2 and Tez 0.7 Reporter: Soundararajan Velu Priority: Critical Attachments: HIVE-11176.1.patch.txt Unreachable code: hive/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/StandardStructObjectInspector.java {code} // With Data @Override @SuppressWarnings("unchecked") public Object getStructFieldData(Object data, StructField fieldRef) { if (data == null) { return null; } // We support both List<Object> and Object[] // so we have to do differently. boolean isArray = ! (data instanceof List); if (!isArray && !(data instanceof List)) { return data; } {code} The if condition above translates to if(!true && true), so the code section cannot be reached. This causes a lot of ClassCastExceptions while using Tez, ORC file formats, or a custom JSON SerDe. Strangely, this happens only while using Tez. Changed the code to {code} boolean isArray = data.getClass().isArray(); if (!isArray && !(data instanceof List)) { return data; } {code} Even then, LazyStructs get passed as fields, causing downstream cast exceptions like "LazyStruct cannot be cast to Text", etc.
So I changed the method to something like this: {code} // With Data @Override @SuppressWarnings("unchecked") public Object getStructFieldData(Object data, StructField fieldRef) { if (data == null) { return null; } if (data instanceof LazyBinaryStruct) { data = ((LazyBinaryStruct) data).getFieldsAsList(); } // We support both List<Object> and Object[] // so we have to do differently. boolean isArray = data.getClass().isArray(); if (!isArray && !(data instanceof List)) { return data; } {code} This causes ArrayIndexOutOfBoundsException and other typecast exceptions in the object inspectors. Please help. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
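The boolean logic above can be checked in isolation. The sketch below is a minimal, self-contained model of the condition (not Hive's actual ObjectInspector code; the method names are illustrative): it shows that defining isArray as the negation of the List check makes the early return unreachable, while deriving isArray from the runtime class behaves as intended.

```java
import java.util.Arrays;
import java.util.List;

public class StructFieldConditionDemo {
    // Original definition: isArray is just the negation of (data instanceof List),
    // so "!isArray && !(data instanceof List)" is X && !X -- always false.
    static boolean buggyReturnsEarly(Object data) {
        boolean isArray = !(data instanceof List);
        return !isArray && !(data instanceof List); // dead code
    }

    // Corrected definition: isArray reflects the runtime class, so plain
    // objects (neither List nor array) are returned early as intended.
    static boolean fixedReturnsEarly(Object data) {
        boolean isArray = data.getClass().isArray();
        return !isArray && !(data instanceof List);
    }

    public static void main(String[] args) {
        System.out.println(buggyReturnsEarly("plain"));          // always false
        System.out.println(fixedReturnsEarly("plain"));          // true
        System.out.println(fixedReturnsEarly(Arrays.asList(1))); // false
        System.out.println(fixedReturnsEarly(new Object[] {1})); // false
    }
}
```

This also makes visible why the reporter's second variant still misbehaves: fixing the condition does not convert the LazyBinaryStruct payload itself, which is a separate concern.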
[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708908#comment-14708908 ] Ashutosh Chauhan commented on HIVE-10631: - +1 create_table_core method has invalid update for Fast Stats -- Key: HIVE-10631 URL: https://issues.apache.org/jira/browse/HIVE-10631 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0 Reporter: Dongwook Kwon Assignee: Aaron Tokhy Priority: Minor Attachments: HIVE-10631-branch-1.0.patch, HIVE-10631.patch HiveMetaStore.create_table_core calls MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather is on; however, for a partitioned table, this updateUnpartitionedTableStatsFast call scans the warehouse directory and then does nothing with the result. Fast Stats was implemented by HIVE-3959 https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363 From the create_table_core method: {code} if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) && !MetaStoreUtils.isView(tbl)) { if (tbl.getPartitionKeysSize() == 0) { // Unpartitioned table MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir); } else { // Partitioned table with no partitions. MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true); } } {code} Particularly line 1363: // Partitioned table with no partitions. {code} MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true); {code} This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and then does nothing in MetaStoreUtils.updateUnpartitionedTableStatsFast, because the newDir flag is always true. The impact of this bug is minor with an HDFS warehouse location (hive.metastore.warehouse.dir), but it can be large with an S3 warehouse location, especially for large existing partitions.
The impact is also heightened with HIVE-6727 when the warehouse location is S3: it can recursively scan the wrong S3 directory and do nothing with the result. I will add more details of these cases in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
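The reported no-op can be modeled in a few lines, assuming the call order described above (the names mirror MetaStoreUtils and Warehouse but are simplified stand-ins, not Hive's real signatures): the directory listing runs before the newDir check, so the partitioned-table path pays for the scan and then discards the result.

```java
public class FastStatsNoOpDemo {
    static int listCalls = 0;

    // Stand-in for Warehouse.getFileStatusesForUnpartitionedTable: on an
    // S3-backed warehouse this listing is the expensive part.
    static String[] listTableDir() {
        listCalls++;
        return new String[0];
    }

    // Stand-in for MetaStoreUtils.updateUnpartitionedTableStatsFast: returns
    // true only if stats were actually recorded.
    static boolean updateStatsFast(boolean newDir) {
        String[] files = listTableDir(); // scan happens unconditionally
        if (newDir) {
            return false; // "freshly created" dir: stats update is skipped
        }
        return files != null; // stats would be recorded here
    }

    public static void main(String[] args) {
        // The partitioned-table call path always passes newDir=true,
        // so the listing cost is paid for nothing.
        boolean recorded = updateStatsFast(true);
        System.out.println("recorded=" + recorded + ", scans=" + listCalls);
    }
}
```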
[jira] [Commented] (HIVE-11383) Upgrade Hive to Calcite 1.4
[ https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709013#comment-14709013 ] Hive QA commented on HIVE-11383: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12751984/HIVE-11383.11.patch {color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 9377 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_exists org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in_having org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_views org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_inner_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_mapjoin_reduce org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_exists org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_in org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_inner_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_exists org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_in org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_mapjoin_reduce {noformat} Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5050/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5050/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5050/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 18 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12751984 - PreCommit-HIVE-TRUNK-Build Upgrade Hive to Calcite 1.4 --- Key: HIVE-11383 URL: https://issues.apache.org/jira/browse/HIVE-11383 Project: Hive Issue Type: Bug Reporter: Julian Hyde Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11383.1.patch, HIVE-11383.10.patch, HIVE-11383.11.patch, HIVE-11383.2.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, HIVE-11383.4.patch, HIVE-11383.5.patch, HIVE-11383.6.patch, HIVE-11383.7.patch, HIVE-11383.8.patch, HIVE-11383.8.patch, HIVE-11383.9.patch CLEAR LIBRARY CACHE Upgrade Hive to Calcite 1.4.0-incubating. There is currently a snapshot release, which is close to what will be in 1.4. I have checked that Hive compiles against the new snapshot, fixing one issue. The patch is attached. Next step is to validate that Hive runs against the new Calcite, and post any issues to the Calcite list or log Calcite Jira cases. [~jcamachorodriguez], can you please do that. [~pxiong], I gather you are dependent on CALCITE-814, which will be fixed in the new Calcite version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11624) Beeline-cli: support hive.cli.print.header in new CLI[beeline-cli branch]
[ https://issues.apache.org/jira/browse/HIVE-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709015#comment-14709015 ] Hive QA commented on HIVE-11624: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12751991/HIVE-11624.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5051/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5051/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5051/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-5051/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ 
-z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! -d apache-github-source-source ]] + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at a16bbd4 HIVE-11176 : Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct cannot be cast to [Ljava.lang.Object; (Navis via Ashutosh Chauhan) + git clean -f -d + git checkout master Already on 'master' + git reset --hard origin/master HEAD is now at a16bbd4 HIVE-11176 : Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct cannot be cast to [Ljava.lang.Object; (Navis via Ashutosh Chauhan) + git merge --ff-only origin/master Already up-to-date. + git gc + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12751991 - PreCommit-HIVE-TRUNK-Build Beeline-cli: support hive.cli.print.header in new CLI[beeline-cli branch] - Key: HIVE-11624 URL: https://issues.apache.org/jira/browse/HIVE-11624 Project: Hive Issue Type: Sub-task Reporter: Ke Jia Assignee: Ke Jia Attachments: HIVE-11624.patch In the old CLI, it uses hive.cli.print.header from the hive configuration to force execution a script . We need to support the previous configuration using beeline functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11573) PointLookupOptimizer can be pessimistic at a low nDV
[ https://issues.apache.org/jira/browse/HIVE-11573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11573: --- Attachment: HIVE-11573.5.patch PointLookupOptimizer can be pessimistic at a low nDV Key: HIVE-11573 URL: https://issues.apache.org/jira/browse/HIVE-11573 Project: Hive Issue Type: Bug Affects Versions: 1.3.0, 2.0.0 Reporter: Gopal V Assignee: Gopal V Labels: TODOC2.0 Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11573.1.patch, HIVE-11573.2.patch, HIVE-11573.3.patch, HIVE-11573.4.patch, HIVE-11573.5.patch The PointLookupOptimizer can turn off some of the optimizations due to its use of tuple IN() clauses. Limit the application of the optimizer for very low nDV cases and extract the sub-clause as a pre-condition during runtime, to trigger the simple column predicate index lookups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11573) PointLookupOptimizer can be pessimistic at a low nDV
[ https://issues.apache.org/jira/browse/HIVE-11573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709054#comment-14709054 ] Jesus Camacho Rodriguez commented on HIVE-11573: [~gopalv], I added a new test in HIVE-11573.5.patch to verify that partition pruning is working fine. I have a comment about the patch. I think we should not store the original predicate in the Filter operator if {{hive.optimize.point.lookup.extract}} is set to true (line 155 in PointLookupOptimizer). We added that line in HIVE-11461 so we do not get regressions with partition pruner, but with your patch, we shouldn't see that issue if extract is true. What do you think? cc'd [~ashutoshc] PointLookupOptimizer can be pessimistic at a low nDV Key: HIVE-11573 URL: https://issues.apache.org/jira/browse/HIVE-11573 Project: Hive Issue Type: Bug Affects Versions: 1.3.0, 2.0.0 Reporter: Gopal V Assignee: Gopal V Labels: TODOC2.0 Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11573.1.patch, HIVE-11573.2.patch, HIVE-11573.3.patch, HIVE-11573.4.patch, HIVE-11573.5.patch The PointLookupOptimizer can turn off some of the optimizations due to its use of tuple IN() clauses. Limit the application of the optimizer for very low nDV cases and extract the sub-clause as a pre-condition during runtime, to trigger the simple column predicate index lookups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
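As a rough illustration of the "extract the sub-clause as a pre-condition" idea, the sketch below derives weaker single-column IN sets from a tuple IN() value list, e.g. (c1, c2) IN ((1, 'a'), (2, 'b')) implies c1 IN (1, 2) AND c2 IN ('a', 'b'). This is a standalone model on raw value rows; the real optimizer manipulates Hive expression trees, not value lists.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PointLookupPrecondition {
    // Given the value rows of a tuple IN() clause, return one value set per
    // column: the union of that column's values across all rows. Each set is
    // a necessary (weaker) condition that simple per-column predicates, such
    // as partition pruning or index lookups, can evaluate.
    static List<Set<Object>> extractPreconditions(List<Object[]> tuples) {
        if (tuples.isEmpty()) {
            return Collections.emptyList();
        }
        int cols = tuples.get(0).length;
        List<Set<Object>> perColumn = new ArrayList<>();
        for (int c = 0; c < cols; c++) {
            perColumn.add(new LinkedHashSet<>());
        }
        for (Object[] row : tuples) {
            for (int c = 0; c < cols; c++) {
                perColumn.get(c).add(row[c]);
            }
        }
        return perColumn;
    }
}
```

The pre-condition is deliberately lossy: it admits combinations the tuple IN() rejects, which is why the original predicate must still be evaluated afterwards.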
[jira] [Updated] (HIVE-10785) Support aggregate push down through joins
[ https://issues.apache.org/jira/browse/HIVE-10785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-10785: --- Assignee: Ashutosh Chauhan (was: Jesus Camacho Rodriguez) Support aggregate push down through joins - Key: HIVE-10785 URL: https://issues.apache.org/jira/browse/HIVE-10785 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Ashutosh Chauhan Enable {{AggregateJoinTransposeRule}} in CBO that pushes Aggregate through Join operators. The rule has been extended in Calcite 1.4 to cover complex cases e.g. Aggregate operators comprising UDAF. The decision on whether to push the Aggregate through Join or not should be cost-driven. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11625) Map instances with null keys are not properly handled for Parquet tables
[ https://issues.apache.org/jira/browse/HIVE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated HIVE-11625: -- Summary: Map instances with null keys are not properly handled for Parquet tables (was: Map instances with null keys are not written to Parquet tables) Map instances with null keys are not properly handled for Parquet tables Key: HIVE-11625 URL: https://issues.apache.org/jira/browse/HIVE-11625 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0, 0.13.1, 1.0.1, 1.1.1, 1.2.1 Reporter: Cheng Lian Hive allows maps with null keys: {code:sql} hive select map(null, 'foo', 1, 'bar', null, 'baz'); {null:baz,1:bar} {code} However, when written into Parquet tables, map entries with null as keys are dropped: {code:sql} hive CREATE TABLE map_test STORED AS PARQUET AS SELECT MAP(null, 'foo', 1, 'bar', null, 'baz'); ... hive SELECT * from map_test; {1:bar} {code} This is because entries with null keys are explicitly skipped in {{DataWritableWriter}}, [see here|https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java#L223-L237]. This issue can be fixed by moving [the value writing block|https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java#L230-L236] out of [the key writing block|https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java#L223-L237]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11625) Map instances with null keys are not written to Parquet tables
[ https://issues.apache.org/jira/browse/HIVE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated HIVE-11625: -- Description: Hive allows maps with null keys: {code:sql} hive select map(null, 'foo', 1, 'bar', null, 'baz'); {null:baz,1:bar} {code} However, when written into Parquet tables, map entries with null as keys are dropped: {code:sql} hive CREATE TABLE map_test STORED AS PARQUET AS SELECT MAP(null, 'foo', 1, 'bar', null, 'baz'); ... hive SELECT * from map_test; {1:bar} {code} This is because entries with null keys are explicitly skipped in {{DataWritableWriter}}, [see here|https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java#L223-L237]. This issue can be fixed by moving [the value writing block|https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java#L230-L236] out of [the key writing block|https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java#L223-L237]. was: Hive allows maps with null keys: {code:sql} hive select map(null, 'foo', 1, 'bar', null, 'baz'); {null:baz,1:bar} {code} However, when written into Parquet tables, map entries with null as keys are dropped: {code:sql} hive CREATE TABLE map_test STORED AS PARQUET AS SELECT MAP(null, 'foo', 1, 'bar', null, 'baz'); ... hive SELECT * from map_test; {1:bar} {code} This is because entries with null keys are explicitly skipped in {{DataWritableWriter}}, [see here|https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java#L223-L237]. 
Map instances with null keys are not written to Parquet tables -- Key: HIVE-11625 URL: https://issues.apache.org/jira/browse/HIVE-11625 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0, 0.13.1, 1.0.1, 1.1.1, 1.2.1 Reporter: Cheng Lian Hive allows maps with null keys: {code:sql} hive select map(null, 'foo', 1, 'bar', null, 'baz'); {null:baz,1:bar} {code} However, when written into Parquet tables, map entries with null as keys are dropped: {code:sql} hive CREATE TABLE map_test STORED AS PARQUET AS SELECT MAP(null, 'foo', 1, 'bar', null, 'baz'); ... hive SELECT * from map_test; {1:bar} {code} This is because entries with null keys are explicitly skipped in {{DataWritableWriter}}, [see here|https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java#L223-L237]. This issue can be fixed by moving [the value writing block|https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java#L230-L236] out of [the key writing block|https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java#L223-L237]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
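The proposed restructuring can be modeled with plain maps (the Parquet writer/consumer calls are abstracted away; these method names are illustrative, not DataWritableWriter's API): in the buggy shape, the value write is nested inside the key-not-null block, so null-key entries vanish entirely; hoisting the value write out preserves them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MapNullKeyDemo {
    // Models the buggy loop: value writing nested inside the null-key check,
    // so an entry with a null key is dropped entirely.
    static Map<Object, Object> buggyWrite(Map<Object, Object> entries) {
        Map<Object, Object> written = new LinkedHashMap<>();
        for (Map.Entry<Object, Object> e : entries.entrySet()) {
            if (e.getKey() != null) {
                written.put(e.getKey(), e.getValue());
            }
        }
        return written;
    }

    // Models the fix: the value is written for every entry; only the key
    // field itself is skipped when the key is null.
    static Map<Object, Object> fixedWrite(Map<Object, Object> entries) {
        Map<Object, Object> written = new LinkedHashMap<>();
        for (Map.Entry<Object, Object> e : entries.entrySet()) {
            written.put(e.getKey(), e.getValue()); // null key allowed
        }
        return written;
    }
}
```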
[jira] [Commented] (HIVE-11573) PointLookupOptimizer can be pessimistic at a low nDV
[ https://issues.apache.org/jira/browse/HIVE-11573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709112#comment-14709112 ] Hive QA commented on HIVE-11573: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12751994/HIVE-11573.5.patch {color:green}SUCCESS:{color} +1 9379 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5052/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5052/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5052/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12751994 - PreCommit-HIVE-TRUNK-Build PointLookupOptimizer can be pessimistic at a low nDV Key: HIVE-11573 URL: https://issues.apache.org/jira/browse/HIVE-11573 Project: Hive Issue Type: Bug Affects Versions: 1.3.0, 2.0.0 Reporter: Gopal V Assignee: Gopal V Labels: TODOC2.0 Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11573.1.patch, HIVE-11573.2.patch, HIVE-11573.3.patch, HIVE-11573.4.patch, HIVE-11573.5.patch The PointLookupOptimizer can turn off some of the optimizations due to its use of tuple IN() clauses. Limit the application of the optimizer for very low nDV cases and extract the sub-clause as a pre-condition during runtime, to trigger the simple column predicate index lookups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-9583) Rolling upgrade of Hive MetaStore Server
[ https://issues.apache.org/jira/browse/HIVE-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan resolved HIVE-9583. Resolution: Fixed Fix Version/s: 1.2.2 (Marking as fixed on the 1.2 line, since per Thiruvel, all the tasks inside this are done, and were done as of 1.2.0) Rolling upgrade of Hive MetaStore Server Key: HIVE-9583 URL: https://issues.apache.org/jira/browse/HIVE-9583 Project: Hive Issue Type: Improvement Components: HCatalog, Metastore Affects Versions: 0.14.0 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Labels: hcatalog, metastore Fix For: 1.2.2 This is an umbrella JIRA to track all rolling upgrade JIRAs w.r.t MetaStore server. This will be helpful for users deploying Metastore server and connecting to it with HCatalog or Hive CLI interface. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11628) DB type detection code is failing on Oracle 12
[ https://issues.apache.org/jira/browse/HIVE-11628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709939#comment-14709939 ] Sergey Shelukhin commented on HIVE-11628: - +1 DB type detection code is failing on Oracle 12 -- Key: HIVE-11628 URL: https://issues.apache.org/jira/browse/HIVE-11628 Project: Hive Issue Type: Bug Components: Metastore Environment: Oracle 12 Reporter: Deepesh Khandelwal Assignee: Deepesh Khandelwal Fix For: 2.0.0 Attachments: HIVE-11628.patch DB type detection code is failing when using Oracle 12 as the backing store. When determining qualification for direct SQL, the following message is seen in the logs: {noformat} 2015-08-14 01:15:16,020 INFO [pool-6-thread-109]: metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:init(131)) - Using direct SQL, underlying DB is OTHER {noformat} Currently in org/apache/hadoop/hive/metastore/MetaStoreDirectSql, there is a code snippet: {code} private DB determineDbType() { DB dbType = DB.OTHER; if (runDbCheck("SET @@session.sql_mode=ANSI_QUOTES", "MySql")) { dbType = DB.MYSQL; } else if (runDbCheck("SELECT version from v$instance", "Oracle")) { dbType = DB.ORACLE; } else if (runDbCheck("SELECT @@version", "MSSQL")) { dbType = DB.MSSQL; } else { // TODO: maybe we should use getProductName to identify all the DBs String productName = getProductName(); if (productName != null && productName.toLowerCase().contains("derby")) { dbType = DB.DERBY; } } return dbType; } {code} The code relies on access to v$instance in order to identify the backend DB as Oracle, but this can fail if users are not granted select privileges on v$ tables. An alternate way specified on the [Oracle Database Reference pages|http://docs.oracle.com/cd/B19306_01/server.102/b14237/statviews_4224.htm] works. I will attach a potential patch that should work. Without the patch, the workaround would be to grant select privileges on v$ tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11628) DB type detection code is failing on Oracle 12
[ https://issues.apache.org/jira/browse/HIVE-11628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710019#comment-14710019 ] Ashutosh Chauhan commented on HIVE-11628: - I think HIVE-11123 has a better fix for this. DB type detection code is failing on Oracle 12 -- Key: HIVE-11628 URL: https://issues.apache.org/jira/browse/HIVE-11628 Project: Hive Issue Type: Bug Components: Metastore Environment: Oracle 12 Reporter: Deepesh Khandelwal Assignee: Deepesh Khandelwal Fix For: 2.0.0 Attachments: HIVE-11628.patch DB type detection code is failing when using Oracle 12 as the backing store. When determining qualification for direct SQL, the following message is seen in the logs: {noformat} 2015-08-14 01:15:16,020 INFO [pool-6-thread-109]: metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:init(131)) - Using direct SQL, underlying DB is OTHER {noformat} Currently in org/apache/hadoop/hive/metastore/MetaStoreDirectSql, there is a code snippet: {code} private DB determineDbType() { DB dbType = DB.OTHER; if (runDbCheck("SET @@session.sql_mode=ANSI_QUOTES", "MySql")) { dbType = DB.MYSQL; } else if (runDbCheck("SELECT version from v$instance", "Oracle")) { dbType = DB.ORACLE; } else if (runDbCheck("SELECT @@version", "MSSQL")) { dbType = DB.MSSQL; } else { // TODO: maybe we should use getProductName to identify all the DBs String productName = getProductName(); if (productName != null && productName.toLowerCase().contains("derby")) { dbType = DB.DERBY; } } return dbType; } {code} The code relies on access to v$instance in order to identify the backend DB as Oracle, but this can fail if users are not granted select privileges on v$ tables. An alternate way specified on the [Oracle Database Reference pages|http://docs.oracle.com/cd/B19306_01/server.102/b14237/statviews_4224.htm] works. I will attach a potential patch that should work. Without the patch, the workaround would be to grant select privileges on v$ tables.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
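One way to sketch an Oracle probe that avoids v$ privileges is below. The probe is abstracted as a predicate so the ordering logic is testable without JDBC; whether the attached patch uses exactly the PRODUCT_COMPONENT_VERSION dictionary view is an assumption (it is one of the Oracle data dictionary views readable without v$ select grants), and the method names mirror MetaStoreDirectSql only loosely.

```java
import java.util.function.Predicate;

public class DbTypeProbeSketch {
    // probe.test(sql) models runDbCheck: true if the statement executes
    // successfully against the backing store.
    static String determineDbType(Predicate<String> probe) {
        if (probe.test("SET @@session.sql_mode=ANSI_QUOTES")) {
            return "MYSQL";
        } else if (probe.test("SELECT version FROM product_component_version")) {
            return "ORACLE"; // dictionary view; no v$ select grants needed
        } else if (probe.test("SELECT @@version")) {
            return "MSSQL";
        }
        return "OTHER";
    }

    public static void main(String[] args) {
        // Simulate an Oracle backend that rejects everything except the
        // dictionary-view query.
        String type = determineDbType(
            sql -> sql.contains("product_component_version"));
        System.out.println(type);
    }
}
```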
[jira] [Updated] (HIVE-11629) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix the filter expressions for full outer join and right outer join
[ https://issues.apache.org/jira/browse/HIVE-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-11629: --- Attachment: HIVE-11629.01.patch CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix the filter expressions for full outer join and right outer join -- Key: HIVE-11629 URL: https://issues.apache.org/jira/browse/HIVE-11629 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11629.01.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11422) Join a ACID table with non-ACID table fail with MR
[ https://issues.apache.org/jira/browse/HIVE-11422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710037#comment-14710037 ] Hive QA commented on HIVE-11422: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12752065/HIVE-11422.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9378 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5054/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5054/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5054/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12752065 - PreCommit-HIVE-TRUNK-Build Join a ACID table with non-ACID table fail with MR -- Key: HIVE-11422 URL: https://issues.apache.org/jira/browse/HIVE-11422 Project: Hive Issue Type: Bug Components: Query Processor, Transactions Affects Versions: 1.3.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11422.1.patch The following script fails in MR mode: {code} CREATE TABLE orc_update_table (k1 INT, f1 STRING, op_code STRING) CLUSTERED BY (k1) INTO 2 BUCKETS STORED AS ORC TBLPROPERTIES("transactional"="true"); INSERT INTO TABLE orc_update_table VALUES (1, 'a', 'I'); CREATE TABLE orc_table (k1 INT, f1 STRING) CLUSTERED BY (k1) SORTED BY (k1) INTO 2 BUCKETS STORED AS ORC; INSERT OVERWRITE TABLE orc_table VALUES (1, 'x'); SET hive.execution.engine=mr; SET hive.auto.convert.join=false; SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; SELECT t1.*, t2.* FROM orc_table t1 JOIN orc_update_table t2 ON t1.k1=t2.k1 ORDER BY t1.k1; {code} Stack: {code} Error: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:251) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:701) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.AcidUtils.deserializeDeltas(AcidUtils.java:368) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1211) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1129) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249) ... 9 more {code} The script passes in the 1.2.0 release, however. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11617) Explain plan for multiple lateral views is very slow
[ https://issues.apache.org/jira/browse/HIVE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710074#comment-14710074 ] Aihua Xu commented on HIVE-11617: - I will break the task into a runtime performance change and an explain output change (in subtasks) so that I can make sure the runtime change affects only the performance, not the results. Explain plan for multiple lateral views is very slow Key: HIVE-11617 URL: https://issues.apache.org/jira/browse/HIVE-11617 Project: Hive Issue Type: Bug Components: Logical Optimizer Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-11617.patch The following explain job will be very slow or never finish if there are many lateral views involved. High CPU usage is also noticed. {noformat} EXPLAIN SELECT * from ( SELECT * FROM table1 ) x LATERAL VIEW json_tuple(...) x1 LATERAL VIEW json_tuple(...) x2 ... {noformat} From jstack, the job is busy with a pre-order tree traversal. {noformat} at java.util.regex.Matcher.getTextLength(Matcher.java:1234) at java.util.regex.Matcher.reset(Matcher.java:308) at java.util.regex.Matcher.<init>(Matcher.java:228) at java.util.regex.Pattern.matcher(Pattern.java:1088) at org.apache.hadoop.hive.ql.lib.RuleRegExp.cost(RuleRegExp.java:67) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:72) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:56) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61) [... the PreOrderWalker.walk(PreOrderWalker.java:61) frame repeats dozens of times, once per nesting level; the trace was truncated here ...] {noformat}
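The deep `PreOrderWalker.walk` recursion above is characteristic of walking an operator graph with shared sub-plans as if it were a tree: each lateral view multiplies the number of root-to-leaf paths, and a walker with no visited-node check re-dispatches rules along every path. A minimal sketch (illustrative classes, not Hive's actual walker) showing the exponential blow-up on a chain of "diamond" shapes, and how a visited-set makes the walk linear:

```java
import java.util.*;

// Not Hive code: a toy DAG walker demonstrating why a pre-order *tree* walk
// over a *DAG* explodes. Each "diamond" (two branches that rejoin) doubles
// the number of paths to everything below it.
public class WalkBlowup {
    static final class Node {
        final List<Node> children = new ArrayList<>();
    }

    // Build root -> (a,b) -> join -> (a,b) -> join ... with 'levels' diamonds.
    static Node diamondChain(int levels) {
        Node root = new Node();
        Node tail = root;
        for (int i = 0; i < levels; i++) {
            Node a = new Node(), b = new Node(), join = new Node();
            tail.children.add(a);
            tail.children.add(b);
            a.children.add(join);   // both branches rejoin at 'join'
            b.children.add(join);
            tail = join;
        }
        return root;
    }

    // Pre-order walk returning the number of node visits.
    // With visited == null every path is re-walked (the pathological case).
    static long walk(Node n, Set<Node> visited) {
        if (visited != null && !visited.add(n)) return 0; // already handled
        long count = 1;
        for (Node c : n.children) count += walk(c, visited);
        return count;
    }

    public static void main(String[] args) {
        Node root = diamondChain(20); // think: 20 chained lateral views
        System.out.println("naive visits:    " + walk(root, null));      // ~4 million
        System.out.println("memoized visits: "
            + walk(root, Collections.newSetFromMap(new IdentityHashMap<>()))); // 61
    }
}
```

With 20 diamonds the naive walk visits nodes 4·2^20 − 3 = 4,194,301 times while the memoized walk visits each of the 61 nodes once, which matches the "very slow or never finish" symptom described above.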
[jira] [Commented] (HIVE-11123) Fix how to confirm the RDBMS product name at Metastore.
[ https://issues.apache.org/jira/browse/HIVE-11123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710109#comment-14710109 ] Deepesh Khandelwal commented on HIVE-11123: --- [~sinchii] thanks for the patch! What version of Oracle did you use for testing this? We saw some issues with the existing code (without your patch) on Oracle 12, whereas it worked fine on Oracle 11. Just want to make sure that it works as expected between different Oracle versions. Fix how to confirm the RDBMS product name at Metastore. --- Key: HIVE-11123 URL: https://issues.apache.org/jira/browse/HIVE-11123 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.2.0 Environment: PostgreSQL Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Attachments: HIVE-11123.1.patch, HIVE-11123.2.patch, HIVE-11123.3.patch, HIVE-11123.4.patch I use PostgreSQL for the Hive Metastore, and I saw the following messages in the PostgreSQL log. {code} 2015-06-26 10:58:15.488 JST ERROR: syntax error at or near "@@" at character 5 2015-06-26 10:58:15.488 JST STATEMENT: SET @@session.sql_mode=ANSI_QUOTES 2015-06-26 10:58:15.489 JST ERROR: relation "v$instance" does not exist at character 21 2015-06-26 10:58:15.489 JST STATEMENT: SELECT version FROM v$instance 2015-06-26 10:58:15.490 JST ERROR: column "version" does not exist at character 10 2015-06-26 10:58:15.490 JST STATEMENT: SELECT @@version {code} When Hive CLI and Beeline embedded mode are used, these messages are output to the PostgreSQL log. These queries are called from MetaStoreDirectSql#determineDbType, and if we use MetaStoreDirectSql#getProductName instead, we need not call these queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11628) DB type detection code is failing on Oracle 12
[ https://issues.apache.org/jira/browse/HIVE-11628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710111#comment-14710111 ] Deepesh Khandelwal commented on HIVE-11628: --- Yes, I agree, HIVE-11123 is a more robust fix. If that works fine between Oracle versions (11, 12), then we don't need this. I have posted a question for [~sinchii] about his test environment. DB type detection code is failing on Oracle 12 -- Key: HIVE-11628 URL: https://issues.apache.org/jira/browse/HIVE-11628 Project: Hive Issue Type: Bug Components: Metastore Environment: Oracle 12 Reporter: Deepesh Khandelwal Assignee: Deepesh Khandelwal Fix For: 2.0.0 Attachments: HIVE-11628.patch DB type detection code is failing when using Oracle 12 as the backing store. When determining qualification for direct SQL, the following message is seen in the logs: {noformat} 2015-08-14 01:15:16,020 INFO [pool-6-thread-109]: metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:<init>(131)) - Using direct SQL, underlying DB is OTHER {noformat} Currently in org/apache/hadoop/hive/metastore/MetaStoreDirectSql, there is a code snippet: {code} private DB determineDbType() { DB dbType = DB.OTHER; if (runDbCheck("SET @@session.sql_mode=ANSI_QUOTES", "MySql")) { dbType = DB.MYSQL; } else if (runDbCheck("SELECT version from v$instance", "Oracle")) { dbType = DB.ORACLE; } else if (runDbCheck("SELECT @@version", "MSSQL")) { dbType = DB.MSSQL; } else { // TODO: maybe we should use getProductName to identify all the DBs String productName = getProductName(); if (productName != null && productName.toLowerCase().contains("derby")) { dbType = DB.DERBY; } } return dbType; } {code} The code relies on access to v$instance in order to identify the backend DB as Oracle, but this can fail if users are not granted select privileges on v$ tables. An alternate way, specified on the [Oracle Database Reference pages|http://docs.oracle.com/cd/B19306_01/server.102/b14237/statviews_4224.htm], works. 
I will attach a potential patch that should work. Without the patch the workaround here would be to grant select privileges on v$ tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
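The comments above converge on classifying the backing RDBMS from its reported product name rather than by probing with vendor-specific SQL (which fails when, e.g., the user lacks SELECT on Oracle's v$ views). A minimal sketch of that idea as a pure string classifier; the enum and method names are illustrative, not Hive's actual API, and the sample strings are typical values of JDBC's `DatabaseMetaData.getDatabaseProductName()`:

```java
// Sketch: detect the DB type from the JDBC product-name string instead of
// running probe queries like "SELECT version from v$instance".
public class DbTypeFromProductName {
    enum DB { MYSQL, ORACLE, MSSQL, DERBY, POSTGRES, OTHER }

    static DB classify(String productName) {
        if (productName == null) return DB.OTHER;
        String p = productName.toLowerCase();
        if (p.contains("mysql")) return DB.MYSQL;
        if (p.contains("oracle")) return DB.ORACLE;      // same on Oracle 11 and 12
        if (p.contains("sql server")) return DB.MSSQL;
        if (p.contains("derby")) return DB.DERBY;
        if (p.contains("postgres")) return DB.POSTGRES;
        return DB.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classify("Oracle"));               // ORACLE
        System.out.println(classify("PostgreSQL"));           // POSTGRES
        System.out.println(classify("Microsoft SQL Server")); // MSSQL
    }
}
```

Because this needs no privileges beyond opening the connection, it sidesteps both the v$instance grant problem and the noisy failed probe queries in the PostgreSQL log from HIVE-11123.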
[jira] [Updated] (HIVE-11628) DB type detection code is failing on Oracle 12
[ https://issues.apache.org/jira/browse/HIVE-11628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepesh Khandelwal updated HIVE-11628: -- Attachment: HIVE-11628.patch Attaching the patch for review. DB type detection code is failing on Oracle 12 -- Key: HIVE-11628 URL: https://issues.apache.org/jira/browse/HIVE-11628 Project: Hive Issue Type: Bug Components: Metastore Environment: Oracle 12 Reporter: Deepesh Khandelwal Assignee: Deepesh Khandelwal Attachments: HIVE-11628.patch DB type detection code is failing when using Oracle 12 as the backing store. When determining qualification for direct SQL, the following message is seen in the logs: {noformat} 2015-08-14 01:15:16,020 INFO [pool-6-thread-109]: metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:<init>(131)) - Using direct SQL, underlying DB is OTHER {noformat} Currently in org/apache/hadoop/hive/metastore/MetaStoreDirectSql, there is a code snippet: {code} private DB determineDbType() { DB dbType = DB.OTHER; if (runDbCheck("SET @@session.sql_mode=ANSI_QUOTES", "MySql")) { dbType = DB.MYSQL; } else if (runDbCheck("SELECT version from v$instance", "Oracle")) { dbType = DB.ORACLE; } else if (runDbCheck("SELECT @@version", "MSSQL")) { dbType = DB.MSSQL; } else { // TODO: maybe we should use getProductName to identify all the DBs String productName = getProductName(); if (productName != null && productName.toLowerCase().contains("derby")) { dbType = DB.DERBY; } } return dbType; } {code} The code relies on access to v$instance in order to identify the backend DB as Oracle, but this can fail if users are not granted select privileges on v$ tables. An alternate way, specified on the [Oracle Database Reference pages|http://docs.oracle.com/cd/B19306_01/server.102/b14237/statviews_4224.htm], works. I will attach a potential patch that should work. Without the patch, the workaround here would be to grant select privileges on v$ tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11515) Still some possible race condition in DynamicPartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-11515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709976#comment-14709976 ] Siddharth Seth commented on HIVE-11515: --- [~navis] - I think the patch is good to go in except for the log line, which should be an error. However, I don't see this fixing an issue - I believe the condition mentioned is already handled. Up to you if you want to commit this. Still some possible race condition in DynamicPartitionPruner Key: HIVE-11515 URL: https://issues.apache.org/jira/browse/HIVE-11515 Project: Hive Issue Type: Bug Components: Query Processor, Tez Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-11515.1.patch.txt Even after HIVE-9976, I could see race condition in DPP sometimes. Hard to reproduce but it seemed related to the fact that prune() is called by thread-pool. With some delay in queue, events from fast tasks are arrived before prune() is called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11618) Correct the SARG api to reunify the PredicateLeaf.Type INTEGER and LONG
[ https://issues.apache.org/jira/browse/HIVE-11618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710080#comment-14710080 ] Sergio Peña commented on HIVE-11618: Thanks [~owen.omalley] for the patch. +1 The patch looks good. I understand how you want to keep this code simple by returning one type for a group of same primitive values. Just one small piece of feedback. What about adding some comments to {{ConvertAstToSearchArg.getType}} and {{PredicateLeaf.Type}} for future reference about simplicity? Other developers might see this lack of data types, and they will be eager to add those. Correct the SARG api to reunify the PredicateLeaf.Type INTEGER and LONG --- Key: HIVE-11618 URL: https://issues.apache.org/jira/browse/HIVE-11618 Project: Hive Issue Type: Bug Components: Types Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-11618.patch The Parquet binding leaked implementation details into the generic SARG api. Rather than make all users of the SARG api deal with each of the specific types, reunify the INTEGER and LONG types. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11548) HCatLoader should support predicate pushdown.
[ https://issues.apache.org/jira/browse/HIVE-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-11548: Attachment: (was: HIVE-11548.1.patch) HCatLoader should support predicate pushdown. - Key: HIVE-11548 URL: https://issues.apache.org/jira/browse/HIVE-11548 Project: Hive Issue Type: New Feature Components: HCatalog Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan When one uses {{HCatInputFormat}}/{{HCatLoader}} to read from file-formats that support predicate pushdown (such as ORC, with {{hive.optimize.index.filter=true}}), one sees that the predicates aren't actually pushed down into the storage layer. The forthcoming patch should allow for filter-pushdown, if any of the partitions being scanned with {{HCatLoader}} support the functionality. The patch should technically allow the same for users of {{HCatInputFormat}}, but I don't currently have a neat interface to build a compound predicate-expression. Will add this separately, if required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-11633) import tool should print help by default
[ https://issues.apache.org/jira/browse/HIVE-11633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-11633: --- Assignee: Sergey Shelukhin import tool should print help by default Key: HIVE-11633 URL: https://issues.apache.org/jira/browse/HIVE-11633 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin It took me a while to figure out that I need to supply some command to make import work, and I had to read the sources... it should output help by default -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11581) HiveServer2 should store connection params in ZK when using dynamic service discovery for simpler client connection string.
[ https://issues.apache.org/jira/browse/HIVE-11581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710312#comment-14710312 ] Vaibhav Gumashta commented on HIVE-11581: - Failure is unrelated. HiveServer2 should store connection params in ZK when using dynamic service discovery for simpler client connection string. --- Key: HIVE-11581 URL: https://issues.apache.org/jira/browse/HIVE-11581 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 1.3.0, 2.0.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Attachments: HIVE-11581.1.patch, HIVE-11581.2.patch, HIVE-11581.3.patch, HIVE-11581.3.patch, HIVE-11581.4.patch Currently, the client needs to specify several parameters based on which an appropriate connection is created with the server. In case of dynamic service discovery, when multiple HS2 instances are running, it is much more usable for the server to add its config parameters to ZK which the driver can use to configure the connection, instead of the jdbc/odbc user adding those in connection string. However, at minimum, client will need to specify zookeeper ensemble and that she wants the JDBC driver to use ZooKeeper: {noformat} beeline !connect jdbc:hive2://vgumashta.local:2181,vgumashta.local:2182,vgumashta.local:2183/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2 vgumashta vgumashta org.apache.hive.jdbc.HiveDriver {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11599) Add metastore command to dump it's configs
[ https://issues.apache.org/jira/browse/HIVE-11599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710322#comment-14710322 ] Thejas M Nair commented on HIVE-11599: -- [~ekoifman] Would writing the hiveconfig to logs on metastore startup meet the needs ? Add metastore command to dump it's configs -- Key: HIVE-11599 URL: https://issues.apache.org/jira/browse/HIVE-11599 Project: Hive Issue Type: Bug Components: HiveServer2, Metastore Affects Versions: 1.0.0 Reporter: Eugene Koifman We should have equivalent of Hive CLI set command on Metastore (and likely HS2) which can dump out all properties this particular process is running with. cc [~thejas] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11634) Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)
[ https://issues.apache.org/jira/browse/HIVE-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11634: - Component/s: CBO Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...) -- Key: HIVE-11634 URL: https://issues.apache.org/jira/browse/HIVE-11634 Project: Hive Issue Type: Bug Components: CBO Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Currently, we do not support partition pruning for the following scenario {code} create table pcr_t1 (key int, value string) partitioned by (ds string); insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src where key < 20 order by key; insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src where key < 20 order by key; insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src where key < 20 order by key; explain extended select ds from pcr_t1 where struct(ds, key) in (struct('2000-04-08',1), struct('2000-04-09',2)); {code} If we run the above query, we see that all the partitions of table pcr_t1 are present in the filter predicate, whereas we could prune partition (ds='2000-04-10'). The optimization is to rewrite the above query into two IN clauses, one containing partition columns and the other containing non-partition columns, as follows. {code} explain extended select ds from pcr_t1 where (struct(key) IN (struct(1), struct(2))) and (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09')); {code} This is an extension of the idea presented in HIVE-11573. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11634) Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)
[ https://issues.apache.org/jira/browse/HIVE-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11634: - Attachment: HIVE-11634.1.patch Initial draft, more test cases to follow in patch#2. Let's see how the test runs go with patch#1. Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...) -- Key: HIVE-11634 URL: https://issues.apache.org/jira/browse/HIVE-11634 Project: Hive Issue Type: Bug Components: CBO Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11634.1.patch Currently, we do not support partition pruning for the following scenario {code} create table pcr_t1 (key int, value string) partitioned by (ds string); insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src where key < 20 order by key; insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src where key < 20 order by key; insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src where key < 20 order by key; explain extended select ds from pcr_t1 where struct(ds, key) in (struct('2000-04-08',1), struct('2000-04-09',2)); {code} If we run the above query, we see that all the partitions of table pcr_t1 are present in the filter predicate, whereas we could prune partition (ds='2000-04-10'). The optimization is to rewrite the above query into two IN clauses, one containing partition columns and the other containing non-partition columns, as follows. {code} explain extended select ds from pcr_t1 where (struct(key) IN (struct(1), struct(2))) and (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09')); {code} This is an extension of the idea presented in HIVE-11573. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
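The rewrite described in HIVE-11634 relaxes `IN(STRUCT(partcol, nonpartcol))` into two independent IN clauses: every row matching the original tuple predicate satisfies both projections, so the partition-column half is a safe (necessary, not sufficient) condition to use for pruning while the original filter still runs on the surviving partitions. A small sketch of the projection step, with illustrative names rather than Hive's actual planner classes:

```java
import java.util.*;

// Sketch: project each column of an IN(STRUCT(...)) value list into its own
// deduplicated IN-list. The partition-column list can then drive pruning.
public class StructInSplit {
    // Extract column 'col' from every tuple in the IN-list, preserving order.
    static Set<String> project(List<String[]> tuples, int col) {
        Set<String> vals = new LinkedHashSet<>();
        for (String[] t : tuples) vals.add(t[col]);
        return vals;
    }

    public static void main(String[] args) {
        // struct(ds, key) in (struct('2000-04-08',1), struct('2000-04-09',2))
        List<String[]> inList = Arrays.asList(
            new String[]{"2000-04-08", "1"},
            new String[]{"2000-04-09", "2"});
        // ds IN [2000-04-08, 2000-04-09] -> partition ds='2000-04-10' is prunable
        System.out.println("ds IN " + project(inList, 0));
        System.out.println("key IN " + project(inList, 1));
    }
}
```

Note the split predicate alone would also accept (ds='2000-04-08', key=2), which the original tuple IN-list rejects; that is why the rewrite is only used to narrow the partition scan, not to replace the filter.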
[jira] [Commented] (HIVE-11614) CBO: Calcite Operator To Hive Operator (Calcite Return Path): ctas after order by has problem
[ https://issues.apache.org/jira/browse/HIVE-11614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710413#comment-14710413 ] Laljo John Pullokkaran commented on HIVE-11614: --- [~pxiong] 1) We need to find why the col[1] ended up as fully qualified? 2) Does this happen only for ret path and not for CBO disabled? CBO: Calcite Operator To Hive Operator (Calcite Return Path): ctas after order by has problem - Key: HIVE-11614 URL: https://issues.apache.org/jira/browse/HIVE-11614 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11614.01.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-11622) Creating an Avro table with a complex map-typed column leads to incorrect column type.
[ https://issues.apache.org/jira/browse/HIVE-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang resolved HIVE-11622. Resolution: Duplicate This is a duplicate of HIVE-11288 which is already fixed. Thanks. Creating an Avro table with a complex map-typed column leads to incorrect column type. -- Key: HIVE-11622 URL: https://issues.apache.org/jira/browse/HIVE-11622 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 1.1.0 Reporter: Alexander Behm Assignee: Jimmy Xiang Labels: AvroSerde In the following CREATE TABLE, the map-typed column leads to the wrong type. I suspect some problem with inferring the Avro schema from the column definitions, but I am not sure. Reproduction: {code} hive> create table t (c map<string,array<int>>) stored as avro; OK Time taken: 0.101 seconds hive> desc t; OK c array<map<string,int>> from deserializer Time taken: 0.135 seconds, Fetched: 1 row(s) {code} Note how the type shown in DESCRIBE is not the type originally passed in the CREATE TABLE. However, *sometimes* the DESCRIBE shows the correct output. You may also try these steps which produce a similar problem to increase the chance of hitting this issue: {code} hive> create table t (c array<map<string,int>>) stored as avro; OK Time taken: 0.063 seconds hive> desc t; OK c map<string,array<int>> from deserializer Time taken: 0.152 seconds, Fetched: 1 row(s) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11595) refactor ORC footer reading to make it usable from outside
[ https://issues.apache.org/jira/browse/HIVE-11595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710245#comment-14710245 ] Sergey Shelukhin commented on HIVE-11595: - [~prasanth_j] failures are unrelated :) refactor ORC footer reading to make it usable from outside -- Key: HIVE-11595 URL: https://issues.apache.org/jira/browse/HIVE-11595 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-10595.patch, HIVE-11595.01.patch, HIVE-11595.02.patch, HIVE-11595.03.patch If ORC footer is read from cache, we want to parse it without having the reader, opening a file, etc. I thought it would be as simple as protobuf parseFrom bytes, but apparently there's bunch of stuff going on there. It needs to be accessible via something like parseFrom(ByteBuffer), or similar. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8007) Clean up Thrift definitions
[ https://issues.apache.org/jira/browse/HIVE-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710182#comment-14710182 ] Lars Francke commented on HIVE-8007: Oh...no... I did not regenerate the files as I could not get Thrift to build :( At least I was correct about the build failure being unrelated ;-) I'll try again to get Thrift working, thanks for the heads up! Clean up Thrift definitions --- Key: HIVE-8007 URL: https://issues.apache.org/jira/browse/HIVE-8007 Project: Hive Issue Type: Improvement Reporter: Lars Francke Assignee: Lars Francke Priority: Minor Attachments: HIVE-8007.1.patch, HIVE-8007.2.patch, HIVE-8007.3.patch This patch changes the following: * Currently the thrift file uses {{//}} to denote comments. Thrift understands the {{/** ... */}} syntax and converts that into documentation in the generated code. This patch changes the syntax * Change tabs to spaces * Consistent indentation * Minor whitespace and/or formatting issues There should be no changes to functionality at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11548) HCatLoader should support predicate pushdown.
[ https://issues.apache.org/jira/browse/HIVE-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-11548: Attachment: HIVE-11548.1.patch Corrected the bad-code. Submitting for re-test. HCatLoader should support predicate pushdown. - Key: HIVE-11548 URL: https://issues.apache.org/jira/browse/HIVE-11548 Project: Hive Issue Type: New Feature Components: HCatalog Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: HIVE-11548.1.patch When one uses {{HCatInputFormat}}/{{HCatLoader}} to read from file-formats that support predicate pushdown (such as ORC, with {{hive.optimize.index.filter=true}}), one sees that the predicates aren't actually pushed down into the storage layer. The forthcoming patch should allow for filter-pushdown, if any of the partitions being scanned with {{HCatLoader}} support the functionality. The patch should technically allow the same for users of {{HCatInputFormat}}, but I don't currently have a neat interface to build a compound predicate-expression. Will add this separately, if required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11630) Use string interning with HiveConf to be more space efficient
[ https://issues.apache.org/jira/browse/HIVE-11630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-11630: Description: The giant enum HiveConf#ConfVars has String fields varname and description. Each new session on HiveServer2 creates a new conf object, which means wastefully creating the varname and description string for each enum object. (was: The giant enum HiveConf#ConfVars has a String field varname. Each new session on HiveServer2 creates a new conf object, which means wastefully creating the varname string for each enum object. ) Use string interning with HiveConf to be more space efficient - Key: HIVE-11630 URL: https://issues.apache.org/jira/browse/HIVE-11630 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.1.1 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta The giant enum HiveConf#ConfVars has String fields varname and description. Each new session on HiveServer2 creates a new conf object, which means wastefully creating the varname and description string for each enum object. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
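The space saving HIVE-11630 is after comes from `String.intern()`: when many per-session conf objects carry the same key strings, interning lets them all share one canonical copy instead of each holding its own. A minimal plain-JDK demonstration (no Hive classes; the config key is just an example):

```java
// Demonstrates the deduplication behind string interning: two equal but
// distinct String objects collapse to one shared instance via intern().
public class InternDemo {
    public static void main(String[] args) {
        // Simulate two sessions each materializing the same config key.
        String a = new String("hive.exec.parallel"); // distinct heap object
        String b = new String("hive.exec.parallel"); // another distinct object
        System.out.println(a == b);                   // false: two copies in memory
        System.out.println(a.intern() == b.intern()); // true: one shared, pooled copy
    }
}
```

In a server holding thousands of sessions, pooling repeated varname and description strings this way trades a lookup at creation time for a single shared copy per distinct string.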
[jira] [Commented] (HIVE-11635) import tool fails on unsecure cluster
[ https://issues.apache.org/jira/browse/HIVE-11635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710336#comment-14710336 ] Sergey Shelukhin commented on HIVE-11635: - Running with --all. Everything before that was imported, but still the exception should not be output import tool fails on unsecure cluster - Key: HIVE-11635 URL: https://issues.apache.org/jira/browse/HIVE-11635 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Sergey Shelukhin {noformat} Copying kerberos related items 2015-08-24 20:28:51,292 WARN [main] DataNucleus.Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MDelegationToken and subclasses resulted in no possible candidates Required table missing : `DELEGATION_TOKENS` in Catalog Schema . DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable datanucleus.autoCreateTables org.datanucleus.store.rdbms.exceptions.MissingTableException: Required table missing : `DELEGATION_TOKENS` in Catalog Schema . DataNucleus requires this table to perform its persistence operations. 
Either your MetaData is incorrect, or you need to enable datanucleus.autoCreateTables at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:485) at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3380) at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTablesAndValidate(RDBMSStoreManager.java:3190) at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2841) at org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:122) at org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:1605) at org.datanucleus.store.AbstractStoreManager.addClass(AbstractStoreManager.java:954) at org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:679) at org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getStatementForCandidates(RDBMSQueryUtils.java:408) at org.datanucleus.store.rdbms.query.JDOQLQuery.compileQueryFull(JDOQLQuery.java:947) at org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:370) at org.datanucleus.store.query.Query.executeQuery(Query.java:1744) at org.datanucleus.store.query.Query.executeWithArray(Query.java:1672) at org.datanucleus.store.query.Query.execute(Query.java:1654) at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:221) at org.apache.hadoop.hive.metastore.ObjectStore.getAllTokenIdentifiers(ObjectStore.java:6888) at org.apache.hadoop.hive.metastore.hbase.HBaseImport.copyKerberos(HBaseImport.java:474) at org.apache.hadoop.hive.metastore.hbase.HBaseImport.run(HBaseImport.java:249) at org.apache.hadoop.hive.metastore.hbase.HBaseImport.main(HBaseImport.java:81) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.util.RunJar.run(RunJar.java:222) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} ... {noformat} 2015-08-24 20:28:51,298 WARN [main] DataNucleus.Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MMasterKey and subclasses resulted in no possible candidates Required table missing : `MASTER_KEYS` in Catalog Schema . DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable datanucleus.autoCreateTables org.da {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
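One way the import tool could avoid dumping this stack trace on a non-secure cluster is sketched below. This is an illustrative guard, not the actual HBaseImport API: the class, interface, and method names are hypothetical stand-ins, and the assumption is that a non-secure deployment simply has no delegation tokens to copy.

```java
import java.util.Collections;
import java.util.List;

public class KerberosImportSketch {
  // Hypothetical stand-in for the metastore token store; the real call is
  // ObjectStore.getAllTokenIdentifiers(), which throws when the backing
  // DELEGATION_TOKENS table is absent.
  interface TokenStore {
    List<String> getAllTokenIdentifiers();
  }

  static List<String> copyKerberosItems(TokenStore store, boolean securityEnabled) {
    if (!securityEnabled) {
      // Nothing to copy on a non-secure cluster; don't touch the tables at all.
      return Collections.emptyList();
    }
    try {
      return store.getAllTokenIdentifiers();
    } catch (RuntimeException e) {
      // Table genuinely missing even though security is on: log a short
      // warning instead of the raw DataNucleus stack trace.
      System.err.println("WARN: delegation token store unavailable: " + e.getMessage());
      return Collections.emptyList();
    }
  }
}
```

With the guard, running the import with --all on a non-secure cluster would skip the kerberos phase quietly instead of emitting the MissingTableException above.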
[jira] [Commented] (HIVE-11635) import tool fails on non-secure cluster
[ https://issues.apache.org/jira/browse/HIVE-11635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710363#comment-14710363 ] Alan Gates commented on HIVE-11635: --- Do you want to fix this or want me to? I can, but it will be a few days before I get to it. import tool fails on non-secure cluster --- Key: HIVE-11635 URL: https://issues.apache.org/jira/browse/HIVE-11635 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Sergey Shelukhin {noformat} Copying kerberos related items 2015-08-24 20:28:51,292 WARN [main] DataNucleus.Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MDelegationToken and subclasses resulted in no possible candidates Required table missing : `DELEGATION_TOKENS` in Catalog Schema . DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable datanucleus.autoCreateTables org.datanucleus.store.rdbms.exceptions.MissingTableException: Required table missing : `DELEGATION_TOKENS` in Catalog Schema . DataNucleus requires this table to perform its persistence operations. 
Either your MetaData is incorrect, or you need to enable datanucleus.autoCreateTables at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:485) at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3380) at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTablesAndValidate(RDBMSStoreManager.java:3190) at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2841) at org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:122) at org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:1605) at org.datanucleus.store.AbstractStoreManager.addClass(AbstractStoreManager.java:954) at org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:679) at org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getStatementForCandidates(RDBMSQueryUtils.java:408) at org.datanucleus.store.rdbms.query.JDOQLQuery.compileQueryFull(JDOQLQuery.java:947) at org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:370) at org.datanucleus.store.query.Query.executeQuery(Query.java:1744) at org.datanucleus.store.query.Query.executeWithArray(Query.java:1672) at org.datanucleus.store.query.Query.execute(Query.java:1654) at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:221) at org.apache.hadoop.hive.metastore.ObjectStore.getAllTokenIdentifiers(ObjectStore.java:6888) at org.apache.hadoop.hive.metastore.hbase.HBaseImport.copyKerberos(HBaseImport.java:474) at org.apache.hadoop.hive.metastore.hbase.HBaseImport.run(HBaseImport.java:249) at org.apache.hadoop.hive.metastore.hbase.HBaseImport.main(HBaseImport.java:81) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.util.RunJar.run(RunJar.java:222) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} ... {noformat} 2015-08-24 20:28:51,298 WARN [main] DataNucleus.Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MMasterKey and subclasses resulted in no possible candidates Required table missing : `MASTER_KEYS` in Catalog Schema . DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable datanucleus.autoCreateTables org.da {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11629) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix the filter expressions for full outer join and right outer join
[ https://issues.apache.org/jira/browse/HIVE-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710424#comment-14710424 ] Laljo John Pullokkaran commented on HIVE-11629: --- [~jcamachorodriguez] Could you take a look at this one first? CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix the filter expressions for full outer join and right outer join -- Key: HIVE-11629 URL: https://issues.apache.org/jira/browse/HIVE-11629 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11629.01.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11634) Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)
[ https://issues.apache.org/jira/browse/HIVE-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11634: - Description: Currently, we do not support partition pruning for the following scenario
{code}
create table pcr_t1 (key int, value string) partitioned by (ds string);
insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src where key < 20 order by key;
insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src where key < 20 order by key;
insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src where key < 20 order by key;
explain extended select ds from pcr_t1 where struct(ds, key) in (struct('2000-04-08',1), struct('2000-04-09',2));
{code}
If we run the above query, we see that all the partitions of table pcr_t1 are present in the filter predicate, whereas we can prune partition (ds='2000-04-10'). The optimization is to rewrite the above query into two IN clauses, one containing the partition columns and the other containing the non-partition columns, as follows.
{code}
explain extended select ds from pcr_t1 where (struct(key) IN (struct(1), struct(2))) and (struct(ds) IN (struct('2000-04-08'), struct('2000-04-09')));
{code}
This is an extension of the idea presented in HIVE-11573. Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...) 
-- Key: HIVE-11634 URL: https://issues.apache.org/jira/browse/HIVE-11634 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Currently, we do not support partition pruning for the following scenario
{code}
create table pcr_t1 (key int, value string) partitioned by (ds string);
insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src where key < 20 order by key;
insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src where key < 20 order by key;
insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src where key < 20 order by key;
explain extended select ds from pcr_t1 where struct(ds, key) in (struct('2000-04-08',1), struct('2000-04-09',2));
{code}
If we run the above query, we see that all the partitions of table pcr_t1 are present in the filter predicate, whereas we can prune partition (ds='2000-04-10'). The optimization is to rewrite the above query into two IN clauses, one containing the partition columns and the other containing the non-partition columns, as follows.
{code}
explain extended select ds from pcr_t1 where (struct(key) IN (struct(1), struct(2))) and (struct(ds) IN (struct('2000-04-08'), struct('2000-04-09')));
{code}
This is an extension of the idea presented in HIVE-11573. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
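The rewrite described above can be sketched as a small string transformation. This is a minimal illustration of splitting struct(ds, key) IN (...) into a partition-column clause and a non-partition-column clause; the class and method names are hypothetical, not Hive's planner API.

```java
import java.util.ArrayList;
import java.util.List;

public class StructInSplitter {
  // Each String[] models one struct literal {ds, key} from the IN list.
  static String rewrite(List<String[]> rows) {
    List<String> dsVals = new ArrayList<>();
    List<String> keyVals = new ArrayList<>();
    for (String[] row : rows) {
      dsVals.add("struct('" + row[0] + "')");   // partition column values
      keyVals.add("struct(" + row[1] + ")");    // non-partition column values
    }
    // The partition-column clause alone can now drive pruning.
    return "(struct(key) IN (" + String.join(", ", keyVals) + ")) and "
         + "(struct(ds) IN (" + String.join(", ", dsVals) + "))";
  }
}
```

Worth noting as a design point: the split predicate is a superset of the original struct IN (it also admits cross combinations such as ds='2000-04-08' with key=2), which is safe when used as a partition-pruning predicate but is not an equivalent rewrite of the row filter on its own.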
[jira] [Commented] (HIVE-11633) import tool should print help by default
[ https://issues.apache.org/jira/browse/HIVE-11633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710360#comment-14710360 ] Alan Gates commented on HIVE-11633: --- +1 import tool should print help by default Key: HIVE-11633 URL: https://issues.apache.org/jira/browse/HIVE-11633 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11633.patch It took me a while to figure out that I need to supply some command to make import work, and I had to read the sources... it should output help by default -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11633) import tool should print help by default
[ https://issues.apache.org/jira/browse/HIVE-11633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11633: Attachment: HIVE-11633.patch import tool should print help by default Key: HIVE-11633 URL: https://issues.apache.org/jira/browse/HIVE-11633 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11633.patch It took me a while to figure out that I need to supply some command to make import work, and I had to read the sources... it should output help by default -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11633) import tool should print help by default
[ https://issues.apache.org/jira/browse/HIVE-11633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710302#comment-14710302 ] Sergey Shelukhin commented on HIVE-11633: - [~alangates] fyi import tool should print help by default Key: HIVE-11633 URL: https://issues.apache.org/jira/browse/HIVE-11633 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11633.patch It took me a while to figure out that I need to supply some command to make import work, and I had to read the sources... it should output help by default -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11581) HiveServer2 should store connection params in ZK when using dynamic service discovery for simpler client connection string.
[ https://issues.apache.org/jira/browse/HIVE-11581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710317#comment-14710317 ] Thejas M Nair commented on HIVE-11581: -- +1 for new patch HiveServer2 should store connection params in ZK when using dynamic service discovery for simpler client connection string. --- Key: HIVE-11581 URL: https://issues.apache.org/jira/browse/HIVE-11581 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 1.3.0, 2.0.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Attachments: HIVE-11581.1.patch, HIVE-11581.2.patch, HIVE-11581.3.patch, HIVE-11581.3.patch, HIVE-11581.4.patch Currently, the client needs to specify several parameters based on which an appropriate connection is created with the server. In case of dynamic service discovery, when multiple HS2 instances are running, it is much more usable for the server to add its config parameters to ZK which the driver can use to configure the connection, instead of the jdbc/odbc user adding those in connection string. However, at minimum, client will need to specify zookeeper ensemble and that she wants the JDBC driver to use ZooKeeper: {noformat} beeline !connect jdbc:hive2://vgumashta.local:2181,vgumashta.local:2182,vgumashta.local:2183/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2 vgumashta vgumashta org.apache.hive.jdbc.HiveDriver {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11599) Add metastore command to dump its configs
[ https://issues.apache.org/jira/browse/HIVE-11599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710327#comment-14710327 ] Eugene Koifman commented on HIVE-11599: --- That would be a good start but maybe not enough. Most of the time, logs are rolled and archived, and what you get from customers is today's logs, not the logs from when the Metastore was launched. So having this on demand is better. Add metastore command to dump its configs -- Key: HIVE-11599 URL: https://issues.apache.org/jira/browse/HIVE-11599 Project: Hive Issue Type: Bug Components: HiveServer2, Metastore Affects Versions: 1.0.0 Reporter: Eugene Koifman We should have an equivalent of the Hive CLI set command on the Metastore (and likely HS2) which can dump out all properties this particular process is running with. cc [~thejas] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11635) import tool fails on unsecure cluster
[ https://issues.apache.org/jira/browse/HIVE-11635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11635: Description: {noformat} Copying kerberos related items 2015-08-24 20:28:51,292 WARN [main] DataNucleus.Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MDelegationToken and subclasses resulted in no possible candidates Required table missing : `DELEGATION_TOKENS` in Catalog Schema . DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable datanucleus.autoCreateTables org.datanucleus.store.rdbms.exceptions.MissingTableException: Required table missing : `DELEGATION_TOKENS` in Catalog Schema . DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable datanucleus.autoCreateTables at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:485) at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3380) at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTablesAndValidate(RDBMSStoreManager.java:3190) at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2841) at org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:122) at org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:1605) at org.datanucleus.store.AbstractStoreManager.addClass(AbstractStoreManager.java:954) at org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:679) at org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getStatementForCandidates(RDBMSQueryUtils.java:408) at org.datanucleus.store.rdbms.query.JDOQLQuery.compileQueryFull(JDOQLQuery.java:947) at org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:370) at 
org.datanucleus.store.query.Query.executeQuery(Query.java:1744) at org.datanucleus.store.query.Query.executeWithArray(Query.java:1672) at org.datanucleus.store.query.Query.execute(Query.java:1654) at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:221) at org.apache.hadoop.hive.metastore.ObjectStore.getAllTokenIdentifiers(ObjectStore.java:6888) at org.apache.hadoop.hive.metastore.hbase.HBaseImport.copyKerberos(HBaseImport.java:474) at org.apache.hadoop.hive.metastore.hbase.HBaseImport.run(HBaseImport.java:249) at org.apache.hadoop.hive.metastore.hbase.HBaseImport.main(HBaseImport.java:81) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.util.RunJar.run(RunJar.java:222) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} ... {noformat} 2015-08-24 20:28:51,298 WARN [main] DataNucleus.Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MMasterKey and subclasses resulted in no possible candidates Required table missing : `MASTER_KEYS` in Catalog Schema . DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable datanucleus.autoCreateTables org.da {noformat} was: {noformat} Copying kerberos related items 2015-08-24 20:28:51,292 WARN [main] DataNucleus.Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MDelegationToken and subclasses resulted in no possible candidates Required table missing : `DELEGATION_TOKENS` in Catalog Schema . DataNucleus requires this table to perform its persistence operations. 
Either your MetaData is incorrect, or you need to enable datanucleus.autoCreateTables org.datanucleus.store.rdbms.exceptions.MissingTableException: Required table missing : `DELEGATION_TOKENS` in Catalog Schema . DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable datanucleus.autoCreateTables at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:485) at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3380) at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTablesAndValidate(RDBMSStoreManager.java:3190) at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2841) at org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:122) at
[jira] [Updated] (HIVE-11635) import tool fails on non-secure cluster
[ https://issues.apache.org/jira/browse/HIVE-11635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11635: Summary: import tool fails on non-secure cluster (was: import tool fails on unsecure cluster) import tool fails on non-secure cluster --- Key: HIVE-11635 URL: https://issues.apache.org/jira/browse/HIVE-11635 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Sergey Shelukhin {noformat} Copying kerberos related items 2015-08-24 20:28:51,292 WARN [main] DataNucleus.Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MDelegationToken and subclasses resulted in no possible candidates Required table missing : `DELEGATION_TOKENS` in Catalog Schema . DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable datanucleus.autoCreateTables org.datanucleus.store.rdbms.exceptions.MissingTableException: Required table missing : `DELEGATION_TOKENS` in Catalog Schema . DataNucleus requires this table to perform its persistence operations. 
Either your MetaData is incorrect, or you need to enable datanucleus.autoCreateTables at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:485) at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3380) at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTablesAndValidate(RDBMSStoreManager.java:3190) at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2841) at org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:122) at org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:1605) at org.datanucleus.store.AbstractStoreManager.addClass(AbstractStoreManager.java:954) at org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:679) at org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getStatementForCandidates(RDBMSQueryUtils.java:408) at org.datanucleus.store.rdbms.query.JDOQLQuery.compileQueryFull(JDOQLQuery.java:947) at org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:370) at org.datanucleus.store.query.Query.executeQuery(Query.java:1744) at org.datanucleus.store.query.Query.executeWithArray(Query.java:1672) at org.datanucleus.store.query.Query.execute(Query.java:1654) at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:221) at org.apache.hadoop.hive.metastore.ObjectStore.getAllTokenIdentifiers(ObjectStore.java:6888) at org.apache.hadoop.hive.metastore.hbase.HBaseImport.copyKerberos(HBaseImport.java:474) at org.apache.hadoop.hive.metastore.hbase.HBaseImport.run(HBaseImport.java:249) at org.apache.hadoop.hive.metastore.hbase.HBaseImport.main(HBaseImport.java:81) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.util.RunJar.run(RunJar.java:222) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} ... {noformat} 2015-08-24 20:28:51,298 WARN [main] DataNucleus.Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MMasterKey and subclasses resulted in no possible candidates Required table missing : `MASTER_KEYS` in Catalog Schema . DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable datanucleus.autoCreateTables org.da {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11504) Predicate pushing down doesn't work for float type for Parquet
[ https://issues.apache.org/jira/browse/HIVE-11504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-11504: Attachment: HIVE-11504.1.patch Hi [~owen.omalley] [~spena], let's use the first edition to resolve this jira. Any thoughts? Predicate pushing down doesn't work for float type for Parquet -- Key: HIVE-11504 URL: https://issues.apache.org/jira/browse/HIVE-11504 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-11504.1.patch, HIVE-11504.1.patch, HIVE-11504.2.patch, HIVE-11504.2.patch, HIVE-11504.3.patch, HIVE-11504.patch Predicate builder should use PrimitiveTypeName type in parquet side to construct predicate leaf instead of the type provided by PredicateLeaf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11624) Beeline-cli: support hive.cli.print.header in new CLI[beeline-cli branch]
[ https://issues.apache.org/jira/browse/HIVE-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-11624: Attachment: HIVE-11624.1-beeline-cli.patch Renamed the patch to include the branch name. Beeline-cli: support hive.cli.print.header in new CLI[beeline-cli branch] - Key: HIVE-11624 URL: https://issues.apache.org/jira/browse/HIVE-11624 Project: Hive Issue Type: Sub-task Reporter: Ke Jia Assignee: Ke Jia Attachments: HIVE-11624.1-beeline-cli.patch, HIVE-11624.patch The old CLI honors hive.cli.print.header from the Hive configuration when executing a script. We need to support this previous configuration using Beeline functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-11445) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : groupby distinct does not work
[ https://issues.apache.org/jira/browse/HIVE-11445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran reassigned HIVE-11445: - Assignee: Laljo John Pullokkaran (was: Pengcheng Xiong) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : groupby distinct does not work - Key: HIVE-11445 URL: https://issues.apache.org/jira/browse/HIVE-11445 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Laljo John Pullokkaran Attachments: HIVE-11445.01.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11623) CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix the tableAlias for ReduceSink operator
[ https://issues.apache.org/jira/browse/HIVE-11623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710418#comment-14710418 ] Laljo John Pullokkaran commented on HIVE-11623: --- [~jcamachorodriguez] Could you look at this one first? CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix the tableAlias for ReduceSink operator Key: HIVE-11623 URL: https://issues.apache.org/jira/browse/HIVE-11623 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11623.01.patch, HIVE-11623.02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HIVE-11611) A bad performance regression issue with Parquet happens if Hive does not select any columns
[ https://issues.apache.org/jira/browse/HIVE-11611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu reopened HIVE-11611: - Hi [~spena], I think if we bump up to the latest version of parquet, we will still need to change the code back to the original one. I'd like to reopen this jira. A bad performance regression issue with Parquet happens if Hive does not select any columns --- Key: HIVE-11611 URL: https://issues.apache.org/jira/browse/HIVE-11611 Project: Hive Issue Type: Sub-task Affects Versions: 2.0.0 Reporter: Sergio Peña Assignee: Ferdinand Xu Attachments: HIVE-11611.patch A possible performance issue may happen with the below code when using a query like {{SELECT count(1) FROM parquetTable}}.
{code}
if (!ColumnProjectionUtils.isReadAllColumns(configuration) && !indexColumnsWanted.isEmpty()) {
  MessageType requestedSchemaByUser = getSchemaByIndex(tableSchema, columnNamesList, indexColumnsWanted);
  return new ReadContext(requestedSchemaByUser, contextMetadata);
} else {
  return new ReadContext(tableSchema, contextMetadata);
}
{code}
If no columns or indexes are selected, the above code will read the full schema from Parquet even though Hive does nothing with those values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
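The branch logic at issue can be isolated as a three-way decision. This is a hedged sketch, not the actual ParquetRecordReader fix: the enum and method are illustrative, and the third branch shows the behavior the issue asks for in the {{SELECT count(1)}} case.

```java
public class ReadSchemaDecision {
  enum Schema { REQUESTED_SUBSET, FULL, EMPTY }

  static Schema choose(boolean readAllColumns, int wantedColumnCount) {
    if (!readAllColumns && wantedColumnCount > 0) {
      return Schema.REQUESTED_SUBSET;   // prune to the selected columns
    } else if (readAllColumns) {
      return Schema.FULL;               // e.g. SELECT *
    } else {
      // SELECT count(1): no columns requested and read-all is off, so the
      // reader should not materialize any column data at all.
      return Schema.EMPTY;
    }
  }
}
```

The regression is that the original two-way branch lumps the last case in with FULL, paying the cost of decoding every column for a query that uses none of them.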
[jira] [Commented] (HIVE-11628) DB type detection code is failing on Oracle 12
[ https://issues.apache.org/jira/browse/HIVE-11628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710432#comment-14710432 ] Hive QA commented on HIVE-11628: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12752075/HIVE-11628.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9377 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5057/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5057/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5057/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12752075 - PreCommit-HIVE-TRUNK-Build DB type detection code is failing on Oracle 12 -- Key: HIVE-11628 URL: https://issues.apache.org/jira/browse/HIVE-11628 Project: Hive Issue Type: Bug Components: Metastore Environment: Oracle 12 Reporter: Deepesh Khandelwal Assignee: Deepesh Khandelwal Fix For: 2.0.0 Attachments: HIVE-11628.patch DB type detection code is failing when using Oracle 12 as backing store. 
When determining qualification for direct SQL, the following message is seen in the logs: {noformat} 2015-08-14 01:15:16,020 INFO [pool-6-thread-109]: metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:init(131)) - Using direct SQL, underlying DB is OTHER {noformat} Currently in org/apache/hadoop/hive/metastore/MetaStoreDirectSql, there is a code snippet:
{code}
private DB determineDbType() {
  DB dbType = DB.OTHER;
  if (runDbCheck("SET @@session.sql_mode=ANSI_QUOTES", "MySql")) {
    dbType = DB.MYSQL;
  } else if (runDbCheck("SELECT version from v$instance", "Oracle")) {
    dbType = DB.ORACLE;
  } else if (runDbCheck("SELECT @@version", "MSSQL")) {
    dbType = DB.MSSQL;
  } else {
    // TODO: maybe we should use getProductName to identify all the DBs
    String productName = getProductName();
    if (productName != null && productName.toLowerCase().contains("derby")) {
      dbType = DB.DERBY;
    }
  }
  return dbType;
}
{code}
The code relies on access to v$instance in order to identify the backend DB as Oracle, but this can fail if users are not granted select privileges on v$ tables. An alternate way, specified on the [Oracle Database Reference pages|http://docs.oracle.com/cd/B19306_01/server.102/b14237/statviews_4224.htm], works. I will attach a potential patch that should work. Without the patch, the workaround is to grant select privileges on v$ tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
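The TODO in the snippet above hints at one privilege-free approach: classify the backend from the JDBC driver's reported product name rather than probing vendor tables like v$instance. The sketch below is an assumption about typical driver product strings, not the contents of the attached patch; the enum mirrors the one in MetaStoreDirectSql.

```java
public class DbTypeFromProductName {
  enum DB { MYSQL, ORACLE, MSSQL, DERBY, OTHER }

  // In real code the argument would come from
  // connection.getMetaData().getDatabaseProductName(), which needs no
  // special table privileges.
  static DB classify(String productName) {
    if (productName == null) return DB.OTHER;
    String p = productName.toLowerCase();
    if (p.contains("mysql")) return DB.MYSQL;
    if (p.contains("oracle")) return DB.ORACLE;
    if (p.contains("microsoft sql server")) return DB.MSSQL;
    if (p.contains("derby")) return DB.DERBY;
    return DB.OTHER;
  }
}
```

This would sidestep the Oracle 12 failure entirely, since no query against v$ tables is issued during detection.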
[jira] [Updated] (HIVE-11631) TFetchResultsResp hasMoreRows and startRowOffset not returning actual values
[ https://issues.apache.org/jira/browse/HIVE-11631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jenny Kim updated HIVE-11631: - Description: hasMoreRows always returns False startRowOffset always appears to be 0 was: This was originally reported in https://jira.cloudera.com/browse/CDH-8904 but appears to still be broken. hasMoreRows always returns False, and startRowOffset always appears to be 0 Summary: TFetchResultsResp hasMoreRows and startRowOffset not returning actual values (was: TFetchResultsResp hasMoreRow and startRowOffset not returning actual values) TFetchResultsResp hasMoreRows and startRowOffset not returning actual values Key: HIVE-11631 URL: https://issues.apache.org/jira/browse/HIVE-11631 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 1.1.1 Reporter: Jenny Kim hasMoreRows always returns False startRowOffset always appears to be 0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
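Because hasMoreRows is reported unreliably, a common client-side workaround is to page by batch size: keep fetching until a batch comes back smaller than requested. The Fetcher interface below is a hypothetical stand-in for the Thrift FetchResults call, not the real HiveServer2 API.

```java
import java.util.ArrayList;
import java.util.List;

public class FetchLoopSketch {
  interface Fetcher {
    List<String> fetch(int maxRows);  // stand-in for TCLIService FetchResults
  }

  static List<String> fetchAll(Fetcher f, int batchSize) {
    List<String> all = new ArrayList<>();
    while (true) {
      List<String> batch = f.fetch(batchSize);
      all.addAll(batch);
      // Stop on a short (possibly empty) page instead of trusting hasMoreRows.
      if (batch.size() < batchSize) break;
    }
    return all;
  }
}
```

A driver using this loop is correct whether or not the server ever sets hasMoreRows, at the cost of possibly one extra round trip when the result size is a multiple of the batch size.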
[jira] [Commented] (HIVE-11595) refactor ORC footer reading to make it usable from outside
[ https://issues.apache.org/jira/browse/HIVE-11595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710168#comment-14710168 ] Hive QA commented on HIVE-11595: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12752073/HIVE-11595.03.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9377 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5055/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5055/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5055/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12752073 - PreCommit-HIVE-TRUNK-Build refactor ORC footer reading to make it usable from outside -- Key: HIVE-11595 URL: https://issues.apache.org/jira/browse/HIVE-11595 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-10595.patch, HIVE-11595.01.patch, HIVE-11595.02.patch, HIVE-11595.03.patch If ORC footer is read from cache, we want to parse it without having the reader, opening a file, etc. I thought it would be as simple as protobuf parseFrom bytes, but apparently there's bunch of stuff going on there. It needs to be accessible via something like parseFrom(ByteBuffer), or similar. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11515) Still some possible race condition in DynamicPartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-11515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710436#comment-14710436 ]

Navis commented on HIVE-11515:
------------------------------

[~sseth] If it's already fixed, seemed not need to commit this. Thanks!

Still some possible race condition in DynamicPartitionPruner
------------------------------------------------------------

    Key: HIVE-11515
    URL: https://issues.apache.org/jira/browse/HIVE-11515
    Project: Hive
    Issue Type: Bug
    Components: Query Processor, Tez
    Reporter: Navis
    Assignee: Navis
    Priority: Minor
    Attachments: HIVE-11515.1.patch.txt

Even after HIVE-9976, I could see race condition in DPP sometimes. Hard to reproduce but it seemed related to the fact that prune() is called by thread-pool. With some delay in queue, events from fast tasks are arrived before prune() is called.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
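[Editorial note] The failure mode described — completion events delivered by a thread pool before {{prune()}} has run — is a general ordering race. A self-contained sketch (plain Java, not the actual DynamicPartitionPruner code or the attached patch) of the usual cure: buffer events in a queue so early arrivals are retained instead of racing initialization:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrates the race class from the report: fast tasks can deliver all
// their events before prune() starts. Buffering events in a queue makes
// the arrival order harmless; prune() simply drains what it expects.
class PrunerSketch {
    final BlockingQueue<String> events = new LinkedBlockingQueue<>();
    volatile int processed = 0;

    // Called from the event-delivery thread pool. Early events are
    // buffered, never dropped, even if prune() has not started yet.
    void onSourceEvent(String event) {
        events.add(event);
    }

    // Called later on the pruner thread; blocks until all expected
    // events have arrived, whenever they were delivered.
    void prune(int expectedEvents) {
        for (int i = 0; i < expectedEvents; i++) {
            try {
                events.take();
                processed++;
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        PrunerSketch p = new PrunerSketch();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Fast tasks: all events may arrive before prune() starts.
        for (int i = 0; i < 5; i++) {
            final int id = i;
            pool.execute(() -> p.onSourceEvent("task-" + id));
        }
        Thread.sleep(50);   // simulate the queue delay that exposed the race
        p.prune(5);         // still sees all 5 events
        pool.shutdown();
        System.out.println("processed=" + p.processed);   // processed=5
    }
}
```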
[jira] [Updated] (HIVE-11357) ACID enable predicate pushdown for insert-only delta file 2
[ https://issues.apache.org/jira/browse/HIVE-11357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-11357:
----------------------------------

Attachment: HIVE-11357.patch

ACID enable predicate pushdown for insert-only delta file 2
-----------------------------------------------------------

    Key: HIVE-11357
    URL: https://issues.apache.org/jira/browse/HIVE-11357
    Project: Hive
    Issue Type: Bug
    Components: Transactions
    Affects Versions: 1.0.0
    Reporter: Eugene Koifman
    Assignee: Eugene Koifman
    Attachments: HIVE-11357.patch

HIVE-11320 missed a case. That fix enabled PPD for insert-only delta files when a base file is present. It won't work if only delta files are present. See {{OrcInputFormat.getReader(InputSplit inputSplit, Options options)}}, which only calls {{setSearchArgument()}} if there is a base file.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
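[Editorial note] Per the description, the bug is a control-flow asymmetry: the search argument is attached only on the base-file branch. In Java-flavored pseudocode (branch structure and helper names simplified and illustrative; not the actual patch):

```
// Before (sketch): predicate pushdown configured only when a base exists.
if (split.hasBase()) {
  setSearchArgument(readerOptions, types, conf);   // PPD enabled
  return createReaderFromBase(readerOptions);
} else {
  return createDeltaOnlyReader(readerOptions);     // SARG never set: no PPD
}

// After (sketch): hoist the call so delta-only reads also carry the SARG.
setSearchArgument(readerOptions, types, conf);
return split.hasBase() ? createReaderFromBase(readerOptions)
                       : createDeltaOnlyReader(readerOptions);
```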
[jira] [Updated] (HIVE-10215) Large IN() clauses: deep hashCode performance during optimizer pass
[ https://issues.apache.org/jira/browse/HIVE-10215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Dere updated HIVE-10215:
------------------------------

Fix Version/s: 1.2.0

Large IN() clauses: deep hashCode performance during optimizer pass
-------------------------------------------------------------------

    Key: HIVE-10215
    URL: https://issues.apache.org/jira/browse/HIVE-10215
    Project: Hive
    Issue Type: Bug
    Components: Logical Optimizer
    Affects Versions: 1.2.0
    Reporter: Gopal V
    Assignee: Gopal V
    Priority: Minor
    Fix For: 1.2.0
    Attachments: HIVE-10215.1.patch

The logical optimizer uses several maps and sets, which are exceedingly expensive for large IN() clauses: several parts of the query walk over the lists without short-circuiting during hashCode(), while equals() is faster due to short-circuiting via less expensive operators.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
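[Editorial note] The cost asymmetry the report describes is a property of {{java.util.List}} itself: {{hashCode()}} must fold in every element, while {{equals()}} may return at the first mismatch. A self-contained demonstration (plain Java, independent of Hive's optimizer):

```java
import java.util.ArrayList;
import java.util.List;

// Element that counts hashCode()/equals() invocations, to show that
// List.hashCode() visits every element while List.equals() can
// short-circuit at the first mismatch.
class Counted {
    static int hashCalls = 0;
    static int equalsCalls = 0;
    final int v;
    Counted(int v) { this.v = v; }
    @Override public int hashCode() { hashCalls++; return v; }
    @Override public boolean equals(Object o) {
        equalsCalls++;
        return (o instanceof Counted) && ((Counted) o).v == v;
    }
}

class InClauseCost {
    public static void main(String[] args) {
        int n = 10_000;   // size of a large IN() clause
        List<Counted> a = new ArrayList<>();
        List<Counted> b = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            a.add(new Counted(i));
            b.add(new Counted(i));
        }
        b.set(0, new Counted(-1));   // lists differ at the very first element

        a.hashCode();                // folds in all n element hashCodes
        System.out.println("hashCode visits: " + Counted.hashCalls);   // 10000

        a.equals(b);                 // stops at element 0
        System.out.println("equals visits:   " + Counted.equalsCalls); // 1
    }
}
```

So a 10,000-element IN() list costs 10,000 element visits per hashCode() against a single visit for a failing equals(), which is why hash-based maps and sets dominate the optimizer profile here.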
[jira] [Updated] (HIVE-10163) CommonMergeJoinOperator calls WritableComparator.get() in the inner loop
[ https://issues.apache.org/jira/browse/HIVE-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Dere updated HIVE-10163:
------------------------------

Fix Version/s: 1.2.0

CommonMergeJoinOperator calls WritableComparator.get() in the inner loop
------------------------------------------------------------------------

    Key: HIVE-10163
    URL: https://issues.apache.org/jira/browse/HIVE-10163
    Project: Hive
    Issue Type: Bug
    Components: Query Processor
    Affects Versions: 1.2.0
    Reporter: Gopal V
    Assignee: Gunther Hagleitner
    Labels: JOIN, Performance
    Fix For: 1.2.0
    Attachments: HIVE-10163.1.patch, HIVE-10163.2.patch, HIVE-10163.3.patch, mergejoin-comparekeys.png, mergejoin-parallel-bt.png, mergejoin-parallel-lock.png

The CommonMergeJoinOperator wastes CPU looking up the correct comparator for each WritableComparable in each row.

{code}
@SuppressWarnings("rawtypes")
private int compareKeys(List<Object> k1, List<Object> k2) {
  int ret = 0;
  for (int i = 0; i < k1.size(); i++) {
    WritableComparable key_1 = (WritableComparable) k1.get(i);
    WritableComparable key_2 = (WritableComparable) k2.get(i);
    ret = WritableComparator.get(key_1.getClass()).compare(key_1, key_2);
    if (ret != 0) {
      return ret;
    }
  }
  return ret;
}
{code}

!mergejoin-parallel-lock.png!
!mergejoin-comparekeys.png!

The slow part of that get() is deep within {{ReflectionUtils.setConf}}, where it tries to use reflection to set the Comparator config for each row being compared.

!mergejoin-parallel-bt.png!

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
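[Editorial note] A common remedy for this pattern — sketched here as an illustration, not necessarily what the committed patch does — is to resolve one comparator per key column up front and reuse it for every row. A self-contained example with a counter standing in for the reflection cost of {{WritableComparator.get()}}:

```java
import java.util.Comparator;
import java.util.List;

// Stand-in for the expensive per-call comparator lookup. The fix pattern
// is to hoist the lookup out of the per-row loop and cache one comparator
// per key column.
class MergeJoinCompare {
    static int lookups = 0;

    static Comparator<Object> comparatorFor(Class<?> cls) {
        lookups++;   // models the per-call reflection cost of WritableComparator.get()
        return (a, b) -> ((Comparable<Object>) a).compareTo(b);
    }

    // Per-row lookup, as in the snippet above: one get() per key per row.
    static int compareKeysSlow(List<Object> k1, List<Object> k2) {
        for (int i = 0; i < k1.size(); i++) {
            int ret = comparatorFor(k1.get(i).getClass()).compare(k1.get(i), k2.get(i));
            if (ret != 0) { return ret; }
        }
        return 0;
    }

    // Cached variant: comparators resolved once, reused for every row.
    static int compareKeysCached(Comparator<Object>[] cmps, List<Object> k1, List<Object> k2) {
        for (int i = 0; i < k1.size(); i++) {
            int ret = cmps[i].compare(k1.get(i), k2.get(i));
            if (ret != 0) { return ret; }
        }
        return 0;
    }

    public static void main(String[] args) {
        List<Object> k1 = List.of("a", 1);
        List<Object> k2 = List.of("a", 2);

        for (int row = 0; row < 1000; row++) { compareKeysSlow(k1, k2); }
        int slowLookups = lookups;   // 2 keys x 1000 rows

        lookups = 0;
        @SuppressWarnings("unchecked")
        Comparator<Object>[] cached =
            new Comparator[] { comparatorFor(String.class), comparatorFor(Integer.class) };
        for (int row = 0; row < 1000; row++) { compareKeysCached(cached, k1, k2); }

        System.out.println(slowLookups + " lookups vs " + lookups);   // 2000 lookups vs 2
    }
}
```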
[jira] [Comment Edited] (HIVE-11599) Add metastore command to dump it's configs
[ https://issues.apache.org/jira/browse/HIVE-11599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710531#comment-14710531 ]

Ashutosh Chauhan edited comment on HIVE-11599 at 8/25/15 3:41 AM:
------------------------------------------------------------------

I agree with [~ekoifman] logging at startup is bare minimum we can do, but really useful will be to have a command like {{bin/hive --metastore --printConf}} to print configuration of running metastore on console.

was (Author: ashutoshc):
I agree with [~ekoifman] logging at startup is bare minimum we can do, but really useful will be to have a command like {{bin/hive --metastore --printConf }} to print configuration of running metastore on console.

Add metastore command to dump it's configs
------------------------------------------

    Key: HIVE-11599
    URL: https://issues.apache.org/jira/browse/HIVE-11599
    Project: Hive
    Issue Type: Bug
    Components: HiveServer2, Metastore
    Affects Versions: 1.0.0
    Reporter: Eugene Koifman

We should have equivalent of Hive CLI set command on Metastore (and likely HS2) which can dump out all properties this particular process is running with. cc [~thejas]

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11599) Add metastore command to dump it's configs
[ https://issues.apache.org/jira/browse/HIVE-11599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710531#comment-14710531 ]

Ashutosh Chauhan commented on HIVE-11599:
-----------------------------------------

I agree with [~ekoifman] logging at startup is bare minimum we can do, but really useful will be to have a command like {{bin/hive --metastore --printConf}} to print configuration of running metastore on console.

Add metastore command to dump it's configs
------------------------------------------

    Key: HIVE-11599
    URL: https://issues.apache.org/jira/browse/HIVE-11599
    Project: Hive
    Issue Type: Bug
    Components: HiveServer2, Metastore
    Affects Versions: 1.0.0
    Reporter: Eugene Koifman

We should have equivalent of Hive CLI set command on Metastore (and likely HS2) which can dump out all properties this particular process is running with. cc [~thejas]

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11581) HiveServer2 should store connection params in ZK when using dynamic service discovery for simpler client connection string.
[ https://issues.apache.org/jira/browse/HIVE-11581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710592#comment-14710592 ]

Lefty Leverenz commented on HIVE-11581:
---------------------------------------

Does this need documentation? If so, please add a TODOC1.3 label. (No doc needed for the HiveConf.java changes -- the patch just moves some parameters around in the file.)

* [HiveServer2 Clients | https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients]

HiveServer2 should store connection params in ZK when using dynamic service discovery for simpler client connection string.
---------------------------------------------------------------------------------------------------------------------------

    Key: HIVE-11581
    URL: https://issues.apache.org/jira/browse/HIVE-11581
    Project: Hive
    Issue Type: Bug
    Components: HiveServer2, JDBC
    Affects Versions: 1.3.0, 2.0.0
    Reporter: Vaibhav Gumashta
    Assignee: Vaibhav Gumashta
    Fix For: 1.3.0, 2.0.0
    Attachments: HIVE-11581.1.patch, HIVE-11581.2.patch, HIVE-11581.3.patch, HIVE-11581.3.patch, HIVE-11581.4.patch

Currently, the client needs to specify several parameters based on which an appropriate connection is created with the server. In case of dynamic service discovery, when multiple HS2 instances are running, it is much more usable for the server to add its config parameters to ZK which the driver can use to configure the connection, instead of the jdbc/odbc user adding those in connection string. However, at minimum, client will need to specify zookeeper ensemble and that she wants the JDBC driver to use ZooKeeper:

{noformat}
beeline
!connect jdbc:hive2://vgumashta.local:2181,vgumashta.local:2182,vgumashta.local:2183/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2 vgumashta vgumashta org.apache.hive.jdbc.HiveDriver
{noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11637) Support hive.cli.print.current.db in new CLI[beeline-cli branch]
[ https://issues.apache.org/jira/browse/HIVE-11637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdinand Xu updated HIVE-11637:
--------------------------------

Attachment: HIVE-11637.1-beeline-cli.patch

Support hive.cli.print.current.db in new CLI[beeline-cli branch]
----------------------------------------------------------------

    Key: HIVE-11637
    URL: https://issues.apache.org/jira/browse/HIVE-11637
    Project: Hive
    Issue Type: Sub-task
    Components: CLI
    Reporter: Ferdinand Xu
    Assignee: Ferdinand Xu
    Attachments: HIVE-11637.1-beeline-cli.patch

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11624) Beeline-cli: support hive.cli.print.header in new CLI[beeline-cli branch]
[ https://issues.apache.org/jira/browse/HIVE-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710566#comment-14710566 ]

Hive QA commented on HIVE-11624:
--------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12752118/HIVE-11624.1-beeline-cli.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9235 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_join0
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_8
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-BEELINE-Build/21/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-BEELINE-Build/21/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-BEELINE-Build-21/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12752118 - PreCommit-HIVE-BEELINE-Build

Beeline-cli: support hive.cli.print.header in new CLI[beeline-cli branch]
-------------------------------------------------------------------------

    Key: HIVE-11624
    URL: https://issues.apache.org/jira/browse/HIVE-11624
    Project: Hive
    Issue Type: Sub-task
    Reporter: Ke Jia
    Assignee: Ke Jia
    Attachments: HIVE-11624.1-beeline-cli.patch, HIVE-11624.patch

In the old CLI, it uses hive.cli.print.header from the hive configuration to force execution of a script. We need to support the previous configuration using beeline functionality.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)