[jira] [Created] (HIVE-25865) ALTER RENAME suppresses commitTransaction failure and reports operation success
Matt McCline created HIVE-25865: --- Summary: ALTER RENAME suppresses commitTransaction failure and reports operation success Key: HIVE-25865 URL: https://issues.apache.org/jira/browse/HIVE-25865 Project: Hive Issue Type: Bug Components: Metastore Reporter: Matt McCline Assignee: Matt McCline If commitTransaction fails, HiveAlterHandler.alterTable does not report an error. It suppresses the failure and returns successfully. -- This message was sent by Atlassian Jira (v8.20.1#820001)
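The report does not show how the failure gets lost. One way such a failure can be hidden is when a commit call reports failure through its return value and the caller drops that value. The sketch below contrasts the two shapes; `CommitCheckSketch` and both method names are hypothetical and this is not the HiveAlterHandler code.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical illustration only; not the actual HiveAlterHandler code. */
final class CommitCheckSketch {

    /** Anti-pattern: the boolean result of the commit is dropped. */
    static String renameIgnoringCommitResult(BooleanSupplier commit) {
        commit.getAsBoolean();   // result ignored, so a failed commit is invisible
        return "OK";
    }

    /** Safer shape: a failed commit is turned into an exception for the caller. */
    static String renameCheckingCommitResult(BooleanSupplier commit) {
        if (!commit.getAsBoolean()) {
            throw new IllegalStateException("commitTransaction failed; rename was not committed");
        }
        return "OK";
    }

    public static void main(String[] args) {
        // A commit that always fails is still reported as success by the first variant.
        System.out.println(renameIgnoringCommitResult(() -> false));
        try {
            renameCheckingCommitResult(() -> false);
        } catch (IllegalStateException e) {
            System.out.println("reported: " + e.getMessage());
        }
    }
}
```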
[jira] [Created] (HIVE-25493) TBLPROPERTIES upper- vs. lower-case confusion
Matt McCline created HIVE-25493: --- Summary: TBLPROPERTIES upper- vs. lower-case confusion Key: HIVE-25493 URL: https://issues.apache.org/jira/browse/HIVE-25493 Project: Hive Issue Type: Bug Affects Versions: 3.1.2 Reporter: Matt McCline Assignee: Matt McCline A user was confused by the difference in ALTER TABLE ... SET TBLPROPERTIES behavior between 'EXTERNAL'='FALSE' (ignored; adds 2 properties, EXTERNAL and FALSE) and 'external'='false' (transaction error).
[jira] [Created] (HIVE-25478) Temp file left over after ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS
Matt McCline created HIVE-25478: --- Summary: Temp file left over after ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS Key: HIVE-25478 URL: https://issues.apache.org/jira/browse/HIVE-25478 Project: Hive Issue Type: Bug Affects Versions: 3.1.0 Reporter: Matt McCline Assignee: Matt McCline

The dot staging file (".hive-staging") is not removed at the end of the ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS operation, as it is for, say, an INSERT that does automatic statistics collection. I expected it would be deleted after the Stats Work stage. Any ideas where in the code to add automatic deletion (hook)?

hdfs dfs -ls /hive/warehouse/managed/table_orc
Found 2 items
drwxr-xr-x - hive supergroup 0 2021-08-24 17:19 /hive/warehouse/managed/table_orc/.hive-staging_hive_2021-08-24_17-19-17_228_4856027533912221506-7
drwxr-xr-x - hive supergroup 0 2021-08-24 07:17 /hive/warehouse/managed/table_orc/delta_001_001_
[jira] [Created] (HIVE-25446) VectorMapJoinFastHashTable.validateCapacity AssertionError: Capacity must be a power of two
Matt McCline created HIVE-25446: --- Summary: VectorMapJoinFastHashTable.validateCapacity AssertionError: Capacity must be a power of two Key: HIVE-25446 URL: https://issues.apache.org/jira/browse/HIVE-25446 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline Fix For: 4.0.0

Environment: Encountered this in a very large query:

Caused by: java.lang.AssertionError: Capacity must be a power of two
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTable.validateCapacity(VectorMapJoinFastHashTable.java:60)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTable.<init>(VectorMapJoinFastHashTable.java:77)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashTable.<init>(VectorMapJoinFastBytesHashTable.java:132)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashMap.<init>(VectorMapJoinFastBytesHashMap.java:166)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastStringHashMap.<init>(VectorMapJoinFastStringHashMap.java:43)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastTableContainer.createHashTable(VectorMapJoinFastTableContainer.java:137)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastTableContainer.<init>(VectorMapJoinFastTableContainer.java:86)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTableLoader.load(VectorMapJoinFastHashTableLoader.java:122)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTableInternal(MapJoinOperator.java:344)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:413)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.lambda$initializeOp$0(MapJoinOperator.java:215)
    at org.apache.hadoop.hive.ql.exec.tez.ObjectCache.retrieve(ObjectCache.java:96)
    at org.apache.hadoop.hive.ql.exec.tez.ObjectCache$1.call(ObjectCache.java:113)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
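For context on the assertion in HIVE-25446: hash tables that turn a hash code into a slot index with a bit mask generally need a power-of-two capacity. The report does not say how an invalid capacity was computed, so the following is only a sketch of the check and of one way to round a requested size. `CapacitySketch` and its methods are hypothetical names, not Hive code, and the cap at 2^30 is an assumption made here to stay within a positive int.

```java
/** Hypothetical sketch; names and the 2^30 cap are assumptions, not Hive code. */
final class CapacitySketch {

    /** True when exactly one bit is set, i.e. capacity is 1, 2, 4, 8, ... */
    static boolean isPowerOfTwo(int capacity) {
        return capacity > 0 && (capacity & (capacity - 1)) == 0;
    }

    /** Rounds a requested size up to a power of two, capped at 2^30. */
    static int nextPowerOfTwo(int requested) {
        if (requested <= 1) {
            return 1;
        }
        int highest = Integer.highestOneBit(requested - 1);
        if (highest >= (1 << 30)) {
            return 1 << 30;      // cap: the next doubling would overflow int
        }
        return highest << 1;
    }

    public static void main(String[] args) {
        int requested = 1000;
        int capacity = nextPowerOfTwo(requested);
        // With a power-of-two capacity, (hash & (capacity - 1)) is always a valid slot index.
        System.out.println(requested + " -> " + capacity + ", valid=" + isPowerOfTwo(capacity));
    }
}
```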
[jira] [Created] (HIVE-25396) Improve uncaught Thread Exception handling in Hive Server 2
Matt McCline created HIVE-25396: --- Summary: Improve uncaught Thread Exception handling in Hive Server 2 Key: HIVE-25396 URL: https://issues.apache.org/jira/browse/HIVE-25396 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline Hive's org.apache.hive.service.thrift.ThriftHttpServlet.doPost method does not handle all Exception kinds. This leaves uncaught Exception handling choices to the Jetty HTTP library. We fix that. Also, a Thread.UncaughtExceptionHandler is added to Hive Server 2 so that uncaught Exceptions are handled uniformly, including logging them rather than just printing them to stderr.
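As a rough illustration of the Thread.UncaughtExceptionHandler idea in HIVE-25396, the sketch below installs a handler that logs instead of letting the JVM print to stderr. It uses java.util.logging only to stay self-contained; the class name, the `LAST` field, and `runAndCapture` are hypothetical helpers for the example and say nothing about how the actual patch is written.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch; not the Hive Server 2 implementation. */
final class UncaughtHandlerSketch {
    private static final Logger LOG = Logger.getLogger("UncaughtHandlerSketch");

    /** Remembers the last uncaught error so the example can be checked. */
    static final AtomicReference<Throwable> LAST = new AtomicReference<>();

    /** Handler that records and logs the error with the thread name. */
    static Thread.UncaughtExceptionHandler loggingHandler() {
        return (thread, error) -> {
            LAST.set(error);
            LOG.log(Level.SEVERE, "Uncaught exception in thread " + thread.getName(), error);
        };
    }

    /** Runs body on a new thread with the handler installed; returns what escaped, or null. */
    static Throwable runAndCapture(Runnable body) {
        LAST.set(null);
        Thread worker = new Thread(body, "sketch-worker");
        worker.setUncaughtExceptionHandler(loggingHandler());
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for worker", e);
        }
        return LAST.get();
    }

    public static void main(String[] args) {
        Throwable escaped = runAndCapture(() -> { throw new IllegalStateException("boom"); });
        System.out.println("captured: " + escaped);
    }
}
```

A process-wide variant would call Thread.setDefaultUncaughtExceptionHandler with the same handler once at startup.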
[jira] [Created] (HIVE-25385) Prevent Hive Server 2 process failures when InterruptedException encountered
Matt McCline created HIVE-25385: --- Summary: Prevent Hive Server 2 process failures when InterruptedException encountered Key: HIVE-25385 URL: https://issues.apache.org/jira/browse/HIVE-25385 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline

To prevent Hive Server 2 process failure, wrap InterruptedException with another Exception like MetaException, HiveSQLException, etc. Otherwise, InterruptedException propagates up to Thread.run and kills the process.

Example of problem stack trace:

java.lang.reflect.UndeclaredThrowableException
    at com.sun.proxy.$Proxy44.heartbeat(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor127.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2990)
    at com.sun.proxy.$Proxy44.heartbeat(Unknown Source)
    at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:622)
    at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.lambda$run$0(DbTxnManager.java:999)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.run(DbTxnManager.java:998)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:259)
    ... 19 more
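The wrapping proposed in HIVE-25385 might look roughly like the sketch below. `ServiceException` is a hypothetical stand-in for MetaException or HiveSQLException, and `sleepBeforeRetry` is an invented helper. Restoring the interrupt flag is a common Java practice added here as an assumption; the report itself only asks for the wrapping.

```java
/** Hypothetical sketch; ServiceException stands in for MetaException / HiveSQLException. */
final class InterruptWrapSketch {

    /** Declared, checked exception that callers already know how to handle. */
    static final class ServiceException extends Exception {
        ServiceException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Sleeps between retries; an interrupt surfaces as ServiceException, not InterruptedException. */
    static void sleepBeforeRetry(long millis) throws ServiceException {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // assumption: keep the interrupt status visible
            throw new ServiceException("Interrupted while waiting to retry", e);
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt();       // simulate an interrupt arriving before the sleep
        try {
            sleepBeforeRetry(10);
        } catch (ServiceException e) {
            System.out.println("wrapped: " + e.getCause());
        }
    }
}
```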
[jira] [Created] (HIVE-25307) Hive Server 2 crashes when Thrift library encounters particular security protocol issue
Matt McCline created HIVE-25307: --- Summary: Hive Server 2 crashes when Thrift library encounters particular security protocol issue Key: HIVE-25307 URL: https://issues.apache.org/jira/browse/HIVE-25307 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline

A RuntimeException is thrown by the Thrift library that causes Hive Server 2 to crash on our customer's machine. If you Google this exception, you will find it has been reported a couple of times over the years but not fixed. A blog (see references below) says it is an occasional security protocol issue between Hive Server 2 and a proxy like a Gateway.

One challenge is that the Thrift TTransportFactory getTransport method declaration throws no Exceptions, hence the likely choice of RuntimeException. But that Exception is fatal to Hive Server 2. The proposed fix is a workaround that catches RuntimeException in Hive Server 2, saves the Exception cause in a dummy TTransport object, and throws the cause when TTransport's open method is called later.

ExceptionClassName: java.lang.RuntimeException
ExceptionStackTrace:
java.lang.RuntimeException: org.apache.thrift.transport.TSaslTransportException: No data or no sasl data in the stream
    at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
    at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:694)
    at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:691)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:360)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
    at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:691)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.thrift.transport.TSaslTransportException: No data or no sasl data in the stream
    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:326)
    at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
    at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
    ... 10 more

References:
[Hive server 2 thrift error - Cloudera Community - 34293|https://community.cloudera.com/t5/Support-Questions/Hive-server-2-thrift-error/td-p/34293]
Eric Lin blog "NO DATA OR NO SASL DATA IN THE STREAM" ERROR IN HIVESERVER2 LOG
[HIVE-12754] AuthTypes.NONE cause exception after HS2 start - ASF JIRA (apache.org)
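The workaround described for HIVE-25307 (catch the RuntimeException, park its cause in a dummy transport, and surface it from open) can be sketched without the Thrift library. Everything below is hypothetical: `Transport` is a minimal stand-in for Thrift's TTransport, IOException stands in for TTransportException, and `getTransportSafely` is an invented name. Unlike the proposal, which rethrows the cause itself, this sketch wraps the cause in a new IOException.

```java
import java.io.IOException;
import java.util.function.Supplier;

/** Hypothetical sketch; Transport is a stand-in for Thrift's TTransport, not the real API. */
final class DeferredFailureSketch {

    /** Minimal transport shape: only open() can report a checked failure. */
    interface Transport {
        void open() throws IOException;
    }

    /** Dummy transport that remembers a failure and reports it when open() is called. */
    static final class FailedTransport implements Transport {
        private final Throwable cause;

        FailedTransport(Throwable cause) {
            this.cause = cause;
        }

        @Override
        public void open() throws IOException {
            throw new IOException("Deferred transport failure", cause);
        }
    }

    /** Calls the factory; a RuntimeException becomes a FailedTransport instead of escaping. */
    static Transport getTransportSafely(Supplier<Transport> factory) {
        try {
            return factory.get();
        } catch (RuntimeException e) {
            Throwable cause = (e.getCause() != null) ? e.getCause() : e;
            return new FailedTransport(cause);
        }
    }

    public static void main(String[] args) {
        Transport t = getTransportSafely(() -> {
            throw new RuntimeException(new IOException("No data or no sasl data in the stream"));
        });
        try {
            t.open();
        } catch (IOException e) {
            System.out.println("failure surfaced from open(): " + e.getCause().getMessage());
        }
    }
}
```

The point of the shape is that the failure now travels through a method whose signature allows a checked exception, so the server's worker loop can treat it as an ordinary connection error.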
[jira] [Created] (HIVE-25237) Thrift CLI Service Protocol: Enhance HTTP variant
Matt McCline created HIVE-25237: --- Summary: Thrift CLI Service Protocol: Enhance HTTP variant Key: HIVE-25237 URL: https://issues.apache.org/jira/browse/HIVE-25237 Project: Hive Issue Type: Improvement Reporter: Matt McCline Assignee: Matt McCline

I have been thinking about the (Thrift) CLI Service protocol between the client and server. Cloudera's Prashanth Jayachandran (private e-mail) told me that the original BINARY (TCP/IP) transport is designed +_differently_+ than the newer HTTP transport. HTTP is used when we go through a Gateway. The design for HTTP is stateless and different in nature from the direct BINARY TCP/IP connection. This means that today a Hive Server 2 response to an HTTP query request can be lost, and that is part of the design. It is the WARNING we have seen when the Gateway drops its HTTP connection to Hive Server 2. We had been thinking this was a bug, but it is by design.

I think the HTTP design needs a rethink. When I worked for Tandem Computers a long time ago, messages were fault-tolerant. They used a message sequence number. When you send a message to a Tandem server, the server is a process pair. The message gets routed to the current process, called the primary. The primary computes the message work and, before replying, tells the backup process to remember the results in case there is a failure. You can see where this goes: if there is a failure before the client gets the result, the client retries and the backup process can resiliently give back the result the primary sent it. This isn't unique to Tandem; even without a process pair, this is a general resilient protocol.

The HTTP design says message loss is possible in both directions (request and response). I think we should adopt a better scheme, but not necessarily a process pair. The first principle of the rethink is that the +_client_+ needs to generate a new operation num (an integer) that replaces the server-side generated random GUID. And the client generates a new msg num within its new operation. So beeline might say ExecuteStatement operationNum = 57 NEW, operationMsgNum = 1. If the client gets an OS connection kind of error, it retries with those (57, 1) numbers. Hive Server 2 will remember the last response.

When Hive Server 2 gets a message, there are 3 cases:
1) The sessionId GUID is not valid -- for now we reject the request because it is likely Hive Server 2 killed the session, perhaps because it was restarted.
2) The operationNum or operationMsgNum is new. (Assert the msg num increases monotonically.) Perform the request, save the response, and respond.
3) The (operationNum, operationMsgNum) matches the last request. Resiliently respond with the saved result.

I think this message handling is in alignment with the HTTP philosophy of being stateless and allowing any message in between to be lost. And it will shield the client from suffering a whole category of message failures that unnecessarily kill queries. This also means we need not worry about which requests are idempotent; instead, all requests are resilient.

Link to earlier HTTP change: [HIVE-24786: JDBC HttpClient should retry for idempotent and unsent http methods by prasanthj · Pull Request #1983 · apache/hive (github.com)|https://github.com/apache/hive/pull/1983/files]
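The three server-side cases proposed in HIVE-25237 can be sketched as a small in-memory handler. This is only an illustration of the idea: the class, field, and method names are hypothetical, the response is a plain String, and treating operationNum as monotonically increasing (like the msg num) is an assumption made here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch of the proposed (operationNum, operationMsgNum) handling. */
final class ResilientSessionSketch {

    /** Last request numbers and saved response for one session. */
    private static final class LastExchange {
        long operationNum = -1;
        long operationMsgNum = -1;
        String response;
    }

    private final Map<String, LastExchange> sessions = new HashMap<>();

    synchronized void openSession(String sessionId) {
        sessions.put(sessionId, new LastExchange());
    }

    synchronized String handle(String sessionId, long operationNum, long operationMsgNum,
                               Supplier<String> work) {
        LastExchange last = sessions.get(sessionId);
        if (last == null) {
            // Case 1: unknown session, e.g. the server was restarted.
            throw new IllegalArgumentException("Invalid session: " + sessionId);
        }
        if (operationNum == last.operationNum && operationMsgNum == last.operationMsgNum) {
            // Case 3: a retry of the last request; replay the saved response.
            return last.response;
        }
        boolean newer = operationNum > last.operationNum
                || (operationNum == last.operationNum && operationMsgNum > last.operationMsgNum);
        if (!newer) {
            throw new IllegalStateException("Request numbers must increase monotonically");
        }
        // Case 2: a new request; perform it, save the response, then respond.
        String response = work.get();
        last.operationNum = operationNum;
        last.operationMsgNum = operationMsgNum;
        last.response = response;
        return response;
    }

    public static void main(String[] args) {
        ResilientSessionSketch server = new ResilientSessionSketch();
        server.openSession("session-1");
        System.out.println(server.handle("session-1", 57, 1, () -> "rows for query 57"));
        // The client saw a connection error and retries with the same (57, 1) numbers.
        System.out.println(server.handle("session-1", 57, 1, () -> "should not run again"));
    }
}
```

The retry in `main` returns the saved response without running the work a second time, which is what makes the request resilient regardless of whether it is idempotent.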
[jira] [Created] (HIVE-25228) Thrift CLI Service Protocol: Watch for lack of interest by client and kill queries faster
Matt McCline created HIVE-25228: --- Summary: Thrift CLI Service Protocol: Watch for lack of interest by client and kill queries faster Key: HIVE-25228 URL: https://issues.apache.org/jira/browse/HIVE-25228 Project: Hive Issue Type: Improvement Reporter: Matt McCline Assignee: Matt McCline CONSIDER: Have Hive Server 2 monitor operations (queries) for continuing client interest. If a client does not ask for status every 15 seconds, then automatically kill the query and release its txn locks and job resources. Users will experience queries cleaning up much faster (15 to 30 seconds instead of minutes and possibly many minutes) when client communication is lost. Cleaning up those queries prevents other queries, including retries of the original query, from being blocked on EXCLUSIVE txn locks or delayed in scheduling. Today, users can get timeouts when they retry a query that got a connection error, which understandably upsets them.
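A minimal sketch of the client-interest watchdog suggested in HIVE-25228 follows. It only tracks the last status poll per operation and reports which operations have gone quiet; actually cancelling a query and releasing locks is out of scope. The class and method names are hypothetical, and timestamps are passed in explicitly so the behavior is easy to check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; not Hive Server 2 code. */
final class ClientInterestSketch {
    private final long timeoutMillis;
    private final Map<String, Long> lastPollMillis = new ConcurrentHashMap<>();

    ClientInterestSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called whenever the client asks for the status of an operation. */
    void recordPoll(String operationId, long nowMillis) {
        lastPollMillis.put(operationId, nowMillis);
    }

    /** Returns (and forgets) operations whose client has been silent longer than the timeout. */
    List<String> reapAbandoned(long nowMillis) {
        List<String> abandoned = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastPollMillis.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                abandoned.add(entry.getKey());
            }
        }
        lastPollMillis.keySet().removeAll(abandoned);
        return abandoned;
    }

    public static void main(String[] args) {
        ClientInterestSketch watchdog = new ClientInterestSketch(15_000);
        watchdog.recordPoll("query-1", 0);
        watchdog.recordPoll("query-2", 10_000);
        // At t = 16 s only query-1 has been silent for more than 15 s.
        System.out.println(watchdog.reapAbandoned(16_000));
    }
}
```

In a server, something like a scheduled task could call `reapAbandoned` periodically and cancel each returned operation.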
[jira] [Created] (HIVE-25227) Thrift CLI Service Protocol: Eliminate long compile requests than proxies can timeout
Matt McCline created HIVE-25227: --- Summary: Thrift CLI Service Protocol: Eliminate long compile requests than proxies can timeout Key: HIVE-25227 URL: https://issues.apache.org/jira/browse/HIVE-25227 Project: Hive Issue Type: Improvement Reporter: Matt McCline Assignee: Matt McCline CONSIDER: Avoid proxy (GW) timeouts on long Hive query compiles. Use one request to start the operation; then poll for status as we do for execution.
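The start-then-poll idea in HIVE-25227 replaces one long-lived compile request with many short status requests, each of which finishes well inside a proxy timeout. A client-side loop might look like the sketch below; `Status`, `pollUntilFinished`, and the supplier standing in for a status request are all hypothetical.

```java
import java.util.function.Supplier;

/** Hypothetical client-side sketch; the supplier stands in for a short status request. */
final class CompilePollSketch {

    enum Status { RUNNING, FINISHED }

    /**
     * Issues short status requests until the operation finishes.
     * Returns the number of polls used, or -1 if maxPolls was exhausted or the thread was interrupted.
     */
    static int pollUntilFinished(Supplier<Status> statusRequest, int maxPolls, long pollIntervalMillis) {
        for (int polls = 1; polls <= maxPolls; polls++) {
            if (statusRequest.get() == Status.FINISHED) {
                return polls;
            }
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = { 0 };
        // Pretend the compile finishes on the third status request.
        int polls = pollUntilFinished(
                () -> (++calls[0] >= 3) ? Status.FINISHED : Status.RUNNING, 10, 5);
        System.out.println("finished after " + polls + " polls");
    }
}
```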
[jira] [Created] (HIVE-25196) Native Vectorization of GenericUDFSplit function
Matt McCline created HIVE-25196: --- Summary: Native Vectorization of GenericUDFSplit function Key: HIVE-25196 URL: https://issues.apache.org/jira/browse/HIVE-25196 Project: Hive Issue Type: Improvement Reporter: Matt McCline Assignee: Matt McCline Provide faster 'split' function for vector-mode.
[jira] [Created] (HIVE-25191) Modernize Hive Thrift CLI Service Protocol
Matt McCline created HIVE-25191: --- Summary: Modernize Hive Thrift CLI Service Protocol Key: HIVE-25191 URL: https://issues.apache.org/jira/browse/HIVE-25191 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline

Unnecessary errors are occurring with the advent of proxy use such as Gateways between the Hive client and Hive Server 2. Query failures can be due to arbitrary proxy timeouts. This proposal avoids the timeouts by changing the protocol to do regular polling. Currently, the Hive client uses one request for the query compile request. Long query compile times make those requests vulnerable to the arbitrary proxy timeouts.

Another issue is that Hive Server 2 sometimes does not notice the client has failed or has lost interest in a potentially long-running query. This causes Hive locks and Big Data query resources to be held unnecessarily. The assumption is that the client issues a cancel query request when it gets an error. This assumption does not always hold. If the proxy returned an error itself, that proxy may reject the subsequent cancel request, too. And, if the client is killed or the network is down, the client cannot complete a cancel request. The proposed solution here is for Hive Server 2 to watch that the client is sending regular polling requests for status. If a client ceases those requests, then Hive Server 2 will cancel the query.

Hive owns the JDBC path (i.e. HiveDriver). The ODBC path may be more challenging because vendors provide ODBC drivers and Hive does not own the ODBC protocol.
[jira] [Created] (HIVE-25140) Hive Distributed Tracing -- Part 1: Disabled
Matt McCline created HIVE-25140: --- Summary: Hive Distributed Tracing -- Part 1: Disabled Key: HIVE-25140 URL: https://issues.apache.org/jira/browse/HIVE-25140 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Infrastructure only, without exporters to Jaeger or OpenTelemetry (OTL), due to Thrift and protobuf version conflicts. Has Spans for BeeLine and Hive Server 2. The code was developed on branch-3.1, and porting Spans to the Hive MetaStore on master is taking more time due to major code refactoring.
[jira] [Created] (HIVE-25069) Hive Distributed Tracing
Matt McCline created HIVE-25069: --- Summary: Hive Distributed Tracing Key: HIVE-25069 URL: https://issues.apache.org/jira/browse/HIVE-25069 Project: Hive Issue Type: New Feature Reporter: Matt McCline Instrument Hive code to gather distributed traces and export trace data to a configurable collector. Distributed tracing is a revolutionary tool for debugging issues. We will use the new OpenTelemetry open-source standard that our industry has aligned on. OpenTelemetry is the merger of two earlier distributed tracing projects, OpenTracing and OpenCensus. Next step: Add a design document that goes into distributed tracing in more detail and describes how Hive will be enhanced.
[jira] [Created] (HIVE-20705) Vectorization: Native Vector MapJoin doesn't support Complex Big Table values
Matt McCline created HIVE-20705: --- Summary: Vectorization: Native Vector MapJoin doesn't support Complex Big Table values Key: HIVE-20705 URL: https://issues.apache.org/jira/browse/HIVE-20705 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline
[jira] [Created] (HIVE-20645) Vectorization: Implicit casting causes scratch vector reuse Wrong Results
Matt McCline created HIVE-20645: --- Summary: Vectorization: Implicit casting causes scratch vector reuse Wrong Results Key: HIVE-20645 URL: https://issues.apache.org/jira/browse/HIVE-20645 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline The bug fix in HIVE-20563 exposes a Wrong Results bug in vectorized_cast.q
[jira] [Created] (HIVE-20524) Schema Evolution checking is broken in 3.0 for CHAR/VARCHAR
Matt McCline created HIVE-20524: --- Summary: Schema Evolution checking is broken in 3.0 for CHAR/VARCHAR Key: HIVE-20524 URL: https://issues.apache.org/jira/browse/HIVE-20524 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline In Hive version 3, the checkColTypeChangeCompatible method of the new org.apache.hadoop.hive.metastore.ColumnType class (under hive-standalone-metadata-server) lost a version 2 series bug fix that drops CHAR/VARCHAR (and DECIMAL, I think) type decorations when checking for Schema Evolution compatibility. Hive1 version 2 did undecoratedTypeName(oldType), and the Hive2 version performed the logic in TypeInfoUtils.implicitConvertible on the PrimitiveCategory, not the raw type string.
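To make the "type decoration" point in HIVE-20524 concrete: comparing raw type strings treats varchar(10) and varchar(20) as unrelated, while comparing the undecorated names does not. The sketch below shows only that stripping step. It is not the Hive undecoratedTypeName implementation, the names are hypothetical, and a real compatibility check involves more than equal base names.

```java
import java.util.Locale;

/** Hypothetical sketch of dropping type decorations; not the Hive implementation. */
final class TypeDecorationSketch {

    /** "varchar(10)" -> "varchar", "DECIMAL(10,2)" -> "decimal", "int" -> "int". */
    static String undecorated(String typeName) {
        String normalized = typeName.trim().toLowerCase(Locale.ROOT);
        int paren = normalized.indexOf('(');
        return (paren < 0) ? normalized : normalized.substring(0, paren).trim();
    }

    /** True when two type strings differ at most in their decorations. */
    static boolean sameBaseType(String oldType, String newType) {
        return undecorated(oldType).equals(undecorated(newType));
    }

    public static void main(String[] args) {
        // The raw strings differ, but the base type is the same.
        System.out.println("varchar(10)".equals("varchar(20)"));
        System.out.println(sameBaseType("varchar(10)", "varchar(20)"));
    }
}
```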
[jira] [Created] (HIVE-20513) Vectorization: Improve Fast Vector MapJoin Bytes Hash Tables
Matt McCline created HIVE-20513: --- Summary: Vectorization: Improve Fast Vector MapJoin Bytes Hash Tables Key: HIVE-20513 URL: https://issues.apache.org/jira/browse/HIVE-20513 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Based on HIVE-20491 discussions, improve Fast Vector MapJoin Bytes Hash Tables by only storing a one word slot entry.
[jira] [Created] (HIVE-20496) Vectorization: Vectorized PTF IllegalStateException
Matt McCline created HIVE-20496: --- Summary: Vectorization: Vectorized PTF IllegalStateException Key: HIVE-20496 URL: https://issues.apache.org/jira/browse/HIVE-20496 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline

Testing rebased HIVE-18909 revealed this stack trace:
{code}
java.lang.IllegalStateException: null
    at com.google.common.base.Preconditions.checkState(Preconditions.java:159) ~[guava-19.0.jar:?]
    at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFEvaluatorStreamingDoubleSum.evaluateGroupBatch(VectorPTFEvaluatorStreamingDoubleSum.java:51) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFGroupBatches.evaluateStreamingGroupBatch(VectorPTFGroupBatches.java:165) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.process(VectorPTFOperator.java:380) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:480) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:387) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
{code}
[jira] [Created] (HIVE-20370) Vectorization: Add Native Vector MapJoin hash table optimization for Left/Right Outer Joins when there are no Small Table values
Matt McCline created HIVE-20370: --- Summary: Vectorization: Add Native Vector MapJoin hash table optimization for Left/Right Outer Joins when there are no Small Table values Key: HIVE-20370 URL: https://issues.apache.org/jira/browse/HIVE-20370 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Similar to Native Vector MapJoin's InnerBigOnly optimization that uses an efficient Hash Multi-Set with a counter instead of a Hash Map with an empty value, do the same for Outer joins.
[jira] [Created] (HIVE-20367) Vectorization: Support streaming for PTF AVG, MAX, MIN, SUM
Matt McCline created HIVE-20367: --- Summary: Vectorization: Support streaming for PTF AVG, MAX, MIN, SUM Key: HIVE-20367 URL: https://issues.apache.org/jira/browse/HIVE-20367 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline

Add support for vectorizing PTF AVG, MAX, MIN, SUM when:
{noformat}
ROWS PRECEDING(MAX)~CURRENT
{noformat}
[jira] [Created] (HIVE-20352) Vectorization: Support grouping function
Matt McCline created HIVE-20352: --- Summary: Vectorization: Support grouping function Key: HIVE-20352 URL: https://issues.apache.org/jira/browse/HIVE-20352 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Support native vectorization for grouping function (part of Grouping Sets) so we don't need to use VectorUDFAdaptor.
[jira] [Created] (HIVE-20339) Vectorization: Lift unneeded restriction causing some PTF with RANK not to be vectorized
Matt McCline created HIVE-20339: --- Summary: Vectorization: Lift unneeded restriction causing some PTF with RANK not to be vectorized Key: HIVE-20339 URL: https://issues.apache.org/jira/browse/HIVE-20339 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Unnecessary: "PTF operator: More than 1 argument expression of aggregation function rank"
[jira] [Created] (HIVE-20328) Reenable: TestMiniDruidCliDriver
Matt McCline created HIVE-20328: --- Summary: Reenable: TestMiniDruidCliDriver Key: HIVE-20328 URL: https://issues.apache.org/jira/browse/HIVE-20328 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: slim bouguerra Reenable tests disabled in HIVE-20322.
[jira] [Created] (HIVE-20325) FlakyTest: TestMiniDruidCliDriver
Matt McCline created HIVE-20325: --- Summary: FlakyTest: TestMiniDruidCliDriver Key: HIVE-20325 URL: https://issues.apache.org/jira/browse/HIVE-20325 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline TestMiniDruidCliDriver is failing intermittently a significant percentage of the time: druid_timestamptz, druidmini_joins, druidmini_masking, druidmini_test1
[jira] [Created] (HIVE-20315) Vectorization: Fix more NULL / Wrong Results issues and avoid unnecessary casts/conversions
Matt McCline created HIVE-20315: --- Summary: Vectorization: Fix more NULL / Wrong Results issues and avoid unnecessary casts/conversions Key: HIVE-20315 URL: https://issues.apache.org/jira/browse/HIVE-20315 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Generate multi-byte Unicode characters in addition to regular single byte characters for random data. Don't CAST from STRING/VARCHAR/CHAR TO STRING since all are stored in vectorization without padding. Fix vectorized BETWEEN expression work to avoid unnecessary CAST of DECIMAL constants. Fix NULL / Wrong Results issues in VectorElt. Change performance Q files to generate non-user EXPLAIN with VECTORIZATION display so unnecessary CAST / DECIMAL_64 conversions are visible.
[jira] [Created] (HIVE-20294) Vectorization: Fix NULL / Wrong Results issues in COALESCE / ELT
Matt McCline created HIVE-20294: --- Summary: Vectorization: Fix NULL / Wrong Results issues in COALESCE / ELT Key: HIVE-20294 URL: https://issues.apache.org/jira/browse/HIVE-20294 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Write new unit tests that use random data and intentional isRepeating batches to check for NULL and Wrong Results for vectorized COALESCE and ELT. Also, add tests for ARRAY and MAP indexing, IS [NOT] NULL, and NOT.
[jira] [Created] (HIVE-20245) Vectorization: Fix NULL / Wrong Results issues in BETWEEN / IN
Matt McCline created HIVE-20245: --- Summary: Vectorization: Fix NULL / Wrong Results issues in BETWEEN / IN Key: HIVE-20245 URL: https://issues.apache.org/jira/browse/HIVE-20245 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Write new unit tests that use random data and intentional isRepeating batches to check for NULL and Wrong Results for vectorized BETWEEN and IN.
[jira] [Created] (HIVE-20207) Vectorization: Fix NULL / Wrong Results issues in Filter / Compare
Matt McCline created HIVE-20207: --- Summary: Vectorization: Fix NULL / Wrong Results issues in Filter / Compare Key: HIVE-20207 URL: https://issues.apache.org/jira/browse/HIVE-20207 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline

Write new unit tests that use random data and intentional isRepeating batches to check for NULL and Wrong Results for vectorized filter and compare.

BUGS:
1) The LongColLessLongColumn SIMD optimization does not work for very large integers. For -7272907770454997143 < 8976171455044006767, the expression
outputVector[i] = (vector1[i] - vector2[i]) >>> 63;
produces 0 instead of 1...
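The wrong result reported in HIVE-20207 comes from signed overflow: the true difference of the two values does not fit in 64 bits, so the wrapped result is positive and its sign bit is 0. The sketch below reproduces the arithmetic with the values from the report and shows two overflow-safe alternatives. The class and method names are hypothetical, and the branch-free variant is a well-known bit identity offered as one option, not necessarily the fix Hive adopted.

```java
/** Hypothetical sketch reproducing the overflow; not the LongColLessLongColumn code. */
final class LongCompareSketch {

    /** Sign-bit trick from the report: wrong when a - b overflows. */
    static long lessThanBySubtraction(long a, long b) {
        return (a - b) >>> 63;
    }

    /** Straightforward comparison: always correct. */
    static long lessThanByComparison(long a, long b) {
        return (a < b) ? 1L : 0L;
    }

    /** Branch-free variant that corrects the sign bit when the subtraction overflows. */
    static long lessThanBranchFree(long a, long b) {
        long diff = a - b;
        return (diff ^ ((a ^ b) & (diff ^ a))) >>> 63;
    }

    public static void main(String[] args) {
        long a = -7272907770454997143L;
        long b = 8976171455044006767L;
        System.out.println("a - b wraps to " + (a - b));
        System.out.println("subtraction trick: " + lessThanBySubtraction(a, b));
        System.out.println("comparison:        " + lessThanByComparison(a, b));
        System.out.println("branch-free:       " + lessThanBranchFree(a, b));
    }
}
```

For these inputs the wrapped difference is 2197664848210547706, which is why the subtraction trick yields 0 while both alternatives yield 1.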
[jira] [Created] (HIVE-20197) Vectorization: Add DECIMAL_64 testing, add Date/Interval/Timestamp arithmetic, and fix more NULL / Wrong Results issues in GROUP BY Aggregation Functions
Matt McCline created HIVE-20197: --- Summary: Vectorization: Add DECIMAL_64 testing, add Date/Interval/Timestamp arithmetic, and fix more NULL / Wrong Results issues in GROUP BY Aggregation Functions Key: HIVE-20197 URL: https://issues.apache.org/jira/browse/HIVE-20197 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Add DECIMAL_64 testing to TestVectorArithmetic and TestVectorAggregation. And, add a few more aggregation tests to TestVectorAggregation. Add + and - Date/Interval/Timestamp arithmetic tests to TestVectorArithmetic.
[jira] [Created] (HIVE-20174) Vectorization: Fix NULL / Wrong Results issues in GROUP BY Aggregation Functions
Matt McCline created HIVE-20174: --- Summary: Vectorization: Fix NULL / Wrong Results issues in GROUP BY Aggregation Functions Key: HIVE-20174 URL: https://issues.apache.org/jira/browse/HIVE-20174 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Write new unit tests that use random data and intentional isRepeating batches to check for NULL and Wrong Results for vectorized aggregation functions:
[jira] [Created] (HIVE-20091) Tez: Add security credentials for FileSinkOperator output
Matt McCline created HIVE-20091: --- Summary: Tez: Add security credentials for FileSinkOperator output Key: HIVE-20091 URL: https://issues.apache.org/jira/browse/HIVE-20091 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline DagUtils needs to add security credentials for the output for the FileSinkOperator.
[jira] [Created] (HIVE-19992) Vectorization: Follow-on to HIVE-19951 --> add call to SchemaEvolution.isOnlyImplicitConversion to disable encoded LLAP I/O for ORC only when data type conversion is not
Matt McCline created HIVE-19992: --- Summary: Vectorization: Follow-on to HIVE-19951 --> add call to SchemaEvolution.isOnlyImplicitConversion to disable encoded LLAP I/O for ORC only when data type conversion is not implicit Key: HIVE-19992 URL: https://issues.apache.org/jira/browse/HIVE-19992 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline When ORC-380, which adds the SchemaEvolution.isOnlyImplicitConversion call, is available in the ORC release used by Apache master (and branch-3), update LlapRecordReader (see comments in HIVE-19951 change).
[jira] [Created] (HIVE-19951) Vectorization: Need to disable encoded LLAP I/O for ORC when there is data type conversion (Schema Evolution)
Matt McCline created HIVE-19951: --- Summary: Vectorization: Need to disable encoded LLAP I/O for ORC when there is data type conversion (Schema Evolution) Key: HIVE-19951 URL: https://issues.apache.org/jira/browse/HIVE-19951 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Currently, reading encoded ORC data does not support data type conversion. So, encoded reading and cache populating needs to be disabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19929) Vectorization: Recheck for vectorization wrong results/execution failures
Matt McCline created HIVE-19929: --- Summary: Vectorization: Recheck for vectorization wrong results/execution failures Key: HIVE-19929 URL: https://issues.apache.org/jira/browse/HIVE-19929 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Use test variables hive.test.vectorized.execution.enabled.override=enable and hive.test.vectorization.suppress.explain.execution.mode=true to look for wrong results/execution failures when vectorization is forced ON and "Execution mode: vectorized" is suppressed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19566) Vectorization: Fix NULL / Wrong Results issues in Complex Type Functions
Matt McCline created HIVE-19566: --- Summary: Vectorization: Fix NULL / Wrong Results issues in Complex Type Functions Key: HIVE-19566 URL: https://issues.apache.org/jira/browse/HIVE-19566 Project: Hive Issue Type: Bug Reporter: Matt McCline Fix For: 3.1.0 Write new unit tests that use random data and intentional isRepeating batches to check for NULL and Wrong Results in vectorized Complex Type functions: * index * (StructField) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19565) Vectorization: Fix NULL / Wrong Results issues in STRING Functions
Matt McCline created HIVE-19565: --- Summary: Vectorization: Fix NULL / Wrong Results issues in STRING Functions Key: HIVE-19565 URL: https://issues.apache.org/jira/browse/HIVE-19565 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Write new unit tests that use random data and intentional isRepeating batches to check for NULL and Wrong Results in vectorized STRING functions: * char_length * concat * initcap * length * lower * ltrim * octet_length * regexp * rtrim * trim * upper * UDF: ** hex ** like ** substr -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19564) Vectorization: Fix NULL / Wrong Results issues in Functions
Matt McCline created HIVE-19564: --- Summary: Vectorization: Fix NULL / Wrong Results issues in Functions Key: HIVE-19564 URL: https://issues.apache.org/jira/browse/HIVE-19564 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Write new unit tests that use random data and intentional isRepeating batches to check for NULL and Wrong Results in vectorized functions: * Generic UDF Functions ** abs ** bround ** ceiling ** floor ** pmod ** power ** round * UDF Functions ** Acos ** Asin ** Atan ** Bin ** Cos ** Degrees ** Exp ** Ln ** Log ** log10 ** log2 ** radians ** rand ** sign ** sin ** sqrt ** tan -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19530) Vectorization: Fix JDBCSerde and re-enable vectorization
Matt McCline created HIVE-19530: --- Summary: Vectorization: Fix JDBCSerde and re-enable vectorization Key: HIVE-19530 URL: https://issues.apache.org/jira/browse/HIVE-19530 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline According to [~jcamachorodriguez] there is a big switch statement in the code that might have missing types. This can lead to the string types seen. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19529) Vectorization: Date/Timestamp NULL issues
Matt McCline created HIVE-19529: --- Summary: Vectorization: Date/Timestamp NULL issues Key: HIVE-19529 URL: https://issues.apache.org/jira/browse/HIVE-19529 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline date_add/date_sub more TBD -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19498) Vectorization: CAST expressions produce wrong results
Matt McCline created HIVE-19498: --- Summary: Vectorization: CAST expressions produce wrong results Key: HIVE-19498 URL: https://issues.apache.org/jira/browse/HIVE-19498 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Fix For: 3.1.0 DATE --> BOOLEAN DOUBLE --> DECIMAL STRING|CHAR|VARCHAR --> DECIMAL TIMESTAMP --> LONG -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19448) Vectorization: sysdb test doesn't work after enabling vectorization by default
Matt McCline created HIVE-19448: --- Summary: Vectorization: sysdb test doesn't work after enabling vectorization by default Key: HIVE-19448 URL: https://issues.apache.org/jira/browse/HIVE-19448 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline {noformat} Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Boolean at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaBooleanObjectInspector.getPrimitiveWritableObject(JavaBooleanObjectInspector.java:36) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:434) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:347) at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:948){noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19384) Vectorization: IfExprTimestampColumnScalarBase doesn't handle the arg1ColVector.noNulls case correctly
Matt McCline created HIVE-19384: --- Summary: Vectorization: IfExprTimestampColumnScalarBase doesn't handle the arg1ColVector.noNulls case correctly Key: HIVE-19384 URL: https://issues.apache.org/jira/browse/HIVE-19384 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline It is missing boilerplate code from HIVE-18622: "Vectorization: IF Statements, Comparisons, and more do not handle NULLs correctly" fix. {noformat} // Carefully handle NULLs... outputColVector.noNulls = false;{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
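The boilerplate quoted above clears noNulls on the output before any rows are written. A minimal sketch of that pattern for an IF(cond, column, scalar) expression follows. LongVec and ifColumnScalar are hypothetical stand-ins written for illustration, not Hive's column vector classes or the actual IfExprTimestampColumnScalarBase code, and treating a NULL condition as false is an assumption of the sketch.

```java
final class IfExprNullSketch {
    /** Hypothetical stand-in for a column vector; not Hive's ColumnVector. */
    static final class LongVec {
        final long[] vector;
        final boolean[] isNull;
        boolean noNulls = true;
        LongVec(int size) { vector = new long[size]; isNull = new boolean[size]; }
    }

    /** Evaluates IF(cond[i] != 0, thenCol[i], elseScalar) for rows 0..n-1. */
    static void ifColumnScalar(LongVec cond, LongVec thenCol, long elseScalar,
                               LongVec out, int n) {
        // Carefully handle NULLs: the output may contain them, so clear the flag
        // up front and set isNull[i] explicitly for every row written.
        out.noNulls = false;
        for (int i = 0; i < n; i++) {
            // Assumption: a NULL condition selects the ELSE branch.
            boolean condTrue = (cond.noNulls || !cond.isNull[i]) && cond.vector[i] != 0;
            if (condTrue) {
                // Even when cond has no NULLs, the THEN column may still have them.
                boolean thenIsNull = !thenCol.noNulls && thenCol.isNull[i];
                out.isNull[i] = thenIsNull;
                out.vector[i] = thenIsNull ? 0L : thenCol.vector[i];
            } else {
                out.isNull[i] = false;
                out.vector[i] = elseScalar;
            }
        }
    }

    public static void main(String[] args) {
        LongVec cond = new LongVec(3);
        cond.vector[0] = 1;
        cond.vector[2] = 1;
        LongVec thenCol = new LongVec(3);
        thenCol.vector[0] = 10;
        thenCol.vector[1] = 20;
        thenCol.noNulls = false;
        thenCol.isNull[2] = true;
        LongVec out = new LongVec(3);
        ifColumnScalar(cond, thenCol, 99L, out, 3);
        for (int i = 0; i < 3; i++) {
            System.out.println(out.isNull[i] ? "NULL" : Long.toString(out.vector[i]));
        }
    }
}
```

The case the sketch is meant to highlight is a condition column with noNulls set to true while the value column still carries NULLs: the output row must then be marked NULL rather than inheriting a stale isNull entry.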
[jira] [Created] (HIVE-19353) Vectorization: ConstantVectorExpression --> RuntimeException: Unexpected column vector type LIST
Matt McCline created HIVE-19353: --- Summary: Vectorization: ConstantVectorExpression --> RuntimeException: Unexpected column vector type LIST Key: HIVE-19353 URL: https://issues.apache.org/jira/browse/HIVE-19353 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Found by enabling vectorization for org.apache.hive.jdbc.TestJdbcDriver2.testResultSetMetaData {noformat} Caused by: java.lang.RuntimeException: Unexpected column vector type LIST at org.apache.hadoop.hive.ql.exec.vector.expressions.ConstantVectorExpression.evaluate(ConstantVectorExpression.java:237) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:955) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:928) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.closeOp(VectorMapOperator.java:984) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:722) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:193) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19352) Vectorization: Disable vectorization for org.apache.hive.jdbc.TestJdbcDriver2.testResultSetMetaData
Matt McCline created HIVE-19352: --- Summary: Vectorization: Disable vectorization for org.apache.hive.jdbc.TestJdbcDriver2.testResultSetMetaData Key: HIVE-19352 URL: https://issues.apache.org/jira/browse/HIVE-19352 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Turning vectorization on triggers a bug - see Jira . -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19351) Vectorization: Followup on why operator numbers are unstable in User EXPLAIN for explainuser_1.q / spark_explainuser_1
Matt McCline created HIVE-19351: --- Summary: Vectorization: Followup on why operator numbers are unstable in User EXPLAIN for explainuser_1.q / spark_explainuser_1 Key: HIVE-19351 URL: https://issues.apache.org/jira/browse/HIVE-19351 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Why were the operator numbers unstable for: TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1] TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] when vectorization was enabled? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19350) Vectorization: Turn off vectorization for explainuser_1.q / spark_explainuser_1
Matt McCline created HIVE-19350: --- Summary: Vectorization: Turn off vectorization for explainuser_1.q / spark_explainuser_1 Key: HIVE-19350 URL: https://issues.apache.org/jira/browse/HIVE-19350 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline This seems to me like the operator number instability issue that Pengcheng Xiong said could occur with vectorization. For now, turning off vectorization for: TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1] TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] Follow up Jira is -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19275) Vectorization: Wrong Results / Execution Failures when Vectorization turned on in Spark
Matt McCline created HIVE-19275: --- Summary: Vectorization: Wrong Results / Execution Failures when Vectorization turned on in Spark Key: HIVE-19275 URL: https://issues.apache.org/jira/browse/HIVE-19275 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Fix For: 3.0.0, 3.1.0 Quite a number of the bucket* tests had Wrong Results or Execution Failures. And others like semijoin, skewjoin, avro_decimal_native, mapjoin_addjar, mapjoin_decimal, nullgroup, decimal_join, mapjoin1. Some of the problems might be as simple as "-- SORT_QUERY_RESULTS" is missing. The bucket* problems looked more serious. This change sets "hive.vectorized.execution.enabled" to false at the top of those Q files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19269) Vectorization: Turn On by Default
Matt McCline created HIVE-19269: --- Summary: Vectorization: Turn On by Default Key: HIVE-19269 URL: https://issues.apache.org/jira/browse/HIVE-19269 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Fix For: 3.0.0, 3.1.0 To reflect that the Hive deployment we most expect will be using vectorization, change the default of hive.vectorized.execution.enabled to true. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19264) Vectorization: Reenable vectorization in vector_adaptor_usage_mode.q
Matt McCline created HIVE-19264: --- Summary: Vectorization: Reenable vectorization in vector_adaptor_usage_mode.q Key: HIVE-19264 URL: https://issues.apache.org/jira/browse/HIVE-19264 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline Fix For: 3.0.0, 3.1.0 [~vihangk1] observed vectorization had accidentally been turned off. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19200) Vectorization: Disable vectorization for LLAP I/O when a non-VECTORIZED_INPUT_FILE_FORMAT mode is needed (i.e. rows) and data type conversion is needed
Matt McCline created HIVE-19200: --- Summary: Vectorization: Disable vectorization for LLAP I/O when a non-VECTORIZED_INPUT_FILE_FORMAT mode is needed (i.e. rows) and data type conversion is needed Key: HIVE-19200 URL: https://issues.apache.org/jira/browse/HIVE-19200 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Disable vectorization for issue in HIVE-18763 until we can do the harder VRB conversion code. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19167) Map data type doesn't keep the order of the key/values pairs as read (Part 2, The Sequel or SQL)
Matt McCline created HIVE-19167: --- Summary: Map data type doesn't keep the order of the key/values pairs as read (Part 2, The Sequel or SQL) Key: HIVE-19167 URL: https://issues.apache.org/jira/browse/HIVE-19167 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Fix For: 3.1.0 HIVE-19116: "Vectorization: Vector Map data type doesn't keep the order of the key/values pairs as read" didn't fix all the places where HashMap is used instead of LinkedHashMap. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19118) Vectorization: Turning on vectorization in escape_crlf produces wrong results
Matt McCline created HIVE-19118: --- Summary: Vectorization: Turning on vectorization in escape_crlf produces wrong results Key: HIVE-19118 URL: https://issues.apache.org/jira/browse/HIVE-19118 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Found in the vectorization enabled-by-default experiment. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19116) Vectorization: Vector Map data type doesn't keep the order of the key/values pairs as read
Matt McCline created HIVE-19116: --- Summary: Vectorization: Vector Map data type doesn't keep the order of the key/values pairs as read Key: HIVE-19116 URL: https://issues.apache.org/jira/browse/HIVE-19116 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline The VectorExtractRow class does not preserve the order of the key/value pairs when going from MapColumnVector to a Map object. This causes Q file differences in tests with the MAP data type making it seem like we are getting Wrong Results (well, actually we are). When LazyMap class (for example) adds key/value pairs to its "map" it uses a LinkedHashSet to preserve insert order. FYI: [~teddy.choi] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
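The ordering problem described above can be illustrated with plain java.util collections. This is a minimal sketch, not the VectorExtractRow code: MapOrderSketch and keysAfterCopy are hypothetical names, and the parallel key/value arrays merely stand in for the entries read from a map column.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MapOrderSketch {
    /** Copies parallel key/value arrays into the given map; returns keys in iteration order. */
    static List<String> keysAfterCopy(String[] keys, int[] values, Map<String, Integer> target) {
        for (int i = 0; i < keys.length; i++) {
            target.put(keys[i], values[i]);
        }
        return new ArrayList<>(target.keySet());
    }

    public static void main(String[] args) {
        String[] keys = {"zebra", "apple", "mango"};
        int[] values = {1, 2, 3};
        // LinkedHashMap iterates in insertion order, so the pairs come back as read.
        System.out.println(keysAfterCopy(keys, values, new LinkedHashMap<>()));
        // HashMap makes no ordering guarantee; the keys may come back in another order.
        System.out.println(keysAfterCopy(keys, values, new HashMap<>()));
    }
}
```

Because HashMap iteration order depends on hashing, a test comparing printed map contents against a golden file can differ even when the same pairs are present, which is how the Q file differences described above would show up.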
[jira] [Created] (HIVE-19110) Vectorization: Enabling vectorization causes TestContribCliDriver udf_example_arraymapstruct.q to produce Wrong Results
Matt McCline created HIVE-19110: --- Summary: Vectorization: Enabling vectorization causes TestContribCliDriver udf_example_arraymapstruct.q to produce Wrong Results Key: HIVE-19110 URL: https://issues.apache.org/jira/browse/HIVE-19110 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Found in the vectorization enabled-by-default experiment. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19109) Vectorization: Enabling vectorization causes delete_orig_table to produce Wrong Results
Matt McCline created HIVE-19109: --- Summary: Vectorization: Enabling vectorization causes delete_orig_table to produce Wrong Results Key: HIVE-19109 URL: https://issues.apache.org/jira/browse/HIVE-19109 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Found in the vectorization enabled-by-default experiment. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19108) Vectorization and Parquet: Turning on vectorization in parquet_ppd_decimal.q causes Wrong Query Results
Matt McCline created HIVE-19108: --- Summary: Vectorization and Parquet: Turning on vectorization in parquet_ppd_decimal.q causes Wrong Query Results Key: HIVE-19108 URL: https://issues.apache.org/jira/browse/HIVE-19108 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Found in the vectorization enabled-by-default experiment. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19102) Vectorization: Suppress known Q file bugs
Matt McCline created HIVE-19102: --- Summary: Vectorization: Suppress known Q file bugs Key: HIVE-19102 URL: https://issues.apache.org/jira/browse/HIVE-19102 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline There are known, recently reported bugs that occur when vectorization is turned on in Q files. Until those bugs are fixed, add SET statements to the top of the Q files that suppress vectorization. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19088) Vectorization: Turning on vectorization in input_lazyserde.q causes ClassCastException
Matt McCline created HIVE-19088: --- Summary: Vectorization: Turning on vectorization in input_lazyserde.q causes ClassCastException Key: HIVE-19088 URL: https://issues.apache.org/jira/browse/HIVE-19088 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline {noformat} 2018-03-31T21:19:48,252 ERROR [LocalJobRunner Map Task Executor #0] mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:967) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:154) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.DoubleWritable cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.StandardUnionObjectInspector$StandardUnion at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:608) at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:581) at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:581) at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:581) at 
org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:350) at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRow(VectorAssignRow.java:998) at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:956) ... 11 more{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19074) Vectorization: Add llap vectorization_div0.q.out Q output file
Matt McCline created HIVE-19074: --- Summary: Vectorization: Add llap vectorization_div0.q.out Q output file Key: HIVE-19074 URL: https://issues.apache.org/jira/browse/HIVE-19074 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline At some point llap/vectorization_div0.q.out got omitted. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19052) Vectorization: Disable Vector Pass-Thru MapJoin in the presence of old-style MR FilterMaps
Matt McCline created HIVE-19052: --- Summary: Vectorization: Disable Vector Pass-Thru MapJoin in the presence of old-style MR FilterMaps Key: HIVE-19052 URL: https://issues.apache.org/jira/browse/HIVE-19052 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Pass-Thru VectorMapJoinOperator and VectorSMBMapJoinOperator were not designed to handle old-style MR FilterMaps. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19045) Vectorization: Disable vectorization in non-vectorized Parquet Q files
Matt McCline created HIVE-19045: --- Summary: Vectorization: Disable vectorization in non-vectorized Parquet Q files Key: HIVE-19045 URL: https://issues.apache.org/jira/browse/HIVE-19045 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline In preparation for turning vectorization on by default, explicitly turn off vectorization at the top of the Parquet Q files, since there is a separate set of Parquet Vectorization Q files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19043) Vectorization: LazySimpleDeserializeRead fewer fields handling broken for Complex Types
Matt McCline created HIVE-19043: --- Summary: Vectorization: LazySimpleDeserializeRead fewer fields handling broken for Complex Types Key: HIVE-19043 URL: https://issues.apache.org/jira/browse/HIVE-19043 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Issues were revealed by vectorizing create_struct_table.q -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19037) Vectorization: Miscellaneous cleanup
Matt McCline created HIVE-19037: --- Summary: Vectorization: Miscellaneous cleanup Key: HIVE-19037 URL: https://issues.apache.org/jira/browse/HIVE-19037 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline # Extraneous INFO logging in VectorReduceSinkCommonOperator # NPE in EXPLAIN for some SelectColumnIsTrue vector expressions -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19035) Vectorization: Disable exotic field reference form
Matt McCline created HIVE-19035: --- Summary: Vectorization: Disable exotic field reference form Key: HIVE-19035 URL: https://issues.apache.org/jira/browse/HIVE-19035 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline We currently don't support exotic field references like get a struct field from array> returns a type array. Attempt causes ClassCastException in VectorizationContext that kills query planning. The Q file is input_testxpath3.q -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19032) Vectorization: Disable GROUP BY aggregations with DISTINCT
Matt McCline created HIVE-19032: --- Summary: Vectorization: Disable GROUP BY aggregations with DISTINCT Key: HIVE-19032 URL: https://issues.apache.org/jira/browse/HIVE-19032 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Vectorized GROUP BY does not support DISTINCT aggregation functions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19024) Vectorization: Disable complex type constants
Matt McCline created HIVE-19024: --- Summary: Vectorization: Disable complex type constants Key: HIVE-19024 URL: https://issues.apache.org/jira/browse/HIVE-19024 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Currently, complex type constants are not detected and cause execution failures. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19020) Vectorization: When vectorized, orc_null_check.q throws NPE in VectorExpressionWriterFactory
Matt McCline created HIVE-19020: --- Summary: Vectorization: When vectorized, orc_null_check.q throws NPE in VectorExpressionWriterFactory Key: HIVE-19020 URL: https://issues.apache.org/jira/browse/HIVE-19020 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Adding "SET hive.vectorized.execution.enabled=true;" to orc_null_check.q triggers this call stack: {noformat} Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$18.setValue(VectorExpressionWriterFactory.java:1465) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$18.writeValue(VectorExpressionWriterFactory.java:1453) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFArgDesc.getDeferredJavaObject(VectorUDFArgDesc.java:123) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:199) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:151) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:955) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:928) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:813) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:846) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:154) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) ~[hadoop-mapreduce-client-core-3.0.0-beta1.jar:?] at org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19019) Vectorization and Parquet: When vectorized, parquet_schema_evolution.q throws HiveException "Not implemented yet"
Matt McCline created HIVE-19019: --- Summary: Vectorization and Parquet: When vectorized, parquet_schema_evolution.q throws HiveException "Not implemented yet" Key: HIVE-19019 URL: https://issues.apache.org/jira/browse/HIVE-19019 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Adding "SET hive.vectorized.execution.enabled=true;" to parquet_schema_evolution.q triggers this call stack: {noformat} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Not implemented yet at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$19.writeValue(VectorExpressionWriterFactory.java:1496) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFArgDesc.getDeferredJavaObject(VectorUDFArgDesc.java:123) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:199) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:151) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:955) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:928) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.flushDeserializerBatch(VectorMapOperator.java:630) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.setupPartitionContextVars(VectorMapOperator.java:698) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] 
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.cleanUpInputFileChangedOp(VectorMapOperator.java:607) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1210) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:829) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:154) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) ~[hadoop-mapreduce-client-core-3.0.0-beta1.jar:?] {noformat} The complex types in VectorExpressionWriterFactory are not fully implemented. FYI: [~vihangk1] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19016) Vectorization and Parquet: When vectorized, parquet_nested_complex.q produces RuntimeException: Unsupported type used
Matt McCline created HIVE-19016: --- Summary: Vectorization and Parquet: When vectorized, parquet_nested_complex.q produces RuntimeException: Unsupported type used Key: HIVE-19016 URL: https://issues.apache.org/jira/browse/HIVE-19016 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Adding "SET hive.vectorized.execution.enabled=true;" to parquet_nested_complex.q triggers this call stack: {noformat} Caused by: java.lang.RuntimeException: Unsupported type used in list:array> at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkListColumnSupport(VectorizedParquetRecordReader.java:589) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:525) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] {noformat} FYI: [~vihangk1] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19015) Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q gets a ClassCastException
Matt McCline created HIVE-19015: --- Summary: Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q gets a ClassCastException Key: HIVE-19015 URL: https://issues.apache.org/jira/browse/HIVE-19015 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Adding "SET hive.vectorized.execution.enabled=true;" to parquet_map_of_arrays_of_ints.q triggers this call stack: {noformat} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo cannot be cast to org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:67) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedMapColumnReader.readBatch(VectorizedMapColumnReader.java:57) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:410) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] {noformat} FYI: [~vihangk1] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18995) Vectorization: Add option to suppress "Execution mode: vectorized" for testing purposes
Matt McCline created HIVE-18995: --- Summary: Vectorization: Add option to suppress "Execution mode: vectorized" for testing purposes Key: HIVE-18995 URL: https://issues.apache.org/jira/browse/HIVE-18995 Project: Hive Issue Type: Improvement Components: Hive Reporter: Matt McCline Assignee: Matt McCline In order to see Q file differences in large runs it is helpful to eliminate change noise from "Execution mode: vectorized" in EXPLAIN output. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18908) Add support for FULL OUTER JOIN to MapJoin
Matt McCline created HIVE-18908: --- Summary: Add support for FULL OUTER JOIN to MapJoin Key: HIVE-18908 URL: https://issues.apache.org/jira/browse/HIVE-18908 Project: Hive Issue Type: Improvement Components: Hive Reporter: Matt McCline Assignee: Matt McCline Currently, we do not support FULL OUTER JOIN in MapJoin. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18819) Vectorization: Optimize IF statement expression evaluation of THEN/ELSE
Matt McCline created HIVE-18819: --- Summary: Vectorization: Optimize IF statement expression evaluation of THEN/ELSE Key: HIVE-18819 URL: https://issues.apache.org/jira/browse/HIVE-18819 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Currently, all the rows of a batch are evaluated for the THEN and ELSE expressions even though only a value from one of them is needed for any particular row. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
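A minimal sketch of the optimization described above, assuming a simplified model in which a batch is a plain `long[]` and the THEN/ELSE expressions are `LongUnaryOperator` functions. The class and method names are hypothetical and are not Hive APIs; NULL conditions and repeating vectors are ignored. The idea is to split the batch's row indices by the condition and evaluate each branch only on the rows that need it.

```java
import java.util.Arrays;
import java.util.function.LongUnaryOperator;

// Hypothetical illustration only; not Hive's VectorExpression API.
final class IfBranchSketch {
    private IfBranchSketch() {}

    static long[] evaluateIf(boolean[] cond, long[] input,
                             LongUnaryOperator thenExpr, LongUnaryOperator elseExpr) {
        int n = cond.length;
        int[] thenRows = new int[n];
        int[] elseRows = new int[n];
        int thenCount = 0;
        int elseCount = 0;
        // Partition the row indices by the condition result.
        for (int i = 0; i < n; i++) {
            if (cond[i]) {
                thenRows[thenCount++] = i;
            } else {
                elseRows[elseCount++] = i;
            }
        }
        long[] out = new long[n];
        // Evaluate THEN only on the rows whose condition is true.
        for (int j = 0; j < thenCount; j++) {
            int row = thenRows[j];
            out[row] = thenExpr.applyAsLong(input[row]);
        }
        // Evaluate ELSE only on the remaining rows.
        for (int j = 0; j < elseCount; j++) {
            int row = elseRows[j];
            out[row] = elseExpr.applyAsLong(input[row]);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] out = evaluateIf(new boolean[] {true, false, true},
                new long[] {1, 2, 3}, x -> x * 10, x -> -x);
        System.out.println(Arrays.toString(out)); // prints [10, -2, 30]
    }
}
```

With this shape, an expensive THEN or ELSE expression is invoked once per row that selects it rather than once per row of the batch.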
[jira] [Created] (HIVE-18807) Fix broken test caused by HIVE-18493
Matt McCline created HIVE-18807: --- Summary: Fix broken test caused by HIVE-18493 Key: HIVE-18807 URL: https://issues.apache.org/jira/browse/HIVE-18807 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18806) Add @Ignore for broken test caused by HIVE-18493
Matt McCline created HIVE-18806: --- Summary: Add @Ignore for broken test caused by HIVE-18493 Key: HIVE-18806 URL: https://issues.apache.org/jira/browse/HIVE-18806 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18800) Vectorization: VectorCoalesce doesn't handle the all repeated NULLs case
Matt McCline created HIVE-18800: --- Summary: Vectorization: VectorCoalesce doesn't handle the all repeated NULLs case Key: HIVE-18800 URL: https://issues.apache.org/jira/browse/HIVE-18800 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline The fix for HIVE-18622 broke the case when all columns are repeated NULLs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
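To make the "all repeated NULLs" case concrete, here is a rough sketch of a COALESCE over column vectors. It is not the actual patch: `LongCol` is a hypothetical stand-in that only mirrors the `noNulls`, `isNull`, and `isRepeating` fields of Hive's column vectors, and a repeating column is assumed to keep its single value in slot 0.

```java
// Hypothetical illustration only; not the VectorCoalesce implementation.
final class CoalesceSketch {
    private CoalesceSketch() {}

    static final class LongCol {
        final long[] vector;
        final boolean[] isNull;
        boolean noNulls = true;
        boolean isRepeating = false;

        LongCol(int size) {
            vector = new long[size];
            isNull = new boolean[size];
        }
    }

    static void coalesce(LongCol[] inputs, LongCol out, int n) {
        // Special case: every input is a repeated NULL, so the output is too.
        boolean allRepeatedNull = true;
        for (LongCol c : inputs) {
            if (!(c.isRepeating && !c.noNulls && c.isNull[0])) {
                allRepeatedNull = false;
                break;
            }
        }
        if (allRepeatedNull) {
            out.isRepeating = true;
            out.noNulls = false;
            out.isNull[0] = true;
            return;
        }

        out.isRepeating = false;
        boolean sawNull = false;
        for (int i = 0; i < n; i++) {
            out.isNull[i] = true; // stays NULL unless some input has a value
            for (LongCol c : inputs) {
                int idx = c.isRepeating ? 0 : i;
                if (c.noNulls || !c.isNull[idx]) {
                    out.vector[i] = c.vector[idx];
                    out.isNull[i] = false;
                    break;
                }
            }
            sawNull |= out.isNull[i];
        }
        // Only ever lower noNulls; never raise it on a reused output column.
        if (sawNull) {
            out.noNulls = false;
        }
    }
}
```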
[jira] [Created] (HIVE-18758) Vectorization: Fix VectorUDAFVarFinal produces Wrong Results
Matt McCline created HIVE-18758: --- Summary: Vectorization: Fix VectorUDAFVarFinal produces Wrong Results Key: HIVE-18758 URL: https://issues.apache.org/jira/browse/HIVE-18758 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Fix and turn back on vectorization for issue found in https://issues.apache.org/jira/browse/HIVE-18756 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18756) Vectorization: VectorUDAFVarFinal produces Wrong Results
Matt McCline created HIVE-18756: --- Summary: Vectorization: VectorUDAFVarFinal produces Wrong Results Key: HIVE-18756 URL: https://issues.apache.org/jira/browse/HIVE-18756 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Observed on a large query. Disabling vectorization for now. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18744) Vectorization: VectorHashKeyWrapperBatch doesn't check repeated NULLs correctly
Matt McCline created HIVE-18744: --- Summary: Vectorization: VectorHashKeyWrapperBatch doesn't check repeated NULLs correctly Key: HIVE-18744 URL: https://issues.apache.org/jira/browse/HIVE-18744 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Logic for checking selectedInUse isRepeating case for NULL is broken. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18722) Vectorization: Adding SUM(HASH(..)) to full query seems to produce flakey results -- need to investigate
Matt McCline created HIVE-18722: --- Summary: Vectorization: Adding SUM(HASH(..)) to full query seems to produce flakey results -- need to investigate Key: HIVE-18722 URL: https://issues.apache.org/jira/browse/HIVE-18722 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline When SUM(HASH(..)) is added to the HIVE-18622 changes, the query results on the Hive QA cluster differ from the results on a laptop. Need to investigate after HIVE-18622 commits. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18622) Vectorization: IF statement, Comparisons, and more do not handle NULLs correctly
Matt McCline created HIVE-18622: --- Summary: Vectorization: IF statement, Comparisons, and more do not handle NULLs correctly Key: HIVE-18622 URL: https://issues.apache.org/jira/browse/HIVE-18622 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Fix For: 3.0.0 Many vector expression classes are missing guards around setting noNulls among other things. {code:java} // Carefully update noNulls... if (outputColVector.noNulls) { outputColVector.noNulls = inputColVector.noNulls; } {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
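The guard quoted above can be seen in context in the following sketch of a unary expression. `LongCol` is a hypothetical stand-in that mirrors only the `vector`, `isNull`, and `noNulls` fields; the assumption illustrated here is that an output column may be reused and can carry stale state, so each row's `isNull` entry is written explicitly and `noNulls` is only ever lowered, never raised.

```java
// Hypothetical illustration only; not a Hive VectorExpression.
final class NoNullsGuardSketch {
    private NoNullsGuardSketch() {}

    static final class LongCol {
        final long[] vector;
        final boolean[] isNull;
        boolean noNulls = true;

        LongCol(int size) {
            vector = new long[size];
            isNull = new boolean[size];
        }
    }

    static void negate(LongCol inputColVector, LongCol outputColVector, int n) {
        for (int i = 0; i < n; i++) {
            boolean rowIsNull = !inputColVector.noNulls && inputColVector.isNull[i];
            // Always write isNull so stale flags from a previous use are cleared.
            outputColVector.isNull[i] = rowIsNull;
            if (!rowIsNull) {
                outputColVector.vector[i] = -inputColVector.vector[i];
            }
        }
        // Carefully update noNulls: never flip it from false back to true.
        if (outputColVector.noNulls) {
            outputColVector.noNulls = inputColVector.noNulls;
        }
    }
}
```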
[jira] [Created] (HIVE-18600) Vectorization: Top-Level Vector Expression Scratch Column Deallocation
Matt McCline created HIVE-18600: --- Summary: Vectorization: Top-Level Vector Expression Scratch Column Deallocation Key: HIVE-18600 URL: https://issues.apache.org/jira/browse/HIVE-18600 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Fix For: 3.0.0 The operators create various vector expression *arrays* for predicates, SELECT clauses, key expressions, etc. We could mark those as special "top level" vector expressions and then defer deallocation until the top-level expression is complete. This could be a simple solution that avoids trying to fix our current eager deallocation, which tries to reuse scratch columns as soon as possible. It *isn't optimal*, but it *shouldn't be too bad*. This solution is much better than not deallocating at all, especially for queries that SELECT a large number of columns or have a lot of expressions in the operator tree. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18562) Vectorization: CHAR/VARCHAR conversion in VectorDeserializeRow is broken
Matt McCline created HIVE-18562: --- Summary: Vectorization: CHAR/VARCHAR conversion in VectorDeserializeRow is broken Key: HIVE-18562 URL: https://issues.apache.org/jira/browse/HIVE-18562 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Fix For: 3.0.0 Altering a CHAR/VARCHAR column's maxLength to a shorter value does not truncate values when vectorized. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
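As an illustration of the truncation that is expected when maxLength shrinks, the sketch below computes how many bytes of a UTF-8 value to keep so that at most `maxChars` characters (code points) remain. The helper name and signature are hypothetical, and CHAR padding is not covered.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the VectorDeserializeRow code path.
final class VarcharTruncateSketch {
    private VarcharTruncateSketch() {}

    /** Returns the byte length to keep so the value has at most maxChars characters. */
    static int truncatedLength(byte[] bytes, int start, int length, int maxChars) {
        int end = start + length;
        int chars = 0;
        for (int i = start; i < end; i++) {
            // Bytes of the form 10xxxxxx continue a character; anything else starts one.
            if ((bytes[i] & 0xC0) != 0x80) {
                if (chars == maxChars) {
                    return i - start;
                }
                chars++;
            }
        }
        return length;
    }

    public static void main(String[] args) {
        byte[] value = "h\u00e9llo".getBytes(StandardCharsets.UTF_8); // 6 bytes, 5 characters
        int keep = truncatedLength(value, 0, value.length, 2);
        System.out.println(new String(value, 0, keep, StandardCharsets.UTF_8)); // prints hé
    }
}
```

Counting lead bytes rather than bytes keeps a multi-byte character from being cut in half.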
[jira] [Created] (HIVE-18561) Vectorization: Current vector PTF doesn't work under GroupBy and is designed for reduce-shuffle input
Matt McCline created HIVE-18561: --- Summary: Vectorization: Current vector PTF doesn't work under GroupBy and is designed for reduce-shuffle input Key: HIVE-18561 URL: https://issues.apache.org/jira/browse/HIVE-18561 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Need to add a validation check to Vectorizer so that it doesn't vectorize PTF unless the PTF is under reduce-shuffle (with an optional SELECT in between). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18551) Vectorization: VectorMapOperator tries to write too many vector columns for Hybrid Grace
Matt McCline created HIVE-18551: --- Summary: Vectorization: VectorMapOperator tries to write too many vector columns for Hybrid Grace Key: HIVE-18551 URL: https://issues.apache.org/jira/browse/HIVE-18551 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Fix For: 3.0.0 Code incorrectly uses projectedColumns.length instead of singleRow.length -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18531) Vectorization: Vectorized PTF operator should not set the initial type infos
Matt McCline created HIVE-18531: --- Summary: Vectorization: Vectorized PTF operator should not set the initial type infos Key: HIVE-18531 URL: https://issues.apache.org/jira/browse/HIVE-18531 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline The Vectorized PTF operator is mistakenly setting the initial type infos for its output VectorizationContext. It should not. It is only creating a projection of the initial columns from ReduceSink (i.e. keys, values) plus scratch columns for output columns. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18524) Vectorization: Execution failure related to non-standard embedding of IfExprConditionalFilter inside VectorUDFAdaptor (HIVE-17139)
Matt McCline created HIVE-18524: --- Summary: Vectorization: Execution failure related to non-standard embedding of IfExprConditionalFilter inside VectorUDFAdaptor (HIVE-17139) Key: HIVE-18524 URL: https://issues.apache.org/jira/browse/HIVE-18524 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline {nocode} insert overwrite table insert_10_1 select cast(gpa as float), age, IF(age>40,cast('2011-01-01 01:01:01' as timestamp),NULL), IF(LENGTH(name)>10,cast(name as binary),NULL) from studentnull10k vectorizationSchemaColumns: [0:name:string, 1:age:int, 2:gpa:double] ExprNodeDescs: UDFToFloat(gpa) (type: float), age (type: int), if((age > 40), 2011-01-01 01:01:01.0, null) (type: timestamp), if((length(name) > 10), CAST( name AS BINARY), null) (type: binary) selectExpressions: VectorUDFAdaptor(if((age > 40), 2011-01-01 01:01:01.0, null)) (children: LongColGreaterLongScalar(col 1:int, val 40) -> 4:boolean) -> 5:timestamp, VectorUDFAdaptor(if((length(name) > 10), CAST( name AS BINARY), null)) (children: LongColGreaterLongScalar(col 4:int, val 10)(children: StringLength(col 0:string) -> 4:int) -> 6:boolean, VectorUDFAdaptor(CAST( name AS BINARY)) -> 7:binary) -> 8:binary {nocode} *// Notice there is no vector expression shown for the last IF stmt.* It has been magically embedded inside the VectorUDFAdaptor object... Execution results in this call stack. 
{nocode} Caused by: java.lang.NullPointerException at java.util.Arrays.copyOfRange(Arrays.java:3521) at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$9.writeValue(VectorExpressionWriterFactory.java:1101) at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterBytes.writeValue(VectorExpressionWriterFactory.java:343) at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFArgDesc.getDeferredJavaObject(VectorUDFArgDesc.java:123) at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:211) at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:177) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:145) ... 22 more {nocode} Change is due to: HIVE-17139: Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine. (Jia Ke, reviewed by Ferdinand Xu) Embedding a raw vector expression outside of VectorizationContext is quite non-standard and evidently buggy. [~Ferd] [~Ke Jia] I am inclined to revert this change. Comments? CC: [~ashutoshc] [~hagleitn] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18521) Vectorization: query failing in reducer VectorUDAFAvgDecimalPartial2 java.lang.ClassCastException StructTypeInfo --> DecimalTypeInfo
Matt McCline created HIVE-18521: --- Summary: Vectorization: query failing in reducer VectorUDAFAvgDecimalPartial2 java.lang.ClassCastException StructTypeInfo --> DecimalTypeInfo Key: HIVE-18521 URL: https://issues.apache.org/jira/browse/HIVE-18521 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18517) Vectorization: Fix VectorMapOperator to accept VRBs to support LLAP Caching
Matt McCline created HIVE-18517: --- Summary: Vectorization: Fix VectorMapOperator to accept VRBs to support LLAP Caching Key: HIVE-18517 URL: https://issues.apache.org/jira/browse/HIVE-18517 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline LLAP is able to deserialize and cache data from an input format (e.g. TextInputFormat) and will deliver that cached data to VectorMapOperator as VRBs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18493) Add display escape for CR/LF to Hive CLI and Beeline
Matt McCline created HIVE-18493: --- Summary: Add display escape for CR/LF to Hive CLI and Beeline Key: HIVE-18493 URL: https://issues.apache.org/jira/browse/HIVE-18493 Project: Hive Issue Type: Bug Components: Beeline, Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Add optional display escaping of carriage return and line feed so row output remains one line. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
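A minimal sketch of the kind of display escaping described above, assuming only carriage return and line feed are rewritten to the two-character sequences `\r` and `\n`. The class and method names are hypothetical; the actual option names and escaping rules in Hive CLI and Beeline are not shown here.

```java
// Hypothetical illustration only; not the Hive CLI or Beeline implementation.
final class CrLfEscapeSketch {
    private CrLfEscapeSketch() {}

    static String escapeCrLf(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\r':
                    sb.append("\\r"); // backslash followed by 'r'
                    break;
                case '\n':
                    sb.append("\\n"); // backslash followed by 'n'
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The row stays on one output line even though the value spans two.
        System.out.println(escapeCrLf("first\r\nsecond"));
    }
}
```

Because backslashes already present in the data are left alone in this sketch, the output is meant for display and cannot be reversed unambiguously.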
[jira] [Created] (HIVE-18466) Enhance LazySimpleSerDe (Text) to optionally output data escaped for serialization
Matt McCline created HIVE-18466: --- Summary: Enhance LazySimpleSerDe (Text) to optionally output data escaped for serialization Key: HIVE-18466 URL: https://issues.apache.org/jira/browse/HIVE-18466 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Environment: Allows SELECTing out TEXTFILE columns but retaining STRING data type family escapes so the output is still TEXTFILE. Reporter: Matt McCline Assignee: Matt McCline -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18258) Vectorization: Reduce-Side GROUP BY MERGEPARTIAL with duplicate columns is broken
Matt McCline created HIVE-18258: --- Summary: Vectorization: Reduce-Side GROUP BY MERGEPARTIAL with duplicate columns is broken Key: HIVE-18258 URL: https://issues.apache.org/jira/browse/HIVE-18258 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 3.0.0 See Q file. Duplicate columns in key are not handled correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-18191) Vectorization: When text input format is vectorized, TableScanOperator needs to not try to gather statistics
Matt McCline created HIVE-18191: --- Summary: Vectorization: When text input format is vectorized, TableScanOperator needs to not try to gather statistics Key: HIVE-18191 URL: https://issues.apache.org/jira/browse/HIVE-18191 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Priority: Critical That is, it should not try to use the row-mode gatherStats method. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-18146) Vectorization: VectorMapJoinOperator Decimal64ColumnVector key/value cast bug
Matt McCline created HIVE-18146: --- Summary: Vectorization: VectorMapJoinOperator Decimal64ColumnVector key/value cast bug Key: HIVE-18146 URL: https://issues.apache.org/jira/browse/HIVE-18146 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 3.0.0 Need to automatically convert Decimal64ColumnVector key/value expressions to DecimalColumnVector. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
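For readers unfamiliar with the two representations, the conversion can be pictured as follows, assuming a decimal64 value is an unscaled `long` paired with a fixed scale. `java.math.BigDecimal` is used here purely as a stand-in for a full decimal value; the class and method names are hypothetical, and this is not the conversion expression used by VectorMapJoinOperator.

```java
import java.math.BigDecimal;

// Hypothetical illustration only; BigDecimal stands in for a full decimal value.
final class Decimal64ConvertSketch {
    private Decimal64ConvertSketch() {}

    /** Converts the first n unscaled longs with the given scale into full decimals. */
    static BigDecimal[] toDecimals(long[] unscaled, int scale, int n) {
        BigDecimal[] out = new BigDecimal[n];
        for (int i = 0; i < n; i++) {
            // An unscaled 12345 with scale 2 represents 123.45.
            out[i] = BigDecimal.valueOf(unscaled[i], scale);
        }
        return out;
    }

    public static void main(String[] args) {
        BigDecimal[] out = toDecimals(new long[] {12345L, -5L}, 2, 2);
        System.out.println(out[0] + " " + out[1]); // prints 123.45 -0.05
    }
}
```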
[jira] [Created] (HIVE-18077) Vectorization: Add string conversion case for UDFToDouble
Matt McCline created HIVE-18077: --- Summary: Vectorization: Add string conversion case for UDFToDouble Key: HIVE-18077 URL: https://issues.apache.org/jira/browse/HIVE-18077 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 3.0.0 Add string to float/double vectorization. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17895) Vectorization: Wrong results for schema_evol_text_vec_table.q (LLAP)
Matt McCline created HIVE-17895: --- Summary: Vectorization: Wrong results for schema_evol_text_vec_table.q (LLAP) Key: HIVE-17895 URL: https://issues.apache.org/jira/browse/HIVE-17895 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical NonVec: 103 NULL0.0 NULLoriginal Vec: 103NULLNULLNULLoriginal -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17894) Vectorization: Wrong results for dynpart_sort_opt_vectorization.q (LLAP)
Matt McCline created HIVE-17894: --- Summary: Vectorization: Wrong results for dynpart_sort_opt_vectorization.q (LLAP) Key: HIVE-17894 URL: https://issues.apache.org/jira/browse/HIVE-17894 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Priority: Critical NonVec: 34 Vec: 38 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17893) Vectorization: Wrong results for vector_udf3.q
Matt McCline created HIVE-17893: --- Summary: Vectorization: Wrong results for vector_udf3.q Key: HIVE-17893 URL: https://issues.apache.org/jira/browse/HIVE-17893 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical NonVec: yy2GiGM ll2TvTZ yxN0212hM17E8J8bJj8D7b lkA0212uZ17R8W8oWw8Q7o ywA68u76Jv06axCv451avL4 ljN68h76Wi06nkPi451niY4 yvNv1q liAi1d yv3gnG4a33hD7bIm7oxE5rw li3taT4n33uQ7oVz7bkR5ej yv1js li1wf yujO07KWj lhwB07XJw ytpx1RL8F2I lgck1EY8S2V ytj7g5W lgw7t5J ytgaJW1Gvrkv5wFUJU2y1S lgtnWJ1Tiexi5jSHWH2l1F Vec: yy2GiGM Unvectorized yxN0212hM17E8J8bJj8D7b Unvectorized ywA68u76Jv06axCv451avL4 Unvectorized yvNv1q Unvectorized yv3gnG4a33hD7bIm7oxE5rw Unvectorized yv1js Unvectorized yujO07KWj Unvectorized ytpx1RL8F2I Unvectorized ytj7g5W Unvectorized ytgaJW1Gvrkv5wFUJU2y1S Unvectorized -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17892) Vectorization: Wrong results for vectorized_timestamp_funcs.q
Matt McCline created HIVE-17892: --- Summary: Vectorization: Wrong results for vectorized_timestamp_funcs.q Key: HIVE-17892 URL: https://issues.apache.org/jira/browse/HIVE-17892 Project: Hive Issue Type: Bug Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical NonVec: NULLNULLNULLNULLNULLNULLNULLNULLNULL NULLNULLNULLNULLNULLNULLNULLNULLNULL NULLNULLNULLNULLNULLNULLNULLNULLNULL Vec: NULLNULLNULLNULLNULLNULL8 1 1 NULLNULLNULLNULLNULLNULLNULLNULLNULL -621697655612 11 30 30 48 4 40 39 -- This message was sent by Atlassian JIRA (v6.4.14#64029)