[jira] [Created] (HIVE-25865) ALTER RENAME suppresses commitTransaction failure and reports operation success
Matt McCline created HIVE-25865: --- Summary: ALTER RENAME suppresses commitTransaction failure and reports operation success Key: HIVE-25865 URL: https://issues.apache.org/jira/browse/HIVE-25865 Project: Hive Issue Type: Bug Components: Metastore Reporter: Matt McCline Assignee: Matt McCline If the commit of the transaction fails, HiveAlterHandler.alterTable does not report an error. It suppresses the failure and returns successfully. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HIVE-25493) TBLPROPERTIES upper- vs. lower-case confusion
Matt McCline created HIVE-25493: --- Summary: TBLPROPERTIES upper- vs. lower-case confusion Key: HIVE-25493 URL: https://issues.apache.org/jira/browse/HIVE-25493 Project: Hive Issue Type: Bug Affects Versions: 3.1.2 Reporter: Matt McCline Assignee: Matt McCline User confused by the difference in ALTER TABLE SET TBLPROPERTIES between 'EXTERNAL'='FALSE' (ignored; adds 2 properties, EXTERNAL and FALSE) and 'external'='false' (transaction error). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25478) Temp file left over after ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS
Matt McCline created HIVE-25478: --- Summary: Temp file left over after ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS Key: HIVE-25478 URL: https://issues.apache.org/jira/browse/HIVE-25478 Project: Hive Issue Type: Bug Affects Versions: 3.1.0 Reporter: Matt McCline Assignee: Matt McCline The dot staging file (".hive-staging") is not removed at the end of the ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS operation, as it is for, say, an INSERT that does automatic statistics collection. I expected it would be deleted after the Stats Work stage. Any ideas where in the code to add automatic deletion (hook)? hdfs dfs -ls /hive/warehouse/managed/table_orc Found 2 items drwxr-xr-x - hive supergroup 0 2021-08-24 17:19 /hive/warehouse/managed/table_orc/.hive-staging_hive_2021-08-24_17-19-17_228_4856027533912221506-7 drwxr-xr-x - hive supergroup 0 2021-08-24 07:17 /hive/warehouse/managed/table_orc/delta_001_001_ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25446) VectorMapJoinFastHashTable.validateCapacity AssertionError: Capacity must be a power of two
Matt McCline created HIVE-25446: --- Summary: VectorMapJoinFastHashTable.validateCapacity AssertionError: Capacity must be a power of two Key: HIVE-25446 URL: https://issues.apache.org/jira/browse/HIVE-25446 Project: Hive Issue Type: Bug Environment: Encountered this in a very large query: Caused by: java.lang.AssertionError: Capacity must be a power of two at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTable.validateCapacity(VectorMapJoinFastHashTable.java:60) at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTable.<init>(VectorMapJoinFastHashTable.java:77) at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashTable.<init>(VectorMapJoinFastBytesHashTable.java:132) at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashMap.<init>(VectorMapJoinFastBytesHashMap.java:166) at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastStringHashMap.<init>(VectorMapJoinFastStringHashMap.java:43) at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastTableContainer.createHashTable(VectorMapJoinFastTableContainer.java:137) at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastTableContainer.<init>(VectorMapJoinFastTableContainer.java:86) at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTableLoader.load(VectorMapJoinFastHashTableLoader.java:122) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTableInternal(MapJoinOperator.java:344) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:413) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.lambda$initializeOp$0(MapJoinOperator.java:215) at org.apache.hadoop.hive.ql.exec.tez.ObjectCache.retrieve(ObjectCache.java:96) at org.apache.hadoop.hive.ql.exec.tez.ObjectCache$1.call(ObjectCache.java:113) at java.util.concurrent.FutureTask.run(FutureTask.java:266) Reporter: Matt McCline Assignee: Matt McCline Fix For: 4.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25396) Improve uncaught Thread Exception handling in Hive Server 2
Matt McCline created HIVE-25396: --- Summary: Improve uncaught Thread Exception handling in Hive Server 2 Key: HIVE-25396 URL: https://issues.apache.org/jira/browse/HIVE-25396 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline Hive's org.apache.hive.service.thrift.ThriftHttpServlet.doPost method does not handle all Exception kinds. This leaves the handling of uncaught Exceptions to the Jetty HTTP library. We fix that. Also, a Thread.UncaughtExceptionHandler is added to Hive Server 2 so that uncaught Exceptions are handled uniformly, including being logged and not just printed to stderr. -- This message was sent by Atlassian Jira (v8.3.4#803005)
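For readers unfamiliar with the mechanism, here is a minimal sketch of a logging Thread.UncaughtExceptionHandler. It is not the HIVE-25396 patch: the class name, the logger name "hs2.uncaught", and the use of java.util.logging are assumptions made for illustration only.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical handler; class name and logger name are illustrative, not Hive's. */
final class LoggingUncaughtExceptionHandler implements Thread.UncaughtExceptionHandler {
    private static final Logger LOG = Logger.getLogger("hs2.uncaught");

    // Remembers the most recent failure so callers can inspect it.
    final AtomicReference<Throwable> last = new AtomicReference<>();

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        last.set(e);
        // Goes through the logging framework instead of only printing to stderr.
        LOG.log(Level.SEVERE, "Uncaught exception in thread " + t.getName(), e);
    }
}
```

A process would typically install such a handler once at startup with Thread.setDefaultUncaughtExceptionHandler(new LoggingUncaughtExceptionHandler()), so every thread without its own handler is covered.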
[jira] [Created] (HIVE-25385) Prevent Hive Server 2 process failures when InterruptedException encountered
Matt McCline created HIVE-25385: --- Summary: Prevent Hive Server 2 process failures when InterruptedException encountered Key: HIVE-25385 URL: https://issues.apache.org/jira/browse/HIVE-25385 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline To prevent Hive Server 2 process failure, wrap InterruptedException in another Exception like MetaException, HiveSQLException, etc. Otherwise, InterruptedException rises to Thread.run and kills the process. Example of problem stack trace: java.lang.reflect.UndeclaredThrowableException java.lang.reflect.UndeclaredThrowableException at com.sun.proxy.$Proxy44.heartbeat(Unknown Source) at sun.reflect.GeneratedMethodAccessor127.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2990) at com.sun.proxy.$Proxy44.heartbeat(Unknown Source) at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:622) at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.lambda$run$0(DbTxnManager.java:999) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.run(DbTxnManager.java:998) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.InterruptedException: sleep interrupted at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:259) ... 19 more -- This message was sent by Atlassian Jira (v8.3.4#803005)
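The wrapping described above might look roughly like the sketch below. ServiceException is a hypothetical stand-in for a declared exception such as MetaException or HiveSQLException, and RetryBackoff.pause is an invented helper, not a Hive method. Restoring the thread's interrupt flag is an addition of this sketch (a common Java convention), not something the issue text asks for.

```java
/** Hypothetical stand-in for a declared exception such as MetaException or HiveSQLException. */
class ServiceException extends Exception {
    ServiceException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Hypothetical helper showing the wrap-instead-of-propagate pattern. */
final class RetryBackoff {
    private RetryBackoff() {}

    /** Sleeps between retries; converts an interrupt into a declared exception. */
    static void pause(long millis) throws ServiceException {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Assumption of this sketch: keep the interrupt flag set so callers can still observe it.
            Thread.currentThread().interrupt();
            // The interrupt now travels as the cause of an exception that callers already handle.
            throw new ServiceException("Interrupted while waiting to retry", e);
        }
    }
}
```

Because ServiceException is declared on the method, it can pass through a reflective proxy without being turned into an UndeclaredThrowableException, which is the failure shape in the stack trace above.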
[jira] [Created] (HIVE-25307) Hive Server 2 crashes when Thrift library encounters particular security protocol issue
Matt McCline created HIVE-25307: --- Summary: Hive Server 2 crashes when Thrift library encounters particular security protocol issue Key: HIVE-25307 URL: https://issues.apache.org/jira/browse/HIVE-25307 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline A RuntimeException is thrown by the Thrift library that causes Hive Server 2 to crash on our customer's machine. If you Google this, the exception has been reported a couple of times over the years but never fixed. A blog (see references below) says it is an occasional security protocol issue between Hive Server 2 and a proxy like a Gateway. One challenge is that the Thrift TTransportFactory getTransport method declares no checked Exceptions, hence the likely choice of RuntimeException. But that Exception is fatal to Hive Server 2. The proposed fix is a workaround that catches the RuntimeException in Hive Server 2, saves the Exception's cause in a dummy TTransport object, and throws the cause when TTransport's open method is called later. 
ExceptionClassName: java.lang.RuntimeException ExceptionStackTrace: java.lang.RuntimeException: org.apache.thrift.transport.TSaslTransportException: No data or no sasl data in the stream at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219) at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:694) at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:691) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:360) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710) at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:691) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.thrift.transport.TSaslTransportException: No data or no sasl data in the stream at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:326) at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41) at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216) ... 10 more References: [Hive server 2 thrift error - Cloudera Community - 34293|https://community.cloudera.com/t5/Support-Questions/Hive-server-2-thrift-error/td-p/34293] Eric Lin blog "“NO DATA OR NO SASL DATA IN THE STREAM” ERROR IN HIVESERVER2 LOG" [HIVE-12754] AuthTypes.NONE cause exception after HS2 start - ASF JIRA (apache.org) -- This message was sent by Atlassian Jira (v8.3.4#803005)
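The proposed workaround can be sketched without the Thrift library. In the sketch below, SimpleTransport and SimpleTransportFactory are hypothetical stand-ins for Thrift's TTransport and TTransportFactory, and FailedTransport and DeferringTransportFactory are invented names; none of this is the actual patch or Thrift's API.

```java
/** Hypothetical stand-in for Thrift's TTransport. */
interface SimpleTransport {
    void open() throws Exception;
}

/** Hypothetical stand-in for Thrift's TTransportFactory: getTransport declares no checked exceptions. */
interface SimpleTransportFactory {
    SimpleTransport getTransport(SimpleTransport base);
}

/** Dummy transport that carries a saved failure and rethrows it when opened. */
final class FailedTransport implements SimpleTransport {
    private final Throwable cause;

    FailedTransport(Throwable cause) {
        this.cause = cause;
    }

    @Override
    public void open() throws Exception {
        if (cause instanceof Exception) {
            throw (Exception) cause;
        }
        throw new Exception(cause);
    }
}

/** Wraps another factory and defers its RuntimeException until open() is called. */
final class DeferringTransportFactory implements SimpleTransportFactory {
    private final SimpleTransportFactory delegate;

    DeferringTransportFactory(SimpleTransportFactory delegate) {
        this.delegate = delegate;
    }

    @Override
    public SimpleTransport getTransport(SimpleTransport base) {
        try {
            return delegate.getTransport(base);
        } catch (RuntimeException e) {
            // Save the underlying cause if there is one; otherwise save the RuntimeException itself.
            Throwable cause = (e.getCause() != null) ? e.getCause() : e;
            return new FailedTransport(cause);
        }
    }
}
```

The point of the design is that open() already declares a checked exception, so the failure surfaces on a path the server's worker loop is expected to handle, rather than escaping as an unchecked exception from getTransport.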
[jira] [Created] (HIVE-25237) Thrift CLI Service Protocol: Enhance HTTP variant
Matt McCline created HIVE-25237: --- Summary: Thrift CLI Service Protocol: Enhance HTTP variant Key: HIVE-25237 URL: https://issues.apache.org/jira/browse/HIVE-25237 Project: Hive Issue Type: Improvement Reporter: Matt McCline Assignee: Matt McCline I have been thinking about the (Thrift) CLI Service protocol between the client and server. Cloudera's Prashanth Jayachandran (private e-mail) told me that its original BINARY (TCP/IP) transport is designed +_differently_+ than the newer HTTP transport. HTTP is used when we go through a Gateway. The design for HTTP is stateless and different in nature than the direct BINARY TCP/IP connection. This means that today a Hive Server 2 response to an HTTP query request can be lost, and that is part of the design... It is the WARNING we have seen when the Gateway drops its HTTP connection to Hive Server 2. We had been thinking this was a bug but it is by design. I think the HTTP design needs a rethink. When I worked for Tandem Computers a long time ago, messages were fault-tolerant. They used a message sequence #. When you send a message to a Tandem server, it is a process pair. The message gets routed to the current process, called the primary. The primary computes the message work and tells the backup process to remember the results before replying, in case there is a failure. You can see where this goes -- if there is a failure before the client gets the result, it retries and the backup process can resiliently give back the result the primary sent it. This isn't unique to Tandem -- without a process-pair -- this is a general resilient protocol. The HTTP design says message loss is possible in both directions (request and response). I think we should adopt a better scheme, but not necessarily a process pair. The first principle of the rethink is that the +_client_+ needs to generate a new operation num (an integer) that replaces the server-side generated random GUID. And the client generates a new msg num within its new operation. 
So beeline might say ExecuteStatement operationNum = 57 NEW, operationMsgNum = 1. If the client gets an OS connection kind of error, it retries with those (57, 1) numbers. Hive Server 2 will remember the last response. When Hive Server 2 gets a message, there are 3 cases: 1) The sessionId GUID is not valid -- for now we reject the request because it is likely Hive Server 2 killed the session, perhaps because it was restarted. 2) The operationNum or operationMsgNum is new. (Assert the msg num increases monotonically.) Perform the request and save the response. And respond. 3) The (operationNum, operationMsgNum) matches the last request. Resiliently respond with the saved result. I think this message handling is in alignment with the HTTP philosophy that the protocol is stateless and any message in between can be lost. And it will shield the client from suffering a whole category of message failures that unnecessarily kill queries. This also means we need not worry about which requests are idempotent; instead, all requests are resilient. - Link to earlier HTTP change: [HIVE-24786: JDBC HttpClient should retry for idempotent and unsent http methods by prasanthj · Pull Request #1983 · apache/hive (github.com)|https://github.com/apache/hive/pull/1983/files] -- This message was sent by Atlassian Jira (v8.3.4#803005)
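The three cases above can be sketched as a small server-side handler. This is only an illustration of the idea, not proposed Hive code: ResilientServer, ResilientSession, and their methods are invented names, responses are plain strings, and the sketch additionally assumes that operation numbers increase within a session (the text only states this for message numbers).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical per-session state: remembers only the last request and its response. */
final class ResilientSession {
    private long lastOperationNum = -1;
    private long lastMsgNum = -1;
    private String lastResponse;

    synchronized String handle(long operationNum, long msgNum, Supplier<String> work) {
        // Case 3: a retry of the last request; replay the saved response without redoing the work.
        if (operationNum == lastOperationNum && msgNum == lastMsgNum) {
            return lastResponse;
        }
        // Case 2: the request must be newer than the last one (assumed ordering, see lead-in).
        boolean newer = operationNum > lastOperationNum
                || (operationNum == lastOperationNum && msgNum > lastMsgNum);
        if (!newer) {
            throw new IllegalStateException("Stale request: (" + operationNum + ", " + msgNum + ")");
        }
        String response = work.get();
        lastOperationNum = operationNum;
        lastMsgNum = msgNum;
        lastResponse = response;
        return response;
    }
}

/** Hypothetical front door: looks up the session, then applies the per-session rules. */
final class ResilientServer {
    private final Map<String, ResilientSession> sessions = new ConcurrentHashMap<>();

    void openSession(String sessionId) {
        sessions.put(sessionId, new ResilientSession());
    }

    String handle(String sessionId, long operationNum, long msgNum, Supplier<String> work) {
        ResilientSession session = sessions.get(sessionId);
        if (session == null) {
            // Case 1: unknown session (for example, the server was restarted); reject.
            throw new IllegalArgumentException("Unknown session: " + sessionId);
        }
        return session.handle(operationNum, msgNum, work);
    }
}
```

In this sketch a client that sees a connection error simply resends (57, 1); whether the original request or the original response was lost, it ends up with exactly one execution and one result.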
[jira] [Created] (HIVE-25228) Thrift CLI Service Protocol: Watch for lack of interest by client and kill queries faster
Matt McCline created HIVE-25228: --- Summary: Thrift CLI Service Protocol: Watch for lack of interest by client and kill queries faster Key: HIVE-25228 URL: https://issues.apache.org/jira/browse/HIVE-25228 Project: Hive Issue Type: Improvement Reporter: Matt McCline Assignee: Matt McCline CONSIDER: Have Hive Server 2 monitor operations (queries) for continuing client interest. If a client does not ask for status every 15 seconds, then automatically kill a query and release its txn locks and job resources. Users will experience queries cleaning up much faster (15 to 30 seconds instead of minutes and possibly many minutes) when client communication is lost. Cleaning up those queries prevents other queries from being blocked on EXCLUSIVE txn locks and blocking of scheduling of their queries including retries of the original query. Today, users can get timeouts when they retry a query that got a connection error causing understandably upset users. -- This message was sent by Atlassian Jira (v8.3.4#803005)
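One way to picture the monitoring is a table of last-poll timestamps that a periodic sweep checks against a timeout. The sketch below is an assumption-laden illustration, not Hive code: ClientInterestMonitor and its methods are invented, time is passed in explicitly so the logic is easy to test, and actually cancelling the query and releasing locks is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical tracker of when each operation was last polled by its client. */
final class ClientInterestMonitor {
    private final long timeoutMillis;
    private final Map<String, Long> lastPollMillis = new ConcurrentHashMap<>();

    ClientInterestMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called whenever the client asks for the status of an operation. */
    void recordPoll(String operationId, long nowMillis) {
        lastPollMillis.put(operationId, nowMillis);
    }

    /** Returns operations whose client has been silent longer than the timeout and stops tracking them. */
    List<String> expire(long nowMillis) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastPollMillis.entrySet()) {
            Long seen = entry.getValue();
            // Conditional remove: if a poll arrived in the meantime, the entry changed and is kept.
            if (nowMillis - seen > timeoutMillis && lastPollMillis.remove(entry.getKey(), seen)) {
                expired.add(entry.getKey());
            }
        }
        return expired;
    }
}
```

A scheduled task could call expire(System.currentTimeMillis()) every few seconds and cancel each returned operation; with a 15-second timeout that gives the 15-to-30-second cleanup window mentioned above.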
[jira] [Created] (HIVE-25227) Thrift CLI Service Protocol: Eliminate long compile requests than proxies can timeout
Matt McCline created HIVE-25227: --- Summary: Thrift CLI Service Protocol: Eliminate long compile requests than proxies can timeout Key: HIVE-25227 URL: https://issues.apache.org/jira/browse/HIVE-25227 Project: Hive Issue Type: Improvement Reporter: Matt McCline Assignee: Matt McCline CONSIDER: Avoid proxy (GW) timeouts on long Hive query compiles. Use request to start the operation; then poll for status like we do for execution. -- This message was sent by Atlassian Jira (v8.3.4#803005)
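A client-side sketch of "start, then poll" follows. Everything in it is hypothetical: AsyncCompileService, CompileState, and PollingCompiler are invented for illustration and do not correspond to existing Thrift CLI Service calls. The idea shown is only that each request returns quickly, so no single request lives long enough for a proxy to time it out.

```java
/** Hypothetical compile states reported by the server. */
enum CompileState { RUNNING, FINISHED, FAILED }

/** Hypothetical service: both calls are expected to return quickly. */
interface AsyncCompileService {
    String startCompile(String sql);

    CompileState pollCompile(String handle);
}

/** Hypothetical client helper that replaces one long request with many short ones. */
final class PollingCompiler {
    private PollingCompiler() {}

    static CompileState compile(AsyncCompileService service, String sql,
                                long pollIntervalMillis, int maxPolls) {
        String handle = service.startCompile(sql);
        for (int i = 0; i < maxPolls; i++) {
            CompileState state = service.pollCompile(handle);
            if (state != CompileState.RUNNING) {
                return state;
            }
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while polling compile status", e);
            }
        }
        throw new IllegalStateException("Compile still running after " + maxPolls + " polls");
    }
}
```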
[jira] [Created] (HIVE-25196) Native Vectorization of GenericUDFSplit function
Matt McCline created HIVE-25196: --- Summary: Native Vectorization of GenericUDFSplit function Key: HIVE-25196 URL: https://issues.apache.org/jira/browse/HIVE-25196 Project: Hive Issue Type: Improvement Reporter: Matt McCline Assignee: Matt McCline Provide faster 'split' function for vector-mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25191) Modernize Hive Thrift CLI Service Protocol
Matt McCline created HIVE-25191: --- Summary: Modernize Hive Thrift CLI Service Protocol Key: HIVE-25191 URL: https://issues.apache.org/jira/browse/HIVE-25191 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline Unnecessary errors are occurring with the advent of proxy use such as Gateways between the Hive client and Hive Server 2. Query failures can be due to arbitrary proxy timeouts. This proposal avoids the timeouts by changing the protocol to do regular polling. Currently, the Hive client uses one request for the query compile request. Long query compile times make those requests vulnerable to the arbitrary proxy timeouts. Another issue is Hive Server 2 sometimes does not notice the client has failed or has lost interest in a potentially long running query. This causes Hive locks and Big Data query resources to be held unnecessarily. The assumption is the client issues a cancel query request when it gets an error. This assumption does not always hold. If the proxy returned an error itself, that proxy may reject the subsequent cancel request, too. And, if the client is killed or the network is down, the client cannot complete a cancel request. The proposed solution here is for Hive Server 2 to watch that the client is sending regular polling requests for status. If a client ceases those requests, then Hive Server 2 will cancel the query. Hive owns the JDBC path (i.e. HiveDriver). The ODBC path may be more challenging because vendors provide ODBC drivers and Hive does not own the ODBC protocol. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25140) Hive Distributed Tracing -- Part 1: Disabled
Matt McCline created HIVE-25140: --- Summary: Hive Distributed Tracing -- Part 1: Disabled Key: HIVE-25140 URL: https://issues.apache.org/jira/browse/HIVE-25140 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Infrastructure, except for exporters to Jaeger or OpenTelemetry (OTL), which are left out due to Thrift and protobuf version conflicts. Has Spans for BeeLine and Hive Server 2. The code was developed on branch-3.1, and porting Spans to the Hive MetaStore on master is taking more time due to major code refactoring. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25069) Hive Distributed Tracing
Matt McCline created HIVE-25069: --- Summary: Hive Distributed Tracing Key: HIVE-25069 URL: https://issues.apache.org/jira/browse/HIVE-25069 Project: Hive Issue Type: New Feature Reporter: Matt McCline Instrument Hive code to gather distributed traces and export trace data to a configurable collector. Distributed tracing is a revolutionary tool for debugging issues. We will use the new OpenTelemetry open-source standard that our industry has aligned on. OpenTelemetry is the merger of two earlier distributed tracing projects, OpenTracing and OpenCensus. Next step: Add a design document that goes into distributed tracing in more detail and describes how Hive will be enhanced. -- This message was sent by Atlassian Jira (v8.3.4#803005)
RE: [EXTERNAL] Re: Hive meetup on March 17
Yes, thank you Zoltan! I learned a lot, too. I see lots of potential in more meetings. Not all of my team could attend -- please publish the recording. -Original Message- From: Stamatis Zampetakis Sent: Thursday, March 18, 2021 1:16 AM To: dev Subject: [EXTERNAL] Re: Hive meetup on March 17 Thanks for organising this Zoltan, and many thanks to all the speakers for the nice presentations. I certainly learned some new stuff for the project, looking forward to the next one. Best, Stamatis On Wed, Mar 17, 2021 at 4:05 PM Zoltan Haindrich wrote: > Hey All! > > We have our first online Hive meetup today! > > We will start at 5pm UTC for other timezones see on this site: > > https://www.timeanddate.com/worldclock/meetingdetails.html?year=2021&month=3&day=17&hour=17&min=0&sec=0&p1=50&p2=137&p3=136&p4=70&p5=176 > > If you don't yet have the meeting url - it will be held in a zoom room at: > https://cloudera.zoom.us/j/91452267238 > Most likely there will be a recording of it - which will be shared > afterwards. 
> > I was thinking to use Github discussions to (also) ask questions > during the event - because it could help untangle "question time" from > "answer time"; we may of course choose not to use it - but I've > experimented with it and if we add discussions to the "Q" section we > may even answer it - and people thinking about the same thing may > extend the question by adding further comments...or just vote on the > question... > not sure how well it will work - might be worth a try! > I've set it up on my own fork for now: > https://github.com/kgyrtkirk/hive/discussions > > The meetup url is here: > https://www.meetup.com/Hive-User-Group-Meeting/events/276886707 > > Meet you there! > > cheers, > Zoltan > > On 3/16/21 3:29 PM, Zoltan Haindrich wrote: > > Hey All! 
> > > > Our meetup is also available as a meetup.com event: > > https://www.meetup.com/Hive-User-Group-Meeting/events/276886707/ > > > > In case you want to add it to the calendar or something... :) > > > > cheers, > > Zoltan > > > > > > On 3/11/21 3:00 PM, Zoltan Haindrich wrote: > >> Hey All! > >> > >> I would like to invite you to our (first?) online Hive meetup! It > >> will be held on March 17, 17:00 UTC > >> I'll send out a zoom url before the event starts! > >> > >> The planned topics are accessible here: > >> https://docs.google.com/document/d/12jaWa7e6jvVjUaxoMWNJcjvTjnNoqwdCAMyswY1OiUg/edit?usp=sharing > >> > >> Meet you there! > >> > >> cheers, > >> Zoltan > >> > >> > >> > >> >
RE: [EXTERNAL] Re: Any plan for new hive 3 or 4 release?
Yes to Hive 4 release. Plenty of changes (1,500+). Yes to regular release cadence (e.g. every 3 months). -Original Message- From: Edward Capriolo Sent: Saturday, February 27, 2021 12:16 PM To: Michel Sumbul Cc: dev@hive.apache.org; u...@hive.apache.org Subject: [EXTERNAL] Re: Any plan for new hive 3 or 4 release? The challenge is the vendors. They almost always want to tie a release to some offering of theirs. Healthy software is released all the time. Just ship it. Call a vote and propose a release. I'll +1 it if the tests pass! On Friday, February 26, 2021, Michel Sumbul wrote: > It will be amazing if the community could produce a release every > quarter/6months. :-) > > On Fri, Feb 26, 2021 at 14:30, Edward Capriolo > wrote: > >> Hive was releasable trunk for the longest time. Facebook days. Then >> the big data vendors got more involved. Then it became a pissing >> match about features. This vendor likes tez this vendor doesn't, this >> vendor likes hive on spark this one doesn't. >> >> Then this vendor wants to tell everyone hive stinks use impala. Then >> this vendor acquired that vendor.. >> >> The best thing for hive is to have one branch master and do quarterly >> releases. >> >> >> >> On Friday, February 26, 2021, Peter Vary >> wrote: >> >>> Hi Lee, >>> >>> When I started to work on Hive around 4 years ago, MR was already >>> set as deprecated. So you definitely should scan even older archives. >>> >>> For Iceberg integration, it would be good to have more frequent >>> releases for Hive as well. >>> >>> Thanks, Peter >>> >>> >>> >>> Lee Ming-Ta wrote (on Wed, Feb 24, 2021, >>> 4:34): >>> >>> > Dear all, >>> > >>> > I probably didn't follow that much and would like to ask if anyone >>> > can point me to some resources about the reason to remove MR? >>> > Or what kind of keyword to search on Google? >>> > >>> > Thank you very much! Wish everyone a happy Lunar New Year. 
>>> > >>> > -- >>> > *From:* Mass Dosage >>> > *Sent:* February 23, 2021, 09:49 PM >>> > *To:* dev@hive.apache.org >>> > *Cc:* Michel Sumbul ; u...@hive.apache.org >>> > < u...@hive.apache.org> >>> > *Subject:* Re: Any plan for new hive 3 or 4 release? >>> > >>> > I would love to see a Hive 3.1 release which is capable of being >>> > used >>> on >>> > Java 11 like Hive 2 is. >>> > >>> > What is the main difference going to be between Hive 3 and 4? The >>> removal >>> > of MR? >>> > >>> > On Mon, 22 Feb 2021 at 16:46, Zoltan Haindrich wrote: >>> > >>> > Hey Michel! >>> > >>> > Yes it was a long time ago we had a release; we have quite a few >>> > new features in master. >>> > I think we are scaring people for some time now that we will be >>> dropping >>> > MR support...I think we should do that. >>> > >>> > I would really like to see a new Hive release in the near future >>> > as >>> well - >>> > there is no way for users to even try out new features. >>> > I was planning to add nightly builds to package the latest >>> > master's >>> state >>> > into a deployable artifact - I think a service like that may help >>> > pretest >>> our >>> > next release; I think it >>> > won't take much to do it so I'll probably throw it together in the >>> > next couple days! >>> > >>> > cheers, >>> > Zoltan >>> > >>> > On 2/21/21 2:27 PM, Michel Sumbul wrote: >>> > > Hi Guys, >>> > > >>> > > If I'm not wrong, the last release of Hive 3.x is 18 months old. >>> > > I wanted to ask if you had any roadmap / plan to release a new >>> version of >>> > > Hive 3.x or Hive 4? >>> > > >>> > > Thanks, >>> > > Michel >>> > > >>> > >>> > >>> >> >> >> -- >> Sorry this was sent from mobile. Will do less grammar and spell check >> than usual. >> > -- Sorry this was sent from mobile. Will do less grammar and spell check than usual.
RE: [EXTERNAL] Hive meetup
Definitely interested. -Original Message- From: Zoltan Haindrich Sent: Monday, February 22, 2021 10:17 AM To: dev@hive.apache.org Subject: [EXTERNAL] Hive meetup Hey All! It was quite some time ago when we had a meetup - and in these covid times it would be online-only anyway :) We were mentioning this lately here and there at Cloudera. I think we could have a few talks spanning 2-3 hours or so. Is there any interest in it? I would be happy to talk about how hive-test-kube works and how hive-dev-box is employed during testing. cheers, Zoltan
Re: [DISCUSS] Hive 3.2
A few comments. I am going to move forward with a VOTE on Hive 3.2 next. -Original Message- From: Matt McCline Sent: Monday, October 26, 2020 2:12 PM To: dev@hive.apache.org Subject: RE: [EXTERNAL] Re: [DISCUSS] Hive 3.2 Hi László, Thank you for your response. Since 3.1.3-rc0 was tagged on Jan 13, master has gained 3156 more commits than are in this tag. I mostly wanted to address the huge number of changes in master. We could do a 3.1.3 release with a modest number of changes, and a 3.2 with perhaps many or all of the 3,000+ changes in master. What do you think? Matt -Original Message- From: László Bodor Sent: Monday, October 26, 2020 4:19 AM To: dev@hive.apache.org Subject: [EXTERNAL] Re: [DISCUSS] Hive 3.2 Sorry, posted incorrect link for 3.1.3-rc0, the correct one is: https://github.com/apache/hive/releases/tag/release-3.1.3-rc0 On Mon, 26 Oct 2020 at 12:17, László Bodor wrote: > Hey! > > I'm also interested in PMCs' opinion. I think it should be released > from branch-3, otherwise, it's a 4.0, right? (which is a heavier > discussion, and I don't know what Hive4 will be about.) On 3.x we have > an official 3.1.2 and an abandoned 3.1.3-rc0, which is not yet > released as far as I can see. I guess the next release is supposed to > be 3.1.3 as we haven't changed tez/hadoop/orc dependencies since that, > and I don't think branch-3 was actively maintained. > > Regards, > Laszlo Bodor > > On Thu, 22 Oct 2020 at 21:24, Matt McCline > wrote: > >> Hey, >> Hive master is about 2 years ahead of 3.1 - it seems like time to >> release those changes. 
>> So, let us have a community discussion about creating a Hive 3.2 release. >> I volunteer to be the release manager. I have not done that before, >> so I will need help. >> I will start a VOTE thread soon, but I would like to hear some >> opinions first. >> >> Thank you, >> Matt >> >> (It is unclear if there are enough major features or dependencies on >> projects that necessitate a major version bump) >> >>
RE: [EXTERNAL] Re: [DISCUSS] Hive 3.2
Hi László, Thank you for your response. Since 3.1.3-rc0 was tagged on Jan 13, master has gained 3156 more commits than are in this tag. I mostly wanted to address the huge number of changes in master. We could do a 3.1.3 release with a modest number of changes, and a 3.2 with perhaps many or all of the 3,000+ changes in master. What do you think? Matt -Original Message- From: László Bodor Sent: Monday, October 26, 2020 4:19 AM To: dev@hive.apache.org Subject: [EXTERNAL] Re: [DISCUSS] Hive 3.2 Sorry, posted incorrect link for 3.1.3-rc0, the correct one is: https://github.com/apache/hive/releases/tag/release-3.1.3-rc0 On Mon, 26 Oct 2020 at 12:17, László Bodor wrote: > Hey! > > I'm also interested in PMCs' opinion. I think it should be released > from branch-3, otherwise, it's a 4.0, right? (which is a heavier > discussion, and I don't know what Hive4 will be about.) On 3.x we have > an official 3.1.2 and an abandoned 3.1.3-rc0, which is not yet > released as far as I can see. I guess the next release is supposed to > be 3.1.3 as we haven't changed tez/hadoop/orc dependencies since that, > and I don't think branch-3 was actively maintained. > > Regards, > Laszlo Bodor > > On Thu, 22 Oct 2020 at 21:24, Matt McCline > wrote: > >> Hey, >> Hive master is about 2 years ahead of 3.1 - it seems like time to >> release those changes. >> So, let us have a community discussion about creating a Hive 3.2 release. >> I volunteer to be the release manager. I have not done that before, >> so I will need help. >> I will start a VOTE thread soon, but I would like to hear some >> opinions first. 
>> >> Thank you, >> Matt >> >> (It is unclear if there are enough major features or dependencies on >> projects that necessitate a major version bump) >> >>
[DISCUSS] Hive 3.2
Hey, Hive master is about 2 years ahead of 3.1 - it seems like time to release those changes. So, let us have a community discussion about creating a Hive 3.2 release. I volunteer to be the release manager. I have not done that before, so I will need help. I will start a VOTE thread soon, but I would like to hear some opinions first. Thank you, Matt (It is unclear if there are enough major features or dependencies on projects that necessitate a major version bump)
[jira] [Created] (HIVE-20705) Vectorization: Native Vector MapJoin doesn't support Complex Big Table values
Matt McCline created HIVE-20705: --- Summary: Vectorization: Native Vector MapJoin doesn't support Complex Big Table values Key: HIVE-20705 URL: https://issues.apache.org/jira/browse/HIVE-20705 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20645) Vectorization: Implicit casting causes scratch vector reuse Wrong Results
Matt McCline created HIVE-20645: --- Summary: Vectorization: Implicit casting causes scratch vector reuse Wrong Results Key: HIVE-20645 URL: https://issues.apache.org/jira/browse/HIVE-20645 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline The bug fix in HIVE-20563 exposes a Wrong Results bug in vectorized_cast.q -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20524) Schema Evolution checking is broken in 3.0 for CHAR/VARCHAR
Matt McCline created HIVE-20524: --- Summary: Schema Evolution checking is broken in 3.0 for CHAR/VARCHAR Key: HIVE-20524 URL: https://issues.apache.org/jira/browse/HIVE-20524 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline In Hive version 3, the method checkColTypeChangeCompatible of the new org.apache.hadoop.hive.metastore.ColumnType class (hive-standalone-metadata-server) lost a bug fix from the version 2 series that drops CHAR/VARCHAR (and DECIMAL, I think) type decorations when checking for Schema Evolution compatibility. The Hive1 version 2 code called undecoratedTypeName(oldType), and the Hive2 version performed the logic in TypeInfoUtils.implicitConvertible on the PrimitiveCategory, not on the raw type string. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
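To illustrate what "dropping type decorations" means in the report above, here is a minimal sketch in plain Java. It is not the Hive code: the class name, the body of undecoratedTypeName, and the sameBaseType helper are all hypothetical, and the real compatibility check covers far more than base-name equality.

```java
import java.util.Locale;

// Hypothetical sketch, not the Hive implementation.
final class UndecoratedTypeNameSketch {

    // Strips a trailing "(...)" decoration: "varchar(10)" -> "varchar", "decimal(10,2)" -> "decimal".
    static String undecoratedTypeName(String typeName) {
        int paren = typeName.indexOf('(');
        String base = paren < 0 ? typeName : typeName.substring(0, paren);
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // Assumed simplification: two column types are treated as the same base type
    // when their undecorated names match. A raw string comparison would not see that.
    static boolean sameBaseType(String oldType, String newType) {
        return undecoratedTypeName(oldType).equals(undecoratedTypeName(newType));
    }

    public static void main(String[] args) {
        // Raw strings differ even though only the length decoration changed.
        System.out.println("varchar(10)".equals("varchar(20)"));
        // After stripping decorations the base types agree.
        System.out.println(sameBaseType("varchar(10)", "varchar(20)"));
    }
}
```

Comparing raw type strings makes varchar(10) and varchar(20) look like unrelated types, which is the kind of false incompatibility the lost fix avoided.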
[jira] [Created] (HIVE-20513) Vectorization: Improve Fast Vector MapJoin Bytes Hash Tables
Matt McCline created HIVE-20513: --- Summary: Vectorization: Improve Fast Vector MapJoin Bytes Hash Tables Key: HIVE-20513 URL: https://issues.apache.org/jira/browse/HIVE-20513 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Based on HIVE-20491 discussions, improve Fast Vector MapJoin Bytes Hash Tables by only storing a one word slot entry. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Review Request 68648: HIVE-20510
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68648/#review208396 --- ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java Lines 865 (patched) <https://reviews.apache.org/r/68648/#comment292287> In order for EXPLAIN VECTORIZATION to see the proper information on BucketNumExpression you need to call ve.setInputTypeInfos(inputTypeInfo); ve.setOutputTypeInfo(outputTypeInfo); on the new VectorExpression. Probably in a separate method. - Matt McCline On Sept. 6, 2018, 6:47 a.m., Deepak Jaiswal wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/68648/ > --- > > (Updated Sept. 6, 2018, 6:47 a.m.) > > > Review request for hive, Gopal V and Matt McCline. > > > Bugs: HIVE-20510 > https://issues.apache.org/jira/browse/HIVE-20510 > > > Repository: hive-git > > > Description > --- > > Vectorization : Support loading bucketed tables using sorted dynamic > partition optimizer. > Added a new VectorExpression BucketNumberExpression to evaluate > _bucket_number. > Made the loops as tight as possible. > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java > 57f7c0108e > > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/BucketNumExpression.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java > 5ab59c9c61 > ql/src/test/queries/clientpositive/dynpart_sort_opt_vectorization.q > 435cdaddd0 > > ql/src/test/results/clientpositive/llap/dynpart_sort_opt_vectorization.q.out > 22f0a31eb3 > > > Diff: https://reviews.apache.org/r/68648/diff/1/ > > > Testing > --- > > > Thanks, > > Deepak Jaiswal > >
[jira] [Created] (HIVE-20496) Vectorization: Vectorized PTF IllegalStateException
Matt McCline created HIVE-20496: --- Summary: Vectorization: Vectorized PTF IllegalStateException Key: HIVE-20496 URL: https://issues.apache.org/jira/browse/HIVE-20496 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Testing rebased HIVE-18909 revealed this stack trace:
{code}
java.lang.IllegalStateException: null
    at com.google.common.base.Preconditions.checkState(Preconditions.java:159) ~[guava-19.0.jar:?]
    at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFEvaluatorStreamingDoubleSum.evaluateGroupBatch(VectorPTFEvaluatorStreamingDoubleSum.java:51) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFGroupBatches.evaluateStreamingGroupBatch(VectorPTFGroupBatches.java:165) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.process(VectorPTFOperator.java:380) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:480) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:387) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20370) Vectorization: Add Native Vector MapJoin hash table optimization for Left/Right Outer Joins when there are no Small Table values
Matt McCline created HIVE-20370: --- Summary: Vectorization: Add Native Vector MapJoin hash table optimization for Left/Right Outer Joins when there are no Small Table values Key: HIVE-20370 URL: https://issues.apache.org/jira/browse/HIVE-20370 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Similar to Native Vector MapJoin's InnerBigOnly optimization that uses an efficient Hash Multi-Set with a counter instead of a Hash Map with an empty value, do the same for Outer joins. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
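The idea in the report above (keep a per-key counter instead of a map entry with an empty value) can be sketched as follows. This is only an illustration under stated assumptions: it uses java.util.HashMap rather than Hive's native hash tables, the class and method names are hypothetical, and it models a left outer join whose big table is the preserved side and whose small table contributes no output columns.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a key-count ("multi-set") join probe; not the Hive implementation.
final class KeyCountJoinSketch {

    // Build side: only the multiplicity of each small-table key is kept,
    // because no small-table values are needed in the output.
    static Map<Long, Integer> buildKeyCounts(long[] smallTableKeys) {
        Map<Long, Integer> counts = new HashMap<>();
        for (long key : smallTableKeys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    // Probe side: each big-table row is emitted once per matching small-table row,
    // and exactly once when there is no match (the outer-join case).
    static List<Long> probeLeftOuter(long[] bigTableKeys, Map<Long, Integer> counts) {
        List<Long> output = new ArrayList<>();
        for (long key : bigTableKeys) {
            int matches = counts.getOrDefault(key, 0);
            int copies = Math.max(matches, 1);
            for (int i = 0; i < copies; i++) {
                output.add(key);
            }
        }
        return output;
    }
}
```

With small-table keys {1, 1, 2} and big-table keys {1, 3}, the probe emits key 1 twice (two matches) and key 3 once (no match, preserved by the outer join).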
[jira] [Created] (HIVE-20367) Vectorization: Support streaming for PTF AVG, MAX, MIN, SUM
Matt McCline created HIVE-20367: --- Summary: Vectorization: Support streaming for PTF AVG, MAX, MIN, SUM Key: HIVE-20367 URL: https://issues.apache.org/jira/browse/HIVE-20367 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Add support for vectorizing PTF AVG, MAX, MIN, SUM when: {noformat} ROWS PRECEDING(MAX)~CURRENT {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
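A rough sketch of what streaming evaluation buys for this window shape, assuming that ROWS PRECEDING(MAX)~CURRENT denotes a frame from the start of the partition to the current row. Under that assumption SUM becomes a running total that can be emitted row by row without buffering the partition. The class below is hypothetical and unrelated to the actual VectorPTFEvaluator classes.

```java
// Hypothetical sketch of a streaming (running) SUM over an assumed
// "start of partition to current row" frame; not the Hive implementation.
final class StreamingSumSketch {
    private double sum;
    private boolean sawNonNull;

    // Consumes one input value (null models SQL NULL) and returns the window
    // result for that row: NULL until the first non-NULL value has been seen.
    Double next(Double value) {
        if (value != null) {
            sum += value;
            sawNonNull = true;
        }
        if (!sawNonNull) {
            return null;
        }
        return sum;
    }

    // Must be called at each partition boundary so totals do not leak across partitions.
    void resetPartition() {
        sum = 0.0;
        sawNonNull = false;
    }
}
```

MAX and MIN follow the same pattern with a running extreme in place of the running total, and AVG additionally tracks a count of non-NULL inputs.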
[jira] [Created] (HIVE-20352) Vectorization: Support grouping function
Matt McCline created HIVE-20352: --- Summary: Vectorization: Support grouping function Key: HIVE-20352 URL: https://issues.apache.org/jira/browse/HIVE-20352 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Support native vectorization for grouping function (part of Grouping Sets) so we don't need to use VectorUDFAdaptor. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20339) Vectorization: Lift unneeded restriction causing some PTF with RANK not to be vectorized
Matt McCline created HIVE-20339: --- Summary: Vectorization: Lift unneeded restriction causing some PTF with RANK not to be vectorized Key: HIVE-20339 URL: https://issues.apache.org/jira/browse/HIVE-20339 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Unnecessary: "PTF operator: More than 1 argument expression of aggregation function rank" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20328) Reenable: TestMiniDruidCliDriver
Matt McCline created HIVE-20328: --- Summary: Reenable: TestMiniDruidCliDriver Key: HIVE-20328 URL: https://issues.apache.org/jira/browse/HIVE-20328 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: slim bouguerra Reenable tests disabled in HIVE-20322. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20325) FlakyTest: TestMiniDruidCliDriver
Matt McCline created HIVE-20325: --- Summary: FlakyTest: TestMiniDruidCliDriver Key: HIVE-20325 URL: https://issues.apache.org/jira/browse/HIVE-20325 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline TestMiniDruidCliDriver is failing intermittently a significant percentage of the time.
druid_timestamptz
druidmini_joins
druidmini_masking
druidmini_test1
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20315) Vectorization: Fix more NULL / Wrong Results issues and avoid unnecessary casts/conversions
Matt McCline created HIVE-20315: --- Summary: Vectorization: Fix more NULL / Wrong Results issues and avoid unnecessary casts/conversions Key: HIVE-20315 URL: https://issues.apache.org/jira/browse/HIVE-20315 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Generate multi-byte Unicode characters in addition to regular single-byte characters for random data. Don't CAST from STRING/VARCHAR/CHAR to STRING since all are stored in vectorization without padding. Fix vectorized BETWEEN expression work to avoid unnecessary CAST of DECIMAL constants. Fix NULL / Wrong Results issues in VectorElt. Change performance Q files to generate non-user EXPLAIN with VECTORIZATION display so unnecessary CAST / DECIMAL_64 conversions are visible. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20294) Vectorization: Fix NULL / Wrong Results issues in COALESCE / ELT
Matt McCline created HIVE-20294: --- Summary: Vectorization: Fix NULL / Wrong Results issues in COALESCE / ELT Key: HIVE-20294 URL: https://issues.apache.org/jira/browse/HIVE-20294 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Write new UT tests that use random data and intentional isRepeating batches to check for NULL and Wrong Results for vectorized COALESCE and ELT. Also, add tests for ARRAY and MAP indexing, IS [NOT] NULL and NOT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20245) Vectorization: Fix NULL / Wrong Results issues in BETWEEN / IN
Matt McCline created HIVE-20245: --- Summary: Vectorization: Fix NULL / Wrong Results issues in BETWEEN / IN Key: HIVE-20245 URL: https://issues.apache.org/jira/browse/HIVE-20245 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Write new UT tests that use random data and intentional isRepeating batches to check for NULL and Wrong Results for vectorized BETWEEN and IN. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20207) Vectorization: Fix NULL / Wrong Results issues in Filter / Compare
Matt McCline created HIVE-20207: --- Summary: Vectorization: Fix NULL / Wrong Results issues in Filter / Compare Key: HIVE-20207 URL: https://issues.apache.org/jira/browse/HIVE-20207 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Write new UT tests that use random data and intentional isRepeating batches to check for NULL and Wrong Results for vectorized filter and compare.
BUGS:
1) The LongColLessLongColumn SIMD optimization does not work for very large integers:
-7272907770454997143 < 8976171455044006767
outputVector[i] = (vector1[i] - vector2[i]) >>> 63;
Produces 0 instead of 1...
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
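The wrong result reported above comes from signed overflow: the difference of the two operands does not fit in a 64-bit long, so the sign bit of the subtraction no longer reflects the ordering. The standalone sketch below reproduces it with the values from the report. The class and method names are hypothetical, and the branch-free variant is a well-known bit trick offered as one possible alternative, not necessarily the fix that was applied in Hive.

```java
// Hypothetical sketch reproducing the overflow; not the Hive expression classes.
final class LongLessThanSketch {

    // The expression from the report: correct only when a - b does not overflow.
    static long lessSubtractShift(long a, long b) {
        return (a - b) >>> 63;
    }

    // Plain comparison: always correct.
    static long lessCompare(long a, long b) {
        return a < b ? 1L : 0L;
    }

    // Branch-free and overflow-safe: the sign bit is 1 when a is negative and b is not,
    // or when the signs agree (so a - b cannot overflow) and a - b is negative.
    static long lessBranchFree(long a, long b) {
        return ((a & ~b) | (~(a ^ b) & (a - b))) >>> 63;
    }

    public static void main(String[] args) {
        long a = -7272907770454997143L;
        long b = 8976171455044006767L;
        System.out.println(lessSubtractShift(a, b)); // 0, wrong: a - b wraps to a positive value
        System.out.println(lessCompare(a, b));       // 1
        System.out.println(lessBranchFree(a, b));    // 1
    }
}
```

Any pair of operands with opposite signs and a difference beyond the long range triggers the same failure, so random test data only has to cover the full long range to expose it.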
[jira] [Created] (HIVE-20197) Vectorization: Add DECIMAL_64 testing, add Date/Interval/Timestamp arithmetic, and fix more NULL / Wrong Results issues in GROUP BY Aggregation Functions
Matt McCline created HIVE-20197: --- Summary: Vectorization: Add DECIMAL_64 testing, add Date/Interval/Timestamp arithmetic, and fix more NULL / Wrong Results issues in GROUP BY Aggregation Functions Key: HIVE-20197 URL: https://issues.apache.org/jira/browse/HIVE-20197 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Add DECIMAL_64 testing to TestVectorArithmetic and TestVectorAggregation. And, add a few more aggregation tests to TestVectorAggregation. Add + and - Date/Interval/Timestamp arithmetic tests to TestVectorArithmetic. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20174) Vectorization: Fix NULL / Wrong Results issues in GROUP BY Aggregation Functions
Matt McCline created HIVE-20174: --- Summary: Vectorization: Fix NULL / Wrong Results issues in GROUP BY Aggregation Functions Key: HIVE-20174 URL: https://issues.apache.org/jira/browse/HIVE-20174 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Write new UT tests that use random data and intentional isRepeating batches to check for NULL and Wrong Results for vectorized aggregation functions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20091) Tez: Add security credentials for FileSinkOperator output
Matt McCline created HIVE-20091: --- Summary: Tez: Add security credentials for FileSinkOperator output Key: HIVE-20091 URL: https://issues.apache.org/jira/browse/HIVE-20091 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline DagUtils needs to add security credentials for the output for the FileSinkOperator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19992) Vectorization: Follow-on to HIVE-19951 --> add call to SchemaEvolution.isOnlyImplicitConversion to disable encoded LLAP I/O for ORC only when data type conversion is not
Matt McCline created HIVE-19992: --- Summary: Vectorization: Follow-on to HIVE-19951 --> add call to SchemaEvolution.isOnlyImplicitConversion to disable encoded LLAP I/O for ORC only when data type conversion is not implicit Key: HIVE-19992 URL: https://issues.apache.org/jira/browse/HIVE-19992 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline When ORC-380 that adds the SchemaEvolution.isOnlyImplicitConversion call is available in the ORC release used by Apache master (and branch-3), then update LlapRecordReader (see comments in HIVE-19951 change). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19951) Vectorization: Need to disable encoded LLAP I/O for ORC when there is data type conversion (Schema Evolution)
Matt McCline created HIVE-19951: --- Summary: Vectorization: Need to disable encoded LLAP I/O for ORC when there is data type conversion (Schema Evolution) Key: HIVE-19951 URL: https://issues.apache.org/jira/browse/HIVE-19951 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Currently, reading encoded ORC data does not support data type conversion. So, encoded reading and cache populating need to be disabled when a conversion is required. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19929) Vectorization: Recheck for vectorization wrong results/execution failures
Matt McCline created HIVE-19929: --- Summary: Vectorization: Recheck for vectorization wrong results/execution failures Key: HIVE-19929 URL: https://issues.apache.org/jira/browse/HIVE-19929 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Use test variables hive.test.vectorized.execution.enabled.override=enable and hive.test.vectorization.suppress.explain.execution.mode=true to look for wrong results/execution failures when vectorization is forced ON and "Execution mode: vectorized" is suppressed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Review Request 67329: HIVE-19629: Enable Decimal64 reader after orc version upgrade
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67329/#review203919 --- ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java Lines 4402 (patched) <https://reviews.apache.org/r/67329/#comment286297> NOTE TO SELF: Look at this again. - Matt McCline On May 25, 2018, 8:25 p.m., Prasanth_J wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/67329/ > --- > > (Updated May 25, 2018, 8:25 p.m.) > > > Review request for hive and Matt McCline. > > > Bugs: HIVE-19629 > https://issues.apache.org/jira/browse/HIVE-19629 > > > Repository: hive-git > > > Description > --- > > HIVE-19629: Enable Decimal64 reader after orc version upgrade > > > Diffs > - > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 931533a > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java > 0af91bd > itests/src/test/resources/testconfiguration.properties d146f92 > > llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapInputFormat.java > 6d29163 > > llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/GenericColumnVectorProducer.java > 7af1b05 > > llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java > feccb87 > > llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java > 4033b37 > > llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/SerDeEncodedDataReader.java > 1cfe929 > > llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/VectorDeserializeOrcWriter.java > de19b1d > > llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/ConsumerFileMetadata.java > bf139c0 > > llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcFileMetadata.java > 0012afb > pom.xml e48974b > ql/pom.xml 06124f7 > ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 2246901 > > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedInputFormatInterface.java > 
e74b185 > > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatchCtx.java > 6588385 > ql/src/java/org/apache/hadoop/hive/ql/io/NullRowsInputFormat.java e632d43 > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java f461364 > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java > 8c7c72e > ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java 7485e60 > ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java 1a6db1f > ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java 5b001a0 > > ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java > d2e1a68 > ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcInputFormat.java > c581bba > ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java 71682af > > ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java > 646b214 > > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java > ed6d577 > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java > 394f826 > ql/src/test/org/apache/hadoop/hive/ql/TestTxnNoBuckets.java af43b14 > ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java > fb2335a > ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java ef678a8 > ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcRawRecordMerger.java > d8a7af8 > ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSerDeStats.java 1533ffa > ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedORCReader.java > 0c9c95d > > ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedOrcAcidRowBatchReader.java > e478371 > ql/src/test/queries/clientpositive/llap_acid2.q a409c26 > ql/src/test/queries/clientpositive/llap_decimal64_reader.q PRE-CREATION > ql/src/test/queries/clientpositive/llap_uncompressed.q 875356c > ql/src/test/results/clientpositive/acid_mapjoin.q.out 76a781e > ql/src/test/results/clientpositive/acid_nullscan.q.out 6dad497 > 
ql/src/test/results/clientpositive/acid_table_stats.q.out 2596922 > ql/src/test/results/clientpositive/annotate_stats_part.q.out 9e45101 > ql/src/test/results/clientpositive/annotate_stats_table.q.out b502957 > ql/src/test/results/clientpositive/autoColumnStats_
[jira] [Created] (HIVE-19566) Vectorization: Fix NULL / Wrong Results issues in Complex Type Functions
Matt McCline created HIVE-19566: --- Summary: Vectorization: Fix NULL / Wrong Results issues in Complex Type Functions Key: HIVE-19566 URL: https://issues.apache.org/jira/browse/HIVE-19566 Project: Hive Issue Type: Bug Reporter: Matt McCline Fix For: 3.1.0 Write new UT tests that use random data and intentional isRepeating batches to check for NULL and Wrong Results for vectorized Complex Type functions:
* index
* (StructField)
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19565) Vectorization: Fix NULL / Wrong Results issues in STRING Functions
Matt McCline created HIVE-19565: --- Summary: Vectorization: Fix NULL / Wrong Results issues in STRING Functions Key: HIVE-19565 URL: https://issues.apache.org/jira/browse/HIVE-19565 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Write new UT tests that use random data and intentional isRepeating batches to check for NULL and Wrong Results for vectorized STRING functions:
* char_length
* concat
* initcap
* length
* lower
* ltrim
* octet_length
* regexp
* rtrim
* trim
* upper
* UDF:
** hex
** like
** substr
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19564) Vectorization: Fix NULL / Wrong Results issues in Functions
Matt McCline created HIVE-19564: --- Summary: Vectorization: Fix NULL / Wrong Results issues in Functions Key: HIVE-19564 URL: https://issues.apache.org/jira/browse/HIVE-19564 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Write new UT tests that use random data and intentional isRepeating batches to check for NULL and Wrong Results for vectorized functions:
* Generic UDF Functions
** abs
** bround
** ceiling
** floor
** pmod
** power
** round
* UDF Functions
** Acos
** Asin
** Atan
** Bin
** Cos
** Degrees
** Exp
** Ln
** Log
** log10
** log2
** radians
** rand
** sign
** sin
** sqrt
** tan
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19530) Vectorization: Fix JDBCSerde and re-enable vectorization
Matt McCline created HIVE-19530: --- Summary: Vectorization: Fix JDBCSerde and re-enable vectorization Key: HIVE-19530 URL: https://issues.apache.org/jira/browse/HIVE-19530 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline According to [~jcamachorodriguez] there is a big switch statement in the code that might have missing types. This can lead to the string types seen. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19529) Vectorization: Date/Timestamp NULL issues
Matt McCline created HIVE-19529: --- Summary: Vectorization: Date/Timestamp NULL issues Key: HIVE-19529 URL: https://issues.apache.org/jira/browse/HIVE-19529 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline date_add/date_sub more TBD -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19498) Vectorization: CAST expressions produce wrong results
Matt McCline created HIVE-19498: --- Summary: Vectorization: CAST expressions produce wrong results Key: HIVE-19498 URL: https://issues.apache.org/jira/browse/HIVE-19498 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Fix For: 3.1.0
DATE --> BOOLEAN
DOUBLE --> DECIMAL
STRING|CHAR|VARCHAR --> DECIMAL
TIMESTAMP --> LONG
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19448) Vectorization: sysdb test doesn't work after enabling vectorization by default
Matt McCline created HIVE-19448: --- Summary: Vectorization: sysdb test doesn't work after enabling vectorization by default Key: HIVE-19448 URL: https://issues.apache.org/jira/browse/HIVE-19448 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline
{noformat}
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Boolean
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaBooleanObjectInspector.getPrimitiveWritableObject(JavaBooleanObjectInspector.java:36)
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:434)
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:347)
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:948)
{noformat}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19384) Vectorization: IfExprTimestampColumnScalarBase doesn't handle the arg1ColVector.noNulls case correctly
Matt McCline created HIVE-19384: --- Summary: Vectorization: IfExprTimestampColumnScalarBase doesn't handle the arg1ColVector.noNulls case correctly Key: HIVE-19384 URL: https://issues.apache.org/jira/browse/HIVE-19384 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline It is missing boilerplate code from the HIVE-18622 fix ("Vectorization: IF Statements, Comparisons, and more do not handle NULLs correctly").
{noformat}
// Carefully handle NULLs...
outputColVector.noNulls = false;
{noformat}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
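To show why that boilerplate matters, here is a toy model of the noNulls / isNull bookkeeping. Everything in it is an assumption made for illustration: MiniColumnVector is a hypothetical stand-in for a column vector, the IF(cond, column, scalar) loop is simplified (no isRepeating or selected-row handling), and it is not the code of IfExprTimestampColumnScalarBase.

```java
// Hypothetical toy model of NULL bookkeeping in a vectorized IF expression.
final class NoNullsSketch {

    // Minimal stand-in for a column vector: values plus per-row NULL flags.
    static final class MiniColumnVector {
        final long[] vector;
        final boolean[] isNull;
        boolean noNulls = true; // when true, readers are allowed to ignore isNull[] entirely

        MiniColumnVector(int size) {
            vector = new long[size];
            isNull = new boolean[size];
        }
    }

    // IF(cond[i], in[i], scalar) for the first n rows.
    static void ifColumnScalar(boolean[] cond, MiniColumnVector in, long scalar,
                               MiniColumnVector out, int n) {
        for (int i = 0; i < n; i++) {
            if (cond[i] && !in.noNulls && in.isNull[i]) {
                out.isNull[i] = true;
                // Without this line a reader that trusts noNulls would miss the NULL.
                out.noNulls = false;
            } else {
                // Output vectors may be reused, so stale NULL flags are cleared explicitly.
                out.isNull[i] = false;
                out.vector[i] = cond[i] ? in.vector[i] : scalar;
            }
        }
    }

    // How a downstream reader typically decides whether row i is NULL.
    static boolean isNullAt(MiniColumnVector v, int i) {
        return !v.noNulls && v.isNull[i];
    }
}
```

If the expression sets out.isNull[i] but leaves out.noNulls at true, isNullAt reports the row as non-NULL and the stale value in out.vector[i] is returned, which is the class of wrong result the report describes.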
[jira] [Created] (HIVE-19353) Vectorization: ConstantVectorExpression --> RuntimeException: Unexpected column vector type LIST
Matt McCline created HIVE-19353: --- Summary: Vectorization: ConstantVectorExpression --> RuntimeException: Unexpected column vector type LIST Key: HIVE-19353 URL: https://issues.apache.org/jira/browse/HIVE-19353 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Found by enabling vectorization for org.apache.hive.jdbc.TestJdbcDriver2.testResultSetMetaData
{noformat}
Caused by: java.lang.RuntimeException: Unexpected column vector type LIST
    at org.apache.hadoop.hive.ql.exec.vector.expressions.ConstantVectorExpression.evaluate(ConstantVectorExpression.java:237) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:955) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:928) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.closeOp(VectorMapOperator.java:984) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:722) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:193) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
{noformat}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19352) Vectorization: Disable vectorization for org.apache.hive.jdbc.TestJdbcDriver2.testResultSetMetaData
Matt McCline created HIVE-19352: --- Summary: Vectorization: Disable vectorization for org.apache.hive.jdbc.TestJdbcDriver2.testResultSetMetaData Key: HIVE-19352 URL: https://issues.apache.org/jira/browse/HIVE-19352 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Turning vectorization on triggers a bug - see Jira . -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19351) Vectorization: Followup on why operator numbers are unstable in User EXPLAIN for explainuser_1.q / spark_explainuser_1
Matt McCline created HIVE-19351: --- Summary: Vectorization: Followup on why operator numbers are unstable in User EXPLAIN for explainuser_1.q / spark_explainuser_1 Key: HIVE-19351 URL: https://issues.apache.org/jira/browse/HIVE-19351 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Why were the operator numbers unstable for: TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1] TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] when vectorization was enabled? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19350) Vectorization: Turn off vectorization for explainuser_1.q / spark_explainuser_1
Matt McCline created HIVE-19350: --- Summary: Vectorization: Turn off vectorization for explainuser_1.q / spark_explainuser_1 Key: HIVE-19350 URL: https://issues.apache.org/jira/browse/HIVE-19350 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline This seems to me like the operator number instability issue that Pengcheng Xiong said could occur with vectorization. For now, turning off vectorization for: TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1] TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] Follow up Jira is -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19275) Vectorization: Wrong Results / Execution Failures when Vectorization turned on in Spark
Matt McCline created HIVE-19275: --- Summary: Vectorization: Wrong Results / Execution Failures when Vectorization turned on in Spark Key: HIVE-19275 URL: https://issues.apache.org/jira/browse/HIVE-19275 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Fix For: 3.0.0, 3.1.0 Quite a number of the bucket* tests had Wrong Results or Execution Failures. And others like semijoin, skewjoin, avro_decimal_native, mapjoin_addjar, mapjoin_decimal, nullgroup, decimal_join, mapjoin1. Some of the problems might be as simple as "-- SORT_QUERY_RESULTS" is missing. The bucket* problems looked more serious. This change sets "hive.vectorized.execution.enabled" to false at the top of those Q files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19269) Vectorization: Turn On by Default
Matt McCline created HIVE-19269: --- Summary: Vectorization: Turn On by Default Key: HIVE-19269 URL: https://issues.apache.org/jira/browse/HIVE-19269 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Fix For: 3.0.0, 3.1.0 Reflect that our most expected Hive deployment will be using vectorization and change the default of hive.vectorized.execution.enabled to true. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19264) Vectorization: Reenable vectorization in vector_adaptor_usage_mode.q
Matt McCline created HIVE-19264: --- Summary: Vectorization: Reenable vectorization in vector_adaptor_usage_mode.q Key: HIVE-19264 URL: https://issues.apache.org/jira/browse/HIVE-19264 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline Fix For: 3.0.0, 3.1.0 [~vihangk1] observed vectorization had accidentally been turned off. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66567/#review201121 --- ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java Lines 338 (patched) <https://reviews.apache.org/r/66567/#comment282106> Logging per row too expensive to leave in. ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java Line 338 (original), 344 (patched) <https://reviews.apache.org/r/66567/#comment282107> Unnecessary line. ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java Lines 453 (patched) <https://reviews.apache.org/r/66567/#comment282108> Please add comments as to the significance of checking the acidOp flag. - Matt McCline On April 12, 2018, 6:24 p.m., Deepak Jaiswal wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/66567/ > --- > > (Updated April 12, 2018, 6:24 p.m.) > > > Review request for hive, Eugene Koifman, Jason Dere, and Matt McCline. > > > Bugs: HIVE-18910 > https://issues.apache.org/jira/browse/HIVE-18910 > > > Repository: hive-git > > > Description > --- > > Hive uses JAVA hash which is not as good as murmur for better distribution > and efficiency in bucketing a table. > Migrate to murmur hash but still keep backward compatibility for existing > users so that they don't have to reload the existing tables. > > To keep backward compatibility, bucket_version is added as a table property, > resulting in high number of result updates. 
> > > Diffs > - > > hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 > hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out > 153613e6d0 > hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e > hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f > > hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java > 924e233293 > > hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolver.java > 5dd0b8ea5b > > hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolverImpl.java > 7c2cadefa7 > > hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinator.java > ad14c7265f > > hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java > 3733e3d02f > > hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java > 03c28a33c8 > > hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java > 996329195c > > hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java > f9ee9d9a03 > > itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out > caa00292b8 > > itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out > ab8ad77074 > > itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out > 2b28a6677e > > itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out > cdb67dd786 > > itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out > 2c23a7e94f > > itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out > a1be085ea5 > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java > 353b890b7c > > 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java > 5966740f88 > itests/src/test/resources/testconfiguration.properties 48d62a8bf9 > ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c > ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e > ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java > d4363fdf91 > > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/keyseries/VectorKeySeriesSerializedImpl.java > 86f466fc4e > > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java > 4077552a56 > > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java > 1bc3fdabac > ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java a51fdd322f > ql/s
[jira] [Created] (HIVE-19200) Vectorization: Disable vectorization for LLAP I/O when a non-VECTORIZED_INPUT_FILE_FORMAT mode is needed (i.e. rows) and data type conversion is needed
Matt McCline created HIVE-19200: --- Summary: Vectorization: Disable vectorization for LLAP I/O when a non-VECTORIZED_INPUT_FILE_FORMAT mode is needed (i.e. rows) and data type conversion is needed Key: HIVE-19200 URL: https://issues.apache.org/jira/browse/HIVE-19200 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Disable vectorization for issue in HIVE-18763 until we can do the harder VRB conversion code. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19167) Map data type doesn't keep the order of the key/values pairs as read (Part 2, The Sequel or SQL)
Matt McCline created HIVE-19167: --- Summary: Map data type doesn't keep the order of the key/values pairs as read (Part 2, The Sequel or SQL) Key: HIVE-19167 URL: https://issues.apache.org/jira/browse/HIVE-19167 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Fix For: 3.1.0 HIVE-19116: "Vectorization: Vector Map data type doesn't keep the order of the key/values pairs as read" didn't fix all the places where HashMap is used instead of LinkedHashMap. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19118) Vectorization: Turning on vectorization in escape_crlf produces wrong results
Matt McCline created HIVE-19118: --- Summary: Vectorization: Turning on vectorization in escape_crlf produces wrong results Key: HIVE-19118 URL: https://issues.apache.org/jira/browse/HIVE-19118 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Found in the vectorization-enabled-by-default experiment. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19116) Vectorization: Vector Map data type doesn't keep the order of the key/values pairs as read
Matt McCline created HIVE-19116: --- Summary: Vectorization: Vector Map data type doesn't keep the order of the key/values pairs as read Key: HIVE-19116 URL: https://issues.apache.org/jira/browse/HIVE-19116 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline The VectorExtractRow class does not preserve the order of the key/value pairs when going from MapColumnVector to a Map object. This causes Q file differences in tests with the MAP data type making it seem like we are getting Wrong Results (well, actually we are). When LazyMap class (for example) adds key/value pairs to its "map" it uses a LinkedHashSet to preserve insert order. FYI: [~teddy.choi] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
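The ordering problem described in HIVE-19116 (and revisited in HIVE-19167, which names HashMap versus LinkedHashMap) can be illustrated outside Hive with plain java.util collections. The class and method names below are hypothetical and are not taken from VectorExtractRow; this is only a sketch of why the choice of map implementation matters when key/value pairs must come back in the order they were read.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; this is not Hive's VectorExtractRow code.
class MapOrderSketch {

  // Copies parallel key/value arrays into the supplied map, then returns
  // the keys in whatever order the map iterates them.
  static List<String> keysInIterationOrder(Map<String, Integer> target,
                                           String[] keys, int[] values) {
    for (int i = 0; i < keys.length; i++) {
      target.put(keys[i], values[i]);
    }
    return new ArrayList<>(target.keySet());
  }

  public static void main(String[] args) {
    String[] keys = {"zebra", "apple", "mango"};
    int[] values = {1, 2, 3};

    // LinkedHashMap iterates in insertion order: [zebra, apple, mango]
    System.out.println(keysInIterationOrder(new LinkedHashMap<>(), keys, values));

    // HashMap makes no ordering guarantee; the order may differ from the input.
    System.out.println(keysInIterationOrder(new HashMap<>(), keys, values));
  }
}
```

Because HashMap iteration order is unspecified, a Q file generated through it can differ from the expected output even when the set of pairs is identical.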
[jira] [Created] (HIVE-19110) Vectorization: Enabling vectorization causes TestContribCliDriver udf_example_arraymapstruct.q to produce Wrong Results
Matt McCline created HIVE-19110: --- Summary: Vectorization: Enabling vectorization causes TestContribCliDriver udf_example_arraymapstruct.q to produce Wrong Results Key: HIVE-19110 URL: https://issues.apache.org/jira/browse/HIVE-19110 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Found in the vectorization-enabled-by-default experiment. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19109) Vectorization: Enabling vectorization causes delete_orig_table to produce Wrong Results
Matt McCline created HIVE-19109: --- Summary: Vectorization: Enabling vectorization causes delete_orig_table to produce Wrong Results Key: HIVE-19109 URL: https://issues.apache.org/jira/browse/HIVE-19109 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Found in the vectorization-enabled-by-default experiment. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19108) Vectorization and Parquet: Turning on vectorization in parquet_ppd_decimal.q causes Wrong Query Results
Matt McCline created HIVE-19108: --- Summary: Vectorization and Parquet: Turning on vectorization in parquet_ppd_decimal.q causes Wrong Query Results Key: HIVE-19108 URL: https://issues.apache.org/jira/browse/HIVE-19108 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Found in the vectorization-enabled-by-default experiment. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19102) Vectorization: Suppress known Q file bugs
Matt McCline created HIVE-19102: --- Summary: Vectorization: Suppress known Q file bugs Key: HIVE-19102 URL: https://issues.apache.org/jira/browse/HIVE-19102 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline There are known bugs, recently found and reported, that occur when vectorization is turned on in Q files. Until those bugs are fixed, add SET statements to the top of the Q files that suppress vectorization. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19088) Vectorization: Turning on vectorization in input_lazyserde.q causes ClassCastException
Matt McCline created HIVE-19088: --- Summary: Vectorization: Turning on vectorization in input_lazyserde.q causes ClassCastException Key: HIVE-19088 URL: https://issues.apache.org/jira/browse/HIVE-19088 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline {noformat} 2018-03-31T21:19:48,252 ERROR [LocalJobRunner Map Task Executor #0] mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:967) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:154) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.DoubleWritable cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.StandardUnionObjectInspector$StandardUnion at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:608) at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:581) at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:581) at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:581) at 
org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:350) at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRow(VectorAssignRow.java:998) at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:956) ... 11 more{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19074) Vectorization: Add llap vectorization_div0.q.out Q output file
Matt McCline created HIVE-19074: --- Summary: Vectorization: Add llap vectorization_div0.q.out Q output file Key: HIVE-19074 URL: https://issues.apache.org/jira/browse/HIVE-19074 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline At some point llap/vectorization_div0.q.out got omitted. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19052) Vectorization: Disable Vector Pass-Thru MapJoin in the presence of old-style MR FilterMaps
Matt McCline created HIVE-19052: --- Summary: Vectorization: Disable Vector Pass-Thru MapJoin in the presence of old-style MR FilterMaps Key: HIVE-19052 URL: https://issues.apache.org/jira/browse/HIVE-19052 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Pass-Thru VectorMapJoinOperator and VectorSMBMapJoinOperator were not designed to handle old-style MR FilterMaps. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19045) Vectorization: Disable vectorization in non-vectorized Parquet Q files
Matt McCline created HIVE-19045: --- Summary: Vectorization: Disable vectorization in non-vectorized Parquet Q files Key: HIVE-19045 URL: https://issues.apache.org/jira/browse/HIVE-19045 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline In preparation for turning vectorization on by default, explicitly turn off vectorization at the top of the Parquet Q files since there is a separate set of Parquet Vectorization Q files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19043) Vectorization: LazySimpleDeserializeRead fewer fields handling broken for Complex Types
Matt McCline created HIVE-19043: --- Summary: Vectorization: LazySimpleDeserializeRead fewer fields handling broken for Complex Types Key: HIVE-19043 URL: https://issues.apache.org/jira/browse/HIVE-19043 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Issues were revealed by vectorizing create_struct_table.q -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19037) Vectorization: Miscellaneous cleanup
Matt McCline created HIVE-19037: --- Summary: Vectorization: Miscellaneous cleanup Key: HIVE-19037 URL: https://issues.apache.org/jira/browse/HIVE-19037 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline # Extraneous INFO logging in VectorReduceSinkCommonOperator # NPE in EXPLAIN for some SelectColumnIsTrue vector expressions -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19035) Vectorization: Disable exotic field reference form
Matt McCline created HIVE-19035: --- Summary: Vectorization: Disable exotic field reference form Key: HIVE-19035 URL: https://issues.apache.org/jira/browse/HIVE-19035 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline We currently don't support exotic field references, such as getting a struct field from array<struct>, which returns an array type. Attempting one causes a ClassCastException in VectorizationContext that kills query planning. The Q file is input_testxpath3.q -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19032) Vectorization: Disable GROUP BY aggregations with DISTINCT
Matt McCline created HIVE-19032: --- Summary: Vectorization: Disable GROUP BY aggregations with DISTINCT Key: HIVE-19032 URL: https://issues.apache.org/jira/browse/HIVE-19032 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Vectorized GROUP BY does not support DISTINCT aggregation functions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19024) Vectorization: Disable complex type constants
Matt McCline created HIVE-19024: --- Summary: Vectorization: Disable complex type constants Key: HIVE-19024 URL: https://issues.apache.org/jira/browse/HIVE-19024 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Currently, complex type constants are not detected and cause execution failures. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19020) Vectorization: When vectorized, orc_null_check.q throws NPE in VectorExpressionWriterFactory
Matt McCline created HIVE-19020: --- Summary: Vectorization: When vectorized, orc_null_check.q throws NPE in VectorExpressionWriterFactory Key: HIVE-19020 URL: https://issues.apache.org/jira/browse/HIVE-19020 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Adding "SET hive.vectorized.execution.enabled=true;" to orc_null_check.q triggers this call stack: {noformat} Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$18.setValue(VectorExpressionWriterFactory.java:1465) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$18.writeValue(VectorExpressionWriterFactory.java:1453) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFArgDesc.getDeferredJavaObject(VectorUDFArgDesc.java:123) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:199) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:151) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:955) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:928) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:813) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:846) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:154) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) ~[hadoop-mapreduce-client-core-3.0.0-beta1.jar:?] at org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19019) Vectorization and Parquet: When vectorized, parquet_schema_evolution.q throws HiveException "Not implemented yet"
Matt McCline created HIVE-19019: --- Summary: Vectorization and Parquet: When vectorized, parquet_schema_evolution.q throws HiveException "Not implemented yet" Key: HIVE-19019 URL: https://issues.apache.org/jira/browse/HIVE-19019 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Adding "SET hive.vectorized.execution.enabled=true;" to parquet_schema_evolution.q triggers this call stack: {noformat} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Not implemented yet at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$19.writeValue(VectorExpressionWriterFactory.java:1496) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFArgDesc.getDeferredJavaObject(VectorUDFArgDesc.java:123) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:199) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:151) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:955) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:928) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.flushDeserializerBatch(VectorMapOperator.java:630) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.setupPartitionContextVars(VectorMapOperator.java:698) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] 
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.cleanUpInputFileChangedOp(VectorMapOperator.java:607) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1210) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:829) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:154) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) ~[hadoop-mapreduce-client-core-3.0.0-beta1.jar:?] {noformat} The complex types in VectorExpressionWriterFactory are not fully implemented. FYI: [~vihangk1] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19016) Vectorization and Parquet: When vectorized, parquet_nested_complex.q produces RuntimeException: Unsupported type used
Matt McCline created HIVE-19016: --- Summary: Vectorization and Parquet: When vectorized, parquet_nested_complex.q produces RuntimeException: Unsupported type used Key: HIVE-19016 URL: https://issues.apache.org/jira/browse/HIVE-19016 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Adding "SET hive.vectorized.execution.enabled=true;" to parquet_nested_complex.q triggers this call stack: {noformat} Caused by: java.lang.RuntimeException: Unsupported type used in list:array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array>>>>>>>>>>>>>>>>>>>>> at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkListColumnSupport(VectorizedParquetRecordReader.java:589) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:525) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] {noformat} FYI: [~vihangk1] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19015) Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q gets a ClassCastException
Matt McCline created HIVE-19015: --- Summary: Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q gets a ClassCastException Key: HIVE-19015 URL: https://issues.apache.org/jira/browse/HIVE-19015 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Adding "SET hive.vectorized.execution.enabled=true;" to parquet_map_of_arrays_of_ints.q triggers this call stack: {noformat} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo cannot be cast to org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:67) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedMapColumnReader.readBatch(VectorizedMapColumnReader.java:57) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:410) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] {noformat} FYI: [~vihangk1] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18995) Vectorization: Add option to suppress "Execution mode: vectorized" for testing purposes
Matt McCline created HIVE-18995: --- Summary: Vectorization: Add option to suppress "Execution mode: vectorized" for testing purposes Key: HIVE-18995 URL: https://issues.apache.org/jira/browse/HIVE-18995 Project: Hive Issue Type: Improvement Components: Hive Reporter: Matt McCline Assignee: Matt McCline In order to see Q file differences in large runs it is helpful to eliminate change noise from "Execution mode: vectorized" in EXPLAIN output. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18908) Add support for FULL OUTER JOIN to MapJoin
Matt McCline created HIVE-18908: --- Summary: Add support for FULL OUTER JOIN to MapJoin Key: HIVE-18908 URL: https://issues.apache.org/jira/browse/HIVE-18908 Project: Hive Issue Type: Improvement Components: Hive Reporter: Matt McCline Assignee: Matt McCline Currently, we do not support FULL OUTER JOIN in MapJoin. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18819) Vectorization: Optimize IF statement expression evaluation of THEN/ELSE
Matt McCline created HIVE-18819: --- Summary: Vectorization: Optimize IF statement expression evaluation of THEN/ELSE Key: HIVE-18819 URL: https://issues.apache.org/jira/browse/HIVE-18819 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Currently, all the rows of a batch are evaluated for the THEN and ELSE expressions even though only a value from one of them is needed for any particular row. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
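The optimization HIVE-18819 asks for can be sketched with selection arrays: split the batch rows by the IF condition, then run THEN only over the rows where the condition is true and ELSE only over the rest. Everything below is a hypothetical illustration built on plain arrays and java.util.function.LongUnaryOperator; none of these names are Hive classes, and Hive's real vector expressions work on column vectors rather than on lambdas.

```java
import java.util.function.LongUnaryOperator;

// Hypothetical sketch; these names are illustrative and are not Hive classes.
class IfThenElseSketch {

  // Applies expr to the rows listed in selected[0..size) and writes into out.
  static void evaluateSelected(LongUnaryOperator expr, long[] in, long[] out,
                               int[] selected, int size) {
    for (int j = 0; j < size; j++) {
      int row = selected[j];
      out[row] = expr.applyAsLong(in[row]);
    }
  }

  // Evaluates THEN only where cond is true and ELSE only where it is false,
  // so each row costs one branch evaluation instead of two.
  static long[] ifThenElse(boolean[] cond, long[] in,
                           LongUnaryOperator thenExpr, LongUnaryOperator elseExpr) {
    int n = in.length;
    int[] thenRows = new int[n];
    int[] elseRows = new int[n];
    int thenCount = 0;
    int elseCount = 0;
    for (int i = 0; i < n; i++) {
      if (cond[i]) {
        thenRows[thenCount++] = i;
      } else {
        elseRows[elseCount++] = i;
      }
    }
    long[] out = new long[n];
    evaluateSelected(thenExpr, in, out, thenRows, thenCount);
    evaluateSelected(elseExpr, in, out, elseRows, elseCount);
    return out;
  }

  public static void main(String[] args) {
    boolean[] cond = {true, false, true, false};
    long[] in = {1, 2, 3, 4};
    long[] out = ifThenElse(cond, in, x -> x * 10, x -> -x);
    System.out.println(java.util.Arrays.toString(out)); // [10, -2, 30, -4]
  }
}
```

The saving grows with the cost of the THEN and ELSE expressions; for trivial expressions the extra pass that builds the two selection arrays may outweigh it.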
[jira] [Created] (HIVE-18807) Fix broken test caused by HIVE-18493
Matt McCline created HIVE-18807: --- Summary: Fix broken test caused by HIVE-18493 Key: HIVE-18807 URL: https://issues.apache.org/jira/browse/HIVE-18807 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18806) Add @Ignore for broken test caused by HIVE-18493
Matt McCline created HIVE-18806: --- Summary: Add @Ignore for broken test caused by HIVE-18493 Key: HIVE-18806 URL: https://issues.apache.org/jira/browse/HIVE-18806 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18800) Vectorization: VectorCoalesce doesn't handle the all repeated NULLs case
Matt McCline created HIVE-18800: --- Summary: Vectorization: VectorCoalesce doesn't handle the all repeated NULLs case Key: HIVE-18800 URL: https://issues.apache.org/jira/browse/HIVE-18800 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline The fix for HIVE-18622 broke the case when all columns are repeated NULLs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18758) Vectorization: Fix VectorUDAFVarFinal produces Wrong Results
Matt McCline created HIVE-18758: --- Summary: Vectorization: Fix VectorUDAFVarFinal produces Wrong Results Key: HIVE-18758 URL: https://issues.apache.org/jira/browse/HIVE-18758 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Fix and turn back on vectorization for issue found in https://issues.apache.org/jira/browse/HIVE-18756 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18756) Vectorization: VectorUDAFVarFinal produces Wrong Results
Matt McCline created HIVE-18756: --- Summary: Vectorization: VectorUDAFVarFinal produces Wrong Results Key: HIVE-18756 URL: https://issues.apache.org/jira/browse/HIVE-18756 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline For a large query. Disabling vectorization for now. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18744) Vectorization: VectorHashKeyWrapperBatch doesn't check repeated NULLs correctly
Matt McCline created HIVE-18744: --- Summary: Vectorization: VectorHashKeyWrapperBatch doesn't check repeated NULLs correctly Key: HIVE-18744 URL: https://issues.apache.org/jira/browse/HIVE-18744 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Logic for checking selectedInUse isRepeating case for NULL is broken. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
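The ticket does not say exactly how the check in VectorHashKeyWrapperBatch was wrong, so the following is only a sketch of the general rule such a check has to respect. It assumes the usual column vector convention that a repeating column stores its single value (and its null flag) at index 0, and that a selection array maps logical rows to batch indices only when the column is not repeating. The class and parameter names are hypothetical.

```java
// Hypothetical sketch of a null check that handles isRepeating and
// selectedInUse together; not the actual VectorHashKeyWrapperBatch code.
class RepeatedNullSketch {

  // Returns true when the logical row is NULL.
  static boolean isRowNull(boolean isRepeating, boolean noNulls, boolean[] isNull,
                           boolean selectedInUse, int[] selected, int logicalIndex) {
    if (noNulls) {
      return false; // the column declares that it holds no NULLs at all
    }
    if (isRepeating) {
      return isNull[0]; // a repeating column is described entirely by entry 0
    }
    // Only a non-repeating column needs the selection array to find the row.
    int batchIndex = selectedInUse ? selected[logicalIndex] : logicalIndex;
    return isNull[batchIndex];
  }

  public static void main(String[] args) {
    boolean[] isNull = new boolean[8];
    isNull[0] = true; // repeated NULL
    int[] selected = {3, 5};
    // Looking at isNull[selected[0]] would wrongly report "not null" here.
    System.out.println(isRowNull(true, false, isNull, true, selected, 0)); // true
  }
}
```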
[jira] [Created] (HIVE-18722) Vectorization: Adding SUM(HASH(..)) to full query seems to produce flaky results -- need to investigate
Matt McCline created HIVE-18722: --- Summary: Vectorization: Adding SUM(HASH(..)) to full query seems to produce flaky results -- need to investigate Key: HIVE-18722 URL: https://issues.apache.org/jira/browse/HIVE-18722 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline When added to HIVE-18622 changes, the query results vary from laptop results when run on Hive QA cluster. Need to investigate after HIVE-18622 commits. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18622) Vectorization: IF statement, Comparisons, and more do not handle NULLs correctly
Matt McCline created HIVE-18622: --- Summary: Vectorization: IF statement, Comparisons, and more do not handle NULLs correctly Key: HIVE-18622 URL: https://issues.apache.org/jira/browse/HIVE-18622 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Fix For: 3.0.0 Many vector expression classes are missing guards around setting noNulls among other things.
{code:java}
// Carefully update noNulls...
if (outputColVector.noNulls) {
  outputColVector.noNulls = inputColVector.noNulls;
}
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
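To show the effect of the guard quoted in HIVE-18622, here is a small self-contained sketch. VectorColSketch is a hypothetical stand-in with only the fields the example needs; it is not Hive's ColumnVector, and copySelected is an invented helper. The point it illustrates is that the guard lets noNulls move only from true to false, so a copy into some rows cannot erase the record of a NULL that already sits in another row of the output column.

```java
// Hypothetical stand-in for a long column vector; not Hive's ColumnVector.
class VectorColSketch {
  boolean noNulls = true;
  final boolean[] isNull;
  final long[] vector;

  VectorColSketch(int size) {
    isNull = new boolean[size];
    vector = new long[size];
  }
}

class NoNullsGuardSketch {

  // Copies the rows listed in selected[0..size) from in to out.
  static void copySelected(VectorColSketch in, VectorColSketch out,
                           int[] selected, int size) {
    // Carefully update noNulls: it may go from true to false, never back.
    if (out.noNulls) {
      out.noNulls = in.noNulls;
    }
    for (int j = 0; j < size; j++) {
      int row = selected[j];
      // Always write isNull, because out.noNulls may already be false.
      out.isNull[row] = !in.noNulls && in.isNull[row];
      out.vector[row] = in.vector[row];
    }
  }

  public static void main(String[] args) {
    VectorColSketch in = new VectorColSketch(4); // no NULLs in the input
    VectorColSketch out = new VectorColSketch(4);
    out.noNulls = false;                         // output already holds a NULL
    out.isNull[2] = true;

    copySelected(in, out, new int[] {0, 1}, 2);
    // An unguarded "out.noNulls = in.noNulls" would print true and hide row 2.
    System.out.println(out.noNulls); // false
  }
}
```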
[jira] [Created] (HIVE-18600) Vectorization: Top-Level Vector Expression Scratch Column Deallocation
Matt McCline created HIVE-18600: --- Summary: Vectorization: Top-Level Vector Expression Scratch Column Deallocation Key: HIVE-18600 URL: https://issues.apache.org/jira/browse/HIVE-18600 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Fix For: 3.0.0 The operators create various vector expression *arrays* for predicates, SELECT clauses, key expressions, etc. We could have those be marked as special "top level" vector expressions; then we could defer deallocation until the top level expression is complete. This could be a simple solution that avoids trying to fix our current eager deallocation, which tries to reuse scratch columns as soon as possible. It *isn't optimal*, but it *shouldn't be too bad*. This solution is much better than not deallocating at all - especially for queries that SELECT a large number of columns or have a lot of expressions in the operator tree. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
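The deferred deallocation proposed in HIVE-18600 could look roughly like the pool below. This is a hypothetical sketch, not the allocator inside VectorizationContext: the class name, the method names, and the idea of tracking columns as plain integers are all assumptions made for illustration. Columns released while a top-level expression is being built are parked, and they only become reusable once that top-level expression is complete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical scratch column pool; names are illustrative, not Hive's.
class ScratchColumnPoolSketch {
  private final TreeSet<Integer> free = new TreeSet<>();
  private final List<Integer> pendingRelease = new ArrayList<>();
  private int nextNew = 0;

  // Hands out the lowest reusable column, or creates a new one.
  int allocate() {
    Integer reused = free.pollFirst();
    return reused != null ? reused : nextNew++;
  }

  // Called by child expressions; the column is parked, not yet reusable.
  void release(int column) {
    pendingRelease.add(column);
  }

  // Called once the whole top-level expression is done.
  void topLevelComplete() {
    free.addAll(pendingRelease);
    pendingRelease.clear();
  }

  // Total number of distinct scratch columns ever created.
  int columnsCreated() {
    return nextNew;
  }
}
```

Within one top-level expression this uses more scratch columns than eager reuse would, which matches the ticket's "isn't optimal" remark, but columns are still shared across top-level expressions instead of growing without bound.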
[jira] [Created] (HIVE-18562) Vectorization: CHAR/VARCHAR conversion in VectorDeserializeRow is broken
Matt McCline created HIVE-18562: --- Summary: Vectorization: CHAR/VARCHAR conversion in VectorDeserializeRow is broken Key: HIVE-18562 URL: https://issues.apache.org/jira/browse/HIVE-18562 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Fix For: 3.0.0 Altering a CHAR/VARCHAR column's maxLength to a shorter value does not truncate values when vectorized. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
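To make the expected behavior in HIVE-18562 concrete, here is a minimal sketch of truncating a value to a shorter maxLength. VarcharTruncateSketch and its truncate method are hypothetical names for this illustration; this is not the code in VectorDeserializeRow, and counting the length in Unicode code points is an assumption of the sketch. CHAR padding is not modeled.

```java
// Hypothetical helper: truncate a value to at most maxLength code points.
final class VarcharTruncateSketch {
    static String truncate(String value, int maxLength) {
        if (value == null) {
            return null; // NULL stays NULL
        }
        if (value.codePointCount(0, value.length()) <= maxLength) {
            return value; // already short enough
        }
        // offsetByCodePoints avoids cutting a surrogate pair in half.
        int endIndex = value.offsetByCodePoints(0, maxLength);
        return value.substring(0, endIndex);
    }
}
```

For example, after a column is altered from VARCHAR(6) to VARCHAR(3), a stored value of "abcdef" would be expected to read back as "abc".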
[jira] [Created] (HIVE-18561) Vectorization: Current vector PTF doesn't work under GroupBy and is designed for reduce-shuffle input
Matt McCline created HIVE-18561: --- Summary: Vectorization: Current vector PTF doesn't work under GroupBy and is designed for reduce-shuffle input Key: HIVE-18561 URL: https://issues.apache.org/jira/browse/HIVE-18561 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Need to add a validation check in Vectorizer so that PTF is vectorized only when it is under reduce-shuffle (with an optional SELECT in between). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18551) Vectorization: VectorMapOperator tries to write too many vector columns for Hybrid Grace
Matt McCline created HIVE-18551: --- Summary: Vectorization: VectorMapOperator tries to write too many vector columns for Hybrid Grace Key: HIVE-18551 URL: https://issues.apache.org/jira/browse/HIVE-18551 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Fix For: 3.0.0 Code incorrectly uses projectedColumns.length instead of singleRow.length -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18531) Vectorization: Vectorized PTF operator should not set the initial type infos
Matt McCline created HIVE-18531: --- Summary: Vectorization: Vectorized PTF operator should not set the initial type infos Key: HIVE-18531 URL: https://issues.apache.org/jira/browse/HIVE-18531 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline The Vectorized PTF operator is mistakenly setting the initial type infos for its output VectorizationContext. It should not. It is only creating a projection of the initial columns from ReduceSink (i.e. keys, values) plus scratch columns for output columns. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18524) Vectorization: Execution failure related to non-standard embedding of IfExprConditionalFilter inside VectorUDFAdaptor (HIVE-17139)
Matt McCline created HIVE-18524: --- Summary: Vectorization: Execution failure related to non-standard embedding of IfExprConditionalFilter inside VectorUDFAdaptor (HIVE-17139) Key: HIVE-18524 URL: https://issues.apache.org/jira/browse/HIVE-18524 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline
{nocode}
insert overwrite table insert_10_1
select cast(gpa as float),
  age,
  IF(age>40,cast('2011-01-01 01:01:01' as timestamp),NULL),
  IF(LENGTH(name)>10,cast(name as binary),NULL)
from studentnull10k

vectorizationSchemaColumns: [0:name:string, 1:age:int, 2:gpa:double]
ExprNodeDescs: UDFToFloat(gpa) (type: float), age (type: int), if((age > 40), 2011-01-01 01:01:01.0, null) (type: timestamp), if((length(name) > 10), CAST( name AS BINARY), null) (type: binary)
selectExpressions: VectorUDFAdaptor(if((age > 40), 2011-01-01 01:01:01.0, null)) (children: LongColGreaterLongScalar(col 1:int, val 40) -> 4:boolean) -> 5:timestamp, VectorUDFAdaptor(if((length(name) > 10), CAST( name AS BINARY), null)) (children: LongColGreaterLongScalar(col 4:int, val 10)(children: StringLength(col 0:string) -> 4:int) -> 6:boolean, VectorUDFAdaptor(CAST( name AS BINARY)) -> 7:binary) -> 8:binary
{nocode}
*// Notice there is no vector expression shown for the last IF stmt.* It has been magically embedded inside the VectorUDFAdaptor object... Execution results in this call stack.
{nocode}
Caused by: java.lang.NullPointerException
  at java.util.Arrays.copyOfRange(Arrays.java:3521)
  at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$9.writeValue(VectorExpressionWriterFactory.java:1101)
  at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterBytes.writeValue(VectorExpressionWriterFactory.java:343)
  at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFArgDesc.getDeferredJavaObject(VectorUDFArgDesc.java:123)
  at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:211)
  at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:177)
  at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:145)
  ... 22 more
{nocode}
Change is due to: HIVE-17139: Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine. (Jia Ke, reviewed by Ferdinand Xu) Embedding a raw vector expression outside of VectorizationContext is quite non-standard and evidently buggy. [~Ferd] [~Ke Jia] I am inclined to revert this change. Comments? CC: [~ashutoshc] [~hagleitn] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18521) Vectorization: query failing in reducer VectorUDAFAvgDecimalPartial2 java.lang.ClassCastException StructTypeInfo --> DecimalTypeInfo
Matt McCline created HIVE-18521: --- Summary: Vectorization: query failing in reducer VectorUDAFAvgDecimalPartial2 java.lang.ClassCastException StructTypeInfo --> DecimalTypeInfo Key: HIVE-18521 URL: https://issues.apache.org/jira/browse/HIVE-18521 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18517) Vectorization: Fix VectorMapOperator to accept VRBs to support LLAP Caching
Matt McCline created HIVE-18517: --- Summary: Vectorization: Fix VectorMapOperator to accept VRBs to support LLAP Caching Key: HIVE-18517 URL: https://issues.apache.org/jira/browse/HIVE-18517 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline LLAP is able to deserialize and cache data from an input format (e.g. TextInputFormat) and will deliver that cached data to VectorMapOperator as VRBs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18493) Add display escape for CR/LF to Hive CLI and Beeline
Matt McCline created HIVE-18493: --- Summary: Add display escape for CR/LF to Hive CLI and Beeline Key: HIVE-18493 URL: https://issues.apache.org/jira/browse/HIVE-18493 Project: Hive Issue Type: Bug Components: Beeline, Hive Affects Versions: 3.0.0 Reporter: Matt McCline Assignee: Matt McCline Add optional display escaping of carriage return and line feed so that each row's output stays on one line. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
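The display escaping requested in HIVE-18493 could look roughly like the sketch below. CrLfDisplayEscapeSketch and escape are hypothetical names for this illustration; the sketch does not use any Hive CLI or Beeline API, and writing the escapes as the two-character sequences \r and \n is an assumption, since the issue does not specify the output form.

```java
// Hypothetical helper: render CR and LF visibly so a row prints on one line.
final class CrLfDisplayEscapeSketch {
    static String escape(String text) {
        if (text == null) {
            return null; // leave NULL handling to the caller
        }
        StringBuilder escaped = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\r':
                    escaped.append("\\r"); // backslash followed by 'r'
                    break;
                case '\n':
                    escaped.append("\\n"); // backslash followed by 'n'
                    break;
                default:
                    escaped.append(c);
            }
        }
        return escaped.toString();
    }
}
```

Because backslashes already present in the data are left untouched, the escaped form is meant for display only and cannot be reversed unambiguously.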