from:"Matt McCline"

[jira] [Created] (HIVE-25865) ALTER RENAME suppresses commitTransaction failure and reports operation success

2022-01-12 Thread Matt McCline (Jira)

Matt McCline created HIVE-25865:
---

 Summary: ALTER RENAME suppresses commitTransaction failure and 
reports operation success
 Key: HIVE-25865
 URL: https://issues.apache.org/jira/browse/HIVE-25865
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Matt McCline
Assignee: Matt McCline


If the commit transaction fails, HiveAlterHandler.alterTable does not report an 
error. It suppresses the failure and returns successfully.
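
The expected behavior can be sketched roughly as follows. This is a minimal illustration, not the HiveAlterHandler code: TxnStore, AlterFailedException and RenameSketch are hypothetical names, and the only assumption carried over is that the commit call reports failure through its return value.

```java
/** Hypothetical store whose commit reports failure through its return value. */
interface TxnStore {
    void openTransaction();
    boolean commitTransaction();
    void rollbackTransaction();
}

/** Hypothetical checked exception used to surface the failure to the caller. */
class AlterFailedException extends Exception {
    AlterFailedException(String message) {
        super(message);
    }
}

final class RenameSketch {
    private RenameSketch() {}

    /** Runs the rename inside a transaction and reports a failed commit instead of hiding it. */
    static void rename(TxnStore store, Runnable renameWork) throws AlterFailedException {
        boolean committed = false;
        store.openTransaction();
        try {
            renameWork.run();
            committed = store.commitTransaction();
        } finally {
            // Roll back on a failed commit or on an exception thrown by the work itself.
            if (!committed) {
                store.rollbackTransaction();
            }
        }
        if (!committed) {
            throw new AlterFailedException("commitTransaction failed; rename was not applied");
        }
    }
}
```

The point of the sketch is only that the commit result is checked and turned into an error the client can see.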



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Created] (HIVE-25493) TBLPROPERTIES upper- vs. lower-case confusion

2021-08-31 Thread Matt McCline (Jira)

Matt McCline created HIVE-25493:
---

 Summary: TBLPROPERTIES upper- vs. lower-case confusion
 Key: HIVE-25493
 URL: https://issues.apache.org/jira/browse/HIVE-25493
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.1.2
Reporter: Matt McCline
Assignee: Matt McCline


The user was confused by the difference in ALTER TABLE SET TBLPROPERTIES between 
'EXTERNAL'='FALSE' (ignored; adds 2 properties, EXTERNAL and FALSE) and 
'external'='false' (transaction error).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (HIVE-25478) Temp file left over after ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS

2021-08-24 Thread Matt McCline (Jira)

Matt McCline created HIVE-25478:
---

 Summary: Temp file left over after ANALYZE TABLE .. COMPUTE 
STATISTICS FOR COLUMNS
 Key: HIVE-25478
 URL: https://issues.apache.org/jira/browse/HIVE-25478
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.1.0
Reporter: Matt McCline
Assignee: Matt McCline


The dot staging directory (".hive-staging") is not removed at the end of the 
ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS operation, as it is for, say, an 
INSERT that does automatic statistics collection. I expected it to be 
deleted after the Stats Work stage.

Any ideas where in the code to add automatic deletion (hook)?

hdfs dfs -ls /hive/warehouse/managed/table_orc
Found 2 items
drwxr-xr-x   - hive supergroup  0 2021-08-24 17:19 /hive/warehouse/managed/table_orc/.hive-staging_hive_2021-08-24_17-19-17_228_4856027533912221506-7
drwxr-xr-x   - hive supergroup  0 2021-08-24 07:17 /hive/warehouse/managed/table_orc/delta_001_001_




[jira] [Created] (HIVE-25446) VectorMapJoinFastHashTable.validateCapacity AssertionError: Capacity must be a power of two

2021-08-11 Thread Matt McCline (Jira)

Matt McCline created HIVE-25446:
---

 Summary: VectorMapJoinFastHashTable.validateCapacity 
AssertionError: Capacity must be a power of two
 Key: HIVE-25446
 URL: https://issues.apache.org/jira/browse/HIVE-25446
 Project: Hive
  Issue Type: Bug
 Environment: Encountered this in a very large query:

Caused by: java.lang.AssertionError: Capacity must be a power of two
   at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTable.validateCapacity(VectorMapJoinFastHashTable.java:60)
   at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTable.<init>(VectorMapJoinFastHashTable.java:77)
   at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashTable.<init>(VectorMapJoinFastBytesHashTable.java:132)
   at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashMap.<init>(VectorMapJoinFastBytesHashMap.java:166)
   at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastStringHashMap.<init>(VectorMapJoinFastStringHashMap.java:43)
   at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastTableContainer.createHashTable(VectorMapJoinFastTableContainer.java:137)
   at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastTableContainer.<init>(VectorMapJoinFastTableContainer.java:86)
   at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTableLoader.load(VectorMapJoinFastHashTableLoader.java:122)
   at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTableInternal(MapJoinOperator.java:344)
   at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:413)
   at org.apache.hadoop.hive.ql.exec.MapJoinOperator.lambda$initializeOp$0(MapJoinOperator.java:215)
   at org.apache.hadoop.hive.ql.exec.tez.ObjectCache.retrieve(ObjectCache.java:96)
   at org.apache.hadoop.hive.ql.exec.tez.ObjectCache$1.call(ObjectCache.java:113)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
Reporter: Matt McCline
Assignee: Matt McCline
 Fix For: 4.0.0







[jira] [Created] (HIVE-25396) Improve uncaught Thread Exception handling in Hive Server 2

2021-07-27 Thread Matt McCline (Jira)

Matt McCline created HIVE-25396:
---

 Summary: Improve uncaught Thread Exception handling in Hive Server 
2
 Key: HIVE-25396
 URL: https://issues.apache.org/jira/browse/HIVE-25396
 Project: Hive
  Issue Type: Bug
Reporter: Matt McCline
Assignee: Matt McCline


Hive's org.apache.hive.service.thrift.ThriftHttpServlet.doPost method does not 
handle all Exception kinds. This leaves uncaught Exception handling choices to 
the Jetty HTTP library. We fix that.

Also, a Thread.UncaughtExceptionHandler is added to Hive Server 2 so uncaught 
Exceptions are handled uniformly, including being logged rather than just 
printed to stderr.
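
A handler of this kind might look roughly like the sketch below. It is a minimal illustration under assumptions, not the Hive Server 2 change itself: LoggingUncaughtHandler is a hypothetical name, and java.util.logging is used only to keep the example self-contained.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical handler that logs uncaught exceptions instead of leaving them on stderr. */
final class LoggingUncaughtHandler implements Thread.UncaughtExceptionHandler {
    private final Logger logger;

    LoggingUncaughtHandler(Logger logger) {
        this.logger = logger;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        // Record the thread name and the full stack trace in the server log.
        logger.log(Level.SEVERE, "Uncaught exception in thread " + thread.getName(), error);
    }

    /** Installs the handler for every thread that has no handler of its own. */
    static void installAsDefault(Logger logger) {
        Thread.setDefaultUncaughtExceptionHandler(new LoggingUncaughtHandler(logger));
    }
}
```

Installing it once at server startup via installAsDefault gives uniform handling; a handler set on an individual thread still takes precedence over the default.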




[jira] [Created] (HIVE-25385) Prevent Hive Server 2 process failures when InterruptedException encountered

2021-07-25 Thread Matt McCline (Jira)

Matt McCline created HIVE-25385:
---

 Summary: Prevent Hive Server 2 process failures when 
InterruptedException encountered
 Key: HIVE-25385
 URL: https://issues.apache.org/jira/browse/HIVE-25385
 Project: Hive
  Issue Type: Bug
Reporter: Matt McCline
Assignee: Matt McCline


To prevent Hive Server 2 process failure, wrap InterruptedException with 
another Exception like MetaException, HiveSQLException, etc. Otherwise, 
InterruptedException rises to Thread.run and kills the process.

Example of problem stack trace:

java.lang.reflect.UndeclaredThrowableException
 at com.sun.proxy.$Proxy44.heartbeat(Unknown Source)
 at sun.reflect.GeneratedMethodAccessor127.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2990)
 at com.sun.proxy.$Proxy44.heartbeat(Unknown Source)
 at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:622)
 at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.lambda$run$0(DbTxnManager.java:999)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
 at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.run(DbTxnManager.java:998)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.InterruptedException: sleep interrupted
 at java.lang.Thread.sleep(Native Method)
 at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:259)
 ... 19 more
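
The wrapping described above could be sketched like this. It is a minimal illustration, assuming a hypothetical ServiceException that stands in for a checked exception such as MetaException or HiveSQLException. Restoring the interrupt flag is a common Java convention added here as an assumption; it is not something the proposal itself states.

```java
import java.util.concurrent.Callable;

/** Hypothetical stand-in for a checked service exception such as MetaException. */
class ServiceException extends Exception {
    ServiceException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class InterruptWrapping {
    private InterruptWrapping() {}

    /**
     * Runs a blocking action and converts InterruptedException into a checked
     * ServiceException so it does not travel up to Thread.run unhandled.
     */
    static <T> T callWrapped(Callable<T> action) throws ServiceException {
        try {
            return action.call();
        } catch (InterruptedException e) {
            // Assumption: keep the interrupt flag set so callers can still observe it.
            Thread.currentThread().interrupt();
            throw new ServiceException("Interrupted while waiting", e);
        } catch (Exception e) {
            throw new ServiceException("Call failed", e);
        }
    }
}
```

A caller would wrap the blocking call, for example the sleep inside a retry loop, in callWrapped and handle ServiceException like any other service error.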




[jira] [Created] (HIVE-25307) Hive Server 2 crashes when Thrift library encounters particular security protocol issue

2021-07-05 Thread Matt McCline (Jira)

Matt McCline created HIVE-25307:
---

 Summary: Hive Server 2 crashes when Thrift library encounters 
particular security protocol issue
 Key: HIVE-25307
 URL: https://issues.apache.org/jira/browse/HIVE-25307
 Project: Hive
  Issue Type: Bug
Reporter: Matt McCline
Assignee: Matt McCline


A RuntimeException is thrown by the Thrift library that causes Hive Server 2 to 
crash on our customer's machine. If you Google this the exception has been 
reported a couple of times over the years but not fixed. A blog (see references 
below) says it is an occasional security protocol issue between Hive Server 2 
and a proxy like a Gateway.

One challenge is the Thrift TTransportFactory getTransport method declaration 
throws no Exceptions hence the likely choice of RuntimeException. But that 
Exception is fatal to Hive Server 2.

The proposed fix is a work around that catches RuntimeException in Hive Server 
2, saves the Exception cause in a dummy TTransport object, and throws the cause 
when TTransport's open method is called later.
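
The shape of that workaround can be sketched as below. To stay self-contained the sketch does not use the Thrift classes: Transport, TransportFactory, FailedTransport and SafeTransportFactory are hypothetical stand-ins, and the deferred failure is rethrown wrapped in an IOException only because the stand-in open method declares that type.

```java
import java.io.IOException;

/** Hypothetical minimal transport; stands in for Thrift's TTransport. */
interface Transport {
    void open() throws IOException;
}

/** Hypothetical factory whose method declares no checked exceptions. */
interface TransportFactory {
    Transport getTransport(Transport base);
}

/** Dummy transport that remembers an earlier failure and reports it on open(). */
final class FailedTransport implements Transport {
    private final Throwable cause;

    FailedTransport(Throwable cause) {
        this.cause = cause;
    }

    @Override
    public void open() throws IOException {
        throw new IOException("Transport setup failed earlier", cause);
    }
}

/** Wraps a factory so a RuntimeException becomes a deferred, catchable failure. */
final class SafeTransportFactory implements TransportFactory {
    private final TransportFactory delegate;

    SafeTransportFactory(TransportFactory delegate) {
        this.delegate = delegate;
    }

    @Override
    public Transport getTransport(Transport base) {
        try {
            return delegate.getTransport(base);
        } catch (RuntimeException e) {
            // Save the underlying cause; it is surfaced later when open() is called.
            Throwable cause = (e.getCause() != null) ? e.getCause() : e;
            return new FailedTransport(cause);
        }
    }
}
```

With this shape, the worker thread that later calls open() sees an ordinary checked failure for one connection instead of a RuntimeException that takes down the server.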

 

ExceptionClassName:
 java.lang.RuntimeException
ExceptionStackTrace:
 java.lang.RuntimeException: org.apache.thrift.transport.TSaslTransportException: No data or no sasl data in the stream
  at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
  at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:694)
  at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:691)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:360)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
  at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:691)
  at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.thrift.transport.TSaslTransportException: No data or no sasl data in the stream
  at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:326)
  at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
  at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
  ... 10 more

 

References:

[Hive server 2 thrift error - Cloudera Community - 
34293|https://community.cloudera.com/t5/Support-Questions/Hive-server-2-thrift-error/td-p/34293]

Eric Lin blog "“NO DATA OR NO SASL DATA IN THE STREAM” ERROR IN HIVESERVER2 LOG"

[HIVE-12754] AuthTypes.NONE cause exception after HS2 start - ASF JIRA 
(apache.org)

 




[jira] [Created] (HIVE-25237) Thrift CLI Service Protocol: Enhance HTTP variant

2021-06-10 Thread Matt McCline (Jira)

Matt McCline created HIVE-25237:
---

 Summary: Thrift CLI Service Protocol: Enhance HTTP variant
 Key: HIVE-25237
 URL: https://issues.apache.org/jira/browse/HIVE-25237
 Project: Hive
  Issue Type: Improvement
Reporter: Matt McCline
Assignee: Matt McCline


I have been thinking about the (Thrift) CLI Service protocol between the client 
and server.

Cloudera's Prashanth Jayachandran (private e-mail) told me that its original 
BINARY (TCP/IP) transport is designed differently than the newer HTTP 
transport. HTTP is used when we go through a Gateway. The design for HTTP is 
stateless and different in nature than the direct BINARY TCP/IP connection. 
This means that today a Hive Server 2 response to an HTTP query request can be 
lost, and that is part of the design. It is the WARNING we have seen when the 
Gateway drops its HTTP connection to Hive Server 2. We had been thinking this 
was a bug, but it is by design.

I think the HTTP design needs a rethink.

When I worked for Tandem computers a long time ago messages were 
fault-tolerant. They used a message sequence #. When you send a message to a 
Tandem server it is a process pair. The message gets routed to the current 
process called the primary. The primary computes the message work and tells the 
backup process to remember the results before replying in case there is a 
failure. You can see where this goes -- if there is a failure before the client 
gets the result it retries and the backup process can resiliently give back the 
result the primary sent it. This isn't unique to Tandem -- without a 
process-pair -- this is a general resilient protocol.

The HTTP design says message loss is possible in both directions (request and 
response). I think we should adopt a better scheme, but not necessarily a 
process pair.

The first principle of the rethink is that the client needs to generate a new 
operation num (an integer) that replaces the server-side generated random GUID. 
The client also generates a new msg num within its new operation. So beeline 
might say ExecuteStatement operationNum = 57 NEW, operationMsgNum = 1. If the 
client gets an OS connection kind of error, it retries with those (57, 1) 
numbers. Hive Server 2 will remember the last response. When Hive Server 2 gets 
a message, there are 3 cases:

1) The sessionId GUID is not valid -- for now we reject the request because it 
is likely Hive Server 2 killed the session perhaps because it was restarted.

2) The operationNum or operationMsgNum is new. (Assert the msg num increases 
monotonically.) Perform the request and save the response. And respond.

3) The (operationNum, operationMsgNum) matches the last request. Resiliently 
respond with the saved result.
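
The three cases above can be sketched as follows. This is a minimal illustration rather than a protocol design: ResilientSession and ResilientServer are hypothetical names, responses are plain strings, and the sketch assumes both numbers only ever increase within a session.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical per-session state: remembers only the last request and its response. */
final class ResilientSession {
    private long lastOperationNum = -1;
    private long lastMsgNum = -1;
    private String lastResponse;

    synchronized String handle(long operationNum, long msgNum, Supplier<String> work) {
        // Case 3: same (operationNum, operationMsgNum) as the last request, so this is a retry.
        if (operationNum == lastOperationNum && msgNum == lastMsgNum) {
            return lastResponse;
        }
        // Assumption: both numbers only ever increase, so anything else must be newer.
        boolean newer = operationNum > lastOperationNum
                || (operationNum == lastOperationNum && msgNum > lastMsgNum);
        if (!newer) {
            throw new IllegalStateException("Request numbers went backwards");
        }
        // Case 2: new request. Perform it, save the response, then respond.
        String response = work.get();
        lastOperationNum = operationNum;
        lastMsgNum = msgNum;
        lastResponse = response;
        return response;
    }
}

final class ResilientServer {
    private final Map<String, ResilientSession> sessions = new HashMap<>();

    synchronized void openSession(String sessionId) {
        sessions.put(sessionId, new ResilientSession());
    }

    String handle(String sessionId, long operationNum, long msgNum, Supplier<String> work) {
        ResilientSession session;
        synchronized (this) {
            session = sessions.get(sessionId);
        }
        // Case 1: unknown session, e.g. the server was restarted. Reject the request.
        if (session == null) {
            throw new IllegalArgumentException("Invalid session: " + sessionId);
        }
        return session.handle(operationNum, msgNum, work);
    }
}
```

A client that loses the response to (57, 1) simply resends (57, 1); the work runs once and the saved response is returned on the retry.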

I think this message handling is in alignment with the HTTP philosophy that the 
protocol is stateless and any message in between can be lost. And it will 
shield the client from suffering a whole category of message failures that 
unnecessarily kill queries.

This also means we need not worry about which requests are idempotent; instead, 
all requests are resilient.

-

Link to earlier HTTP change: [HIVE-24786: JDBC HttpClient should retry for 
idempotent and unsent http methods by prasanthj · Pull Request #1983 · 
apache/hive (github.com)|https://github.com/apache/hive/pull/1983/files]




[jira] [Created] (HIVE-25228) Thrift CLI Service Protocol: Watch for lack of interest by client and kill queries faster

2021-06-09 Thread Matt McCline (Jira)

Matt McCline created HIVE-25228:
---

 Summary: Thrift CLI Service Protocol: Watch for lack of interest 
by client and kill queries faster
 Key: HIVE-25228
 URL: https://issues.apache.org/jira/browse/HIVE-25228
 Project: Hive
  Issue Type: Improvement
Reporter: Matt McCline
Assignee: Matt McCline


CONSIDER: Have Hive Server 2 monitor operations (queries) for continuing client 
interest. If a client does not ask for status every 15 seconds, then 
automatically kill a query and release its txn locks and job resources.

 

Users will experience queries cleaning up much faster (15 to 30 seconds instead 
of minutes and possibly many minutes) when client communication is lost. 
Cleaning up those queries prevents other queries from being blocked on 
EXCLUSIVE txn locks and from having their scheduling blocked, including 
retries of the original query. Today, users can get timeouts when they retry a 
query that got a connection error, which understandably upsets them.
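
The monitoring idea could be sketched as below. ClientInterestMonitor is a hypothetical name, and the clock is passed in as a parameter to keep the sketch deterministic. A periodic server task would call expire and then kill each returned operation and release its locks and resources.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical tracker of the last status poll seen for each running operation. */
final class ClientInterestMonitor {
    private final long timeoutMillis;
    private final Map<String, Long> lastPollMillis = new ConcurrentHashMap<>();

    ClientInterestMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called whenever the client asks for the status of an operation. */
    void recordPoll(String operationId, long nowMillis) {
        lastPollMillis.put(operationId, nowMillis);
    }

    /** Returns, and stops tracking, the operations whose client has gone quiet. */
    List<String> expire(long nowMillis) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastPollMillis.entrySet()) {
            boolean quiet = nowMillis - entry.getValue() > timeoutMillis;
            // remove(key, value) only succeeds if no newer poll arrived in the meantime.
            if (quiet && lastPollMillis.remove(entry.getKey(), entry.getValue())) {
                expired.add(entry.getKey());
            }
        }
        return expired;
    }
}
```

With a 15 second timeout and a sweep every few seconds, an abandoned query would be cleaned up within roughly the 15 to 30 second window mentioned above.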




[jira] [Created] (HIVE-25227) Thrift CLI Service Protocol: Eliminate long compile requests than proxies can timeout

2021-06-09 Thread Matt McCline (Jira)

Matt McCline created HIVE-25227:
---

 Summary: Thrift CLI Service Protocol: Eliminate long compile 
requests than proxies can timeout
 Key: HIVE-25227
 URL: https://issues.apache.org/jira/browse/HIVE-25227
 Project: Hive
  Issue Type: Improvement
Reporter: Matt McCline
Assignee: Matt McCline


CONSIDER: Avoid proxy (GW) timeouts on long Hive query compiles. Use one short 
request to start the operation; then poll for status as we do for execution.
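
A start-then-poll interaction might be sketched as follows. AsyncCompileService and its Status values are hypothetical and are not part of the Thrift CLI Service. The sketch only shows that the start request returns at once with an id, and that each later poll is a short request.

```java
import java.util.Map;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical server side: start returns immediately, the client polls for status. */
final class AsyncCompileService {
    enum Status { RUNNING, FINISHED, FAILED, UNKNOWN }

    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final Map<Long, Future<?>> operations = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    /** Short request: schedules the compile in the background and returns an operation id. */
    long start(Runnable compileWork) {
        long id = nextId.incrementAndGet();
        operations.put(id, pool.submit(compileWork));
        return id;
    }

    /** Short request: reports the current state without waiting for completion. */
    Status poll(long id) {
        Future<?> future = operations.get(id);
        if (future == null) {
            return Status.UNKNOWN;
        }
        if (!future.isDone()) {
            return Status.RUNNING;
        }
        try {
            future.get(); // Already done, so this only surfaces success or failure.
            return Status.FINISHED;
        } catch (ExecutionException | CancellationException e) {
            return Status.FAILED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Status.FAILED;
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

Because no single request lasts as long as the compile itself, a proxy timeout can only hit one short poll, which the client can repeat.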




[jira] [Created] (HIVE-25196) Native Vectorization of GenericUDFSplit function

2021-06-03 Thread Matt McCline (Jira)

Matt McCline created HIVE-25196:
---

 Summary: Native Vectorization of GenericUDFSplit function
 Key: HIVE-25196
 URL: https://issues.apache.org/jira/browse/HIVE-25196
 Project: Hive
  Issue Type: Improvement
Reporter: Matt McCline
Assignee: Matt McCline


Provide faster 'split' function for vector-mode.




[jira] [Created] (HIVE-25191) Modernize Hive Thrift CLI Service Protocol

2021-06-02 Thread Matt McCline (Jira)

Matt McCline created HIVE-25191:
---

 Summary: Modernize Hive Thrift CLI Service Protocol
 Key: HIVE-25191
 URL: https://issues.apache.org/jira/browse/HIVE-25191
 Project: Hive
  Issue Type: Bug
Reporter: Matt McCline
Assignee: Matt McCline


Unnecessary errors are occurring with the advent of proxy use such as Gateways 
between the Hive client and Hive Server 2. Query failures can be due to 
arbitrary proxy timeouts. This proposal avoids the timeouts by changing the 
protocol to do regular polling. Currently, the Hive client uses one request for 
the query compile request. Long query compile times make those requests 
vulnerable to the arbitrary proxy timeouts.

Another issue is Hive Server 2 sometimes does not notice the client has failed 
or has lost interest in a potentially long running query. This causes Hive 
locks and Big Data query resources to be held unnecessarily. The assumption is 
the client issues a cancel query request when it gets an error. This assumption 
does not always hold. If the proxy returned an error itself, that proxy may 
reject the subsequent cancel request, too. And, if the client is killed or the 
network is down, the client cannot complete a cancel request. The proposed 
solution here is for Hive Server 2 to watch that the client is sending regular 
polling requests for status. If a client ceases those requests, then Hive 
Server 2 will cancel the query.

Hive owns the JDBC path (i.e. HiveDriver). The ODBC path may be more 
challenging because vendors provide ODBC drivers and Hive does not own the ODBC 
protocol.




[jira] [Created] (HIVE-25140) Hive Distributed Tracing -- Part 1: Disabled

2021-05-19 Thread Matt McCline (Jira)

Matt McCline created HIVE-25140:
---

 Summary: Hive Distributed Tracing -- Part 1: Disabled
 Key: HIVE-25140
 URL: https://issues.apache.org/jira/browse/HIVE-25140
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline


Infrastructure only, without exporters to Jaeger or OpenTelemetry (OTL), due to 
Thrift and protobuf version conflicts.

Has Spans for BeeLine and Hive Server 2. The code was developed on branch-3.1, 
and porting Spans to the Hive MetaStore on master is taking more time due to 
major code refactoring.




[jira] [Created] (HIVE-25069) Hive Distributed Tracing

2021-04-28 Thread Matt McCline (Jira)

Matt McCline created HIVE-25069:
---

 Summary: Hive Distributed Tracing
 Key: HIVE-25069
 URL: https://issues.apache.org/jira/browse/HIVE-25069
 Project: Hive
  Issue Type: New Feature
Reporter: Matt McCline


Instrument Hive code to gather distributed traces and export trace data to a 
configurable collector.

Distributed tracing is a revolutionary tool for debugging issues.

We will use the new OpenTelemetry open-source standard that our industry has 
aligned on. OpenTelemetry is the merger of two earlier distributed tracing 
projects, OpenTracing and OpenCensus.

Next step: Add a design document that goes into distributed tracing in more 
detail and describes how Hive will be enhanced.




RE: [EXTERNAL] Re: Hive meetup on March 17

2021-03-18 Thread Matt McCline

Yes, thank you Zoltan! I learned a lot, too. I see lots of potential in more 
meetings.

Not all of my team could attend -- please publish the recording.

-Original Message-
From: Stamatis Zampetakis  
Sent: Thursday, March 18, 2021 1:16 AM
To: dev 
Subject: [EXTERNAL] Re: Hive meetup on March 17

Thanks for organising this Zoltan, and many thanks to all the speakers for the 
nice presentations.

I certainly learned some new stuff for the project, looking forward to the next 
one.

Best,
Stamatis

On Wed, Mar 17, 2021 at 4:05 PM Zoltan Haindrich  wrote:

> Hey All!
>
> We have our first online Hive meetup today!
>
> We will start at 5pm UTC for other timezones see on this site:
>
> https://www.timeanddate.com/worldclock/meetingdetails.html?year=2021&month=3&day=17&hour=17&min=0&sec=0&p1=50&p2=137&p3=136&p4=70&p5=176
>
> If you don't yet have the meeting url - it will be held in a zoom room at:
> https://cloudera.zoom.us/j/91452267238
> Most likely there will be a recording of it - which will be shared 
> afterwards.
>
> I was thinking to use Github discussions to (also) ask questions 
> during the event - because it could help untangle "question time" from 
> "answer time"; we may of course choose not to use it - but I've 
> experimented with it and if we add discussions to the "Q" section we 
> may even answer it - and people thinking about the same thing may 
> extend the question by adding further comments...or just vote on the 
> question...
> not sure how well it will work - might worth a try!
> I've set it up on my own fork for now:
> https://github.com/kgyrtkirk/hive/discussions
>
> The meetup url is here:
> https://www.meetup.com/Hive-User-Group-Meeting/events/276886707
>
> Meet you there!
>
> cheers,
> Zoltan
>
> On 3/16/21 3:29 PM, Zoltan Haindrich wrote:
> > Hey All!
> >
> > Our meetup is also available as a meetup.com event:
> > https://www.meetup.com/Hive-User-Group-Meeting/events/276886707/
> >
> > In case you want to add it to the calendar or something... :)
> >
> > cheers,
> > Zoltan
> >
> >
> > On 3/11/21 3:00 PM, Zoltan Haindrich wrote:
> >> Hey All!
> >>
> >> I would like to invite you to our (first?) online Hive meetup! It will
> >> be held on March 17. 17:00 UTC
> >> I'll send out a zoom url before the event starts!
> >>
> >> The planned topics are accessible here:
> >>
> >> https://docs.google.com/document/d/12jaWa7e6jvVjUaxoMWNJcjvTjnNoqwdCAMyswY1OiUg/edit?usp=sharing
> >>
> >> Meet you there!
> >>
> >> cheers,
> >> Zoltan
> >>
> >>
> >>
> >>
>

RE: [EXTERNAL] Re: Any plan for new hive 3 or 4 release?

2021-02-27 Thread Matt McCline

Yes to Hive 4 release. Plenty of changes (1,500+).
Yes to regular release cadence (e.g. 3 month).

-Original Message-
From: Edward Capriolo  
Sent: Saturday, February 27, 2021 12:16 PM
To: Michel Sumbul 
Cc: dev@hive.apache.org; u...@hive.apache.org
Subject: [EXTERNAL] Re: Any plan for new hive 3 or 4 release?

The challenge is the vendors. They almost always want to tie a release to some 
offering of theirs.

Healthy software is released all the time. Just ship it.

Call a vote and propose a release. I'll +1 it if the tests pass!


On Friday, February 26, 2021, Michel Sumbul  wrote:

> It will be amazing if the community could produce a release every 
> quarter/6months. :-)
>
> On Fri, Feb 26, 2021 at 14:30, Edward Capriolo wrote:
>
>> Hive was releasable trunk for the longest time. Facebook days. Then 
>> the big data vendors got more involved. Then it became a pissing 
>> match about features. This vendor likes tez this vendor doesn't, this 
>> vendor likes hive on spark this one doesn't.
>>
>> Then this vendor wants to tell everyone hive stinks use impala. Then 
>> this vendor acquired that vendor..
>>
>> The best thing for hive is to have one branch master and do quarterly 
>> releases.
>>
>>
>>
>> On Friday, February 26, 2021, Peter Vary 
>> wrote:
>>
>>> Hi Lee,
>>>
>>> When I started to work on Hive around 4 years ago, MR was already 
>>> set as deprecated. So you definitely should scan even older archives.
>>>
>>> For Iceberg integration, it would be good to have more frequent 
>>> releases for Hive as well.
>>>
>>> Thanks, Peter
>>>
>>>
>>>
>>> On Wed, Feb 24, 2021 at 4:34, Lee Ming-Ta wrote:
>>>
>>> > Dear all,
>>> >
>>> > I probably didn't follow that much and would like to ask if anyone 
>>> > can point me to some resources about the reason to remove MR?
>>> > Or what kind of keywords to search for on Google?
>>> >
>>> > Thank you very much! Wish everyone a happy Lunar New Year.
>>> >
>>> > --
>>> > *From:* Mass Dosage 
>>> > *Sent:* February 23, 2021, 09:49 PM
>>> > *To:* dev@hive.apache.org 
>>> > *Cc:* Michel Sumbul ; u...@hive.apache.org 
>>> > < u...@hive.apache.org>
>>> > *Subject:* Re: Any plan for new hive 3 or 4 release?
>>> >
>>> > I would love to see a Hive 3.1 release which is capable of being used on
>>> > Java 11 like Hive 2 is.
>>> >
>>> > What is the main difference going to be between Hive 3 and 4? The removal
>>> > of MR?
>>> >
>>> > On Mon, 22 Feb 2021 at 16:46, Zoltan Haindrich  wrote:
>>> >
>>> > Hey Michel!
>>> >
>>> > Yes it was a long time ago we had a release; we have quite a few 
>>> > new features in master.
>>> > I think we have been scaring people for some time now that we will be dropping
>>> > MR support...I think we should do that.
>>> >
>>> > I would really like to see a new Hive release in the near future as well -
>>> > there is no way for users to even try out new features.
>>> > I was planning to add nightly builds to package the latest master's state
>>> > into a deployable artifact - I think a service like that may help pretest our
>>> > next release; I think it won't take much to do it so I'll probably throw it
>>> > together in the next couple days!
>>> >
>>> > cheers,
>>> > Zoltan
>>> >
>>> > On 2/21/21 2:27 PM, Michel Sumbul wrote:
>>> > > Hi Guys,
>>> > >
>>> > > If I'm not wrong, the last release of Hive 3.x is 18 months old.
>>> > > I wanted to ask if you had any roadmap / plan to release a new version of
>>> > > Hive 3.x or Hive 4?
>>> > >
>>> > > Thanks,
>>> > > Michel
>>> > >
>>> >
>>> >
>>>
>>
>>
>> --
>> Sorry this was sent from mobile. Will do less grammar and spell check 
>> than usual.
>>
>

--
Sorry this was sent from mobile. Will do less grammar and spell check than 
usual.

RE: [EXTERNAL] Hive meetup

2021-02-22 Thread Matt McCline

Definitely interested.

-Original Message-
From: Zoltan Haindrich  
Sent: Monday, February 22, 2021 10:17 AM
To: dev@hive.apache.org
Subject: [EXTERNAL] Hive meetup

Hey All!

It was quite some time ago when we had a meetup - and in these covid times it 
would be online-only anyway :) We were mentioning this lately here and there at 
Cloudera.
I think we could have a few talks spanning 2-3 hours or so.

Is there any interest in it?

I would be happy to talk about how hive-test-kube works and how hive-dev-box is 
employed during testing.

cheers,
Zoltan

Re: [DISCUSS] Hive 3.2

2020-11-13 Thread Matt McCline

A few comments. I am going to move forward with a VOTE on Hive 3.2 next.

-Original Message-
From: Matt McCline 
Sent: Monday, October 26, 2020 2:12 PM
To: dev@hive.apache.org
Subject: RE: [EXTERNAL] Re: [DISCUSS] Hive 3.2


Hi László,

Thank you for your response.

Since 3.1.3-rc0 was tagged on Jan 13, master has accumulated 3156 more commits 
than this tag. I mostly wanted to address the huge number of changes in master.
We could do a 3.1.3 release with a modest number of changes, and a 3.2 with 
perhaps many or all of the 3,000+ changes in master.
What do you think?

Matt

-Original Message-
From: László Bodor 
Sent: Monday, October 26, 2020 4:19 AM
To: dev@hive.apache.org
Subject: [EXTERNAL] Re: [DISCUSS] Hive 3.2

Sorry, posted incorrect link for 3.1.3-rc0, the correct is:
https://github.com/apache/hive/releases/tag/release-3.1.3-rc0

On Mon, 26 Oct 2020 at 12:17, László Bodor 
wrote:

> Hey!
>
> I'm also interested in PMCs' opinion. I think it should be released 
> from branch-3, otherwise, it's a 4.0, right? (which is a heavier 
> discussion, and I don't know what Hive4 will be about.) On 3.x we have 
> an official 3.1.2 and an abandoned 3.1.3-rc0, which is not yet 
> released as far as I can see. I guess the next release is supposed to 
> be 3.1.3 as we haven't changed tez/hadoop/orc dependencies since that, 
> and I don't think branch-3 was actively maintained.
>
> Regards,
> Laszlo Bodor
>
> On Thu, 22 Oct 2020 at 21:24, Matt McCline 
>  wrote:
>
>> Hey,
>> Hive master is about 2 years ahead of 3.1 - it seems like time to 
>> release those changes.
>> So, let us have community discussion about creating a Hive 3.2 release.
>> I volunteer to be the release manager. I have not done that before, 
>> so I will need help.
>> I will start a VOTE thread soon, but I would like to hear some 
>> opinions first.
>>
>> Thank you,
>> Matt
>>
>> (It is unclear if there are enough major features or dependencies on 
>> projects that necessitate a major version bump)
>>
>>

RE: [EXTERNAL] Re: [DISCUSS] Hive 3.2

2020-10-26 Thread Matt McCline


Hi László,

Thank you for your response.

Since 3.1.3-rc0 was tagged on Jan 13, master has gained 3156 more commits than 
that tag. I mostly wanted to address the huge number of changes in master.
We could do a 3.1.3 release with a modest number of changes, and a 3.2 with 
perhaps many or all of the 3,000+ changes in master.
What do you think?

Matt

-Original Message-
From: László Bodor  
Sent: Monday, October 26, 2020 4:19 AM
To: dev@hive.apache.org
Subject: [EXTERNAL] Re: [DISCUSS] Hive 3.2

Sorry, I posted an incorrect link for 3.1.3-rc0; the correct one is:
https://github.com/apache/hive/releases/tag/release-3.1.3-rc0

On Mon, 26 Oct 2020 at 12:17, László Bodor 
wrote:

> Hey!
>
> I'm also interested in PMCs' opinion. I think it should be released 
> from branch-3, otherwise, it's a 4.0, right? (which is a heavier 
> discussion, and I don't know what Hive4 will be about.) On 3.x we have 
> an official 3.1.2 and an abandoned 3.1.3-rc0, which is not yet 
> released as far as I can see. I guess the next release is supposed to 
> be 3.1.3 as we haven't changed tez/hadoop/orc dependencies since that, 
> and I don't think branch-3 was actively maintained.
>
> Regards,
> Laszlo Bodor
>
> On Thu, 22 Oct 2020 at 21:24, Matt McCline 
>  wrote:
>
>> Hey,
>> Hive master is about 2 years ahead of 3.1 - it seems like time to 
>> release those changes.
>> So, let us have a community discussion about creating a Hive 3.2 release.
>> I volunteer to be the release manager. I have not done that before, 
>> so I will need help.
>> I will start a VOTE thread soon, but I would like to hear some 
>> opinions first.
>>
>> Thank you,
>> Matt
>>
>> (It is unclear if there are enough major features or dependencies on 
>> projects that necessitate a major version bump)
>>
>>

[DISCUSS] Hive 3.2

2020-10-22 Thread Matt McCline

Hey,
Hive master is about 2 years ahead of 3.1 - it seems like time to release those 
changes.
So, let us have a community discussion about creating a Hive 3.2 release.
I volunteer to be the release manager. I have not done that before, so I will 
need help.
I will start a VOTE thread soon, but I would like to hear some opinions first.

Thank you,
Matt

(It is unclear if there are enough major features or dependencies on projects 
that necessitate a major version bump)

[jira] [Created] (HIVE-20705) Vectorization: Native Vector MapJoin doesn't support Complex Big Table values

2018-10-05 Thread Matt McCline (JIRA)

Matt McCline created HIVE-20705:
---

 Summary: Vectorization: Native Vector MapJoin doesn't support 
Complex Big Table values
 Key: HIVE-20705
 URL: https://issues.apache.org/jira/browse/HIVE-20705
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (HIVE-20645) Vectorization: Implicit casting causes scratch vector reuse Wrong Results

2018-09-27 Thread Matt McCline (JIRA)

Matt McCline created HIVE-20645:
---

 Summary: Vectorization: Implicit casting causes scratch vector 
reuse Wrong Results
 Key: HIVE-20645
 URL: https://issues.apache.org/jira/browse/HIVE-20645
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


The bug fix in HIVE-20563 exposes a Wrong Results bug in vectorized_cast.q




[jira] [Created] (HIVE-20524) Schema Evolution checking is broken in 3.0 for CHAR/VARCHAR

2018-09-09 Thread Matt McCline (JIRA)

Matt McCline created HIVE-20524:
---

 Summary: Schema Evolution checking is broken in 3.0 for 
CHAR/VARCHAR
 Key: HIVE-20524
 URL: https://issues.apache.org/jira/browse/HIVE-20524
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


In Hive version 3, the checkColTypeChangeCompatible method of the new 
org.apache.hadoop.hive.metastore.ColumnType class (under 
hive-standalone-metadata-server) lost a version 2 series bug fix that drops 
CHAR/VARCHAR (and DECIMAL, I think) type decorations when checking for Schema 
Evolution compatibility.

Hive1 version 2 did undecoratedTypeName(oldType) and Hive2 version performed 
the logic in TypeInfoUtils.implicitConvertible on the PrimitiveCategory not the 
raw type string.
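
To illustrate why the decorations matter, here is a minimal sketch, assuming a decoration is simply a trailing parenthesized parameter list such as "(10)" or "(10,2)". The class name and the sameBaseType helper are hypothetical and only borrow the name undecoratedTypeName from the description above; this is not the Hive implementation.

```java
import java.util.Locale;

final class UndecoratedTypeNameSketch {
    // Hypothetical helper: strips a trailing "(...)" decoration,
    // e.g. "varchar(10)" -> "varchar", "DECIMAL(10,2)" -> "decimal".
    static String undecoratedTypeName(String typeName) {
        int paren = typeName.indexOf('(');
        String base = paren < 0 ? typeName : typeName.substring(0, paren);
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // Hypothetical, deliberately tiny rule: two types are treated as the
    // same base type when their undecorated names match.
    static boolean sameBaseType(String oldType, String newType) {
        return undecoratedTypeName(oldType).equals(undecoratedTypeName(newType));
    }

    public static void main(String[] args) {
        // Comparing raw type strings treats varchar(10) and varchar(20) as different types.
        System.out.println("varchar(10)".equals("varchar(20)"));
        // Comparing undecorated names treats them as the same base type.
        System.out.println(sameBaseType("varchar(10)", "varchar(20)"));
    }
}
```

A check that compares raw type strings would reject a VARCHAR(10) to VARCHAR(20) change, while one that compares undecorated names (or the PrimitiveCategory) would not.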




[jira] [Created] (HIVE-20513) Vectorization: Improve Fast Vector MapJoin Bytes Hash Tables

2018-09-06 Thread Matt McCline (JIRA)

Matt McCline created HIVE-20513:
---

 Summary: Vectorization: Improve Fast Vector MapJoin Bytes Hash 
Tables
 Key: HIVE-20513
 URL: https://issues.apache.org/jira/browse/HIVE-20513
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


 Based on HIVE-20491 discussions, improve Fast Vector MapJoin Bytes Hash Tables 
by only storing a one word slot entry.




Re: Review Request 68648: HIVE-20510

2018-09-06 Thread Matt McCline


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68648/#review208396
---




ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java
Lines 865 (patched)
<https://reviews.apache.org/r/68648/#comment292287>

In order for EXPLAIN VECTORIZATION to see the proper information on 
BucketNumExpression you need to call 

ve.setInputTypeInfos(inputTypeInfo);
ve.setOutputTypeInfo(outputTypeInfo);

on the new VectorExpression.

Probably in a separate method.


- Matt McCline


On Sept. 6, 2018, 6:47 a.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68648/
> ---
> 
> (Updated Sept. 6, 2018, 6:47 a.m.)
> 
> 
> Review request for hive, Gopal V and Matt McCline.
> 
> 
> Bugs: HIVE-20510
> https://issues.apache.org/jira/browse/HIVE-20510
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Vectorization : Support loading bucketed tables using sorted dynamic 
> partition optimizer.
> Added a new VectorExpression BucketNumberExpression to evaluate 
> _bucket_number.
> Made the loops as tight as possible.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 
> 57f7c0108e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/BucketNumExpression.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java
>  5ab59c9c61 
>   ql/src/test/queries/clientpositive/dynpart_sort_opt_vectorization.q 
> 435cdaddd0 
>   
> ql/src/test/results/clientpositive/llap/dynpart_sort_opt_vectorization.q.out 
> 22f0a31eb3 
> 
> 
> Diff: https://reviews.apache.org/r/68648/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>

[jira] [Created] (HIVE-20496) Vectorization: Vectorized PTF IllegalStateException

2018-09-02 Thread Matt McCline (JIRA)

Matt McCline created HIVE-20496:
---

 Summary: Vectorization: Vectorized PTF IllegalStateException
 Key: HIVE-20496
 URL: https://issues.apache.org/jira/browse/HIVE-20496
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Testing rebased HIVE-18909 revealed this stack trace:

{code}
java.lang.IllegalStateException: null
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:159) 
~[guava-19.0.jar:?]
at 
org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFEvaluatorStreamingDoubleSum.evaluateGroupBatch(VectorPTFEvaluatorStreamingDoubleSum.java:51)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFGroupBatches.evaluateStreamingGroupBatch(VectorPTFGroupBatches.java:165)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.process(VectorPTFOperator.java:380)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:480)
 [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:387)
 [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
{code}




[jira] [Created] (HIVE-20370) Vectorization: Add Native Vector MapJoin hash table optimization for Left/Right Outer Joins when there are no Small Table values

2018-08-12 Thread Matt McCline (JIRA)

Matt McCline created HIVE-20370:
---

 Summary: Vectorization: Add Native Vector MapJoin hash table 
optimization for Left/Right Outer Joins when there are no Small Table values
 Key: HIVE-20370
 URL: https://issues.apache.org/jira/browse/HIVE-20370
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Similar to Native Vector MapJoin's InnerBigOnly optimization that uses an 
efficient Hash Multi-Set with a counter instead of a Hash Map with an empty 
value, do the same for Outer joins.
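
The idea can be sketched as follows, assuming a LEFT OUTER JOIN in which the big table is the preserved side and no small table columns are projected. The class and method names are hypothetical and java.util.HashMap stands in for the native hash table; this is a model of the technique, not the Hive code.

```java
import java.util.HashMap;
import java.util.Map;

final class HashMultiSetSketch {
    // Build side: keep only a counter per small-table key; no values are stored.
    static Map<Long, Integer> buildMultiSet(long[] smallTableKeys) {
        Map<Long, Integer> counts = new HashMap<>();
        for (long key : smallTableKeys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    // Probe side: how many times the big-table row is emitted.
    // A matched row is emitted once per small-table duplicate; an unmatched
    // row is still emitted once because the big table is the preserved side.
    static int outputRowCount(Map<Long, Integer> counts, long bigTableKey) {
        return counts.getOrDefault(bigTableKey, 1);
    }

    public static void main(String[] args) {
        Map<Long, Integer> counts = buildMultiSet(new long[] {1L, 1L, 2L});
        for (long key : new long[] {1L, 2L, 3L}) {
            System.out.println(key + " -> " + outputRowCount(counts, key));
        }
    }
}
```

Since only the counter is needed to decide how often a big table row is repeated, nothing has to be stored or copied for the small table side.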




[jira] [Created] (HIVE-20367) Vectorization: Support streaming for PTF AVG, MAX, MIN, SUM

2018-08-11 Thread Matt McCline (JIRA)

Matt McCline created HIVE-20367:
---

 Summary: Vectorization: Support streaming for PTF AVG, MAX, MIN, 
SUM
 Key: HIVE-20367
 URL: https://issues.apache.org/jira/browse/HIVE-20367
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Add support for vectorizing PTF AVG, MAX, MIN, SUM when:

{noformat}
ROWS PRECEDING(MAX)~CURRENT
{noformat}
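
Reading that frame as "unbounded preceding to current row", the result for each row depends only on rows already seen, which is what makes streaming possible. A minimal sketch for SUM over doubles, with hypothetical names and with NULL handling left out:

```java
final class StreamingSumSketch {
    // Running state carried across batches within one partition.
    private double runningSum;

    // Called when a new partition starts.
    void resetPartition() {
        runningSum = 0.0;
    }

    // Produces the output column for one batch without buffering earlier rows.
    double[] evaluateBatch(double[] input) {
        double[] output = new double[input.length];
        for (int i = 0; i < input.length; i++) {
            runningSum += input[i];
            output[i] = runningSum;
        }
        return output;
    }

    public static void main(String[] args) {
        StreamingSumSketch sum = new StreamingSumSketch();
        sum.resetPartition();
        for (double v : sum.evaluateBatch(new double[] {1.0, 2.0, 3.0})) {
            System.out.println(v);
        }
        // The second batch of the same partition continues from the carried state.
        for (double v : sum.evaluateBatch(new double[] {4.0})) {
            System.out.println(v);
        }
    }
}
```

MIN, MAX and AVG fit the same shape by carrying a running minimum, maximum, or sum and count instead.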




[jira] [Created] (HIVE-20352) Vectorization: Support grouping function

2018-08-09 Thread Matt McCline (JIRA)

Matt McCline created HIVE-20352:
---

 Summary: Vectorization: Support grouping function
 Key: HIVE-20352
 URL: https://issues.apache.org/jira/browse/HIVE-20352
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Support native vectorization for grouping function (part of Grouping Sets) so 
we don't need to use VectorUDFAdaptor.




[jira] [Created] (HIVE-20339) Vectorization: Lift unneeded restriction causing some PTF with RANK not to be vectorized

2018-08-08 Thread Matt McCline (JIRA)

Matt McCline created HIVE-20339:
---

 Summary: Vectorization: Lift unneeded restriction causing some PTF 
with RANK not to be vectorized
 Key: HIVE-20339
 URL: https://issues.apache.org/jira/browse/HIVE-20339
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Unnecessary: "PTF operator: More than 1 argument expression of aggregation 
function rank"




[jira] [Created] (HIVE-20328) Reenable: TestMiniDruidCliDriver

2018-08-06 Thread Matt McCline (JIRA)

Matt McCline created HIVE-20328:
---

 Summary: Reenable: TestMiniDruidCliDriver
 Key: HIVE-20328
 URL: https://issues.apache.org/jira/browse/HIVE-20328
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: slim bouguerra


Reenable tests disabled in HIVE-20322.




[jira] [Created] (HIVE-20325) FlakyTest: TestMiniDruidCliDriver

2018-08-06 Thread Matt McCline (JIRA)

Matt McCline created HIVE-20325:
---

 Summary: FlakyTest: TestMiniDruidCliDriver
 Key: HIVE-20325
 URL: https://issues.apache.org/jira/browse/HIVE-20325
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


TestMiniDruidCliDriver is failing intermittently a significant percentage of 
the time.

druid_timestamptz
druidmini_joins
druidmini_masking
druidmini_test1




[jira] [Created] (HIVE-20315) Vectorization: Fix more NULL / Wrong Results issues and avoid unnecessary casts/conversions

2018-08-04 Thread Matt McCline (JIRA)

Matt McCline created HIVE-20315:
---

 Summary: Vectorization: Fix more NULL / Wrong Results issues and 
avoid unnecessary casts/conversions
 Key: HIVE-20315
 URL: https://issues.apache.org/jira/browse/HIVE-20315
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Generate multi-byte Unicode characters in addition to regular single byte 
characters for random data.
Don't CAST from STRING/VARCHAR/CHAR TO STRING since all are stored in 
vectorization without padding.
Fix vectorized BETWEEN expression work to avoid unnecessary CAST of DECIMAL 
constants.
Fix NULL / Wrong Results issues in VectorElt.
Change performance Q files to generate non-user EXPLAIN with VECTORIZATION 
display so unnecessary CAST / DECIMAL_64 conversions are visible.





[jira] [Created] (HIVE-20294) Vectorization: Fix NULL / Wrong Results issues in COALESCE / ELT

2018-08-02 Thread Matt McCline (JIRA)

Matt McCline created HIVE-20294:
---

 Summary: Vectorization: Fix NULL / Wrong Results issues in 
COALESCE / ELT
 Key: HIVE-20294
 URL: https://issues.apache.org/jira/browse/HIVE-20294
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Write new UT tests that use random data and intentional isRepeating batches to 
check for NULL and Wrong Results for vectorized COALESCE and ELT.

Also, add tests for ARRAY and MAP indexing, IS [NOT] NULL and NOT




[jira] [Created] (HIVE-20245) Vectorization: Fix NULL / Wrong Results issues in BETWEEN / IN

2018-07-26 Thread Matt McCline (JIRA)

Matt McCline created HIVE-20245:
---

 Summary: Vectorization: Fix NULL / Wrong Results issues in BETWEEN 
/ IN
 Key: HIVE-20245
 URL: https://issues.apache.org/jira/browse/HIVE-20245
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Write new UT tests that use random data and intentional isRepeating batches to 
check for NULL and Wrong Results for vectorized BETWEEN and IN.




[jira] [Created] (HIVE-20207) Vectorization: Fix NULL / Wrong Results issues in Filter / Compare

2018-07-18 Thread Matt McCline (JIRA)

Matt McCline created HIVE-20207:
---

 Summary: Vectorization: Fix NULL / Wrong Results issues in Filter 
/ Compare
 Key: HIVE-20207
 URL: https://issues.apache.org/jira/browse/HIVE-20207
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Write new UT tests that use random data and intentional isRepeating batches to 
check for NULL and Wrong Results for vectorized filter and compare.

BUGS:

1) LongColLessLongColumn SIMD optimization does not work for very large integers:
 -7272907770454997143 < 8976171455044006767
 outputVector[i] = (vector1[i] - vector2[i]) >>> 63;
 Produces 0 instead of 1...
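
The wrong result comes from signed overflow: for these two operands the true difference is smaller than Long.MIN_VALUE, so the subtraction wraps around to a positive value and the sign bit is 0. The sketch below reproduces the problem and shows two alternatives. The class and method names are hypothetical, and the branch-free variant is a standard bit-manipulation formula, not necessarily what the eventual Hive fix uses.

```java
final class LongLessThanSketch {
    // The expression from the bug report: wrong when a - b overflows.
    static long naiveLess(long a, long b) {
        return (a - b) >>> 63;
    }

    // Plain comparison: always correct.
    static long compareLess(long a, long b) {
        return a < b ? 1L : 0L;
    }

    // Branch-free signed "less than" that accounts for overflow:
    // the sign bit of diff is only trusted when a and b have the same sign;
    // otherwise the sign of a decides.
    static long branchFreeLess(long a, long b) {
        long diff = a - b;
        return (diff ^ ((a ^ b) & (diff ^ a))) >>> 63;
    }

    public static void main(String[] args) {
        long a = -7272907770454997143L;
        long b = 8976171455044006767L;
        System.out.println("naive:       " + naiveLess(a, b));
        System.out.println("compare:     " + compareLess(a, b));
        System.out.println("branch-free: " + branchFreeLess(a, b));
    }
}
```

With the operands from the report, naiveLess yields 0 while the other two yield 1.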




[jira] [Created] (HIVE-20197) Vectorization: Add DECIMAL_64 testing, add Date/Interval/Timestamp arithmetic, and fix more NULL / Wrong Results issues in GROUP BY Aggregation Functions

2018-07-17 Thread Matt McCline (JIRA)

Matt McCline created HIVE-20197:
---

 Summary: Vectorization: Add DECIMAL_64 testing, add 
Date/Interval/Timestamp arithmetic, and fix more NULL / Wrong Results issues in 
GROUP BY Aggregation Functions
 Key: HIVE-20197
 URL: https://issues.apache.org/jira/browse/HIVE-20197
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Add DECIMAL_64 testing to TestVectorArithmetic and TestVectorAggregation.

And, add a few more aggregation tests to TestVectorAggregation.

Add + and - Date/Interval/Timestamp arithmetic tests to TestVectorArithmetic.




[jira] [Created] (HIVE-20174) Vectorization: Fix NULL / Wrong Results issues in GROUP BY Aggregation Functions

2018-07-13 Thread Matt McCline (JIRA)

Matt McCline created HIVE-20174:
---

 Summary: Vectorization: Fix NULL / Wrong Results issues in GROUP 
BY Aggregation Functions
 Key: HIVE-20174
 URL: https://issues.apache.org/jira/browse/HIVE-20174
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Write new UT tests that use random data and intentional isRepeating batches to 
check for NULL and Wrong Results for vectorized aggregation functions:




[jira] [Created] (HIVE-20091) Tez: Add security credentials for FileSinkOperator output

2018-07-04 Thread Matt McCline (JIRA)

Matt McCline created HIVE-20091:
---

 Summary: Tez: Add security credentials for FileSinkOperator output
 Key: HIVE-20091
 URL: https://issues.apache.org/jira/browse/HIVE-20091
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


DagUtils needs to add security credentials for the output for the 
FileSinkOperator.




[jira] [Created] (HIVE-19992) Vectorization: Follow-on to HIVE-19951 --> add call to SchemaEvolution.isOnlyImplicitConversion to disable encoded LLAP I/O for ORC only when data type conversion is not

2018-06-25 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19992:
---

 Summary: Vectorization: Follow-on to HIVE-19951 --> add call to 
SchemaEvolution.isOnlyImplicitConversion to disable encoded LLAP I/O for ORC 
only when data type conversion is not implicit
 Key: HIVE-19992
 URL: https://issues.apache.org/jira/browse/HIVE-19992
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


When ORC-380 that adds the SchemaEvolution.isOnlyImplicitConversion call is 
available in the ORC release used by Apache master (and branch-3), then update 
LlapRecordReader (see comments in HIVE-19951 change).




[jira] [Created] (HIVE-19951) Vectorization: Need to disable encoded LLAP I/O for ORC when there is data type conversion (Schema Evolution)

2018-06-20 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19951:
---

 Summary: Vectorization: Need to disable encoded LLAP I/O for ORC 
when there is data type conversion  (Schema Evolution)
 Key: HIVE-19951
 URL: https://issues.apache.org/jira/browse/HIVE-19951
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Currently, reading encoded ORC data does not support data type conversion.  So, 
encoded reading and cache populating need to be disabled.




[jira] [Created] (HIVE-19929) Vectorization: Recheck for vectorization wrong results/execution failures

2018-06-18 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19929:
---

 Summary: Vectorization: Recheck for vectorization wrong 
results/execution failures
 Key: HIVE-19929
 URL: https://issues.apache.org/jira/browse/HIVE-19929
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Use test variables hive.test.vectorized.execution.enabled.override=enable and 
hive.test.vectorization.suppress.explain.execution.mode=true to look for wrong 
results/execution failures when vectorization is forced ON and "Execution mode: 
vectorized" is suppressed.




Re: Review Request 67329: HIVE-19629: Enable Decimal64 reader after orc version upgrade

2018-05-25 Thread Matt McCline


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67329/#review203919
---




ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
Lines 4402 (patched)
<https://reviews.apache.org/r/67329/#comment286297>

NOTE TO SELF: Look at this again.


- Matt McCline


On May 25, 2018, 8:25 p.m., Prasanth_J wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67329/
> ---
> 
> (Updated May 25, 2018, 8:25 p.m.)
> 
> 
> Review request for hive and Matt McCline.
> 
> 
> Bugs: HIVE-19629
> https://issues.apache.org/jira/browse/HIVE-19629
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-19629: Enable Decimal64 reader after orc version upgrade
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 931533a 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 
> 0af91bd 
>   itests/src/test/resources/testconfiguration.properties d146f92 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapInputFormat.java
>  6d29163 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/GenericColumnVectorProducer.java
>  7af1b05 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java
>  feccb87 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java
>  4033b37 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/SerDeEncodedDataReader.java
>  1cfe929 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/VectorDeserializeOrcWriter.java
>  de19b1d 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/ConsumerFileMetadata.java
>  bf139c0 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcFileMetadata.java
>  0012afb 
>   pom.xml e48974b 
>   ql/pom.xml 06124f7 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 2246901 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedInputFormatInterface.java
>  e74b185 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatchCtx.java 
> 6588385 
>   ql/src/java/org/apache/hadoop/hive/ql/io/NullRowsInputFormat.java e632d43 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java f461364 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 
> 8c7c72e 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java 7485e60 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java 1a6db1f 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java 5b001a0 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
>  d2e1a68 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcInputFormat.java 
> c581bba 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java 71682af 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
>  646b214 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java
>  ed6d577 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
> 394f826 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnNoBuckets.java af43b14 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 
> fb2335a 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java ef678a8 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcRawRecordMerger.java 
> d8a7af8 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSerDeStats.java 1533ffa 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedORCReader.java 
> 0c9c95d 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedOrcAcidRowBatchReader.java
>  e478371 
>   ql/src/test/queries/clientpositive/llap_acid2.q a409c26 
>   ql/src/test/queries/clientpositive/llap_decimal64_reader.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/llap_uncompressed.q 875356c 
>   ql/src/test/results/clientpositive/acid_mapjoin.q.out 76a781e 
>   ql/src/test/results/clientpositive/acid_nullscan.q.out 6dad497 
>   ql/src/test/results/clientpositive/acid_table_stats.q.out 2596922 
>   ql/src/test/results/clientpositive/annotate_stats_part.q.out 9e45101 
>   ql/src/test/results/clientpositive/annotate_stats_table.q.out b502957 
>   ql/src/test/results/clientpositive/autoColumnStats_

[jira] [Created] (HIVE-19566) Vectorization: Fix NULL / Wrong Results issues in Complex Type Functions

2018-05-15 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19566:
---

 Summary: Vectorization: Fix NULL / Wrong Results issues in Complex 
Type Functions
 Key: HIVE-19566
 URL: https://issues.apache.org/jira/browse/HIVE-19566
 Project: Hive
  Issue Type: Bug
Reporter: Matt McCline
 Fix For: 3.1.0


Write new UT tests that use random data and intentional isRepeating batches to 
check for NULL and Wrong Results for vectorized Complex Type functions:
 * index
 * (StructField)




[jira] [Created] (HIVE-19565) Vectorization: Fix NULL / Wrong Results issues in STRING Functions

2018-05-15 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19565:
---

 Summary: Vectorization: Fix NULL / Wrong Results issues in STRING 
Functions
 Key: HIVE-19565
 URL: https://issues.apache.org/jira/browse/HIVE-19565
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Write new UT tests that use random data and intentional isRepeating batches to 
check for NULL and Wrong Results for vectorized STRING functions:
 * char_length
 * concat
 * initcap
 * length
 * lower
 * ltrim
 * octet_length
 * regexp
 * rtrim
 * trim
 * upper
 * UDF:
 ** hex
 ** like
 ** substr




[jira] [Created] (HIVE-19564) Vectorization: Fix NULL / Wrong Results issues in Functions

2018-05-15 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19564:
---

 Summary: Vectorization: Fix NULL / Wrong Results issues in 
Functions
 Key: HIVE-19564
 URL: https://issues.apache.org/jira/browse/HIVE-19564
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Write new UT tests that use random data and intentional isRepeating batches to 
check for NULL and Wrong Results for vectorized functions:
 * Generic UDF Functions
 ** abs
 ** bround
 ** ceiling
 ** floor
 ** pmod
 ** power
 ** round
 * UDF Functions
 ** Acos
 ** Asin
 ** Atan
 ** Bin
 ** Cos
 ** Degrees
 ** Exp
 ** Ln
 ** Log
 ** log10
 ** log2
 ** radians
 ** rand
 ** sign
 ** sin
 ** sqrt
 ** tan




[jira] [Created] (HIVE-19530) Vectorization: Fix JDBCSerde and re-enable vectorization

2018-05-14 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19530:
---

 Summary: Vectorization: Fix JDBCSerde and re-enable vectorization
 Key: HIVE-19530
 URL: https://issues.apache.org/jira/browse/HIVE-19530
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


According to [~jcamachorodriguez] there is a big switch statement in the code 
that might have missing types. This can lead to the string types seen.




[jira] [Created] (HIVE-19529) Vectorization: Date/Timestamp NULL issues

2018-05-14 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19529:
---

 Summary: Vectorization: Date/Timestamp NULL issues
 Key: HIVE-19529
 URL: https://issues.apache.org/jira/browse/HIVE-19529
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


date_add/date_sub

more TBD




[jira] [Created] (HIVE-19498) Vectorization: CAST expressions produce wrong results

2018-05-10 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19498:
---

 Summary: Vectorization: CAST expressions produce wrong results
 Key: HIVE-19498
 URL: https://issues.apache.org/jira/browse/HIVE-19498
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline
 Fix For: 3.1.0


DATE --> BOOLEAN
DOUBLE --> DECIMAL
STRING|CHAR|VARCHAR --> DECIMAL
TIMESTAMP --> LONG




[jira] [Created] (HIVE-19448) Vectorization: sysdb test doesn't work after enabling vectorization by default

2018-05-07 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19448:
---

 Summary: Vectorization: sysdb test doesn't work after enabling 
vectorization by default
 Key: HIVE-19448
 URL: https://issues.apache.org/jira/browse/HIVE-19448
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


{noformat}
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to 
java.lang.Boolean
at 
org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaBooleanObjectInspector.getPrimitiveWritableObject(JavaBooleanObjectInspector.java:36)
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:434)
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:347)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:948){noformat}




[jira] [Created] (HIVE-19384) Vectorization: IfExprTimestampColumnScalarBase doesn't handle the arg1ColVector.noNulls case correctly

2018-05-02 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19384:
---

 Summary: Vectorization: IfExprTimestampColumnScalarBase doesn't 
handle the arg1ColVector.noNulls case correctly
 Key: HIVE-19384
 URL: https://issues.apache.org/jira/browse/HIVE-19384
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


It is missing boilerplate code from HIVE-18622: "Vectorization: IF Statements, 
Comparisons, and more do not handle NULLs correctly" fix.
{noformat}
// Carefully handle NULLs...

outputColVector.noNulls = false;{noformat}
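
One reading of that boilerplate, sketched with a hypothetical, simplified stand-in for a column vector (not Hive's classes): output vectors can carry stale flags from an earlier batch, so an IF expression has to write isNull for every row and clear noNulls whenever it produces a NULL. The sketch covers the case where the condition column (arg1ColVector in the issue) has no NULLs.

```java
final class IfExprNullSketch {
    /** Hypothetical, simplified stand-in for a long-valued column vector. */
    static final class LongColumn {
        final long[] vector;
        final boolean[] isNull;
        boolean noNulls = true;

        LongColumn(int size) {
            vector = new long[size];
            isNull = new boolean[size];
        }
    }

    /**
     * IF(cond, column, scalar) where the condition itself has no NULLs.
     * Every row's isNull entry is written explicitly so stale flags cannot leak through.
     */
    static void ifColumnScalar(boolean[] cond, LongColumn arg2, long scalar,
                               LongColumn out, int size) {
        for (int i = 0; i < size; i++) {
            if (cond[i] && !arg2.noNulls && arg2.isNull[i]) {
                // The chosen branch is NULL: record it and clear the fast-path flag.
                out.isNull[i] = true;
                out.noNulls = false;
            } else {
                out.isNull[i] = false;
                out.vector[i] = cond[i] ? arg2.vector[i] : scalar;
            }
        }
    }

    public static void main(String[] args) {
        LongColumn arg2 = new LongColumn(3);
        arg2.vector[0] = 10;
        arg2.vector[1] = 20;
        arg2.isNull[2] = true;
        arg2.noNulls = false;

        LongColumn out = new LongColumn(3);
        out.isNull[0] = true; // stale flag left over from a previous batch

        ifColumnScalar(new boolean[] {true, false, true}, arg2, 7L, out, 3);
        for (int i = 0; i < 3; i++) {
            System.out.println(out.isNull[i] ? "NULL" : Long.toString(out.vector[i]));
        }
    }
}
```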




[jira] [Created] (HIVE-19353) Vectorization: ConstantVectorExpression --> RuntimeException: Unexpected column vector type LIST

2018-04-28 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19353:
---

 Summary: Vectorization: ConstantVectorExpression  --> 
RuntimeException: Unexpected column vector type LIST
 Key: HIVE-19353
 URL: https://issues.apache.org/jira/browse/HIVE-19353
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Found by enabling vectorization for 
org.apache.hive.jdbc.TestJdbcDriver2.testResultSetMetaData
{noformat}
Caused by: java.lang.RuntimeException: Unexpected column vector type LIST
    at org.apache.hadoop.hive.ql.exec.vector.expressions.ConstantVectorExpression.evaluate(ConstantVectorExpression.java:237) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:955) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:928) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.closeOp(VectorMapOperator.java:984) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:722) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:193) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]{noformat}




[jira] [Created] (HIVE-19352) Vectorization: Disable vectorization for org.apache.hive.jdbc.TestJdbcDriver2.testResultSetMetaData

2018-04-28 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19352:
---

 Summary: Vectorization: Disable vectorization for 
org.apache.hive.jdbc.TestJdbcDriver2.testResultSetMetaData
 Key: HIVE-19352
 URL: https://issues.apache.org/jira/browse/HIVE-19352
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Turning vectorization on triggers a bug - see Jira .




[jira] [Created] (HIVE-19351) Vectorization: Followup on why operator numbers are unstable in User EXPLAIN for explainuser_1.q / spark_explainuser_1

2018-04-28 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19351:
---

 Summary: Vectorization: Followup on why operator numbers are 
unstable in User EXPLAIN for explainuser_1.q / spark_explainuser_1
 Key: HIVE-19351
 URL: https://issues.apache.org/jira/browse/HIVE-19351
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Why were the operator numbers unstable for:

TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]

TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] 

when vectorization was enabled?




[jira] [Created] (HIVE-19350) Vectorization: Turn off vectorization for explainuser_1.q / spark_explainuser_1

2018-04-28 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19350:
---

 Summary: Vectorization: Turn off vectorization for explainuser_1.q 
/ spark_explainuser_1
 Key: HIVE-19350
 URL: https://issues.apache.org/jira/browse/HIVE-19350
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


This seems to me to be the operator number instability issue that Pengcheng Xiong 
said could occur with vectorization.

For now, turning off vectorization for:

TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]

TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] 

Follow up Jira is 




[jira] [Created] (HIVE-19275) Vectorization: Wrong Results / Execution Failures when Vectorization turned on in Spark

2018-04-23 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19275:
---

 Summary: Vectorization: Wrong Results / Execution Failures when 
Vectorization turned on in Spark
 Key: HIVE-19275
 URL: https://issues.apache.org/jira/browse/HIVE-19275
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline
 Fix For: 3.0.0, 3.1.0


Quite a number of the bucket* tests had Wrong Results or Execution Failures.

So did others, such as semijoin, skewjoin, avro_decimal_native, mapjoin_addjar, 
mapjoin_decimal, nullgroup, decimal_join, and mapjoin1.

Some of the problems might be as simple as a missing "-- SORT_QUERY_RESULTS".

The bucket* problems looked more serious.

This change sets "hive.vectorized.execution.enabled" to false at the top of 
those Q files.




[jira] [Created] (HIVE-19269) Vectorization: Turn On by Default

2018-04-22 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19269:
---

 Summary: Vectorization: Turn On by Default
 Key: HIVE-19269
 URL: https://issues.apache.org/jira/browse/HIVE-19269
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline
 Fix For: 3.0.0, 3.1.0


Reflect that the Hive deployment we most expect will use vectorization by 
changing the default of hive.vectorized.execution.enabled to true.




[jira] [Created] (HIVE-19264) Vectorization: Reenable vectorization in vector_adaptor_usage_mode.q

2018-04-21 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19264:
---

 Summary: Vectorization: Reenable vectorization in 
vector_adaptor_usage_mode.q
 Key: HIVE-19264
 URL: https://issues.apache.org/jira/browse/HIVE-19264
 Project: Hive
  Issue Type: Bug
Reporter: Matt McCline
Assignee: Matt McCline
 Fix For: 3.0.0, 3.1.0


[~vihangk1] observed vectorization had accidentally been turned off.




Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-13 Thread Matt McCline


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/#review201121
---




ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
Lines 338 (patched)
<https://reviews.apache.org/r/66567/#comment282106>

Logging per row is too expensive to leave in.



ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
Line 338 (original), 344 (patched)
<https://reviews.apache.org/r/66567/#comment282107>

Unnecessary line.



ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
Lines 453 (patched)
<https://reviews.apache.org/r/66567/#comment282108>

Please add comments as to the significance of checking the acidOp flag.


- Matt McCline


On April 12, 2018, 6:24 p.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66567/
> ---
> 
> (Updated April 12, 2018, 6:24 p.m.)
> 
> 
> Review request for hive, Eugene Koifman, Jason Dere, and Matt McCline.
> 
> 
> Bugs: HIVE-18910
> https://issues.apache.org/jira/browse/HIVE-18910
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive uses the Java hash, which is not as good as Murmur for distribution 
> and efficiency when bucketing a table.
> Migrate to the Murmur hash but keep backward compatibility for existing 
> users so that they don't have to reload their existing tables.
> 
> To keep backward compatibility, bucket_version is added as a table property, 
> resulting in a high number of result updates.
> 
> 
> Diffs
> -
> 
>   hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
>   hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
> 153613e6d0 
>   hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
>   hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
>  924e233293 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolver.java
>  5dd0b8ea5b 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolverImpl.java
>  7c2cadefa7 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinator.java
>  ad14c7265f 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  3733e3d02f 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
>  03c28a33c8 
>   
> hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
>  996329195c 
>   
> hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
>  f9ee9d9a03 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
>  caa00292b8 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
> ab8ad77074 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
>  2b28a6677e 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
>  cdb67dd786 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
>  2c23a7e94f 
>   
> itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
>  a1be085ea5 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 
> 353b890b7c 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  5966740f88 
>   itests/src/test/resources/testconfiguration.properties 48d62a8bf9 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 
> d4363fdf91 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/keyseries/VectorKeySeriesSerializedImpl.java
>  86f466fc4e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java
>  4077552a56 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java
>  1bc3fdabac 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java a51fdd322f 
>   ql/s

[jira] [Created] (HIVE-19200) Vectorization: Disable vectorization for LLAP I/O when a non-VECTORIZED_INPUT_FILE_FORMAT mode is needed (i.e. rows) and data type conversion is needed

2018-04-13 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19200:
---

 Summary: Vectorization: Disable vectorization for LLAP I/O when a 
non-VECTORIZED_INPUT_FILE_FORMAT mode is needed (i.e. rows) and data type 
conversion is needed
 Key: HIVE-19200
 URL: https://issues.apache.org/jira/browse/HIVE-19200
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline


Disable vectorization for the issue in HIVE-18763 until we can write the harder VRB 
conversion code.




[jira] [Created] (HIVE-19167) Map data type doesn't keep the order of the key/values pairs as read (Part 2, The Sequel or SQL)

2018-04-11 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19167:
---

 Summary: Map data type doesn't keep the order of the key/values 
pairs as read (Part 2, The Sequel or SQL)   
 Key: HIVE-19167
 URL: https://issues.apache.org/jira/browse/HIVE-19167
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline
 Fix For: 3.1.0


HIVE-19116: "Vectorization: Vector Map data type doesn't keep the order of the 
key/values pairs as read" didn't fix all the places where HashMap is used 
instead of LinkedHashMap.




[jira] [Created] (HIVE-19118) Vectorization: Turning on vectorization in escape_crlf produces wrong results

2018-04-05 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19118:
---

 Summary: Vectorization: Turning on vectorization in escape_crlf 
produces wrong results
 Key: HIVE-19118
 URL: https://issues.apache.org/jira/browse/HIVE-19118
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline
Assignee: Matt McCline


Found in the vectorization enabled-by-default experiment.




[jira] [Created] (HIVE-19116) Vectorization: Vector Map data type doesn't keep the order of the key/values pairs as read

2018-04-05 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19116:
---

 Summary: Vectorization: Vector Map data type doesn't keep the 
order of the key/values pairs as read
 Key: HIVE-19116
 URL: https://issues.apache.org/jira/browse/HIVE-19116
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline
Assignee: Matt McCline


The VectorExtractRow class does not preserve the order of the key/value pairs 
when going from MapColumnVector to a Map object.  This causes Q file 
differences in tests with the MAP data type making it seem like we are getting 
Wrong Results (well, actually we are).

When the LazyMap class (for example) adds key/value pairs to its "map", it uses a 
LinkedHashMap to preserve insert order.
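
The ordering point can be illustrated with plain JDK collections. This is only a sketch under stated assumptions: it does not reproduce VectorExtractRow or LazyMap, and MapOrderSketch, copyInto, and keyOrder are hypothetical names. It copies key/value pairs into a caller-supplied map in the order they were read and reports the resulting iteration order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MapOrderSketch {
    /** Copies parallel key/value arrays into the given map, in read order. */
    static Map<String, Integer> copyInto(Map<String, Integer> target,
                                         String[] keys, int[] values) {
        for (int i = 0; i < keys.length; i++) {
            target.put(keys[i], values[i]);
        }
        return target;
    }

    /** Returns the keys in the map's iteration order. */
    static List<String> keyOrder(Map<String, Integer> map) {
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        String[] keys = {"zebra", "apple", "mango"};
        int[] values = {1, 2, 3};
        // LinkedHashMap iterates in insertion order, so the keys come back as read.
        System.out.println(keyOrder(copyInto(new LinkedHashMap<>(), keys, values)));
    }
}
```

Passing a java.util.HashMap instead gives an iteration order that depends on the keys' hash codes and is not guaranteed to match the read order, which is the kind of difference that shows up in Q file output.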

FYI: [~teddy.choi]




[jira] [Created] (HIVE-19110) Vectorization: Enabling vectorization causes TestContribCliDriver udf_example_arraymapstruct.q to produce Wrong Results

2018-04-04 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19110:
---

 Summary: Vectorization: Enabling vectorization causes 
TestContribCliDriver udf_example_arraymapstruct.q to produce Wrong Results
 Key: HIVE-19110
 URL: https://issues.apache.org/jira/browse/HIVE-19110
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline


Found in the vectorization enabled-by-default experiment.




[jira] [Created] (HIVE-19109) Vectorization: Enabling vectorization causes delete_orig_table to produce Wrong Results

2018-04-04 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19109:
---

 Summary: Vectorization: Enabling vectorization causes 
delete_orig_table to produce Wrong Results
 Key: HIVE-19109
 URL: https://issues.apache.org/jira/browse/HIVE-19109
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline


Found in the vectorization enabled-by-default experiment.




[jira] [Created] (HIVE-19108) Vectorization and Parquet: Turning on vectorization in parquet_ppd_decimal.q causes Wrong Query Results

2018-04-04 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19108:
---

 Summary: Vectorization and Parquet: Turning on vectorization in 
parquet_ppd_decimal.q causes Wrong Query Results
 Key: HIVE-19108
 URL: https://issues.apache.org/jira/browse/HIVE-19108
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline


Found in the vectorization enabled-by-default experiment.




[jira] [Created] (HIVE-19102) Vectorization: Suppress known Q file bugs

2018-04-04 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19102:
---

 Summary: Vectorization: Suppress known Q file bugs
 Key: HIVE-19102
 URL: https://issues.apache.org/jira/browse/HIVE-19102
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline


There are known, recently reported bugs that occur when vectorization 
is turned on in Q files.  Until those bugs are fixed, add SET statements to the 
top of the Q files that suppress vectorization.
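
As a rough illustration of that workaround, the sketch below prepends the SET statement to the text of a Q file. The property name hive.vectorized.execution.enabled is the one used elsewhere in these issues; the class QFileSketch and the method suppressVectorization are hypothetical and are not part of Hive's test tooling.

```java
final class QFileSketch {
    static final String DISABLE_VECTORIZATION =
        "SET hive.vectorized.execution.enabled=false;";

    /** Returns the Q file text with the SET statement as its first line, unless it is already there. */
    static String suppressVectorization(String qFileText) {
        if (qFileText.startsWith(DISABLE_VECTORIZATION)) {
            return qFileText; // already suppressed; keep the operation idempotent
        }
        return DISABLE_VECTORIZATION + "\n" + qFileText;
    }

    public static void main(String[] args) {
        System.out.println(suppressVectorization("SELECT 1;"));
    }
}
```

Putting the statement on the first line keeps the override easy to find and to delete once the underlying bug is fixed.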




[jira] [Created] (HIVE-19088) Vectorization: Turning on vectorization in input_lazyserde.q causes ClassCastException

2018-03-31 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19088:
---

 Summary: Vectorization: Turning on vectorization in 
input_lazyserde.q causes ClassCastException
 Key: HIVE-19088
 URL: https://issues.apache.org/jira/browse/HIVE-19088
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline


{noformat}
2018-03-31T21:19:48,252 ERROR [LocalJobRunner Map Task Executor #0] mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:967)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:154)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.DoubleWritable cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.StandardUnionObjectInspector$StandardUnion
    at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:608)
    at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:581)
    at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:581)
    at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:581)
    at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:350)
    at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRow(VectorAssignRow.java:998)
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:956)
    ... 11 more{noformat}




[jira] [Created] (HIVE-19074) Vectorization: Add llap vectorization_div0.q.out Q output file

2018-03-29 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19074:
---

 Summary: Vectorization: Add llap vectorization_div0.q.out Q output 
file
 Key: HIVE-19074
 URL: https://issues.apache.org/jira/browse/HIVE-19074
 Project: Hive
  Issue Type: Bug
Reporter: Matt McCline
Assignee: Matt McCline


At some point llap/vectorization_div0.q.out got omitted.




[jira] [Created] (HIVE-19052) Vectorization: Disable Vector Pass-Thru MapJoin in the presence of old-style MR FilterMaps

2018-03-26 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19052:
---

 Summary: Vectorization: Disable Vector Pass-Thru MapJoin in the 
presence of old-style MR FilterMaps
 Key: HIVE-19052
 URL: https://issues.apache.org/jira/browse/HIVE-19052
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline


Pass-Thru VectorMapJoinOperator and VectorSMBMapJoinOperator were not designed 
to handle old-style MR FilterMaps.




[jira] [Created] (HIVE-19045) Vectorization: Disable vectorization in non-vectorized Parquet Q files

2018-03-25 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19045:
---

 Summary: Vectorization: Disable vectorization in non-vectorized 
Parquet Q files
 Key: HIVE-19045
 URL: https://issues.apache.org/jira/browse/HIVE-19045
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline
Assignee: Matt McCline


In preparation for turning vectorization on by default, explicitly turn off 
vectorization at the top of the Parquet Q files, since there is a separate set 
of Parquet Vectorization Q files.




[jira] [Created] (HIVE-19043) Vectorization: LazySimpleDeserializeRead fewer fields handling broken for Complex Types

2018-03-23 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19043:
---

 Summary: Vectorization: LazySimpleDeserializeRead fewer fields 
handling broken for Complex Types
 Key: HIVE-19043
 URL: https://issues.apache.org/jira/browse/HIVE-19043
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline
Assignee: Matt McCline


Issues were revealed by vectorizing create_struct_table.q




[jira] [Created] (HIVE-19037) Vectorization: Miscellaneous cleanup

2018-03-23 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19037:
---

 Summary: Vectorization: Miscellaneous cleanup
 Key: HIVE-19037
 URL: https://issues.apache.org/jira/browse/HIVE-19037
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline
Assignee: Matt McCline



# Extraneous INFO logging in VectorReduceSinkCommonOperator
# NPE in EXPLAIN for some SelectColumnIsTrue vector expressions




[jira] [Created] (HIVE-19035) Vectorization: Disable exotic field reference form

2018-03-23 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19035:
---

 Summary: Vectorization: Disable exotic field reference form
 Key: HIVE-19035
 URL: https://issues.apache.org/jira/browse/HIVE-19035
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline
Assignee: Matt McCline


We currently don't support exotic field references, such as getting a struct field 
from an array<struct>, which returns an array type.  Attempting one causes a 
ClassCastException in VectorizationContext that kills query planning.
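
To make the semantics of such a reference concrete, here is a small model in plain Java. It is an illustration only and assumes nothing about Hive's implementation: Item stands in for a struct with one int field, and projectValue is a hypothetical helper that models referencing that field on a whole array of structs, which yields an array of the field's values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FieldOfArraySketch {
    /** Hypothetical struct with a single int field named "value". */
    static final class Item {
        final int value;

        Item(int value) {
            this.value = value;
        }
    }

    /** Models items.value where items is array<struct<value:int>>: the result is array<int>. */
    static List<Integer> projectValue(List<Item> items) {
        List<Integer> result = new ArrayList<>(items.size());
        for (Item item : items) {
            result.add(item.value);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(new Item(1), new Item(2), new Item(3));
        System.out.println(projectValue(items));
    }
}
```

The result type is a list rather than a single field value, which is what makes this form different from an ordinary struct field reference.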

The Q file is input_testxpath3.q




[jira] [Created] (HIVE-19032) Vectorization: Disable GROUP BY aggregations with DISTINCT

2018-03-22 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19032:
---

 Summary: Vectorization: Disable GROUP BY aggregations with DISTINCT
 Key: HIVE-19032
 URL: https://issues.apache.org/jira/browse/HIVE-19032
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline
Assignee: Matt McCline


Vectorized GROUP BY does not support DISTINCT aggregation functions.




[jira] [Created] (HIVE-19024) Vectorization: Disable complex type constants

2018-03-22 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19024:
---

 Summary: Vectorization: Disable complex type constants
 Key: HIVE-19024
 URL: https://issues.apache.org/jira/browse/HIVE-19024
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline
Assignee: Matt McCline


Currently, complex type constants are not detected and cause execution failures.




[jira] [Created] (HIVE-19020) Vectorization: When vectorized, orc_null_check.q throws NPE in VectorExpressionWriterFactory

2018-03-21 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19020:
---

 Summary: Vectorization: When vectorized, orc_null_check.q throws 
NPE in VectorExpressionWriterFactory
 Key: HIVE-19020
 URL: https://issues.apache.org/jira/browse/HIVE-19020
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline
Assignee: Matt McCline


Adding "SET hive.vectorized.execution.enabled=true;" to orc_null_check.q 
triggers this call stack:

{noformat}
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$18.setValue(VectorExpressionWriterFactory.java:1465) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$18.writeValue(VectorExpressionWriterFactory.java:1453) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFArgDesc.getDeferredJavaObject(VectorUDFArgDesc.java:123) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:199) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:151) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:955) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:928) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:813) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:846) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:154) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) ~[hadoop-mapreduce-client-core-3.0.0-beta1.jar:?]
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
{noformat}




[jira] [Created] (HIVE-19019) Vectorization and Parquet: When vectorized, parquet_schema_evolution.q throws HiveException "Not implemented yet"

2018-03-21 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19019:
---

 Summary: Vectorization and Parquet: When vectorized, 
parquet_schema_evolution.q throws HiveException "Not implemented yet"
 Key: HIVE-19019
 URL: https://issues.apache.org/jira/browse/HIVE-19019
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline


Adding "SET hive.vectorized.execution.enabled=true;" to 
parquet_schema_evolution.q triggers this call stack:

{noformat}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Not implemented yet
    at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$19.writeValue(VectorExpressionWriterFactory.java:1496) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFArgDesc.getDeferredJavaObject(VectorUDFArgDesc.java:123) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:199) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:151) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:955) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:928) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.flushDeserializerBatch(VectorMapOperator.java:630) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.setupPartitionContextVars(VectorMapOperator.java:698) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.cleanUpInputFileChangedOp(VectorMapOperator.java:607) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1210) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:829) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:154) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) ~[hadoop-mapreduce-client-core-3.0.0-beta1.jar:?]
{noformat}

The complex types in VectorExpressionWriterFactory are not fully implemented.

FYI: [~vihangk1]




[jira] [Created] (HIVE-19016) Vectorization and Parquet: When vectorized, parquet_nested_complex.q produces RuntimeException: Unsupported type used

2018-03-21 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19016:
---

 Summary: Vectorization and Parquet: When vectorized, 
parquet_nested_complex.q produces RuntimeException: Unsupported type used
 Key: HIVE-19016
 URL: https://issues.apache.org/jira/browse/HIVE-19016
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline


Adding "SET hive.vectorized.execution.enabled=true;" to 
parquet_nested_complex.q triggers this call stack:

{noformat}
Caused by: java.lang.RuntimeException: Unsupported type used in 
list:array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array>>>>>>>>>>>>>>>>>>>>>
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkListColumnSupport(VectorizedParquetRecordReader.java:589) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:525) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
{noformat}

FYI: [~vihangk1]




[jira] [Created] (HIVE-19015) Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q gets a ClassCastException

2018-03-21 Thread Matt McCline (JIRA)

Matt McCline created HIVE-19015:
---

 Summary: Vectorization and Parquet: When vectorized, 
parquet_map_of_arrays_of_ints.q gets a ClassCastException
 Key: HIVE-19015
 URL: https://issues.apache.org/jira/browse/HIVE-19015
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline


Adding "SET hive.vectorized.execution.enabled=true;"  to 
parquet_map_of_arrays_of_ints.q triggers this call stack:

{noformat}
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo cannot be cast to org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:67) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedMapColumnReader.readBatch(VectorizedMapColumnReader.java:57) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:410) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
{noformat}

FYI: [~vihangk1]




[jira] [Created] (HIVE-18995) Vectorization: Add option to suppress "Execution mode: vectorized" for testing purposes

2018-03-19 Thread Matt McCline (JIRA)

Matt McCline created HIVE-18995:
---

 Summary: Vectorization: Add option to suppress "Execution mode: 
vectorized" for testing purposes
 Key: HIVE-18995
 URL: https://issues.apache.org/jira/browse/HIVE-18995
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


In order to see Q file differences in large runs it is helpful to eliminate 
change noise from "Execution mode: vectorized" in EXPLAIN output.




[jira] [Created] (HIVE-18908) Add support for FULL OUTER JOIN to MapJoin

2018-03-08 Thread Matt McCline (JIRA)

Matt McCline created HIVE-18908:
---

 Summary: Add support for FULL OUTER JOIN to MapJoin
 Key: HIVE-18908
 URL: https://issues.apache.org/jira/browse/HIVE-18908
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Currently, we do not support FULL OUTER JOIN in MapJoin.




[jira] [Created] (HIVE-18819) Vectorization: Optimize IF statement expression evaluation of THEN/ELSE

2018-02-27 Thread Matt McCline (JIRA)

Matt McCline created HIVE-18819:
---

 Summary: Vectorization: Optimize IF statement expression 
evaluation of THEN/ELSE
 Key: HIVE-18819
 URL: https://issues.apache.org/jira/browse/HIVE-18819
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline
Assignee: Matt McCline


Currently, all the rows of a batch are evaluated for the THEN and ELSE 
expressions even though only a value from one of them is needed for any 
particular row.
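
One way to picture the intended optimization, as a minimal sketch rather than Hive's actual implementation: partition the batch's row indices by the condition, then evaluate THEN only over the rows where the condition is true and ELSE only over the remaining rows. The class `SelectiveIf` and its method are hypothetical, plain arrays stand in for column vectors, and NULL handling is omitted.

```java
import java.util.function.LongUnaryOperator;

// Hypothetical stand-in for a vectorized IF expression; not a Hive class.
final class SelectiveIf {

    // Computes IF(cond[i], thenExpr(input[i]), elseExpr(input[i])) for a batch,
    // invoking each branch only on the rows whose result comes from it.
    static long[] evaluate(boolean[] cond, long[] input,
                           LongUnaryOperator thenExpr, LongUnaryOperator elseExpr) {
        int n = input.length;
        int[] thenRows = new int[n];
        int[] elseRows = new int[n];
        int thenCount = 0;
        int elseCount = 0;

        // Partition the row indices by the condition.
        for (int i = 0; i < n; i++) {
            if (cond[i]) {
                thenRows[thenCount++] = i;
            } else {
                elseRows[elseCount++] = i;
            }
        }

        long[] out = new long[n];
        // THEN is evaluated only for rows where the condition is true.
        for (int k = 0; k < thenCount; k++) {
            int row = thenRows[k];
            out[row] = thenExpr.applyAsLong(input[row]);
        }
        // ELSE is evaluated only for the remaining rows.
        for (int k = 0; k < elseCount; k++) {
            int row = elseRows[k];
            out[row] = elseExpr.applyAsLong(input[row]);
        }
        return out;
    }
}
```

With this shape, the cost of an expensive THEN or ELSE expression scales with the number of rows that take that branch instead of with the batch size.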




[jira] [Created] (HIVE-18807) Fix broken test caused by HIVE-18493

2018-02-26 Thread Matt McCline (JIRA)

Matt McCline created HIVE-18807:
---

 Summary: Fix broken test caused by HIVE-18493
 Key: HIVE-18807
 URL: https://issues.apache.org/jira/browse/HIVE-18807
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline
Assignee: Matt McCline







[jira] [Created] (HIVE-18806) Add @Ignore for broken test caused by HIVE-18493

2018-02-26 Thread Matt McCline (JIRA)

Matt McCline created HIVE-18806:
---

 Summary: Add @Ignore for broken test caused by HIVE-18493
 Key: HIVE-18806
 URL: https://issues.apache.org/jira/browse/HIVE-18806
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline
Assignee: Matt McCline







[jira] [Created] (HIVE-18800) Vectorization: VectorCoalesce doesn't handle the all repeated NULLs case

2018-02-25 Thread Matt McCline (JIRA)

Matt McCline created HIVE-18800:
---

 Summary: Vectorization: VectorCoalesce doesn't handle the all 
repeated NULLs case
 Key: HIVE-18800
 URL: https://issues.apache.org/jira/browse/HIVE-18800
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline
Assignee: Matt McCline


The fix for HIVE-18622 broke the case when all columns are repeated NULLs.




[jira] [Created] (HIVE-18758) Vectorization: Fix VectorUDAFVarFinal produces Wrong Results

2018-02-20 Thread Matt McCline (JIRA)

Matt McCline created HIVE-18758:
---

 Summary: Vectorization: Fix VectorUDAFVarFinal produces Wrong 
Results
 Key: HIVE-18758
 URL: https://issues.apache.org/jira/browse/HIVE-18758
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline
Assignee: Matt McCline


Fix and turn back on vectorization for issue found in 
https://issues.apache.org/jira/browse/HIVE-18756




[jira] [Created] (HIVE-18756) Vectorization: VectorUDAFVarFinal produces Wrong Results

2018-02-20 Thread Matt McCline (JIRA)

Matt McCline created HIVE-18756:
---

 Summary: Vectorization: VectorUDAFVarFinal produces Wrong Results
 Key: HIVE-18756
 URL: https://issues.apache.org/jira/browse/HIVE-18756
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline
Assignee: Matt McCline


The wrong results occur for a large query.  Disabling vectorization for now.




[jira] [Created] (HIVE-18744) Vectorization: VectorHashKeyWrapperBatch doesn't check repeated NULLs correctly

2018-02-19 Thread Matt McCline (JIRA)

Matt McCline created HIVE-18744:
---

 Summary: Vectorization: VectorHashKeyWrapperBatch doesn't check 
repeated NULLs correctly
 Key: HIVE-18744
 URL: https://issues.apache.org/jira/browse/HIVE-18744
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


The logic for checking for NULL in the selectedInUse and isRepeating case is 
broken.
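
For context, a minimal sketch of the general column-vector convention this kind of check relies on; it is not the actual fix. When a column is repeating, only entry 0 is meaningful, so the NULL test reads `isNull[0]` whether or not a selection vector is in use. The class `RepeatedNullCheck` and its parameter list are hypothetical; the flag names mirror the `noNulls`, `isRepeating`, `isNull`, `selectedInUse`, and `selected` fields.

```java
// Hypothetical helper illustrating the convention; not a Hive class.
final class RepeatedNullCheck {

    // Returns true when the row at logicalIndex holds NULL in the given column.
    static boolean isRowNull(boolean noNulls, boolean isRepeating, boolean[] isNull,
                             boolean selectedInUse, int[] selected, int logicalIndex) {
        if (noNulls) {
            // The column promises that no entry is NULL.
            return false;
        }
        if (isRepeating) {
            // A repeating column stores its single value (or NULL) at entry 0;
            // the selection vector must not be used to index it.
            return isNull[0];
        }
        // Otherwise translate the logical row through the selection vector, if any.
        int batchIndex = selectedInUse ? selected[logicalIndex] : logicalIndex;
        return isNull[batchIndex];
    }
}
```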



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (HIVE-18722) Vectorization: Adding SUM(HASH(..)) to full query seems to produce flakey results -- need to investigate

2018-02-15 Thread Matt McCline (JIRA)

Matt McCline created HIVE-18722:
---

 Summary: Vectorization: Adding SUM(HASH(..)) to full query seems 
to produce flakey results -- need to investigate
 Key: HIVE-18722
 URL: https://issues.apache.org/jira/browse/HIVE-18722
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline
Assignee: Matt McCline


When SUM(HASH(..)) is added to the HIVE-18622 changes, the query results on the 
Hive QA cluster differ from the results on a laptop.  Need to investigate after 
HIVE-18622 commits.




[jira] [Created] (HIVE-18622) Vectorization: IF statement, Comparisons, and more do not handle NULLs correctly

2018-02-05 Thread Matt McCline (JIRA)

Matt McCline created HIVE-18622:
---

 Summary: Vectorization: IF statement, Comparisons, and more do not 
handle NULLs correctly
 Key: HIVE-18622
 URL: https://issues.apache.org/jira/browse/HIVE-18622
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline
 Fix For: 3.0.0


 
Many vector expression classes are missing guards around setting noNulls among 
other things.

{code:java}
// Carefully update noNulls...
if (outputColVector.noNulls) {
  outputColVector.noNulls = inputColVector.noNulls;
}
 {code}




[jira] [Created] (HIVE-18600) Vectorization: Top-Level Vector Expression Scratch Column Deallocation

2018-02-01 Thread Matt McCline (JIRA)

Matt McCline created HIVE-18600:
---

 Summary: Vectorization: Top-Level Vector Expression Scratch Column 
Deallocation
 Key: HIVE-18600
 URL: https://issues.apache.org/jira/browse/HIVE-18600
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline
 Fix For: 3.0.0


The operators create various vector expression *arrays* for predicates, SELECT 
clauses, key expressions, etc.  We could mark those as special "top level" 
vector expressions and then defer deallocation until the top-level expression 
is complete.  This could be a simple solution that avoids trying to fix our 
current eager deallocation, which tries to reuse scratch columns as soon as 
possible.  It *isn't optimal*, but it *shouldn't be too bad*. This solution is 
much better than not deallocating at all, especially for queries that SELECT a 
large number of columns or have a lot of expressions in the operator tree.
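
The deferred deallocation idea could look roughly like the following sketch. Everything here is hypothetical (the class `ScratchColumnPool`, its methods, and the integer column numbers); it only illustrates the proposal: while a top-level expression is being built, released scratch columns are recorded instead of reused, and they return to the free list once that expression is complete.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical scratch-column allocator; names are illustrative, not Hive's.
final class ScratchColumnPool {
    private final Deque<Integer> free = new ArrayDeque<>();
    private final List<Integer> deferred = new ArrayList<>();
    private int nextColumn;
    private boolean inTopLevel;

    ScratchColumnPool(int firstScratchColumn) {
        this.nextColumn = firstScratchColumn;
    }

    // Reuses a free scratch column if one exists, else creates a new one.
    int allocate() {
        return free.isEmpty() ? nextColumn++ : free.pop();
    }

    // Inside a top-level expression the release is only recorded, so the
    // column cannot be handed out again before the expression is complete.
    void release(int column) {
        if (inTopLevel) {
            deferred.add(column);
        } else {
            free.push(column);
        }
    }

    void beginTopLevel() {
        inTopLevel = true;
    }

    // Completing the top-level expression makes its scratch columns reusable.
    void endTopLevel() {
        inTopLevel = false;
        for (int column : deferred) {
            free.push(column);
        }
        deferred.clear();
    }
}
```

The trade-off matches the description above: a single top-level expression may use more scratch columns than eager reuse would, but the next top-level expression starts with all of them available again.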




[jira] [Created] (HIVE-18562) Vectorization: CHAR/VARCHAR conversion in VectorDeserializeRow is broken

2018-01-28 Thread Matt McCline (JIRA)

Matt McCline created HIVE-18562:
---

 Summary: Vectorization: CHAR/VARCHAR conversion in 
VectorDeserializeRow is broken
 Key: HIVE-18562
 URL: https://issues.apache.org/jira/browse/HIVE-18562
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline
Assignee: Matt McCline
 Fix For: 3.0.0


Altering a CHAR/VARCHAR column's maxLength to a shorter value does not truncate 
values when vectorized. 
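
As an illustration of the expected behavior (not the code in VectorDeserializeRow), a value longer than the column's new maxLength should be cut to that length. The sketch below assumes the length is counted in Unicode code points; the class `CharTruncate` is hypothetical, and CHAR padding is left out.

```java
// Hypothetical helper showing the expected truncation; not a Hive class.
final class CharTruncate {

    // Returns value cut to at most maxLength characters, counted in code
    // points so that a surrogate pair is never split.
    static String truncate(String value, int maxLength) {
        if (value.codePointCount(0, value.length()) <= maxLength) {
            return value;
        }
        int endIndex = value.offsetByCodePoints(0, maxLength);
        return value.substring(0, endIndex);
    }
}
```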




[jira] [Created] (HIVE-18561) Vectorization: Current vector PTF doesn't work under GroupBy and is designed for reduce-shuffle input

2018-01-27 Thread Matt McCline (JIRA)

Matt McCline created HIVE-18561:
---

 Summary: Vectorization: Current vector PTF doesn't work under 
GroupBy and is designed for reduce-shuffle input
 Key: HIVE-18561
 URL: https://issues.apache.org/jira/browse/HIVE-18561
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Need to add a validation check in Vectorizer so that it doesn't vectorize 
unless the PTF is under reduce-shuffle (with an optional SELECT in between).




[jira] [Created] (HIVE-18551) Vectorization: VectorMapOperator tries to write too many vector columns for Hybrid Grace

2018-01-25 Thread Matt McCline (JIRA)

Matt McCline created HIVE-18551:
---

 Summary: Vectorization: VectorMapOperator tries to write too many 
vector columns for Hybrid Grace
 Key: HIVE-18551
 URL: https://issues.apache.org/jira/browse/HIVE-18551
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline
Assignee: Matt McCline
 Fix For: 3.0.0


The code incorrectly uses projectedColumns.length instead of singleRow.length.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (HIVE-18531) Vectorization: Vectorized PTF operator should not set the initial type infos

2018-01-24 Thread Matt McCline (JIRA)

Matt McCline created HIVE-18531:
---

 Summary: Vectorization: Vectorized PTF operator should not set the 
initial type infos
 Key: HIVE-18531
 URL: https://issues.apache.org/jira/browse/HIVE-18531
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline


The Vectorized PTF operator is mistakenly setting the initial type infos for 
its output VectorizationContext.  It should not.  It is only creating a 
projection of the initial columns from ReduceSink (i.e. keys, values) plus 
scratch columns for output columns.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (HIVE-18524) Vectorization: Execution failure related to non-standard embedding of IfExprConditionalFilter inside VectorUDFAdaptor (HIVE-17139)

2018-01-24 Thread Matt McCline (JIRA)

Matt McCline created HIVE-18524:
---

 Summary: Vectorization: Execution failure related to non-standard 
embedding of IfExprConditionalFilter inside VectorUDFAdaptor (HIVE-17139)
 Key: HIVE-18524
 URL: https://issues.apache.org/jira/browse/HIVE-18524
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline


{noformat}
insert overwrite table insert_10_1
select cast(gpa as float),
   age,
   IF(age>40,cast('2011-01-01 01:01:01' as timestamp),NULL),
   IF(LENGTH(name)>10,cast(name as binary),NULL)
from studentnull10k

vectorizationSchemaColumns: [0:name:string, 1:age:int, 2:gpa:double]

ExprNodeDescs:
UDFToFloat(gpa) (type: float),
age (type: int),
if((age > 40), 2011-01-01 01:01:01.0, null) (type: timestamp),
if((length(name) > 10), CAST( name AS BINARY), null) (type: binary)

selectExpressions:
VectorUDFAdaptor(if((age > 40), 2011-01-01 01:01:01.0, null))
(children: LongColGreaterLongScalar(col 1:int, val 40) -> 4:boolean) -> 
5:timestamp,
VectorUDFAdaptor(if((length(name) > 10), CAST( name AS BINARY), null))
(children: LongColGreaterLongScalar(col 4:int, val 10)(children: 
StringLength(col 0:string) -> 4:int) -> 6:boolean,
VectorUDFAdaptor(CAST( name AS BINARY)) -> 7:binary) -> 8:binary
{noformat}

*// Notice there is no vector expression shown for the last IF statement.*  It 
has been magically embedded inside the VectorUDFAdaptor object...

Execution results in this call stack.
{noformat}
Caused by: java.lang.NullPointerException
at java.util.Arrays.copyOfRange(Arrays.java:3521)
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$9.writeValue(VectorExpressionWriterFactory.java:1101)
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterBytes.writeValue(VectorExpressionWriterFactory.java:343)
at 
org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFArgDesc.getDeferredJavaObject(VectorUDFArgDesc.java:123)
at 
org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:211)
at 
org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:177)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:145)
... 22 more
{noformat}

Change is due to:
HIVE-17139: Conditional expressions optimization: skip the expression 
evaluation if the condition is not satisfied for vectorization engine. (Jia Ke, 
reviewed by Ferdinand Xu)

Embedding a raw vector expression outside of VectorizationContext is quite 
non-standard and evidently buggy.

[~Ferd] [~Ke Jia] I am inclined to revert this change.  Comments?  CC: 
[~ashutoshc] [~hagleitn]




[jira] [Created] (HIVE-18521) Vectorization: query failing in reducer VectorUDAFAvgDecimalPartial2 java.lang.ClassCastException StructTypeInfo --> DecimalTypeInfo

2018-01-23 Thread Matt McCline (JIRA)

Matt McCline created HIVE-18521:
---

 Summary: Vectorization: query failing in reducer 
VectorUDAFAvgDecimalPartial2 java.lang.ClassCastException StructTypeInfo --> 
DecimalTypeInfo
 Key: HIVE-18521
 URL: https://issues.apache.org/jira/browse/HIVE-18521
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline
Assignee: Matt McCline







[jira] [Created] (HIVE-18517) Vectorization: Fix VectorMapOperator to accept VRBs to support LLAP Caching

2018-01-23 Thread Matt McCline (JIRA)

Matt McCline created HIVE-18517:
---

 Summary: Vectorization: Fix VectorMapOperator to accept VRBs to 
support LLAP Caching
 Key: HIVE-18517
 URL: https://issues.apache.org/jira/browse/HIVE-18517
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matt McCline
Assignee: Matt McCline


LLAP is able to deserialize and cache data from an input format (e.g. 
TextInputFormat) and will deliver that cached data to VectorMapOperator as VRBs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (HIVE-18493) Add display escape for CR/LF to Hive CLI and Beeline

2018-01-18 Thread Matt McCline (JIRA)

Matt McCline created HIVE-18493:
---

 Summary: Add display escape for CR/LF to Hive CLI and Beeline
 Key: HIVE-18493
 URL: https://issues.apache.org/jira/browse/HIVE-18493
 Project: Hive
  Issue Type: Bug
  Components: Beeline, Hive
Affects Versions: 3.0.0
Reporter: Matt McCline
Assignee: Matt McCline


Add optional display escaping of carriage return and line feed so row output 
remains one line.
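
A minimal sketch of what such display escaping could do, assuming the escaped forms are the two-character sequences `\r` and `\n`; the class `CrLfEscaper` is hypothetical and is not the Hive CLI or Beeline code.

```java
// Hypothetical display escaper; not part of Hive CLI or Beeline.
final class CrLfEscaper {

    // Replaces carriage return and line feed with visible escapes so that a
    // field containing them still prints on a single output line.
    static String escape(String field) {
        StringBuilder sb = new StringBuilder(field.length());
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c == '\r') {
                sb.append("\\r");
            } else if (c == '\n') {
                sb.append("\\n");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

This sketch leaves existing backslashes alone, so the escaped output is meant for display only and cannot always be mapped back to the original value.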



