[jira] [Commented] (HIVE-11415) Add early termination for recursion in vectorization for deep filter queries

2015-08-03 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652450#comment-14652450
 ] 

Matt McCline commented on HIVE-11415:
-

[~jvaria] FYI.

 Add early termination for recursion in vectorization for deep filter queries
 

 Key: HIVE-11415
 URL: https://issues.apache.org/jira/browse/HIVE-11415
 Project: Hive
  Issue Type: Bug
Reporter: Prasanth Jayachandran
Assignee: Matt McCline

 Queries with deep (left-deep) filters throw a StackOverflowError during 
 vectorization:
 {code}
 Exception in thread "main" java.lang.StackOverflowError
   at java.lang.Class.getAnnotation(Class.java:3415)
   at org.apache.hive.common.util.AnnotationUtils.getAnnotation(AnnotationUtils.java:29)
   at org.apache.hadoop.hive.ql.exec.vector.VectorExpressionDescriptor.getVectorExpressionClass(VectorExpressionDescriptor.java:332)
   at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpressionForUdf(VectorizationContext.java:988)
   at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:1164)
   at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:439)
   at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.createVectorExpression(VectorizationContext.java:1014)
   at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpressionForUdf(VectorizationContext.java:996)
   at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:1164)
 {code}
 Sample query:
 {code}
 explain select count(*) from over1k where (
 (t=1 and si=2)
 or (t=2 and si=3)
 or (t=3 and si=4) 
 or (t=4 and si=5) 
 or (t=5 and si=6) 
 or (t=6 and si=7) 
 or (t=7 and si=8)
 ...
 ..
 {code}
 Repeat the filter a few thousand times to reproduce the issue.
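
 Writing a few thousand terms by hand is impractical, so a small generator helps. The following is only a sketch: the class name DeepFilterQuery, the default term count, and the closing parenthesis are assumptions, and it simply continues the (t=i and si=i+1) pattern of the sample without checking the column types of over1k.

```java
public class DeepFilterQuery {
    // Builds the sample query with n OR-ed terms of the form (t=i and si=i+1).
    // The final ")" is an assumption; the sample query above is elided before it ends.
    static String build(int n) {
        StringBuilder sb = new StringBuilder("explain select count(*) from over1k where (\n");
        for (int i = 1; i <= n; i++) {
            if (i > 1) {
                sb.append("or ");
            }
            sb.append("(t=").append(i).append(" and si=").append(i + 1).append(")\n");
        }
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        // Term count is a hypothetical default; pass another value as the first argument.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 3000;
        System.out.println(build(n));
    }
}
```

 The output can be redirected to a file and submitted to Hive as a script.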



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11415) Add early termination for recursion in vectorization for deep filter queries

2015-08-02 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651085#comment-14651085
 ] 

Matt McCline commented on HIVE-11415:
-


The example could be viewed as an extension of SQL's IN clause:

{code}
column_name IN (value1,value2,...)
{code}

where we extend it to support struct constants/tuples in IN:

{code}
(t, si) IN ((1,2), (2,3), (3,4), (4,5), ...)
{code}

Rather than evaluating 8,000 OR expression nodes, do a single hash table lookup.
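
To illustrate the idea outside of Hive, here is a minimal sketch of a tuple IN evaluated with one hash lookup per row. The class TupleInFilter and its methods are hypothetical names, not Hive APIs, and both columns are assumed to fit in an int.

```java
import java.util.HashSet;
import java.util.Set;

public class TupleInFilter {
    private final Set<Long> allowed = new HashSet<>();

    // pairs holds the constant tuples, e.g. {{1, 2}, {2, 3}, ...}.
    TupleInFilter(int[][] pairs) {
        for (int[] p : pairs) {
            allowed.add(key(p[0], p[1]));
        }
    }

    // Packs a (t, si) pair into one long so it can serve as a hash key.
    static long key(int t, int si) {
        return ((long) t << 32) | (si & 0xFFFFFFFFL);
    }

    // One hash lookup replaces the whole chain of (t=.. and si=..) OR terms.
    boolean matches(int t, int si) {
        return allowed.contains(key(t, si));
    }

    public static void main(String[] args) {
        TupleInFilter f = new TupleInFilter(new int[][] {{1, 2}, {2, 3}, {3, 4}});
        System.out.println(f.matches(2, 3));
        System.out.println(f.matches(2, 2));
    }
}
```

The cost per row no longer depends on the number of constant tuples, and no deep expression tree is ever built.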

When there are many OR expressions over different columns or expressions, 
the vectorized OR operator could be generalized to ANY (as Gopal suggested) so 
that a single evaluation can look at more than 2 conditions.  I share Gopal's 
concern, though, that the planner may make subtle assumptions about OR having 
exactly 2 arguments.
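
As a rough sketch of what an n-ary ANY could look like, the code below evaluates a flat list of child predicates over a batch of rows with a loop instead of nested binary ORs. AnyFilter and any are hypothetical names, and the boolean-array representation is an assumption made for brevity; it is not how Hive's vectorized expressions are implemented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

public class AnyFilter {
    // Marks row i as selected if any child predicate accepts row index i.
    // Children are visited in a flat loop, so the number of conditions
    // does not add to the call-stack depth.
    static boolean[] any(int rows, List<IntPredicate> children) {
        boolean[] selected = new boolean[rows];
        for (IntPredicate child : children) {
            for (int i = 0; i < rows; i++) {
                if (!selected[i] && child.test(i)) {
                    selected[i] = true;
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        int[] t = {1, 2, 9};
        int[] si = {2, 3, 9};
        List<IntPredicate> children = Arrays.asList(
            i -> t[i] == 1 && si[i] == 2,
            i -> t[i] == 2 && si[i] == 3);
        System.out.println(Arrays.toString(any(t.length, children)));
    }
}
```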

Note: today vectorization does not support structs.



[jira] [Commented] (HIVE-11415) Add early termination for recursion in vectorization for deep filter queries

2015-07-31 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649926#comment-14649926
 ] 

Gopal V commented on HIVE-11415:


The right fix for this is to go ahead and take a ~8000 OR tree and turn it into 
a balanced tree ~14 levels deep.
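
The rebalancing can be sketched as follows. This is a standalone illustration with hypothetical names (BalancedOr, Node), not Hive's expression classes; each leaf stands for one (t=.. and si=..) term, and the depth counts OR levels only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class BalancedOr {
    static class Node {
        final String leaf;      // non-null for a leaf term
        final Node left, right; // non-null for an OR node
        Node(String leaf) { this.leaf = leaf; this.left = null; this.right = null; }
        Node(Node left, Node right) { this.leaf = null; this.left = left; this.right = right; }
    }

    // Left-deep chain: ((a OR b) OR c) OR d ...
    static Node leftDeep(List<Node> leaves) {
        Node acc = leaves.get(0);
        for (int i = 1; i < leaves.size(); i++) {
            acc = new Node(acc, leaves.get(i));
        }
        return acc;
    }

    // Balanced tree over leaves[from, to): split in half, so depth grows as log2(n).
    static Node balanced(List<Node> leaves, int from, int to) {
        if (to - from == 1) {
            return leaves.get(from);
        }
        int mid = (from + to) >>> 1;
        return new Node(balanced(leaves, from, mid), balanced(leaves, mid, to));
    }

    // Iterative depth, so measuring the left-deep tree cannot itself overflow the stack.
    static int depth(Node root) {
        Deque<Node> nodes = new ArrayDeque<>();
        Deque<Integer> depths = new ArrayDeque<>();
        nodes.push(root);
        depths.push(0);
        int max = 0;
        while (!nodes.isEmpty()) {
            Node n = nodes.pop();
            int d = depths.pop();
            if (n.leaf != null) {
                max = Math.max(max, d);
                continue;
            }
            nodes.push(n.left);
            depths.push(d + 1);
            nodes.push(n.right);
            depths.push(d + 1);
        }
        return max;
    }

    public static void main(String[] args) {
        List<Node> leaves = new ArrayList<>();
        for (int i = 1; i <= 8000; i++) {
            leaves.add(new Node("(t=" + i + " and si=" + (i + 1) + ")"));
        }
        System.out.println(depth(leftDeep(leaves)));                  // 7999
        System.out.println(depth(balanced(leaves, 0, leaves.size()))); // 13
    }
}
```

For 8000 leaves the left-deep chain is 7999 OR levels deep while the balanced tree is 13, which is in line with the rough figure above once the level beneath each OR leaf is counted.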

Failing to convert the tree to vectorization would be a bad idea in general, 
because this error can be progressively bypassed by running
