[jira] [Commented] (HIVE-11415) Add early termination for recursion in vectorization for deep filter queries
[ https://issues.apache.org/jira/browse/HIVE-11415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652450#comment-14652450 ]

Matt McCline commented on HIVE-11415:

[~jvaria] FYI.

Add early termination for recursion in vectorization for deep filter queries

Key: HIVE-11415
URL: https://issues.apache.org/jira/browse/HIVE-11415
Project: Hive
Issue Type: Bug
Reporter: Prasanth Jayachandran
Assignee: Matt McCline

Queries with deep (left-deep) filters throw a StackOverflowError in vectorization:
{code}
Exception in thread "main" java.lang.StackOverflowError
    at java.lang.Class.getAnnotation(Class.java:3415)
    at org.apache.hive.common.util.AnnotationUtils.getAnnotation(AnnotationUtils.java:29)
    at org.apache.hadoop.hive.ql.exec.vector.VectorExpressionDescriptor.getVectorExpressionClass(VectorExpressionDescriptor.java:332)
    at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpressionForUdf(VectorizationContext.java:988)
    at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:1164)
    at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:439)
    at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.createVectorExpression(VectorizationContext.java:1014)
    at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpressionForUdf(VectorizationContext.java:996)
    at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:1164)
{code}

Sample query:
{code}
explain select count(*) from over1k where ( (t=1 and si=2) or (t=2 and si=3) or (t=3 and si=4) or (t=4 and si=5) or (t=5 and si=6) or (t=6 and si=7) or (t=7 and si=8) ... ..
{code}
Repeat the filter a few thousand times to reproduce the issue.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
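For anyone who wants to reproduce this without typing the filter by hand, here is a minimal sketch of a generator for the sample query. The class name DeepFilterQueryGenerator and the method build are hypothetical helpers, not part of Hive. The sketch also assumes that the elided part of the query simply continues the visible pattern (t=i and si=i+1), and the default of 8000 disjuncts is only one choice of "a few thousand".

```java
// Hypothetical helper, not part of Hive: prints a repro query with n OR'ed disjuncts.
// Assumption: the elided part of the sample query continues the pattern (t=i and si=i+1).
final class DeepFilterQueryGenerator {

    /** Builds "explain select count(*) from over1k where ( (t=1 and si=2) or ... )". */
    static String build(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        StringBuilder sb = new StringBuilder("explain select count(*) from over1k where (");
        for (int i = 1; i <= n; i++) {
            if (i > 1) {
                sb.append(" or");
            }
            sb.append(" (t=").append(i).append(" and si=").append(i + 1).append(')');
        }
        return sb.append(" )").toString();
    }

    public static void main(String[] args) {
        // Number of disjuncts; 8000 is an arbitrary default in the "few thousand" range.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 8000;
        System.out.println(build(n));
    }
}
```

The output can be redirected to a file and run through the Hive CLI of your choice.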
[jira] [Commented] (HIVE-11415) Add early termination for recursion in vectorization for deep filter queries
[ https://issues.apache.org/jira/browse/HIVE-11415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651085#comment-14651085 ]

Matt McCline commented on HIVE-11415:

The example could be viewed as an extension of SQL's IN clause
{code}
column_name IN (value1,value2,...)
{code}
where we extend it to support struct constants/tuples in IN:
{code}
(t, si) IN ((1,2), (2,3), (3,4), (4,5), ...)
{code}
Rather than evaluating 8,000 OR expression nodes, do a single hash table lookup.

When there are lots of OR expressions with different columns / expressions, the vectorized OR operator could be generalized to ANY (as Gopal suggested) so that it could look at more than 2 conditions in one evaluation. I share Gopal's concern, though, that the planner may make subtle assumptions about OR having exactly 2 arguments.

Note: today vectorization does not support structs.
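To make the tuple-IN idea from this comment concrete, here is a minimal sketch of replacing a chain of (t=a and si=b) disjuncts with one hash-set probe per row. It is not Hive code: the class TupleInLookup, its methods, and the use of plain int arrays with a "selected" index array are all assumptions chosen for illustration, and packing the pair into a long stands in for real struct support, which the comment notes vectorization does not have today.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Hive's API: evaluates (t, si) IN ((1,2), (2,3), ...)
// with one hash probe per row instead of walking thousands of OR nodes.
final class TupleInLookup {
    private final Set<Long> keys = new HashSet<>();

    /** Each entry of tuples is a two-element array {t, si}. */
    TupleInLookup(int[][] tuples) {
        for (int[] tuple : tuples) {
            keys.add(pack(tuple[0], tuple[1]));
        }
    }

    /** Packs the pair into one long so it can serve as a hash key. */
    private static long pack(int t, int si) {
        return ((long) t << 32) | (si & 0xFFFFFFFFL);
    }

    boolean matches(int t, int si) {
        return keys.contains(pack(t, si));
    }

    /**
     * Filters a batch in place: selected[0..size) holds candidate row indexes,
     * and on return selected[0..result) holds the rows that matched.
     */
    int filter(int[] tCol, int[] siCol, int[] selected, int size) {
        int out = 0;
        for (int i = 0; i < size; i++) {
            int row = selected[i];
            if (matches(tCol[row], siCol[row])) {
                selected[out++] = row;
            }
        }
        return out;
    }
}
```

The cost per row is a single probe regardless of how many tuples are in the list, whereas the OR tree costs grow with the number of disjuncts.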
[jira] [Commented] (HIVE-11415) Add early termination for recursion in vectorization for deep filter queries
[ https://issues.apache.org/jira/browse/HIVE-11415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649926#comment-14649926 ]

Gopal V commented on HIVE-11415:

The right fix for this is to take the ~8000-node OR tree and turn it into a balanced tree ~14 levels deep.

Failing to convert the tree to vectorization would be a bad idea in general, because this error can be progressively bypassed by running ...
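The rebalancing Gopal describes can be sketched as follows. This is an illustration only, not the Hive patch: the Node class and the OrTreeBalancer helpers are hypothetical, leaves stand in for arbitrary predicates such as (t=1 and si=2), and the sketch only handles the left-deep shape from the sample query. The chain is flattened iteratively, so the sketch does not itself recurse along the deep side, and the rebuild recurses only to logarithmic depth.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch, not Hive code: rebalances a left-deep OR chain.
final class OrTreeBalancer {

    /** Minimal expression node: a leaf predicate, or a binary OR of two children. */
    static final class Node {
        final String label;
        final Node left;
        final Node right;
        Node(String label) { this.label = label; this.left = null; this.right = null; }
        Node(Node left, Node right) { this.label = "OR"; this.left = left; this.right = right; }
        boolean isLeaf() { return left == null; }
    }

    /** Builds the left-deep chain (((p1 OR p2) OR p3) OR ...) with n leaves. */
    static Node leftDeep(int n) {
        Node root = new Node("p1");
        for (int i = 2; i <= n; i++) {
            root = new Node(root, new Node("p" + i));
        }
        return root;
    }

    /** Collects the operands of a left-deep OR chain in order, without recursion. */
    static List<Node> flatten(Node root) {
        List<Node> operands = new ArrayList<>();
        Node cur = root;
        while (!cur.isLeaf()) {
            operands.add(cur.right);
            cur = cur.left;
        }
        operands.add(cur);
        Collections.reverse(operands);
        return operands;
    }

    /** Rebuilds ops[lo, hi) as a balanced OR tree; recursion depth is O(log n). */
    static Node balance(List<Node> ops, int lo, int hi) {
        if (hi - lo == 1) {
            return ops.get(lo);
        }
        int mid = (lo + hi) >>> 1;
        return new Node(balance(ops, lo, mid), balance(ops, mid, hi));
    }

    static Node rebalance(Node root) {
        List<Node> ops = flatten(root);
        return balance(ops, 0, ops.size());
    }

    /** Depth in edges, computed with an explicit stack so deep trees are safe. */
    static int depth(Node root) {
        Deque<Node> nodes = new ArrayDeque<>();
        Deque<Integer> depths = new ArrayDeque<>();
        nodes.push(root);
        depths.push(0);
        int max = 0;
        while (!nodes.isEmpty()) {
            Node n = nodes.pop();
            int d = depths.pop();
            max = Math.max(max, d);
            if (!n.isLeaf()) {
                nodes.push(n.left);
                depths.push(d + 1);
                nodes.push(n.right);
                depths.push(d + 1);
            }
        }
        return max;
    }

    public static void main(String[] args) {
        Node deep = leftDeep(8000);
        System.out.println("left-deep depth: " + depth(deep));
        System.out.println("balanced depth:  " + depth(rebalance(deep)));
    }
}
```

For 8000 operands the left-deep chain is 7999 edges deep, while the balanced tree is 13 edges deep, that is, 14 levels of nodes, which is consistent with the "~14 levels" estimate. Because OR is associative, regrouping the operands this way does not change the result of the filter.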