[jira] [Resolved] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap

2024-06-25 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-12800.

Resolution: Fixed

> Queries with many nested inline views see performance issues with 
> ExprSubstitutionMap
> -
>
> Key: IMPALA-12800
> URL: https://issues.apache.org/jira/browse/IMPALA-12800
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
> Fix For: Impala 4.5.0
>
> Attachments: impala12800repro.sql, impala12800schema.sql, 
> long_query_jstacks.tar.gz
>
>
> A user running a query with many layers of inline views saw a large amount of 
> time spent in analysis. 
>  
> {noformat}
> - Authorization finished (ranger): 7s518ms (13.134ms)
> - Value transfer graph computed: 7s760ms (241.953ms)
> - Single node plan created: 2m47s (2m39s)
> - Distributed plan created: 2m47s (7.430ms)
> - Lineage info computed: 2m47s (39.017ms)
> - Planning finished: 2m47s (672.518ms){noformat}
> In reproducing it locally, we found that most of the stacks end up in 
> ExprSubstitutionMap.
>  
> Here are the main stacks seen while running jstack every 3 seconds during a 
> 75 second execution:
> Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr 
> equals) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at java.util.ArrayList.indexOf(ArrayList.java:323)
>     at java.util.ArrayList.contains(ArrayList.java:306)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat}
> Location 2:  (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat}
> Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat}
> Location 4:  (TupleIsNullPredicate.wrapExprs ->  Analyzer.isTrueWithNullSlots 
> -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at java.lang.StringCoding.encode(StringCoding.java:364)
>     at java.lang.String.getBytes(String.java:941)
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467)
>     at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:3034)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:2709)
>     at org.apache.impala.thrift.TQueryCtx.write(TQueryCtx.java:2400)
>     at org.apache.thrift.TSerializer.serialize(TSerializer.java:84)
>     at 
> org.apache.impala.service.FeSupport.EvalExprWithoutRowBounded(FeSupport.java:206)
>     at 
> org.apache.impala.service.FeSupport.EvalExprWithoutRow(FeSupport.java:194)
>     at org.apache.impala.service.FeSupport.EvalPredicate(FeSupport.java:275)
>     at 
> org.apache.impala.analysis.Analyzer.isTrueWithNullSlots(Analyzer.java:2888)
>     at 
> org.apache.impala.analysis.TupleIsNullPredicate.requiresNullWrapping(TupleIsNullPredicate.java:181)
>     at 
> org.apache.impala.analysis.TupleIsNullPredicate.wrapExpr(TupleIsNullPredicate.java:147)
>     at 
> org.apache.impala.analysis.TupleIsNullPredicate.wrapExprs(TupleIsNullPredicate.java:136){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap

2024-06-12 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-12800.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

Performance on these types of queries has been substantially improved. We saw 
an improvement of 20x on the example query. It would likely be more on larger 
queries as we switched from O(n^2) to O(n) operations for ExprSubstitutionMap.

> Queries with many nested inline views see performance issues with 
> ExprSubstitutionMap
> -
>
> Key: IMPALA-12800
> URL: https://issues.apache.org/jira/browse/IMPALA-12800
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
> Fix For: Impala 4.5.0
>
> Attachments: impala12800repro.sql, impala12800schema.sql, 
> long_query_jstacks.tar.gz
>
>
> A user running a query with many layers of inline views saw a large amount of 
> time spent in analysis. 
>  
> {noformat}
> - Authorization finished (ranger): 7s518ms (13.134ms)
> - Value transfer graph computed: 7s760ms (241.953ms)
> - Single node plan created: 2m47s (2m39s)
> - Distributed plan created: 2m47s (7.430ms)
> - Lineage info computed: 2m47s (39.017ms)
> - Planning finished: 2m47s (672.518ms){noformat}
> In reproducing it locally, we found that most of the stacks end up in 
> ExprSubstitutionMap.
>  
> Here are the main stacks seen while running jstack every 3 seconds during a 
> 75 second execution:
> Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr 
> equals) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at java.util.ArrayList.indexOf(ArrayList.java:323)
>     at java.util.ArrayList.contains(ArrayList.java:306)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat}
> Location 2:  (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat}
> Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat}
> Location 4:  (TupleIsNullPredicate.wrapExprs ->  Analyzer.isTrueWithNullSlots 
> -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at java.lang.StringCoding.encode(StringCoding.java:364)
>     at java.lang.String.getBytes(String.java:941)
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467)
>     at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:3034)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:2709)
>     at org.apache.impala.thrift.TQueryCtx.write(TQueryCtx.java:2400)
>     at org.apache.thrift.TSerializer.serialize(TSerializer.java:84)
>     at 
> org.apache.impala.service.FeSupport.EvalExprWithoutRowBounded(FeSupport.java:206)
>     at 
> org.apache.impala.service.FeSupport.EvalExprWithoutRow(FeSupport.java:194)
>     at org.apache.impala.service.FeSupport.EvalPredicate(FeSupport.java:275)
>     at 
> org.apache.impala.analysis.Analyzer.isTrueWithNullSlots(Analyzer.java:2888)
>     at 
> org.apache.impala.analysis.TupleIsNullPredicate.requiresNullWrapping(TupleIsNullPredicate.java:181)
>     at 
> org.apache.impala.analysis.TupleIsNullPredicate.wrapExpr(TupleIsNullPredicate.java:147)
>     at 
> org.apache.impala.analysis.TupleIsNullPredicate.wrapExprs(TupleIsNullPredicate.java:136){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h..