[GitHub] [spark] viirya commented on pull request #33142: [SPARK-35940][SQL] Refactor EquivalentExpressions to make it more efficient
viirya commented on pull request #33142: URL: https://github.com/apache/spark/pull/33142#issuecomment-873424131 Thanks! Merging to master/branch-3.2. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
viirya commented on pull request #33142: URL: https://github.com/apache/spark/pull/33142#issuecomment-873349727 @maropu Any more comments? Otherwise I will merge this tomorrow. Thanks.
viirya commented on pull request #33142: URL: https://github.com/apache/spark/pull/33142#issuecomment-871634042

> Can you briefly introduce your idea? Sorting by height is stable and fast now.

I have not looked at the details yet. Is sorting by height guaranteed to order expressions child before parent? I said the current sorting is not reliable because it can miss some cases: two expressions with no child-parent relation have no clear comparison order, so sorting expressions this way is unreliable. Does sorting by height solve that?
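To make the concern about "no clear comparison order" concrete, here is a minimal Python sketch. It is not Spark's comparator: the three expression names, the `PARENT_OF` table, and the `child_first` function are all made up for illustration. The comparator only knows how to order a child against its own parent and returns 0 for unrelated pairs, so it is a partial order rather than a total one.

```python
from functools import cmp_to_key

# Hypothetical expression tree: "a" is the parent of "c"; "b" is unrelated to both.
PARENT_OF = {"c": "a"}

def is_child(x, y):
    """True if x is a direct child of y in the toy tree."""
    return PARENT_OF.get(x) == y

def child_first(x, y):
    """Toy comparator: a child sorts before its parent, unrelated pairs tie."""
    if is_child(x, y):
        return -1
    if is_child(y, x):
        return 1
    return 0  # no child-parent relation, so no defined order

exprs = ["a", "b", "c"]
ordered = sorted(exprs, key=cmp_to_key(child_first))

# The sort only compares neighbours here ("a" vs "b", "b" vs "c"), both of
# which tie, so it never learns that "c" must come before "a".
child_precedes_parent = ordered.index("c") < ordered.index("a")
print(ordered, child_precedes_parent)
```

With this input the parent stays ahead of its child, because the unrelated expression sitting between them hides the relation from the sort. A key that is a total order, such as a numeric height, would not have this particular problem, which is what the question above is asking about.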
viirya commented on pull request #33142: URL: https://github.com/apache/spark/pull/33142#issuecomment-871104576

> Can you briefly introduce your idea? Sorting by height is stable and fast now.

Basically, the steps are:

1. Populate the `SubExprEliminationState` map for all subexprs (they do not need to be sorted). Only create the value and isNull variables; don't do codegen yet.
2. Iterate over all subexprs to do codegen. Because expression codegen looks up the map to replace subexprs, any subexpr among an expression's children is replaced and chained. So we don't need to sort subexprs in advance.
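The two steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Spark implementation: `Expr`, `State`, `gen_code`, and `eliminate` are hypothetical names, `State` is a loose stand-in for `SubExprEliminationState`, and the generated strings merely imitate Java code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Expr:
    """Toy expression node; frozen so it can be used as a dict key."""
    op: str
    children: tuple = ()

@dataclass
class State:
    """Hypothetical stand-in for SubExprEliminationState."""
    value: str
    is_null: str
    code: str = ""

def gen_code(expr, states, top=False):
    """Render expr, replacing any registered subexpr by its value variable."""
    if not top and expr in states:
        return states[expr].value
    if not expr.children:
        return expr.op
    args = ", ".join(gen_code(child, states) for child in expr.children)
    return f"{expr.op}({args})"

def eliminate(subexprs):
    # Step 1: register variable names for every subexpr; no codegen yet.
    states = {
        e: State(f"subExprValue_{i}", f"subExprIsNull_{i}")
        for i, e in enumerate(subexprs)
    }
    # Step 2: generate code in whatever order the subexprs arrive. Children
    # that are themselves subexprs are looked up in the map and replaced.
    for e in subexprs:
        st = states[e]
        st.code = f"{st.value} = {gen_code(e, states, top=True)};"
    return states

a, b = Expr("a"), Expr("b")
add = Expr("add", (a, b))
mul = Expr("mul", (add, a))

# Parent listed before child on purpose: no sorting is done.
states = eliminate([mul, add])
print(states[mul].code)
print(states[add].code)
```

Because every variable name exists before any code is generated, `mul` can refer to the variable for `add` even though `add` is processed later. The sketch covers only the reference replacement; how the generated snippets are ordered or invoked at run time is outside its scope.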
viirya commented on pull request #33142: URL: https://github.com/apache/spark/pull/33142#issuecomment-870971121

> track the "height" of common subexpressions, to quickly do child-parent sort.

About this, I think the sorting is not reliable, since a child-parent sort is hard to get right. I have another proposal, which I mentioned before, that gets rid of the sort.