[GitHub] [iceberg] ahshahid commented on issue #6039: Spark : Perf enhancement by leveraging Dynamic Partition Pruning rule of spark for non partition columns used as join condition

GitBox Fri, 04 Nov 2022 09:58:12 -0700


ahshahid commented on issue #6039:
URL: https://github.com/apache/iceberg/issues/6039#issuecomment-1303883448


   An update:
   For TPC-DS queries on limited data, enabling stats at the manifest level for non-partition columns still does not improve performance. The cost of the DPP query is pretty high, especially for TPC-DS queries 14a and 14b.
   But there is one thing I am going to try:
   1) For pruning on non-partition columns, we do not need the exact values of the join keys in DPP. So I am going to modify the Spark DPP query for non-partition columns to fetch only the max and min. I am hoping that the Spark-Iceberg code optimizes max/min queries by computing the answer using only the stats at the manifest file level. If so, this should reduce the cost of the DPP query and still allow range-based pruning at various levels in Iceberg.


