Csaba Ringhofer created IMPALA-14734:
----------------------------------------
Summary: Planning on large Iceberg tables can be dominated by
sorting
Key: IMPALA-14734
URL: https://issues.apache.org/jira/browse/IMPALA-14734
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Reporter: Csaba Ringhofer
Noticed that planning on large Iceberg tables can be faster when using
Iceberg's plan files than when using Impala's "optimized" path, which relies
on cached file descriptors. The reason seems to be that planning time is
dominated by sorting the file descriptors: each comparison decodes the UTF-8
paths from the backing FlatBuffer structure, so paths are decoded on the
order of n log(n) times instead of n times.
https://github.com/apache/impala/blob/3be15fd3598071eaeddd9b4d29e0883b95fdd14a/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java#L116
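
To illustrate the cost described above, here is a minimal, self-contained
sketch. It is not Impala's code: EncodedFd, Keyed, and both sort methods are
hypothetical names, and a plain byte[] stands in for the FlatBuffer-backed
path. It contrasts a comparator that decodes the path on every comparison
with a variant that decodes each path once and then sorts on the cached key.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these types exist in Impala.
final class DecodeOnceSort {

  /** Stand-in for a file descriptor whose path is stored as UTF-8 bytes. */
  static final class EncodedFd {
    // Counts decodes so the two strategies can be compared.
    static long decodeCount = 0;

    private final byte[] utf8Path;

    EncodedFd(String path) {
      this.utf8Path = path.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes the path on every call, like a getter over a byte buffer. */
    String getPath() {
      decodeCount++;
      return new String(utf8Path, StandardCharsets.UTF_8);
    }
  }

  /** Pairs a descriptor with its already-decoded sort key. */
  static final class Keyed {
    final String path;
    final EncodedFd fd;

    Keyed(String path, EncodedFd fd) {
      this.path = path;
      this.fd = fd;
    }
  }

  /** Decodes two paths per comparison, so O(n log n) decodes overall. */
  static List<EncodedFd> sortDecodingPerCompare(List<EncodedFd> fds) {
    List<EncodedFd> out = new ArrayList<>(fds);
    out.sort(Comparator.comparing(EncodedFd::getPath));
    return out;
  }

  /** Decodes each path exactly once, then sorts on the cached strings. */
  static List<EncodedFd> sortDecodingOnce(List<EncodedFd> fds) {
    List<Keyed> keyed = new ArrayList<>(fds.size());
    for (EncodedFd fd : fds) {
      keyed.add(new Keyed(fd.getPath(), fd));
    }
    keyed.sort(Comparator.comparing((Keyed k) -> k.path));
    List<EncodedFd> out = new ArrayList<>(keyed.size());
    for (Keyed k : keyed) {
      out.add(k.fd);
    }
    return out;
  }
}
```

The decode-once variant trades n temporary strings of extra memory for fewer
decodes. Whether that trade-off suits the actual planner code is an open
question for this issue.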