Csaba Ringhofer created IMPALA-14734:
----------------------------------------

             Summary: Planning on large iceberg tables can be dominated by sorting
                 Key: IMPALA-14734
                 URL: https://issues.apache.org/jira/browse/IMPALA-14734
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
            Reporter: Csaba Ringhofer


Noticed that planning on large Iceberg tables can be faster when using 
Iceberg's plan files than when using Impala's "optimized" path based on cached 
file descriptors. The reason seems to be that planning time is dominated by 
sorting the file descriptors: every comparison decodes the UTF-8 paths stored 
in the backing FlatBuffers structure, so the paths are decoded O(n log n) 
times instead of once per file:
https://github.com/apache/impala/blob/3be15fd3598071eaeddd9b4d29e0883b95fdd14a/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java#L116
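A minimal sketch of the effect described above, assuming a hypothetical 
FlatFileDesc class that keeps its path only as UTF-8 bytes (Impala's real 
FileDescriptor class and its FlatBuffers schema are not reproduced here). 
Sorting with a comparator that calls getPath() decodes two paths on every 
comparison, while decoding each key once before sorting (decorate-sort-
undecorate, also known as the Schwartzian transform) needs exactly n decodes. 
The issue does not say which fix Impala will adopt; the class and method names 
below are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for a FlatBuffers-backed file descriptor: the path is
// stored only as UTF-8 bytes, so every getPath() call decodes a new String.
// This is NOT Impala's actual FileDescriptor API.
final class FlatFileDesc {
  static long decodeCount = 0;  // total number of UTF-8 decodes performed
  private final byte[] pathUtf8;

  FlatFileDesc(String path) { pathUtf8 = path.getBytes(StandardCharsets.UTF_8); }

  String getPath() {
    decodeCount++;
    return new String(pathUtf8, StandardCharsets.UTF_8);
  }
}

public class SortByPathSketch {
  // The comparator decodes both paths on every comparison: ~2 * n log n decodes.
  static void sortNaive(List<FlatFileDesc> fds) {
    fds.sort(Comparator.comparing(FlatFileDesc::getPath));
  }

  // Decode each path once, then sort indices by the cached keys:
  // exactly n decodes no matter how many comparisons the sort makes.
  static void sortWithCachedKeys(List<FlatFileDesc> fds) {
    int n = fds.size();
    String[] keys = new String[n];
    Integer[] order = new Integer[n];
    for (int i = 0; i < n; i++) {
      keys[i] = fds.get(i).getPath();
      order[i] = i;
    }
    Arrays.sort(order, (a, b) -> keys[a].compareTo(keys[b]));
    List<FlatFileDesc> sorted = new ArrayList<>(n);
    for (int idx : order) sorted.add(fds.get(idx));
    for (int i = 0; i < n; i++) fds.set(i, sorted.get(i));
  }

  // Builds n descriptors with distinct, made-up paths in shuffled order.
  static List<FlatFileDesc> makeShuffled(int n, long seed) {
    List<FlatFileDesc> fds = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      fds.add(new FlatFileDesc(String.format("data/part-%05d.parquet", i)));
    }
    Collections.shuffle(fds, new Random(seed));
    return fds;
  }

  public static void main(String[] args) {
    List<FlatFileDesc> naive = makeShuffled(10_000, 42);
    List<FlatFileDesc> cached = new ArrayList<>(naive);

    FlatFileDesc.decodeCount = 0;
    sortNaive(naive);
    System.out.println("naive decodes:  " + FlatFileDesc.decodeCount);

    FlatFileDesc.decodeCount = 0;
    sortWithCachedKeys(cached);
    System.out.println("cached decodes: " + FlatFileDesc.decodeCount);
  }
}
```

Running main prints both decode counts: the cached variant performs exactly 
10,000 decodes, while the naive one grows with the number of comparisons 
(roughly 2 * n * log2(n)). The trade-off is holding n decoded Strings in 
memory for the duration of the sort.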



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
