Csaba Ringhofer has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/24052 )
Change subject: IMPALA-14792: Try avoiding hadoop.fs.Path when loading Iceberg tables
......................................................................

IMPALA-14792: Try avoiding hadoop.fs.Path when loading Iceberg tables

Quick and dirty solution to speed up IcebergFileMetadataLoader.
Its correctness is based on the assumption that Iceberg file
locations must be normalized.

Noticed in flamegraphs that the org.apache.hadoop.fs.Path constructor
is one of the main CPU consumers during Iceberg table loading,
especially during incremental reloads, when most file descriptors are
reused.

hadoop.fs.Path was used to relativize locations compared to the base
table location and to get the "path" part of the URI. These can be
done with simple String operations if we can assume that the URIs
are normalized.

Results on 1M file 25K partition Iceberg table:
Full load: 13s->10s
Incremental load (0 files): 9s->3.5s

The hadoop.fs.Path constructor still uses significant CPU time after
the change, but mainly in functions that run in parallel, so its
effect is no longer that visible in total execution time.

See Jira for before/after flamegraphs.

Change-Id: Idce89117195e0fa64fdd6a6c576bce09ec2e75ea
Reviewed-on: http://gerrit.cloudera.org:8080/24052
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Csaba Ringhofer <[email protected]>
---
M fe/src/main/java/org/apache/impala/catalog/FileDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java
3 files changed, 65 insertions(+), 25 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved
  Csaba Ringhofer: Verified

--
To view, visit http://gerrit.cloudera.org:8080/24052
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Idce89117195e0fa64fdd6a6c576bce09ec2e75ea
Gerrit-Change-Number: 24052
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
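
For readers without the diff at hand, here is a rough illustration of what "simple String operations" could look like for the two tasks named in the commit message: relativizing a file location against the table location, and extracting the "path" part of a URI. This is only a sketch under the same assumption as the patch (locations are already normalized). The class name LocationStrings and both method names are hypothetical and are not taken from the Impala change.

```java
// Hypothetical helper, NOT the code from the Impala patch.
// Assumes locations are normalized: no "..", no duplicate slashes,
// and the same scheme/authority spelling for table and file locations.
public final class LocationStrings {
  private LocationStrings() {}

  // Returns the part of fileLocation below tableLocation,
  // or null if fileLocation is not under tableLocation.
  public static String relativize(String tableLocation, String fileLocation) {
    // Append a trailing slash so ".../tbl" does not match ".../tbl2/...".
    String prefix = tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";
    if (!fileLocation.startsWith(prefix)) return null;
    return fileLocation.substring(prefix.length());
  }

  // Returns the path part of a "scheme://authority/path" location.
  // A string without "://" is treated as being a path already.
  public static String pathPart(String location) {
    int schemeEnd = location.indexOf("://");
    if (schemeEnd < 0) return location;
    // The path starts at the first '/' after the authority.
    int pathStart = location.indexOf('/', schemeEnd + 3);
    return pathStart < 0 ? "" : location.substring(pathStart);
  }
}
```

With a made-up table location "hdfs://nn:8020/warehouse/tbl", relativize() would turn "hdfs://nn:8020/warehouse/tbl/data/p=1/f.parquet" into "data/p=1/f.parquet" without constructing any Path or URI object. If the normalization assumption does not hold, a prefix check like this can give wrong answers, which is presumably why the commit message calls its correctness out explicitly.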
