Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24049 )

Change subject: POC IMPALA-14805: LocalIcebergTable loads files in coordinator
......................................................................


Patch Set 7:

(12 comments)

Nice change! And sorry for the late response!

http://gerrit.cloudera.org:8080/#/c/24049/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/24049/7//COMMIT_MSG@24
PS7, Line 24: weak pointer
nit: weak reference


http://gerrit.cloudera.org:8080/#/c/24049/7//COMMIT_MSG@49
PS7, Line 49: inconsistant
nit: inconsistent


http://gerrit.cloudera.org:8080/#/c/24049/7/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
File fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java:

http://gerrit.cloudera.org:8080/#/c/24049/7/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@162
PS7, Line 162:   // The following members are considered immutable and 
shouldn't change after
             :   // constructing the object:
At some places we have BEGIN / END sections to make it more explicit. See e.g. 
StatementBase.java, parquet-column-readers.h


http://gerrit.cloudera.org:8080/#/c/24049/7/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@231
PS7, Line 231: new IcebergContentFileStore();
I wonder if Java is smart enough to skip the creation of the empty containers. 
Alternatively, we could introduce a new constructor that takes every immutable 
member.


http://gerrit.cloudera.org:8080/#/c/24049/7/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/24049/7/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@886
PS7, Line 886: prefixes
nit: Could be just 'List.of(getLocation())'


http://gerrit.cloudera.org:8080/#/c/24049/7/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/24049/7/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1257
PS7, Line 1257: getIcebergContentFileStore(final TableMetaRef table,
nit: fits earlier line


http://gerrit.cloudera.org:8080/#/c/24049/7/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1263
PS7, Line 1263: do
nit: to


http://gerrit.cloudera.org:8080/#/c/24049/7/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1271
PS7, Line 1271:     WeakReference<MetaProvider.CachedIcebergFiles> 
lastFileStore =
              :         (WeakReference<MetaProvider.CachedIcebergFiles>)
              :             cache_.getIfPresent(tableKey);
Is it possible to move this inside the call() method?


http://gerrit.cloudera.org:8080/#/c/24049/7/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1299
PS7, Line 1299:     // Mutable concurrent hash maps are used to cache files 
from time-travel queries,
              :     // which is thread-safe, but would increase the size of the 
cached object without
              :     // updating weight.
              :     // TODO: come up with a way to cache files in time-travel 
queries
I wonder if it even make sense to store time-travel FDs and partitions in the 
IcebergContentFileStore, since they are not being re-used anyway (except maybe 
in legacy catalog mode).


http://gerrit.cloudera.org:8080/#/c/24049/7/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java:

http://gerrit.cloudera.org:8080/#/c/24049/7/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java@138
PS7, Line 138: isInIcebergTable_
Is it necessary to have this? The internal FS table should be minimal since 
IMPALA-13737.


http://gerrit.cloudera.org:8080/#/c/24049/7/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/24049/7/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@98
PS7, Line 98: loadedInCatalog_
It is only written, never read.


http://gerrit.cloudera.org:8080/#/c/24049/7/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@158
PS7, Line 158:     // TODO: understand if this is needed with local loading
LocalTable.load() invokes t.loadColumnStats() -> 
metaProvider.loadTableColumnStatistics() -> getPartialCatalogObject() after 
loading the table. Since file loading happens between the first 
getPartialCatalogObject() and the last, I'd assume we still need to warmup the 
cache. Or it would be nice to avoid the CatalogD interaction in this case.



--
To view, visit http://gerrit.cloudera.org:8080/24049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6732af76a2e040fa57e39260302951466037b934
Gerrit-Change-Number: 24049
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 24 Mar 2026 14:08:52 +0000
Gerrit-HasComments: Yes

Reply via email to