Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20574 )

Change subject: IMPALA-7131: Support external data sources in LocalCatalog mode
......................................................................


Patch Set 5:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/20574/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/20574/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3664
PS5, Line 3664: Lists.newArrayList()
nit: use Collections.emptyList() to avoid creating new array objects.


http://gerrit.cloudera.org:8080/#/c/20574/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3675
PS5, Line 3675: Lists.newArrayList()
nit: use Collections.emptyList()


http://gerrit.cloudera.org:8080/#/c/20574/3/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/20574/3/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@671
PS3, Line 671:     if (dsName == null) dsName = "";
             :     req.object_desc.setData_source(new TDataSource(dsName, "", 
"", ""));
             :     return req;
             :   }
             :
             :   @Override
             :   public Database loadDb(final String dbName) throws TException {
             :     return loadWithCaching("database metadata for " + dbName,
             :         DB_METADATA_STATS_CATEGORY,
> Changed code as suggested.
Ack


http://gerrit.cloudera.org:8080/#/c/20574/3/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1233
PS3, Line 1233:             TGetPartialCatalogObjectResponse resp = 
sendRequest(req);
> It's unused now. Plan to add more fields in TDataSource object after making
I see. I think the DataSource name list could be inconsistent with the 
DataSource object list since they are fetched at different time. E.g. the cache 
might have a name list of 3 items but the object list is stale and just have 2 
items.

We might need additional invalidation across them, i.e. when fetched the name 
list, invalidate the object list, and vice versus. If we won't add too many 
large fields in TDataSource object, maybe using one list is simpler. E.g. just 
like how we cache functions, we cache the function name list of each db and 
cache each individual function object.


http://gerrit.cloudera.org:8080/#/c/20574/5/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java:

http://gerrit.cloudera.org:8080/#/c/20574/5/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java@232
PS5, Line 232:       LOG.info("Unable to load DataSource objects, " + 
e.getMessage());
nit: the stacktrace might be helpful, so let's log it as

 LOG.info("Unable to load DataSource objects", e);


http://gerrit.cloudera.org:8080/#/c/20574/5/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java@244
PS5, Line 244:       LOG.info("DataSource not found: " + dsName + ", " + 
e.getMessage());
nit: log the stacktrace

  LOG.info("DataSource not found: " + dsName, e);


http://gerrit.cloudera.org:8080/#/c/20574/5/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test:

http://gerrit.cloudera.org:8080/#/c/20574/5/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test@5
PS5, Line 5: CREATE DATA SOURCE AllTypesDataSource
I see this is created in testdata/bin/create-ext-data-source-table.sql. Are we 
re-creating it to test creating data sources?
https://github.com/apache/impala/blob/c244aadcf367360e52807a84e7fba8b6237651fd/testdata/bin/create-ext-data-source-table.sql#L24


http://gerrit.cloudera.org:8080/#/c/20574/5/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test@184
PS5, Line 184: DROP DATA SOURCE AllTypesDataSource;
Wouldn't this impact other tests that depend on this data source? Or is this 
the only test that use it?



--
To view, visit http://gerrit.cloudera.org:8080/20574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I40841c9be9064ac67771c4d3f5acbb3b552a2e55
Gerrit-Change-Number: 20574
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <gsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 25 Oct 2023 06:45:21 +0000
Gerrit-HasComments: Yes

Reply via email to