[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints

2021-09-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17859 )

Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS 
event id when performing DDL operations via catalog HMS endpoints
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9
Gerrit-Change-Number: 17859
Gerrit-PatchSet: 1
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 21 Sep 2021 05:40:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] [WIP] IMPALA-9873: Avoid materilization of columns for filtered out rows in Parquet table.

2021-09-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17860 )

Change subject: [WIP] IMPALA-9873: Avoid materilization of columns for filtered 
out rows in Parquet table.
..


Patch Set 1:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/9476/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
Gerrit-Change-Number: 17860
Gerrit-PatchSet: 1
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 21 Sep 2021 00:25:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] [WIP] IMPALA-9873: Avoid materilization of columns for filtered out rows in Parquet table.

2021-09-20 Thread Amogh Margoor (Code Review)
Amogh Margoor has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17860


Change subject: [WIP] IMPALA-9873: Avoid materilization of columns for filtered 
out rows in Parquet table.
..

[WIP] IMPALA-9873: Avoid materilization of columns for filtered out rows in 
Parquet table.

Currently, entire row is materialized, before filtering upon it during
scan. Instead, cost can be saved if only the columns required for
filtering are materialized first and then rest of the columns are
materialized only for rows surviving after filter.

Performance: TBD

Testing: TBD

Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hdfs-columnar-scanner-ir.cc
M be/src/exec/hdfs-columnar-scanner.cc
M be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-collection-column-reader.cc
M be/src/exec/parquet/parquet-collection-column-reader.h
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-column-readers.h
M be/src/exec/scratch-tuple-batch.h
12 files changed, 344 insertions(+), 58 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/17860/1
--
To view, visit http://gerrit.cloudera.org:8080/17860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
Gerrit-Change-Number: 17860
Gerrit-PatchSet: 1
Gerrit-Owner: Amogh Margoor 


[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints

2021-09-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17859 )

Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS 
event id when performing DDL operations via catalog HMS endpoints
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9475/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9
Gerrit-Change-Number: 17859
Gerrit-PatchSet: 1
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 20 Sep 2021 23:50:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] [WIP]: Initial commit to acquire table/database lock in metastore server before any HMS operation

2021-09-20 Thread Sourabh Goyal (Code Review)
Sourabh Goyal has abandoned this change. ( 
http://gerrit.cloudera.org:8080/17703 )

Change subject: [WIP]: Initial commit to acquire table/database lock in 
metastore server before any HMS operation
..


Abandoned

Consolidating multiple patches into one i.e https://gerrit.cloudera.org/c/17859/
--
To view, visit http://gerrit.cloudera.org:8080/17703
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I085eab20db61282daf4549ddbcc018aaf63cc361
Gerrit-Change-Number: 17703
Gerrit-PatchSet: 9
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] [WIP]: Initial commit: Add logic to sync to latest event id

2021-09-20 Thread Sourabh Goyal (Code Review)
Sourabh Goyal has abandoned this change. ( 
http://gerrit.cloudera.org:8080/17756 )

Change subject: [WIP]: Initial commit: Add logic to sync to latest event id
..


Abandoned

Consolidating multiple patches into one i.e https://gerrit.cloudera.org/c/17859/
--
To view, visit http://gerrit.cloudera.org:8080/17756
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: Ia822d15725d0a9a9ad1398e10ed4ae3288d0e9ad
Gerrit-Change-Number: 17756
Gerrit-PatchSet: 31
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints

2021-09-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17859 )

Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS 
event id when performing DDL operations via catalog HMS endpoints
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7478/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9
Gerrit-Change-Number: 17859
Gerrit-PatchSet: 1
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 20 Sep 2021 23:31:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints

2021-09-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17859 )

Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS 
event id when performing DDL operations via catalog HMS endpoints
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17859/1/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java
File 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17859/1/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@704
PS1, Line 704: org.apache.impala.catalog.Table tbl = 
getTableAndAcquireWriteLock(partition.getDbName(),
line too long (92 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9
Gerrit-Change-Number: 17859
Gerrit-PatchSet: 1
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 20 Sep 2021 23:29:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints

2021-09-20 Thread Sourabh Goyal (Code Review)
Sourabh Goyal has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17859


Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS 
event id when performing DDL operations via catalog HMS endpoints
..

IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when 
performing
DDL operations via catalog HMS endpoints

Change-Id: I36364e401911352c474eb98c8d61bbaae9b9
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/catalog/metastore/HmsApiNameEnum.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/test/java/org/apache/impala/catalog/AlterDatabaseTest.java
A fe/src/test/java/org/apache/impala/catalog/MetastoreApiTestUtils.java
M 
fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/SynchronousHMSEventProcessorForTests.java
M 
fe/src/test/java/org/apache/impala/catalog/metastore/AbstractCatalogMetastoreTest.java
M 
fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsFileMetadataTest.java
A 
fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
M tests/custom_cluster/test_metastore_service.py
27 files changed, 3,535 insertions(+), 240 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/17859/1
--
To view, visit http://gerrit.cloudera.org:8080/17859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9
Gerrit-Change-Number: 17859
Gerrit-PatchSet: 1
Gerrit-Owner: Sourabh Goyal 


[Impala-ASF-CR] [WIP]: Initial commit: Add logic to sync to latest event id

2021-09-20 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17756 )

Change subject: [WIP]: Initial commit: Add logic to sync to latest event id
..


Patch Set 31:

(12 comments)

I will take another pass at this today or tomorrow since this patch is pretty 
big.

http://gerrit.cloudera.org:8080/#/c/17756/31/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/17756/31/be/src/catalog/catalog-server.cc@116
PS31, Line 116: enable_catalogd_cache_sync_to_latest_event_id
may be a better name would be enable_sync_to_latest_event_on_ddls. The 
"catalogd_cache" part of this config name is redundant.


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2141
PS31, Line 2141:   if (tbl == null) return null;
   :   LOG.debug("table {} exits in cache, last synced id {}", 
tbl.getFullName(),
   :   tbl.getLastSyncedEventId());
   :   // if no validWriteIdList is provided, we return the tbl 
if its loaded
   :   if (tbl.isLoaded() && validWriteIdList == null) {
   : LOG.debug("returning already loaded table {}", 
tbl.getFullName());
   : return tbl;
   :   }
can you change these to trace? They might spew a lot of logs otherwise.


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/TableLoader.java
File fe/src/main/java/org/apache/impala/catalog/TableLoader.java:

http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@62
PS31, Line 62: eventFactory_
It would be good to avoid having eventFactory_ here since it leaks events 
processor details out in unrelated classes like TableLoader.


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@188
PS31, Line 188: initMetrics
This is a weird place to have this method? Why are we initializing events 
processor metrics in TableLoader?


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@269
PS31, Line 269: NewMetastoreEventsFactory
Do we need a new factory? Can we not override the isSelfEvent() method directly 
in the MetastoreEvent.isSelfEvent() to look for 
enable_catalogd_cache_sync_to_latest_event_id config and return false early.


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@841
PS31, Line 841:   if (!catalog_.tryLockDb(db)) {
  : throw new CatalogException(String.format("Couldn't 
acquire lock on db %s",
  : db.getName()));
  :   }
  :   catalog_.getLock().writeLock().unlock();
  :   boolean shouldSkip = false;
  :   if (db.getLastSyncedEventId() >= eventId) {
  : LOG.info("Skipping event {} on db {} since it is 
already synced till event id {}",
  : eventInfo, dbName_, eventId);
  : shouldSkip = true;
  :   }
  :   db.getLock().unlock();
please use try .. finally pattern when dealing with locks. Also make sure you 
have some form of UnlockWriteLockIfErronouslyLocked() in the finally block.


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java
File 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@264
PS31, Line 264: import
not sure why are we commenting these?


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@316
PS31, Line 316: metastoreEventsMetrics_
I think it would be very confusing to have same named metrics at 2 different 
places in the code. Why can't we reuse existing metastore event metrics?


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@438
PS31, Line 438:   LOG.info("Adding db {} to catalogd, current event id {}", 
msDb.getName(),
  :   currentEventId);
  :   addDbToCatalog(currentEventId, msDb);
Why do we 

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

2021-09-20 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg 
tables
..


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@31
PS1, Line 31: service ID and new catalog version for the Iceberg table
> Doesn't Iceberg have something similar to catalog version, e.g. something l
+1 if we have native version number support in Iceberg it is better to use that 
detect self-events. I am okay doing it in a separate patch if needed. The 
self-events logic for ALTER events is up for a overhaul in 
https://gerrit.cloudera.org/#/c/17756/ which would use the eventId based 
detection instead of catalog version number.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@348
PS1, Line 348: hdfsTable_
 : .load(false, msClient, msTable_, true, true, false, 
null, null,null, reason);
 : pathHashToFileDescMap_ = Utils.loadAllPartition(this);
This is probably unrelated to this patch I was curious to understand what is 
the difference between the file descriptors which are loaded by 
hdfsTable_.load() and the loadAllPartition(this) method here. Is there a 
overlap in these 2 collections of file descriptors?


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@481
PS1, Line 481: TGetPartialCatalogObjectResponse resp =
 : getHdfsTable().getPartialInfo(req, missingPartialInfos);
 : if (req.table_info_selector.want_iceberg_snapshot) {
 :   resp.table_info.setIceberg_snapshot(
 :   FeIcebergTable.Utils.createTIcebergSnapshot(this));
 : }
are we sending some file descriptors twice here? Some from the hdfsTable and 
some from the pathHashToFileDescMap?


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@116
PS1, Line 116: pathHashToFileDescMap_
Am I correct in understanding that pathHashToFileDescMap_ is always loaded from 
the snapshot? If yes, does catalogd need to load the filedescriptors in the 
hdfsTable_ backing this iceberg table?


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2915
PS1, Line 2915: newCatalogVersion = catalog_.incrementAndGetCatalogVersion();
It is not clear to me why do we need to increment the catalog version here 
again?


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2918
PS1, Line 2918:   iceTxn.commitTransaction();
  :   return newCatalogVersion;
Can you move these statements into a finally block and avoid code duplication 
in the if block and later down below?


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6070
PS1, Line 6070: addVersionsForInflightEvents
Can we add test coverage for this in the test_self_events?



--
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Mon, 20 Sep 2021 20:48:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

2021-09-20 Thread Sourabh Goyal (Code Review)
Sourabh Goyal has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2713
PS5, Line 2713:   return hmsPartitions.size();
Shouldn't we return hmsPartToHdfsPart.size() since those are the actual 
partitions we are reloading in the cache?


http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@279
PS2, Line 279: if (!current.canBeBatched(next)) {
Currently one of the conditions of batching two events together is - event ids 
should be consecutive. This means that if there are events intermingled from 
multiple tables, then this batching logic won't be very effective. Do we see 
intermingled events in production? If yes, should we improve the batching logic 
here?


http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1641
PS2, Line 1641:   
MetastoreEventPropertyKey.CATALOG_VERSION.getKey(), "-1"));
> Since the default value of polling is 1s and we fetch the events in the bat
I understand that there may not be significant performance gain. I still feel 
that we should batch alter partition events from other clusters as well (and 
ignore self events when actually processing a batched event) considering that 
we would be syncing table till latest event id in future where self event logic 
will not be applicable anymore.



--
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 5
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 20 Sep 2021 20:31:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

2021-09-20 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
..


Patch Set 5:

(4 comments)

Did a first round. The change looks really great. I can take a deeper look on 
Wednesday.

http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@955
PS5, Line 955:
nit: too much indent


http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1027
PS5, Line 1027:
nit: +2 indent


http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1811
PS5, Line 1811:
nit: indentation is off


http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java
File fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java:

http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java@100
PS5, Line 100: idx >=0 && idx < insertEventIds_.size()
This precondition is checked by the Java runtime.



--
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 5
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 20 Sep 2021 19:23:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-09-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
..


Patch Set 1:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/9474/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Mon, 20 Sep 2021 18:41:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-09-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
..


Patch Set 1:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2604
PS1, Line 2604:   public boolean reloadTableIfExists(String dbName, String 
tblName, long eventId, String reason)
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2610
PS1, Line 2610: LOG.debug("Not reloading the table {}.{} since it is 
recreated.", dbName, tblName);
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1709
PS1, Line 1709:   
Preconditions.checkState(eventType_.equals(MetastoreEventType.ALLOC_WRITE_ID_EVENT));
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1712
PS1, Line 1712:   
MetastoreEventsProcessor.getMessageDeserializer().getAllocWriteIdMessage(event.getMessage());
line too long (103 > 90)


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1081
PS1, Line 1081: Path parentPath = partition == null ? new 
Path(msTbl.getSd().getLocation()) : new Path(partition.getSd().getLocation());
line too long (124 > 90)


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1083
PS1, Line 1083: List newFiles = addFilesToDirectory(deltaPath, 
"testFile.", totalNumberOfFilesToAdd, false);
line too long (104 > 90)


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2283
PS1, Line 2283:   private void testInsertIntoTransactionalTable(String tblName, 
boolean toAbort, boolean isPartitioned)
line too long (103 > 90)


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2297
PS1, Line 2297:   long writeId = 
MetastoreShim.allocateTableWriteId(client.getHiveClient(), txnId, TEST_DB_NAME, 
tblName);
line too long (110 > 90)


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2304
PS1, Line 2304: partition = 
client.getHiveClient().getPartition(TEST_DB_NAME, tblName, Arrays.asList("1"));
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3088
PS1, Line 3088: TEST_DB_NAME, tblName, test_insert_tbl, 
updated_partitions, isOverwrite, txnId, writeId);
line too long (97 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 20 Sep 2021 18:29:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-09-20 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17858


Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
..

IMPALA-10923: Fine grained table refreshing at partition level events for 
transactional tables

To enable fine-grained table refreshing, there are three main changes in this 
commit.
1. Maintain validWriteIdList in Catalogd for transactional tables. We will keep 
track
  of write id changes by AllocWriteIdEvents, CommitTxnEvents, and 
AbortTxnEvents.
2. Trigger partition level refreshing for addPartitionEvents, 
dropPartitionEvents, and
  AlterPartitionEvents.
3. Introduce a config incremental_refresh_acid, which can switch on/off the 
fine-grained
  table refreshing.

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
---
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
12 files changed, 672 insertions(+), 24 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/1
--
To view, visit http://gerrit.cloudera.org:8080/17858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-20 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 15:

(2 comments)

Just like to double check on the expected test results.

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test
File testdata/workloads/functional-query/queries/QueryTest/spilling.test:

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test@450
PS14, Line 450:
> I modified the unit test because autotest failed.FastLimitCheckExceededRows
See my other comment.


http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test
File testdata/workloads/targeted-perf/queries/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test@2729
PS14, Line 2729: 0
> Usually, rowbatch contains 1024 rows, stream agg processeed a rowbatch, the
I see. What I mean is to assure the value is greater than 0.

So maybe use [1-9][0-9]*?



--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 15
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Mon, 20 Sep 2021 16:13:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10714: Defer advancing read page until the buffer is attached

2021-09-20 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17853 )

Change subject: IMPALA-10714: Defer advancing read page until the buffer is 
attached
..


Patch Set 2:

(3 comments)

Looks good!.

Some additional test cases may be needed to assure the new logic is sound.

http://gerrit.cloudera.org:8080/#/c/17853/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17853/2//COMMIT_MSG@10
PS2, Line 10: expect
nit. expects


http://gerrit.cloudera.org:8080/#/c/17853/2/be/src/runtime/buffered-tuple-stream-test.cc
File be/src/runtime/buffered-tuple-stream-test.cc:

http://gerrit.cloudera.org:8080/#/c/17853/2/be/src/runtime/buffered-tuple-stream-test.cc@1388
PS2, Line 1388: PrepareForReadWrite
May need to add the following additional test cases

1. PrepareForRead(true, ...) for read only use case;
2. PrepareForReadWrite(false, ...) that will not delay the advance of the 1st 
read page.


http://gerrit.cloudera.org:8080/#/c/17853/2/be/src/runtime/buffered-tuple-stream-test.cc@1397
PS2, Line 1397: 2
May need to test #pages == 1.



--
To view, visit http://gerrit.cloudera.org:8080/17853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I586ed72ba01cc3f28b0dcb1e202b3ca32a6c3b83
Gerrit-Change-Number: 17853
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 20 Sep 2021 16:06:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

2021-09-20 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg 
tables
..


Patch Set 1:

(6 comments)

Some high level comments.
The code looks good to me so far, but I will go through it once more.

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@11
PS1, Line 11:  and the HDFS
: block locations were not consistent across the reload
This is not clear to me: if the block locations have changed between queries, 
then it makes sense to schedule them differently, but if they didn't change 
(which seems the more common case to me), then how are they different?


http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@25
PS1, Line 25: Fixing the above revealed another bug. Before this patch we didn't
: handle self-events of Iceberg tables.
Can you add more information about this? For example what events are generated 
in HMS for different Iceberg operations? I guess that schema changes will be 
treated similarly to the case when Impala calls HMS directly, but I can't 
imagine what happens in case of INSERTs at the moment.


http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@31
PS1, Line 31: service ID and new catalog version for the Iceberg table
Doesn't Iceberg have something similar to catalog version, e.g. something like 
"snapshot id", that could be used to check whether we need to reload data?

The current way we handle self-events is a horrible hack IMO which is needed 
because HMS doesn't provide a usable version number for table/partition 
metadata. Just hoping that Iceberg does have something like that :)


http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@34
PS1, Line 34:  * added e2e test for the scan range assignment
It is a bit tricky, but self-event handling can be also tested, see 
https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java#L508


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java@104
PS1, Line 104: properties.setProperty("external.table.purge", "TRUE");
Is this change related to the ones mentioned in the commit message?


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java@151
PS1, Line 151: properties.setProperty(IcebergTable.ICEBERG_CATALOG,
 :
tableProps.get(IcebergTable.ICEBERG_CATALOG));
Is this change related to the ones mentioned in the commit message?



--
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Mon, 20 Sep 2021 15:40:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10838: Error when struct returned from WITH()

2021-09-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17847 )

Change subject: IMPALA-10838: Error when struct returned from WITH()
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9473/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17847
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iadb9233677355b85d424cc3f22b00b5a3bf61c57
Gerrit-Change-Number: 17847
Gerrit-PatchSet: 5
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 20 Sep 2021 13:34:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

2021-09-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg 
tables
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9472/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Mon, 20 Sep 2021 13:33:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10838: Error when struct returned from WITH()

2021-09-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17847 )

Change subject: IMPALA-10838: Error when struct returned from WITH()
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17847/5/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java
File fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java:

http://gerrit.cloudera.org:8080/#/c/17847/5/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@119
PS5, Line 119: // Even though we are in a struct, it gives null, probably 
because it is not fully analyzed.
line too long (96 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17847
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iadb9233677355b85d424cc3f22b00b5a3bf61c57
Gerrit-Change-Number: 17847
Gerrit-PatchSet: 5
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 20 Sep 2021 13:13:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10838: Error when struct returned from WITH()

2021-09-20 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/17847 )

Change subject: IMPALA-10838: Error when struct returned from WITH()
..

IMPALA-10838: Error when struct returned from WITH()

The following query fails:
'''
with sub as (
select id, outer_struct
from functional_orc_def.complextypes_nested_structs)
select sub.id, sub.outer_struct.inner_struct2 from sub;
'''

with the following error:
'''
ERROR: IllegalStateException: Illegal reference to non-materialized
tuple: debugname=InlineViewRef sub alias=sub tid=6
'''

while if 'outer_struct.inner_struct2' is added to the select list of the
inline view, the query works as expected.

This change fixes the problem by two modifications:
  - if a field of a struct needs to be materialised, also materialise
all of its enclosing structs (ancestors)
  - in expression substitutions (in
org.apache.impala.analysis.Expr.substituteImpl), if there is no
substitution mapping for a field of a struct but there is a mapping
for an enclosing struct, find the mapping for that enclosing struct
and from its subexpressions use the one corresponding to the
original expression for substitution

This change also changes the way struct fields are materialised: until
now, if a member of a struct was needed to be materialised, the whole
struct, including other members of the struct were materialised. This
behaviour can lead to using significantly more memory than necessaty if
we for example query a single member of a large struct. This change
modifies this behaviour so that we only materialise the struct members
that are actually needed.

Tests:
  - added queries that are fixed by this change (including the one
above) in nested-struct-in-select-list.test

Change-Id: Iadb9233677355b85d424cc3f22b00b5a3bf61c57
---
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M 
testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test
6 files changed, 227 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/17847/5
--
To view, visit http://gerrit.cloudera.org:8080/17847
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iadb9233677355b85d424cc3f22b00b5a3bf61c57
Gerrit-Change-Number: 17847
Gerrit-PatchSet: 5
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

2021-09-20 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17857


Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg 
tables
..

IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Before this patch Impala inconsistently scheduled scan ranges for
Iceberg tables on HDFS, in local catalog mode. It did so because
LocalIcebergTable reloaded all the files descriptors, and the HDFS
block locations were not consistent across the reloads. Impala's
scheduler uses the block location list for scan range assignment,
hence the assignments were inconsistent between queries. This has
a negative effect on caching and hence hit performance quite badly.

It is redundant and expensive to reload file descriptors for each
query in local catalog mode. This patch extends the GetPartialInfo()
RPC with Iceberg-specific snapshot information. It means that the
coordinator is now able to fetch Iceberg data file descriptors from
the CatalogD. This way scan range assignment becomes consistent
because we reuse the same file descriptors with the same block
location information.

Fixing the above revealed another bug. Before this patch we didn't
handle self-events of Iceberg tables. When an Iceberg table is stored
in the HiveCatalog it means that Iceberg will update the HMS table
on modifications. Then Catalogd processes these modifications again
when they were arrive via the event notification mechanism. I fixed
this by creating Iceberg transactions in which I set the catalog
service ID and new catalog version for the Iceberg table.

Testing:
 * added e2e test for the scan range assignment

Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-catalogs.test
M tests/query_test/test_iceberg.py
16 files changed, 290 insertions(+), 104 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/17857/1
--
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10838: Error when struct returned from WITH()

2021-09-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17847 )

Change subject: IMPALA-10838: Error when struct returned from WITH()
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9471/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17847
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iadb9233677355b85d424cc3f22b00b5a3bf61c57
Gerrit-Change-Number: 17847
Gerrit-PatchSet: 4
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 20 Sep 2021 10:25:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10838: Error when struct returned from WITH()

2021-09-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17847 )

Change subject: IMPALA-10838: Error when struct returned from WITH()
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17847/4/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java
File fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java:

http://gerrit.cloudera.org:8080/#/c/17847/4/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@119
PS4, Line 119: // Even though we are in a struct, it gives null, probably 
because it is not fully analyzed.
line too long (96 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17847
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iadb9233677355b85d424cc3f22b00b5a3bf61c57
Gerrit-Change-Number: 17847
Gerrit-PatchSet: 4
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 20 Sep 2021 10:03:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10838: Error when struct returned from WITH()

2021-09-20 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17847 )

Change subject: IMPALA-10838: Error when struct returned from WITH()
..

IMPALA-10838: Error when struct returned from WITH()

The following query fails:
'''
with sub as (
select id, outer_struct
from functional_orc_def.complextypes_nested_structs)
select sub.id, sub.outer_struct.inner_struct2 from sub;
'''

with the following error:
'''
ERROR: IllegalStateException: Illegal reference to non-materialized
tuple: debugname=InlineViewRef sub alias=sub tid=6
'''

while if 'outer_struct.inner_struct2' is added to the select list of the
inline view, the query works as expected.

This change fixes the problem by two modifications:
  - if a field of a struct needs to be materialised, also materialise
all of its enclosing structs (ancestors)
  - in expression substitutions (in
org.apache.impala.analysis.Expr.substituteImpl), if there is no
substitution mapping for a field of a struct but there is a mapping
for an enclosing struct, find the mapping for that enclosing struct
and from its subexpressions use the one corresponding to the
original expression for substitution

This change also changes the way struct fields are materialised: until
now, if a member of a struct was needed to be materialised, the whole
struct, including other members of the struct were materialised. This
behaviour can lead to using significantly more memory than necessaty if
we for example query a single member of a large struct. This change
modifies this behaviour so that we only materialise the struct members
that are actually needed.

Tests:
  - added queries that are fixed by this change (including the one
above) in nested-struct-in-select-list.test

Change-Id: Iadb9233677355b85d424cc3f22b00b5a3bf61c57
---
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M 
testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test
6 files changed, 197 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/17847/4
--
To view, visit http://gerrit.cloudera.org:8080/17847
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iadb9233677355b85d424cc3f22b00b5a3bf61c57
Gerrit-Change-Number: 17847
Gerrit-PatchSet: 4
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins