[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 16:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9482/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 16
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Wed, 22 Sep 2021 04:17:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 17:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7479/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 17
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Wed, 22 Sep 2021 04:03:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-21 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 17:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test
File testdata/workloads/functional-query/queries/QueryTest/spilling.test:

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test@450
PS14, Line 450:
> See my other comment.
In some cases, FastLimitCheckExceededRows is 0,
e.g. SET debug_action=-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0;


http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test
File testdata/workloads/targeted-perf/queries/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test@2729
PS14, Line 2729: 1
> I see. What I mean is to assure the value is greater than 0.
Done



--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 17
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Wed, 22 Sep 2021 04:02:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

2021-09-21 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
..


Patch Set 6:

(2 comments)

Thanks Vihang for introducing a way for batch event processing. I will rebase 
my patch for IMPALA-10923 on top of this patch.

http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@270
PS6, Line 270: i=0, j=1
nit: spaces around assignment operator


http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1803
PS6, Line 1803: for (T event : batchedEvents_) {
It seems ignoredPartitions still being processed here.



--
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 6
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 22 Sep 2021 03:55:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-21 Thread liuyao (Code Review)
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#16).

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..

IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. When both conditions below are true, push down limit to
pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Sample queries:
SELECT DISTINCT f FROM t LIMIT n
Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Testing:
Add test targeted-perf/queries/aggregation.test
Pass core test

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
19 files changed, 147 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/16
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 16
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-10922: Fix wrong sarg on ORC Date column in release build

2021-09-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17861 )

Change subject: IMPALA-10922: Fix wrong sarg on ORC Date column in release build
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9481/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17861
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieda467c822cf5c6f0edc0ddfe4da61c46b04e51f
Gerrit-Change-Number: 17861
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 22 Sep 2021 02:42:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10922: Fix wrong sarg on ORC Date column in release build

2021-09-21 Thread Quanlong Huang (Code Review)
Quanlong Huang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17861


Change subject: IMPALA-10922: Fix wrong sarg on ORC Date column in release build
..

IMPALA-10922: Fix wrong sarg on ORC Date column in release build

One line of code is wrapped with DCHECK in generating ORC Date type
literal for predicate pushdown. Release build will ignore all codes
wrapped in DCHECK so that line of logic is missing. This patch fixes it
by moving the code outside of DCHECK.

Tests:
 - Reproduced the issue and verified the fix in release build.

Change-Id: Ieda467c822cf5c6f0edc0ddfe4da61c46b04e51f
---
M be/src/exec/hdfs-orc-scanner.cc
1 file changed, 2 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/17861/1
--
To view, visit http://gerrit.cloudera.org:8080/17861
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ieda467c822cf5c6f0edc0ddfe4da61c46b04e51f
Gerrit-Change-Number: 17861
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10876: Support to download JWKS from given URL

2021-09-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17802 )

Change subject: IMPALA-10876: Support to download JWKS from given URL
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9479/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17802
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic6ac8cf0010c13db30219776d1d275709bf211df
Gerrit-Change-Number: 17802
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 21 Sep 2021 22:19:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints

2021-09-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17859 )

Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS 
event id when performing DDL operations via catalog HMS endpoints
..


Patch Set 2:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/9480/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9
Gerrit-Change-Number: 17859
Gerrit-PatchSet: 2
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 21 Sep 2021 22:09:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints

2021-09-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17859 )

Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS 
event id when performing DDL operations via catalog HMS endpoints
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17859/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java
File 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17859/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@509
PS2, Line 509: org.apache.impala.catalog.Table tbl = 
getTableAndAcquireWriteLock(partition.getDbName(),
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/17859/2/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java
File 
fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java:

http://gerrit.cloudera.org:8080/#/c/17859/2/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@367
PS2, Line 367: HdfsTable currentTbl = (HdfsTable) 
catalog_.getTable(TEST_DB_NAME, tblNameLowerCase);
line too long (97 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9
Gerrit-Change-Number: 17859
Gerrit-PatchSet: 2
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 21 Sep 2021 22:01:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [WIP]: Initial commit: Add logic to sync to latest event id

2021-09-21 Thread Sourabh Goyal (Code Review)
Sourabh Goyal has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17756 )

Change subject: [WIP]: Initial commit: Add logic to sync to latest event id
..


Patch Set 31:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/17756/31/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/17756/31/be/src/catalog/catalog-server.cc@116
PS31, Line 116: enable_catalogd_cache_sync_to_latest_event_id
> may be a better name would be enable_sync_to_latest_event_on_ddls. The "cat
Yes, that sounds better. Will rename it


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2141
PS31, Line 2141:   if (tbl == null) return null;
   :   LOG.debug("table {} exits in cache, last synced id {}", 
tbl.getFullName(),
   :   tbl.getLastSyncedEventId());
   :   // if no validWriteIdList is provided, we return the tbl 
if its loaded
   :   if (tbl.isLoaded() && validWriteIdList == null) {
   : LOG.debug("returning already loaded table {}", 
tbl.getFullName());
   : return tbl;
   :   }
> can you change these to trace? They might spew a lot of logs otherwise.
Ack


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@269
PS31, Line 269: NewMetastoreEventsFactory
> Do we need a new factory? Can we not override the isSelfEvent() method dire
The new factory is not really required at this point in time and we can 
override the existing isSelfEvent() method. Thats why I had added a TODO as 
well comment to get suggestion on the same


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@841
PS31, Line 841:   if (!catalog_.tryLockDb(db)) {
  : throw new CatalogException(String.format("Couldn't 
acquire lock on db %s",
  : db.getName()));
  :   }
  :   catalog_.getLock().writeLock().unlock();
  :   boolean shouldSkip = false;
  :   if (db.getLastSyncedEventId() >= eventId) {
  : LOG.info("Skipping event {} on db {} since it is 
already synced till event id {}",
  : eventInfo, dbName_, eventId);
  : shouldSkip = true;
  :   }
  :   db.getLock().unlock();
> please use try .. finally pattern when dealing with locks. Also make sure y
Yes I usually follow try finally pattern for locks. Didn't do it here because 
b/w db lock and unlock we are not doing anything that can throw an error. 
Nevertheless, will fix it.

There is one more way to deal with it:
We can skip acquiring lock on db and can simply check if 
db.getLastSyncedEventId() >= eventId here. If the condition passes, we anyways 
acquire lock on db when processing an event. At that time, we again compare 
db's last synced event id with event id of the current event.

Thoughts?


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java
File 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@264
PS31, Line 264: import
> not sure why are we commenting these?
These imports are not required. They were accidently added. Will remove them


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@316
PS31, Line 316: metastoreEventsMetrics_
> I think it would be very confusing to have same named metrics at 2 differen
We could reuse the existing metrics. However I thought that it would be more 
transparent if every user of metastore events i.e catalogServiceCatalog, 
CatalogOpExecutor etc track the metrics of events processed by them 
individually.

Thoughts?


http://gerrit.cloudera.org:8080/#/c/17756/31/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@438
PS31, Line 438:   LOG.info("Adding db {} to catalogd, current event id {}", 
msDb.getName(),
  :   currentEventId);
  :   addDbToCatalog(currentEventId, msDb);
> Why do we need to add the db here? Shouldn't syncT

[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints

2021-09-21 Thread Sourabh Goyal (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17859

to look at the new patch set (#2).

Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS 
event id when performing DDL operations via catalog HMS endpoints
..

IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when 
performing
DDL operations via catalog HMS endpoints

Change-Id: I36364e401911352c474eb98c8d61bbaae9b9
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/catalog/metastore/HmsApiNameEnum.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/test/java/org/apache/impala/catalog/AlterDatabaseTest.java
A fe/src/test/java/org/apache/impala/catalog/MetastoreApiTestUtils.java
M 
fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/SynchronousHMSEventProcessorForTests.java
M 
fe/src/test/java/org/apache/impala/catalog/metastore/AbstractCatalogMetastoreTest.java
M 
fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsFileMetadataTest.java
A 
fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
M tests/custom_cluster/test_metastore_service.py
27 files changed, 3,325 insertions(+), 250 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/17859/2
--
To view, visit http://gerrit.cloudera.org:8080/17859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9
Gerrit-Change-Number: 17859
Gerrit-PatchSet: 2
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10876: Support to download JWKS from given URL

2021-09-21 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17802 )

Change subject: IMPALA-10876: Support to download JWKS from given URL
..

IMPALA-10876: Support to download JWKS from given URL

This patch added functionality to download JWKS from a given URL and
support key rotation by periodically checking the JWKS URL for updates.
We use Kudu's EasyCurl wrapper to download file from the given URL.
curl was added to native-toolchain. This patch modified makefiles
and bootstrap_toolchain.py to integrate libcurl and libkudu_curl_util.
Upgraded source code for Kudu's EasyCurl wrapper (curl_util.cc and
curl_util.h in kudu/util) from the top of Kudu upstream (commit hash:
7a4061e87e).

Added end-end JWT authentication test cases with JWKS specified as
HTTP/HTTPS URL.

Testing:
 - Passed core run, including new test cases.

Change-Id: Ic6ac8cf0010c13db30219776d1d275709bf211df
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/kudu/util/CMakeLists.txt
M be/src/kudu/util/curl_util-test.cc
M be/src/kudu/util/curl_util.cc
M be/src/kudu/util/curl_util.h
M be/src/rpc/authentication.cc
M be/src/service/impala-server.cc
M be/src/util/jwt-util-internal.h
M be/src/util/jwt-util-test.cc
M be/src/util/jwt-util.cc
M be/src/util/jwt-util.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/rat_exclude_files.txt
A cmake_modules/FindCurl.cmake
M fe/src/test/java/org/apache/impala/customcluster/CustomClusterRunner.java
M fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java
M fe/src/test/java/org/apache/impala/customcluster/JwtWebserverTest.java
A testdata/jwt/jwks_es256.json
20 files changed, 725 insertions(+), 192 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/17802/4
--
To view, visit http://gerrit.cloudera.org:8080/17802
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic6ac8cf0010c13db30219776d1d275709bf211df
Gerrit-Change-Number: 17802
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-10876: Support to download JWKS from given URL

2021-09-21 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17802 )

Change subject: IMPALA-10876: Support to download JWKS from given URL
..


Patch Set 3:

(3 comments)

Add a working thread to periodically check the JWKS URL for updates.

http://gerrit.cloudera.org:8080/#/c/17802/3/be/src/util/jwt-util.cc
File be/src/util/jwt-util.cc:

http://gerrit.cloudera.org:8080/#/c/17802/3/be/src/util/jwt-util.cc@585
PS3, Line 585:   curl_global_init(CURL_GLOBAL_DEFAULT);
 :
 :   // Initialize curl session.
 :   curl_handle = curl_easy_init();
 :   if (curl_handle == nullptr) {
 : status = Status("Error to initialize curl session");
 : goto cleanup;
 :   }
 :   curl_easy_setopt(curl_handle, CURLOPT_URL, jwks_url.c_str());
 :   curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, 
CurlWriteDownloadBufferCallback);
 :   curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, 
(void*)&chunk);
 :   curl_easy_setopt(curl_handle, CURLOPT_USERAGENT, 
"libcurl-agent/1.0");
> It's worth looking at Kudu's EasyCurl wrapper to see if it would work for u
Will change code to use Kudu's EasyCurl wrapper. The source code of Kudu's 
EasyCurl were updated in Kudu's upstream repo. Need to upgrade the 
corresponding code from Kudu.


http://gerrit.cloudera.org:8080/#/c/17802/3/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java
File fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java:

http://gerrit.cloudera.org:8080/#/c/17802/3/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java@227
PS3, Line 227: String webserverArgs = "--webserver_port=25010";
> It would be good to make it very clear that the statestore is serving up th
Will change variable names as suggested to make code more readable.


http://gerrit.cloudera.org:8080/#/c/17802/3/www/jwt/jwks_rs256.json
File www/jwt/jwks_rs256.json:

http://gerrit.cloudera.org:8080/#/c/17802/3/www/jwt/jwks_rs256.json@1
PS3, Line 1:
   : {
   :   "keys": [
   : { "use": "sig", "kty": "RSA", "kid": 
"public:c424b67b-fe28-45d7-b015-f79da50b5b21", "alg": "RS256", "n": 
"uGbXWiK3dQTyCbX5xdE4yCuYp0AF2d15Qq1JSXT_lx8CEcXb9RbDddl8jGDv-spi5qPa8qEHiK7FwV2KpRE983wGPnYsAm9BxLFb4YrLYcDFOIGULuk2FtrPS512Qea1bXASuvYXEpQNpGbnTGVsWXI9C-yjHztqyL2h8P6mlThPY9E9ue2fCqdgixfTFIF9Dm4SLHbphUS2iw7w1JgT69s7of9-I9l5lsJ9cozf1rxrXX4V1u_SotUuNB3Fp8oB4C1fLBEhSlMcUJirz1E8AziMCxS-VrRPDM-zfvpIJg3JljAh3PJHDiLu902v9w-Iplu1WyoB2aPfitxEhRN0Yw",
 "e": "AQAB" },
   : { "use": "sig", "kty": "RSA", "kid": 
"public:9b9d0b47-b9ed-4ba6-9180-52fc5b161a3a", "alg": "RS256", "n": 
"xzYuc22QSst_dS7geYYK5l5kLxU0tayNdixkEQ17ix-CUcUbKIsnyftZxaCYT46rQtXgCaYRdJcbB3hmyrOavkhTpX79xJZnQmfuamMbZBqitvscxW9zRR9tBUL6vdi_0rpoUwPMEh8-Bw7CgYR0FK0DhWYBNDfe9HKcyZEv3max8Cdq18htxjEsdYO0iwzhtKRXomBWTdhD5ykd_fACVTr4-KEY-IeLvubHVmLUhbE5NgWXxrRpGasDqzKhCTmsa2Ysf712rl57SlH0Wz_Mr3F7aM9YpErzeYLrl0GhQr9BVJxOvXcVd4kmY-XkiCcrkyS1cnghnllh-LCwQu1sYw",
 "e": "AQAB" }
   :   ]
   : }
> Everything in the www/ directory gets shipped as part of the docker image b
Move the JWKS file back to directory testdata/jwt, and copy the JWKS files 
temporarily to root directory of Web server when running test and delete it at 
the end of test.



--
To view, visit http://gerrit.cloudera.org:8080/17802
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic6ac8cf0010c13db30219776d1d275709bf211df
Gerrit-Change-Number: 17802
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 21 Sep 2021 21:46:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

2021-09-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9478/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 6
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 21 Sep 2021 20:09:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

2021-09-21 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
..


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@955
PS5, Line 955:
> nit: too much indent
Done


http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1027
PS5, Line 1027:
> nit: +2 indent
Done


http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2713
PS5, Line 2713:   return hmsPartToHdfsPart.size();
> Shouldn't we return hmsPartToHdfsPart.size() since those are the actual par
yes, thanks for spotting that. Fixed.


http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@279
PS2, Line 279: if (!current.canBeBatched(next)) {
> Currently one of the conditions of batching two events together is - event
Yes, the events definitely can be intermingled from multiple tables. In case of 
ALTER_PARTITION event it is less likely to happen since these events are 
transactional in nature (publish all or none) and they hold a eventId lock in 
HMS when being generated. However, this is highly dependent on the source 
application which generates the events. For example if Hive makes 10 
alterPartition calls instead of 1 alterPartitions() for 10 partitions, the 
event stream may include events which are inter-mingled between different 
tables.

The batching logic currently optimizes for the common case of consecutive 
ALTER_PARTITION or INSERT events being generated by a single query. More 
sophisticated batching schemes like you mentioned can be explored but may 
increase the scope of the patch and I am inclined to commit this first and then 
incrementally improve it if needed. My primary concern about batching 
non-consecutive events is that we need to make sure that we are not introducing 
any subtle bugs because of the the fact that we are refreshing the partitions 
out of order then event stream.


http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1641
PS2, Line 1641:   
MetastoreEventPropertyKey.CATALOG_VERSION.getKey(), "-1"));
> I understand that there may not be significant performance gain. I still fe
Okay. Let me look into this and see how much additional changes will need to 
made for this.


http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1811
PS5, Line 1811: Ref
> nit: indentation is off
My IDE seems to like doing this. Not sure how to fix it. Manually edited it.


http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java
File fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java:

http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java@100
PS5, Line 100: dx);
> This precondition is checked by the Java runtime.
makes sense. Removed it.



--
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 6
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 21 Sep 2021 19:46:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

2021-09-21 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
..

IMPALA-9857: Batching of consecutive partition events

This patch improves the performance of events processor
by batching together consecutive ALTER_PARTITION or
INSERT events. Currently, without this patch, if
the events stream consists of a lot of consecutive
ALTER_PARTITION events which cannot be skipped,
events processor will refresh partition from each
event one by one. Similarly, in case of INSERT events
in a partition events processor refresh one partition
at a time.

By batching together such consecutive ALTER_PARTITION or
INSERT events, events processor needs to take lock on the table
only once per batch and can refresh all the partitions from
the events using multiple threads. For transactional (acid)
tables, this provides even significant performance gain
since currently we refresh the whole table in case of
ALTER_PARTITION or INSERT partition events. By batching them
together, events processor will refresh the table once per
batch.

The batch of eligible ALTER_PARTITION and INSERT events will
be processed as ALTER_PARTITIONS and INSERT_PARTITIONS event
respectively.

Performance tests:
In order to simulate bunch of ALTER_PARTITION and INSERT
events, a simple test was performed by running the following
query from hive:
insert into store_sales_copy partition(ss_sold_date_sk)
select * from store_sales;

This query generates 1824 ALTER_PARTITION and 1824 INSERT
events and time taken to process all the events generated
was measured before and after the patch for external and
ACID table.

Table Type  Before  After
==
External table  75 sec  25 sec
ACID tables 313 sec 47 sec

Additionally, the patch also fixes a minor bug in
evaluateSelfEvent() method which should return false when
serviceId does not match.

Testing Done:
1. Added new tests which cover the batching logic of events.
2. Exhaustive tests.

Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M tests/custom_cluster/test_events_custom_configs.py
9 files changed, 920 insertions(+), 185 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17848/6
--
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 6
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10921 Add script to compare TPCDS runs.

2021-09-21 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17855 )

Change subject: IMPALA-10921 Add script to compare TPCDS runs.
..


Patch Set 1:

(2 comments)

Hi Amogh,
Thanks for submitting the script.
Besides the flake8 errors, I have few comments and request to incorporate.

http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py
File bin/diagnostics/experimental/tpcds_run_comparator.py:

http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@211
PS1, Line 211: def print_results(ht_mem_res, op_mem_res):
It will be great if we can add option to print this output to 2 csv files.


http://gerrit.cloudera.org:8080/#/c/17855/1/bin/diagnostics/experimental/tpcds_run_comparator.py@245
PS1, Line 245: if (i == 23 or i == 24):
 :   filenames = [str(i)+"_a.txt", str(i)+"_b.txt"]
For iterating the profiles, what if we just use simple filename matching 
between the baseline dir and new measurement dir?
The reason is that the naming convention is not always standardized. I have 
seen variation such as '23_a.txt', '23a.txt', 'tpcds-23a.txt', and so on.
Using simple filename match will also make the script more general, not 
specific to just TPC-DS.



--
To view, visit http://gerrit.cloudera.org:8080/17855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib2e9ae1a2919156b0022072f47ff71d7775b20e6
Gerrit-Change-Number: 17855
Gerrit-PatchSet: 1
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 21 Sep 2021 18:17:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10714: Defer advancing read page until the buffer is attached

2021-09-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17853 )

Change subject: IMPALA-10714: Defer advancing read page until the buffer is 
attached
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9477/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I586ed72ba01cc3f28b0dcb1e202b3ca32a6c3b83
Gerrit-Change-Number: 17853
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 21 Sep 2021 17:49:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10714: Defer advancing read page until the buffer is attached

2021-09-21 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17853 )

Change subject: IMPALA-10714: Defer advancing read page until the buffer is 
attached
..


Patch Set 3:

(3 comments)

Thank you for the feedback!
After expanding the test case, I found that fix from patch set 2 is still 
incorrect.
Delayed advancement of read page should only allowed in read-write stream.
Patch set 3 add checks for 'has_write_iterator_' to make sure that the stream 
is indeed a read-write stream.

http://gerrit.cloudera.org:8080/#/c/17853/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17853/2//COMMIT_MSG@10
PS2, Line 10: Stream
> nit. expects
Done


http://gerrit.cloudera.org:8080/#/c/17853/2/be/src/runtime/buffered-tuple-stream-test.cc
File be/src/runtime/buffered-tuple-stream-test.cc:

http://gerrit.cloudera.org:8080/#/c/17853/2/be/src/runtime/buffered-tuple-stream-test.cc@1388
PS2, Line 1388: eservation to alloc
> May need to add the following additional test cases
This is now addressed by:
1. TestUnpinAfterFullStreamRead(num_rows, buffer_size, max_num_pages, false, 
true, false);
2. TestUnpinAfterFullStreamRead(num_rows, buffer_size, max_num_pages, true, 
false, ...);


http://gerrit.cloudera.org:8080/#/c/17853/2/be/src/runtime/buffered-tuple-stream-test.cc@1397
PS2, Line 1397:
> May need to test #pages == 1.
I believe this is now addressed by TestUnpinAfterFullStreamRead with 
attach_on_read=true and refill_before_unpin=false.
With this argument, the stream only have 1 page left before UnpinStream() is 
called.



--
To view, visit http://gerrit.cloudera.org:8080/17853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I586ed72ba01cc3f28b0dcb1e202b3ca32a6c3b83
Gerrit-Change-Number: 17853
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 21 Sep 2021 17:40:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10714: Defer advancing read page until the buffer is attached

2021-09-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17853 )

Change subject: IMPALA-10714: Defer advancing read page until the buffer is 
attached
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17853/3/be/src/runtime/buffered-tuple-stream-test.cc
File be/src/runtime/buffered-tuple-stream-test.cc:

http://gerrit.cloudera.org:8080/#/c/17853/3/be/src/runtime/buffered-tuple-stream-test.cc@1490
PS3, Line 1490: VerifyStreamState(&stream, num_pages, num_pinned_pages, 
num_unpinned_pages, buffer_size);
line too long (93 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I586ed72ba01cc3f28b0dcb1e202b3ca32a6c3b83
Gerrit-Change-Number: 17853
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 21 Sep 2021 17:28:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10714: Defer advancing read page until the buffer is attached

2021-09-21 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Qifan Chen, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17853

to look at the new patch set (#3).

Change subject: IMPALA-10714: Defer advancing read page until the buffer is 
attached
..

IMPALA-10714: Defer advancing read page until the buffer is attached

If a BufferedTupleStream is a read-write stream and set up with
attach_on_read = true, BufferedTupleStream::NextReadPage() expects that
the page buffer is attached to the output row batch before advancing the
read iterator. However, BufferedTupleStream::GetNextInternal() will not
attach the page buffer of a completely read page if it is a read-write
page. BufferedTupleStream::UnpinStream() need to defer advancing the
read page if this situation happens.

This patch adds BE test StreamStateTest.UnpinFullyExhaustedReadPage that
simulates the corner case. This patch also moves BE test
DeferAdvancingReadPage and ShortDebugString into class StreamStateTest
to reduce friend class declaration in buffered-tuple-stream.h

Testing:
- Run and pass BE test StreamStateTest.UnpinFullyExhaustedReadPage.

Change-Id: I586ed72ba01cc3f28b0dcb1e202b3ca32a6c3b83
---
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
3 files changed, 214 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17853/3
--
To view, visit http://gerrit.cloudera.org:8080/17853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I586ed72ba01cc3f28b0dcb1e202b3ca32a6c3b83
Gerrit-Change-Number: 17853
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

2021-09-21 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg 
tables
..


Patch Set 1:

(8 comments)

Looks very reasonable.

It is not clear to me how a user select a particular iceberg table snapshot to 
use in a query.

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@11
PS1, Line 11:  and the HDFS
: block locations were not consistent across the reload
> This is not clear to me: if the block locations have changed between querie
+1.

I think the fix is applicable when an iceberg table of a particular snapshot is 
to be scanned. In doing so for the same table instance, the partition info 
about the snapshot can be reused.

When only the last version of the table is to be scanned, my impression is that 
the patch should not be applied.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@110
PS1, Line 110:   "Failed to load table: %s.%s", msTable.getDbName(), 
msTable.getTableName()), e);
nit probably should include the snapshot id in the message, if it is not in e.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1227
PS1, Line 1227: a
nit. an


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1268
PS1, Line 1268:   addSummary(response, "Updated table.");
Updated table after set table properties


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1272
PS1, Line 1272: Updated table.
Updated table after unset table properties


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1295
PS1, Line 1295: if (!needsToUpdateHms) {
  :   loadTableMetadata(tbl, newCatalogVersion, true, true, 
null, "ALTER Iceberg TABLE " +
  :   params.getAlter_type().name());
  :   catalog_.addVersionsForInflightEvents(false, tbl, 
newCatalogVersion);
  :   addTableToCatalogUpdate(tbl, wantMinimalResult, 
response.result);
  :   return false;
  : }
Should this block of code be executed within 1283-1286 too? That is, if txn is 
needed, then this block of code is protected by the txn.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@113
PS1, Line 113: addColumn
nit. addColumns()?


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@339
PS1, Line 339: the table parameters
nit. update property in 'txn'



--
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 21 Sep 2021 17:23:08 +
Gerrit-HasComments: Yes