[Impala-ASF-CR] IMPALA-10967 Load data should handle AWS NLB-type timeout

2021-10-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17955 )

Change subject: IMPALA-10967 Load data should handle AWS NLB-type timeout
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9645/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8c2437e9894510204303ec07710cad60102c8821
Gerrit-Change-Number: 17955
Gerrit-PatchSet: 4
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Sat, 23 Oct 2021 01:19:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10967 Load data should handle AWS NLB-type timeout

2021-10-22 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17955 )

Change subject: IMPALA-10967 Load data should handle AWS NLB-type timeout
..


Patch Set 4:

Rebase.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8c2437e9894510204303ec07710cad60102c8821
Gerrit-Change-Number: 17955
Gerrit-PatchSet: 4
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Sat, 23 Oct 2021 00:57:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10967 Load data should handle AWS NLB-type timeout

2021-10-22 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17955 )

Change subject: IMPALA-10967 Load data should handle AWS NLB-type timeout
..

IMPALA-10967 Load data should handle AWS NLB-type timeout

This patch addresses an Impala client hang caused by the AWS network load
balancer (NLB) timeout, which is fixed at 350s. When a long-running data
loading operation is still executing at the time the timeout expires,
AWS silently drops the connection and the Impala client hangs.

The fix keeps the current TCLIService protocol between the client and
the Impala server and uses a separate thread to run the data loading and
metadata refresh operations. Because this thread is waited for in a wait
thread that itself runs asynchronously, the operation as a whole no
longer blocks the Impala client. The client can check the status of the
operation through repeated GetOperationStatus() calls.
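
The client-side pattern this relies on can be sketched as follows. This is
only an illustration under stated assumptions, not code from Impala or from
any client: FakeOperation, poll_until_done and the state strings are
hypothetical stand-ins for an operation handle and the GetOperationStatus()
RPC.

```python
import time


class FakeOperation:
    """Hypothetical stand-in for a server-side operation handle."""

    def __init__(self, polls_until_done):
        self._remaining = polls_until_done

    def get_operation_status(self):
        # Each call models one short GetOperationStatus() round trip.
        if self._remaining > 0:
            self._remaining -= 1
            return "RUNNING"
        return "FINISHED"


def poll_until_done(op, interval_s=0.0, max_polls=1000):
    """Poll with short requests so that no single RPC stays open for long."""
    for polls in range(1, max_polls + 1):
        state = op.get_operation_status()
        if state != "RUNNING":
            return state, polls
        time.sleep(interval_s)
    raise TimeoutError("operation still running after max_polls")


state, polls = poll_until_done(FakeOperation(polls_until_done=3))
print(state, polls)  # FINISHED 4
```

Each poll is a short request of its own, so the connection never sits
silent while waiting on one long-running call.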

Testing:
  1. Added a new test in load.test to verify that the asynchronous
     execution in BE keeps the session alive;
  2. Ran core tests successfully.

Change-Id: I8c2437e9894510204303ec07710cad60102c8821
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M testdata/workloads/functional-query/queries/QueryTest/load.test
3 files changed, 88 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/17955/4

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8c2437e9894510204303ec07710cad60102c8821
Gerrit-Change-Number: 17955
Gerrit-PatchSet: 4
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 


[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

2021-10-22 Thread Joe McDonnell (Code Review)
Joe McDonnell has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB 
forever
..

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network load
balancer (NLB) timeout, which is fixed at 350s. When a long-running DDL
operation is still executing at the time the timeout expires, AWS
silently drops the connection and the Impala client hangs.

The fix keeps the current TCLIService protocol between the client and
the Impala server. It applies to the following Impala clients, which
issue the thrift RPC ExecuteStatement() followed by repeated calls to
GetOperationStatus() (HS2, Impyla and HUE) or a variant of it (Beeswax)
against the Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ClientRequestState::ExecDdlRequest()
can start a new thread, 'async_exec_thread_', to run
ExecDdlRequestImpl(), which executes most DDLs asynchronously. This
thread is waited for in the wait thread 'wait_thread_'. Because the wait
thread also runs asynchronously, executing the DDL does not block the
Impala client. The client can therefore keep checking the execution
status via GetOperationStatus() without any single call waiting for a
long time, for example more than 350s.
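
The threading arrangement described above can be modeled roughly as below.
This is a simplified sketch, not the backend implementation: RequestState,
its methods and the state strings are hypothetical, and error handling is
left out entirely.

```python
import threading


class RequestState:
    """Hypothetical, simplified model of a request running a DDL asynchronously."""

    def __init__(self, ddl_work):
        self._ddl_work = ddl_work
        self._lock = threading.Lock()
        self._state = "PENDING"
        self._async_exec_thread = None
        self._wait_thread = None

    def exec_ddl_request(self):
        # Start the DDL in its own thread and return immediately, so the
        # submitting call does not block on the DDL itself.
        self._set_state("RUNNING")
        self._async_exec_thread = threading.Thread(target=self._ddl_work)
        self._async_exec_thread.start()
        self._wait_thread = threading.Thread(target=self._wait)
        self._wait_thread.start()

    def _wait(self):
        # The wait thread joins the execution thread, then publishes the result.
        self._async_exec_thread.join()
        self._set_state("FINISHED")

    def _set_state(self, state):
        with self._lock:
            self._state = state

    def get_operation_status(self):
        # Cheap status read that never waits for the DDL to finish.
        with self._lock:
            return self._state

    def wait_for_completion(self):
        self._wait_thread.join()


started, release = threading.Event(), threading.Event()


def slow_ddl():
    started.set()
    release.wait()  # stands in for a long-running DDL


req = RequestState(slow_ddl)
req.exec_ddl_request()
started.wait()
print(req.get_operation_status())  # RUNNING
release.set()
req.wait_for_completion()
print(req.get_operation_status())  # FINISHED
```

The status read only takes a short lock, so it returns promptly no matter
how long the DDL thread runs.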

As an optimization, the asynchronous mode above is not applied to
certain DDLs that carry a very low risk of long execution:

  1. Operations that do not access the catalog service;
  2. COMPUTE STATS, as the stats computation queries already run
     asynchronously.

External behavior change:
  1. A new field named "DDL execution mode:" is added to the summary
     section of the runtime profile, next to "DDL Type". This field
     takes either 'asynchronous' or 'synchronous' as its value.
  2. A new query option 'enable_async_ddl_execution', which defaults to
     true, is added. It can be set to false to turn off the asynchronous
     execution introduced by this patch.
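
Taken together, the exemptions and the query option amount to a decision
like the one sketched here. The function and parameter names are
hypothetical and only mirror the wording of this message; the real check in
the backend may be structured differently.

```python
def ddl_execution_mode(accesses_catalog_service, is_compute_stats,
                       enable_async_ddl_execution=True):
    """Return 'asynchronous' or 'synchronous' for one DDL statement."""
    if not enable_async_ddl_execution:
        return "synchronous"  # feature switched off by the query option
    if not accesses_catalog_service:
        return "synchronous"  # low risk of long execution
    if is_compute_stats:
        return "synchronous"  # its stats queries already run asynchronously
    return "asynchronous"


print(ddl_execution_mode(accesses_catalog_service=True, is_compute_stats=False))
# asynchronous
```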

Limitations:
  This patch does not handle a potential AWS NLB-type timeout for LOAD
  DATA (IMPALA-10967).

Testing:
  1. Added new async DDL unit tests with HS2, HS2-HTTP, Beeswax and
     JDBC clients.
  2. Ran core tests successfully.

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Reviewed-on: http://gerrit.cloudera.org:8080/17872
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins 
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A testdata/workloads/functional-query/queries/QueryTest/async_ddl.test
M tests/common/impala_test_suite.py
M tests/metadata/test_ddl.py
9 files changed, 386 insertions(+), 26 deletions(-)

Approvals:
  Joe McDonnell: Looks good to me, approved
  Impala Public Jenkins: Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 38
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 


[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

2021-10-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB 
forever
..


Patch Set 37: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 37
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Fri, 22 Oct 2021 23:42:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints

2021-10-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17859 )

Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS 
event id when performing DDL operations via catalog HMS endpoints
..


Patch Set 20: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9
Gerrit-Change-Number: 17859
Gerrit-PatchSet: 20
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Fri, 22 Oct 2021 20:29:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints

2021-10-22 Thread Sourabh Goyal (Code Review)
Sourabh Goyal has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17859 )

Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS 
event id when performing DDL operations via catalog HMS endpoints
..


Patch Set 20:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/17859/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17859/12//COMMIT_MSG@7
PS12, Line 7: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing
            : DDL operations via catalog HMS endpoints
> ping
@Vihang: I already have this in my to-do list. Will write a detailed commit 
message.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2672
PS12, Line 2672: lse;
> I am not sure I fully understand. If this change is not needed for the patc
Understood. As discussed on the call, I will remove this method.


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@288
PS19, Line 288: syncToLatestEventFactory_
> Based on my understanding this field here is unnecessary and is complicatin
@Vihang: We do need to pass the event factory object to syncToLatestEventId()
in MetastoreEventProcessor since it is a static method. However, I agree that
we should not create a new event factory and should instead modify
isSelfEvent() to accommodate the sync-to-latest-event-id flag. One issue I
see: if event processing is disabled, MetastoreEventProcessor is not
initialized, so there would be no way to access the eventFactory object from
it.

A few ways to solve this:
1. Decouple MetastoreEventProcessor and EventFactory creation. In JniCatalog,
we can create a common event factory that is set in EventProcessor and also
used elsewhere. If we do this, we need to make sure that the factory is
thread safe.

2. Make MetastoreEventFactory a singleton and use it everywhere. This would
avoid the JniCatalog route. I tried this approach in the past and encountered
some test failures. I didn't investigate the failures in depth, but they
appeared to be race conditions. I can take another shot at it if the approach
seems cleaner.

Let me know your thoughts.


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@316
PS19, Line 316:
> this is unused and can be removed.
Ack


http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java
File fe/src/main/java/org/apache/impala/catalog/Db.java:

http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java@115
PS6, Line 115:
> All the places where I see this variable getting accessed, I see that it is
As discussed on the call, we will keep it volatile so that EventProcessor can
check whether it needs to process an event without acquiring a read lock on
the db/table.


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Db.java
File fe/src/main/java/org/apache/impala/catalog/Db.java:

http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Db.java@134
PS19, Line 134: LOG.debug("createEventId_ for db: {} set to: {}", getName(), createEventId_);
              : if (lastSyncedEventId_ < eventId) {
              :   setLastSyncedEventId(eventId);
              : }
> Pls remove if not needed anymore.
Sure. I had added this check earlier but then saw some failures in the tests
related to DDLs from Impala shell. For now, I will add a TODO comment and we
can address it in a follow-up JIRA.


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Db.java@153
PS19, Line 153:   /**
              :    * Creates a Db object with no tables based on the given TDatabase thrift struct.
              :    */
              :   public static Db fromTDatabase(TDatabase db) {
              :     ret
> pls remove if not needed.
Same as previous comment


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Table.java@186
PS12, Line 186:
> Thanks for adding the comment. But similar to the Db.java's volatile keywor
As discussed, we will keep the variable as volatile so that event processor can 
read it (to check if event should be skipped or not) w

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

2021-10-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB 
forever
..


Patch Set 37:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9644/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 37
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Fri, 22 Oct 2021 17:42:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

2021-10-22 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#37). ( 
http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB 
forever
..

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A testdata/workloads/functional-query/queries/QueryTest/async_ddl.test
M tests/common/impala_test_suite.py
M tests/metadata/test_ddl.py
9 files changed, 386 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/37

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 37
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 


[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

2021-10-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB 
forever
..


Patch Set 37:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7557/ 
DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 37
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Fri, 22 Oct 2021 17:21:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10777: Enable min/max filtering for Iceberg partitions

2021-10-22 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17960 )

Change subject: IMPALA-10777: Enable min/max filtering for Iceberg partitions
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17960/1/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17960/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@1323
PS1, Line 1323: if (scan_node_->hdfs_table()->IsIcebergTable()) return false;
> In this patch we set is_bound_by_partition_columns for the computed partitio
I got it. Thanks for the explanation.

So in this case, maybe we can add a new field to TRuntimeFilterTargetDesc: 
is_data_in_data_file.

At line 678 in this file, we could then do the test as follows:

  if (IsBoundByPartitionColumn(idx) && !IsDataInDataFile(idx)) {
    continue;
  }


  // Specification of a runtime filter target.
  struct TRuntimeFilterTargetDesc {
    // Target node id
    1: Types.TPlanNodeId node_id

    // Expr on which the filter is applied
    2: required Exprs.TExpr target_expr

    // Indicates if 'target_expr' is bound only by partition columns
    3: required bool is_bound_by_partition_columns

    // Slot ids on which 'target_expr' is bound on
    4: required list target_expr_slotids

    // Indicates if this target is on the same fragment as the join that
    // produced the runtime filter
    5: required bool is_local_target

    // If the target node is a Kudu scan node, the name, in the case it appears in Kudu, and
    // type of the targeted column.
    6: optional string kudu_col_name
    7: optional Types.TColumnType kudu_col_type;

    // The low and high value as seen in the column stats of the targeted column.
    8: optional Data.TColumnValue low_value
    9: optional Data.TColumnValue high_value

    // Indicates if the low and high value in column stats are present
    10: optional bool is_min_max_value_present
  }
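
To make the suggestion concrete, the skip condition could be modeled like
this. It is a sketch only: FilterTarget is a hypothetical, reduced view of
the target descriptor, and is_data_in_data_file is the field proposed in this
comment, not one that exists in the struct today.

```python
from dataclasses import dataclass


@dataclass
class FilterTarget:
    """Hypothetical, reduced view of a runtime filter target."""
    is_bound_by_partition_columns: bool
    is_data_in_data_file: bool  # proposed in this comment; not an existing field


def should_skip_at_row_group_level(target):
    # Partition values that are not stored in the data file cannot be checked
    # against row-group statistics, so such filters are skipped there.
    return target.is_bound_by_partition_columns and not target.is_data_in_data_file


simple_partition = FilterTarget(True, False)
computed_partition = FilterTarget(True, True)
print(should_skip_at_row_group_level(simple_partition))    # True
print(should_skip_at_row_group_level(computed_partition))  # False
```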


http://gerrit.cloudera.org:8080/#/c/17960/2/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17960/2/be/src/exec/parquet/hdfs-parquet-scanner.cc@678
PS2, Line 678: && IsSimplePartitionedTable()
> For simple partitioned tables we don't want to evaluate the filters at the
Makes sense. Done.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I51b53188c6da7eeebfeae385e1de31ace0980cac
Gerrit-Change-Number: 17960
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 22 Oct 2021 16:45:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-10-22 Thread Sourabh Goyal (Code Review)
Sourabh Goyal has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2057
PS7, Line 2057: protected SelfEventContext getSelfEventContext() {
> @Sourabh
Makes sense. Thanks for the pointers.


http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2080
PS7, Line 2080: NotificationEvent event) {
> As long as we have called addWriteEventInfo, this would be empty list even
Got it!




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 8
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Fri, 22 Oct 2021 16:20:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10777: Enable min/max filtering for Iceberg partitions

2021-10-22 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17960 )

Change subject: IMPALA-10777: Enable min/max filtering for Iceberg partitions
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17960/1/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17960/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@1323
PS1, Line 1323: if (scan_node_->hdfs_table()->IsIcebergTable()) return false;
> Ideally, is_computed_column can be a new boolean field to TColumn, together
In this patch we set is_bound_by_partition_columns for the computed partition 
columns.


http://gerrit.cloudera.org:8080/#/c/17960/2/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17960/2/be/src/exec/parquet/hdfs-parquet-scanner.cc@678
PS2, Line 678: && IsSimplePartitionedTable()
> Since a computed partition column is a partition column, I wonder if there
For simple partitioned tables we don't want to evaluate the filters at the row 
group level as they are not stored in the parquet files. Hence we 'continue' 
here.

But for computed partitions we want to evaluate the filters since the columns 
are stored in the data files.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I51b53188c6da7eeebfeae385e1de31ace0980cac
Gerrit-Change-Number: 17960
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 22 Oct 2021 15:25:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9873: Avoid materilization of columns for filtered out rows in Parquet table.

2021-10-22 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17860 )

Change subject: IMPALA-9873: Avoid materilization of columns for filtered out 
rows in Parquet table.
..


Patch Set 10:

(10 comments)

Looks great!

On testing, I wonder if we can add a counter for the number of rows (or the
amount of data) that do not survive the materialization. This would be useful
to safeguard the feature and demonstrate its usefulness.

http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch-test.cc
File be/src/exec/scratch-tuple-batch-test.cc:

http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch-test.cc@67
PS10, Line 67: // set every 16th row as selected.
I wonder if we can add two more tests for the following situations.

1. Clustered: over 1024 values, 200 consecutive values are true and the rest are false;
2. Interleaved: over 1024 values, randomly set 10%, 30%, and 70% to true.
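
As a rough illustration of the two inputs suggested above, the selection
vectors could be generated along these lines. The helper names, the run
position and the fixed seed are all assumptions made for the sketch; the
actual test is C++ and would build them differently.

```python
import random


def clustered_selection(num_rows=1024, run_start=100, run_length=200):
    """One run of consecutive selected rows; every other row is unselected."""
    return [run_start <= i < run_start + run_length for i in range(num_rows)]


def interleaved_selection(fraction, num_rows=1024, seed=0):
    """Select round(fraction * num_rows) rows at random positions."""
    rng = random.Random(seed)  # fixed seed keeps the test input reproducible
    chosen = set(rng.sample(range(num_rows), round(fraction * num_rows)))
    return [i in chosen for i in range(num_rows)]


print(sum(clustered_selection()))  # 200
for fraction in (0.1, 0.3, 0.7):
    print(fraction, sum(interleaved_selection(fraction)))
```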


http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch.h
File be/src/exec/scratch-tuple-batch.h:

http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch.h@29
PS10, Line 29: ScratchMicroBatch
May need a constructor to properly initialize these fields.


http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch.h@171
PS10, Line 171:   /// Consecutive bits set are used to create ranges. Ranges that differ by less than
nit (or micro batches).


http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch.h@172
PS10, Line 172:  E.g., for ranges 1-8, 11-20, 35-100 derived
              :   /// from 'selected_rows' and 'skip_length' as 10, first two ranges would be merged
              :   /// into 1-20 as they differ by 3 (11 - 8) which is less than 10 ('skip_length').
This is wonderful.
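
For readers following along, the merge rule in the quoted comment works out
as below. This is a sketch based only on that comment, with a hypothetical
merge_ranges helper; it is not the code under review.

```python
def merge_ranges(ranges, skip_length):
    """Merge sorted, inclusive (start, end) ranges whose gap is below skip_length.

    The gap is measured as next_start - previous_end, as in the quoted example.
    """
    merged = []
    for start, end in ranges:
        if merged and start - merged[-1][1] < skip_length:
            # Close enough to the previous range: extend it instead.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(merge_ranges([(1, 8), (11, 20), (35, 100)], skip_length=10))
# [(1, 20), (35, 100)]
```

The third range stays separate because 35 - 20 = 15 is not less than 10.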


http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch.h@176
PS10, Line 176: atleast
nit.


http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch.h@178
PS10, Line 178: range
nit. batch_idx may be a better name in this method.


http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/scratch-tuple-batch.h@203
PS10, Line 203: DCHECK(start != -1) << "Atleast one of the 'scratch_batch_->selected_rows'"
              :     << "should be true";
nit. An alternative is the following, which is more robust.

if (start == -1) {
  return 0;
}

http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/service/query-options.h
File be/src/service/query-options.h:

http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/service/query-options.h@50
PS10, Line 50: PARQUET_MATERIALIZATION_THRESHOLD
nit: PARQUET_LATE_MATERIALIZATION_THRESHOLD?


http://gerrit.cloudera.org:8080/#/c/17860/10/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/17860/10/common/thrift/ImpalaService.thrift@701
PS10, Line 701:   // of columns in parquet
nit. -1 to turn off the feature.


http://gerrit.cloudera.org:8080/#/c/17860/10/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/17860/10/common/thrift/Query.thrift@554
PS10, Line 554:   // of columns in parquet
nit. -1 to turn off the feature.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
Gerrit-Change-Number: 17860
Gerrit-PatchSet: 10
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 22 Oct 2021 14:43:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9873: Avoid materilization of columns for filtered out rows in Parquet table.

2021-10-22 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17860 )

Change subject: IMPALA-9873: Avoid materilization of columns for filtered out 
rows in Parquet table.
..


Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17860/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17860/10//COMMIT_MSG@19
PS10, Line 19: that
nit: than


http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17860/10/be/src/exec/parquet/hdfs-parquet-scanner.cc@2223
PS10, Line 2223: c.
Could you please explain where we filter out the rows in the merged micro 
batches?




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
Gerrit-Change-Number: 17860
Gerrit-PatchSet: 10
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 22 Oct 2021 14:30:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints

2021-10-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17859 )

Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS 
event id when performing DDL operations via catalog HMS endpoints
..


Patch Set 20:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9643/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9
Gerrit-Change-Number: 17859
Gerrit-PatchSet: 20
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Fri, 22 Oct 2021 14:27:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints

2021-10-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17859 )

Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS 
event id when performing DDL operations via catalog HMS endpoints
..


Patch Set 20:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7556/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9
Gerrit-Change-Number: 17859
Gerrit-PatchSet: 20
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Fri, 22 Oct 2021 14:16:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints

2021-10-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17859 )

Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS 
event id when performing DDL operations via catalog HMS endpoints
..


Patch Set 20:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17859/20/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/17859/20/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2402
PS20, Line 2402:   batchEvents = eventFactory.createBatchEvents(mockEvents, 
eventsProcessor_.getMetrics());
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17859/20/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java
File 
fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java:

http://gerrit.cloudera.org:8080/#/c/17859/20/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@85
PS20, Line 85: private static boolean flagEnableCatalogCache 
,flagInvalidateCache, flagSyncToLatestEventId;
line too long (96 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9
Gerrit-Change-Number: 17859
Gerrit-PatchSet: 20
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Fri, 22 Oct 2021 14:06:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints

2021-10-22 Thread Sourabh Goyal (Code Review)
Hello Vihang Karajgaonkar, kis...@cloudera.com, Yu-Wen Lai, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17859

to look at the new patch set (#20).

Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS 
event id when performing DDL operations via catalog HMS endpoints
..

IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when
performing DDL operations via catalog HMS endpoints

Change-Id: I36364e401911352c474eb98c8d61bbaae9b9
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/EventFactory.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/NoOpEventProcessor.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/catalog/metastore/HmsApiNameEnum.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/test/java/org/apache/impala/catalog/AlterDatabaseTest.java
A fe/src/test/java/org/apache/impala/catalog/MetastoreApiTestUtils.java
M 
fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/SynchronousHMSEventProcessorForTests.java
M 
fe/src/test/java/org/apache/impala/catalog/metastore/AbstractCatalogMetastoreTest.java
A 
fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
M tests/custom_cluster/test_metastore_service.py
26 files changed, 3,362 insertions(+), 282 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/17859/20
--
To view, visit http://gerrit.cloudera.org:8080/17859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9
Gerrit-Change-Number: 17859
Gerrit-PatchSet: 20
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10777: Enable min/max filtering for Iceberg partitions

2021-10-22 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17960 )

Change subject: IMPALA-10777: Enable min/max filtering for Iceberg partitions
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17960/1/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17960/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@1323
PS1, Line 1323: if (scan_node_->hdfs_table()->IsIcebergTable()) return false;
> Yeah, good question how could we make it a bit more elegant. Should I add a
Ideally, is_computed_column would be a new boolean field in TColumn, together 
with an expression describing how the value is computed, such as 
EXTRACT(DAY FROM timestamp_col).

For this patch, it looks like we may not need that at all if we can set 
is_bound_by_partition_columns properly in FE even for the computed column.

  bool IsBoundByPartitionColumn(int plan_id) const {
    int target_ndx = filter_desc().planid_to_target_ndx.at(plan_id);
    return filter_desc().targets[target_ndx].is_bound_by_partition_columns;
  }
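
The accessor quoted above maps a plan node id to a filter target and then reads a flag that the frontend is expected to set. A minimal Python sketch of that lookup follows, assuming simplified stand-ins for the Thrift structures: the class names FilterTarget and FilterDesc are hypothetical, and only planid_to_target_ndx, targets and is_bound_by_partition_columns are taken from the quoted code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FilterTarget:
    # Hypothetical stand-in for one runtime filter target.
    node_id: int
    is_bound_by_partition_columns: bool


@dataclass
class FilterDesc:
    # Hypothetical stand-in for the runtime filter descriptor.
    targets: List[FilterTarget] = field(default_factory=list)
    planid_to_target_ndx: Dict[int, int] = field(default_factory=dict)


def is_bound_by_partition_column(desc: FilterDesc, plan_id: int) -> bool:
    # Same two steps as the quoted accessor: plan id -> target index -> flag.
    # An unknown plan id raises KeyError here.
    target_ndx = desc.planid_to_target_ndx[plan_id]
    return desc.targets[target_ndx].is_bound_by_partition_columns


desc = FilterDesc(
    targets=[FilterTarget(3, False), FilterTarget(7, True)],
    planid_to_target_ndx={3: 0, 7: 1},
)
print(is_bound_by_partition_column(desc, 7))
```

If the frontend set the flag for computed partition columns as well, a scanner-side check like this would not need to know whether the table is an Iceberg table.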


http://gerrit.cloudera.org:8080/#/c/17960/2/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17960/2/be/src/exec/parquet/hdfs-parquet-scanner.cc@678
PS2, Line 678: && IsSimplePartitionedTable()
Since a computed partition column is a partition column, I wonder if there is a 
need to make a test here?



--
To view, visit http://gerrit.cloudera.org:8080/17960
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I51b53188c6da7eeebfeae385e1de31ace0980cac
Gerrit-Change-Number: 17960
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 22 Oct 2021 13:41:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-10-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
..


Patch Set 8: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7555/


--
To view, visit http://gerrit.cloudera.org:8080/17858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 8
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Fri, 22 Oct 2021 13:33:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] [WIP]: Initial commit to sync db/table to latest hms event id after executing ddls from Impala shell

2021-10-22 Thread Sourabh Goyal (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17904

to look at the new patch set (#3).

Change subject: [WIP]: Initial commit to sync db/table to latest hms event id 
after executing ddls from Impala shell
..

[WIP]: Initial commit to sync db/table to latest hms event id after
executing ddls from Impala shell

Change-Id: I8af6b368f70a4cb587d2b961c059d55237b25c6c
---
M fe/src/main/java/org/apache/impala/catalog/Table.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
3 files changed, 353 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/17904/3
--
To view, visit http://gerrit.cloudera.org:8080/17904
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8af6b368f70a4cb587d2b961c059d55237b25c6c
Gerrit-Change-Number: 17904
Gerrit-PatchSet: 3
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10777: Enable min/max filtering for Iceberg partitions

2021-10-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17960 )

Change subject: IMPALA-10777: Enable min/max filtering for Iceberg partitions
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9642/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17960
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I51b53188c6da7eeebfeae385e1de31ace0980cac
Gerrit-Change-Number: 17960
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 22 Oct 2021 10:14:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10777: Enable min/max filtering for Iceberg partitions

2021-10-22 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17960 )

Change subject: IMPALA-10777: Enable min/max filtering for Iceberg partitions
..


Patch Set 2:

(2 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/17960/1/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17960/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@1323
PS1, Line 1323: if (scan_node_->hdfs_table()->IsIcebergTable()) return false;
> I wonder if FE is in the better position to make the decision.
Yeah, good question: how could we make it a bit more elegant? Should I add a 
field to THdfsTable, e.g. 'has_computed_partitions'?

Maybe as long as we only have computed partitions + Parquet in Iceberg tables 
we can keep it as it is. Or do you have a better suggestion?


http://gerrit.cloudera.org:8080/#/c/17960/1/fe/src/main/java/org/apache/impala/catalog/FeTable.java
File fe/src/main/java/org/apache/impala/catalog/FeTable.java:

http://gerrit.cloudera.org:8080/#/c/17960/1/fe/src/main/java/org/apache/impala/catalog/FeTable.java@116
PS1, Line 116:
> nit. May rename it as isComputedPartitioColumn() as the computed column con
Done



--
To view, visit http://gerrit.cloudera.org:8080/17960
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I51b53188c6da7eeebfeae385e1de31ace0980cac
Gerrit-Change-Number: 17960
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 22 Oct 2021 09:53:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10777: Enable min/max filtering for Iceberg partitions

2021-10-22 Thread Zoltan Borok-Nagy (Code Review)
Hello Tamas Mate, Qifan Chen, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17960

to look at the new patch set (#2).

Change subject: IMPALA-10777: Enable min/max filtering for Iceberg partitions
..

IMPALA-10777: Enable min/max filtering for Iceberg partitions

This patch enables min/max filters for Iceberg columns that
participate in table partitioning. The min/max filters are
evaluated at the Parquet row group level. This means that it
is still slower than dynamic partition pruning (which doesn't
even need to open the files), but much faster than no pruning at all.
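
As a rough illustration of row-group-level pruning (a sketch under stated assumptions, not the scanner's actual code), the idea is to compare each row group's column statistics with the range carried by the min/max filter and skip groups whose ranges cannot overlap. The names RowGroupStats, MinMaxFilter and row_groups_to_scan are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupStats:
    # Hypothetical per-row-group statistics for one column.
    min_value: int
    max_value: int


@dataclass
class MinMaxFilter:
    # Hypothetical runtime filter: range of join keys seen on the build side.
    low: int
    high: int


def may_contain_matches(stats: RowGroupStats, flt: MinMaxFilter) -> bool:
    # A row group can be skipped only when its [min, max] range
    # does not overlap the filter's [low, high] range.
    return stats.max_value >= flt.low and stats.min_value <= flt.high


def row_groups_to_scan(groups: List[RowGroupStats], flt: MinMaxFilter) -> List[int]:
    # Indexes of the row groups that still have to be read.
    return [i for i, g in enumerate(groups) if may_contain_matches(g, flt)]


groups = [RowGroupStats(1, 100), RowGroupStats(101, 200), RowGroupStats(201, 300)]
survivors = row_groups_to_scan(groups, MinMaxFilter(low=150, high=180))
print(survivors)
```

The file footer still has to be read to obtain the statistics, which is why this remains slower than pruning whole partitions before any file is opened.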

Performance

I used the following query to measure perf on a scale 10 TPC-DS
dataset:

 select i_item_id, sum(ss_ext_sales_price) total_sales
 from
   store_sales,
   date_dim,
   customer_address,
   item
 where i_item_id in (select
     i_item_id
   from item
   where i_color in ('orchid','chiffon','lace'))
   and ss_item_sk      = i_item_sk
   and ss_sold_date_sk = d_date_sk
   and d_year          = 2000
   and d_moy           = 1
   and ss_addr_sk      = ca_address_sk
   and ca_gmt_offset   = -8

The above query took the following times to execute:

Regular Parquet table: 1.16s
Iceberg table without min/max filters: 4.39s
Iceberg table with min/max filters: 1.77s
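
For reference, the ratios implied by these three timings can be worked out directly. The snippet below only restates the numbers above; the variable names are illustrative.

```python
regular_parquet_s = 1.16
iceberg_no_filter_s = 4.39
iceberg_with_filter_s = 1.77

# How much faster the Iceberg scan gets once min/max filters are enabled.
speedup_from_filters = iceberg_no_filter_s / iceberg_with_filter_s

# How far the filtered Iceberg scan still is from the regular Parquet table.
remaining_gap = iceberg_with_filter_s / regular_parquet_s

print(f"speedup from filters: {speedup_from_filters:.2f}x")
print(f"remaining gap vs. regular Parquet: {remaining_gap:.2f}x")
```

That is roughly a 2.5x improvement over the unfiltered Iceberg scan, with the filtered scan still about 1.5x slower than the regular Parquet table.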

Testing:
 * added e2e test
 * planner test could not be added because Iceberg tables behave
   differently during planner tests (due to some hacks that need
   refactoring)

Change-Id: I51b53188c6da7eeebfeae385e1de31ace0980cac
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/FeTable.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test
6 files changed, 45 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/17960/2
--
To view, visit http://gerrit.cloudera.org:8080/17960
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I51b53188c6da7eeebfeae385e1de31ace0980cac
Gerrit-Change-Number: 17960
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tamas Mate 


[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

2021-10-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB 
forever
..


Patch Set 36: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7554/


--
To view, visit http://gerrit.cloudera.org:8080/17872
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 36
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Fri, 22 Oct 2021 08:06:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-10-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9641/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 8
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Fri, 22 Oct 2021 07:41:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-10-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7555/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 8
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Fri, 22 Oct 2021 07:21:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-10-22 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
..

IMPALA-10923: Fine grained table refreshing at partition level events
for transactional tables

To enable fine-grained table refreshing, there are three main changes
in this commit.
1. Maintain validWriteIdList in Catalogd for transactional tables. We
  will keep track of write id changes for partitioned tables by
  AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents.
2. Conduct partition level refreshing for transactional tables on
  addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents.
3. Introduce a config
  hms_event_incremental_refresh_transactional_table, which can switch
  on/off the fine-grained table refreshing.
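
To make item 1 more concrete, here is a minimal sketch of how a per-table write id list could be kept up to date from the three event types. It assumes a much simplified model: the class TableWriteIds and its methods are hypothetical and are not the MutableValidWriteIdList added by this patch.

```python
class TableWriteIds:
    """Hypothetical, simplified tracker of write id state for one table."""

    def __init__(self):
        self.high_watermark = 0
        self.open_ids = set()
        self.aborted_ids = set()

    def on_alloc(self, write_id):
        # AllocWriteIdEvent: a transaction reserved a write id.
        self.open_ids.add(write_id)
        self.high_watermark = max(self.high_watermark, write_id)

    def on_commit(self, write_id):
        # CommitTxnEvent: the write id's data becomes visible.
        self.open_ids.discard(write_id)

    def on_abort(self, write_id):
        # AbortTxnEvent: the write id's data must never become visible.
        self.open_ids.discard(write_id)
        self.aborted_ids.add(write_id)

    def is_valid(self, write_id):
        # Visible only if allocated, no longer open, and not aborted.
        return (write_id <= self.high_watermark
                and write_id not in self.open_ids
                and write_id not in self.aborted_ids)


ids = TableWriteIds()
for wid in (1, 2, 3):
    ids.on_alloc(wid)
ids.on_commit(1)
ids.on_abort(2)
print([wid for wid in (1, 2, 3, 4) if ids.is_valid(wid)])
```

With state like this maintained from events, a partition-level event only needs to reload the affected partition instead of the whole table.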

Performance Tests:
A simple test was performed by inserting into one partition of a
partitioned ACID table (50,000 partitions). Below is the time taken
to refresh this table by the event.

Storage  Before   After
==========================
S3       50 secs  50 msecs
local    3 secs   3 msecs

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
16 files changed, 928 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/8
--
To view, visit http://gerrit.cloudera.org:8080/17858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 8
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai