[jira] [Commented] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-08-24 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758821#comment-17758821
 ] 

Maxwell Guo commented on IMPALA-12402:
--

How can I assign this Jira issue to myself? I can't find the button.

> Add some configurations for CatalogdMetaProvider's cache_
> -
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
>  Issue Type: Improvement
>  Components: fe
>Reporter: Maxwell Guo
>Priority: Minor
>  Labels: pull-request-available
>
> When the cluster contains many databases and tables (for example, more than
> 10 tables) and the impalad is restarted, CatalogdMetaProvider's local cache_
> has to go through a loading process.
> Google's Guava cache sets concurrencyLevel to 4 by default.
> If there are many tables, the loading process takes more time and the
> probability of lock contention increases; see
> [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>  
> So we propose adding some configurations here, the first being the
> concurrency level of the cache.
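Per Guava's documentation, concurrencyLevel is a sizing hint: the cache table is internally partitioned to try to permit that many concurrent updates without contention. The following is a stdlib-only C++ sketch of that lock-striping idea, not Guava's or CatalogdMetaProvider's code; StripedCache and its methods are hypothetical names, and rounding the segment count up to a power of two is an assumption modeled on how Guava's internals behave.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical illustration of lock striping (not Guava or Impala code):
// keys are spread over a power-of-two number of segments, each guarded by
// its own mutex, so loads that land in different segments do not contend.
template <typename K, typename V>
class StripedCache {
 public:
  explicit StripedCache(int concurrency_level)
      : segments_(RoundUpToPowerOfTwo(concurrency_level)) {}

  std::size_t num_segments() const { return segments_.size(); }

  void Put(const K& key, V value) {
    Segment& seg = SegmentFor(key);
    std::lock_guard<std::mutex> lock(seg.mu);
    seg.map[key] = std::move(value);
  }

  // This sketch also locks on reads; Guava's reads are mostly lock-free.
  std::optional<V> Get(const K& key) {
    Segment& seg = SegmentFor(key);
    std::lock_guard<std::mutex> lock(seg.mu);
    auto it = seg.map.find(key);
    if (it == seg.map.end()) return std::nullopt;
    return it->second;
  }

 private:
  struct Segment {
    std::mutex mu;
    std::unordered_map<K, V> map;
  };

  // Smallest power of two >= n (at least 1).
  static std::size_t RoundUpToPowerOfTwo(int n) {
    std::size_t result = 1;
    while (n > 0 && result < static_cast<std::size_t>(n)) result <<= 1;
    return result;
  }

  // The segment count is a power of two, so masking the hash picks a segment.
  Segment& SegmentFor(const K& key) {
    std::size_t h = std::hash<K>{}(key);
    return segments_[h & (segments_.size() - 1)];
  }

  std::vector<Segment> segments_;
};
```

Raising the level mainly helps when many threads load different entries at once, as during impalad startup; the cost is more, mostly smaller, segments.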



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-08-24 Thread Maxwell Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxwell Guo updated IMPALA-12402:
-
 Flags: Patch
Labels: pull-request-available  (was: )




[jira] [Updated] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-08-24 Thread Maxwell Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxwell Guo updated IMPALA-12402:
-
Summary: Add some configurations for CatalogdMetaProvider's cache_  (was: 
Add some configurations for CatalogMetaProvider's cache_)




[jira] [Updated] (IMPALA-12402) Add some configurations for CatalogMetaProvider's cache_

2023-08-24 Thread Maxwell Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxwell Guo updated IMPALA-12402:
-
  Language: java
Target Version: Impala 4.2.0




[jira] [Created] (IMPALA-12402) Add some configurations for CatalogMetaProvider's cache_

2023-08-24 Thread Maxwell Guo (Jira)
Maxwell Guo created IMPALA-12402:


 Summary: Add some configurations for CatalogMetaProvider's cache_
 Key: IMPALA-12402
 URL: https://issues.apache.org/jira/browse/IMPALA-12402
 Project: IMPALA
  Issue Type: Improvement
  Components: fe
Reporter: Maxwell Guo


When the cluster contains many databases and tables (for example, more than
10 tables) and the impalad is restarted, CatalogdMetaProvider's local cache_
has to go through a loading process.
Google's Guava cache sets concurrencyLevel to 4 by default.
If there are many tables, the loading process takes more time and the
probability of lock contention increases; see
[here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
 
So we propose adding some configurations here, the first being the
concurrency level of the cache.






[jira] [Created] (IMPALA-12401) Support more info types for HS2 GetInfo() API

2023-08-24 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-12401:
---

 Summary: Support more info types for HS2 GetInfo() API
 Key: IMPALA-12401
 URL: https://issues.apache.org/jira/browse/IMPALA-12401
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Quanlong Huang


Impalad coordinators can act as HiveServer2 since they implement the HS2 APIs.
Currently, we just support 3 info types for the HS2 GetInfo() API: 
CLI_SERVER_NAME, CLI_DBMS_NAME, CLI_DBMS_VER.
https://github.com/apache/impala/blob/11a9861ec695fe62b39095940514b28a8c684484/be/src/service/impala-hs2-server.cc#L468-L474

We can add more to be compatible with Hive, e.g. CLI_MAX_COLUMN_NAME_LEN, 
CLI_MAX_TABLE_NAME_LEN, CLI_MAX_SCHEMA_NAME_LEN, CLI_ODBC_KEYWORDS.
https://github.com/apache/hive/blob/4903585a34ae44bb3fec4207b5acab63f6bfc8c1/service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java#L501-L508

Note that CLI_ODBC_KEYWORDS is a new value of the TGetInfoType enum, added in
HIVE-17765, which is not in our common/thrift/hive-1-api/TCLIService.thrift.
We can add CLI_ODBC_KEYWORDS and other new types to our TCLIService.thrift file.

New tests can be added in tests/hs2/test_hs2.py
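A rough sketch of the dispatch such a change would extend, written as a standalone function. InfoType, InfoValue, ServerInfo, and GetInfo() are hypothetical names; only the info-type names come from this ticket, SOME_UNHANDLED_TYPE is a placeholder, the numeric values of the real Thrift enum are not modeled, and the string/integer split only loosely mirrors the HS2 TGetInfoValue union. The reported values are supplied by the caller, so no specific limits are assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <variant>

// Hypothetical subset of HS2's TGetInfoType (names only).
enum class InfoType {
  CLI_SERVER_NAME,
  CLI_DBMS_NAME,
  CLI_DBMS_VER,
  CLI_MAX_COLUMN_NAME_LEN,
  CLI_MAX_SCHEMA_NAME_LEN,
  CLI_MAX_TABLE_NAME_LEN,
  CLI_ODBC_KEYWORDS,
  SOME_UNHANDLED_TYPE,  // placeholder, not a real TGetInfoType value
};

// An answer is either a string or an integer length.
using InfoValue = std::variant<std::string, int32_t>;

// Hypothetical bundle of the values a coordinator would report.
struct ServerInfo {
  std::string server_name;
  std::string dbms_name;
  std::string dbms_version;
  int32_t max_identifier_len;  // shared by the column/schema/table limits
  std::string odbc_keywords;   // comma-separated keyword list
};

// Returns nullopt for types this sketch does not handle; how the server
// reports that (e.g. an error status) is left to the caller.
std::optional<InfoValue> GetInfo(InfoType type, const ServerInfo& info) {
  switch (type) {
    case InfoType::CLI_SERVER_NAME: return InfoValue(info.server_name);
    case InfoType::CLI_DBMS_NAME: return InfoValue(info.dbms_name);
    case InfoType::CLI_DBMS_VER: return InfoValue(info.dbms_version);
    case InfoType::CLI_MAX_COLUMN_NAME_LEN:
    case InfoType::CLI_MAX_SCHEMA_NAME_LEN:
    case InfoType::CLI_MAX_TABLE_NAME_LEN:
      return InfoValue(info.max_identifier_len);
    case InfoType::CLI_ODBC_KEYWORDS: return InfoValue(info.odbc_keywords);
    default: return std::nullopt;
  }
}
```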






[jira] [Commented] (IMPALA-10086) SqlCastException when comparing char with varchar

2023-08-24 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758776#comment-17758776
 ] 

Michael Smith commented on IMPALA-10086:


https://gerrit.cloudera.org/c/18001/

> SqlCastException when comparing char with varchar
> -
>
> Key: IMPALA-10086
> URL: https://issues.apache.org/jira/browse/IMPALA-10086
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0.0
>Reporter: Tim Armstrong
>Assignee: Bruno Pusztahazi
>Priority: Minor
>  Labels: newbie, ramp-up
>
> {noformat}
> [localhost:21000] default> select 'expected 2',count(*) from ax where cast(t 
> as string) = cast('a ' as varchar(10));
> +--+--+
> | 'expected 2' | count(*) |
> +--+--+
> | expected 2   | 2|
> +--+--+
> Fetched 1 row(s) in 0.44s
> [localhost:21000] default> create table chartbl (c char(10));
> +-+
> | summary |
> +-+
> | Table has been created. |
> +-+
> Fetched 1 row(s) in 0.23s
> [localhost:21000] default> select * from chartbl where c = cast('test' as 
> varchar(10));
> ERROR: SqlCastException: targetType=VARCHAR(*) type=VARCHAR(10)
> {noformat}
> Also using the functional dataset:
> {noformat}
> [localhost:21000] functional> select * from chars_tiny where cs = vc;
> ERROR: SqlCastException: targetType=VARCHAR(*) type=VARCHAR(5)
> {noformat}






[jira] [Commented] (IMPALA-12395) Planner overestimates scan cardinality for queries using count star optimization

2023-08-24 Thread Riza Suminto (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758774#comment-17758774
 ] 

Riza Suminto commented on IMPALA-12395:
---

Filed patch at: https://gerrit.cloudera.org/c/20406/

> Planner overestimates scan cardinality for queries using count star 
> optimization
> 
>
> Key: IMPALA-12395
> URL: https://issues.apache.org/jira/browse/IMPALA-12395
> Project: IMPALA
>  Issue Type: Bug
>  Components: fe
>Reporter: David Rorke
>Assignee: Riza Suminto
>Priority: Major
>
> The scan cardinality estimate for count(*) queries doesn't account for the 
> fact that the count(*) optimization only scans metadata and not the actual 
> columns.
> Scan for a count(*) query on Parquet store_sales:
>  
> {noformat}
> Operator    #Hosts  #Inst  Avg Time  Max Time  #Rows  Est. #Rows  Peak Mem   Est. Peak Mem  Detail
> --------------------------------------------------------------------------------------------------
> 00:SCAN S3  6       72     8s131ms   8s496ms   2.71K  8.64B       128.00 KB  88.00 MB       tpcds_3000_string_parquet_managed.store_sales
> {noformat}
>  
> This is a problem with all file/table formats that implement count(*) 
> optimizations (Parquet and also probably ORC and Iceberg).
> This problem is more serious than it was in the past because, with
> IMPALA-12091, we now rely on scan cardinality estimates for executor group
> assignments, so count(*) queries are likely to be assigned to a larger
> executor group than needed.
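The ticket's point can be expressed as a small estimator, under the assumption that a count(*)-optimized Parquet scan emits about one row per row group (each carrying that row group's row count) rather than one row per table row. ScanStats, its fields, and EstimateScanCardinality() are hypothetical names, not Impala's planner API; -1 denotes unknown statistics, following Impala's convention for cardinalities.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-scan statistics; -1 means "unknown".
struct ScanStats {
  int64_t table_rows;      // rows in the partitions being scanned
  int64_t num_row_groups;  // metadata units a count(*)-optimized scan reads
};

// Estimates how many rows the scan node itself will emit.
int64_t EstimateScanCardinality(const ScanStats& stats, bool count_star_optimized) {
  // Assumed behavior: an optimized count(*) scan emits about one row per
  // row group, not one row per table row.
  if (count_star_optimized && stats.num_row_groups >= 0) {
    return stats.num_row_groups;
  }
  // Otherwise (or if the row-group count is unknown) fall back to table rows.
  return stats.table_rows;
}
```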






[jira] [Created] (IMPALA-12400) Test expected executors used for planning when no executor groups are healthy

2023-08-24 Thread Abhishek Rawat (Jira)
Abhishek Rawat created IMPALA-12400:
---

 Summary: Test expected executors used for planning when no 
executor groups are healthy
 Key: IMPALA-12400
 URL: https://issues.apache.org/jira/browse/IMPALA-12400
 Project: IMPALA
  Issue Type: Test
Reporter: Abhishek Rawat


The planner uses the expected executors from the 'num_expected_executors' and
'expected_executor_group_sets' configs when no executor groups are healthy.
It would be good to write a test case for this.






[jira] [Comment Edited] (IMPALA-12382) Coordinator could schedule fragments on gracefully shutdown executors

2023-08-24 Thread Wenzhe Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758687#comment-17758687
 ] 

Wenzhe Zhou edited comment on IMPALA-12382 at 8/24/23 6:38 PM:
---

If the statestore removes the executor from the cluster membership when it
receives an unregistering request, running queries could be affected.
Coordinators cancel the queries that are running on failed executors (as
evidenced by their absence from the membership list). See
[ImpalaServer::CancelQueriesOnFailedBackends()|https://github.com/apache/impala/blob/master/be/src/service/impala-server.cc#L2365-L2375].

It seems we already have a
[mechanism|https://github.com/apache/impala/blob/master/be/src/service/impala-server.h#L124-L126]
to avoid scheduling new tasks on executors that are shutting down, by
marking them as being in the "quiescing" state.


was (Author: wzhou):
If the statestore removes the executor from the cluster membership when it
receives an unregistering request, running queries could be affected.
Coordinators cancel the queries that are running on failed executors (as
evidenced by their absence from the membership list). See
[ImpalaServer::CancelQueriesOnFailedBackends()|https://github.com/apache/impala/blob/master/be/src/service/impala-server.cc#L2365-L2375].


> Coordinator could schedule fragments on gracefully shutdown executors
> -
>
> Key: IMPALA-12382
> URL: https://issues.apache.org/jira/browse/IMPALA-12382
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Abhishek Rawat
>Assignee: Wenzhe Zhou
>Priority: Critical
>
> The statestore does failure detection based on consecutive heartbeat failures.
> By default, this is configured as 10 missed heartbeats
> (statestore_max_missed_heartbeats) at 1-second intervals
> (statestore_heartbeat_frequency_ms). However, detection could take much
> longer than 10 seconds overall, especially if the statestore is busy, and
> because of the RPC timeout duration.
> In the following example it took 50 seconds for failure detection:
> {code:java}
> I0817 12:32:06.824721    86 statestore.cc:1157] Unable to send heartbeat 
> message to subscriber 
> impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010,
>  received error: RPC Error: Client for 10.80.199.159:23000 hit an unexpected 
> exception: No more data to read., type: 
> N6apache6thrift9transport19TTransportExceptionE, rpc: 
> N6impala18THeartbeatResponseE, send: done
> I0817 12:32:06.824741    86 failure-detector.cc:91] 1 consecutive heartbeats 
> failed for 
> 'impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010'.
>  State is OK
> .
> .
> .
> I0817 12:32:56.800251    83 statestore.cc:1157] Unable to send heartbeat 
> message to subscriber 
> impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010,
>  received error: RPC Error: Client for 10.80.199.159:23000 hit an unexpected 
> exception: No more data to read., type: 
> N6apache6thrift9transport19TTransportExceptionE, rpc: 
> N6impala18THeartbeatResponseE, send: done 
> I0817 12:32:56.800267    83 failure-detector.cc:91] 10 consecutive heartbeats 
> failed for 
> 'impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010'.
>  State is FAILED
> I0817 12:32:56.800276    83 statestore.cc:1168] Subscriber 
> 'impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010'
>  has failed, disconnected or re-registered (last known registration ID: 
> c84bf70f03acda2b:b34a812c5e96e687){code}
> As a result, there is a window while the statestore is determining node
> failure in which the coordinator might schedule fragments on that particular
> executor(s). The exec RPC will fail, and if transparent query retries are
> enabled, the coordinator will immediately retry the query and it will fail
> again.
> Ideally, in such situations the coordinator should be notified sooner about
> a failed executor. The statestore could send a priority topic update to the
> coordinator when it enters the failure detection logic. This should reduce
> the chances of the coordinator scheduling query fragments on a failed
> executor.
> Another option would be to tune the heartbeat frequency and interval
> parameters. But it's hard to find a configuration that works for all cases:
> while the default values are reasonable, under certain conditions they could
> be unreasonable, as seen in the above example.
> It might make sense to handle the case where executors are shut down
> gracefully specially: in that case, the statestore shouldn't do failure
> detection and should instead mark these executors as failed immediately.
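A minimal sketch of the detection rule described above (an executor is declared failed after N consecutive missed heartbeats), plus the proposed fast path for executors that shut down gracefully. This is not Impala's failure-detector.cc; ConsecutiveMissDetector, MarkShuttingDown(), and the two-state PeerState enum are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>

enum class PeerState { OK, FAILED };

// Hypothetical detector: a peer is FAILED after max_missed consecutive
// heartbeat failures, or immediately once it announces a graceful shutdown.
class ConsecutiveMissDetector {
 public:
  explicit ConsecutiveMissDetector(int max_missed) : max_missed_(max_missed) {}

  // Records one heartbeat outcome and returns the peer's new state.
  // In this sketch a successful heartbeat resets the miss count.
  PeerState UpdateHeartbeat(const std::string& peer, bool succeeded) {
    if (shutting_down_.count(peer) > 0) return PeerState::FAILED;
    int& missed = missed_[peer];
    missed = succeeded ? 0 : missed + 1;
    return missed >= max_missed_ ? PeerState::FAILED : PeerState::OK;
  }

  // Proposed fast path: skip the miss counting for a graceful shutdown.
  void MarkShuttingDown(const std::string& peer) { shutting_down_.insert(peer); }

  PeerState GetState(const std::string& peer) const {
    if (shutting_down_.count(peer) > 0) return PeerState::FAILED;
    auto it = missed_.find(peer);
    int missed = it == missed_.end() ? 0 : it->second;
    return missed >= max_missed_ ? PeerState::FAILED : PeerState::OK;
  }

 private:
  int max_missed_;
  std::unordered_map<std::string, int> missed_;
  std::unordered_set<std::string> shutting_down_;
};
```

The count alone does not bound detection time: in the log above, about 50 seconds pass between the first and the tenth failed heartbeat (12:32:06.82 to 12:32:56.80), roughly 5.5 seconds per attempt instead of the nominal 1 second.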





[jira] [Commented] (IMPALA-12382) Coordinator could schedule fragments on gracefully shutdown executors

2023-08-24 Thread Wenzhe Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758687#comment-17758687
 ] 

Wenzhe Zhou commented on IMPALA-12382:
--

If the statestore removes the executor from the cluster membership when it
receives an unregistering request, running queries could be affected.
Coordinators cancel the queries that are running on failed executors (as
evidenced by their absence from the membership list). See
[ImpalaServer::CancelQueriesOnFailedBackends()|https://github.com/apache/impala/blob/master/be/src/service/impala-server.cc#L2365-L2375].





[jira] [Commented] (IMPALA-11669) Make Thrift max message size configuration

2023-08-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758655#comment-17758655
 ] 

ASF subversion and git services commented on IMPALA-11669:
--

Commit 81844499b51da092567c510202a4b7de81ecd8af in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=81844499b ]

IMPALA-12366: Use 2GB as the default for thrift_rpc_max_message_size

Thrift 0.16 implemented a limit on the max message size. In IMPALA-11669,
we added the thrift_rpc_max_message_size parameter and set the default
size to 1GB. Some existing clusters have needed to tune this parameter
higher because their workloads use message sizes larger than 1GB (e.g.
for metadata updates).

Historically, Impala has been able to send and receive 2GB messages,
so this changes the default value for thrift_rpc_max_message_size
to 2GB (INT_MAX). This can be reduced in future when Impala can guarantee
that messages work properly when split up into smaller batches.

TestGracefulShutdown::test_shutdown_idle started failing with this
change, because it is producing a different error message for one
of the negative tests. ClientRequestState::ExecShutdownRequest()
appends some extra explanation when it sees a "Network error" KRPC error,
and the test expects that extra explanation. This modifies
ClientRequestState::ExecShutdownRequest() to provide the extra explanation
for the new error ("Timed out") as well.

Testing:
 - Ran GVO

Change-Id: Ib624201b683966a9feefb8fe45985f3d52d869fc
Reviewed-on: http://gerrit.cloudera.org:8080/20394
Tested-by: Impala Public Jenkins 
Reviewed-by: Riza Suminto 
Reviewed-by: Michael Smith 
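The arithmetic behind the new default is easy to check. This is a sketch under stated assumptions: "MB" and "GB" are taken as powers of two, Thrift's 100MB default is taken to be 100 * 1024 * 1024 bytes, and a message is assumed to be rejected only when it is larger than the limit. FitsMaxMessageSize() is a hypothetical helper, not a Thrift or Impala API.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Sizes in bytes, assuming MB/GB mean powers of two.
constexpr int64_t kMiB = int64_t{1} << 20;
constexpr int64_t kGiB = int64_t{1} << 30;

constexpr int64_t kThriftDefaultMax = 100 * kMiB;  // 100MB Thrift default
constexpr int64_t kOldImpalaDefault = 1 * kGiB;    // previous default (1GB)
constexpr int64_t kNewImpalaDefault = INT_MAX;     // new default (INT_MAX)

// Hypothetical guard: accept a message unless it is larger than the limit.
constexpr bool FitsMaxMessageSize(int64_t message_bytes, int64_t max_bytes) {
  return message_bytes <= max_bytes;
}

// With a 32-bit int, INT_MAX is one byte short of 2GiB: the largest value a
// signed 32-bit integer can hold.
static_assert(kNewImpalaDefault == 2 * kGiB - 1, "expects a 32-bit int");
```

So a hypothetical 1.5GB metadata update fails under the old 1GB default but passes under INT_MAX.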


> Make Thrift max message size configuration
> --
>
> Key: IMPALA-11669
> URL: https://issues.apache.org/jira/browse/IMPALA-11669
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Affects Versions: Impala 4.2.0
>Reporter: Joe McDonnell
>Assignee: Riza Suminto
>Priority: Critical
> Fix For: Impala 4.2.0
>
>
> With the upgrade to Thrift 0.16, Thrift now has protection against
> malicious messages in the form of a maximum message size. This is
> currently set to 100MB by default. Impala should add the ability to override
> this default value. In particular, it seems like communication between
> coordinators and the catalogd may need a larger value.






[jira] [Commented] (IMPALA-11957) Implement Regression functions : regr_slope(), regr_intercept() and regr_r2()

2023-08-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758653#comment-17758653
 ] 

ASF subversion and git services commented on IMPALA-11957:
--

Commit 20a9d2669c69f8e5b0a5c0b9487fa0212a00ad9c in impala's branch 
refs/heads/master from pranav.lodha
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=20a9d2669 ]

IMPALA-11957: Implement Regression functions: regr_slope(),
regr_intercept() and regr_r2()

The linear regression functions fit an ordinary-least-squares regression
line to a set of number pairs. They can be used both as aggregate and
analytic functions.

regr_slope() takes two arguments of numeric type and returns the slope
of the line.
regr_intercept() takes two arguments of numeric type and returns the
y-intercept of the regression line.
regr_r2() takes two arguments of numeric type and returns the
coefficient of determination (also called R-squared or goodness of fit)
for the regression.

Testing:
The functions are extensively tested and cross-checked with Hive. The
tests can be found in aggregation.test.
Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Reviewed-on: http://gerrit.cloudera.org:8080/19569
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Implement Regression functions : regr_slope(), regr_intercept() and regr_r2()
> -
>
> Key: IMPALA-11957
> URL: https://issues.apache.org/jira/browse/IMPALA-11957
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Pranav Yogi Lodha
>Assignee: Pranav Yogi Lodha
>Priority: Major
>
> The linear regression functions fit an ordinary-least-squares regression line
> to a set of number pairs. They can be used as both aggregate and analytic
> functions.
>  * regr_slope() takes two arguments of numeric type and returns the slope of 
> the line.
>  * regr_intercept() takes two arguments of numeric type and returns the 
> y-intercept of the regression line.
>  * regr_r2() takes two arguments of numeric type and returns the coefficient 
> of determination (also called R-squared or goodness of fit) for the 
> regression.
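As an illustration of how these aggregates can be computed in one pass and merged across fragments, here is a sketch built on running sums. It assumes the SQL-standard conventions (arguments ordered (y, x) with the dependent variable first, rows with a NULL argument skipped, NULL when var_pop(x) is 0, and regr_r2 = 1 when var_pop(y) is 0). It is not Impala's implementation, and RegrState is a hypothetical name. Plain sums keep the sketch short but can lose precision on large inputs; a production version might prefer Welford-style updates.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical running state for regr_slope / regr_intercept / regr_r2.
struct RegrState {
  int64_t n = 0;
  double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;

  // Update step: rows where either argument is NULL are ignored.
  void Update(std::optional<double> y, std::optional<double> x) {
    if (!y || !x) return;
    ++n;
    sx += *x;
    sy += *y;
    sxx += *x * *x;
    syy += *y * *y;
    sxy += *x * *y;
  }

  // Merge step: partial states from different fragments simply add up.
  void Merge(const RegrState& o) {
    n += o.n; sx += o.sx; sy += o.sy; sxx += o.sxx; syy += o.syy; sxy += o.sxy;
  }

  // n^2 times the population covariance / variances.
  double Sxy() const { return n * sxy - sx * sy; }
  double Sxx() const { return n * sxx - sx * sx; }
  double Syy() const { return n * syy - sy * sy; }

  std::optional<double> Slope() const {
    if (n == 0 || Sxx() == 0) return std::nullopt;
    return Sxy() / Sxx();
  }

  std::optional<double> Intercept() const {
    std::optional<double> slope = Slope();
    if (!slope) return std::nullopt;
    return (sy - *slope * sx) / n;
  }

  std::optional<double> R2() const {
    if (n == 0 || Sxx() == 0) return std::nullopt;
    if (Syy() == 0) return 1.0;
    return (Sxy() * Sxy()) / (Sxx() * Syy());
  }
};
```

For the points (1, 3), (2, 5), (3, 7), which lie on y = 2x + 1, the merged state gives a slope of 2, an intercept of 1, and an R-squared of 1.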






[jira] [Commented] (IMPALA-12390) Enable performance related clang-tidy checks

2023-08-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758657#comment-17758657
 ] 

ASF subversion and git services commented on IMPALA-12390:
--

Commit d96341ed537a3e321d5fa6a0235ab06b5d9169a2 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d96341ed5 ]

IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder

Currently, DictEncoder uses the default hash function for
TimestampValue, which means it is hashing the entire
TimestampValue struct. This can be inconsistent, because
TimestampValue contains some padding that may not be zero
in some cases. For TimestampValues that are part of a Tuple,
the padding is zero, so this is mainly present in test cases.

This was discovered when fixing a Clang Tidy performance-for-range-copy
warning by iterating with a const reference rather than
making a copy of the value. DictTest.TestTimestamps became
flaky with that change, because the hash was no longer
consistent. The copy must have had consistent content for
the padding through the iteration, but the const reference
did not.

This adds a template specialization of the Hash function
for TimestampValue. The specialization uses TimestampValue::Hash(),
which hashes only the non-padding pieces of the struct. This
also includes the change to dict-test.cc that uncovered the
issue. This fix is mostly to unblock IMPALA-12390.

Testing:
 - Ran dict-test in a loop for a few hundred iterations
 - Hand tested inserting many timestamps into a Parquet table
   with dictionary encoding and verified that the performance didn't
   change.

Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9
Reviewed-on: http://gerrit.cloudera.org:8080/20396
Tested-by: Impala Public Jenkins 
Reviewed-by: Daniel Becker 
Reviewed-by: Michael Smith 
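The padding problem and the shape of the fix can be reproduced in miniature. Timestamp below is a hypothetical stand-in with an 8-byte and a 4-byte field (on common 64-bit ABIs this leaves 4 bytes of padding whose contents the language leaves unspecified), and 64-bit FNV-1a stands in for whatever hash the encoder actually uses. Like the fix described above, it adds a Hash specialization that hashes only the value-carrying fields; it is not Impala's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical TimestampValue-like struct: an 8-byte and a 4-byte field.
struct Timestamp {
  int64_t time_of_day_ns;
  int32_t days_since_epoch;
};

constexpr uint64_t kFnvOffsetBasis = 14695981039346656037ULL;
constexpr uint64_t kFnvPrime = 1099511628211ULL;

// 64-bit FNV-1a over a byte range, continuing from 'seed'.
inline uint64_t HashBytes(const void* data, std::size_t len, uint64_t seed) {
  const unsigned char* bytes = static_cast<const unsigned char*>(data);
  uint64_t h = seed;
  for (std::size_t i = 0; i < len; ++i) {
    h ^= bytes[i];
    h *= kFnvPrime;
  }
  return h;
}

// Generic hash: every byte of the object, padding included.
template <typename T>
inline uint64_t Hash(const T& value) {
  return HashBytes(&value, sizeof(T), kFnvOffsetBasis);
}

// Specialization: hash only the fields, so padding cannot affect the result.
template <>
inline uint64_t Hash<Timestamp>(const Timestamp& value) {
  uint64_t h = HashBytes(&value.time_of_day_ns, sizeof(value.time_of_day_ns),
                         kFnvOffsetBasis);
  return HashBytes(&value.days_since_epoch, sizeof(value.days_since_epoch), h);
}

// Builds a Timestamp whose padding bytes (if any) start out as 'fill', to
// mimic objects that were not zero-initialized.
inline Timestamp MakeWithPaddingFill(int64_t ns, int32_t days, unsigned char fill) {
  Timestamp ts;
  std::memset(&ts, fill, sizeof(ts));
  ts.time_of_day_ns = ns;
  ts.days_since_epoch = days;
  return ts;
}
```

Without the specialization, two Timestamps with equal fields but different fill bytes would typically hash differently, which is the kind of inconsistency that made DictTest.TestTimestamps flaky.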


> Enable performance related clang-tidy checks
> 
>
> Key: IMPALA-12390
> URL: https://issues.apache.org/jira/browse/IMPALA-12390
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> clang-tidy has several performance-related checks that seem like they would 
> be useful to enforce. Here are some examples:
> {noformat}
> /home/joemcdonnell/upstream/Impala/be/src/runtime/types.h:313:25: warning: 
> loop variable is copied but only used as const reference; consider making it 
> a const reference [performance-for-range-copy]
>         for (ColumnType child_type : col_type.children) {
>              ~~ ^
>              const &
> /home/joemcdonnell/upstream/Impala/be/src/catalog/catalog-util.cc:168:34: 
> warning: 'find' called with a string literal consisting of a single 
> character; consider using the more effective overload accepting a character 
> [performance-faster-string-find]
>       int pos = object_name.find(".");
>                                  ^~~~
>                                  '.'
> /home/joemcdonnell/upstream/Impala/be/src/util/decimal-util.h:55:53: warning: 
> the parameter 'b' is copied for each invocation but only used as a const 
> reference; consider making it a const reference 
> [performance-unnecessary-value-param]
>   static int256_t SafeMultiply(int256_t a, int256_t b, bool may_overflow) {
>                                             ^
>                                            const &
> /home/joemcdonnell/upstream/Impala/be/src/codegen/llvm-codegen.cc:847:5: 
> warning: 'push_back' is called inside a loop; consider pre-allocating the 
> vector capacity before the loop [performance-inefficient-vector-operation]
>     arguments.push_back(args_[i].type);
>     ^
> {noformat}
> In all, they seem to flag things that developers wouldn't ordinarily notice, 
> and it doesn't seem to have too many false positives. We should look into 
> enabling these.
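For reference, here is what code satisfying the four quoted checks typically looks like. ColumnType and the functions are hypothetical stand-ins, not the Impala sources flagged above; the int256_t parameter case is represented by a std::vector<std::string> parameter, since the pattern (pass large read-only arguments by const reference) is the same.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical nested type, loosely shaped like a column type with children.
struct ColumnType {
  std::string name;
  std::vector<ColumnType> children;
};

// performance-for-range-copy: bind each child by const reference, not by value.
int CountLeaves(const ColumnType& type) {
  if (type.children.empty()) return 1;
  int leaves = 0;
  for (const ColumnType& child : type.children) leaves += CountLeaves(child);
  return leaves;
}

// performance-faster-string-find: use the char overload for one character.
std::size_t FirstDot(const std::string& object_name) {
  return object_name.find('.');
}

// performance-unnecessary-value-param: an expensive-to-copy, read-only
// argument is passed by const reference.
std::size_t TotalLength(const std::vector<std::string>& names) {
  std::size_t total = 0;
  for (const std::string& name : names) total += name.size();
  return total;
}

// performance-inefficient-vector-operation: reserve before push_back in a loop.
std::vector<int> Squares(int n) {
  std::vector<int> out;
  out.reserve(n > 0 ? static_cast<std::size_t>(n) : 0);
  for (int i = 0; i < n; ++i) out.push_back(i * i);
  return out;
}
```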






[jira] [Commented] (IMPALA-12366) If Thrift messages are between 1GB and 2GB, the max message size will trigger

2023-08-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758654#comment-17758654
 ] 

ASF subversion and git services commented on IMPALA-12366:
--

Commit 81844499b51da092567c510202a4b7de81ecd8af in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=81844499b ]

IMPALA-12366: Use 2GB as the default for thrift_rpc_max_message_size

Thrift 0.16 implemented a limit on the max message size. In IMPALA-11669,
we added the thrift_rpc_max_message_size parameter and set the default
size to 1GB. Some existing clusters have needed to tune this parameter
higher because their workloads use message sizes larger than 1GB (e.g.
for metadata updates).

Historically, Impala has been able to send and receive 2GB messages,
so this changes the default value for thrift_rpc_max_message_size
to 2GB (INT_MAX). This can be reduced in future when Impala can guarantee
that messages work properly when split up into smaller batches.

TestGracefulShutdown::test_shutdown_idle started failing with this
change, because it is producing a different error message for one
of the negative tests. ClientRequestState::ExecShutdownRequest()
appends some extra explanation when it sees a "Network error" KRPC error,
and the test expects that extra explanation. This modifies
ClientRequestState::ExecShutdownRequest() to provide the extra explanation
for the new error ("Timed out") as well.

Testing:
 - Ran GVO

Change-Id: Ib624201b683966a9feefb8fe45985f3d52d869fc
Reviewed-on: http://gerrit.cloudera.org:8080/20394
Tested-by: Impala Public Jenkins 
Reviewed-by: Riza Suminto 
Reviewed-by: Michael Smith 


> If Thrift messages are between 1GB and 2GB, the max message size will trigger
> -
>
> Key: IMPALA-12366
> URL: https://issues.apache.org/jira/browse/IMPALA-12366
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.3.0
>
>
> In a user cluster, we ran into a circumstance where a Thrift message was 
> greater than 1GB (which is the value for thrift_rpc_max_message_size). The 
> issue was alleviated by changing the value of thrift_rpc_max_message_size to 
> 32-bit int max (~2GB). We may want to simply ship with 
> thrift_rpc_max_message_size=2GB.
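The numbers behind this issue, as a small C++ sketch. It assumes "1GB" means 2^30 bytes; the new default is INT_MAX (2^31 - 1), one byte short of 2 GiB. `FitsLimit` is a hypothetical helper that assumes a message exactly at the limit is accepted; Thrift's own boundary check is not reproduced here.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Old default for thrift_rpc_max_message_size, assuming 1GB = 2^30 bytes.
constexpr std::int64_t kOldDefault = std::int64_t{1} << 30;  // 1,073,741,824
// New default: INT_MAX, i.e. 2^31 - 1 on platforms with a 32-bit int.
constexpr std::int64_t kNewDefault = INT_MAX;                 // 2,147,483,647

static_assert(kNewDefault == (std::int64_t{1} << 31) - 1,
              "expects a 32-bit int");

// Hypothetical check: does a message of `size` bytes fit under `max_size`?
constexpr bool FitsLimit(std::int64_t size, std::int64_t max_size) {
  return size <= max_size;
}
```

A message between 1GB and 2GB fails the old limit but passes the new one.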






[jira] [Commented] (IMPALA-12393) DictEncoder uses inconsistent hash function for TimestampValue

2023-08-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758656#comment-17758656
 ] 

ASF subversion and git services commented on IMPALA-12393:
--

Commit d96341ed537a3e321d5fa6a0235ab06b5d9169a2 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d96341ed5 ]

IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder

Currently, DictEncoder uses the default hash function for
TimestampValue, which means it is hashing the entire
TimestampValue struct. This can be inconsistent, because
TimestampValue contains some padding that may not be zero
in some cases. For TimestampValues that are part of a Tuple,
the padding is zero, so this is mainly present in test cases.

This was discovered when fixing a Clang Tidy performance-for-range-copy
warning by iterating with a const reference rather than
making a copy of the value. DictTest.TestTimestamps became
flaky with that change, because the hash was no longer
consistent. The copy must have had consistent content for
the padding through the iteration, but the const reference
did not.

This adds a template specialization of the Hash function
for TimestampValue. The specialization uses TimestampValue::Hash(),
which hashes only the non-padding pieces of the struct. This
also includes the change to dict-test.cc that uncovered the
issue. This fix is mostly to unblock IMPALA-12390.

Testing:
 - Ran dict-test in a loop for a few hundred iterations
 - Hand tested inserting many timestamps into a Parquet table
   with dictionary encoding and verified that the performance didn't
   change.

Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9
Reviewed-on: http://gerrit.cloudera.org:8080/20396
Tested-by: Impala Public Jenkins 
Reviewed-by: Daniel Becker 
Reviewed-by: Michael Smith 


> DictEncoder uses inconsistent hash function for TimestampValue
> --
>
> Key: IMPALA-12393
> URL: https://issues.apache.org/jira/browse/IMPALA-12393
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.3.0
>
>
> DictEncoder currently uses this hash function for TimestampValue:
> {noformat}
> template<typename T>
> inline uint32_t DictEncoder<T>::Hash(const T& value) const {
>   return HashUtil::Hash(&value, sizeof(value), 0);
> }{noformat}
> TimestampValue has some padding, and nothing ensures that the padding is 
> cleared. This means that identical TimestampValue objects can hash to 
> different values.
> This came up when fixing a Clang-Tidy performance check. This line in 
> dict-test.cc changed from iterating over values to iterating over const 
> references.
> {noformat}
>   DictEncoder<InternalType> encoder(&pool, fixed_buffer_byte_size, &track_encoder);
>   encoder.UsedbyTest();
> -  for (InternalType i: values) encoder.Put(i);
> +  for (const InternalType& i: values) encoder.Put(i);
>   bytes_alloc = encoder.DictByteSize();
>   EXPECT_EQ(track_encoder.consumption(), bytes_alloc);
>   EXPECT_EQ(encoder.num_entries(), values_set.size());{noformat}
> The test became flaky, with the encoder.num_entries() being larger than the 
> values_set.size() for TimestampValue. This happened because the hash values 
> didn't match even for identical entries and the dictionary would have 
> multiple copies of the same value. When iterating over a plain non-reference 
> TimestampValue, each TimestampValue is being copied to a temporary value. 
> Maybe in this circumstance the padding stays the same between iterations.
> It's possible this would come up when writing Parquet data files.
> One fix would be to use TimestampValue's Hash function, which ignores the 
> padding:
> {noformat}
> template<>
> inline uint32_t DictEncoder<TimestampValue>::Hash(const TimestampValue& value) const {
>   return value.Hash();
> }{noformat}
>  
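The failure mode and the proposed fix can be sketched in self-contained C++. `Ts` is a hypothetical stand-in for TimestampValue, and `HashBytes` is a simple FNV-1a-style hash standing in for HashUtil::Hash; neither is Impala's code. The generic `Encoder<T>::Hash` hashes every byte of the object, padding included, while the explicit specialization for `Ts` defers to a member `Hash()` that reads only the data members.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Simple seeded FNV-1a-style byte hash, standing in for HashUtil::Hash.
inline std::uint32_t HashBytes(const void* data, std::size_t len, std::uint32_t seed) {
  const unsigned char* p = static_cast<const unsigned char*>(data);
  std::uint32_t h = 2166136261u ^ seed;
  for (std::size_t i = 0; i < len; ++i) {
    h ^= p[i];
    h *= 16777619u;
  }
  return h;
}

// Hypothetical stand-in for TimestampValue: 12 bytes of data, which most
// 64-bit ABIs pad to 16 bytes.
struct Ts {
  std::int64_t time_of_day;
  std::int32_t date;
  // Hashes the data members only, so padding never affects the result.
  std::uint32_t Hash() const {
    std::uint32_t h = HashBytes(&time_of_day, sizeof(time_of_day), 0);
    return HashBytes(&date, sizeof(date), h);
  }
};

template <typename T>
struct Encoder {
  std::uint32_t Hash(const T& value) const;
};

// Generic version: hashes every byte of the object, padding included.
template <typename T>
inline std::uint32_t Encoder<T>::Hash(const T& value) const {
  return HashBytes(&value, sizeof(value), 0);
}

// Explicit specialization: defer to the type's padding-free Hash().
template <>
inline std::uint32_t Encoder<Ts>::Hash(const Ts& value) const {
  return value.Hash();
}

// Fills the object with `fill` before setting the members, so any padding
// bytes are likely (though not guaranteed by the standard) to keep `fill`.
inline Ts MakeTs(std::int64_t time_of_day, std::int32_t date, unsigned char fill) {
  Ts ts;
  std::memset(&ts, fill, sizeof(ts));
  ts.time_of_day = time_of_day;
  ts.date = date;
  return ts;
}
```

Because the contents of padding are unspecified, the generic raw-byte hash of two equal `Ts` values may or may not match depending on how they were built; the specialized hash gives the same answer either way.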






[jira] [Resolved] (IMPALA-12366) If Thrift messages are between 1GB and 2GB, the max message size will trigger

2023-08-24 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-12366.

Fix Version/s: Impala 4.3.0
   Resolution: Fixed

> If Thrift messages are between 1GB and 2GB, the max message size will trigger
> -
>
> Key: IMPALA-12366
> URL: https://issues.apache.org/jira/browse/IMPALA-12366
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.3.0






[jira] [Resolved] (IMPALA-12393) DictEncoder uses inconsistent hash function for TimestampValue

2023-08-24 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-12393.

Fix Version/s: Impala 4.3.0
   Resolution: Fixed

> DictEncoder uses inconsistent hash function for TimestampValue
> --
>
> Key: IMPALA-12393
> URL: https://issues.apache.org/jira/browse/IMPALA-12393
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.3.0






[jira] [Commented] (IMPALA-11460) Enable ASYNC CODEGEN by default

2023-08-24 Thread Daniel Becker (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758551#comment-17758551
 ] 

Daniel Becker commented on IMPALA-11460:


I ran a benchmark and added the results in {{Async_benchmark.txt}}.

The codegen cache was turned off for both the async and the sync case. It is a 
TPC-H benchmark with scale factor 2. The small scale factor was chosen because 
async codegen is most useful for small, fast queries.

Overall, the benchmark shows a significant improvement (-28.65%), but TPCH-Q1 
had a regression of +7.87%.

> Enable ASYNC CODEGEN by default
> ---
>
> Key: IMPALA-11460
> URL: https://issues.apache.org/jira/browse/IMPALA-11460
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Abhishek Rawat
>Priority: Major
> Attachments: Async_benchmark.txt
>
>
> Would be good to do some additional testing and address any gaps and enable 
> the feature by default.






[jira] [Updated] (IMPALA-11460) Enable ASYNC CODEGEN by default

2023-08-24 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-11460:
---
Attachment: Async_benchmark.txt

> Enable ASYNC CODEGEN by default
> ---
>
> Key: IMPALA-11460
> URL: https://issues.apache.org/jira/browse/IMPALA-11460
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Abhishek Rawat
>Priority: Major
> Attachments: Async_benchmark.txt






[jira] [Commented] (IMPALA-5081) Expose IR optimization level via query option

2023-08-24 Thread Daniel Becker (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758478#comment-17758478
 ] 

Daniel Becker commented on IMPALA-5081:
---

Having a way to invalidate the cache could also be useful in testing. How 
difficult would it be to do? If it's complicated or opens up the possibility of 
subtle errors, we should not do it.

> Expose IR optimization level via query option
> -
>
> Key: IMPALA-5081
> URL: https://issues.apache.org/jira/browse/IMPALA-5081
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Michael Ho
>Assignee: Michael Smith
>Priority: Minor
>  Labels: codegen
>
> Certain queries may spend a lot of time in IR optimization. Currently, there 
> is a start-up option to disable optimization in LLVM. However, it may be 
> inconvenient for users to have to restart the entire Impala cluster just to 
> use that option. This JIRA aims to explore exposing a query option that lets 
> users choose the optimization level for a given query (e.g. a level that runs 
> only a dead code elimination pass, or no optimization at all).
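One possible shape for such an option, sketched in C++ under stated assumptions: the level names, the accepted option values, and the fallback to the start-up setting are all hypothetical and do not correspond to an existing Impala query option.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical optimization levels, from cheapest to most thorough.
enum class IrOptLevel { kNone, kDeadCodeOnly, kFull };

// Hypothetical parser for a per-query option value; unknown values yield nullopt.
std::optional<IrOptLevel> ParseIrOptLevel(const std::string& value) {
  if (value == "none") return IrOptLevel::kNone;
  if (value == "dce") return IrOptLevel::kDeadCodeOnly;
  if (value == "full") return IrOptLevel::kFull;
  return std::nullopt;
}

// A query-level setting, if present, overrides the cluster-wide start-up
// default, so changing the level for one query does not require a restart.
IrOptLevel EffectiveIrOptLevel(const std::optional<IrOptLevel>& query_option,
                               IrOptLevel startup_default) {
  return query_option.value_or(startup_default);
}
```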






[jira] [Updated] (IMPALA-12399) Pass eventTypeSkipList with OPEN_TXN in NotificationEventRequest to avoid receiving OPEN_TXN events from HMS

2023-08-24 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-12399:

Epic Link: IMPALA-11532

> Pass eventTypeSkipList with OPEN_TXN in NotificationEventRequest to avoid 
> receiving OPEN_TXN events from HMS
> 
>
> Key: IMPALA-12399
> URL: https://issues.apache.org/jira/browse/IMPALA-12399
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Venugopal Reddy K
>Priority: Major
>
> Notification events like OPEN_TXN are ignored by catalogd's 
> {{MetastoreEventsProcessor}}. So we can pass eventTypeSkipList with OPEN_TXN 
> in NotificationEventRequest when invoking get_next_notification(), to avoid 
> reading such notification messages from HMS only to ignore them in catalogd. 
> Because OPEN_TXN events are frequent (they are received even for a describe 
> table operation from Beeline), this can significantly reduce unwanted 
> processing on both HMS and catalogd. Catalogd reads events in batches of 
> EVENTS_BATCH_SIZE_PER_RPC, so skipping such unnecessary events can help it 
> catch up on events faster.
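To illustrate why skipping OPEN_TXN on the HMS side helps catalogd catch up, here is a toy C++ model under stated assumptions: an in-memory event log stands in for HMS, `FetchBatch` stands in for get_next_notification() with a server-side skip list, and the batch size plays the role of EVENTS_BATCH_SIZE_PER_RPC. None of these names are the HMS or Impala APIs, and event ids are assumed to start at 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

struct Event {
  std::int64_t id;
  std::string type;
};

// Toy stand-in for one get_next_notification() RPC: returns up to `max_events`
// events after `last_id`, dropping any whose type is in `skip` on the server.
std::vector<Event> FetchBatch(const std::vector<Event>& log, std::int64_t last_id,
                              std::size_t max_events,
                              const std::unordered_set<std::string>& skip) {
  assert(max_events > 0);
  std::vector<Event> batch;
  for (const Event& e : log) {
    if (e.id <= last_id || skip.count(e.type) > 0) continue;
    batch.push_back(e);
    if (batch.size() == max_events) break;
  }
  return batch;
}

struct CatchUpStats {
  int rpcs = 0;       // Number of FetchBatch() calls.
  int processed = 0;  // Events that were not ignored on the client.
};

// Reads the whole log batch by batch; OPEN_TXN events that reach the client
// are ignored, mirroring what the issue says catalogd does today.
CatchUpStats CatchUp(const std::vector<Event>& log, std::size_t batch_size,
                     const std::unordered_set<std::string>& skip) {
  CatchUpStats stats;
  std::int64_t last_id = 0;
  while (true) {
    const std::vector<Event> batch = FetchBatch(log, last_id, batch_size, skip);
    ++stats.rpcs;
    if (batch.empty()) break;
    for (const Event& e : batch) {
      if (e.type != "OPEN_TXN") ++stats.processed;
      last_id = e.id;
    }
  }
  return stats;
}
```

With 100 events of which 80 are OPEN_TXN and a batch size of 10, the client processes the same 20 events either way, but needs 11 calls without the skip list and 3 with it.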






[jira] [Updated] (IMPALA-12399) Pass eventTypeSkipList with OPEN_TXN in NotificationEventRequest to avoid receiving OPEN_TXN events from HMS

2023-08-24 Thread Venugopal Reddy K (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venugopal Reddy K updated IMPALA-12399:
---
Description: Notification events like OPEN_TXN are ignored by catalogd's 
{{MetastoreEventsProcessor}}. So we can pass eventTypeSkipList with OPEN_TXN in 
NotificationEventRequest when invoking get_next_notification(), to avoid reading 
such notification messages from HMS only to ignore them in catalogd. Because 
OPEN_TXN events are frequent (they are received even for a describe table 
operation from Beeline), this can significantly reduce unwanted processing on 
both HMS and catalogd. Catalogd reads events in batches of 
EVENTS_BATCH_SIZE_PER_RPC, so skipping such unnecessary events can help it catch 
up on events faster.  (was: Notification events like OPEN_TXN are ignored on 
catalogd. So, we can pass eventTypeSkipList with OPEN_TXN in 
NotificationEventRequest while invoking get_next_notification() to avoid 
reading such notification messages from HMS and then ignoring on catalogd. 
OPEN_TXN event being more frequent(received even upon describe table operation 
from beeline), we can significantly reduce unwanted processing on both HMS and 
catalogd. Catalogd reads events in batches of EVENTS_BATCH_SIZE_PER_RPC, 
skipping such unnecessary events can help catchup the events faster.)

> Pass eventTypeSkipList with OPEN_TXN in NotificationEventRequest to avoid 
> receiving OPEN_TXN events from HMS
> 
>
> Key: IMPALA-12399
> URL: https://issues.apache.org/jira/browse/IMPALA-12399
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Venugopal Reddy K
>Priority: Major






[jira] [Created] (IMPALA-12399) Pass eventTypeSkipList with OPEN_TXN in NotificationEventRequest to avoid receiving OPEN_TXN events from HMS

2023-08-24 Thread Venugopal Reddy K (Jira)
Venugopal Reddy K created IMPALA-12399:
--

 Summary: Pass eventTypeSkipList with OPEN_TXN in 
NotificationEventRequest to avoid receiving OPEN_TXN events from HMS
 Key: IMPALA-12399
 URL: https://issues.apache.org/jira/browse/IMPALA-12399
 Project: IMPALA
  Issue Type: Improvement
  Components: Catalog
Reporter: Venugopal Reddy K


Notification events like OPEN_TXN are ignored in catalogd. So we can pass 
eventTypeSkipList with OPEN_TXN in NotificationEventRequest when invoking 
get_next_notification(), to avoid reading such notification messages from HMS 
only to ignore them in catalogd. Because OPEN_TXN events are frequent (they are 
received even for a describe table operation from Beeline), this can 
significantly reduce unwanted processing on both HMS and catalogd. Catalogd reads 
events in batches of EVENTS_BATCH_SIZE_PER_RPC, so skipping such unnecessary 
events can help it catch up on events faster.


