[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number

2024-06-25 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17859856#comment-17859856
 ] 

Maxwell Guo commented on IMPALA-12771:
--

ping [~stigahuang][~mylogi...@gmail.com]

> Impala catalogd events-skipped may mark the wrong number
> 
>
> Key: IMPALA-12771
> URL: https://issues.apache.org/jira/browse/IMPALA-12771
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>
> See the description of [event-skipped 
> metric|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L237]
>  
> {code:java}
> // total number of events which are skipped because of the flag setting or
> // in case of [CREATE|DROP] events on [DATABASE|TABLE|PARTITION] which were ignored
> // because the [DATABASE|TABLE|PARTITION] was already [PRESENT|ABSENT] in the catalogd.
> {code}
>  
> For CREATE and DROP events on a database, table, or partition (AddPartition 
> included), when the database or table is not found in the cache, we skip 
> processing the event and increment the events-skipped metric by 1.
> However, ALTER TABLE and Reload events do not fit this description:
> * The Reload event is not mentioned in the description of events-skipped, 
> but the metric is incremented by 1 when the event is an old event;
> * In addition, if the table is in the blacklist, the metric is also 
> incremented by 1.
> In summary, I think this description is inconsistent with the actual 
> implementation.
> So can we also update the events-skipped metric for alter partition events and 
> change the description to cover all skipped events?
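
One way to keep the metric and its description in agreement is to route every skip through a single helper that also records why the event was skipped. The sketch below is only an illustration of that idea and is not taken from the Impala code base: `SkippedEventsCounter`, `SkipReason`, and `markSkipped` are hypothetical names, and the listed reasons are simply the cases mentioned in this issue.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not Impala code: one total plus a per-reason breakdown.
class SkippedEventsCounter {
  // Skip reasons taken from the cases discussed in this issue (names are made up).
  enum SkipReason {
    FLAG_DISABLED,          // event handling is turned off by a flag
    OBJECT_ALREADY_PRESENT, // CREATE for an object that is already cached
    OBJECT_ABSENT,          // DROP for an object that is not cached
    OLD_EVENT,              // e.g. a Reload event that is older than the cached state
    BLACKLISTED             // the table or database is blacklisted
  }

  private final AtomicLong total = new AtomicLong();
  private final Map<SkipReason, AtomicLong> byReason = new EnumMap<>(SkipReason.class);

  SkippedEventsCounter() {
    // Pre-populate the map so it is only read, never restructured, after construction.
    for (SkipReason reason : SkipReason.values()) {
      byReason.put(reason, new AtomicLong());
    }
  }

  // Every code path that skips an event would call this exactly once.
  void markSkipped(SkipReason reason) {
    total.incrementAndGet();
    byReason.get(reason).incrementAndGet();
  }

  long total() { return total.get(); }

  long count(SkipReason reason) { return byReason.get(reason).get(); }

  public static void main(String[] args) {
    SkippedEventsCounter counter = new SkippedEventsCounter();
    counter.markSkipped(SkipReason.OLD_EVENT);
    counter.markSkipped(SkipReason.BLACKLISTED);
    System.out.println("events-skipped = " + counter.total());
  }
}
```

With a helper like this, the metric description can simply say "all skipped events", and the per-reason counts remain available for debugging.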



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number

2024-05-30 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850908#comment-17850908
 ] 

Maxwell Guo commented on IMPALA-12771:
--

Ping again. I have also updated the PR against the latest master branch code to 
avoid merge conflicts. [~mylogi...@gmail.com][~stigahuang][~VenuReddy]




[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number

2024-04-09 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835458#comment-17835458
 ] 

Maxwell Guo commented on IMPALA-12771:
--

ping [~mylogi...@gmail.com][~stigahuang][~VenuReddy]:D




[jira] [Commented] (IMPALA-12709) Hierarchical metastore event processing

2024-04-09 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835459#comment-17835459
 ] 

Maxwell Guo commented on IMPALA-12709:
--

[~VenuReddy] Any update here? :D

> Hierarchical metastore event processing
> ---
>
> Key: IMPALA-12709
> URL: https://issues.apache.org/jira/browse/IMPALA-12709
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Venugopal Reddy K
>Assignee: Venugopal Reddy K
>Priority: Major
> Attachments: Hierarchical metastore event processing.docx
>
>
> *Current Issue:*
> At present, the metastore event processor is single-threaded. Notification events 
> are processed sequentially, with a maximum of 1000 events fetched and 
> processed in a single batch. Multiple locks are used to address the 
> concurrency issues that may arise when catalog DDL operation processing and 
> metastore event processing try to access or update catalog objects 
> concurrently. Waiting for a lock, or for the file metadata of a table to load, 
> can slow event processing and delay the events that follow, even when those 
> events do not depend on the previous event. Altogether, it 
> takes a very long time to synchronize all the HMS events.
> *Proposal:*
> Existing metastore event processing can be turned into multi-level event 
> processing. The idea is to segregate events based on their dependencies, 
> maintain the order in which events occur within each dependency, and process 
> them independently as much as possible:
>  # All the events of a table are processed in the same order in which they 
> actually occurred.
>  # Events of different tables are processed in parallel.
>  # When a database is altered, all events relating to the database (i.e., 
> for all its tables) that occur after the alter database event are processed only 
> after the alter database event has been processed, which preserves the order.
> I have attached an initial proposal design document:
> https://docs.google.com/document/d/1KZ-ANko-qn5CYmY13m4OVJXAYjLaS1VP-c64Pumipq8/edit?pli=1#heading=h.qyk8qz8ez37b
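
The three ordering rules in the proposal can be sketched with chained futures. This is a minimal illustration under stated assumptions, not the design from the attached document: `OrderedEventDispatcher`, `submitTableEvent`, and `submitDbEvent` are hypothetical names, events are modelled as plain `Runnable`s, failures are not handled, and the bookkeeping maps are never pruned.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical dispatcher: per-table ordering, cross-table parallelism,
// and database-level events acting as a barrier for their tables.
class OrderedEventDispatcher {
  private final ExecutorService pool;
  // Last scheduled database-level event per database.
  private final Map<String, CompletableFuture<Void>> dbTails = new HashMap<>();
  // Last scheduled event per table, grouped by database.
  private final Map<String, Map<String, CompletableFuture<Void>>> tableTails = new HashMap<>();

  OrderedEventDispatcher(ExecutorService pool) { this.pool = pool; }

  // Rules 1 and 2: a table event runs after the previous event of the same table
  // (or after the last database-level event), independently of other tables.
  synchronized CompletableFuture<Void> submitTableEvent(String db, String table, Runnable task) {
    Map<String, CompletableFuture<Void>> tables =
        tableTails.computeIfAbsent(db, k -> new HashMap<>());
    CompletableFuture<Void> prev = tables.get(table);
    if (prev == null) {
      prev = dbTails.getOrDefault(db, CompletableFuture.completedFuture(null));
    }
    CompletableFuture<Void> next = prev.thenRunAsync(task, pool);
    tables.put(table, next);
    return next;
  }

  // Rule 3: a database event waits for every earlier event of that database,
  // and later table events of that database are chained after it.
  synchronized CompletableFuture<Void> submitDbEvent(String db, Runnable task) {
    List<CompletableFuture<Void>> deps = new ArrayList<>();
    deps.add(dbTails.getOrDefault(db, CompletableFuture.completedFuture(null)));
    Map<String, CompletableFuture<Void>> tables = tableTails.remove(db);
    if (tables != null) deps.addAll(tables.values());
    CompletableFuture<Void> next = CompletableFuture
        .allOf(deps.toArray(new CompletableFuture[0]))
        .thenRunAsync(task, pool);
    dbTails.put(db, next);
    return next;
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    OrderedEventDispatcher dispatcher = new OrderedEventDispatcher(pool);
    dispatcher.submitTableEvent("db1", "t1", () -> System.out.println("insert into t1"));
    dispatcher.submitTableEvent("db1", "t2", () -> System.out.println("insert into t2"));
    dispatcher.submitDbEvent("db1", () -> System.out.println("alter database db1"));
    dispatcher.submitTableEvent("db1", "t1", () -> System.out.println("alter table t1")).join();
    pool.shutdown();
  }
}
```

In this sketch the two inserts may run in either order, while the alter database event always runs after both of them and before the final alter table event.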






[jira] [Commented] (IMPALA-12709) Hierarchical metastore event processing

2024-04-02 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833178#comment-17833178
 ] 

Maxwell Guo commented on IMPALA-12709:
--

[~VenuReddy] Thanks very much, looking forward to your update.




[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number

2024-04-01 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833010#comment-17833010
 ] 

Maxwell Guo commented on IMPALA-12771:
--

Hi [~hemanth619], thanks for your review comments. I have responded to them 
and updated the code. Looking forward to your reply. :)




[jira] [Commented] (IMPALA-12709) Hierarchical metastore event processing

2024-03-26 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831139#comment-17831139
 ] 

Maxwell Guo commented on IMPALA-12709:
--

[~VenuReddy] Sorry to bother you again, is there any update on this? :)




[jira] [Assigned] (IMPALA-12912) Show history of event processing in the /events page

2024-03-21 Thread Maxwell Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxwell Guo reassigned IMPALA-12912:


Assignee: Maxwell Guo

> Show history of event processing in the /events page
> 
>
> Key: IMPALA-12912
> URL: https://issues.apache.org/jira/browse/IMPALA-12912
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Maxwell Guo
>Priority: Major
>
> This is a follow-up task of IMPALA-12782 where we add some basic info in the 
> /events page. It'd be helpful to also show the history of event processing, 
> including the top-10 expensive events/tables, the recent 10 failure messages, 
> etc.
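
The history described here can be kept in two small bounded structures. The following is a rough sketch only, not the implementation behind the /events page: `EventHistory`, `recordDuration`, and `recordFailure` are hypothetical names, and the limit of 10 from the description is passed in as a parameter.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical bounded history: the N most expensive events and the N latest failures.
class EventHistory {
  static final class Timed {
    final String description;
    final long millis;
    Timed(String description, long millis) {
      this.description = description;
      this.millis = millis;
    }
  }

  private final int limit;
  // Min-heap on duration, so the cheapest retained entry is evicted first.
  private final PriorityQueue<Timed> slowest =
      new PriorityQueue<>(Comparator.comparingLong((Timed t) -> t.millis));
  private final Deque<String> failures = new ArrayDeque<>();

  EventHistory(int limit) { this.limit = limit; }

  synchronized void recordDuration(String description, long millis) {
    slowest.add(new Timed(description, millis));
    if (slowest.size() > limit) slowest.poll();
  }

  synchronized void recordFailure(String message) {
    failures.addLast(message);
    if (failures.size() > limit) failures.removeFirst();
  }

  // Most expensive first.
  synchronized List<Timed> topExpensive() {
    List<Timed> out = new ArrayList<>(slowest);
    out.sort(Comparator.comparingLong((Timed t) -> t.millis).reversed());
    return out;
  }

  // Oldest first.
  synchronized List<String> recentFailures() { return new ArrayList<>(failures); }

  public static void main(String[] args) {
    EventHistory history = new EventHistory(10);
    history.recordDuration("ALTER_TABLE db1.t1", 1200);
    history.recordDuration("ADD_PARTITION db1.t2", 300);
    history.recordFailure("hypothetical failure message");
    for (Timed t : history.topExpensive()) {
      System.out.println(t.description + " took " + t.millis + " ms");
    }
  }
}
```

Keeping only the top N entries bounds memory regardless of how many events are processed, which matters for a long-running catalogd.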






[jira] [Commented] (IMPALA-12912) Show history of event processing in the /events page

2024-03-21 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829730#comment-17829730
 ] 

Maxwell Guo commented on IMPALA-12912:
--

[~stigahuang] Can I assign this issue to myself?




[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number

2024-03-18 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828156#comment-17828156
 ] 

Maxwell Guo commented on IMPALA-12771:
--

ping [~stigahuang][~mylogi...@gmail.com] [~VenuReddy]:D




[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number

2024-03-13 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826933#comment-17826933
 ] 

Maxwell Guo commented on IMPALA-12771:
--

[~stigahuang][~mylogi...@gmail.com][~VenuReddy]
Hi, could you take another look?




[jira] [Commented] (IMPALA-12468) Add the ability to update EventProcessorStatus

2024-03-12 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825579#comment-17825579
 ] 

Maxwell Guo commented on IMPALA-12468:
--

So is there any need for me to continue working on this Jira?


> Add the ability to update EventProcessorStatus
> --
>
> Key: IMPALA-12468
> URL: https://issues.apache.org/jira/browse/IMPALA-12468
> Project: IMPALA
>  Issue Type: Improvement
>  Components: be, Catalog, fe
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>
> Once the states of Impala and Hive are mismatched and the 
> EventProcessorStatus becomes NEEDS_INVALIDATE, we usually run INVALIDATE 
> METADATA to reset the catalog instance. Impala then updates the 
> status to ACTIVE.
> But if Impala contains many tables, a global invalidate is expensive. 
> So we may prefer to invalidate metadata table by table, only for 
> the tables that changed. For example, suppose we have 1,000,000,000,000 
> tables, but event processing hits a CatalogException for only some of them and 
> MetastoreNotificationNeedsInvalidateException is thrown. I think there is no 
> need to invalidate all table caches in order to reset the catalog instance; 
> see [here 
> |https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2088].
>  
> The async update process of MetastoreEventsProcessor will not update currentStatus 
> when the status is not ACTIVE; see 
> [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L876]
> So what about adding a new SQL statement: SET EVENT STATUS ${status}?
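
To make the request concrete, here is a minimal sketch of what a manually adjustable status could look like. It is an assumption-laden illustration, not Impala's implementation: `EventProcessorStatusHolder`, `markFailed`, and `manualSet` are hypothetical names, and the enum lists only the statuses mentioned in this thread.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical status holder; a statement such as SET EVENT STATUS might end up
// calling manualSet(). Only the statuses named in this thread are modelled.
class EventProcessorStatusHolder {
  enum Status { ACTIVE, NEEDS_INVALIDATE, ERROR }

  private final AtomicReference<Status> current = new AtomicReference<>(Status.ACTIVE);

  Status get() { return current.get(); }

  // The background loop only fetches and applies events while ACTIVE.
  boolean shouldProcessEvents() { return current.get() == Status.ACTIVE; }

  // Called by the processor itself when event handling fails.
  void markFailed(Status failure) {
    if (failure == Status.ACTIVE) {
      throw new IllegalArgumentException("ACTIVE is not a failure status");
    }
    current.set(failure);
  }

  // Manual adjustment by an operator. It succeeds only if the status is still
  // the one the operator observed, so a concurrent change is not overwritten.
  boolean manualSet(Status expected, Status target) {
    return current.compareAndSet(expected, target);
  }

  public static void main(String[] args) {
    EventProcessorStatusHolder holder = new EventProcessorStatusHolder();
    holder.markFailed(Status.NEEDS_INVALIDATE);
    // After invalidating only the affected tables, the operator resets the status.
    boolean changed = holder.manualSet(Status.NEEDS_INVALIDATE, Status.ACTIVE);
    System.out.println("reset applied: " + changed + ", status: " + holder.get());
  }
}
```

The compare-and-set is a design choice of this sketch: it avoids resetting a status that changed again between the operator's check and the reset.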






[jira] [Commented] (IMPALA-12709) Hierarchical metastore event processing

2024-03-12 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825575#comment-17825575
 ] 

Maxwell Guo commented on IMPALA-12709:
--

Thanks [~VenuReddy]




[jira] [Commented] (IMPALA-12468) Add the ability to update EventProcessorStatus

2024-03-12 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825571#comment-17825571
 ] 

Maxwell Guo commented on IMPALA-12468:
--

The reason I created this Jira is that we ran into the problem described in 
IMPALA-12832. We made a code change similar to IMPALA-12832, but it only 
handled the NEEDS_INVALIDATE case.
 After looking at the Impala code, we found that the ERROR status can differ 
from NEEDS_INVALIDATE. For example, we found a case where Impala's schema was 
not compatible with Hive (this has been fixed in newer Impala versions), 
and the HMS side may need to perform some table operations before 
the EP can handle the table correctly. In this case, I thought 
of manually resetting the EP status after the user has modified their schema.
 




[jira] [Commented] (IMPALA-12709) Hierarchical metastore event processing

2024-03-11 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825340#comment-17825340
 ] 

Maxwell Guo commented on IMPALA-12709:
--

Is there any update on this issue? [~VenuReddy][~rizaon][~stigahuang]




[jira] [Commented] (IMPALA-12468) Add the ability to update EventProcessorStatus

2024-03-03 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823015#comment-17823015
 ] 

Maxwell Guo commented on IMPALA-12468:
--

Hi [~mylogi...@gmail.com],
Yes, what you describe is another solution to this problem, but I want to make
this feature more general, since the event processor (EP) has other statuses, not only
NEEDS_INVALIDATE. What I want is to leave room for adjusting this state
manually: any status other than ACTIVE stops the EP, and then the only options
are to restart catalogd or to run a global invalidate metadata.



> Add the ability to update EventProcessorStatus
> --
>
> Key: IMPALA-12468
> URL: https://issues.apache.org/jira/browse/IMPALA-12468
> Project: IMPALA
>  Issue Type: Improvement
>  Components: be, Catalog, fe
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>
> Once the state of Impala and Hive is mismatched and the
> EventProcessorStatus becomes NEEDS_INVALIDATE, we usually run invalidate
> metadata to reset the catalog instance, and Impala then updates the
> status to ACTIVE.
> But if Impala contains many tables, a global invalidate is quite costly.
> So we may instead invalidate metadata table by table, only for the tables
> that actually changed. For example, suppose we have 1,000,000,000,000
> tables, but event processing hit a CatalogException for only a few of them and
> MetastoreNotificationNeedsInvalidateException was thrown. I think there is no
> need to invalidate all table caches in order to reset the catalog instance;
> see [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2088].
>
> The asynchronous update process of MetastoreEventsProcessor does not update
> currentStatus when the status is not ACTIVE; see
> [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L876].
> So what about adding a new SQL statement: SET EVENT STATUS ${status}?
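
A minimal sketch of the behaviour described above, under assumptions: the class EventProcessorStatusSketch, its methods, and the reduced status list are hypothetical and only mirror the ticket text, and setStatusManually stands in for whatever would back the proposed SET EVENT STATUS statement. It is not the actual MetastoreEventsProcessor code.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicReference;

/** Illustrative sketch only; not the real Impala event processor. */
public class EventProcessorStatusSketch {
  /** Reduced, illustrative status list; the real enum may contain other values. */
  public enum Status { ACTIVE, NEEDS_INVALIDATE, ERROR }

  private final AtomicReference<Status> status = new AtomicReference<>(Status.ACTIVE);

  /** One polling round: skipped entirely unless the processor is ACTIVE. */
  public boolean processEvents() {
    if (status.get() != Status.ACTIVE) {
      return false; // nothing is fetched or applied, so the status cannot recover by itself
    }
    // ... fetching and applying events would happen here ...
    return true;
  }

  /** Called when an event cannot be applied to the catalog. */
  public void markNeedsInvalidate() {
    status.set(Status.NEEDS_INVALIDATE);
  }

  /**
   * Hypothetical backend of the proposed "SET EVENT STATUS ${status}" statement.
   * Returns the previous status; throws IllegalArgumentException for unknown names.
   */
  public Status setStatusManually(String requested) {
    Status target = Status.valueOf(requested.trim().toUpperCase(Locale.ROOT));
    return status.getAndSet(target);
  }

  public Status getStatus() {
    return status.get();
  }
}
```

With such a hook, an operator could invalidate only the affected tables and then set the status back to ACTIVE, instead of running a global invalidate.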






[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number

2024-02-25 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820535#comment-17820535
 ] 

Maxwell Guo commented on IMPALA-12771:
--

The build failed; let me take a look at this case.

> Impala catalogd events-skipped may mark the wrong number
> 
>
> Key: IMPALA-12771
> URL: https://issues.apache.org/jira/browse/IMPALA-12771
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>
> See the description of [event-skipped 
> metric|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L237]
>  
> {code:java}
>   // total number of events which are skipped because of the flag setting or
>   // in case of [CREATE|DROP] events on [DATABASE|TABLE|PARTITION] which were ignored
>   // because the [DATABASE|TABLE|PARTITION] was already [PRESENT|ABSENT] in the catalogd.
> {code}
>
> For CREATE and DROP events on a database, table, or partition (AddPartition
> included), when the database or table is not found in the cache, we skip
> processing the event and increment the events-skipped metric by 1.
> But I found some problems here with the alter table and Reload events:
> * The Reload event is not mentioned in the description of events-skipped,
> but the metric is still incremented when the event is an old event;
> * Besides, if the table is in the blacklist, the metric is also incremented.
> In summary, I think this description is inconsistent with the actual
> implementation.
> So can we also update the events-skipped metric for alter partition events,
> and change the description to cover all skipped events?
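
To make the request above concrete, here is a small sketch of a skip counter that treats every skip reason named in the ticket the same way. It is an assumption-laden illustration: EventsSkippedSketch, SkipReason, and markSkipped are hypothetical names, and the real metric in MetastoreEventsProcessor is not implemented like this.

```java
import java.util.EnumMap;
import java.util.Map;

/** Illustrative sketch only; not the real events-skipped metric. */
public class EventsSkippedSketch {
  /** Skip reasons taken from the ticket text; the names are made up for this sketch. */
  public enum SkipReason {
    FLAG_SETTING,              // event sync disabled by a flag
    ALREADY_PRESENT_OR_ABSENT, // CREATE/DROP of an object already in that state
    OLD_EVENT,                 // e.g. a Reload event older than the cached state
    BLACKLISTED                // database or table is in the blacklist
  }

  private long eventsSkipped = 0;
  private final Map<SkipReason, Long> byReason = new EnumMap<>(SkipReason.class);

  /** Single entry point, so the total always covers every kind of skipped event. */
  public synchronized void markSkipped(SkipReason reason) {
    eventsSkipped++;
    byReason.merge(reason, 1L, Long::sum);
  }

  public synchronized long total() {
    return eventsSkipped;
  }

  public synchronized long count(SkipReason reason) {
    return byReason.getOrDefault(reason, 0L);
  }
}
```

Routing every skip through one method is what would let the metric description simply say "all skipped events".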






[jira] [Commented] (IMPALA-12573) Give configuration load_catalog_in_background more fine-grained configuration

2024-02-21 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819452#comment-17819452
 ] 

Maxwell Guo commented on IMPALA-12573:
--

[~stigahuang] Thanks for your reply. I think load_dbs_in_background and
load_tables_in_background may help.

But maybe I did not fully understand your suggestion: should these two
configurations be strings, like the table/db blacklist, so that someone
can list the tables that should always be loaded? They are not going to be
boolean flags, am I right?


> Give configuration load_catalog_in_background more fine-grained configuration
> -
>
> Key: IMPALA-12573
> URL: https://issues.apache.org/jira/browse/IMPALA-12573
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>
> As we know, if load_catalog_in_background is set to true, table metadata
> is loaded asynchronously by catalogd.
> When catalogd starts up with the flag set to true, all tables are loaded
> asynchronously, so the loading queue becomes large. So we may leave it set to
> false by default. But if we invalidate some tables manually, we may want them
> to be loaded in the background. So I think we can introduce a new flag,
> load_catalog_in_background_at_startup: we can set
> load_catalog_in_background_at_startup to false and
> load_catalog_in_background to true by default.
>  
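
One possible reading of the two-flag proposal above, as a sketch: the class BackgroundLoadFlagsSketch is hypothetical, and the rule that the startup flag alone decides during startup is an assumption, since the ticket does not spell out how the two flags interact.

```java
/** Illustrative sketch only; the flag semantics here are an assumption. */
public class BackgroundLoadFlagsSketch {
  private final boolean loadCatalogInBackground;          // existing flag
  private final boolean loadCatalogInBackgroundAtStartup; // proposed flag

  public BackgroundLoadFlagsSketch(boolean loadCatalogInBackground,
      boolean loadCatalogInBackgroundAtStartup) {
    this.loadCatalogInBackground = loadCatalogInBackground;
    this.loadCatalogInBackgroundAtStartup = loadCatalogInBackgroundAtStartup;
  }

  /**
   * During catalogd startup only the startup flag decides; afterwards
   * (for example after a manual invalidate) the general flag decides.
   */
  public boolean shouldLoadInBackground(boolean catalogdStartingUp) {
    return catalogdStartingUp ? loadCatalogInBackgroundAtStartup : loadCatalogInBackground;
  }
}
```

With the defaults suggested in the ticket (startup flag false, general flag true), nothing is queued at startup, while tables invalidated later are still loaded in the background.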






[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number

2024-02-21 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819210#comment-17819210
 ] 

Maxwell Guo commented on IMPALA-12771:
--

Done, published it now.

> Impala catalogd events-skipped may mark the wrong number
> 
>
> Key: IMPALA-12771
> URL: https://issues.apache.org/jira/browse/IMPALA-12771
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>
> See the description of [event-skipped 
> metric|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L237]
>  
> {code:java}
>   // total number of events which are skipped because of the flag setting or
>   // in case of [CREATE|DROP] events on [DATABASE|TABLE|PARTITION] which were ignored
>   // because the [DATABASE|TABLE|PARTITION] was already [PRESENT|ABSENT] in the catalogd.
> {code}
>
> For CREATE and DROP events on a database, table, or partition (AddPartition
> included), when the database or table is not found in the cache, we skip
> processing the event and increment the events-skipped metric by 1.
> But I found some problems here with the alter table and Reload events:
> * The Reload event is not mentioned in the description of events-skipped,
> but the metric is still incremented when the event is an old event;
> * Besides, if the table is in the blacklist, the metric is also incremented.
> In summary, I think this description is inconsistent with the actual
> implementation.
> So can we also update the events-skipped metric for alter partition events,
> and change the description to cover all skipped events?






[jira] [Comment Edited] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number

2024-02-21 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819161#comment-17819161
 ] 

Maxwell Guo edited comment on IMPALA-12771 at 2/21/24 9:48 AM:
---

The initial version is here
 https://gerrit.cloudera.org/#/c/21045/
and I am doing local testing at the same time. 

CC [~mylogi...@gmail.com] [~stigahuang] [~VenuReddy], let me know if there is
anything obviously wrong with my changes.


was (Author: maxwellguo):
The initial version is here https://gerrit.cloudera.org/#/c/21045/
and I am doing local testing at the same time. 

CC [~mylogi...@gmail.com][~stigahuang][~VenuReddy] ,let me know if there is 
something obviously wrong with my modifications.

> Impala catalogd events-skipped may mark the wrong number
> 
>
> Key: IMPALA-12771
> URL: https://issues.apache.org/jira/browse/IMPALA-12771
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>
> See the description of [event-skipped 
> metric|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L237]
>  
> {code:java}
>   // total number of events which are skipped because of the flag setting or
>   // in case of [CREATE|DROP] events on [DATABASE|TABLE|PARTITION] which were ignored
>   // because the [DATABASE|TABLE|PARTITION] was already [PRESENT|ABSENT] in the catalogd.
> {code}
>
> For CREATE and DROP events on a database, table, or partition (AddPartition
> included), when the database or table is not found in the cache, we skip
> processing the event and increment the events-skipped metric by 1.
> But I found some problems here with the alter table and Reload events:
> * The Reload event is not mentioned in the description of events-skipped,
> but the metric is still incremented when the event is an old event;
> * Besides, if the table is in the blacklist, the metric is also incremented.
> In summary, I think this description is inconsistent with the actual
> implementation.
> So can we also update the events-skipped metric for alter partition events,
> and change the description to cover all skipped events?






[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number

2024-02-21 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819161#comment-17819161
 ] 

Maxwell Guo commented on IMPALA-12771:
--

The initial version is here https://gerrit.cloudera.org/#/c/21045/
and I am doing local testing at the same time. 

CC [~mylogi...@gmail.com] [~stigahuang] [~VenuReddy], let me know if there is
anything obviously wrong with my changes.

> Impala catalogd events-skipped may mark the wrong number
> 
>
> Key: IMPALA-12771
> URL: https://issues.apache.org/jira/browse/IMPALA-12771
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>
> See the description of [event-skipped 
> metric|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L237]
>  
> {code:java}
>   // total number of events which are skipped because of the flag setting or
>   // in case of [CREATE|DROP] events on [DATABASE|TABLE|PARTITION] which were ignored
>   // because the [DATABASE|TABLE|PARTITION] was already [PRESENT|ABSENT] in the catalogd.
> {code}
>
> For CREATE and DROP events on a database, table, or partition (AddPartition
> included), when the database or table is not found in the cache, we skip
> processing the event and increment the events-skipped metric by 1.
> But I found some problems here with the alter table and Reload events:
> * The Reload event is not mentioned in the description of events-skipped,
> but the metric is still incremented when the event is an old event;
> * Besides, if the table is in the blacklist, the metric is also incremented.
> In summary, I think this description is inconsistent with the actual
> implementation.
> So can we also update the events-skipped metric for alter partition events,
> and change the description to cover all skipped events?






[jira] [Comment Edited] (IMPALA-12709) Hierarchical metastore event processing

2024-02-19 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818229#comment-17818229
 ] 

Maxwell Guo edited comment on IMPALA-12709 at 2/19/24 11:24 AM:


Hi [~VenuReddy], after reading the code, I only found
[EventsProcessorStressTest|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java],
which may be somewhat related to performance, but I think some
customization is required if we want to use that code. [~stigahuang]
[~mylogi...@gmail.com] any more suggestions?
Besides, what about making this patch configurable? One benefit is that
you can compare the results directly by switching the configuration, without
changing the code, and
I think new features are generally turned off by default.


was (Author: maxwellguo):
Hi [~VenuReddy] ,After reading the code ,I only found 
[EventsProcessorStressTest 
title|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java]
 which may has some relations with performance, But I think some function 
customization is required if we want to  the code.  [~stigahuang] 
[~mylogi...@gmail.com] any more suggestions?
Besides, What about make this patch configurable, one of the benefits is that 
you can visually see the comparison results through configuration without 
changing this code, and 
I think new features are generally turned off by default. 

> Hierarchical metastore event processing
> ---
>
> Key: IMPALA-12709
> URL: https://issues.apache.org/jira/browse/IMPALA-12709
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Venugopal Reddy K
>Assignee: Venugopal Reddy K
>Priority: Major
> Attachments: Hierarchical metastore event processing.docx
>
>
> *Current Issue:*
> At present, the metastore event processor is single-threaded. Notification
> events are processed sequentially, with at most 1000 events fetched and
> processed in a single batch. Multiple locks are used to address the
> concurrency issues that may arise when catalog DDL operation processing and
> metastore event processing try to access or update catalog objects
> concurrently. Waiting for a lock, or for the file metadata of a table to
> load, can slow event processing and delay the events that follow, even
> though those events may not depend on the earlier event. Altogether, it
> takes a very long time to synchronize all the HMS events.
> *Proposal:*
> The existing metastore event processing can be turned into multi-level event
> processing. The idea is to segregate events by their dependencies,
> preserve the order of events within each dependency group, and process
> the groups independently as much as possible:
>  # All the events of a table are processed in the order in which they
> actually occurred.
>  # Events of different tables are processed in parallel.
>  # When a database is altered, all events relating to that database (i.e.,
> for all its tables) that occur after the alter database event are processed
> only after the alter database event has been processed, preserving the order.
> An initial design proposal document is attached:
> https://docs.google.com/document/d/1KZ-ANko-qn5CYmY13m4OVJXAYjLaS1VP-c64Pumipq8/edit?pli=1#heading=h.qyk8qz8ez37b






[jira] [Comment Edited] (IMPALA-12709) Hierarchical metastore event processing

2024-02-19 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818229#comment-17818229
 ] 

Maxwell Guo edited comment on IMPALA-12709 at 2/19/24 11:24 AM:


Hi [~VenuReddy], after reading the code, I only found
[EventsProcessorStressTest|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java],
which may be somewhat related to performance, but I think some
customization is required if we want to use that code. [~stigahuang]
[~mylogi...@gmail.com] any more suggestions?
Besides, what about making this patch configurable? One benefit is that
you can compare the results directly by switching the configuration, without
changing the code, and
I think new features are generally turned off by default.


was (Author: maxwellguo):
Hi [~VenuReddy] ,After reading the code ,I only found 
[EventsProcessorStressTest 
title|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java]
 which may has some relations with performance, But I think some function 
customization is required if we want to you the code.  [~stigahuang] 
[~mylogi...@gmail.com] any more suggestions?
Besides, What about make this patch configurable, one of the benefits is that 
you can visually see the comparison results through configuration without 
changing this code, and 
I think new features are generally turned off by default. 

> Hierarchical metastore event processing
> ---
>
> Key: IMPALA-12709
> URL: https://issues.apache.org/jira/browse/IMPALA-12709
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Venugopal Reddy K
>Assignee: Venugopal Reddy K
>Priority: Major
> Attachments: Hierarchical metastore event processing.docx
>
>
> *Current Issue:*
> At present, the metastore event processor is single-threaded. Notification
> events are processed sequentially, with at most 1000 events fetched and
> processed in a single batch. Multiple locks are used to address the
> concurrency issues that may arise when catalog DDL operation processing and
> metastore event processing try to access or update catalog objects
> concurrently. Waiting for a lock, or for the file metadata of a table to
> load, can slow event processing and delay the events that follow, even
> though those events may not depend on the earlier event. Altogether, it
> takes a very long time to synchronize all the HMS events.
> *Proposal:*
> The existing metastore event processing can be turned into multi-level event
> processing. The idea is to segregate events by their dependencies,
> preserve the order of events within each dependency group, and process
> the groups independently as much as possible:
>  # All the events of a table are processed in the order in which they
> actually occurred.
>  # Events of different tables are processed in parallel.
>  # When a database is altered, all events relating to that database (i.e.,
> for all its tables) that occur after the alter database event are processed
> only after the alter database event has been processed, preserving the order.
> An initial design proposal document is attached:
> https://docs.google.com/document/d/1KZ-ANko-qn5CYmY13m4OVJXAYjLaS1VP-c64Pumipq8/edit?pli=1#heading=h.qyk8qz8ez37b






[jira] [Updated] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number

2024-02-19 Thread Maxwell Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxwell Guo updated IMPALA-12771:
-
Description: 
See the description of [event-skipped 
metric|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L237]
 

{code:java}
  // total number of events which are skipped because of the flag setting or
  // in case of [CREATE|DROP] events on [DATABASE|TABLE|PARTITION] which were ignored
  // because the [DATABASE|TABLE|PARTITION] was already [PRESENT|ABSENT] in the catalogd.
{code}

For CREATE and DROP events on a database, table, or partition (AddPartition
included), when the database or table is not found in the cache, we skip
processing the event and increment the events-skipped metric by 1.
But I found some problems here with the alter table and Reload events:

* The Reload event is not mentioned in the description of events-skipped,
but the metric is still incremented when the event is an old event;
* Besides, if the table is in the blacklist, the metric is also incremented.
In summary, I think this description is inconsistent with the actual
implementation.
So can we also update the events-skipped metric for alter partition events,
and change the description to cover all skipped events?

  was:
See the description of [event-skipped 
metric|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L237]
 

{code:java}
  // total number of events which are skipped because of the flag setting or
  // in case of [CREATE|DROP] events on [DATABASE|TABLE|PARTITION] which were ignored
  // because the [DATABASE|TABLE|PARTITION] was already [PRESENT|ABSENT] in the catalogd.
{code}

For CREATE and DROP events on a database, table, or partition (AddPartition
included), when the database or table is not found in the cache, we skip
processing the event and increment the events-skipped metric by 1.
But I found some problems here with the alter table and Reload events:

* For alter table, renaming a table also increments the events-skipped metric;
see [oldTblRemoved being false|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L1653]
* The Reload event is not mentioned in the description of events-skipped,
but the metric is still incremented when the event is an old event;
* Besides, if the table is in the blacklist, the metric is also incremented.
In summary, I think this description is inconsistent with the actual
implementation.
So can we also update the events-skipped metric for alter partition events,
and change the description to cover all skipped events?


> Impala catalogd events-skipped may mark the wrong number
> 
>
> Key: IMPALA-12771
> URL: https://issues.apache.org/jira/browse/IMPALA-12771
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>
> See the description of [event-skipped 
> metric|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L237]
>  
> {code:java}
>   // total number of events which are skipped because of the flag setting or
>   // in case of [CREATE|DROP] events on [DATABASE|TABLE|PARTITION] which were ignored
>   // because the [DATABASE|TABLE|PARTITION] was already [PRESENT|ABSENT] in the catalogd.
> {code}
>
> For CREATE and DROP events on a database, table, or partition (AddPartition
> included), when the database or table is not found in the cache, we skip
> processing the event and increment the events-skipped metric by 1.
> But I found some problems here with the alter table and Reload events:
> * The Reload event is not mentioned in the description of events-skipped,
> but the metric is still incremented when the event is an old event;
> * Besides, if the table is in the blacklist, the metric is also incremented.
> In summary, I think this description is inconsistent with the actual
> implementation.
> So can we also update the events-skipped metric for alter partition events,
> and change the description to cover all skipped events?






[jira] [Work started] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number

2024-02-18 Thread Maxwell Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-12771 started by Maxwell Guo.

> Impala catalogd events-skipped may mark the wrong number
> 
>
> Key: IMPALA-12771
> URL: https://issues.apache.org/jira/browse/IMPALA-12771
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>
> See the description of [event-skipped 
> metric|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L237]
>  
> {code:java}
>   // total number of events which are skipped because of the flag setting or
>   // in case of [CREATE|DROP] events on [DATABASE|TABLE|PARTITION] which were ignored
>   // because the [DATABASE|TABLE|PARTITION] was already [PRESENT|ABSENT] in the catalogd.
> {code}
>
> For CREATE and DROP events on a database, table, or partition (AddPartition
> included), when the database or table is not found in the cache, we skip
> processing the event and increment the events-skipped metric by 1.
> But I found some problems here with the alter table and Reload events:
> * For alter table, renaming a table also increments the events-skipped metric;
> see [oldTblRemoved being false|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L1653]
> * The Reload event is not mentioned in the description of events-skipped,
> but the metric is still incremented when the event is an old event;
> * Besides, if the table is in the blacklist, the metric is also incremented.
> In summary, I think this description is inconsistent with the actual
> implementation.
> So can we also update the events-skipped metric for alter partition events,
> and change the description to cover all skipped events?






[jira] [Commented] (IMPALA-12709) Hierarchical metastore event processing

2024-02-17 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818229#comment-17818229
 ] 

Maxwell Guo commented on IMPALA-12709:
--

Hi [~VenuReddy], after reading the code, I only found
[EventsProcessorStressTest|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java],
which may be somewhat related to performance, but I think some
customization is required if we want to use that code. [~stigahuang]
[~mylogi...@gmail.com] any more suggestions?
Besides, what about making this patch configurable? One benefit is that
you can compare the results directly by switching the configuration, without
changing the code, and
I think new features are generally turned off by default.

> Hierarchical metastore event processing
> ---
>
> Key: IMPALA-12709
> URL: https://issues.apache.org/jira/browse/IMPALA-12709
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Venugopal Reddy K
>Assignee: Venugopal Reddy K
>Priority: Major
> Attachments: Hierarchical metastore event processing.docx
>
>
> *Current Issue:*
> At present, the metastore event processor is single-threaded. Notification
> events are processed sequentially, with at most 1000 events fetched and
> processed in a single batch. Multiple locks are used to address the
> concurrency issues that may arise when catalog DDL operation processing and
> metastore event processing try to access or update catalog objects
> concurrently. Waiting for a lock, or for the file metadata of a table to
> load, can slow event processing and delay the events that follow, even
> though those events may not depend on the earlier event. Altogether, it
> takes a very long time to synchronize all the HMS events.
> *Proposal:*
> The existing metastore event processing can be turned into multi-level event
> processing. The idea is to segregate events by their dependencies,
> preserve the order of events within each dependency group, and process
> the groups independently as much as possible:
>  # All the events of a table are processed in the order in which they
> actually occurred.
>  # Events of different tables are processed in parallel.
>  # When a database is altered, all events relating to that database (i.e.,
> for all its tables) that occur after the alter database event are processed
> only after the alter database event has been processed, preserving the order.
> An initial design proposal document is attached:
> https://docs.google.com/document/d/1KZ-ANko-qn5CYmY13m4OVJXAYjLaS1VP-c64Pumipq8/edit?pli=1#heading=h.qyk8qz8ez37b






[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number

2024-01-30 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812553#comment-17812553
 ] 

Maxwell Guo commented on IMPALA-12771:
--

Thanks for reminding me, I think I have used it before

> Impala catalogd events-skipped may mark the wrong number
> 
>
> Key: IMPALA-12771
> URL: https://issues.apache.org/jira/browse/IMPALA-12771
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>
> See the description of [event-skipped 
> metric|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L237]
>  
> {code:java}
>   // total number of events which are skipped because of the flag setting or
>   // in case of [CREATE|DROP] events on [DATABASE|TABLE|PARTITION] which were ignored
>   // because the [DATABASE|TABLE|PARTITION] was already [PRESENT|ABSENT] in the catalogd.
> {code}
>
> For CREATE and DROP events on a database, table, or partition (AddPartition
> included), when the database or table is not found in the cache, we skip
> processing the event and increment the events-skipped metric by 1.
> But I found some problems here with the alter table and Reload events:
> * For alter table, renaming a table also increments the events-skipped metric;
> see [oldTblRemoved being false|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L1653]
> * The Reload event is not mentioned in the description of events-skipped,
> but the metric is still incremented when the event is an old event;
> * Besides, if the table is in the blacklist, the metric is also incremented.
> In summary, I think this description is inconsistent with the actual
> implementation.
> So can we also update the events-skipped metric for alter partition events,
> and change the description to cover all skipped events?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number

2024-01-30 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812236#comment-17812236
 ] 

Maxwell Guo commented on IMPALA-12771:
--

Besides, I found an interesting piece of code 
[here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L1023].
TableLoadingException and DatabaseNotFoundException are caught in the method 
[here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L1015],
but the inner function 
[reloadTableIfExists|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2902]
has already caught the exception and does not re-throw it, so in my opinion 
the outer function has no need to deal with these two exceptions. I also think 
it is not suitable to print an INFO-level log 
[here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2903];
WARN-level logging would be better.
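
To illustrate the point about the redundant outer catch, here is a minimal 
sketch. All names in it are hypothetical stand-ins, not Impala's classes, and 
it uses an unchecked exception for brevity; the real exceptions may be 
declared differently.

```java
/** Hypothetical sketch: an inner call that swallows its own failure. */
final class SwallowedExceptionSketch {
  /** Stand-in for TableLoadingException / DatabaseNotFoundException. */
  static final class TableLoadingFailure extends RuntimeException {
    TableLoadingFailure(String msg) { super(msg); }
  }

  /** Counts how often the outer catch block actually runs. */
  static int outerCatchHits = 0;

  /** Inner call: catches the failure itself and reports it via the return value. */
  static boolean reloadIfExists(boolean tableIsLoadable) {
    try {
      if (!tableIsLoadable) throw new TableLoadingFailure("table could not be loaded");
      return true;
    } catch (TableLoadingFailure e) {
      // Swallowed here; this is where a WARN-level log would go.
      return false;
    }
  }

  /** Outer call: its catch block never runs for failures raised inside reloadIfExists. */
  static boolean process(boolean tableIsLoadable) {
    try {
      return reloadIfExists(tableIsLoadable);
    } catch (TableLoadingFailure e) {
      outerCatchHits++;
      return false;
    }
  }
}
```

Calling process(false) returns false through the inner handler, and 
outerCatchHits stays at 0, which is why the outer handler looks unnecessary.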




[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number

2024-01-30 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812237#comment-17812237
 ] 

Maxwell Guo commented on IMPALA-12771:
--

If you think my suggestion is reasonable, I will submit a PR later




[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number

2024-01-30 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812215#comment-17812215
 ] 

Maxwell Guo commented on IMPALA-12771:
--

ping [~huangqiang] 




[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number

2024-01-30 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812214#comment-17812214
 ] 

Maxwell Guo commented on IMPALA-12771:
--

With the alter partition event, the database or table may not be found, or the 
table may not be loaded; see 
[here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L1057].
In that case we just skip the event and write a debug log: "Ignoring the 
event since the table is not found". 
But we actually skipped the handling of these events when the table is not 
found, so I think we can also increment the events-skipped metric if the table 
is not found, is an IncompleteTable, or was removed from the catalog. 

Besides, we only mark the events-skipped metric for events skipped through the 
isSelfEvent and isOlderEvent methods; see 
[isSelfEvent|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L857]
and 
[isOlderEvent|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L1198].
For the 
[canBeSkipped|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L1703]
method, however, the metric is not incremented. I think we can also add it there.
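
As a rough sketch of what "count every skip" could look like, the class below 
increments one total plus a per-reason counter. The class name, the enum, and 
its values are assumptions made for illustration; they are not Impala's actual 
metric API.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the events-skipped metric; not Impala's class. */
final class SkippedEventsCounter {
  /** Illustrative skip reasons taken from the discussion above. */
  enum SkipReason { TABLE_NOT_FOUND, TABLE_NOT_LOADED, SELF_EVENT, OLDER_EVENT, CAN_BE_SKIPPED }

  private final AtomicLong total = new AtomicLong();
  private final Map<SkipReason, AtomicLong> perReason = new EnumMap<>(SkipReason.class);

  SkippedEventsCounter() {
    // Pre-populate so markSkipped never has to modify the map structure.
    for (SkipReason r : SkipReason.values()) perReason.put(r, new AtomicLong());
  }

  /** Records one skipped event, whatever the reason was. */
  void markSkipped(SkipReason reason) {
    total.incrementAndGet();
    perReason.get(reason).incrementAndGet();
  }

  long total() { return total.get(); }

  long count(SkipReason reason) { return perReason.get(reason).get(); }
}
```

With a single entry point like markSkipped, the alter partition and 
canBeSkipped paths would feed the same total as the self-event and 
older-event paths, and the per-reason counts show where skips come from.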
 




[jira] [Created] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number

2024-01-30 Thread Maxwell Guo (Jira)
Maxwell Guo created IMPALA-12771:


 Summary: Impala catalogd events-skipped may mark the wrong number
 Key: IMPALA-12771
 URL: https://issues.apache.org/jira/browse/IMPALA-12771
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Reporter: Maxwell Guo
Assignee: Maxwell Guo


See the description of the [events-skipped 
metric|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L237]:

{code:java}
// total number of events which are skipped because of the flag setting or
// in case of [CREATE|DROP] events on [DATABASE|TABLE|PARTITION] which were ignored
// because the [DATABASE|TABLE|PARTITION] was already [PRESENT|ABSENT] in the catalogd.
{code}
 
For CREATE and DROP events on a database, table, or partition (AddPartition 
included), when the database or table is not found in the cache, we skip 
processing the event and increment the events-skipped metric by 1.
But I found some questions here for the alter table and Reload events:

* For alter table, if a table is renamed, the events-skipped metric is also 
incremented; see [oldTblRemoved to be 
false|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L1653]
* The Reload event is not mentioned in the description of events-skipped, but 
the metric is incremented when the event is an older event;
* Besides, if the table is in the blacklist, the metric is also incremented.

In summary, I think this description is inconsistent with the actual 
implementation.
So can we also mark the events-skipped metric for alter partition events and 
modify the description to cover all skipped events?






[jira] [Commented] (IMPALA-12709) Hierarchical metastore event processing

2024-01-25 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810960#comment-17810960
 ] 

Maxwell Guo commented on IMPALA-12709:
--

[~VenuReddy] thanks for your reply, looking forward to your update.

> Hierarchical metastore event processing
> ---
>
> Key: IMPALA-12709
> URL: https://issues.apache.org/jira/browse/IMPALA-12709
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Venugopal Reddy K
>Assignee: Venugopal Reddy K
>Priority: Major
> Attachments: Hierarchical metastore event processing.docx
>
>
> *Current Issue:*
> At present, metastore event processor is single threaded. Notification events 
> are processed sequentially with a maximum limit of 1000 events fetched and 
> processed in a single batch. Multiple locks are used to address the 
> concurrency issues that may arise when catalog DDL operation processing and 
> metastore event processing tries to access/update the catalog objects 
> concurrently. Waiting for a lock or file metadata loading of a table can slow 
> the event processing and can affect the processing of other events following 
> it. Those events may not be dependent on the previous event. Altogether it 
> takes a very long time to synchronize all the HMS events.
> *Proposal:*
> Existing metastore event processing can be turned into multi-level event 
> processing. Idea is to segregate the events based on their dependency, 
> maintain the order of events as they occur within the dependency and process 
> them independently as much as possible:
>  # All the events of a table are processed in the same order they have 
> actually occurred.
>  # Events of different tables are processed in parallel.
>  # When a database is altered, all the events relating to the database(i.e., 
> for all its tables) occurring after the alter db event are processed only 
> after the alter database event is processed ensuring the order.
> Have attached an initial proposal design document
> https://docs.google.com/document/d/1KZ-ANko-qn5CYmY13m4OVJXAYjLaS1VP-c64Pumipq8/edit?pli=1#heading=h.qyk8qz8ez37b
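
Points 1 and 2 of the proposal quoted above (per-table ordering, cross-table 
parallelism) can be sketched as follows. This is only an illustration under 
assumptions: the class and method names are invented here, the design document 
may take a different approach, and the alter-database barrier from point 3 is 
left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

/** Hypothetical sketch: events of one table run in order, tables run in parallel. */
final class PerTableOrderingSketch {
  private final Executor executor;
  // Tail of the chain of already-submitted work for each table name.
  private final Map<String, CompletableFuture<Void>> lastPerTable = new HashMap<>();

  PerTableOrderingSketch(Executor executor) { this.executor = executor; }

  /** Queues a handler so that it runs after all earlier handlers of the same table. */
  synchronized CompletableFuture<Void> submit(String table, Runnable handler) {
    CompletableFuture<Void> previous =
        lastPerTable.getOrDefault(table, CompletableFuture.<Void>completedFuture(null));
    // Chaining on the previous future keeps the per-table order; chains of
    // different tables are independent and may run on different threads.
    CompletableFuture<Void> next = previous.thenRunAsync(handler, executor);
    lastPerTable.put(table, next);
    return next;
  }
}
```

Note that in this sketch a handler that throws causes the later handlers of 
the same table to be skipped, since the chained future completes 
exceptionally; a real implementation would need an explicit error policy.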






[jira] [Commented] (IMPALA-12709) Hierarchical metastore event processing

2024-01-24 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810730#comment-17810730
 ] 

Maxwell Guo commented on IMPALA-12709:
--

Hi [~VenuReddy], is there any plan for this patch, such as a release 
timeline? 

If this patch is going to be split into smaller tasks, I think I can help 
with some of them.




[jira] (IMPALA-12709) Hierarchical metastore event processing

2024-01-15 Thread Maxwell Guo (Jira)


[ https://issues.apache.org/jira/browse/IMPALA-12709 ]


Maxwell Guo deleted comment on IMPALA-12709:
--

was (Author: maxwellguo):
I may have a different point of view. Is it possible to divide the db into 
buckets according to the original operation time and parallelize each bucket? 
Each time, 1000 events are taken from HMS, divided into buckets, and then 
processed in parallel. After all events are processed, the next batch is 
processed.




[jira] [Comment Edited] (IMPALA-12709) Hierarchical metastore event processing

2024-01-14 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17806610#comment-17806610
 ] 

Maxwell Guo edited comment on IMPALA-12709 at 1/15/24 3:52 AM:
---

I may have a different point of view. Is it possible to divide the db into 
buckets according to the original operation time and parallelize each bucket? 
Each time, 1000 events are taken from HMS, divided into buckets, and then 
processed in parallel. After all events are processed, the next batch is 
processed.


was (Author: maxwellguo):
I may have a different point of view. Is it possible to divide the db into 
buckets according to the original operation time and parallelize each bucket?
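
The bucket idea in this comment can be sketched roughly as below. It assumes 
that a bucket is "all events of one database, in their original order", which 
is one possible reading of the comment; the Event class and all method names 
are hypothetical and the per-event work is a placeholder.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: split one fetched batch into per-db buckets, run them in parallel. */
final class BucketedBatchSketch {
  /** Minimal event: only the fields needed for bucketing. */
  static final class Event {
    final long id;
    final String db;
    Event(long id, String db) { this.id = id; this.db = db; }
  }

  /** Groups a batch by database, preserving arrival order inside each bucket. */
  static Map<String, List<Event>> bucketByDb(List<Event> batch) {
    Map<String, List<Event>> buckets = new LinkedHashMap<>();
    for (Event e : batch) buckets.computeIfAbsent(e.db, k -> new ArrayList<>()).add(e);
    return buckets;
  }

  /** Processes all buckets in parallel; returns the handled event ids per database. */
  static Map<String, List<Long>> processBatch(List<Event> batch, int threads) {
    Map<String, List<Long>> handled = new ConcurrentHashMap<>();
    List<Callable<Void>> tasks = new ArrayList<>();
    for (Map.Entry<String, List<Event>> bucket : bucketByDb(batch).entrySet()) {
      tasks.add(() -> {
        List<Long> ids = new ArrayList<>();
        for (Event e : bucket.getValue()) ids.add(e.id); // placeholder for real handling
        handled.put(bucket.getKey(), ids);
        return null;
      });
    }
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // invokeAll blocks until every bucket is done, so the next batch starts afterwards.
      for (Future<Void> f : pool.invokeAll(tasks)) f.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while processing batch", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("bucket failed", e.getCause());
    } finally {
      pool.shutdown();
    }
    return handled;
  }
}
```

Because the whole batch is a barrier, one slow bucket delays the next fetch; 
that is the main trade-off compared with the per-table queues in the proposal.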




[jira] [Commented] (IMPALA-12709) Hierarchical metastore event processing

2024-01-14 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17806610#comment-17806610
 ] 

Maxwell Guo commented on IMPALA-12709:
--

I may have a different point of view. Is it possible to divide the db into 
buckets according to the original operation time and parallelize each bucket?




[jira] [Created] (IMPALA-12662) Support whitelist for db and table

2023-12-21 Thread Maxwell Guo (Jira)
Maxwell Guo created IMPALA-12662:


 Summary: Support whitelist for db and table
 Key: IMPALA-12662
 URL: https://issues.apache.org/jira/browse/IMPALA-12662
 Project: IMPALA
  Issue Type: Improvement
  Components: be, fe
Reporter: Maxwell Guo









[jira] [Work started] (IMPALA-12468) Add the ability to update EventProcessorStatus

2023-12-05 Thread Maxwell Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-12468 started by Maxwell Guo.

> Add the ability to update EventProcessorStatus
> --
>
> Key: IMPALA-12468
> URL: https://issues.apache.org/jira/browse/IMPALA-12468
> Project: IMPALA
>  Issue Type: Improvement
>  Components: be, Catalog, fe
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>
> Once the state of Impala and Hive is mismatched and the 
> EventProcessorStatus becomes NEED_INVALIDATE, we usually use invalidate 
> metadata to reset the catalog instance. Impala will then update the 
> status to ACTIVE. 
> But if Impala contains many tables, the cost of a global invalidate is a bit 
> high. So we may instead invalidate metadata table by table for the 
> incrementally changed tables. For example, we have 1000,000,000,000 
> tables, but event processing hits a CatalogException for only some of them 
> and MetastoreNotificationNeedsInvalidateException is thrown. I think there is 
> no need to invalidate all table caches in order to reset the catalog 
> instance; see 
> [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2088].
>  
> MetastoreEventsProcessor's async update process will not update the 
> currentStatus when the status is not ACTIVE; see 
> [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L876].
> So what about adding a new SQL statement: SET EVENT STATUS ${status}?
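
A minimal sketch of the manual status reset described above, under stated 
assumptions: the enum is a hypothetical subset that follows the names in the 
issue text, and resetToActive is an invented method that a statement such as 
SET EVENT STATUS might call; nothing here is Impala's actual API.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of a manually resettable event processor status. */
final class EventProcessorStatusSketch {
  /** Illustrative subset of statuses; names follow the issue text. */
  enum Status { ACTIVE, NEED_INVALIDATE, STOPPED }

  private final AtomicReference<Status> current = new AtomicReference<>(Status.ACTIVE);

  Status get() { return current.get(); }

  /** Called by event processing when it hits an unrecoverable mismatch. */
  void markNeedsInvalidate() { current.set(Status.NEED_INVALIDATE); }

  /**
   * Manual override, intended for use after the affected tables have been
   * invalidated one by one. Returns false if the processor was not in
   * NEED_INVALIDATE, so a stale request cannot overwrite another state.
   */
  boolean resetToActive() {
    return current.compareAndSet(Status.NEED_INVALIDATE, Status.ACTIVE);
  }
}
```

Using compareAndSet rather than a plain set means the override only succeeds 
from the state it was meant to clear.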






[jira] [Work started] (IMPALA-12506) Add the ability to update EventProcessorStatus through webUi

2023-12-05 Thread Maxwell Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-12506 started by Maxwell Guo.

> Add the ability to update EventProcessorStatus through webUi
> 
>
> Key: IMPALA-12506
> URL: https://issues.apache.org/jira/browse/IMPALA-12506
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>
> See the description in 
> [Ml|https://lists.apache.org/thread/5mw8jd5hgz7yycz9h2pxvqj101k0j47m]






[jira] [Created] (IMPALA-12573) Give configuration load_catalog_in_background more fine-grained configuration

2023-11-22 Thread Maxwell Guo (Jira)
Maxwell Guo created IMPALA-12573:


 Summary: Give configuration load_catalog_in_background more 
fine-grained configuration
 Key: IMPALA-12573
 URL: https://issues.apache.org/jira/browse/IMPALA-12573
 Project: IMPALA
  Issue Type: Improvement
  Components: Catalog
Reporter: Maxwell Guo
Assignee: Maxwell Guo


As we know, if load_catalog_in_background is set to true, table metadata is 
loaded asynchronously by catalogd.
If the flag is set to true when catalogd starts up, all tables are loaded 
asynchronously and the loading queue becomes large, so we may leave it set to 
false by default. But if we invalidate some tables manually, we may want them 
to be loaded. So I think we can introduce a new flag, 
load_catalog_in_background_at_startup: we can set 
load_catalog_in_background_at_startup to false and load_catalog_in_background 
to true by default. 
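
The intended interaction of the two flags could look like the sketch below. 
The Flags class and shouldLoadInBackground are hypothetical names used only 
for illustration, and load_catalog_in_background_at_startup is the flag 
proposed in this issue, not an existing one.

```java
/** Hypothetical sketch of how the existing and the proposed flag could combine. */
final class BackgroundLoadFlagsSketch {
  /** Holder for the two flags discussed above. */
  static final class Flags {
    final boolean loadCatalogInBackground;
    final boolean loadCatalogInBackgroundAtStartup; // proposed flag, does not exist yet

    Flags(boolean loadCatalogInBackground, boolean loadCatalogInBackgroundAtStartup) {
      this.loadCatalogInBackground = loadCatalogInBackground;
      this.loadCatalogInBackgroundAtStartup = loadCatalogInBackgroundAtStartup;
    }
  }

  /** Returns true if a table should be queued for asynchronous loading. */
  static boolean shouldLoadInBackground(Flags flags, boolean duringStartup) {
    // At startup only the new flag applies; afterwards (for example after a
    // manual invalidate) the existing flag decides.
    return duringStartup
        ? flags.loadCatalogInBackgroundAtStartup
        : flags.loadCatalogInBackground;
  }
}
```

With the proposed defaults (new Flags(true, false)), nothing is queued during 
startup, while a table invalidated later is loaded in the background.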
 






[jira] [Closed] (IMPALA-12402) Make CatalogdMetaProvider's cache concurrency level configurable

2023-10-23 Thread Maxwell Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxwell Guo closed IMPALA-12402.

Resolution: Fixed

> Make CatalogdMetaProvider's cache concurrency level configurable
> 
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
>  Issue Type: Improvement
>  Components: fe
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>  Labels: pull-request-available
>
> When the cluster contains many databases and tables, for example more than 
> 10 tables, and we restart the impalad, the local cache_ in 
> CatalogdMetaProvider needs to go through a loading process. 
> As we know, the concurrencyLevel of Google's Guava cache is set to 4 by 
> default. 
> But if there are many tables, the loading process takes more time and the 
> probability of lock contention increases, see 
> [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>  
> So we propose to add some configuration options here; the first is the 
> concurrency level of the cache.





[jira] [Commented] (IMPALA-12402) Make CatalogdMetaProvider's cache concurrency level configurable

2023-10-23 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17778936#comment-17778936
 ] 

Maxwell Guo commented on IMPALA-12402:
--

Thank you so much, [~stigahuang].




[jira] [Commented] (IMPALA-12402) Make CatalogdMetaProvider's cache concurrency level configurable

2023-10-23 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17778927#comment-17778927
 ] 

Maxwell Guo commented on IMPALA-12402:
--

Hi [~MikaelSmith] [~stigahuang], can you help take a look at this build? It 
seems some tests failed again. After looking at it, I don't have any clue 
about how to solve this error. :(




[jira] [Commented] (IMPALA-12402) Make CatalogdMetaProvider's cache concurrency level configurable

2023-10-23 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17778596#comment-17778596
 ] 

Maxwell Guo commented on IMPALA-12402:
--

Updated again.




[jira] [Created] (IMPALA-12506) Add the ability to update EventProcessorStatus through webUi

2023-10-18 Thread Maxwell Guo (Jira)
Maxwell Guo created IMPALA-12506:


 Summary: Add the ability to update EventProcessorStatus through 
webUi
 Key: IMPALA-12506
 URL: https://issues.apache.org/jira/browse/IMPALA-12506
 Project: IMPALA
  Issue Type: Improvement
  Components: Catalog
Reporter: Maxwell Guo
Assignee: Maxwell Guo


See the description in 
[Ml|https://lists.apache.org/thread/5mw8jd5hgz7yycz9h2pxvqj101k0j47m]






[jira] [Updated] (IMPALA-12468) Add the ability to update EventProcessorStatus

2023-10-18 Thread Maxwell Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxwell Guo updated IMPALA-12468:
-
Description: 
Once the state of Impala and Hive is mismatched and the EventProcessorStatus 
becomes NEED_INVALIDATE, we usually use invalidate metadata to reset the catalog 
instance. Impala then updates the status to ACTIVE. 

But if Impala contains many tables, the cost of a global invalidate is rather 
high. So we may instead invalidate metadata table by table, only for the tables 
that changed incrementally. For example, we have 1000,000,000,000 tables 
but event processing hits a CatalogException for only some of them, and a 
MetastoreNotificationNeedsInvalidateException is thrown. I think there is no 
need to invalidate all table caches in order to reset the catalog instance, see 
[here 
|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2088].
 

The async update process of MetastoreEventsProcessor does not update the 
currentStatus when the status is not ACTIVE, see 
[here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L876]

So what about adding a new SQL syntax: SET EVENT STATUS ${status}?


  was:
Once the state of Impala and Hive is mismatched and the EventProcessorStatus 
becomes NEED_INVALIDATE, we usually use invalidate metadata to reset the catalog 
instance. Impala then updates the status to ACTIVE. 

But if Impala contains many tables, the cost of a global invalidate is rather 
high. So we may instead invalidate metadata table by table, only for the tables 
that changed incrementally. For example, we have 1000,000,000,000 tables 
but event processing hits a CatalogException for only some of them, and a 
MetastoreNotificationNeedsInvalidateException is thrown. I think there is no 
need to invalidate all table caches in order to reset the catalog instance, see 
[here 
|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2088].
 

The async update process of MetastoreEventsProcessor does not update the 
currentStatus when the status is not ACTIVE, see 
[here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L876]

So what about adding a new SQL syntax: RESET STATUS status?



> Add the ability to update EventProcessorStatus
> --
>
> Key: IMPALA-12468
> URL: https://issues.apache.org/jira/browse/IMPALA-12468
> Project: IMPALA
>  Issue Type: Improvement
>  Components: be, Catalog, fe
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>
> Once the state of Impala and Hive is mismatched and the 
> EventProcessorStatus becomes NEED_INVALIDATE, we usually use invalidate 
> metadata to reset the catalog instance. Impala then updates the status to 
> ACTIVE. 
> But if Impala contains many tables, the cost of a global invalidate is rather 
> high. So we may instead invalidate metadata table by table, only for the 
> tables that changed incrementally. For example, we have 1000,000,000,000 
> tables but event processing hits a CatalogException for only some of them, 
> and a MetastoreNotificationNeedsInvalidateException is thrown. I think there 
> is no need to invalidate all table caches in order to reset the catalog 
> instance, see [here 
> |https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2088].
>  
> The async update process of MetastoreEventsProcessor does not update the 
> currentStatus when the status is not ACTIVE, see 
> [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L876]
> So what about adding a new SQL syntax: SET EVENT STATUS ${status}?
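
The behaviour described in this issue can be sketched as a small state 
machine. This is only an illustration under stated assumptions, not the 
Impala MetastoreEventsProcessor: the class, its methods, and the two-value 
enum are hypothetical, and the status names simply mirror the wording of the 
description.

```java
/**
 * Hypothetical sketch of the status handling discussed in this issue.
 * Not Impala code; all names here are illustrative assumptions.
 */
final class EventProcessorSketch {
  /** Status names follow the wording of the issue description. */
  enum Status { ACTIVE, NEED_INVALIDATE }

  private Status currentStatus = Status.ACTIVE;
  private long lastSyncedEventId = 0;

  synchronized Status getStatus() {
    return currentStatus;
  }

  synchronized long getLastSyncedEventId() {
    return lastSyncedEventId;
  }

  /** Simulates an event whose processing failed and requires invalidation. */
  synchronized void onProcessingFailure() {
    currentStatus = Status.NEED_INVALIDATE;
  }

  /**
   * Periodic async update. It does nothing unless the status is ACTIVE,
   * so it can never move the processor out of NEED_INVALIDATE by itself.
   * Returns true if events were processed.
   */
  synchronized boolean processEvents(long latestEventId) {
    if (currentStatus != Status.ACTIVE) {
      return false;
    }
    lastSyncedEventId = latestEventId;
    return true;
  }

  /** The proposed manual override, e.g. behind a statement such as SET EVENT STATUS. */
  synchronized void setStatus(Status newStatus) {
    currentStatus = newStatus;
  }
}
```

In this sketch, once onProcessingFailure() has been called, processEvents() 
keeps returning false until setStatus(Status.ACTIVE) is invoked, which is the 
gap the proposed statement is meant to fill after the affected tables have 
been invalidated one by one.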






[jira] [Updated] (IMPALA-12468) Add the ability to update EventProcessorStatus

2023-09-26 Thread Maxwell Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxwell Guo updated IMPALA-12468:
-
Description: 
Once the state of Impala and Hive is mismatched and the EventProcessorStatus 
becomes NEED_INVALIDATE, we usually use invalidate metadata to reset the catalog 
instance. Impala then updates the status to ACTIVE. 

But if Impala contains many tables, the cost of a global invalidate is rather 
high. So we may instead invalidate metadata table by table, only for the tables 
that changed incrementally. For example, we have 1000,000,000,000 tables 
but event processing hits a CatalogException for only some of them, and a 
MetastoreNotificationNeedsInvalidateException is thrown. I think there is no 
need to invalidate all table caches in order to reset the catalog instance, see 
[here 
|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2088].
 

The async update process of MetastoreEventsProcessor does not update the 
currentStatus when the status is not ACTIVE, see 
[here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L876]

So what about adding a new SQL syntax: RESET STATUS status?


  was:
Once the state of Impala and Hive is mismatched and the EventProcessorStatus 
becomes NEED_INVALIDATE, we usually use invalidate metadata to reset the catalog 
instance. Impala then updates the status to ACTIVE. 
But if Impala contains many tables, the cost of a global invalidate is rather 
high. So we may instead invalidate metadata table by table, only for the tables 
that changed incrementally. For example, we have 1000,000,000,000 tables 
but event processing hits a CatalogException for only some of them, and a 
MetastoreNotificationNeedsInvalidateException is thrown. I think there is no 
need to invalidate all table caches. 





[jira] [Created] (IMPALA-12468) Add the ability to update EventProcessorStatus

2023-09-26 Thread Maxwell Guo (Jira)
Maxwell Guo created IMPALA-12468:


 Summary: Add the ability to update EventProcessorStatus
 Key: IMPALA-12468
 URL: https://issues.apache.org/jira/browse/IMPALA-12468
 Project: IMPALA
  Issue Type: Improvement
  Components: be, Catalog, fe
Reporter: Maxwell Guo
Assignee: Maxwell Guo


Once the state of Impala and Hive is mismatched and the EventProcessorStatus 
becomes NEED_INVALIDATE, we usually use invalidate metadata to reset the catalog 
instance. Impala then updates the status to ACTIVE. 
But if Impala contains many tables, the cost of a global invalidate is rather 
high. So we may instead invalidate metadata table by table, only for the tables 
that changed incrementally. For example, we have 1000,000,000,000 tables 
but event processing hits a CatalogException for only some of them, and a 
MetastoreNotificationNeedsInvalidateException is thrown. I think there is no 
need to invalidate all table caches. 






[jira] [Commented] (IMPALA-12402) Make CatalogdMetaProvider's cache concurrency level configurable

2023-09-19 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766811#comment-17766811
 ] 

Maxwell Guo commented on IMPALA-12402:
--

It seems the final test run failed in py test.




[jira] [Commented] (IMPALA-12402) Make CatalogdMetaProvider's cache concurrency level configurable

2023-09-18 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766615#comment-17766615
 ] 

Maxwell Guo commented on IMPALA-12402:
--

So now the status of this Jira should be "needs committer".




[jira] [Commented] (IMPALA-12402) Make CatalogdMetaProvider's cache concurrency level configurable

2023-09-17 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766175#comment-17766175
 ] 

Maxwell Guo commented on IMPALA-12402:
--

Hello, I have seen that [~MikaelSmith] has given +1 on this patch, so do we 
need another committer to +1 it before this patch can be checked in? 
[~stigahuang]




[jira] [Commented] (IMPALA-12402) Make CatalogdMetaProvider's cache concurrency level configurable

2023-09-14 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765415#comment-17765415
 ] 

Maxwell Guo commented on IMPALA-12402:
--

I have already updated the commit message and resolved the merge conflict. 
[~MikaelSmith]




[jira] [Updated] (IMPALA-12402) Make CatalogdMetaProvider's cache concurrency level configurable

2023-09-12 Thread Maxwell Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxwell Guo updated IMPALA-12402:
-
Summary: Make CatalogdMetaProvider's cache concurrency level configurable  
(was: Add some configurations for CatalogdMetaProvider's cache_)




[jira] [Commented] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-09-11 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763986#comment-17763986
 ] 

Maxwell Guo commented on IMPALA-12402:
--

Sorry, this is my first time using Gerrit to push code. I have used the same 
Change-Id again. [~MikaelSmith], thanks for the reminder.




[jira] [Commented] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-09-11 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763589#comment-17763589
 ] 

Maxwell Guo commented on IMPALA-12402:
--

[~MikaelSmith] Thank you for your review, I have updated the code again.




[jira] [Commented] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-09-07 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762954#comment-17762954
 ] 

Maxwell Guo commented on IMPALA-12402:
--

[~MikaelSmith] thanks for your reply. I think it is better to make the 
concurrencyLevel parameter of the Guava cache configurable instead of using the 
default value of 4 (I may also want to make more than this one parameter 
configurable).
For many tables I think the value should be higher than 4, such as 128 or 256. 
When we looked at the jstack output for Impala at the startup stage, we found 
that the threads were all waiting for the lock, see 
https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L432
A lower value leads to thread contention. 
In this cache the concurrency level can be treated as the number of buckets, so 
I think more buckets means less thread contention (assuming the values are 
random enough).
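
The bucket argument in this comment can be illustrated with a small 
lock-striping sketch in plain Java. It is not Guava's implementation and uses 
no Guava classes: the class name, the hash spreading step, and the rounding of 
the stripe count up to a power of two are simplifications assumed for this 
example.

```java
import java.util.concurrent.locks.ReentrantLock;

/**
 * Simplified lock-striping sketch: keys are spread over N independent locks,
 * so threads only contend when their keys land on the same stripe.
 * Hypothetical illustration; not how Guava's cache is actually written.
 */
final class StripedLocks {
  private final ReentrantLock[] stripes;

  StripedLocks(int concurrencyLevel) {
    if (concurrencyLevel < 1 || concurrencyLevel > (1 << 16)) {
      throw new IllegalArgumentException("concurrencyLevel out of range: " + concurrencyLevel);
    }
    // Round up to a power of two so the index can be computed with a bit mask.
    int count = 1;
    while (count < concurrencyLevel) {
      count <<= 1;
    }
    stripes = new ReentrantLock[count];
    for (int i = 0; i < count; i++) {
      stripes[i] = new ReentrantLock();
    }
  }

  int stripeCount() {
    return stripes.length;
  }

  /** Maps a key to a stripe index in [0, stripeCount()). */
  int stripeIndex(Object key) {
    int h = key.hashCode();
    h ^= (h >>> 16); // Mix the high bits into the low bits before masking.
    return h & (stripes.length - 1);
  }

  /** Returns the lock guarding the stripe that the key belongs to. */
  ReentrantLock lockFor(Object key) {
    return stripes[stripeIndex(key)];
  }

  /** Counts how many of the given keys fall on each stripe. */
  int[] keysPerStripe(Iterable<?> keys) {
    int[] counts = new int[stripes.length];
    for (Object key : keys) {
      counts[stripeIndex(key)]++;
    }
    return counts;
  }
}
```

With 4 stripes every table name shares a lock with roughly a quarter of all 
other names, while with 128 stripes the same names are spread over far more 
locks, provided the hash values are distributed evenly enough. That is the 
intuition behind raising the concurrency level when many tables are loaded at 
startup.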




[jira] [Commented] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-09-06 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762550#comment-17762550
 ] 

Maxwell Guo commented on IMPALA-12402:
--

[~stigahuang] [~tmate] can you help take a look at this small patch?




[jira] [Updated] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-09-05 Thread Maxwell Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxwell Guo updated IMPALA-12402:
-
Attachment: (was: 
0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch)




[jira] [Comment Edited] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-09-04 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761917#comment-17761917
 ] 

Maxwell Guo edited comment on IMPALA-12402 at 9/5/23 1:43 AM:
--

Hello, can anyone help review this small patch?
Also, how can I run the pre-commit tests?


was (Author: maxwellguo):
Hello, can anyone help review this small patch?

> Add some configurations for CatalogdMetaProvider's cache_
> -
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
>  Issue Type: Improvement
>  Components: fe
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>  Labels: pull-request-available
> Attachments: 
> 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch
>
>
> When the cluster contains many databases and tables (for example, more than 
> 10 tables) and we restart the impalad, the local cache_ of 
> CatalogMetaProvider has to go through a loading process. 
> Google's Guava cache sets concurrencyLevel to 4 by default. 
> With many tables, the loading process takes longer and the probability of 
> lock contention increases; see 
> [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>  
> So we propose to add some configurations here, the first is the concurrency 
> of cache.






[jira] [Commented] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-09-04 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761917#comment-17761917
 ] 

Maxwell Guo commented on IMPALA-12402:
--

Hello, can anyone help review this small patch?

> Add some configurations for CatalogdMetaProvider's cache_
> -
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
>  Issue Type: Improvement
>  Components: fe
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>  Labels: pull-request-available
> Attachments: 
> 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch
>
>
> When the cluster contains many databases and tables (for example, more than 
> 10 tables) and we restart the impalad, the local cache_ of 
> CatalogMetaProvider has to go through a loading process. 
> Google's Guava cache sets concurrencyLevel to 4 by default. 
> With many tables, the loading process takes longer and the probability of 
> lock contention increases; see 
> [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>  
> So we propose to add some configurations here, the first is the concurrency 
> of cache.






[jira] [Comment Edited] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-08-31 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760202#comment-17760202
 ] 

Maxwell Guo edited comment on IMPALA-12402 at 8/31/23 7:23 AM:
---

[gerrit link|http://gerrit.cloudera.org:8080/20443] and it is ready for review 
again now.

[Build passed |https://jenkins.impala.io/job/gerrit-code-review-checks/13892/] 


was (Author: maxwellguo):
[gerrit link|http://gerrit.cloudera.org:8080/20443] and it is ready for review 
again now.

[Build passed https://jenkins.impala.io/job/gerrit-code-review-checks/13892/] 

> Add some configurations for CatalogdMetaProvider's cache_
> -
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
>  Issue Type: Improvement
>  Components: fe
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>  Labels: pull-request-available
> Attachments: 
> 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch
>
>
> When the cluster contains many databases and tables (for example, more than 
> 10 tables) and we restart the impalad, the local cache_ of 
> CatalogMetaProvider has to go through a loading process. 
> Google's Guava cache sets concurrencyLevel to 4 by default. 
> With many tables, the loading process takes longer and the probability of 
> lock contention increases; see 
> [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>  
> So we propose to add some configurations here, the first is the concurrency 
> of cache.






[jira] [Comment Edited] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-08-31 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760202#comment-17760202
 ] 

Maxwell Guo edited comment on IMPALA-12402 at 8/31/23 7:23 AM:
---

[gerrit link|http://gerrit.cloudera.org:8080/20443] and it is ready for review 
again now.

[Build passed https://jenkins.impala.io/job/gerrit-code-review-checks/13892/] 


was (Author: maxwellguo):
[gerrit link|http://gerrit.cloudera.org:8080/20443] and it is ready for review 
again now.

> Add some configurations for CatalogdMetaProvider's cache_
> -
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
>  Issue Type: Improvement
>  Components: fe
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>  Labels: pull-request-available
> Attachments: 
> 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch
>
>
> When the cluster contains many databases and tables (for example, more than 
> 10 tables) and we restart the impalad, the local cache_ of 
> CatalogMetaProvider has to go through a loading process. 
> Google's Guava cache sets concurrencyLevel to 4 by default. 
> With many tables, the loading process takes longer and the probability of 
> lock contention increases; see 
> [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>  
> So we propose to add some configurations here, the first is the concurrency 
> of cache.






[jira] [Comment Edited] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-08-31 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760202#comment-17760202
 ] 

Maxwell Guo edited comment on IMPALA-12402 at 8/31/23 7:03 AM:
---

[gerrit link|http://gerrit.cloudera.org:8080/20443] and it is ready for review 
again now.


was (Author: maxwellguo):
[gerrit link|https://gerrit.cloudera.org/#/c/20435/] and it is ready for review 
again now.

> Add some configurations for CatalogdMetaProvider's cache_
> -
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
>  Issue Type: Improvement
>  Components: fe
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>  Labels: pull-request-available
> Attachments: 
> 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch
>
>
> When the cluster contains many databases and tables (for example, more than 
> 10 tables) and we restart the impalad, the local cache_ of 
> CatalogMetaProvider has to go through a loading process. 
> Google's Guava cache sets concurrencyLevel to 4 by default. 
> With many tables, the loading process takes longer and the probability of 
> lock contention increases; see 
> [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>  
> So we propose to add some configurations here, the first is the concurrency 
> of cache.






[jira] [Comment Edited] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-08-31 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760202#comment-17760202
 ] 

Maxwell Guo edited comment on IMPALA-12402 at 8/31/23 6:46 AM:
---

[gerrit link|https://gerrit.cloudera.org/#/c/20435/] and it is ready for review 
again now.


was (Author: maxwellguo):
[gerrit link|https://gerrit.cloudera.org/#/c/20435/] and it is ready for review 
again now.

> Add some configurations for CatalogdMetaProvider's cache_
> -
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
>  Issue Type: Improvement
>  Components: fe
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>  Labels: pull-request-available
> Attachments: 
> 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch
>
>
> When the cluster contains many databases and tables (for example, more than 
> 10 tables) and we restart the impalad, the local cache_ of 
> CatalogMetaProvider has to go through a loading process. 
> Google's Guava cache sets concurrencyLevel to 4 by default. 
> With many tables, the loading process takes longer and the probability of 
> lock contention increases; see 
> [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>  
> So we propose to add some configurations here, the first is the concurrency 
> of cache.






[jira] [Comment Edited] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-08-29 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760202#comment-17760202
 ] 

Maxwell Guo edited comment on IMPALA-12402 at 8/30/23 2:30 AM:
---

[gerrit link|https://gerrit.cloudera.org/#/c/20435/] and it is ready for review 
again now.


was (Author: maxwellguo):
[gerrit link|https://gerrit.cloudera.org/#/c/20435/]

> Add some configurations for CatalogdMetaProvider's cache_
> -
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
>  Issue Type: Improvement
>  Components: fe
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>  Labels: pull-request-available
> Attachments: 
> 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch
>
>
> When the cluster contains many databases and tables (for example, more than 
> 10 tables) and we restart the impalad, the local cache_ of 
> CatalogMetaProvider has to go through a loading process. 
> Google's Guava cache sets concurrencyLevel to 4 by default. 
> With many tables, the loading process takes longer and the probability of 
> lock contention increases; see 
> [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>  
> So we propose to add some configurations here, the first is the concurrency 
> of cache.






[jira] [Updated] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-08-29 Thread Maxwell Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxwell Guo updated IMPALA-12402:
-
Language: java C++  (was: java)

> Add some configurations for CatalogdMetaProvider's cache_
> -
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
>  Issue Type: Improvement
>  Components: fe
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>  Labels: pull-request-available
> Attachments: 
> 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch
>
>
> When the cluster contains many databases and tables (for example, more than 
> 10 tables) and we restart the impalad, the local cache_ of 
> CatalogMetaProvider has to go through a loading process. 
> Google's Guava cache sets concurrencyLevel to 4 by default. 
> With many tables, the loading process takes longer and the probability of 
> lock contention increases; see 
> [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>  
> So we propose to add some configurations here, the first is the concurrency 
> of cache.






[jira] [Updated] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-08-29 Thread Maxwell Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxwell Guo updated IMPALA-12402:
-

[gerrit link|https://gerrit.cloudera.org/#/c/20435/]

> Add some configurations for CatalogdMetaProvider's cache_
> -
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
>  Issue Type: Improvement
>  Components: fe
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>  Labels: pull-request-available
> Attachments: 
> 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch
>
>
> When the cluster contains many databases and tables (for example, more than 
> 10 tables) and we restart the impalad, the local cache_ of 
> CatalogMetaProvider has to go through a loading process. 
> Google's Guava cache sets concurrencyLevel to 4 by default. 
> With many tables, the loading process takes longer and the probability of 
> lock contention increases; see 
> [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>  
> So we propose to add some configurations here, the first is the concurrency 
> of cache.






[jira] [Commented] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-08-29 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760189#comment-17760189
 ] 

Maxwell Guo commented on IMPALA-12402:
--

Thanks [~stigahuang]

> Add some configurations for CatalogdMetaProvider's cache_
> -
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
>  Issue Type: Improvement
>  Components: fe
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>  Labels: pull-request-available
> Attachments: 
> 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch
>
>
> When the cluster contains many databases and tables (for example, more than 
> 10 tables) and we restart the impalad, the local cache_ of 
> CatalogMetaProvider has to go through a loading process. 
> Google's Guava cache sets concurrencyLevel to 4 by default. 
> With many tables, the loading process takes longer and the probability of 
> lock contention increases; see 
> [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>  
> So we propose to add some configurations here, the first is the concurrency 
> of cache.






[jira] [Work started] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-08-29 Thread Maxwell Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-12402 started by Maxwell Guo.

> Add some configurations for CatalogdMetaProvider's cache_
> -
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
>  Issue Type: Improvement
>  Components: fe
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>  Labels: pull-request-available
> Attachments: 
> 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch
>
>
> When the cluster contains many databases and tables (for example, more than 
> 10 tables) and we restart the impalad, the local cache_ of 
> CatalogMetaProvider has to go through a loading process. 
> Google's Guava cache sets concurrencyLevel to 4 by default. 
> With many tables, the loading process takes longer and the probability of 
> lock contention increases; see 
> [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>  
> So we propose to add some configurations here, the first is the concurrency 
> of cache.






[jira] [Updated] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-08-29 Thread Maxwell Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxwell Guo updated IMPALA-12402:
-
Attachment: (was: 
0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch)

> Add some configurations for CatalogdMetaProvider's cache_
> -
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
>  Issue Type: Improvement
>  Components: fe
>Reporter: Maxwell Guo
>Priority: Minor
>  Labels: pull-request-available
> Attachments: 
> 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch
>
>
> When the cluster contains many databases and tables (for example, more than 
> 10 tables) and we restart the impalad, the local cache_ of 
> CatalogMetaProvider has to go through a loading process. 
> Google's Guava cache sets concurrencyLevel to 4 by default. 
> With many tables, the loading process takes longer and the probability of 
> lock contention increases; see 
> [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>  
> So we propose to add some configurations here, the first is the concurrency 
> of cache.






[jira] [Updated] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-08-29 Thread Maxwell Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxwell Guo updated IMPALA-12402:
-
Attachment: 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch

> Add some configurations for CatalogdMetaProvider's cache_
> -
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
>  Issue Type: Improvement
>  Components: fe
>Reporter: Maxwell Guo
>Priority: Minor
>  Labels: pull-request-available
> Attachments: 
> 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch
>
>
> When the cluster contains many databases and tables (for example, more than 
> 10 tables) and we restart the impalad, the local cache_ of 
> CatalogMetaProvider has to go through a loading process. 
> Google's Guava cache sets concurrencyLevel to 4 by default. 
> With many tables, the loading process takes longer and the probability of 
> lock contention increases; see 
> [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>  
> So we propose to add some configurations here, the first is the concurrency 
> of cache.






[jira] [Updated] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-08-29 Thread Maxwell Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxwell Guo updated IMPALA-12402:
-
Attachment: 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch

> Add some configurations for CatalogdMetaProvider's cache_
> -
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
>  Issue Type: Improvement
>  Components: fe
>Reporter: Maxwell Guo
>Priority: Minor
>  Labels: pull-request-available
> Attachments: 
> 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch
>
>
> When the cluster contains many databases and tables (for example, more than 
> 10 tables) and we restart the impalad, the local cache_ of 
> CatalogMetaProvider has to go through a loading process. 
> Google's Guava cache sets concurrencyLevel to 4 by default. 
> With many tables, the loading process takes longer and the probability of 
> lock contention increases; see 
> [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>  
> So we propose to add some configurations here, the first is the concurrency 
> of cache.






[jira] [Commented] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-08-24 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17758821#comment-17758821
 ] 

Maxwell Guo commented on IMPALA-12402:
--

How can I assign this Jira to myself? I can't find the button.

> Add some configurations for CatalogdMetaProvider's cache_
> -
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
>  Issue Type: Improvement
>  Components: fe
>Reporter: Maxwell Guo
>Priority: Minor
>  Labels: pull-request-available
>
> When the cluster contains many databases and tables (for example, more than 
> 10 tables) and we restart the impalad, the local cache_ of 
> CatalogMetaProvider has to go through a loading process. 
> Google's Guava cache sets concurrencyLevel to 4 by default. 
> With many tables, the loading process takes longer and the probability of 
> lock contention increases; see 
> [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>  
> So we propose to add some configurations here, the first is the concurrency 
> of cache.






[jira] [Updated] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-08-24 Thread Maxwell Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxwell Guo updated IMPALA-12402:
-
 Flags: Patch
Labels: pull-request-available  (was: )

> Add some configurations for CatalogdMetaProvider's cache_
> -
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
>  Issue Type: Improvement
>  Components: fe
>Reporter: Maxwell Guo
>Priority: Minor
>  Labels: pull-request-available
>
> When the cluster contains many databases and tables (for example, more than 
> 10 tables) and we restart the impalad, the local cache_ of 
> CatalogMetaProvider has to go through a loading process. 
> Google's Guava cache sets concurrencyLevel to 4 by default. 
> With many tables, the loading process takes longer and the probability of 
> lock contention increases; see 
> [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>  
> So we propose to add some configurations here, the first is the concurrency 
> of cache.






[jira] [Updated] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-08-24 Thread Maxwell Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxwell Guo updated IMPALA-12402:
-
Summary: Add some configurations for CatalogdMetaProvider's cache_  (was: 
Add some configurations for CatalogMetaProvider's cache_)

> Add some configurations for CatalogdMetaProvider's cache_
> -
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
>  Issue Type: Improvement
>  Components: fe
>Reporter: Maxwell Guo
>Priority: Minor
>
> When the cluster contains many databases and tables (for example, more than 
> 10 tables) and we restart the impalad, the local cache_ of 
> CatalogMetaProvider has to go through a loading process. 
> Google's Guava cache sets concurrencyLevel to 4 by default. 
> With many tables, the loading process takes longer and the probability of 
> lock contention increases; see 
> [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>  
> So we propose to add some configurations here, the first is the concurrency 
> of cache.






[jira] [Updated] (IMPALA-12402) Add some configurations for CatalogMetaProvider's cache_

2023-08-24 Thread Maxwell Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxwell Guo updated IMPALA-12402:
-
  Language: java
Target Version: Impala 4.2.0

> Add some configurations for CatalogMetaProvider's cache_
> 
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
>  Issue Type: Improvement
>  Components: fe
>Reporter: Maxwell Guo
>Priority: Minor
>
> When the cluster contains many databases and tables (for example, more than 
> 10 tables) and we restart the impalad, the local cache_ of 
> CatalogMetaProvider has to go through a loading process. 
> Google's Guava cache sets concurrencyLevel to 4 by default. 
> With many tables, the loading process takes longer and the probability of 
> lock contention increases; see 
> [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>  
> So we propose to add some configurations here, the first is the concurrency 
> of cache.






[jira] [Created] (IMPALA-12402) Add some configurations for CatalogMetaProvider's cache_

2023-08-24 Thread Maxwell Guo (Jira)
Maxwell Guo created IMPALA-12402:


 Summary: Add some configurations for CatalogMetaProvider's cache_
 Key: IMPALA-12402
 URL: https://issues.apache.org/jira/browse/IMPALA-12402
 Project: IMPALA
  Issue Type: Improvement
  Components: fe
Reporter: Maxwell Guo


When the cluster contains many databases and tables (for example, more than 
10 tables) and we restart the impalad, the local cache_ of 
CatalogMetaProvider has to go through a loading process. 
Google's Guava cache sets concurrencyLevel to 4 by default. 
With many tables, the loading process takes longer and the probability of 
lock contention increases; see 
[here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
 
So we propose to add some configurations here, the first is the concurrency of 
cache.
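
To illustrate why the concurrency level matters, here is a minimal sketch of a lock-striped structure in plain Java. It is a simplified model, not Guava's implementation and not the Impala patch: the class StripedLockSketch, its methods, and the cap of 65536 segments are all hypothetical names and values chosen for illustration. In Guava itself the corresponding knob is CacheBuilder.concurrencyLevel(int).

```java
import java.util.concurrent.locks.ReentrantLock;

// Simplified model of a lock-striped cache. This is NOT Guava's code;
// every name here is hypothetical and exists only for illustration.
final class StripedLockSketch {
  // Arbitrary upper bound for this sketch so the rounding loop cannot overflow.
  private static final int MAX_SEGMENTS = 1 << 16;

  private final ReentrantLock[] segmentLocks;

  StripedLockSketch(int concurrencyLevel) {
    segmentLocks = new ReentrantLock[segmentCountFor(concurrencyLevel)];
    for (int i = 0; i < segmentLocks.length; i++) {
      segmentLocks[i] = new ReentrantLock();
    }
  }

  // Rounds the requested level up to the next power of two so that a bit
  // mask can select a segment.
  static int segmentCountFor(int concurrencyLevel) {
    if (concurrencyLevel < 1) {
      throw new IllegalArgumentException("concurrencyLevel must be >= 1");
    }
    int target = Math.min(concurrencyLevel, MAX_SEGMENTS);
    int count = 1;
    while (count < target) {
      count <<= 1;
    }
    return count;
  }

  int segmentCount() {
    return segmentLocks.length;
  }

  // Maps a key to the index of the segment whose lock guards it.
  int segmentIndexFor(Object key) {
    int h = key.hashCode();
    h ^= (h >>> 16); // fold high bits into the low bits used by the mask
    return h & (segmentLocks.length - 1);
  }

  // Runs an action while holding only the lock of the key's segment, so
  // updates that land in different segments do not block each other.
  void withSegmentLock(Object key, Runnable action) {
    ReentrantLock lock = segmentLocks[segmentIndexFor(key)];
    lock.lock();
    try {
      action.run();
    } finally {
      lock.unlock();
    }
  }
}
```

With a level of 4 there are only four locks, so many concurrent table loads are likely to queue on the same segment; a higher level spreads the keys over more locks, at the cost of some extra memory per segment.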






[jira] [Commented] (IMPALA-2761) Build and Run Impala on OS X

2023-08-22 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757304#comment-17757304
 ] 

Maxwell Guo commented on IMPALA-2761:
-

Any update here? Besides macOS on Intel, I think Apple M1/M2 support is needed too.

> Build and Run Impala on OS X
> 
>
> Key: IMPALA-2761
> URL: https://issues.apache.org/jira/browse/IMPALA-2761
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Infrastructure
>Affects Versions: Impala 2.3.0
>Reporter: Martin Grund
>Priority: Minor
>  Labels: osx
>
> This is an umbrella ticket to support building and running Impala on Mac OS X. 
> Comments will be used to keep track of the status.


