[jira] [Updated] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number
[ https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxwell Guo updated IMPALA-12771:
    Fix Version/s: Impala 4.5.0 (was: Impala 4.4.2)

> Impala catalogd events-skipped may mark the wrong number
>
> Key: IMPALA-12771
> URL: https://issues.apache.org/jira/browse/IMPALA-12771
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Maxwell Guo
> Assignee: Maxwell Guo
> Priority: Minor
> Fix For: Impala 4.5.0
>
> See the description of the [event-skipped metric|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L237]:
>
> {code:java}
> // total number of events which are skipped because of the flag setting or
> // in case of [CREATE|DROP] events on [DATABASE|TABLE|PARTITION] which were ignored
> // because the [DATABASE|TABLE|PARTITION] was already [PRESENT|ABSENT] in the catalogd.
> {code}
>
> For CREATE and DROP events on a database, table, or partition (AddPartition included), when the database or table is not found in the cache, we skip processing the event and increment the events-skipped metric by 1.
> However, I found some problems with ALTER TABLE and Reload events:
> * The Reload event is not mentioned in the description of events-skipped, yet the metric is incremented when the event is an old event;
> * Also, if the table is in the blacklist, the metric is incremented as well.
> In summary, I think this description is inconsistent with the actual implementation.
> So can we also increment the events-skipped metric for alter partition events, and change the description to cover all skipped events?

--
This message was sent by Atlassian Jira (v8.20.10#820010)
-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
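The mismatch described in this ticket can be illustrated with a minimal sketch. It is not Impala's code: the class `EventsSkippedSketch`, the `SkipReason` enum, and its `documented` flag are hypothetical names introduced here only to show a single counter that is incremented for every skip reason, while the metric's comment covers only some of them.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Impala.
final class EventsSkippedSketch {
  // Skip reasons mentioned in the ticket. "documented" records whether the
  // quoted metric comment mentions the reason.
  enum SkipReason {
    FLAG_SETTING(true),
    OBJECT_ALREADY_PRESENT_OR_ABSENT(true),
    OLD_RELOAD_EVENT(false),
    BLACKLISTED_TABLE(false);

    final boolean documented;

    SkipReason(boolean documented) {
      this.documented = documented;
    }
  }

  private final Map<SkipReason, Long> perReason = new EnumMap<>(SkipReason.class);
  private long eventsSkipped = 0;

  // Every skip bumps the same counter, whatever the reason.
  void recordSkip(SkipReason reason) {
    eventsSkipped++;
    perReason.merge(reason, 1L, Long::sum);
  }

  long total() {
    return eventsSkipped;
  }

  // Number of skips that the metric comment does not account for.
  long undocumented() {
    long n = 0;
    for (Map.Entry<SkipReason, Long> e : perReason.entrySet()) {
      if (!e.getKey().documented) {
        n += e.getValue();
      }
    }
    return n;
  }
}
```

Under these assumptions, a reader comparing the total with the documented reasons alone would see a gap, which is the inconsistency the ticket asks to resolve by broadening the description.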
[jira] [Updated] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number
[ https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxwell Guo updated IMPALA-12771:
    Fix Version/s: Impala 4.4.2
[jira] [Resolved] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number
[ https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxwell Guo resolved IMPALA-12771.
    Resolution: Fixed
[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number
[ https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869531#comment-17869531 ]

Maxwell Guo commented on IMPALA-12771:

Is this patch ready to commit, since [~stigahuang] gave a +1 on it? [~mylogi...@gmail.com]
[jira] [Updated] (IMPALA-13255) Support new SQL grammar for RESTART EVENTPROCESSOR [FROM EVENT id]
[ https://issues.apache.org/jira/browse/IMPALA-13255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxwell Guo updated IMPALA-13255:
    Description:
See the [discussion|https://lists.apache.org/thread/5mw8jd5hgz7yycz9h2pxvqj101k0j47m]. In this Jira, I am going to add new grammar in this format:

RESTART EVENTPROCESSOR [FROM EVENT id][IN MODE]

This grammar is only used to set the status of the Impala catalog event processor. It lets the processor restart processing events from a specified event id; if no id is specified, processing restarts from the very beginning, as with an event id of -1.
As for the IN MODE keywords, we support INVALIDATE MODE and RELOAD MODE, as described in the [discussion|https://lists.apache.org/thread/5mw8jd5hgz7yycz9h2pxvqj101k0j47m].

  was:
See the [discussion|https://lists.apache.org/thread/5mw8jd5hgz7yycz9h2pxvqj101k0j47m]. In this Jira, I am going to add new grammar in this format:

RESTART EVENTPROCESSOR [FROM EVENT id]

This grammar is only used to set the status of the Impala catalog event processor. It lets the processor restart processing events from a specified event id; if no id is specified, processing restarts from the very beginning, as with an event id of -1.

> Support new SQL grammar for RESTART EVENTPROCESSOR [FROM EVENT id]
>
> Key: IMPALA-13255
> URL: https://issues.apache.org/jira/browse/IMPALA-13255
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog, Clients, Frontend
> Reporter: Maxwell Guo
> Assignee: Maxwell Guo
> Priority: Minor
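As a rough illustration of the proposed statement shape, the sketch below parses `RESTART EVENTPROCESSOR [FROM EVENT id][IN MODE]` with a regular expression. It is not Impala's parser. The class name `RestartEventProcessorSketch` is hypothetical, and the exact spelling of the mode clause (`IN INVALIDATE MODE` / `IN RELOAD MODE`) is an assumption, since the ticket only names the two modes. The default of -1 for a missing id follows the ticket's description.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the real grammar would live in Impala's SQL parser.
final class RestartEventProcessorSketch {
  // Event id used when FROM EVENT is omitted (per the ticket: restart from the beginning).
  static final long FROM_BEGINNING = -1L;

  // Assumed clause spelling: "IN INVALIDATE MODE" or "IN RELOAD MODE".
  private static final Pattern STMT = Pattern.compile(
      "^\\s*RESTART\\s+EVENTPROCESSOR"
          + "(?:\\s+FROM\\s+EVENT\\s+(\\d+))?"
          + "(?:\\s+IN\\s+(INVALIDATE|RELOAD)\\s+MODE)?"
          + "\\s*;?\\s*$",
      Pattern.CASE_INSENSITIVE);

  final long fromEventId;
  final String mode; // "INVALIDATE", "RELOAD", or null when no mode was given

  private RestartEventProcessorSketch(long fromEventId, String mode) {
    this.fromEventId = fromEventId;
    this.mode = mode;
  }

  static RestartEventProcessorSketch parse(String sql) {
    Matcher m = STMT.matcher(sql);
    if (!m.matches()) {
      throw new IllegalArgumentException("Not a RESTART EVENTPROCESSOR statement: " + sql);
    }
    long id = m.group(1) == null ? FROM_BEGINNING : Long.parseLong(m.group(1));
    String mode = m.group(2) == null ? null : m.group(2).toUpperCase(Locale.ROOT);
    return new RestartEventProcessorSketch(id, mode);
  }
}
```

For example, `parse("RESTART EVENTPROCESSOR")` yields an event id of -1 and no mode, while a statement with both optional clauses yields the given id and the upper-cased mode name.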
[jira] [Updated] (IMPALA-13255) Support new SQL grammar for RESTART EVENTPROCESSOR [FROM EVENT id]
[ https://issues.apache.org/jira/browse/IMPALA-13255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxwell Guo updated IMPALA-13255:
    Component/s: Catalog, Clients, Frontend
    Epic Link: IMPALA-11531
[jira] [Updated] (IMPALA-13255) Support new SQL grammar for RESTART EVENTPROCESSOR [FROM EVENT id]
[ https://issues.apache.org/jira/browse/IMPALA-13255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxwell Guo updated IMPALA-13255:
    Summary: Support new SQL grammar for RESTART EVENTPROCESSOR [FROM EVENT id] (was: Support new SQL grammar for RESTART EVENTPROCESSOR for event)
[jira] [Created] (IMPALA-13255) Support new SQL grammar for RESTART EVENTPROCESSOR for event
Maxwell Guo created IMPALA-13255:

    Summary: Support new SQL grammar for RESTART EVENTPROCESSOR for event
    Key: IMPALA-13255
    URL: https://issues.apache.org/jira/browse/IMPALA-13255
    Project: IMPALA
    Issue Type: Improvement
    Reporter: Maxwell Guo
    Assignee: Maxwell Guo

See the [discussion|https://lists.apache.org/thread/5mw8jd5hgz7yycz9h2pxvqj101k0j47m]. In this Jira, I am going to add new grammar in this format:

RESTART EVENTPROCESSOR [FROM EVENT id]

This grammar is only used to set the status of the Impala catalog event processor. It lets the processor restart processing events from a specified event id; if no id is specified, processing restarts from the very beginning, as with an event id of -1.
[jira] [Commented] (IMPALA-12468) Add the ability to update EventProcessorStatus
[ https://issues.apache.org/jira/browse/IMPALA-12468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866044#comment-17866044 ]

Maxwell Guo commented on IMPALA-12468:

Thanks [~stigahuang]

> Add the ability to update EventProcessorStatus
>
> Key: IMPALA-12468
> URL: https://issues.apache.org/jira/browse/IMPALA-12468
> Project: IMPALA
> Issue Type: Improvement
> Components: be, Catalog, fe
> Reporter: Maxwell Guo
> Assignee: Maxwell Guo
> Priority: Minor
>
> Once Impala's and Hive's states are mismatched and the EventProcessorStatus becomes NEED_INVALIDATE, we usually run INVALIDATE METADATA to reset the catalog instance, after which Impala updates the status to ACTIVE.
> But if Impala contains many tables, a global invalidate is expensive. So we may instead invalidate metadata table by table, only for the tables that changed. For example, suppose we have 1,000,000,000,000 tables, but event processing hit a CatalogException and a MetastoreNotificationNeedsInvalidateException was thrown for only some of them. I think there is no need to invalidate all table caches in order to reset the catalog instance; see [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2088].
>
> MetastoreEventsProcessor's asynchronous update process does not update currentStatus when the status is not ACTIVE; see [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L876].
> So what about adding a new SQL grammar: SET EVENT STATUS ${status}?
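The behaviour this ticket describes can be sketched as a small state holder: once the status leaves ACTIVE, the periodic update does nothing, and only an explicit reset brings it back. This is a simplified illustration under stated assumptions, not Impala's implementation. `EventProcessorStatusSketch` and its methods are hypothetical, the enum is reduced to the two states named in the ticket (using the ticket's spelling), and `setStatus` stands in for whatever would handle the proposed `SET EVENT STATUS ${status}` statement.

```java
import java.util.Locale;

// Hypothetical sketch of the status handling discussed in the ticket.
final class EventProcessorStatusSketch {
  // Only the two states mentioned in the ticket, spelled as in the ticket.
  enum Status { ACTIVE, NEED_INVALIDATE }

  private Status current = Status.ACTIVE;

  synchronized Status get() {
    return current;
  }

  // Stand-in for the periodic update: it only runs while the status is ACTIVE.
  // Returns true if events would have been processed in this round.
  synchronized boolean tryProcessEvents() {
    return current == Status.ACTIVE;
  }

  // Stand-in for a failure that requires invalidation.
  synchronized void markNeedInvalidate() {
    current = Status.NEED_INVALIDATE;
  }

  // Stand-in for the proposed SET EVENT STATUS ${status}: an operator sets the
  // status directly after invalidating only the affected tables.
  synchronized void setStatus(String statusName) {
    current = Status.valueOf(statusName.trim().toUpperCase(Locale.ROOT));
  }
}
```

The point of the design is that the reset becomes an explicit operator action, so a global invalidate is no longer the only way back to ACTIVE.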
[jira] [Commented] (IMPALA-12468) Add the ability to update EventProcessorStatus
[ https://issues.apache.org/jira/browse/IMPALA-12468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865855#comment-17865855 ]

Maxwell Guo commented on IMPALA-12468:

[~VenuReddy] [~stigahuang] any update on this issue? If you agree with my suggestion, then I may start this work next.
[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number
[ https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859856#comment-17859856 ]

Maxwell Guo commented on IMPALA-12771:

ping [~stigahuang] [~mylogi...@gmail.com]
[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number
[ https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850908#comment-17850908 ]

Maxwell Guo commented on IMPALA-12771:

Ping again. I have also updated the PR against the latest master branch code to avoid merge conflicts. [~mylogi...@gmail.com] [~stigahuang] [~VenuReddy]
[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number
[ https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835458#comment-17835458 ]

Maxwell Guo commented on IMPALA-12771:

ping [~mylogi...@gmail.com] [~stigahuang] [~VenuReddy] :D
[jira] [Commented] (IMPALA-12709) Hierarchical metastore event processing
[ https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835459#comment-17835459 ]

Maxwell Guo commented on IMPALA-12709:

[~VenuReddy] any update here? :D

> Hierarchical metastore event processing
>
> Key: IMPALA-12709
> URL: https://issues.apache.org/jira/browse/IMPALA-12709
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Reporter: Venugopal Reddy K
> Assignee: Venugopal Reddy K
> Priority: Major
> Attachments: Hierarchical metastore event processing.docx
>
> *Current Issue:*
> At present, the metastore event processor is single-threaded. Notification events are processed sequentially, with a maximum of 1000 events fetched and processed in a single batch. Multiple locks are used to address the concurrency issues that may arise when catalog DDL operation processing and metastore event processing try to access or update the catalog objects concurrently. Waiting for a lock, or for the file metadata of a table to load, can slow event processing and delay the events that follow, even though those events may not depend on the previous event. Altogether, it takes a very long time to synchronize all the HMS events.
> *Proposal:*
> Existing metastore event processing can be turned into multi-level event processing. The idea is to segregate the events based on their dependency, maintain the order of events as they occur within the dependency, and process them independently as much as possible:
> # All the events of a table are processed in the same order in which they actually occurred.
> # Events of different tables are processed in parallel.
> # When a database is altered, all the events relating to the database (i.e., for all its tables) that occur after the alter database event are processed only after the alter database event is processed, which preserves the order.
> An initial proposal design document is attached: https://docs.google.com/document/d/1KZ-ANko-qn5CYmY13m4OVJXAYjLaS1VP-c64Pumipq8/edit?pli=1#heading=h.qyk8qz8ez37b
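The first two points of the proposal (ordered processing per table, parallel processing across tables) can be sketched with one chained future per table. This is only an illustration under stated assumptions, not the design in the attached document: `PerTableEventDispatcherSketch` and its methods are hypothetical, event handlers are assumed not to throw, and the database-level ordering of the third point is deliberately left out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: per-table ordering with cross-table parallelism.
final class PerTableEventDispatcherSketch {
  private final ExecutorService pool;
  // Tail of the handler chain for each table; a new event is appended after it.
  private final ConcurrentHashMap<String, CompletableFuture<Void>> tails =
      new ConcurrentHashMap<>();

  PerTableEventDispatcherSketch(int threads) {
    this.pool = Executors.newFixedThreadPool(threads);
  }

  // Events must be submitted in the order they occurred. Handlers for the same
  // table run one after another; handlers for different tables may overlap.
  // Assumption: handlers do not throw (a failure would stall that table's chain).
  void submit(String tableName, Runnable eventHandler) {
    tails.compute(tableName, (name, tail) ->
        (tail == null ? CompletableFuture.<Void>completedFuture(null) : tail)
            .thenRunAsync(eventHandler, pool));
  }

  // Blocks until every submitted handler has finished, then stops the pool.
  void awaitAllAndShutdown() {
    CompletableFuture.allOf(tails.values().toArray(new CompletableFuture[0])).join();
    pool.shutdown();
  }
}
```

Chaining futures keeps the sketch short; a production design would also need the database-level barrier from the proposal and a policy for failed handlers.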
[jira] [Commented] (IMPALA-12709) Hierarchical metastore event processing
[ https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833178#comment-17833178 ]

Maxwell Guo commented on IMPALA-12709:

[~VenuReddy] Thanks very much, looking forward to your update.
[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number
[ https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833010#comment-17833010 ]

Maxwell Guo commented on IMPALA-12771:

Hi [~hemanth619], thanks for your review comments. I have responded to your comments and updated the latest code. Looking forward to your reply. :)
[jira] [Commented] (IMPALA-12709) Hierarchical metastore event processing
[ https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831139#comment-17831139 ]

Maxwell Guo commented on IMPALA-12709:
--------------------------------------

[~VenuReddy] Sorry to bother you again, is there any update on this? :)

> Hierarchical metastore event processing
> ---------------------------------------
>
> Key: IMPALA-12709
> URL: https://issues.apache.org/jira/browse/IMPALA-12709
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Reporter: Venugopal Reddy K
> Assignee: Venugopal Reddy K
> Priority: Major
> Attachments: Hierarchical metastore event processing.docx
>
> *Current Issue:*
> At present, the metastore event processor is single threaded. Notification events are processed sequentially, with a maximum of 1000 events fetched and processed in a single batch. Multiple locks are used to address the concurrency issues that may arise when catalog DDL operation processing and metastore event processing try to access or update the catalog objects concurrently. Waiting for a lock, or for the file metadata of a table to load, can slow event processing and delay the events that follow, even though those events may not depend on the previous event. Altogether, it takes a very long time to synchronize all the HMS events.
>
> *Proposal:*
> The existing metastore event processing can be turned into multi-level event processing. The idea is to segregate the events based on their dependencies, maintain the order of events as they occur within a dependency, and process them independently as much as possible:
> # All the events of a table are processed in the same order in which they actually occurred.
> # Events of different tables are processed in parallel.
> # When a database is altered, all the events relating to the database (i.e., for all its tables) that occur after the alter database event are processed only after the alter database event has been processed, ensuring the order.
>
> Have attached an initial proposal design document:
> https://docs.google.com/document/d/1KZ-ANko-qn5CYmY13m4OVJXAYjLaS1VP-c64Pumipq8/edit?pli=1#heading=h.qyk8qz8ez37b
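The first two points of the IMPALA-12709 proposal (events of one table stay in order, events of different tables run in parallel) can be illustrated with a minimal Java sketch. This is not code from Impala or from the attached design document: the class `PerTableEventDispatcher` and its methods are hypothetical names chosen for illustration, and the database-level ordering of the third point is deliberately left out.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not Impala code: each table gets its own chain of
// tasks, so its events run in submission order, while chains of different
// tables are free to run on different pool threads at the same time.
class PerTableEventDispatcher {
  private final ExecutorService pool;
  // Last queued task per table. Entries are never removed in this sketch.
  private final Map<String, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();

  PerTableEventDispatcher(int threads) {
    this.pool = Executors.newFixedThreadPool(threads);
  }

  // Queues 'work' behind every earlier event submitted for the same table.
  CompletableFuture<Void> submit(String tableName, Runnable work) {
    return tails.compute(tableName, (name, tail) -> {
      if (tail == null) {
        return CompletableFuture.runAsync(work, pool);
      }
      // handleAsync runs even if the previous event failed, so one bad
      // event does not block the rest of the table's chain.
      return tail.<Void>handleAsync((ignored, error) -> {
        work.run();
        return null;
      }, pool);
    });
  }

  void shutdown() {
    pool.shutdown();
  }
}
```

Chaining futures per table keeps the sketch short; a real implementation would also need to bound memory, report failures, and add the database-level barrier described in the proposal.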
[jira] [Assigned] (IMPALA-12912) Show history of event processing in the /events page
[ https://issues.apache.org/jira/browse/IMPALA-12912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxwell Guo reassigned IMPALA-12912:
------------------------------------

    Assignee: Maxwell Guo

> Show history of event processing in the /events page
> ----------------------------------------------------
>
> Key: IMPALA-12912
> URL: https://issues.apache.org/jira/browse/IMPALA-12912
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Maxwell Guo
> Priority: Major
>
> This is a follow-up task of IMPALA-12782 where we add some basic info in the /events page. It'd be helpful to also show the history of event processing, including the top-10 expensive events/tables, the recent 10 failure messages, etc.
[jira] [Commented] (IMPALA-12912) Show history of event processing in the /events page
[ https://issues.apache.org/jira/browse/IMPALA-12912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829730#comment-17829730 ]

Maxwell Guo commented on IMPALA-12912:
--------------------------------------

[~stigahuang] Can I assign this issue to myself?
[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number
[ https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828156#comment-17828156 ]

Maxwell Guo commented on IMPALA-12771:
--------------------------------------

ping [~stigahuang] [~mylogi...@gmail.com] [~VenuReddy] :D
[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number
[ https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826933#comment-17826933 ]

Maxwell Guo commented on IMPALA-12771:
--------------------------------------

[~stigahuang] [~mylogi...@gmail.com] [~VenuReddy] Hi, could you help take another look?
[jira] [Commented] (IMPALA-12468) Add the ability to update EventProcessorStatus
[ https://issues.apache.org/jira/browse/IMPALA-12468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825579#comment-17825579 ]

Maxwell Guo commented on IMPALA-12468:
--------------------------------------

So, is there any need for me to continue working on this Jira?

> Add the ability to update EventProcessorStatus
> ----------------------------------------------
>
> Key: IMPALA-12468
> URL: https://issues.apache.org/jira/browse/IMPALA-12468
> Project: IMPALA
> Issue Type: Improvement
> Components: be, Catalog, fe
> Reporter: Maxwell Guo
> Assignee: Maxwell Guo
> Priority: Minor
>
> Once the state of Impala and Hive is mismatched and the EventProcessorStatus becomes NEEDS_INVALIDATE, we usually use invalidate metadata to reset the catalog instance. Impala then updates the status to ACTIVE.
> But if Impala contains many tables, a global invalidate is expensive. Instead, we may want to invalidate metadata table by table, only for the tables that actually changed. For example, suppose we have 1,000,000,000,000 tables, but event processing hit a CatalogException for only some of them and a MetastoreNotificationNeedsInvalidateException was thrown. I think there is no need to invalidate all table caches in order to reset the catalog instance; see [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2088].
>
> MetastoreEventsProcessor's async update process will not update the currentStatus when the status is not ACTIVE; see [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L876].
> So what about adding a new SQL grammar: SET EVENT STATUS ${status}?
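To make the `SET EVENT STATUS ${status}` idea from IMPALA-12468 concrete, here is a minimal Java sketch of how such a statement might be recognized. It is only an illustration under stated assumptions: the statement does not exist in Impala, the class `EventStatusCommand` is hypothetical, and the enum models only the three statuses named in this thread (ACTIVE, NEEDS_INVALIDATE, ERROR) rather than Impala's actual status enum.

```java
import java.util.Locale;

// Hypothetical sketch: none of these names come from Impala's code base.
class EventStatusCommand {
  // Only the statuses mentioned in the issue discussion are modelled here.
  enum Status { ACTIVE, NEEDS_INVALIDATE, ERROR }

  // Parses the proposed "SET EVENT STATUS <status>" text and returns the
  // requested status. Throws IllegalArgumentException if the text is
  // malformed or names an unknown status.
  static Status parse(String sql) {
    // Drop one optional trailing semicolon, then split on whitespace.
    String[] tokens = sql.trim().replaceAll(";$", "").trim().split("\\s+");
    if (tokens.length != 4
        || !tokens[0].equalsIgnoreCase("SET")
        || !tokens[1].equalsIgnoreCase("EVENT")
        || !tokens[2].equalsIgnoreCase("STATUS")) {
      throw new IllegalArgumentException("Expected: SET EVENT STATUS <status>");
    }
    // Enum.valueOf throws IllegalArgumentException for unknown names.
    return Status.valueOf(tokens[3].toUpperCase(Locale.ROOT));
  }
}
```

For example, `EventStatusCommand.parse("set event status active;")` yields `Status.ACTIVE`. Whether every status should be settable by hand, and who is authorized to do so, is a design question the sketch does not answer.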
[jira] [Commented] (IMPALA-12709) Hierarchical metastore event processing
[ https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825575#comment-17825575 ]

Maxwell Guo commented on IMPALA-12709:
--------------------------------------

Thanks [~VenuReddy]
[jira] [Commented] (IMPALA-12468) Add the ability to update EventProcessorStatus
[ https://issues.apache.org/jira/browse/IMPALA-12468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825571#comment-17825571 ]

Maxwell Guo commented on IMPALA-12468:
--------------------------------------

I created this Jira because we ran into the problem described in IMPALA-12832. We made a code change similar to IMPALA-12832, but it only handles the NEEDS_INVALIDATE case.

After looking at the Impala code, we found that the ERROR status can differ from NEEDS_INVALIDATE. For example, we found a case where Impala's schema was not compatible with Hive's (this has been fixed in newer Impala versions). In such a case it may be necessary to ask the HMS side to perform some table operations so that the EP can handle the table correctly. Therefore, I thought of manually resetting the EP status after the user has modified their schema.
[jira] [Commented] (IMPALA-12709) Hierarchical metastore event processing
[ https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825340#comment-17825340 ]

Maxwell Guo commented on IMPALA-12709:
--------------------------------------

Is there any update on this issue? [~VenuReddy] [~rizaon] [~stigahuang]
[jira] [Commented] (IMPALA-12468) Add the ability to update EventProcessorStatus
[ https://issues.apache.org/jira/browse/IMPALA-12468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823015#comment-17823015 ]

Maxwell Guo commented on IMPALA-12468:
--------------------------------------

Hi [~mylogi...@gmail.com],
Yes, what you described is another solution to this problem, but I want to make this feature more general, because the EP has other statuses besides NEEDS_INVALIDATE. What I want is to leave room for adjusting this state manually. As you know, any status other than ACTIVE stops event processing, and then the only options are to restart catalogd or to run a global invalidate.
[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number
[ https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820535#comment-17820535 ]

Maxwell Guo commented on IMPALA-12771:
--------------------------------------

The build failed, let me take a look at this case.
[jira] [Commented] (IMPALA-12573) Give configuration load_catalog_in_background more fine-grained configuration
[ https://issues.apache.org/jira/browse/IMPALA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819452#comment-17819452 ]

Maxwell Guo commented on IMPALA-12573:
--------------------------------------

[~stigahuang] Thanks for your reply. I think load_dbs_in_background and load_tables_in_background may help. But maybe I didn't understand your suggestion clearly: should these two configurations be strings, just like the table/db blacklist, for the case where someone wants some tables to always be loaded? These configurations are not going to be boolean flags, am I right?

> Give configuration load_catalog_in_background more fine-grained configuration
> -----------------------------------------------------------------------------
>
> Key: IMPALA-12573
> URL: https://issues.apache.org/jira/browse/IMPALA-12573
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Reporter: Maxwell Guo
> Assignee: Maxwell Guo
> Priority: Minor
>
> As we know, if load_catalog_in_background is set to true, table metadata is loaded asynchronously by catalogd.
> If the flag is set to true while catalogd starts up, all tables are loaded asynchronously and the queue becomes large, so we may leave it set to false by default. But if we invalidate some tables manually, we may want them to be loaded. So I think we can introduce a new flag, load_catalog_in_background_at_startup: we can set load_catalog_in_background_at_startup to false and load_catalog_in_background to true by default.
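One possible reading of the two-flag proposal in IMPALA-12573 is sketched below in Java. The class `BackgroundLoadPolicy` is hypothetical, `load_catalog_in_background_at_startup` is only the flag proposed in the issue, and the sketch assumes that the startup flag narrows the existing flag rather than overriding it; the issue itself does not settle that point.

```java
// Hypothetical sketch of the flag semantics proposed in IMPALA-12573.
class BackgroundLoadPolicy {
  // Mirrors the existing flag load_catalog_in_background.
  private final boolean loadCatalogInBackground;
  // Mirrors the proposed flag load_catalog_in_background_at_startup.
  private final boolean loadCatalogInBackgroundAtStartup;

  BackgroundLoadPolicy(boolean loadCatalogInBackground,
      boolean loadCatalogInBackgroundAtStartup) {
    this.loadCatalogInBackground = loadCatalogInBackground;
    this.loadCatalogInBackgroundAtStartup = loadCatalogInBackgroundAtStartup;
  }

  // Returns true if a table should be queued for asynchronous loading.
  // Assumption: the startup flag only matters while catalogd is starting,
  // and it can only restrict what the existing flag already allows.
  boolean shouldLoadInBackground(boolean duringStartup) {
    if (!loadCatalogInBackground) {
      return false;
    }
    return !duringStartup || loadCatalogInBackgroundAtStartup;
  }
}
```

With the defaults suggested in the issue (`true` for the existing flag, `false` for the startup flag), nothing is queued during startup, while tables invalidated later are still loaded in the background.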
[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number
[ https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819210#comment-17819210 ]

Maxwell Guo commented on IMPALA-12771:
--------------------------------------

Done, published it now.
[jira] [Comment Edited] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number
[ https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819161#comment-17819161 ]

Maxwell Guo edited comment on IMPALA-12771 at 2/21/24 9:48 AM:
---------------------------------------------------------------

The initial version is here: https://gerrit.cloudera.org/#/c/21045/ and I am doing local testing at the same time.
CC [~mylogi...@gmail.com] [~stigahuang] [~VenuReddy], let me know if there is something obviously wrong with my modifications.
[jira] [Comment Edited] (IMPALA-12709) Hierarchical metastore event processing
[ https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818229#comment-17818229 ]

Maxwell Guo edited comment on IMPALA-12709 at 2/19/24 11:24 AM:
----------------------------------------------------------------

Hi [~VenuReddy], after reading the code, I only found [EventsProcessorStressTest|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java], which may be related to performance, but I think some customization is required if we want to use it. [~stigahuang] [~mylogi...@gmail.com], any more suggestions?
Besides, what about making this patch configurable? One benefit is that the comparison results can be seen directly by switching the configuration, without changing the code, and I think new features are generally turned off by default.

was (Author: maxwellguo):
Hi [~VenuReddy] ,After reading the code ,I only found [EventsProcessorStressTest title|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java] which may has some relations with performance, But I think some function customization is required if we want to the code. [~stigahuang] [~mylogi...@gmail.com] any more suggestions?
Besides, What about make this patch configurable, one of the benefits is that you can visually see the comparison results through configuration without changing this code, and I think new features are generally turned off by default.
[jira] [Comment Edited] (IMPALA-12709) Hierarchical metastore event processing
[ https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818229#comment-17818229 ] Maxwell Guo edited comment on IMPALA-12709 at 2/19/24 11:24 AM: Hi [~VenuReddy] ,After reading the code ,I only found [EventsProcessorStressTest title|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java] which may has some relations with performance, But I think some function customization is required if we want to the code. [~stigahuang] [~mylogi...@gmail.com] any more suggestions? Besides, What about make this patch configurable, one of the benefits is that you can visually see the comparison results through configuration without changing this code, and I think new features are generally turned off by default. was (Author: maxwellguo): Hi [~VenuReddy] ,After reading the code ,I only found [EventsProcessorStressTest title|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java] which may has some relations with performance, But I think some function customization is required if we want to you the code. [~stigahuang] [~mylogi...@gmail.com] any more suggestions? Besides, What about make this patch configurable, one of the benefits is that you can visually see the comparison results through configuration without changing this code, and I think new features are generally turned off by default.
[jira] [Updated] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number
[ https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxwell Guo updated IMPALA-12771:

Description:
See the description of the [events-skipped metric|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L237]:
{code:java}
// total number of events which are skipped because of the flag setting or
// in case of [CREATE|DROP] events on [DATABASE|TABLE|PARTITION] which were ignored
// because the [DATABASE|TABLE|PARTITION] was already [PRESENT|ABSENT] in the catalogd.
{code}
For CREATE and DROP events on a database, table, or partition (AddPartition included), when the database or table is not found in the cache, we skip processing the event and increment the events-skipped metric by 1.
But I found some open questions for alter table and Reload events:
* A Reload event is not covered by the description of events-skipped, but the metric is incremented when it is an older event;
* Besides, if the table is in the blacklist, the metric is also incremented.
In summary, I think this description is inconsistent with the actual implementation.
So can we also increment the events-skipped metric for alter partition events, and change the description to cover all skipped events?

was:
See the description of the [events-skipped metric|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L237]:
{code:java}
// total number of events which are skipped because of the flag setting or
// in case of [CREATE|DROP] events on [DATABASE|TABLE|PARTITION] which were ignored
// because the [DATABASE|TABLE|PARTITION] was already [PRESENT|ABSENT] in the catalogd.
{code}
For CREATE and DROP events on a database, table, or partition (AddPartition included), when the database or table is not found in the cache, we skip processing the event and increment the events-skipped metric by 1.
But I found some open questions for alter table and Reload events:
* For alter table, when a table is renamed, the events-skipped metric is also incremented, see [oldTblRemoved being false|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L1653];
* A Reload event is not covered by the description of events-skipped, but the metric is incremented when it is an older event;
* Besides, if the table is in the blacklist, the metric is also incremented.
In summary, I think this description is inconsistent with the actual implementation.
So can we also increment the events-skipped metric for alter partition events, and change the description to cover all skipped events?
[jira] [Work started] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number
[ https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-12771 started by Maxwell Guo.
[jira] [Commented] (IMPALA-12709) Hierarchical metastore event processing
[ https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818229#comment-17818229 ] Maxwell Guo commented on IMPALA-12709: Hi [~VenuReddy], after reading the code, I only found [EventsProcessorStressTest|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java], which may be related to performance, but I think some customization is required if we want to use the code. [~stigahuang] [~mylogi...@gmail.com] any more suggestions? Besides, what about making this patch configurable? One benefit is that you can compare the results by switching a configuration option without changing the code, and I think new features are generally turned off by default.
[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number
[ https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812553#comment-17812553 ] Maxwell Guo commented on IMPALA-12771: Thanks for reminding me, I think I have used it before
[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number
[ https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812236#comment-17812236 ] Maxwell Guo commented on IMPALA-12771: Besides, I found an interesting piece of code [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L1023]: TableLoadingException and DatabaseNotFoundException are caught in the method [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L1015], but the inner function [reloadTableIfExists|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2902] has already caught the exception and does not re-throw it, so in my view the outer function has no need to handle these two exceptions. I also think an info-level log is not suitable [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2903]; warn-level logging would be better.
[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number
[ https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812237#comment-17812237 ] Maxwell Guo commented on IMPALA-12771: If you think my suggestion is reasonable, I will submit a PR later
[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number
[ https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812215#comment-17812215 ] Maxwell Guo commented on IMPALA-12771: ping [~huangqiang]
[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number
[ https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812214#comment-17812214 ] Maxwell Guo commented on IMPALA-12771: With an alter partition event, we may find that the database or table is not found, or that the table is not loaded, see [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L1057]. In that case we just skip the event and write a debug log: "Ignoring the event since the table is not found". Since we actually skip handling these events when the table is not found, I think we can also increment the events-skipped metric if the table is not found, the table is an IncompleteTable, or the table was removed from the catalog. Besides, we currently only update the events-skipped metric for events filtered by the isSelfEvent and isOlderEvent methods, see [isSelfEvent|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L857] and [isOlderEvent|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L1198]. But for the [canBeSkipped|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L1703] method, the metric is not incremented. I think we can also increment it there.
[jira] [Created] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number
Maxwell Guo created IMPALA-12771:

Summary: Impala catalogd events-skipped may mark the wrong number
Key: IMPALA-12771
URL: https://issues.apache.org/jira/browse/IMPALA-12771
Project: IMPALA
Issue Type: Bug
Components: Catalog
Reporter: Maxwell Guo
Assignee: Maxwell Guo

See the description of the [events-skipped metric|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L237]:
{code:java}
// total number of events which are skipped because of the flag setting or
// in case of [CREATE|DROP] events on [DATABASE|TABLE|PARTITION] which were ignored
// because the [DATABASE|TABLE|PARTITION] was already [PRESENT|ABSENT] in the catalogd.
{code}
For CREATE and DROP events on a database, table, or partition (AddPartition included), when the database or table is not found in the cache, we skip processing the event and increment the events-skipped metric by 1.
But I found some open questions for alter table and Reload events:
* For alter table, when a table is renamed, the events-skipped metric is also incremented, see [oldTblRemoved being false|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L1653];
* A Reload event is not covered by the description of events-skipped, but the metric is incremented when it is an older event;
* Besides, if the table is in the blacklist, the metric is also incremented.
In summary, I think this description is inconsistent with the actual implementation.
So can we also increment the events-skipped metric for alter partition events, and change the description to cover all skipped events?
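One way to keep the metric and its description in agreement is to route every skip path through a single method that records the reason. The sketch below only illustrates that idea under assumptions: `EventsSkippedSketch`, `SkipReason`, and `markSkipped` are hypothetical names and not part of Impala's code, and the reasons listed are just the cases mentioned in this issue.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a single entry point for counting skipped events. */
final class EventsSkippedSketch {
  /** Hypothetical skip reasons, taken from the cases discussed in this issue. */
  public enum SkipReason {
    FLAG_SET,
    ALREADY_PRESENT_OR_ABSENT,
    TABLE_NOT_FOUND,
    BLACKLISTED,
    OLDER_EVENT,
    SELF_EVENT
  }

  // Total corresponds to what the issue calls the events-skipped metric.
  private final AtomicLong eventsSkipped = new AtomicLong();
  // Per-reason counters; the map is fully populated up front and never resized.
  private final Map<SkipReason, AtomicLong> byReason = new EnumMap<>(SkipReason.class);

  public EventsSkippedSketch() {
    for (SkipReason r : SkipReason.values()) {
      byReason.put(r, new AtomicLong());
    }
  }

  /** Every code path that skips an event would call this exactly once. */
  public void markSkipped(SkipReason reason) {
    eventsSkipped.incrementAndGet();
    byReason.get(reason).incrementAndGet();
  }

  public long total() {
    return eventsSkipped.get();
  }

  public long count(SkipReason reason) {
    return byReason.get(reason).get();
  }
}
```

With one entry point, the metric description can simply say "all skipped events", and the per-reason counts show where the skips come from.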
[jira] [Commented] (IMPALA-12709) Hierarchical metastore event processing
[ https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810960#comment-17810960 ] Maxwell Guo commented on IMPALA-12709: [~VenuReddy] thanks for your reply, looking forward to your update.
[jira] [Commented] (IMPALA-12709) Hierarchical metastore event processing
[ https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810730#comment-17810730 ] Maxwell Guo commented on IMPALA-12709: -- Hi [~VenuReddy], are there any plan on this patch ? such as the release timeline . If this patch is going to split into some small task , and I think I can do some help with some of the tasks. > Hierarchical metastore event processing > --- > > Key: IMPALA-12709 > URL: https://issues.apache.org/jira/browse/IMPALA-12709 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Venugopal Reddy K >Assignee: Venugopal Reddy K >Priority: Major > Attachments: Hierarchical metastore event processing.docx > > > *Current Issue:* > At present, metastore event processor is single threaded. Notification events > are processed sequentially with a maximum limit of 1000 events fetched and > processed in a single batch. Multiple locks are used to address the > concurrency issues that may arise when catalog DDL operation processing and > metastore event processing tries to access/update the catalog objects > concurrently. Waiting for a lock or file metadata loading of a table can slow > the event processing and can affect the processing of other events following > it. Those events may not be dependent on the previous event. Altogether it > takes a very long time to synchronize all the HMS events. > *Proposal:* > Existing metastore event processing can be turned into multi-level event > processing. Idea is to segregate the events based on their dependency, > maintain the order of events as they occur within the dependency and process > them independently as much as possible: > # All the events of a table are processed in the same order they have > actually occurred. > # Events of different tables are processed in parallel. 
> # When a database is altered, all the events relating to the database(i.e., > for all its tables) occurring after the alter db event are processed only > after the alter database event is processed ensuring the order. > Have attached an initial proposal design document > https://docs.google.com/document/d/1KZ-ANko-qn5CYmY13m4OVJXAYjLaS1VP-c64Pumipq8/edit?pli=1#heading=h.qyk8qz8ez37b
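The first two ordering rules of the proposal (events of one table stay in order, events of different tables run in parallel) can be sketched roughly as below. This is only an illustration under assumptions, not Impala's implementation: `TableOrderedDispatcher`, `Event`, and the chaining of tasks per "db.table" key are hypothetical names and choices, and the database-level barrier from rule 3 is left out.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical sketch: events of one table run in arrival order,
// events of different tables may run in parallel on a shared pool.
class TableOrderedDispatcher {
  // Hypothetical event type; real HMS notification events carry far more.
  static final class Event {
    final long id;
    final String db;
    final String table;
    Event(long id, String db, String table) {
      this.id = id;
      this.db = db;
      this.table = table;
    }
  }

  private final ExecutorService pool;
  // Tail of the task chain for each "db.table" key (never pruned in this sketch).
  private final Map<String, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();

  TableOrderedDispatcher(int threads) {
    this.pool = Executors.newFixedThreadPool(threads);
  }

  // Appends the event to its table's chain and returns the new tail.
  // If a handler throws, later events of that table are not run.
  CompletableFuture<Void> submit(Event e, Consumer<Event> handler) {
    String key = e.db + "." + e.table;
    return tails.compute(key, (k, tail) -> {
      CompletableFuture<Void> prev =
          (tail == null) ? CompletableFuture.<Void>completedFuture(null) : tail;
      return prev.thenRunAsync(() -> handler.accept(e), pool);
    });
  }

  void shutdown() {
    pool.shutdown();
  }
}
```

Chaining each event onto the previous future of the same table keeps per-table order without a dedicated thread per table; a real design would also need the database-level ordering and cleanup of idle chains.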
[jira] (IMPALA-12709) Hierarchical metastore event processing
[ https://issues.apache.org/jira/browse/IMPALA-12709 ] Maxwell Guo deleted comment on IMPALA-12709: -- was (Author: maxwellguo): I may have a different point of view. Is it possible to divide the db into buckets according to the original operation time and parallelize each bucket? Each time, 1000 events are taken from HMS, divided into buckets, and then processed in parallel. After all events are processed, the next batch is processed. > Hierarchical metastore event processing > --- > > Key: IMPALA-12709 > URL: https://issues.apache.org/jira/browse/IMPALA-12709 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Venugopal Reddy K >Assignee: Venugopal Reddy K >Priority: Major > Attachments: Hierarchical metastore event processing.docx > > > *Current Issue:* > At present, metastore event processor is single threaded. Notification events > are processed sequentially with a maximum limit of 1000 events fetched and > processed in a single batch. Multiple locks are used to address the > concurrency issues that may arise when catalog DDL operation processing and > metastore event processing tries to access/update the catalog objects > concurrently. Waiting for a lock or file metadata loading of a table can slow > the event processing and can affect the processing of other events following > it. Those events may not be dependent on the previous event. Altogether it > takes a very long time to synchronize all the HMS events. > *Proposal:* > Existing metastore event processing can be turned into multi-level event > processing. Idea is to segregate the events based on their dependency, > maintain the order of events as they occur within the dependency and process > them independently as much as possible: > # All the events of a table are processed in the same order they have > actually occurred. > # Events of different tables are processed in parallel. 
> # When a database is altered, all the events relating to the database(i.e., > for all its tables) occurring after the alter db event are processed only > after the alter database event is processed ensuring the order. > Have attached an initial proposal design document > https://docs.google.com/document/d/1KZ-ANko-qn5CYmY13m4OVJXAYjLaS1VP-c64Pumipq8/edit?pli=1#heading=h.qyk8qz8ez37b
[jira] [Comment Edited] (IMPALA-12709) Hierarchical metastore event processing
[ https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806610#comment-17806610 ] Maxwell Guo edited comment on IMPALA-12709 at 1/15/24 3:52 AM: --- I may have a different point of view. Is it possible to divide the db into buckets according to the original operation time and parallelize each bucket? Each time, 1000 events are taken from HMS, divided into buckets, and then processed in parallel. After all events are processed, the next batch is processed. was (Author: maxwellguo): I may have a different point of view. Is it possible to divide the db into buckets according to the original operation time and parallelize each bucket? > Hierarchical metastore event processing > --- > > Key: IMPALA-12709 > URL: https://issues.apache.org/jira/browse/IMPALA-12709 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Venugopal Reddy K >Assignee: Venugopal Reddy K >Priority: Major > Attachments: Hierarchical metastore event processing.docx > > > *Current Issue:* > At present, metastore event processor is single threaded. Notification events > are processed sequentially with a maximum limit of 1000 events fetched and > processed in a single batch. Multiple locks are used to address the > concurrency issues that may arise when catalog DDL operation processing and > metastore event processing tries to access/update the catalog objects > concurrently. Waiting for a lock or file metadata loading of a table can slow > the event processing and can affect the processing of other events following > it. Those events may not be dependent on the previous event. Altogether it > takes a very long time to synchronize all the HMS events. > *Proposal:* > Existing metastore event processing can be turned into multi-level event > processing. 
Idea is to segregate the events based on their dependency, > maintain the order of events as they occur within the dependency and process > them independently as much as possible: > # All the events of a table are processed in the same order they have > actually occurred. > # Events of different tables are processed in parallel. > # When a database is altered, all the events relating to the database(i.e., > for all its tables) occurring after the alter db event are processed only > after the alter database event is processed ensuring the order. > Have attached an initial proposal design document > https://docs.google.com/document/d/1KZ-ANko-qn5CYmY13m4OVJXAYjLaS1VP-c64Pumipq8/edit?pli=1#heading=h.qyk8qz8ez37b
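One possible reading of the bucket idea in the comment above is sketched here: a fetched batch is grouped into per-database buckets that keep arrival order, the buckets run in parallel, and the next batch starts only after every bucket has finished. The comment does not spell out the bucketing key, so grouping by database is an assumption, and `BucketedBatchProcessor` and `Event` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

// Hypothetical sketch of batch-wise, bucketed event processing.
class BucketedBatchProcessor {
  // Hypothetical event type carrying only what the sketch needs.
  static final class Event {
    final long id;
    final String db;
    Event(long id, String db) {
      this.id = id;
      this.db = db;
    }
  }

  // Groups a batch by database, keeping arrival order within each bucket.
  static Map<String, List<Event>> bucketByDb(List<Event> batch) {
    Map<String, List<Event>> buckets = new LinkedHashMap<>();
    for (Event e : batch) {
      buckets.computeIfAbsent(e.db, k -> new ArrayList<>()).add(e);
    }
    return buckets;
  }

  // Processes one batch: buckets run in parallel, and the call returns
  // only after all buckets are done, so batches never overlap.
  static void processBatch(List<Event> batch, Consumer<Event> handler, ExecutorService pool) {
    List<Callable<Void>> tasks = new ArrayList<>();
    for (List<Event> bucket : bucketByDb(batch).values()) {
      tasks.add(() -> {
        for (Event e : bucket) {
          handler.accept(e);
        }
        return null;
      });
    }
    try {
      pool.invokeAll(tasks); // blocks until every bucket has completed
    } catch (InterruptedException ex) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(ex);
    }
  }
}
```

Compared with per-table chaining, the barrier after each batch is simpler, but one slow bucket delays the whole next batch.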
[jira] [Commented] (IMPALA-12709) Hierarchical metastore event processing
[ https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806610#comment-17806610 ] Maxwell Guo commented on IMPALA-12709: -- I may have a different point of view. Is it possible to divide the db into buckets according to the original operation time and parallelize each bucket? > Hierarchical metastore event processing > --- > > Key: IMPALA-12709 > URL: https://issues.apache.org/jira/browse/IMPALA-12709 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Venugopal Reddy K >Assignee: Venugopal Reddy K >Priority: Major > Attachments: Hierarchical metastore event processing.docx > > > *Current Issue:* > At present, metastore event processor is single threaded. Notification events > are processed sequentially with a maximum limit of 1000 events fetched and > processed in a single batch. Multiple locks are used to address the > concurrency issues that may arise when catalog DDL operation processing and > metastore event processing tries to access/update the catalog objects > concurrently. Waiting for a lock or file metadata loading of a table can slow > the event processing and can affect the processing of other events following > it. Those events may not be dependent on the previous event. Altogether it > takes a very long time to synchronize all the HMS events. > *Proposal:* > Existing metastore event processing can be turned into multi-level event > processing. Idea is to segregate the events based on their dependency, > maintain the order of events as they occur within the dependency and process > them independently as much as possible: > # All the events of a table are processed in the same order they have > actually occurred. > # Events of different tables are processed in parallel. 
> # When a database is altered, all the events relating to the database(i.e., > for all its tables) occurring after the alter db event are processed only > after the alter database event is processed ensuring the order. > Have attached an initial proposal design document > https://docs.google.com/document/d/1KZ-ANko-qn5CYmY13m4OVJXAYjLaS1VP-c64Pumipq8/edit?pli=1#heading=h.qyk8qz8ez37b
[jira] [Created] (IMPALA-12662) Support whitelist for db and table
Maxwell Guo created IMPALA-12662: Summary: Support whitelist for db and table Key: IMPALA-12662 URL: https://issues.apache.org/jira/browse/IMPALA-12662 Project: IMPALA Issue Type: Improvement Components: be, fe Reporter: Maxwell Guo
[jira] [Work started] (IMPALA-12468) Add the ability to update EventProcessorStatus
[ https://issues.apache.org/jira/browse/IMPALA-12468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-12468 started by Maxwell Guo. > Add the ability to update EventProcessorStatus > -- > > Key: IMPALA-12468 > URL: https://issues.apache.org/jira/browse/IMPALA-12468 > Project: IMPALA > Issue Type: Improvement > Components: be, Catalog, fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > > Once Impala's and Hive's states are mismatched and the EventProcessorStatus becomes NEED_INVALIDATE, we usually use invalidate metadata to reset the catalog instance, and Impala then updates the status to ACTIVE. > But if Impala contains many tables, the cost of a global invalidate is rather high, so we may instead invalidate metadata table by table for the incrementally changed tables. For example, we have 1000,000,000,000 tables, but event processing hits a CatalogException for only some of them and a MetastoreNotificationNeedsInvalidateException is thrown. I think there is no need to invalidate all table caches in order to reset the catalog instance; see [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2088]. > > MetastoreEventsProcessor's async update process will not update the currentStatus when the status is not ACTIVE; see > [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L876] > So what about adding a new SQL grammar: SET EVENT STATUS ${status}?
[jira] [Work started] (IMPALA-12506) Add the ability to update EventProcessorStatus through webUi
[ https://issues.apache.org/jira/browse/IMPALA-12506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-12506 started by Maxwell Guo. > Add the ability to update EventProcessorStatus through webUi > > > Key: IMPALA-12506 > URL: https://issues.apache.org/jira/browse/IMPALA-12506 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > > See the description in > [Ml|https://lists.apache.org/thread/5mw8jd5hgz7yycz9h2pxvqj101k0j47m]
[jira] [Created] (IMPALA-12573) Give configuration load_catalog_in_background more fine-grained configuration
Maxwell Guo created IMPALA-12573: Summary: Give configuration load_catalog_in_background more fine-grained configuration Key: IMPALA-12573 URL: https://issues.apache.org/jira/browse/IMPALA-12573 Project: IMPALA Issue Type: Improvement Components: Catalog Reporter: Maxwell Guo Assignee: Maxwell Guo As we know, if load_catalog_in_background is set to true, catalogd loads table metadata asynchronously. If the flag is true when catalogd starts up, all tables are loaded asynchronously and the loading queue becomes large, so we may leave it set to false by default. But if we invalidate some tables manually, we may want them to be loaded. So I think we can introduce a new flag, load_catalog_in_background_at_startup, and by default set load_catalog_in_background_at_startup to false and load_catalog_in_background to true.
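How the two flags might interact can be sketched as below. This is one reading of the proposal, not existing Impala behavior: the class `BackgroundLoadPolicy` and its method are hypothetical, and the sketch assumes that the proposed load_catalog_in_background_at_startup flag alone decides what happens during catalogd startup, while load_catalog_in_background governs everything afterwards (for example, tables invalidated manually).

```java
// Hypothetical decision helper for the two flags discussed in the issue.
class BackgroundLoadPolicy {
  // atStartup: true while catalogd is still in its initial startup phase.
  // loadCatalogInBackground: value of the existing load_catalog_in_background flag.
  // loadCatalogInBackgroundAtStartup: value of the proposed
  //     load_catalog_in_background_at_startup flag.
  static boolean shouldLoadInBackground(boolean atStartup,
      boolean loadCatalogInBackground, boolean loadCatalogInBackgroundAtStartup) {
    // Assumption: the startup flag is consulted only during startup.
    return atStartup ? loadCatalogInBackgroundAtStartup : loadCatalogInBackground;
  }
}
```

With the proposed defaults (startup flag false, general flag true), tables would not be queued for loading at startup but would be loaded in the background after a manual invalidate.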
[jira] [Closed] (IMPALA-12402) Make CatalogdMetaProvider's cache concurrency level configurable
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxwell Guo closed IMPALA-12402. Resolution: Fixed > Make CatalogdMetaProvider's cache concurrency level configurable > > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > > When the cluster contains many databases and tables (for example, more than > 10 tables) and we restart the impalad, the local cache (cache_) in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and the > probability of lock contention increases; see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here, the first being the concurrency > level of the cache.
[jira] [Commented] (IMPALA-12402) Make CatalogdMetaProvider's cache concurrency level configurable
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778936#comment-17778936 ] Maxwell Guo commented on IMPALA-12402: -- Thank you so much, [~stigahuang]. > Make CatalogdMetaProvider's cache concurrency level configurable > > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > > When the cluster contains many databases and tables (for example, more than > 10 tables) and we restart the impalad, the local cache (cache_) in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and the > probability of lock contention increases; see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here, the first being the concurrency > level of the cache.
[jira] [Commented] (IMPALA-12402) Make CatalogdMetaProvider's cache concurrency level configurable
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778927#comment-17778927 ] Maxwell Guo commented on IMPALA-12402: -- Hi [~MikaelSmith] [~stigahuang], can you help take a look at this build? It seems some tests failed again. After looking at it, I don’t have any clue how to solve this error. :( > Make CatalogdMetaProvider's cache concurrency level configurable > > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > > When the cluster contains many databases and tables (for example, more than > 10 tables) and we restart the impalad, the local cache (cache_) in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and the > probability of lock contention increases; see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here, the first being the concurrency > level of the cache.
[jira] [Commented] (IMPALA-12402) Make CatalogdMetaProvider's cache concurrency level configurable
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778596#comment-17778596 ] Maxwell Guo commented on IMPALA-12402: -- Updated again. > Make CatalogdMetaProvider's cache concurrency level configurable > > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > > When the cluster contains many databases and tables (for example, more than > 10 tables) and we restart the impalad, the local cache (cache_) in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and the > probability of lock contention increases; see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here, the first being the concurrency > level of the cache.
[jira] [Created] (IMPALA-12506) Add the ability to update EventProcessorStatus through webUi
Maxwell Guo created IMPALA-12506: Summary: Add the ability to update EventProcessorStatus through webUi Key: IMPALA-12506 URL: https://issues.apache.org/jira/browse/IMPALA-12506 Project: IMPALA Issue Type: Improvement Components: Catalog Reporter: Maxwell Guo Assignee: Maxwell Guo See the description in [Ml|https://lists.apache.org/thread/5mw8jd5hgz7yycz9h2pxvqj101k0j47m]
[jira] [Updated] (IMPALA-12468) Add the ability to update EventProcessorStatus
[ https://issues.apache.org/jira/browse/IMPALA-12468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxwell Guo updated IMPALA-12468: - Description: Once Impala's and Hive's states are mismatched and the EventProcessorStatus becomes NEED_INVALIDATE, we usually use invalidate metadata to reset the catalog instance, and Impala then updates the status to ACTIVE. But if Impala contains many tables, the cost of a global invalidate is rather high, so we may instead invalidate metadata table by table for the incrementally changed tables. For example, we have 1000,000,000,000 tables, but event processing hits a CatalogException for only some of them and a MetastoreNotificationNeedsInvalidateException is thrown. I think there is no need to invalidate all table caches in order to reset the catalog instance; see [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2088]. MetastoreEventsProcessor's async update process will not update the currentStatus when the status is not ACTIVE; see [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L876] So what about adding a new SQL grammar: SET EVENT STATUS ${status}? was: Once Impala's and Hive's states are mismatched and the EventProcessorStatus becomes NEED_INVALIDATE, we usually use invalidate metadata to reset the catalog instance, and Impala then updates the status to ACTIVE. But if Impala contains many tables, the cost of a global invalidate is rather high, so we may instead invalidate metadata table by table for the incrementally changed tables. For example, we have 1000,000,000,000 tables, but event processing hits a CatalogException for only some of them and a MetastoreNotificationNeedsInvalidateException is thrown. 
I think there is no need to invalidate all table caches in order to reset the catalog instance; see [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2088]. MetastoreEventsProcessor's async update process will not update the currentStatus when the status is not ACTIVE; see [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L876] So what about adding a new SQL grammar: RESET STATUS status? > Add the ability to update EventProcessorStatus > -- > > Key: IMPALA-12468 > URL: https://issues.apache.org/jira/browse/IMPALA-12468 > Project: IMPALA > Issue Type: Improvement > Components: be, Catalog, fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > > Once Impala's and Hive's states are mismatched and the EventProcessorStatus becomes NEED_INVALIDATE, we usually use invalidate metadata to reset the catalog instance, and Impala then updates the status to ACTIVE. > But if Impala contains many tables, the cost of a global invalidate is rather high, so we may instead invalidate metadata table by table for the incrementally changed tables. For example, we have 1000,000,000,000 tables, but event processing hits a CatalogException for only some of them and a MetastoreNotificationNeedsInvalidateException is thrown. I think there is no need to invalidate all table caches in order to reset the catalog instance; see [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2088]. > > MetastoreEventsProcessor's async update process will not update the currentStatus when the status is not ACTIVE; see > [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L876] > So what about adding a new SQL grammar: SET EVENT STATUS ${status}?
[jira] [Updated] (IMPALA-12468) Add the ability to update EventProcessorStatus
[ https://issues.apache.org/jira/browse/IMPALA-12468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxwell Guo updated IMPALA-12468: - Description: Once Impala's and Hive's states are mismatched and the EventProcessorStatus becomes NEED_INVALIDATE, we usually use invalidate metadata to reset the catalog instance, and Impala then updates the status to ACTIVE. But if Impala contains many tables, the cost of a global invalidate is rather high, so we may instead invalidate metadata table by table for the incrementally changed tables. For example, we have 1000,000,000,000 tables, but event processing hits a CatalogException for only some of them and a MetastoreNotificationNeedsInvalidateException is thrown. I think there is no need to invalidate all table caches in order to reset the catalog instance; see [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2088]. MetastoreEventsProcessor's async update process will not update the currentStatus when the status is not ACTIVE; see [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L876] So what about adding a new SQL grammar: RESET STATUS status? was: Once Impala's and Hive's states are mismatched and the EventProcessorStatus becomes NEED_INVALIDATE, we usually use invalidate metadata to reset the catalog instance, and Impala then updates the status to ACTIVE. But if Impala contains many tables, the cost of a global invalidate is rather high, so we may instead invalidate metadata table by table for the incrementally changed tables. For example, we have 1000,000,000,000 tables, but event processing hits a CatalogException for only some of them and a MetastoreNotificationNeedsInvalidateException is thrown. I think there is no need to invalidate all table caches. 
> Add the ability to update EventProcessorStatus > -- > > Key: IMPALA-12468 > URL: https://issues.apache.org/jira/browse/IMPALA-12468 > Project: IMPALA > Issue Type: Improvement > Components: be, Catalog, fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > > Once Impala's and Hive's states are mismatched and the EventProcessorStatus becomes NEED_INVALIDATE, we usually use invalidate metadata to reset the catalog instance, and Impala then updates the status to ACTIVE. > But if Impala contains many tables, the cost of a global invalidate is rather high, so we may instead invalidate metadata table by table for the incrementally changed tables. For example, we have 1000,000,000,000 tables, but event processing hits a CatalogException for only some of them and a MetastoreNotificationNeedsInvalidateException is thrown. I think there is no need to invalidate all table caches in order to reset the catalog instance; see [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2088]. > > MetastoreEventsProcessor's async update process will not update the currentStatus when the status is not ACTIVE; see > [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L876] > So what about adding a new SQL grammar: RESET STATUS status?
[jira] [Created] (IMPALA-12468) Add the ability to update EventProcessorStatus
Maxwell Guo created IMPALA-12468: Summary: Add the ability to update EventProcessorStatus Key: IMPALA-12468 URL: https://issues.apache.org/jira/browse/IMPALA-12468 Project: IMPALA Issue Type: Improvement Components: be, Catalog, fe Reporter: Maxwell Guo Assignee: Maxwell Guo Once Impala's and Hive's states are mismatched and the EventProcessorStatus becomes NEED_INVALIDATE, we usually use invalidate metadata to reset the catalog instance, and Impala then updates the status to ACTIVE. But if Impala contains many tables, the cost of a global invalidate is rather high, so we may instead invalidate metadata table by table for the incrementally changed tables. For example, we have 1000,000,000,000 tables, but event processing hits a CatalogException for only some of them and a MetastoreNotificationNeedsInvalidateException is thrown. I think there is no need to invalidate all table caches.
[jira] [Commented] (IMPALA-12402) Make CatalogdMetaProvider's cache concurrency level configurable
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17766811#comment-17766811 ] Maxwell Guo commented on IMPALA-12402: -- It seems the final test failed with py test. > Make CatalogdMetaProvider's cache concurrency level configurable > > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > > When the cluster contains many databases and tables (for example, more than > 10 tables) and we restart the impalad, the local cache (cache_) in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and the > probability of lock contention increases; see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here, the first being the concurrency > level of the cache.
[jira] [Commented] (IMPALA-12402) Make CatalogdMetaProvider's cache concurrency level configurable
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17766615#comment-17766615 ] Maxwell Guo commented on IMPALA-12402: -- So now the status of this Jira should be "needs committer". > Make CatalogdMetaProvider's cache concurrency level configurable > > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > > When the cluster contains many databases and tables (for example, more than > 10 tables) and we restart the impalad, the local cache (cache_) in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and the > probability of lock contention increases; see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here, the first being the concurrency > level of the cache.
[jira] [Commented] (IMPALA-12402) Make CatalogdMetaProvider's cache concurrency level configurable
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17766175#comment-17766175 ] Maxwell Guo commented on IMPALA-12402: -- Hello, I saw that [~MikaelSmith] has given +1 on this patch, so do we need another committer to +1 it before the patch can be checked in? [~stigahuang] > Make CatalogdMetaProvider's cache concurrency level configurable > > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > > When the cluster contains many databases and tables (for example, more than > 10 tables) and we restart the impalad, the local cache (cache_) in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and the > probability of lock contention increases; see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here, the first being the concurrency > level of the cache.
[jira] [Commented] (IMPALA-12402) Make CatalogdMetaProvider's cache concurrency level configurable
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765415#comment-17765415 ] Maxwell Guo commented on IMPALA-12402: -- I have already updated the commit message and resolved the merge conflict. [~MikaelSmith] > Make CatalogdMetaProvider's cache concurrency level configurable > > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > > When the cluster contains many databases and tables, for example more than > 10 tables, and we restart the impalad, the local cache_ in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and > the probability of lock contention increases, see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here; the first is the concurrency > level of the cache.
[jira] [Updated] (IMPALA-12402) Make CatalogdMetaProvider's cache concurrency level configurable
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxwell Guo updated IMPALA-12402: - Summary: Make CatalogdMetaProvider's cache concurrency level configurable (was: Add some configurations for CatalogdMetaProvider's cache_) > Make CatalogdMetaProvider's cache concurrency level configurable > > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > > When the cluster contains many databases and tables, for example more than > 10 tables, and we restart the impalad, the local cache_ in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and > the probability of lock contention increases, see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here; the first is the concurrency > level of the cache.
[jira] [Commented] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763986#comment-17763986 ] Maxwell Guo commented on IMPALA-12402: -- Sorry, this is my first time using Gerrit to push code. I have used the same Change-Id again. [~MikaelSmith] Thanks for the reminder. > Add some configurations for CatalogdMetaProvider's cache_ > - > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > > When the cluster contains many databases and tables, for example more than > 10 tables, and we restart the impalad, the local cache_ in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and > the probability of lock contention increases, see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here; the first is the concurrency > level of the cache.
[jira] [Commented] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763589#comment-17763589 ] Maxwell Guo commented on IMPALA-12402: -- [~MikaelSmith] Thank you for your review, I have updated the code again. > Add some configurations for CatalogdMetaProvider's cache_ > - > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > > When the cluster contains many databases and tables, for example more than > 10 tables, and we restart the impalad, the local cache_ in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and > the probability of lock contention increases, see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here; the first is the concurrency > level of the cache.
[jira] [Commented] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762954#comment-17762954 ] Maxwell Guo commented on IMPALA-12402: -- [~MikaelSmith] thanks for your reply. I think it is better to make the concurrencyLevel parameter of the Guava cache configurable instead of using the default value of 4 (I may also want to make more than this one parameter configurable). With many tables, I think the value should be larger than 4, for example 128 or 256. When we looked at the jstack output for Impala during startup, we found that the threads were all waiting for the lock. A lower value leads to thread contention, see https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L432 for details. In this cache, the concurrency level acts as the number of buckets, so I think more buckets should mean less thread contention (assuming the hash values are random enough). > Add some configurations for CatalogdMetaProvider's cache_ > - > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > > When the cluster contains many databases and tables, for example more than > 10 tables, and we restart the impalad, the local cache_ in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and > the probability of lock contention increases, see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here; the first is the concurrency > level of the cache.
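The bucket argument in the comment above can be illustrated with a small, standard-library-only sketch. This is not Guava's or Impala's code: the class `StripedCacheSketch`, its methods, and the property name `sketch.cache.concurrency.level` are all hypothetical and exist only for this illustration. It assumes a simplified model in which the concurrency level is rounded up to a power of two, each resulting segment has its own lock, and a key's hash selects the segment, so threads only contend when their keys land in the same segment. Real caches are more refined; in particular, this sketch deliberately holds the segment lock for the whole load.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

/** Simplified model of a segmented (lock-striped) cache. Hypothetical; not Guava or Impala code. */
final class StripedCacheSketch<K, V> {

  /** One bucket: a plain map guarded by its own lock. */
  private static final class Segment<K, V> {
    final ReentrantLock lock = new ReentrantLock();
    final Map<K, V> map = new HashMap<>();
  }

  private final List<Segment<K, V>> segments = new ArrayList<>();

  StripedCacheSketch(int concurrencyLevel) {
    if (concurrencyLevel < 1) {
      throw new IllegalArgumentException("concurrencyLevel must be positive");
    }
    // Arbitrary upper bound chosen for this sketch to keep the loop below safe.
    int wanted = Math.min(concurrencyLevel, 1 << 16);
    // Round up to a power of two so a bit mask can select the segment.
    int count = 1;
    while (count < wanted) {
      count <<= 1;
    }
    for (int i = 0; i < count; i++) {
      segments.add(new Segment<>());
    }
  }

  int segmentCount() {
    return segments.size();
  }

  /** Maps a key to a segment index in [0, segmentCount). */
  int segmentIndexFor(Object key) {
    int h = key.hashCode();
    h ^= (h >>> 16); // mix high bits into the low bits used by the mask
    return h & (segments.size() - 1);
  }

  /** Returns the cached value, loading it under the segment's lock if absent. */
  V get(K key, Function<? super K, ? extends V> loader) {
    Segment<K, V> segment = segments.get(segmentIndexFor(key));
    segment.lock.lock(); // only threads hitting this same segment wait here
    try {
      V value = segment.map.get(key);
      if (value == null) {
        value = loader.apply(key);
        segment.map.put(key, value);
      }
      return value;
    } finally {
      segment.lock.unlock();
    }
  }

  public static void main(String[] args) {
    // Hypothetical property name, used only for this illustration; defaults to 4.
    int level = Integer.getInteger("sketch.cache.concurrency.level", 4);
    StripedCacheSketch<String, String> cache = new StripedCacheSketch<>(level);
    System.out.println("segments = " + cache.segmentCount());
    System.out.println(cache.get("db1.tbl1", k -> "metadata for " + k));
  }
}
```

With a level of 4, all concurrent loads are spread over only four locks; with 128 or 256, the same loads are spread over many more, which is the effect the comment expects. In Guava itself the corresponding knob is `CacheBuilder.newBuilder().concurrencyLevel(n)`.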
[jira] [Commented] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762550#comment-17762550 ] Maxwell Guo commented on IMPALA-12402: -- [~stigahuang] [~tmate] can you help take a look at this small patch? > Add some configurations for CatalogdMetaProvider's cache_ > - > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > > When the cluster contains many databases and tables, for example more than > 10 tables, and we restart the impalad, the local cache_ in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and > the probability of lock contention increases, see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here; the first is the concurrency > level of the cache.
[jira] [Updated] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxwell Guo updated IMPALA-12402: - Attachment: (was: 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch) > Add some configurations for CatalogdMetaProvider's cache_ > - > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > > When the cluster contains many databases and tables, for example more than > 10 tables, and we restart the impalad, the local cache_ in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and > the probability of lock contention increases, see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here; the first is the concurrency > level of the cache.
[jira] [Comment Edited] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761917#comment-17761917 ] Maxwell Guo edited comment on IMPALA-12402 at 9/5/23 1:43 AM: -- Hello, can anyone help review this small patch? Another question: how can I run the pre-commit tests? was (Author: maxwellguo): Hello, can anyone help review this small patch? > Add some configurations for CatalogdMetaProvider's cache_ > - > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > Attachments: > 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch > > > When the cluster contains many databases and tables, for example more than > 10 tables, and we restart the impalad, the local cache_ in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and > the probability of lock contention increases, see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here; the first is the concurrency > level of the cache.
[jira] [Commented] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761917#comment-17761917 ] Maxwell Guo commented on IMPALA-12402: -- Hello, can anyone help review this small patch? > Add some configurations for CatalogdMetaProvider's cache_ > - > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > Attachments: > 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch > > > When the cluster contains many databases and tables, for example more than > 10 tables, and we restart the impalad, the local cache_ in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and > the probability of lock contention increases, see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here; the first is the concurrency > level of the cache.
[jira] [Comment Edited] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760202#comment-17760202 ] Maxwell Guo edited comment on IMPALA-12402 at 8/31/23 7:23 AM: --- [gerrit link|http://gerrit.cloudera.org:8080/20443] and it is ready for review again now. [Build passed |https://jenkins.impala.io/job/gerrit-code-review-checks/13892/] was (Author: maxwellguo): [gerrit link|http://gerrit.cloudera.org:8080/20443] and it is ready for review again now. [Build passed https://jenkins.impala.io/job/gerrit-code-review-checks/13892/] > Add some configurations for CatalogdMetaProvider's cache_ > - > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > Attachments: > 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch > > > When the cluster contains many databases and tables, for example more than > 10 tables, and we restart the impalad, the local cache_ in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and > the probability of lock contention increases, see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here; the first is the concurrency > level of the cache.
[jira] [Comment Edited] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760202#comment-17760202 ] Maxwell Guo edited comment on IMPALA-12402 at 8/31/23 7:23 AM: --- [gerrit link|http://gerrit.cloudera.org:8080/20443] and it is ready for review again now. [Build passed https://jenkins.impala.io/job/gerrit-code-review-checks/13892/] was (Author: maxwellguo): [gerrit link|http://gerrit.cloudera.org:8080/20443] and it is ready for review again now. > Add some configurations for CatalogdMetaProvider's cache_ > - > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > Attachments: > 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch > > > When the cluster contains many databases and tables, for example more than > 10 tables, and we restart the impalad, the local cache_ in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and > the probability of lock contention increases, see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here; the first is the concurrency > level of the cache.
[jira] [Comment Edited] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760202#comment-17760202 ] Maxwell Guo edited comment on IMPALA-12402 at 8/31/23 7:03 AM: --- [gerrit link|http://gerrit.cloudera.org:8080/20443] and it is ready for review again now. was (Author: maxwellguo): [gerrit link|https://gerrit.cloudera.org/#/c/20435/] and it is ready for review again now. > Add some configurations for CatalogdMetaProvider's cache_ > - > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > Attachments: > 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch > > > When the cluster contains many databases and tables, for example more than > 10 tables, and we restart the impalad, the local cache_ in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and > the probability of lock contention increases, see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here; the first is the concurrency > level of the cache.
[jira] [Comment Edited] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760202#comment-17760202 ] Maxwell Guo edited comment on IMPALA-12402 at 8/31/23 6:46 AM: --- [gerrit link|https://gerrit.cloudera.org/#/c/20435/] and it is ready for review again now. was (Author: maxwellguo): [gerrit link|https://gerrit.cloudera.org/#/c/20435/] and it is ready for review again now. > Add some configurations for CatalogdMetaProvider's cache_ > - > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > Attachments: > 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch > > > When the cluster contains many databases and tables, for example more than > 10 tables, and we restart the impalad, the local cache_ in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and > the probability of lock contention increases, see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here; the first is the concurrency > level of the cache.
[jira] [Comment Edited] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760202#comment-17760202 ] Maxwell Guo edited comment on IMPALA-12402 at 8/30/23 2:30 AM: --- [gerrit link|https://gerrit.cloudera.org/#/c/20435/] and it is ready for review again now. was (Author: maxwellguo): [gerrit link|https://gerrit.cloudera.org/#/c/20435/] > Add some configurations for CatalogdMetaProvider's cache_ > - > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > Attachments: > 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch > > > When the cluster contains many databases and tables, for example more than > 10 tables, and we restart the impalad, the local cache_ in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and > the probability of lock contention increases, see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here; the first is the concurrency > level of the cache.
[jira] [Updated] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxwell Guo updated IMPALA-12402: - Language: java C++ (was: java) > Add some configurations for CatalogdMetaProvider's cache_ > - > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > Attachments: > 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch > > > When the cluster contains many databases and tables, for example more than > 10 tables, and we restart the impalad, the local cache_ in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and > the probability of lock contention increases, see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here; the first is the concurrency > level of the cache.
[jira] [Updated] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxwell Guo updated IMPALA-12402: - [gerrit link|https://gerrit.cloudera.org/#/c/20435/] > Add some configurations for CatalogdMetaProvider's cache_ > - > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > Attachments: > 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch > > > When the cluster contains many databases and tables, for example more than > 10 tables, and we restart the impalad, the local cache_ in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and > the probability of lock contention increases, see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here; the first is the concurrency > level of the cache.
[jira] [Commented] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760189#comment-17760189 ] Maxwell Guo commented on IMPALA-12402: -- Thanks [~stigahuang] > Add some configurations for CatalogdMetaProvider's cache_ > - > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > Attachments: > 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch > > > When the cluster contains many databases and tables, for example more than > 10 tables, and we restart the impalad, the local cache_ in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and > the probability of lock contention increases, see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here; the first is the concurrency > level of the cache.
[jira] [Work started] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-12402 started by Maxwell Guo. > Add some configurations for CatalogdMetaProvider's cache_ > - > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > Labels: pull-request-available > Attachments: > 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch > > > When the cluster contains many databases and tables, for example more than > 10 tables, and we restart the impalad, the local cache_ in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and > the probability of lock contention increases, see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here; the first is the concurrency > level of the cache.
[jira] [Updated] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxwell Guo updated IMPALA-12402: - Attachment: (was: 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch) > Add some configurations for CatalogdMetaProvider's cache_ > - > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Priority: Minor > Labels: pull-request-available > Attachments: > 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch > > > When the cluster contains many databases and tables, for example more than > 10 tables, and we restart the impalad, the local cache_ in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and > the probability of lock contention increases, see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here; the first is the concurrency > level of the cache.
[jira] [Updated] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxwell Guo updated IMPALA-12402: - Attachment: 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch > Add some configurations for CatalogdMetaProvider's cache_ > - > > Key: IMPALA-12402 > URL: https://issues.apache.org/jira/browse/IMPALA-12402 > Project: IMPALA > Issue Type: Improvement > Components: fe >Reporter: Maxwell Guo >Priority: Minor > Labels: pull-request-available > Attachments: > 0001-IMPALA-12402-Add-some-configurations-for-CatalogdMet.patch > > > When the cluster contains many databases and tables, for example more than > 10 tables, and we restart the impalad, the local cache_ in > CatalogdMetaProvider needs to go through a loading process. > As we know, the concurrencyLevel of Google's Guava cache is set to 4 by > default. > But if there are many tables, the loading process needs more time and > the probability of lock contention increases, see > [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437]. > > So we propose to add some configurations here; the first is the concurrency > level of the cache.
[jira] [Commented] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758821#comment-17758821 ]
Maxwell Guo commented on IMPALA-12402:
How can I assign this Jira to myself? I can't find the button.
> Add some configurations for CatalogdMetaProvider's cache_
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
> Issue Type: Improvement
> Components: fe
> Reporter: Maxwell Guo
> Priority: Minor
> Labels: pull-request-available
>
> When a cluster contains many databases and tables (for example, more than 10 tables) and impalad is restarted, the local cache_ of CatalogMetaProvider has to go through a loading process.
> Google's Guava cache sets concurrencyLevel to 4 by default. With many tables, loading takes longer and the probability of lock contention increases; see [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>
> So we propose adding some configuration options here, the first being the concurrency level of the cache.
[jira] [Updated] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maxwell Guo updated IMPALA-12402:
    Flags: Patch
    Labels: pull-request-available (was: )
> Add some configurations for CatalogdMetaProvider's cache_
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
> Issue Type: Improvement
> Components: fe
> Reporter: Maxwell Guo
> Priority: Minor
> Labels: pull-request-available
>
> When a cluster contains many databases and tables (for example, more than 10 tables) and impalad is restarted, the local cache_ of CatalogMetaProvider has to go through a loading process.
> Google's Guava cache sets concurrencyLevel to 4 by default. With many tables, loading takes longer and the probability of lock contention increases; see [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>
> So we propose adding some configuration options here, the first being the concurrency level of the cache.
[jira] [Updated] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maxwell Guo updated IMPALA-12402:
    Summary: Add some configurations for CatalogdMetaProvider's cache_ (was: Add some configurations for CatalogMetaProvider's cache_)
> Add some configurations for CatalogdMetaProvider's cache_
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
> Issue Type: Improvement
> Components: fe
> Reporter: Maxwell Guo
> Priority: Minor
>
> When a cluster contains many databases and tables (for example, more than 10 tables) and impalad is restarted, the local cache_ of CatalogMetaProvider has to go through a loading process.
> Google's Guava cache sets concurrencyLevel to 4 by default. With many tables, loading takes longer and the probability of lock contention increases; see [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>
> So we propose adding some configuration options here, the first being the concurrency level of the cache.
[jira] [Updated] (IMPALA-12402) Add some configurations for CatalogMetaProvider's cache_
[ https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maxwell Guo updated IMPALA-12402:
    Language: java
    Target Version: Impala 4.2.0
> Add some configurations for CatalogMetaProvider's cache_
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
> Issue Type: Improvement
> Components: fe
> Reporter: Maxwell Guo
> Priority: Minor
>
> When a cluster contains many databases and tables (for example, more than 10 tables) and impalad is restarted, the local cache_ of CatalogMetaProvider has to go through a loading process.
> Google's Guava cache sets concurrencyLevel to 4 by default. With many tables, loading takes longer and the probability of lock contention increases; see [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>
> So we propose adding some configuration options here, the first being the concurrency level of the cache.
[jira] [Created] (IMPALA-12402) Add some configurations for CatalogMetaProvider's cache_
Maxwell Guo created IMPALA-12402:
    Summary: Add some configurations for CatalogMetaProvider's cache_
    Key: IMPALA-12402
    URL: https://issues.apache.org/jira/browse/IMPALA-12402
    Project: IMPALA
    Issue Type: Improvement
    Components: fe
    Reporter: Maxwell Guo
When a cluster contains many databases and tables (for example, more than 10 tables) and impalad is restarted, the local cache_ of CatalogMetaProvider has to go through a loading process.
Google's Guava cache sets concurrencyLevel to 4 by default. With many tables, loading takes longer and the probability of lock contention increases; see [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
So we propose adding some configuration options here, the first being the concurrency level of the cache.
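For illustration only, the proposal in IMPALA-12402 could look roughly like the sketch below. It is not taken from the attached patch: the class name and the property name are hypothetical, and the sketch uses only the JDK so that it runs without Guava. It resolves a configured concurrency level and falls back to Guava's default of 4 when the setting is missing or invalid (Guava requires the value to be positive).

```java
import java.util.Properties;

// Hypothetical helper, not part of Impala: resolves a cache concurrency level
// from configuration, falling back to Guava's default when unset or invalid.
final class CacheConcurrencyConfig {
  // Guava's CacheBuilder uses 4 when concurrencyLevel is not set explicitly.
  static final int GUAVA_DEFAULT_CONCURRENCY_LEVEL = 4;

  // Assumed property name for this sketch; not an existing Impala flag.
  static final String PROP = "catalogd_meta_provider.cache_concurrency_level";

  static int resolve(Properties props) {
    String raw = props.getProperty(PROP);
    if (raw == null || raw.trim().isEmpty()) {
      return GUAVA_DEFAULT_CONCURRENCY_LEVEL;
    }
    try {
      int level = Integer.parseInt(raw.trim());
      // CacheBuilder.concurrencyLevel rejects non-positive values.
      return level > 0 ? level : GUAVA_DEFAULT_CONCURRENCY_LEVEL;
    } catch (NumberFormatException e) {
      return GUAVA_DEFAULT_CONCURRENCY_LEVEL;
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(PROP, "16");
    int level = resolve(props);
    // With Guava on the classpath, the value would be applied as:
    //   CacheBuilder.newBuilder().concurrencyLevel(level) ... .build();
    System.out.println("cache concurrency level: " + level);
  }
}
```

A higher level lets more threads update the cache concurrently during loading, at the cost of some extra memory for the additional internal segments, so the right value depends on the deployment.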
[jira] [Commented] (IMPALA-2761) Build and Run Impala on OS X
[ https://issues.apache.org/jira/browse/IMPALA-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757304#comment-17757304 ]
Maxwell Guo commented on IMPALA-2761:
Any update here? Besides macOS on Intel, I think Apple M1/M2 support is needed too.
> Build and Run Impala on OS X
>
> Key: IMPALA-2761
> URL: https://issues.apache.org/jira/browse/IMPALA-2761
> Project: IMPALA
> Issue Type: New Feature
> Components: Infrastructure
> Affects Versions: Impala 2.3.0
> Reporter: Martin Grund
> Priority: Minor
> Labels: osx
>
> This is an umbrella ticket to support building and running Impala on Mac OS X.
> Comments will be used to keep track of the status.