[jira] [Commented] (FLINK-33804) Add Option to disable showing metrics in JobMananger UI
[ https://issues.apache.org/jira/browse/FLINK-33804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798136#comment-17798136 ]

Martijn Visser commented on FLINK-33804:
[~qqibrow] Please open a discussion with a proposal on the Dev ML for this.

> Add Option to disable showing metrics in JobMananger UI
> ---
>
> Key: FLINK-33804
> URL: https://issues.apache.org/jira/browse/FLINK-33804
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Metrics
> Reporter: Lu Niu
> Priority: Major
>
> Flink allows users to view metrics in the JobManager UI. However, we found
> two problems:
> # The JobManager is required to aggregate metrics from all TaskManagers.
> When the metric cardinality is high, this process can trigger a
> JobManager full GC and slow response times.
> # Flink use cases in production usually have their own dashboards for viewing
> metrics, so this feature is sometimes not useful.
>
> In light of this, we propose adding an option to disable this feature.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (FLINK-33804) Add Option to disable showing metrics in JobMananger UI
[ https://issues.apache.org/jira/browse/FLINK-33804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796474#comment-17796474 ]

Lu Niu commented on FLINK-33804:
the whitelist will be a subset of metrics listed here [https://github.com/apache/flink/blob/c3e2d163a637dca5f49522721109161bd7ebb723/flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricNames.java#L4]
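The whitelist idea in this comment can be sketched as a filter over metric names, applied before metrics are sent to the JobManager. This is a minimal, hypothetical sketch: the class `MetricWhitelistFilter`, the map-of-name-to-value representation, and the names in the example whitelist are assumptions for illustration. The names are strings of the kind defined in `MetricNames` (for example `numBytesIn`), not the subset actually proposed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: keep only whitelisted metrics before they are shipped
// to the JobManager. Not an existing Flink class.
public class MetricWhitelistFilter {

    // Illustrative whitelist; a real subset would be chosen from MetricNames.
    private final Set<String> whitelist;

    public MetricWhitelistFilter(Set<String> whitelist) {
        this.whitelist = Set.copyOf(whitelist);
    }

    // Returns a new map with only the whitelisted entries,
    // preserving the iteration order of the input map.
    public Map<String, Object> filter(Map<String, Object> metrics) {
        Map<String, Object> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : metrics.entrySet()) {
            if (whitelist.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        MetricWhitelistFilter f = new MetricWhitelistFilter(
                Set.of("numBytesIn", "numBytesOut", "numRecordsIn", "numRecordsOut"));
        Map<String, Object> metrics = new LinkedHashMap<>();
        metrics.put("numBytesIn", 1024L);
        metrics.put("numRecordsIn", 10L);
        metrics.put("myOperator.customCounter", 7L); // user metric, not whitelisted
        System.out.println(f.filter(metrics).keySet()); // prints [numBytesIn, numRecordsIn]
    }
}
```

Filtering on the sending side means high-cardinality user metrics never reach the JobManager, which is the part of the cost the issue attributes to full GCs.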
[jira] [Commented] (FLINK-33804) Add Option to disable showing metrics in JobMananger UI
[ https://issues.apache.org/jira/browse/FLINK-33804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796455#comment-17796455 ]

Lu Niu commented on FLINK-33804:
The problem we want to address: in the default setting, the JobManager is required to aggregate metrics from all TaskManagers to power the metrics in the UI. When the metric cardinality is high, this process can trigger a JobManager full GC and slow response times. There are several options:

Option 1: The issue can be mitigated by setting metrics.fetcher.update-interval=0. However, metrics in the JobManager UI such as "Byte Received" then keep loading indefinitely, which can confuse users.

Option 2: Introduce a whitelist of metrics, plus an option that, when enabled, allows only the selected metrics to be reported to the JobManager. This ensures that the UI, including the overview page and the subtask page, continues to function properly.

Option 3: Follow the same path as Option 2, but instead of introducing a new feature flag, repurpose the existing metrics.fetcher.update-interval flag: when metrics.fetcher.update-interval is set to 0, the whitelist feature is automatically activated.
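Option 3 can be sketched as a small decision function that derives the reporting mode from `metrics.fetcher.update-interval` (in milliseconds). Everything in this sketch is hypothetical and only restates the proposal: the class `FetcherModeSketch`, the enum `FetchMode`, the methods `modeFor` and `shouldReport`, and the example whitelist are illustrative names, not existing Flink code or behavior.

```java
import java.util.Set;

// Hypothetical sketch of Option 3: reuse metrics.fetcher.update-interval
// instead of adding a new feature flag. Not existing Flink behavior.
public class FetcherModeSketch {

    public enum FetchMode { ALL_METRICS, WHITELIST_ONLY }

    // Illustrative subset; the proposal draws the real one from MetricNames.
    static final Set<String> WHITELIST =
            Set.of("numBytesIn", "numBytesOut", "numRecordsIn", "numRecordsOut");

    // Proposed rule: an interval of 0 activates the whitelist
    // rather than leaving UI metrics permanently unloaded.
    public static FetchMode modeFor(long updateIntervalMillis) {
        if (updateIntervalMillis < 0) {
            throw new IllegalArgumentException("update interval must be >= 0");
        }
        return updateIntervalMillis == 0 ? FetchMode.WHITELIST_ONLY : FetchMode.ALL_METRICS;
    }

    // Whether a metric would be reported to the JobManager under the given mode.
    public static boolean shouldReport(String metricName, FetchMode mode) {
        return mode == FetchMode.ALL_METRICS || WHITELIST.contains(metricName);
    }

    public static void main(String[] args) {
        FetchMode mode = modeFor(0);
        // prints WHITELIST_ONLY true false
        System.out.println(mode + " " + shouldReport("numBytesIn", mode)
                + " " + shouldReport("customCounter", mode));
    }
}
```

The trade-off relative to Option 2 is that no new configuration key is needed, but the meaning of an existing value changes. The sketch also leaves open how often whitelisted metrics would be fetched when the interval is 0, which the proposal does not specify.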
[jira] [Commented] (FLINK-33804) Add Option to disable showing metrics in JobMananger UI
[ https://issues.apache.org/jira/browse/FLINK-33804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796131#comment-17796131 ]

Martijn Visser commented on FLINK-33804:
[~qqibrow] What is your proposal for the long-term solution? Looking at https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/, we don't offer a lot of granularity to disable/enable specific features, and I'm not sure that we should.
[jira] [Commented] (FLINK-33804) Add Option to disable showing metrics in JobMananger UI
[ https://issues.apache.org/jira/browse/FLINK-33804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795945#comment-17795945 ]

Lu Niu commented on FLINK-33804:
[~martijnvisser] Disabling the metric aggregation to the JobManager is the key. We have jobs hitting this JobManager GC issue in production, and we have an internal fix. We want to contribute the fix back and align with the community on a long-term solution here. I can change the title accordingly if it's misleading.
[jira] [Commented] (FLINK-33804) Add Option to disable showing metrics in JobMananger UI
[ https://issues.apache.org/jira/browse/FLINK-33804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795944#comment-17795944 ]

Martijn Visser commented on FLINK-33804:
I'm not too sure, because it reads like having finer-grained possibilities to enable/disable certain options in the UI. We don't have that at the moment, and starting to add those means that at some point we will have a lot of options, which could conflict with each other. If Flink users have their own dashboard, why not disable the UI completely?