[jira] [Commented] (FLINK-33804) Add Option to disable showing metrics in JobMananger UI

2023-12-18 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798136#comment-17798136
 ] 

Martijn Visser commented on FLINK-33804:


[~qqibrow] Please open a discussion with a proposal on the Dev mailing list for this.

> Add Option to disable showing metrics in JobMananger UI
> ---
>
> Key: FLINK-33804
> URL: https://issues.apache.org/jira/browse/FLINK-33804
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Metrics
> Reporter: Lu Niu
> Priority: Major
>
> Flink allows users to view metrics in the JobManager UI. However, we found 
> two problems:
>  # The JobManager is required to aggregate metrics from all TaskManagers. 
> When the metric cardinality is high, this process can trigger a 
> JobManager full GC and slow response times.
>  # Flink use cases in production usually have their own dashboards to view 
> metrics, so this feature is sometimes not useful.
> In light of this, we propose adding an option to disable this feature.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33804) Add Option to disable showing metrics in JobMananger UI

2023-12-13 Thread Lu Niu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796474#comment-17796474
 ] 

Lu Niu commented on FLINK-33804:


The whitelist will be a subset of the metrics listed here: 
[https://github.com/apache/flink/blob/c3e2d163a637dca5f49522721109161bd7ebb723/flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricNames.java#L4]
 



[jira] [Commented] (FLINK-33804) Add Option to disable showing metrics in JobMananger UI

2023-12-13 Thread Lu Niu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796455#comment-17796455
 ] 

Lu Niu commented on FLINK-33804:


The problem we want to address:

In the default setting, the JobManager is required to aggregate metrics from 
all TaskManagers to power the metrics in the UI. When the metric cardinality is 
high, this process can trigger a JobManager full GC and slow response times. 

There are several options:
Option 1: The issue can be mitigated by setting 
metrics.fetcher.update-interval=0. However, metrics such as "Bytes Received" 
then keep loading indefinitely in the JobManager UI, which can confuse users.

Option 2: Introduce a whitelist of metrics, plus an option that, when enabled, 
allows only the whitelisted metrics to be reported to the JobManager. This 
ensures that the UI, including the overview page and subtask page, continues to 
function properly.

Option 3: Follow the same approach as Option 2, but instead of introducing a 
new feature flag, repurpose the existing metrics.fetcher.update-interval flag: 
when it is set to 0, the whitelist feature is automatically activated.
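
To make Options 2 and 3 more concrete, here is a minimal sketch of the whitelist idea. It is not the internal fix mentioned in this thread and not Flink's actual metric fetcher code. The class name, the method names, and the contents of the whitelist are all hypothetical, and real metric keys in Flink carry scope information that is left out here for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class MetricWhitelistSketch {

    // Hypothetical whitelist: metric names the UI pages are assumed to need.
    // The proposal says the real list would be a subset of MetricNames.
    static final Set<String> UI_WHITELIST =
            Set.of("numBytesIn", "numBytesOut", "numRecordsIn", "numRecordsOut");

    // Option 2: keep only whitelisted metrics and drop everything else
    // before it would be aggregated on the JobManager.
    static Map<String, String> filter(Map<String, String> metrics, Set<String> whitelist) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : metrics.entrySet()) {
            if (whitelist.contains(entry.getKey())) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    // Option 3 (assumed semantics): an update interval of 0 activates the
    // whitelist; any other value leaves the reported metrics untouched.
    static Map<String, String> effectiveMetrics(Map<String, String> metrics, long updateIntervalMs) {
        return updateIntervalMs == 0 ? filter(metrics, UI_WHITELIST) : metrics;
    }

    public static void main(String[] args) {
        Map<String, String> reported = new LinkedHashMap<>();
        reported.put("numBytesIn", "1024");
        reported.put("customHighCardinalityMetric.user-42", "7");
        reported.put("numRecordsOut", "12");

        // With the interval set to 0, only the whitelisted entries remain.
        System.out.println(effectiveMetrics(reported, 0));
    }
}
```

The point of the sketch is that the cost on the JobManager scales with the size of the whitelist rather than with the cardinality of user-defined metrics, while the UI pages that depend on the whitelisted values keep working.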
 
 
 



[jira] [Commented] (FLINK-33804) Add Option to disable showing metrics in JobMananger UI

2023-12-13 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796131#comment-17796131
 ] 

Martijn Visser commented on FLINK-33804:


[~qqibrow] What is your proposal for the long-term solution? Looking at 
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/ we 
don't offer much granularity to disable/enable specific features, and I'm 
not sure that we should.



[jira] [Commented] (FLINK-33804) Add Option to disable showing metrics in JobMananger UI

2023-12-12 Thread Lu Niu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795945#comment-17795945
 ] 

Lu Niu commented on FLINK-33804:


[~martijnvisser] Disabling the metric aggregation in the JobManager is the key. 
We have jobs hitting the JobManager GC issue in production, and we have an 
internal fix. We want to contribute the fix back and align with the community's 
long-term solution here. I can change the title accordingly if it is misleading. 



[jira] [Commented] (FLINK-33804) Add Option to disable showing metrics in JobMananger UI

2023-12-12 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795944#comment-17795944
 ] 

Martijn Visser commented on FLINK-33804:


I'm not too sure, because it reads like having finer-grained possibilities to 
enable/disable certain options in the UI. We don't have that at this moment, and 
starting to add those will mean that at some point we will have a lot of 
options that possibly conflict with each other. If Flink users have their own 
dashboard, why not disable the UI completely?
