[ 
https://issues.apache.org/jira/browse/FLINK-33804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796455#comment-17796455
 ] 

Lu Niu edited comment on FLINK-33804 at 12/13/23 8:36 PM:
----------------------------------------------------------

[~martijnvisser] 
The problem we want to address:

In the default setting, the JobManager must aggregate metrics from all 
TaskManagers to power the metrics shown in the UI. When metric cardinality is 
high, this process can trigger a full GC on the JobManager and slow its 
response time. 

There are several options:
Option 1: The issue can be mitigated by setting 
metrics.fetcher.update-interval=0. However, the JobManager UI then shows 
metrics such as "Bytes Received" as loading indefinitely, which can confuse 
users.

Option 2: Introduce a whitelist of metrics, together with an option that, 
when enabled, allows only the whitelisted metrics to be reported to the 
JobManager. Provided the whitelist covers the metrics those pages display, the 
UI, including the overview page and the subtask page, keeps working properly.
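A minimal sketch of option 2 in Java, assuming the filter is applied to a TaskManager's metrics before they are served to the JobManager. The class name MetricWhitelistFilter, the enabled switch, and the whitelisted metric names are hypothetical illustrations, not existing Flink APIs; metric values are simplified to a name-to-Number map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of option 2: MetricWhitelistFilter is NOT a Flink class.
// It models "only report whitelisted metrics to the JobManager" over a
// simplified name -> value map.
public class MetricWhitelistFilter {

    private final Set<String> whitelist;
    private final boolean enabled; // the proposed opt-in switch (assumed)

    public MetricWhitelistFilter(Set<String> whitelist, boolean enabled) {
        this.whitelist = Set.copyOf(whitelist);
        this.enabled = enabled;
    }

    /** Returns only the metrics that may be reported to the JobManager. */
    public Map<String, Number> filter(Map<String, Number> metrics) {
        if (!enabled) {
            return metrics; // switch off: report everything, as today
        }
        Map<String, Number> kept = new HashMap<>();
        for (Map.Entry<String, Number> e : metrics.entrySet()) {
            if (whitelist.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Illustrative whitelist; the real list would be whatever the UI pages need.
        MetricWhitelistFilter filter =
                new MetricWhitelistFilter(Set.of("numBytesIn", "numRecordsIn"), true);
        Map<String, Number> reported = filter.filter(Map.of(
                "numBytesIn", 1024L,
                "numRecordsIn", 10L,
                "someHighCardinalityUserMetric", 3.5));
        System.out.println(reported.keySet());
    }
}
```

With the switch off, every metric is reported as before; with it on, anything outside the whitelist (such as a high-cardinality user metric) never reaches the JobManager, so the aggregation work scales with the whitelist rather than with every registered metric.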

Option 3: Follow the same approach as option 2, but instead of introducing a 
new feature flag, repurpose the existing metrics.fetcher.update-interval 
option: when metrics.fetcher.update-interval is set to 0, the whitelist 
feature is activated automatically.
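Under similar assumptions, option 3 changes only how the whitelist is switched on: no new flag is added, and a metrics.fetcher.update-interval of 0 enables it. The class, method names, and whitelist contents below are hypothetical and only illustrate the proposed decision logic.

```java
import java.util.Set;

// Hypothetical sketch of option 3: no new flag; the whitelist is enabled
// whenever metrics.fetcher.update-interval is 0. Names are illustrative only.
public class WhitelistActivation {

    /** Illustrative set of metrics assumed to be needed by the overview and subtask pages. */
    static final Set<String> UI_WHITELIST =
            Set.of("numBytesIn", "numBytesOut", "numRecordsIn", "numRecordsOut");

    /** Option 3: a fetcher update interval of 0 implicitly enables the whitelist. */
    static boolean whitelistEnabled(long fetcherUpdateIntervalMillis) {
        return fetcherUpdateIntervalMillis == 0L;
    }

    /** Decides whether a single metric should be reported to the JobManager. */
    static boolean shouldReport(String metricName, long fetcherUpdateIntervalMillis) {
        return !whitelistEnabled(fetcherUpdateIntervalMillis)
                || UI_WHITELIST.contains(metricName);
    }

    public static void main(String[] args) {
        System.out.println(shouldReport("myUserMetric", 10_000L)); // true: whitelist off
        System.out.println(shouldReport("myUserMetric", 0L));      // false: filtered out
        System.out.println(shouldReport("numBytesIn", 0L));        // true: whitelisted
    }
}
```

Reusing the existing option avoids a new flag, but it changes what an interval of 0 means: it would no longer switch metric fetching off completely.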

> Add Option to disable showing metrics in JobManager UI
> ------------------------------------------------------
>
>                 Key: FLINK-33804
>                 URL: https://issues.apache.org/jira/browse/FLINK-33804
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Metrics
>            Reporter: Lu Niu
>            Priority: Major
>
> Flink allows users to view metrics in the JobManager UI. However, we found 
> two problems:
>  # The JobManager must aggregate metrics from all TaskManagers. When metric 
> cardinality is high, this process can trigger a full GC on the JobManager 
> and slow its response time.
>  # Production Flink use cases usually have their own dashboards for viewing 
> metrics, so this feature is sometimes not useful.
> In light of this, we propose adding an option to disable this feature.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)