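We run several Flink SQL jobs in production and now want to monitor their end-to-end latency. I configured the following parameters:
metrics.latency.interval=30000
metrics.latency.granularity=operator
metrics.latency.history-size=128
The latency metric is now being exported to Prometheus. It exposes the 50th, 75th, 95th, 98th, 99th and 99.9th percentiles, and it also carries the operator_id and operator_subtask_index labels, so it is broken down to the operator subtask level.
1. How can I derive the end-to-end latency percentiles of a Flink SQL job from these exposed metrics? Do I need to average the values of the same quantile for the same operator within the same job, and then add up the values obtained for the different operators?
2. Most of our SQL jobs read from Kafka, and the messages are in canal-json format. I would also like to measure (a) the difference between the binlog event time in the canal JSON and the timestamp in the Kafka message metadata, and (b) the difference between that Kafka metadata timestamp and the time at which Flink starts processing the message. Is there a way to get these without modifying the Flink source code?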
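
The two time differences in question 2 reduce to simple arithmetic once the three timestamps are available for a record. A minimal Python sketch, assuming Canal's `es` field carries the binlog event time in epoch milliseconds, that the Kafka timestamp and the processing time are also epoch milliseconds, and that the clocks involved are synchronized. `canal_lags_ms` is a hypothetical helper name, and the sketch does not show how to obtain these values inside a Flink SQL job:

```python
import json
import time


def canal_lags_ms(canal_json_value, kafka_timestamp_ms, processing_time_ms=None):
    """Return (binlog -> Kafka lag, Kafka -> processing lag) in milliseconds.

    canal_json_value: the Kafka message value as a canal-json string.
    kafka_timestamp_ms: the timestamp from the Kafka message metadata.
    processing_time_ms: when processing started; defaults to the current time.
    """
    if processing_time_ms is None:
        processing_time_ms = int(time.time() * 1000)

    record = json.loads(canal_json_value)
    # Assumption: `es` is the binlog event time in epoch milliseconds.
    binlog_ms = int(record["es"])

    binlog_to_kafka = kafka_timestamp_ms - binlog_ms
    kafka_to_processing = processing_time_ms - kafka_timestamp_ms
    return binlog_to_kafka, kafka_to_processing
```

For example, `canal_lags_ms('{"es": 1000}', 1250, 1400)` yields the pair of lags for a record whose binlog event, Kafka append and processing start happened at 1000, 1250 and 1400 ms respectively.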


flink_taskmanager_job_latency_source_id_operator_id_operator_subtask_index_latency{app="tb-bipusr-outcome-bank-record-binlog2mongo", component="taskmanager", host="172_19_193_104", instance="172.19.193.104:9249", job="kubernetes-pods", job_id="2ea0a87e69f0d485859a9108d595dd8d", job_name="tb_bipusr_outcome_bank_record_binlog2mongo", kubernetes_namespace="bfj", kubernetes_pod_name="tb-bipusr-outcome-bank-record-binlog2mongo-taskmanager-1-8", operator_id="570f707193e0fe32f4d86d067aba243b", operator_subtask_index="2", quantile="0.95", source_id="cbc357ccb763df2852fee8c4fc7d55f2", tm_id="tb_bipusr_outcome_bank_record_binlog2mongo_taskmanager_1_8", type="flink-native-kubernetes"}
value: 11.999999999999943

flink_taskmanager_job_latency_source_id_operator_id_operator_subtask_index_latency{app="tb-bipusr-outcome-bank-record-binlog2mongo", component="taskmanager", host="172_19_193_104", instance="172.19.193.104:9249", job="kubernetes-pods", job_id="2ea0a87e69f0d485859a9108d595dd8d", job_name="tb_bipusr_outcome_bank_record_binlog2mongo", kubernetes_namespace="bfj", kubernetes_pod_name="tb-bipusr-outcome-bank-record-binlog2mongo-taskmanager-1-8", operator_id="570f707193e0fe32f4d86d067aba243b", operator_subtask_index="2", quantile="0.98", source_id="cbc357ccb763df2852fee8c4fc7d55f2", tm_id="tb_bipusr_outcome_bank_record_binlog2mongo_taskmanager_1_8", type="flink-native-kubernetes"}
value: 21

flink_taskmanager_job_latency_source_id_operator_id_operator_subtask_index_latency{app="tb-bipusr-outcome-bank-record-binlog2mongo", component="taskmanager", host="172_19_193_104", instance="172.19.193.104:9249", job="kubernetes-pods", job_id="2ea0a87e69f0d485859a9108d595dd8d", job_name="tb_bipusr_outcome_bank_record_binlog2mongo", kubernetes_namespace="bfj", kubernetes_pod_name="tb-bipusr-outcome-bank-record-binlog2mongo-taskmanager-1-8", operator_id="570f707193e0fe32f4d86d067aba243b", operator_subtask_index="2", quantile="0.99", source_id="cbc357ccb763df2852fee8c4fc7d55f2", tm_id="tb_bipusr_outcome_bank_record_binlog2mongo_taskmanager_1_8", type="flink-native-kubernetes"}
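
To make the aggregation proposed in question 1 concrete, here is a minimal Python sketch of it: for one quantile, average the values over the subtasks of each operator, then sum the per-operator averages per job. It only spells out the proposal; whether this yields a meaningful end-to-end quantile is exactly what the question asks. The function name `proposed_job_latency` and the sample values are hypothetical; only the label names (job_id, operator_id, operator_subtask_index, quantile) come from the series above.

```python
from collections import defaultdict
from statistics import mean


def proposed_job_latency(samples, quantile):
    """Average one quantile over the subtasks of each operator, then sum per job.

    `samples` is an iterable of (labels, value) pairs, where `labels` is a dict
    holding at least job_id, operator_id and quantile.
    """
    per_operator = defaultdict(list)
    for labels, value in samples:
        if labels["quantile"] == quantile:
            per_operator[(labels["job_id"], labels["operator_id"])].append(value)

    per_job = defaultdict(float)
    for (job_id, _operator_id), values in per_operator.items():
        per_job[job_id] += mean(values)
    return dict(per_job)


# Hypothetical values: one job, two operators, two subtasks each.
samples = [
    ({"job_id": "job-1", "operator_id": "op-a", "operator_subtask_index": "0", "quantile": "0.95"}, 10.0),
    ({"job_id": "job-1", "operator_id": "op-a", "operator_subtask_index": "1", "quantile": "0.95"}, 14.0),
    ({"job_id": "job-1", "operator_id": "op-b", "operator_subtask_index": "0", "quantile": "0.95"}, 20.0),
    ({"job_id": "job-1", "operator_id": "op-b", "operator_subtask_index": "1", "quantile": "0.95"}, 22.0),
    ({"job_id": "job-1", "operator_id": "op-b", "operator_subtask_index": "1", "quantile": "0.99"}, 50.0),
]
print(proposed_job_latency(samples, "0.95"))
```

Note that the average of per-subtask quantiles is in general not the quantile of the combined distribution, so this is the literal proposal rather than a validated method.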
