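We run several Flink SQL jobs in production and now want to monitor their end-to-end latency. I set the following options:

metrics.latency.interval=30000
metrics.latency.granularity=operator
metrics.latency.history-size=128

The latency metrics are now reaching Prometheus. The metric exposes the 0.5, 0.75, 0.95, 0.98, 0.99 and 0.999 quantiles, plus the operator_id and operator_subtask_index labels, so it is broken down to the operator subtask level.

1. How can I derive end-to-end latency quantiles for a Flink SQL job from these exposed metrics? Do I need to average the values of the same quantile for the same operator within one job, and then sum the resulting values across operators?
2. Most of our SQL jobs read from Kafka, and the message format is canal json. I would also like to measure (a) the difference between the binlog event time in the canal json payload and the timestamp in the Kafka message metadata, and (b) the difference between the timestamp in the Kafka message metadata and the time Flink starts processing that message. Is there a way to get these without modifying the Flink source code?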
| flink_taskmanager_job_latency_source_id_operator_id_operator_subtask_index_latency{app="tb-bipusr-outcome-bank-record-binlog2mongo", component="taskmanager", host="172_19_193_104", instance="172.19.193.104:9249", job="kubernetes-pods", job_id="2ea0a87e69f0d485859a9108d595dd8d", job_name="tb_bipusr_outcome_bank_record_binlog2mongo", kubernetes_namespace="bfj", kubernetes_pod_name="tb-bipusr-outcome-bank-record-binlog2mongo-taskmanager-1-8", operator_id="570f707193e0fe32f4d86d067aba243b", operator_subtask_index="2", quantile="0.95", source_id="cbc357ccb763df2852fee8c4fc7d55f2", tm_id="tb_bipusr_outcome_bank_record_binlog2mongo_taskmanager_1_8", type="flink-native-kubernetes"} | 11.999999999999943 |
| flink_taskmanager_job_latency_source_id_operator_id_operator_subtask_index_latency{app="tb-bipusr-outcome-bank-record-binlog2mongo", component="taskmanager", host="172_19_193_104", instance="172.19.193.104:9249", job="kubernetes-pods", job_id="2ea0a87e69f0d485859a9108d595dd8d", job_name="tb_bipusr_outcome_bank_record_binlog2mongo", kubernetes_namespace="bfj", kubernetes_pod_name="tb-bipusr-outcome-bank-record-binlog2mongo-taskmanager-1-8", operator_id="570f707193e0fe32f4d86d067aba243b", operator_subtask_index="2", quantile="0.98", source_id="cbc357ccb763df2852fee8c4fc7d55f2", tm_id="tb_bipusr_outcome_bank_record_binlog2mongo_taskmanager_1_8", type="flink-native-kubernetes"} | 21 |
| flink_taskmanager_job_latency_source_id_operator_id_operator_subtask_index_latency{app="tb-bipusr-outcome-bank-record-binlog2mongo", component="taskmanager", host="172_19_193_104", instance="172.19.193.104:9249", job="kubernetes-pods", job_id="2ea0a87e69f0d485859a9108d595dd8d", job_name="tb_bipusr_outcome_bank_record_binlog2mongo", kubernetes_namespace="bfj", kubernetes_pod_name="tb-bipusr-outcome-bank-record-binlog2mongo-taskmanager-1-8", operator_id="570f707193e0fe32f4d86d067aba243b", operator_subtask_index="2", quantile="0.99", source_id="cbc357ccb763df2852fee8c4fc7d55f2", tm_id="tb_bipusr_outcome_bank_record_binlog2mongo_taskmanager_1_8", type="flink-native-kubernetes"} |
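
For question 1, here is a rough Python sketch of one possible aggregation over series like the ones above. It is not a confirmed method and rests on two assumptions:

- Each series is assumed to measure the path from source_id to the reporting operator already. If so, the sink operator's series approximates source-to-sink latency and nothing needs to be summed across operators.
- It takes the maximum across subtasks instead of the average. An average of per-subtask quantiles is not a quantile of the combined distribution, whereas the maximum is a conservative upper bound.

The function name, the sample layout, and the choice of 570f707193e0fe32f4d86d067aba243b as the sink operator are hypothetical. The two rows marked as such use made-up values for illustration only.

```python
import math
from collections import defaultdict


def sink_latency_by_quantile(samples, sink_operator_id):
    """Return {quantile: worst latency in ms} across subtasks of one operator.

    samples: iterable of (labels, value) pairs, where labels is a dict with
    at least the "operator_id" and "quantile" Prometheus labels.
    """
    worst = defaultdict(float)  # latencies are non-negative, so 0.0 is a safe floor
    for labels, value in samples:
        if labels["operator_id"] != sink_operator_id:
            continue  # assumption: only the sink's series covers the whole path
        if math.isnan(value):
            continue  # skip series that have no data yet
        quantile = labels["quantile"]
        worst[quantile] = max(worst[quantile], value)
    return dict(worst)


# Assumption: this operator is the sink of the job.
SINK_ID = "570f707193e0fe32f4d86d067aba243b"

SAMPLES = [
    # The two complete rows from the table above (subtask 2).
    ({"operator_id": SINK_ID, "operator_subtask_index": "2", "quantile": "0.95"},
     11.999999999999943),
    ({"operator_id": SINK_ID, "operator_subtask_index": "2", "quantile": "0.98"},
     21.0),
    # Hypothetical rows, values invented for illustration only.
    ({"operator_id": SINK_ID, "operator_subtask_index": "0", "quantile": "0.95"},
     15.0),
    ({"operator_id": "some-upstream-operator", "operator_subtask_index": "0",
      "quantile": "0.95"}, 3.0),
]

if __name__ == "__main__":
    print(sink_latency_by_quantile(SAMPLES, SINK_ID))
```

If the first assumption holds, the same reduction can probably be done directly in Prometheus by filtering on the sink's operator_id and taking the maximum grouped by job_name and quantile.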