[jira] [Created] (FLINK-13856) Reduce the delete file api when the checkpoint is completed
andrew.D.lin created FLINK-13856:

Summary: Reduce the delete file api when the checkpoint is completed
Key: FLINK-13856
URL: https://issues.apache.org/jira/browse/FLINK-13856
Project: Flink
Issue Type: Improvement
Components: Runtime / State Backends
Affects Versions: 1.9.0, 1.8.1
Reporter: andrew.D.lin
Attachments: f6cc56b7-2c74-4f4b-bb6a-476d28a22096.png

When a new checkpoint completes, an old checkpoint is deleted by calling CompletedCheckpoint.discardOnSubsume().

Deleting an old checkpoint follows these steps:

1. Drop the metadata
2. Discard private state objects
3. Discard the location as a whole

In some cases, is it possible to delete the checkpoint folder recursively with a single call? As far as I know, for a full (non-incremental) checkpoint it should be possible to delete the folder directly.

--
This message was sent by Atlassian Jira (v8.3.2#803003)
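To make the cost difference in FLINK-13856 concrete, here is a minimal sketch that counts delete calls for the step-by-step discard versus a single recursive delete. Everything in it is an assumption for illustration: `CountingFileSystem`, `CheckpointDiscardSketch`, the `chk-1` directory, and the file names are hypothetical and are not Flink classes or paths. The sketch also assumes the checkpoint directory holds no state shared with other checkpoints.

```java
import java.util.TreeSet;

// Hypothetical in-memory file system that only counts delete calls.
// It is NOT Flink's FileSystem; it just mimics a delete(path, recursive) call.
class CountingFileSystem {
    private final TreeSet<String> paths = new TreeSet<>();
    int deleteCalls = 0;

    void create(String path) {
        paths.add(path);
    }

    boolean delete(String path, boolean recursive) {
        deleteCalls++; // every call models one round trip to the storage system
        boolean removed = paths.remove(path);
        if (recursive) {
            String prefix = path + "/";
            removed |= paths.removeIf(p -> p.startsWith(prefix));
        }
        return removed;
    }

    int remainingPaths() {
        return paths.size();
    }
}

class CheckpointDiscardSketch {
    static final String DIR = "chk-1"; // hypothetical checkpoint directory

    // Builds a checkpoint directory with one metadata file and N state files.
    static CountingFileSystem newCheckpoint(int stateFiles) {
        CountingFileSystem fs = new CountingFileSystem();
        fs.create(DIR);
        fs.create(DIR + "/_metadata");
        for (int i = 0; i < stateFiles; i++) {
            fs.create(DIR + "/state-" + i);
        }
        return fs;
    }

    // The three steps from the issue: metadata, each state object, then the location.
    static int discardStepByStep(CountingFileSystem fs, int stateFiles) {
        fs.delete(DIR + "/_metadata", false);
        for (int i = 0; i < stateFiles; i++) {
            fs.delete(DIR + "/state-" + i, false);
        }
        fs.delete(DIR, false);
        return fs.deleteCalls;
    }

    // The proposed shortcut: one recursive delete of the whole directory.
    static int discardRecursively(CountingFileSystem fs) {
        fs.delete(DIR, true);
        return fs.deleteCalls;
    }
}
```

With three state files, the step-by-step path issues five delete calls in this model, while the recursive path issues one. The gap grows linearly with the number of state files.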
[jira] [Created] (FLINK-15307) Subclasses of FailoverStrategy are easily confused with implementation classes of RestartStrategy
andrew.D.lin created FLINK-15307:

Summary: Subclasses of FailoverStrategy are easily confused with implementation classes of RestartStrategy
Key: FLINK-15307
URL: https://issues.apache.org/jira/browse/FLINK-15307
Project: Flink
Issue Type: Improvement
Components: Runtime / Configuration
Affects Versions: 1.9.1, 1.9.0, 1.10.0
Reporter: andrew.D.lin
Attachments: image-2019-12-18-14-59-03-181.png

Subclasses of RestartStrategy:
* FailingRestartStrategy
* FailureRateRestartStrategy
* FixedDelayRestartStrategy
* InfiniteDelayRestartStrategy

Implementation classes of FailoverStrategy:
* AdaptedRestartPipelinedRegionStrategyNG
* RestartAllStrategy
* RestartIndividualStrategy
* RestartPipelinedRegionStrategy

FailoverStrategy describes how the job computation recovers from task failures.

I think the following names would be easier to understand and to distinguish.

Proposed names for the implementation classes of FailoverStrategy:
* AdaptedPipelinedRegionFailoverStrategyNG
* FailoverAllStrategy
* FailoverIndividualStrategy
* FailoverPipelinedRegionStrategy

FailoverStrategy instances are currently created from configuration, so renaming the implementation classes should not affect compatibility.
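The compatibility argument can be sketched as follows: if users select a strategy by a configured name rather than by class name, the classes behind those names can be renamed freely. This is a simplified illustration, not Flink's actual loader. The `FailoverStrategySketch` interface, the `FailoverStrategyLoaderSketch` class, and the keys `"full"` and `"region"` are assumptions made for the example; only the two implementation class names are taken from the proposal above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for the real FailoverStrategy type.
interface FailoverStrategySketch {
    String describe();
}

// Class names follow the renaming proposed in the issue.
class FailoverAllStrategy implements FailoverStrategySketch {
    public String describe() {
        return "restart all tasks";
    }
}

class FailoverPipelinedRegionStrategy implements FailoverStrategySketch {
    public String describe() {
        return "restart the affected pipelined region";
    }
}

// Hypothetical loader: users refer only to the configured key, never the class name.
class FailoverStrategyLoaderSketch {
    private static final Map<String, Supplier<FailoverStrategySketch>> BY_NAME = new HashMap<>();

    static {
        // Illustrative keys; the class on the right can be renamed without touching them.
        BY_NAME.put("full", FailoverAllStrategy::new);
        BY_NAME.put("region", FailoverPipelinedRegionStrategy::new);
    }

    static FailoverStrategySketch load(String configuredName) {
        Supplier<FailoverStrategySketch> factory = BY_NAME.get(configuredName);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown failover strategy: " + configuredName);
        }
        return factory.get();
    }
}
```

Under this assumption, a rename only changes the right-hand side of the registration, so existing configuration values keep working. Code that references the classes directly would still need to be updated.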
[jira] [Created] (FLINK-15568) RestartPipelinedRegionStrategy: not ensure the EXACTLY_ONCE semantics
Andrew.D.lin created FLINK-15568:

Summary: RestartPipelinedRegionStrategy: not ensure the EXACTLY_ONCE semantics
Key: FLINK-15568
URL: https://issues.apache.org/jira/browse/FLINK-15568
Project: Flink
Issue Type: Improvement
Components: Runtime / Checkpointing
Affects Versions: 1.8.3, 1.8.1, 1.8.0
Reporter: Andrew.D.lin
Attachments: image-2020-01-13-16-40-47-888.png

!image-2020-01-13-16-40-47-888.png!

In the 1.8.x versions, the restart method in FailoverRegion.java does not restore from the latest checkpoint; this is marked as a TODO.

Should we support this feature (region restart) in Flink 1.8?
[jira] [Created] (FLINK-21858) TaskMetricGroup taskName is too long, especially in sql tasks.
Andrew.D.lin created FLINK-21858:

Summary: TaskMetricGroup taskName is too long, especially in sql tasks.
Key: FLINK-21858
URL: https://issues.apache.org/jira/browse/FLINK-21858
Project: Flink
Issue Type: Improvement
Components: Runtime / Metrics
Affects Versions: 1.12.2, 1.12.1, 1.12.0
Reporter: Andrew.D.lin

Currently, operatorName is limited to 80 characters by org.apache.flink.runtime.metrics.groups.TaskMetricGroup#METRICS_OPERATOR_NAME_MAX_LENGTH. The taskName, however, can become very long, especially in SQL tasks.

I therefore propose limiting the maximum length of metric names through configuration.

Here is an example:

"taskName":"GlobalGroupAggregate(groupBy=[dt, src, src1, src2, src3, ct1, ct2], select=[dt, src, src1, src2, src3, ct1, ct2, SUM_RETRACT((sum$0, count$1)) AS sx_pv, SUM_RETRACT((sum$2, count$3)) AS sx_uv, MAX_RETRACT(max$4) AS updt_time, MAX_RETRACT(max$5) AS time_id]) -> Calc(select=[((MD5((dt CONCAT _UTF-16LE'|' CONCAT src CONCAT _UTF-16LE'|' CONCAT src1 CONCAT _UTF-16LE'|' CONCAT src2 CONCAT _UTF-16LE'|' CONCAT src3 CONCAT _UTF-16LE'|' CONCAT ct1 CONCAT _UTF-16LE'|' CONCAT ct2 CONCAT _UTF-16LE'|' CONCAT time_id)) SUBSTR 1 SUBSTR 2) CONCAT _UTF-16LE'_' CONCAT (dt CONCAT _UTF-16LE'|' CONCAT src CONCAT _UTF-16LE'|' CONCAT src1 CONCAT _UTF-16LE'|' CONCAT src2 CONCAT _UTF-16LE'|' CONCAT src3 CONCAT _UTF-16LE'|' CONCAT ct1 CONCAT _UTF-16LE'|' CONCAT ct2 CONCAT _UTF-16LE'|' CONCAT time_id)) AS rowkey, sx_pv, sx_uv, updt_time]) -> LocalGroupAggregate(groupBy=[rowkey], select=[rowkey, MAX_RETRACT(sx_pv) AS max$0, MAX_RETRACT(sx_uv) AS max$1, MAX_RETRACT(updt_time) AS max$2, COUNT_RETRACT(*) AS count1$3])"

"operatorName":"GlobalGroupAggregate(groupBy=[dt, src, src1, src2, src3, ct1, ct2], selec=[dt, s"
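The proposal could look roughly like the sketch below: a single truncation helper whose limit comes from configuration instead of a hard-coded constant. The class `MetricNameLimiter`, the method `limit`, and the idea of passing the limit as a parameter are assumptions for illustration and not existing Flink API. The default of 80 mirrors the operator-name limit mentioned in the issue.

```java
// Hypothetical helper; not part of Flink.
class MetricNameLimiter {
    // Mirrors the 80-character operator-name limit mentioned in the issue.
    static final int DEFAULT_MAX_LENGTH = 80;

    // Returns the name unchanged if it fits, otherwise its first maxLength characters.
    // maxLength is assumed to come from a (hypothetical) configuration option.
    static String limit(String name, int maxLength) {
        if (maxLength <= 0) {
            throw new IllegalArgumentException("maxLength must be positive: " + maxLength);
        }
        if (name == null || name.length() <= maxLength) {
            return name;
        }
        return name.substring(0, maxLength);
    }
}
```

Applying the same helper to both taskName and operatorName would keep the two limits consistent. Plain truncation can make distinct long names collide, so a real implementation might need to handle that case as well.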