[ https://issues.apache.org/jira/browse/FLINK-21833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lynn1.zhang updated FLINK-21833:
--------------------------------
Description:
In our company's use case, short-lived, high-volume RowData streams are joined with each other using this operator. Each key (session_id) expires after 2 minutes. With idle.state.retention.time configured, the operator cleans up leftState and rightState after 2 minutes, but the nextLeftIndex and registeredTimer state data is retained forever. After running for a day (about 20 million session_id values in our case), checkpointing this operator causes the job to crash.
I have found the bug and fixed it.
!image-2021-03-17-11-06-21-768.png!

was:
In our company's use case, short-lived, high-volume RowData streams are joined with each other using this operator. Each key (session_id) expires after 2 minutes. With idle.state.retention.time configured, the operator cleans up leftState and rightState after 2 minutes, but nextLeftIndex and registeredTimer are retained forever. After running for a day (about 20 million session_id values in our case), checkpointing this operator causes the job to crash.
I have found the bug and fixed it.
!image-2021-03-17-11-06-21-768.png!

> TemporalRowTimeJoinOperator.java will lead to the state expansion by
> short-life-cycle & huge RowData, although config idle.state.retention.time
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-21833
>                 URL: https://issues.apache.org/jira/browse/FLINK-21833
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Runtime
>    Affects Versions: 1.12.2
>            Reporter: lynn1.zhang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2021-03-17-11-06-21-768.png
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
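The state leak described in this issue can be illustrated with a toy Java model. This is a minimal sketch under stated assumptions, not Flink's actual TemporalRowTimeJoinOperator code and not the patch attached to this issue: the class StateLeakSketch and its methods process, cleanup, totalEntries and simulate are hypothetical, and plain HashMaps stand in for Flink's keyed state. Only the state names (leftState, rightState, nextLeftIndex, registeredTimer) and the 2-minute retention come from the report.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, heavily simplified model of per-key operator state.
 * NOT Flink's TemporalRowTimeJoinOperator; HashMaps keyed by session_id
 * stand in for keyed state. Only the four state names mirror the report.
 */
class StateLeakSketch {
    final Map<String, String> leftState = new HashMap<>();
    final Map<String, String> rightState = new HashMap<>();
    final Map<String, Long> nextLeftIndex = new HashMap<>();
    final Map<String, Long> registeredTimer = new HashMap<>();

    /** Simulates processing one row for a key: each of the four states gets an entry. */
    void process(String sessionId, long now, long retentionMs) {
        leftState.put(sessionId, "left-row");
        rightState.put(sessionId, "right-row");
        nextLeftIndex.merge(sessionId, 1L, Long::sum);
        registeredTimer.put(sessionId, now + retentionMs); // cleanup timer timestamp
    }

    /**
     * Simulates idle-retention cleanup for one key.
     * clearAll == false models the reported behaviour (only left/right state cleared);
     * clearAll == true models clearing every per-key state.
     */
    void cleanup(String sessionId, boolean clearAll) {
        leftState.remove(sessionId);
        rightState.remove(sessionId);
        if (clearAll) {
            nextLeftIndex.remove(sessionId);
            registeredTimer.remove(sessionId);
        }
    }

    /** Number of state entries still held across all four maps. */
    int totalEntries() {
        return leftState.size() + rightState.size()
                + nextLeftIndex.size() + registeredTimer.size();
    }

    /** Processes and then expires `sessions` distinct keys; returns the leftover entry count. */
    static int simulate(int sessions, boolean clearAll) {
        StateLeakSketch op = new StateLeakSketch();
        long retentionMs = 2 * 60 * 1000L; // 2 minutes, as in the report
        for (int i = 0; i < sessions; i++) {
            String key = "session-" + i;
            op.process(key, i, retentionMs);
            op.cleanup(key, clearAll); // every key goes idle and is cleaned up
        }
        return op.totalEntries();
    }

    public static void main(String[] args) {
        System.out.println("partial cleanup leftover: " + simulate(1000, false));
        System.out.println("full cleanup leftover:    " + simulate(1000, true));
    }
}
```

Running main prints 2000 leftover entries for the partial cleanup and 0 for the full cleanup. In this model, clearing only leftState and rightState leaves two entries behind for every expired key, so retained state grows linearly with the number of distinct session_id values; with roughly 20 million keys per day, that matches the kind of unbounded growth the report describes.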