[ https://issues.apache.org/jira/browse/FLINK-17820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17115851#comment-17115851 ]
Roman Khachatryan commented on FLINK-17820: ------------------------------------------- {quote}What do you think about adding a test that guards this assumption? {quote} I think we can not rely on testing DataOutputStream because the JRE could be different at runtime. {quote}We also have the {{DataOutputViewStreamWrapper}} class which is an extension of {{DataOutputStream}} - we could use that, guard it with a "does not buffer" test and if we ever find it actually buffers, then we need to implement the methods directly, rather than inherit from {{DataOutputStream}}. {quote} I guess you mean adding a no-op flush method to {{DataOutputViewStreamWrapper.}}I think it will be difficult to detect that some flush calls "don't work" anymore. And probably it's better to find incorrect assumptions. {quote}Wrapping also seems like unnecessary complexity {quote} To me, it is less complicated than baking (counter-intuitive?) assumptions into public code and adding tests for it > Memory threshold is ignored for channel state > --------------------------------------------- > > Key: FLINK-17820 > URL: https://issues.apache.org/jira/browse/FLINK-17820 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing, Runtime / Task > Affects Versions: 1.11.0 > Reporter: Roman Khachatryan > Assignee: Roman Khachatryan > Priority: Critical > Labels: pull-request-available > Fix For: 1.11.0 > > > Config parameter state.backend.fs.memory-threshold is ignored for channel > state. Causing each subtask to have a file per checkpoint. Regardless of the > size of channel state (of this subtask). > This also causes slow cleanup and delays the next checkpoint. > > The problem is that {{ChannelStateCheckpointWriter.finishWriteAndResult}} > calls flush(); which actually flushes the data on disk. > > From FSDataOutputStream.flush Javadoc: > A completed flush does not mean that the data is necessarily persistent. Data > persistence can is only assumed after calls to close() or sync(). > > Possible solutions: > 1. not to flush in {{ChannelStateCheckpointWriter.finishWriteAndResult (which > can lead to data loss in a wrapping stream).}} > {{2. change }}{{FsCheckpointStateOutputStream.flush behavior}} > {{3. wrap }}{{FsCheckpointStateOutputStream to prevent flush}}{{}}{{}} -- This message was sent by Atlassian Jira (v8.3.4#803005)