[jira] [Commented] (FLINK-17820) Memory threshold is ignored for channel state

Stephan Ewen (Jira) Mon, 25 May 2020 10:04:49 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-17820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116150#comment-17116150
 ]


Stephan Ewen commented on FLINK-17820:
--------------------------------------

It is true that the assumption that {{DataOutputStream}} does not buffer is a 
fragile point, even if it is unlikely to break (my feeling is that too much 
existing code would break if an SDK made such a core class behave differently). 
That said, it should be possible to flush the stream without voiding the 
usability of the class.

If, of the initial options, we don't like (1), then, having thought about it a 
bit more, I would go for option (2) (adjust {{FsCheckpointStateOutputStream}}).

I did a quick search through the code, and it looks like we can drop the 
assumption that {{FsCheckpointStateOutputStream}} creates the file in 
{{flush()}}. It does not seem to be relied on in production code (though 
possibly in tests). How about this?
  - rename the {{flush()}} method to {{flushToFile()}}, including all existing 
calls to that method within the class and in relevant tests.
  - override {{flush()}} as a no-op.
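
A minimal sketch of what this could look like, assuming a simplified stand-in 
class rather than the real {{FsCheckpointStateOutputStream}}. The class name 
{{ThresholdStateOutputStream}}, the helper methods, and the spill-on-threshold 
logic are all hypothetical and only illustrate the rename plus the no-op 
override:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical, simplified stand-in for FsCheckpointStateOutputStream.
// It is NOT the Flink class; names and threshold logic are assumptions.
class ThresholdStateOutputStream extends OutputStream {
    private final Path target;
    private final int memoryThreshold;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private OutputStream fileOut; // stays null while all data fits in memory

    ThresholdStateOutputStream(Path target, int memoryThreshold) {
        this.target = target;
        this.memoryThreshold = memoryThreshold;
    }

    @Override
    public void write(int b) throws IOException {
        buffer.write(b);
        // Spill only once the buffered data exceeds the threshold.
        if (buffer.size() > memoryThreshold) {
            flushToFile();
        }
    }

    // The former flush(): creates the file on first use and drains the buffer.
    void flushToFile() throws IOException {
        if (fileOut == null) {
            fileOut = Files.newOutputStream(target);
        }
        buffer.writeTo(fileOut);
        buffer.reset();
    }

    // No-op, so a wrapping stream can be flushed without forcing a file.
    @Override
    public void flush() {
    }

    boolean isFileCreated() {
        return fileOut != null;
    }

    // Returns the in-memory bytes, or null if the data was spilled to the file.
    byte[] closeAndGetInMemoryBytes() throws IOException {
        if (fileOut == null) {
            return buffer.toByteArray();
        }
        flushToFile();
        fileOut.close();
        return null;
    }
}
```

With this shape, a caller that wraps the stream in a {{DataOutputStream}} and 
calls {{flush()}} on the wrapper only pushes bytes down into the in-memory 
buffer; a file appears only once the threshold is exceeded.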


> Memory threshold is ignored for channel state
> ---------------------------------------------
>
>                 Key: FLINK-17820
>                 URL: https://issues.apache.org/jira/browse/FLINK-17820
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing, Runtime / Task
>    Affects Versions: 1.11.0
>            Reporter: Roman Khachatryan
>            Assignee: Roman Khachatryan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.11.0
>
>
> Config parameter state.backend.fs.memory-threshold is ignored for channel 
> state, causing each subtask to write a file per checkpoint regardless of the 
> size of that subtask's channel state.
> This also causes slow cleanup and delays the next checkpoint.
>  
> The problem is that {{ChannelStateCheckpointWriter.finishWriteAndResult}} 
> calls {{flush()}}, which actually flushes the data to disk.
>  
> From FSDataOutputStream.flush Javadoc:
> A completed flush does not mean that the data is necessarily persistent. Data 
> persistence can only be assumed after calls to close() or sync().
>  
> Possible solutions:
> 1. not to flush in {{ChannelStateCheckpointWriter.finishWriteAndResult}} 
> (which can lead to data loss in a wrapping stream)
> 2. change {{FsCheckpointStateOutputStream.flush}} behavior
> 3. wrap {{FsCheckpointStateOutputStream}} to prevent flush



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
