[jira] [Commented] (HDFS-4089) SyncBehindWrites uses wrong flags on sync_file_range
[ https://issues.apache.org/jira/browse/HDFS-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484951#comment-13484951 ]

Todd Lipcon commented on HDFS-4089:
---

Sounds reasonable. Want to submit a patch? Please make sure you add the old config name as 'deprecated' here.

SyncBehindWrites uses wrong flags on sync_file_range
---

Key: HDFS-4089
URL: https://issues.apache.org/jira/browse/HDFS-4089
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node, ha
Reporter: Jan Kunigk
Priority: Minor
Attachments: syncBehindWrites.patch

Hi,
I stumbled upon this while trying to understand the append design recently. I am assuming when SyncBehindWrites is enabled we do indeed want to do a complete sync after each write. In that case the implementation seems wrong to me. Here's a comment from the manpage of sync_file_range on the usage of the SYNC_FILE_RANGE_WRITE flag in solitude:

    This is an asynchronous flush-to-disk operation. This is not suitable for data integrity operations.

I don't know why this syscall is invoked here instead of just fsync

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4089) SyncBehindWrites uses wrong flags on sync_file_range
[ https://issues.apache.org/jira/browse/HDFS-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484557#comment-13484557 ]

Jan Kunigk commented on HDFS-4089:
---

Hi Todd, sorry for the lag in my replies. I think we can improve the naming at least.

1) I would rename the variable from SyncBehindWrites to WriteOutDirtyOSPages. "Sync" is misleading: even though the syscall is named sync_file_range, the flags it is used with here do not entail sync semantics, that is, a synchronous (blocking) operation.
2) I think we should mark the appropriate section in DFSConfigKeys.java as performance related.

This is more an issue of hygiene and, after your explanations, no longer a concern with respect to data integrity.

Thanks.
[jira] [Commented] (HDFS-4089) SyncBehindWrites uses wrong flags on sync_file_range
[ https://issues.apache.org/jira/browse/HDFS-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481063#comment-13481063 ]

Todd Lipcon commented on HDFS-4089:
---

Hi Jan. The docs for this config say:

{quote}
If this configuration is enabled, the datanode will instruct the operating system to enqueue all written data to the disk immediately after it is written. This differs from the usual OS policy which may wait for up to 30 seconds before triggering writeback. This may improve performance for some workloads by smoothing the IO profile for data written to disk. If the Hadoop native libraries are not available, this configuration has no effect.
{quote}

Do you think we could improve it?

In DFSConfigKeys.java, it's in a lower section than the HA-related config, though I agree we could add some more comment lines to clearly delineate the different config keys.
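The behavior the config docs describe, enqueueing data for writeback right after it is written, can be sketched in C roughly as follows. This is only an illustration and not the DataNode's actual native code: the function name write_with_writeback_hint and the chunking scheme are assumptions made for the example. The only real calls are write(2), lseek(2), and the Linux-specific sync_file_range(2). The hint's return value is deliberately ignored here, on the assumption that it is a performance aid rather than a correctness requirement.

```c
#define _GNU_SOURCE /* needed for sync_file_range() on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `len` bytes from `buf` to `fd` in pieces of at
 * most `chunk` bytes. After each piece, ask the kernel to start writing the
 * freshly dirtied range to disk. The call does not wait for the I/O to
 * finish, so it smooths writeback but gives no durability guarantee.
 * Returns the number of bytes written, or -1 on a write error.
 */
ssize_t write_with_writeback_hint(int fd, const char *buf, size_t len,
                                  size_t chunk)
{
    assert(chunk > 0);
    size_t done = 0;

    while (done < len) {
        size_t want = (len - done < chunk) ? (len - done) : chunk;
        off_t start = lseek(fd, 0, SEEK_CUR); /* offset this piece lands at */
        ssize_t n = write(fd, buf + done, want);

        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted before writing anything: retry */
            return -1;
        }
        if (n == 0)
            break; /* no progress; avoid spinning */

        if (start >= 0) {
            /* Advisory only: a failure here is ignored on purpose. */
            (void)sync_file_range(fd, start, n, SYNC_FILE_RANGE_WRITE);
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

Without the hint, the same dirty pages would sit in the page cache until the kernel's periodic writeback picks them up, which is the "up to 30 seconds" case mentioned in the docs.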
[jira] [Commented] (HDFS-4089) SyncBehindWrites uses wrong flags on sync_file_range
[ https://issues.apache.org/jira/browse/HDFS-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480422#comment-13480422 ]

Todd Lipcon commented on HDFS-4089:
---

Hi Jan. The purpose of this flag isn't for data integrity -- it's to avoid lumpy IO writeback. If you want data integrity you should be using the hsync() call after every write.
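To make the distinction concrete at the OS level, here is a minimal C sketch contrasting the two operations discussed in this thread. The helper names start_writeback and flush_durably are hypothetical and exist only for this example. The sketch shows the Linux syscalls themselves and makes no claim about how HDFS's hsync() is implemented.

```c
#define _GNU_SOURCE /* needed for sync_file_range() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to begin writing the dirty pages in
 * [offset, offset + nbytes) to disk. With SYNC_FILE_RANGE_WRITE alone the
 * call returns without waiting for the I/O to complete, so it is a
 * writeback hint and not a data-integrity operation.
 */
int start_writeback(int fd, off_t offset, off_t nbytes)
{
    assert(offset >= 0 && nbytes >= 0);
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
}

/*
 * Hypothetical helper: block until the file's data and metadata have been
 * handed to stable storage. This is the call to use when durability matters.
 */
int flush_durably(int fd)
{
    return fsync(fd);
}
```

A caller that only wants smoother I/O would use the first helper after each write; a caller that needs the data to survive a crash would use the second and accept the blocking cost.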