[jira] [Commented] (HDFS-1790) review use of synchronized methods in FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011753#comment-13011753 ] dhruba borthakur commented on HDFS-1790: incVolumeFailure(), isInSafeMode() and isPopulatingReplQueues() should use the FSNamesystem read/write lock. > review use of synchronized methods in FSNamesystem > -- > > Key: HDFS-1790 > URL: https://issues.apache.org/jira/browse/HDFS-1790 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.23.0 >Reporter: Matt Foley > Fix For: 0.23.0 > > > While analyzing SafeMode semantics for HDFS-1726, I noticed the following > usage of synchronized methods in FSNamesystem that raised questions: > 1. incVolumeFailure() is synchronized on the FSNamesystem instance (monitor > lock), for unclear reasons. Is this a left-over from the conversion to > read/write lock in FSNamesystem? > 2. isInSafeMode() and isPopulatingReplQueues() are also synchronized on the > FSNamesystem instance (monitor lock), but the SafeModeInfo methods they call > (safeMode.isInSafeMode() and safeMode.isPopulatingReplQueues() are > synchronized on the SafeModeInfo instance. Is synchronizing on the > FSNamesystem instance necessary? What does it add? If it is necessary, > should it be using the read/write lock? > 3. In SafeModeInfo, these methods are synchronized on the SafeModeInfo > instance: > - isOn > - isPopulatingReplQueues > - leave > - initialize > - canInitializeReplQueues > - canLeave > - setBlock > - incrementSafeBlockCount > - decrementSafeBlockCount > - setManual > but these are not: > - enter > - needEnter > - checkMode > - isManual > - isConsistent > Regarding these: > - isOn() asserts isConsistent(), but is otherwise an atomic read operation. > - isPopulatingReplQueues() is an atomic read. Does it need synchronization? > - leave() is complex, but shouldn't it be synchronized with enter(), which is > also a write operation? Yet enter() is unsynchronized. > - initializeReplQueues() calls blockManager.processMisReplicatedBlocks(), > which can take minutes to run. By synchronizing on the SafeModeInfo > instance, it prevents essentially all of the other safeMode methods from > running for the duration! Is this desirable or needed? > - canInitializeReplQueues() is a read-only operation, although it does > compare two read values. needEnter() compares four read values, and is > unsynchronized. Does either need synchronization? > - canLeave() is a compound operation, it's good that it is synchronized. > - checkMode() is a big compound read/write operation. Doesn't it need > synchronization if the other methods do? > and so on. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1790) review use of synchronized methods in FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011673#comment-13011673 ] Matt Foley commented on HDFS-1790: -- SafeModeInfo contains several numeric variables which are overloaded with "in manual" or "in extension" or "not actually in safemode" semantics based on special values -- instead of just having a "state" variable (which I'm now providing). Consequently, these variables may need to be read/written in a synchronized way. So it isn't unreasonable to synchronize them on the SafeModeInfo instance. But it seems to me that it is being done inconsistently, as described above. Do I understand you correctly, that you agree incVolumeFailure(), isInSafeMode() and isPopulatingReplQueues(), should use the read/write lock rather than synchronizing on the FSNamesystem instance? Thanks. > review use of synchronized methods in FSNamesystem > -- > > Key: HDFS-1790 > URL: https://issues.apache.org/jira/browse/HDFS-1790 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.23.0 >Reporter: Matt Foley > Fix For: 0.23.0 > > > While analyzing SafeMode semantics for HDFS-1726, I noticed the following > usage of synchronized methods in FSNamesystem that raised questions: > 1. incVolumeFailure() is synchronized on the FSNamesystem instance (monitor > lock), for unclear reasons. Is this a left-over from the conversion to > read/write lock in FSNamesystem? > 2. isInSafeMode() and isPopulatingReplQueues() are also synchronized on the > FSNamesystem instance (monitor lock), but the SafeModeInfo methods they call > (safeMode.isInSafeMode() and safeMode.isPopulatingReplQueues() are > synchronized on the SafeModeInfo instance. Is synchronizing on the > FSNamesystem instance necessary? What does it add? If it is necessary, > should it be using the read/write lock? > 3. In SafeModeInfo, these methods are synchronized on the SafeModeInfo > instance: > - isOn > - isPopulatingReplQueues > - leave > - initialize > - canInitializeReplQueues > - canLeave > - setBlock > - incrementSafeBlockCount > - decrementSafeBlockCount > - setManual > but these are not: > - enter > - needEnter > - checkMode > - isManual > - isConsistent > Regarding these: > - isOn() asserts isConsistent(), but is otherwise an atomic read operation. > - isPopulatingReplQueues() is an atomic read. Does it need synchronization? > - leave() is complex, but shouldn't it be synchronized with enter(), which is > also a write operation? Yet enter() is unsynchronized. > - initializeReplQueues() calls blockManager.processMisReplicatedBlocks(), > which can take minutes to run. By synchronizing on the SafeModeInfo > instance, it prevents essentially all of the other safeMode methods from > running for the duration! Is this desirable or needed? > - canInitializeReplQueues() is a read-only operation, although it does > compare two read values. needEnter() compares four read values, and is > unsynchronized. Does either need synchronization? > - canLeave() is a compound operation, it's good that it is synchronized. > - checkMode() is a big compound read/write operation. Doesn't it need > synchronization if the other methods do? > and so on. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1790) review use of synchronized methods in FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011564#comment-13011564 ] dhruba borthakur commented on HDFS-1790: The setting of safemode should be done inside the FSNamesystem write lock. The checking of safemode should be done inside the FSNamesystem read lock. But can you explain why the SafeMode object needs its own "synchronization"? > review use of synchronized methods in FSNamesystem > -- > > Key: HDFS-1790 > URL: https://issues.apache.org/jira/browse/HDFS-1790 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.23.0 >Reporter: Matt Foley > Fix For: 0.23.0 > > > While analyzing SafeMode semantics for HDFS-1726, I noticed the following > usage of synchronized methods in FSNamesystem that raised questions: > 1. incVolumeFailure() is synchronized on the FSNamesystem instance (monitor > lock), for unclear reasons. Is this a left-over from the conversion to > read/write lock in FSNamesystem? > 2. isInSafeMode() and isPopulatingReplQueues() are also synchronized on the > FSNamesystem instance (monitor lock), but the SafeModeInfo methods they call > (safeMode.isInSafeMode() and safeMode.isPopulatingReplQueues() are > synchronized on the SafeModeInfo instance. Is synchronizing on the > FSNamesystem instance necessary? What does it add? If it is necessary, > should it be using the read/write lock? > 3. In SafeModeInfo, these methods are synchronized on the SafeModeInfo > instance: > - isOn > - isPopulatingReplQueues > - leave > - initialize > - canInitializeReplQueues > - canLeave > - setBlock > - incrementSafeBlockCount > - decrementSafeBlockCount > - setManual > but these are not: > - enter > - needEnter > - checkMode > - isManual > - isConsistent > Regarding these: > - isOn() asserts isConsistent(), but is otherwise an atomic read operation. > - isPopulatingReplQueues() is an atomic read. Does it need synchronization? > - leave() is complex, but shouldn't it be synchronized with enter(), which is > also a write operation? Yet enter() is unsynchronized. > - initializeReplQueues() calls blockManager.processMisReplicatedBlocks(), > which can take minutes to run. By synchronizing on the SafeModeInfo > instance, it prevents essentially all of the other safeMode methods from > running for the duration! Is this desirable or needed? > - canInitializeReplQueues() is a read-only operation, although it does > compare two read values. needEnter() compares four read values, and is > unsynchronized. Does either need synchronization? > - canLeave() is a compound operation, it's good that it is synchronized. > - checkMode() is a big compound read/write operation. Doesn't it need > synchronization if the other methods do? > and so on. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira