[ 
https://issues.apache.org/jira/browse/HDFS-10830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15459633#comment-15459633
 ] 

Manoj Govindassamy commented on HDFS-10830:
-------------------------------------------

[~arpitagarwal],

{quote}
  void waitVolumeRemoved(int sleepMillis, Condition condition) {
      .. .. ..
      try {
        condition.wait(sleepMillis);  <==
      } catch (InterruptedException e) {
        FsDatasetImpl.LOG.info("Thread interrupted when waiting for "
            + "volume reference to be released.");
        Thread.currentThread().interrupt();
      }
{quote}

# By calling Object monitor method wait() on {{Condition}} variable, it will be 
treated like a normal Object and will try to use its intrinsic monitor as 
against the Condition's wait-set instance (here {{datasetLockCondition}}). 
Copying below the note I read from java doc on Condition.

[Ref|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/Condition.html]
{color:blue}
??Note that Condition instances are just normal objects and can themselves be 
used as the target in a synchronized statement, and can have their own monitor 
wait and notification methods invoked. Acquiring the monitor lock of a 
Condition instance, or using its monitor methods, has no specified relationship 
with acquiring the Lock associated with that Condition or the use of its 
waiting and signalling methods. It is recommended that to avoid confusion you 
never use Condition instances in this way, except perhaps within their own 
implementation.??
{color}

# So, the usage of condition.wait() here expects the Object monitor to be 
locked using synchronized() and since it is not locked, we will again get 
IllegalMonitorStateException.  Please let me know if my understanding is wrong.

> FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when 
> vol being removed is in use
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10830
>                 URL: https://issues.apache.org/jira/browse/HDFS-10830
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-10830.01.patch
>
>
> {{FsDatasetImpl#removeVolumes()}} operation crashes abruptly with 
> IllegalMonitorStateException whenever the volume being removed is in use 
> concurrently.
> Looks like {{removeVolumes()}} is waiting on a monitor object "this" (that is 
> FsDatasetImpl) which it has never locked, leading to  
> IllegalMonitorStateException. This monitor wait happens only the volume being 
> removed is in use (referencecount > 0). The thread performing this remove 
> volume operation thus crashes abruptly and block invalidations for the remove 
> volumes are totally skipped. 
> {code:title=FsDatasetImpl.java|borderStyle=solid}
> @Override
> public void removeVolumes(Set<File> volumesToRemove, boolean clearFailure) {
> ..
> ..
> try (AutoCloseableLock lock = datasetLock.acquire()) {   <== LOCK acquire 
> datasetLock
> for (int idx = 0; idx < dataStorage.getNumStorageDirs(); idx++) {
>   .. .. ..
>   asyncDiskService.removeVolume(sd.getCurrentDir());     <== volume SD1 remove
>   volumes.removeVolume(absRoot, clearFailure);
>   volumes.waitVolumeRemoved(5000, this);                 <== WAIT on "this" 
> ?? But, we haven't locked it yet.
>                                                              This will cause 
> IllegalMonitorStateException
>                                                              and crash 
> getBlockReports()/FBR thread!
>   for (String bpid : volumeMap.getBlockPoolList()) {
>     List<ReplicaInfo> blocks = new ArrayList<>();
>     for (Iterator<ReplicaInfo> it = volumeMap.replicas(bpid).iterator();
>          it.hasNext(); ) {
>         .. .. .. 
>         it.remove();                                     <== volumeMap removal
>       }
>     blkToInvalidate.put(bpid, blocks);
>   }
>  .. ..
> }                                                        <== LOCK release 
> datasetLock   
> // Call this outside the lock.
> for (Map.Entry<String, List<ReplicaInfo>> entry :
> blkToInvalidate.entrySet()) {
>  ..
>  for (ReplicaInfo block : blocks) {
>   invalidate(bpid, block);                               <== Notify NN of 
> Block removal
>  }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to