[ 
https://issues.apache.org/jira/browse/HDFS-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ade updated HDFS-10605:
-----------------------
    Description: 
The UT TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten can pass, but a deadlock like HDFS-9874 may still happen online.
* UT: 
{code:title=TestDataNodeHotSwapVolumes.java|borderStyle=solid}
final FsDatasetSpi<? extends FsVolumeSpi> data = dn.data;
dn.data = Mockito.spy(data);
LOG.info("data hash:" + data.hashCode() + "; dn.data hash:" + dn.data.hashCode());
doAnswer(new Answer<Object>() {
      public Object answer(InvocationOnMock invocation)
          throws IOException, InterruptedException {
        Thread.sleep(1000);
        // Bypass the argument to FsDatasetImpl#finalizeBlock to verify that
        // the block is not removed, since the volume reference should not
        // be released at this point.
        data.finalizeBlock((ExtendedBlock) invocation.getArguments()[0]);
        return null;
      }
    }).when(dn.data).finalizeBlock(any(ExtendedBlock.class));
{code}
Two threads can run the synchronized methods dn.data.removeVolumes and 
data.finalizeBlock concurrently, because dn.data (the spy) and data are 
different objects (hashes 1903955157 and 1508483764) and therefore lock 
different monitors.
{noformat}
2016-07-11 16:16:07,788 INFO  [Thread-0] datanode.TestDataNodeHotSwapVolumes 
(TestDataNodeHotSwapVolumes.java:testRemoveVolumeBeingWrittenForDatanode(599)) 
- data hash:1903955157; dn.data hash:1508483764
2016-07-11 16:16:07,801 INFO  [Thread-157] datanode.DataNode 
(DataNode.java:reconfigurePropertyImpl(456)) - Reconfiguring 
dfs.datanode.data.dir to 
[DISK]file:/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2
2016-07-11 16:16:07,810 WARN  [Thread-157] common.Util 
(Util.java:stringAsURI(56)) - Path 
/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
 should be specified as a URI in configuration files. Please update hdfs 
configuration.
2016-07-11 16:16:07,811 INFO  [Thread-157] datanode.DataNode 
(DataNode.java:removeVolumes(674)) - Deactivating volumes (clear failure=true): 
/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
2016-07-11 16:16:07,836 INFO  [Thread-157] impl.FsDatasetImpl 
(FsDatasetImpl.java:removeVolumes(459)) - Removing 
/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
 from FsDataset.
2016-07-11 16:16:07,836 INFO  [Thread-157] impl.FsDatasetImpl 
(FsDatasetImpl.java:removeVolumes(463)) - removeVolumes of object 
hash:1508483764
2016-07-11 16:16:07,836 INFO  [Thread-157] datanode.BlockScanner 
(BlockScanner.java:removeVolumeScanner(243)) - Removing scanner for volume 
/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
 (StorageID DS-f4df3404-9f02-470e-b202-75f5a4de29cb)
2016-07-11 16:16:07,836 INFO  
[VolumeScannerThread(/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1)]
 datanode.VolumeScanner (VolumeScanner.java:run(630)) - 
VolumeScanner(/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1,
 DS-f4df3404-9f02-470e-b202-75f5a4de29cb) exiting.
2016-07-11 16:16:07,891 INFO  [IPC Server handler 7 on 63546] 
blockmanagement.DatanodeDescriptor 
(DatanodeDescriptor.java:pruneStorageMap(517)) - Removed storage 
[DISK]DS-f4df3404-9f02-470e-b202-75f5a4de29cb:NORMAL:127.0.0.1:63548 from 
DataNode127.0.0.1:63548
2016-07-11 16:16:07,908 INFO  [IPC Server handler 9 on 63546] 
blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(866)) 
- Adding new storage ID DS-f4df3404-9f02-470e-b202-75f5a4de29cb for DN 
127.0.0.1:63548
2016-07-11 16:16:08,845 INFO  [PacketResponder: 
BP-1077872064-127.0.0.1-1468224964600:blk_1073741825_1001, 
type=LAST_IN_PIPELINE, downstreams=0:[]] impl.FsDatasetImpl 
(FsDatasetImpl.java:finalizeBlock(1559)) - finalizeBlock of object 
hash:1903955157
2016-07-11 16:16:12,933 INFO  [DataXceiver for client  at /127.0.0.1:63574 
[Receiving block BP-1077872064-127.0.0.1-1468224964600:blk_1073741825_1001]] 
impl.FsDatasetImpl (FsDatasetImpl.java:finalizeBlock(1559)) - finalizeBlock of 
object hash:1903955157
{noformat}
The UT passed.
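The root cause above can be shown without Mockito at all: synchronized methods take the intrinsic lock of the receiver object, so two distinct objects never exclude each other. Below is a minimal stand-alone sketch; the Dataset class and slowOp method are illustrative stand-ins, not HDFS code, and the second instance plays the role of Mockito.spy(data) (which wraps a new object rather than the original).

```java
// Demo: synchronized methods on two *different* objects do not exclude
// each other, because each object has its own monitor.
public class MonitorDemo {
    static class Dataset {
        synchronized void slowOp() throws InterruptedException {
            Thread.sleep(200); // holds this object's monitor for ~200 ms
        }
    }

    // Runs slowOp() on two distinct Dataset objects in parallel and
    // returns the total elapsed wall-clock time in milliseconds.
    static long elapsedMillis() throws InterruptedException {
        Dataset real = new Dataset();
        Dataset spyLike = new Dataset(); // stands in for Mockito.spy(real): a different object
        long start = System.nanoTime();
        Thread t1 = new Thread(() -> { try { real.slowOp(); } catch (InterruptedException ignored) {} });
        Thread t2 = new Thread(() -> { try { spyLike.slowOp(); } catch (InterruptedException ignored) {} });
        t1.start(); t2.start();
        t1.join(); t2.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        // With two objects the calls overlap (~200 ms total); if both threads
        // called the *same* object, the calls would serialize (~400 ms).
        System.out.println("elapsed ms: " + elapsedMillis());
    }
}
```

With the spy in place, the UT's finalizeBlock path synchronizes on the real object while removeVolumes synchronizes on the spy, so the test never exercises the mutual exclusion that exists in production.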

* Online
When dn.data.removeVolumes runs online, the calling thread enters 
FsVolumeImpl.closeAndWait() while holding the dn.data lock and waits for 
referenceCount() to reach 0. Meanwhile a DataXceiver thread that still holds a 
volume reference may be blocked on that same dn.data lock, so neither thread 
can make progress. This is the same deadlock pattern as HDFS-9874.
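The online hazard has the classic lock-held-while-waiting shape. The sketch below reproduces it with illustrative names (datasetLock, refCount, removerSucceeds are stand-ins, not real HDFS APIs) and bounds the wait so the demo terminates instead of hanging:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the online deadlock shape, with the remover's wait bounded
// to 500 ms so the demo exits rather than hanging forever.
public class RefCountDeadlockSketch {
    static final Object datasetLock = new Object();
    static final AtomicInteger refCount = new AtomicInteger(1);

    // Returns true only if the remover saw refCount reach 0 within its budget.
    static boolean removerSucceeds() throws InterruptedException {
        final boolean[] closed = {false};
        // Like removeVolumes -> FsVolumeImpl.closeAndWait(): holds the dataset
        // lock and polls the reference count.
        Thread remover = new Thread(() -> {
            synchronized (datasetLock) {
                long deadline = System.currentTimeMillis() + 500;
                while (refCount.get() != 0 && System.currentTimeMillis() < deadline) {
                    try { Thread.sleep(10); } catch (InterruptedException ignored) {}
                }
                closed[0] = refCount.get() == 0;
            }
        });
        // Like a DataXceiver finalizing a block: must take the same lock
        // before it can release its volume reference.
        Thread writer = new Thread(() -> {
            synchronized (datasetLock) {
                refCount.decrementAndGet();
            }
        });
        remover.start();
        Thread.sleep(50); // let the remover take the lock first
        writer.start();
        remover.join();
        writer.join();
        return closed[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(removerSucceeds()
            ? "volume closed" : "stuck: refCount never dropped");
    }
}
```

In the UT the writer goes through the spy's monitor instead of the real object's, so it is never blocked and the deadlock cannot be observed; online both threads contend on the same dn.data object.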

* Potential issue in project
Mockito.spy is used in many places across the project. Are there other tests 
that pass against the spy object while the same code path could deadlock 
online?



> Can not synchronized call method of object and Mockito.spy(object), So 
> UT:testRemoveVolumeBeingWritten passed but maybe deadlock online
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10605
>                 URL: https://issues.apache.org/jira/browse/HDFS-10605
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.7.0, 2.8.0, 2.7.1, 2.7.2
>            Reporter: ade
>              Labels: test
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
