[ https://issues.apache.org/jira/browse/HDFS-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Manoj Govindassamy updated HDFS-10780:
--------------------------------------
    Status: Patch Available  (was: Open)

> Block replication not proceeding after pipeline recovery --
> TestDataNodeHotSwapVolumes fails
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10780
>                 URL: https://issues.apache.org/jira/browse/HDFS-10780
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-10780.001.patch
>
>
> TestDataNodeHotSwapVolumes occasionally fails in the unit test
> testRemoveVolumeBeingWrittenForDatanode. The data write pipeline can run
> into problems such as timeouts or an unreachable datanode; in this test
> case the failure is deliberately induced by removing one of the volumes in
> a datanode while a block write is in progress. Digging further into the
> logs, when the problem occurs in the write pipeline, error recovery does
> not proceed as expected, so block replication never catches up.
>
> Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 44.495 sec
> <<< FAILURE!
> - in org.apache.hadoop.hdfs.serv
> testRemoveVolumeBeingWritten(org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes)
> Time elapsed: 44.354 se
> java.util.concurrent.TimeoutException: Timed out waiting for /test to reach 3 replicas
>
> Results :
>
> Tests in error:
>   TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten:637->testRemoveVolumeBeingWrittenForDatanode:714 » Timeout
>
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
>
> Following exceptions are not expected in this test run:
> {noformat}
> 614 2016-08-10 12:30:11,269 [DataXceiver for client DFSClient_NONMAPREDUCE_-640082112_10 at /127.0.0.1:58805 [Receiving block BP-1852988604-172.16.3.66-1470857409044:blk_1073741825_1001]] DEBUG datanode.DataNode (DataXceiver.java:run(320)) - 127.0.0.1:58789:Number of active connections is: 2
> 615 java.lang.IllegalMonitorStateException
> 616         at java.lang.Object.wait(Native Method)
> 617         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.waitVolumeRemoved(FsVolumeList.java:280)
> 618         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeVolumes(FsDatasetImpl.java:517)
> 619         at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:832)
> 620         at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:798)
> {noformat}
> {noformat}
> 720 2016-08-10 12:30:11,287 [DataNode: [[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/, [DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2/]] heartbeating to localhost/127.0.0.1:58788] ERROR datanode.DataNode (BPServiceActor.java:run(768)) - Exception in BPOfferService for Block pool BP-1852988604-172.16.3.66-1470857409044 (Datanode Uuid 711d58ad-919d-4350-af1e-99fa0b061244) service to localhost/127.0.0.1:58788
> 721 java.lang.NullPointerException
> 722         at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1841)
> 723         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:336)
> 724         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:624)
> 725         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:766)
> 726         at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
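[Editor's note] The first unexpected exception above, the IllegalMonitorStateException raised inside FsVolumeList.waitVolumeRemoved, follows from a general Java rule: Object.wait() may only be called by a thread that currently holds the monitor of the object being waited on. The minimal standalone sketch below (plain Java, not Hadoop code) reproduces that behavior under both conditions:

```java
// Demonstrates the Object.wait() monitor rule behind the
// IllegalMonitorStateException seen in FsVolumeList.waitVolumeRemoved.
public class MonitorDemo {
    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();

        // Calling wait() without holding lock's monitor fails immediately.
        try {
            lock.wait(10);
        } catch (IllegalMonitorStateException e) {
            System.out.println("caught: " + e.getClass().getSimpleName());
        }

        // Calling wait() while synchronized on the same object is legal;
        // with no notify(), the 10 ms timeout simply elapses.
        synchronized (lock) {
            lock.wait(10);
        }
        System.out.println("done");
    }
}
```

The fix direction implied by the trace is to ensure the wait in waitVolumeRemoved happens while the corresponding monitor is held (for example, inside a synchronized block on the object being waited on).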