[jira] [Commented] (HDFS-6804) race condition between transferring block and appending block causes "Unexpected checksum mismatch exception"
[ https://issues.apache.org/jira/browse/HDFS-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081972#comment-14081972 ] Gordon Wang commented on HDFS-6804: --- Some thoughts on how to fix this issue. I see two ways to fix it. * Option1: When the block is opened for appending, check whether any DataTransfer threads are transferring the block to other DNs, and stop those threads. Stopping them is safe because opening the block for append increases its generation timestamp, so the DataTransfer threads are sending outdated blocks anyway. * Option2: In the DataTransfer thread, if the replica of the block is finalized, read the checksum of the last data chunk into memory and record the replica length in memory as well. Then, when sending the last data chunk, use the in-memory checksum instead of reading it from disk. This mirrors how DataTransfer already handles an RBW replica. For Option1, it is hard to stop a DataTransfer thread unless we add code to the DataNode to manage DataTransfer threads. For Option2, we would have to lock the FsDatasetImpl object in the DataNode while reading the last chunk checksum from disk; otherwise, the last block might be overwritten. But reading from disk takes time, and performing expensive disk IO while holding the FsDatasetImpl lock might degrade DataNode performance. Any opinions or comments are welcome! Thanks. > race condition between transferring block and appending block causes > "Unexpected checksum mismatch exception" > -- > > Key: HDFS-6804 > URL: https://issues.apache.org/jira/browse/HDFS-6804 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.2.0 >Reporter: Gordon Wang > > We found some error log in the datanode. 
like this > {noformat} > 2014-07-22 01:49:51,338 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Ex > ception for BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 > java.io.IOException: Terminating due to a checksum error.java.io.IOException: > Unexpected checksum mismatch while writing > BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 from > /192.168.2.101:39495 > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:536) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:703) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:575) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221) > at java.lang.Thread.run(Thread.java:744) > {noformat} > While on the source datanode, the log says the block is transmitted. > {noformat} > 2014-07-22 01:49:50,805 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Da > taTransfer: Transmitted > BP-2072804351-192.168.2.104-1406008383435:blk_1073741997 > _9248 (numBytes=16188152) to /192.168.2.103:50010 > {noformat} > When the destination datanode gets the checksum mismatch, it reports bad > block to NameNode and NameNode marks the replica on the source datanode as > corrupt. But actually, the replica on the source datanode is valid. Because > the replica can pass the checksum verification. > In all, the replica on the source data is wrongly marked as corrupted. -- This message was sent by Atlassian JIRA (v6.2#6252)
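Option2 above can be sketched in plain Java. This is an illustrative model, not the actual Hadoop code: the class name, fields, and main() demo are all hypothetical. It shows why snapshotting the last-chunk checksum at transfer start makes the send immune to a concurrent append rewriting the on-disk checksum.

```java
import java.util.Arrays;

public class TransferSnapshot {
    final long replicaLength;        // replica length frozen at transfer start
    final byte[] lastChunkChecksum;  // checksum of the (possibly partial) last chunk

    TransferSnapshot(long replicaLength, byte[] lastChunkChecksum) {
        this.replicaLength = replicaLength;
        // defensive copy: a later append rewriting the on-disk bytes
        // cannot touch this snapshot
        this.lastChunkChecksum = lastChunkChecksum.clone();
    }

    // The sender uses the cached checksum for the final chunk instead of
    // re-reading it from disk.
    byte[] checksumForLastChunk() {
        return lastChunkChecksum;
    }

    public static void main(String[] args) {
        byte[] onDisk = {1, 2, 3, 4};
        TransferSnapshot snap = new TransferSnapshot(16188152L, onDisk);
        Arrays.fill(onDisk, (byte) 9);  // simulate an append overwriting the checksum
        System.out.println(Arrays.toString(snap.checksumForLastChunk()));  // [1, 2, 3, 4]
    }
}
```

The trade-off named in the comment still applies: taking this snapshot safely would require holding the FsDatasetImpl lock while the checksum is read from disk.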
[jira] [Commented] (HDFS-6773) MiniDFSCluster can run dramatically faster
[ https://issues.apache.org/jira/browse/HDFS-6773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081959#comment-14081959 ] Stephen Chu commented on HDFS-6773: --- Thanks, Daryn. That sounds like a good approach. I see 2 tests that call {{EditLogFileOutputStream.setShouldSkipFsyncForTesting(false);}}: TestFsDatasetCache.java TestCacheDirectives.java Checked with Andrew and Colin, and we think that fsync is probably not a requirement for the caching tests because the unit tests aren't meant to be run with a power-cycle in between. Will look into it more, as well as go through the rest of the HDFS tests to see if any need fsync. > MiniDFSCluster can run dramatically faster > -- > > Key: HDFS-6773 > URL: https://issues.apache.org/jira/browse/HDFS-6773 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Stephen Chu > > The mini cluster is unnecessarily running with durable edit logs. The > following change cut runtime of a single test from ~30s to ~10s. > {code}EditLogFileOutputStream.setShouldSkipFsyncForTesting(true);{code} > The mini cluster should default to this behavior after identifying the few > edit log tests that probably depend on durable logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
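The {{setShouldSkipFsyncForTesting}} switch follows a common static test-flag pattern: a test-only flag turns the durable sync into a no-op. A minimal self-contained sketch (illustrative names and counter; not Hadoop's actual EditLogFileOutputStream):

```java
public class EditLogStream {
    private static boolean skipFsyncForTesting = false;
    static int fsyncCount = 0;  // exposed so the sketch can show the effect

    // Mirrors the test hook named in the comment above (the real method
    // lives on EditLogFileOutputStream; this class is a stand-in).
    public static void setShouldSkipFsyncForTesting(boolean skip) {
        skipFsyncForTesting = skip;
    }

    void flushAndSync() {
        // ... buffered edits would be written here ...
        if (!skipFsyncForTesting) {
            fsyncCount++;  // stands in for the expensive channel.force(true)
        }
    }

    public static void main(String[] args) {
        EditLogStream log = new EditLogStream();
        log.flushAndSync();                  // durable: fsync happens
        setShouldSkipFsyncForTesting(true);  // what the mini cluster would default to
        log.flushAndSync();                  // fsync skipped -> much faster tests
        System.out.println(fsyncCount);
    }
}
```

Tests that genuinely depend on durable logs would call the hook with {{false}}, which is exactly what the two caching tests above do today.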
[jira] [Commented] (HDFS-6804) race condition between transferring block and appending block causes "Unexpected checksum mismatch exception"
[ https://issues.apache.org/jira/browse/HDFS-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081955#comment-14081955 ] Gordon Wang commented on HDFS-6804: --- After checking the DataNode block transfer code, I found a race condition while transferring a block to another datanode. The race causes the source datanode to transfer the wrong checksum for the last chunk of the replica. Here is the root cause. # DataNode DN1 receives a transfer-block command from the NameNode; say the command asks DN1 to transfer block B1 to DataNode DN2. # DN1 creates a new DataTransfer thread, which is responsible for transferring B1 to DN2. # When the DataTransfer thread is created, the replica of B1 is in Finalized state, so DataTransfer reads the replica content and checksum directly from disk and sends them to DN2. # While DataTransfer is sending data to DN2, block B1 is opened for appending. If the last data chunk of B1 is not full, its checksum will be overwritten by the BlockReceiver thread. # The DataTransfer thread records the block length as the length before the append. Here comes the problem: when the DataTransfer thread sends the last data chunk to DN2, it reads the checksum of that chunk from disk and sends it along. But by this time the checksum has changed, because more data has been appended to the last data chunk. # When DN2 receives the last data chunk and checksum, it throws the checksum mismatch exception. Reproduction steps. Prerequisites # Change the code in DataNode.java to sleep a while before sending the block. Make this change in the DataTransfer.run method. 
{code} // hack code here try { LOG.warn("sleep 10 seconds before transfer the block:" + b); Thread.sleep(1000L * 10); } catch (InterruptedException ie) { LOG.error("exception caught."); } // hack code end // send data & checksum blockSender.sendBlock(out, unbufOut, null); {code} Steps # Create an HDFS cluster which has 1 NameNode NN and 1 DataNode DN1. # Create a file F1 whose expected replication factor is 3. Write some data to the file and close it. # Start a new DataNode DN2 to join the cluster. # Grep the log of DN1; while the DataTransfer thread is sleeping, open F1, append some data, and hflush the data to DN1. Then you can see that DN2 throws a checksum mismatch exception when receiving the last block of file F1. > race condition between transferring block and appending block causes > "Unexpected checksum mismatch exception" > -- > > Key: HDFS-6804 > URL: https://issues.apache.org/jira/browse/HDFS-6804 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.2.0 >Reporter: Gordon Wang > > We found some error log in the datanode. 
like this > {noformat} > 2014-07-22 01:49:51,338 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Ex > ception for BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 > java.io.IOException: Terminating due to a checksum error.java.io.IOException: > Unexpected checksum mismatch while writing > BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 from > /192.168.2.101:39495 > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:536) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:703) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:575) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221) > at java.lang.Thread.run(Thread.java:744) > {noformat} > While on the source datanode, the log says the block is transmitted. > {noformat} > 2014-07-22 01:49:50,805 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Da > taTransfer: Transmitted > BP-2072804351-192.168.2.104-1406008383435:blk_1073741997 > _9248 (numBytes=16188152) to /192.168.2.103:50010 > {noformat} > When the destination datanode gets the checksum mismatch, it reports bad > block to NameNode and NameNode marks the replica on the source datanode as > corrupt. But actually, the replica on the source datanode is valid. Because > the replica can pass the checksum verification. > In all, the replica on the source data is wrongly marked as corrupted. -- This message was sent by Atlassian JIRA (v6.2#6252)
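The interleaving in steps 4-5 above can be shown deterministically in a few lines of plain Java (no Hadoop; all names are hypothetical): the transfer freezes the replica length up front, but re-reads the last-chunk checksum from disk after an append has rewritten it.

```java
public class ChecksumRace {
    // Returns the checksum the transfer actually sends after an append has
    // rewritten the on-disk last-chunk checksum underneath it.
    static byte checksumSentAfterAppend() {
        byte[] lastChunkChecksumOnDisk = {0x11};  // checksum of the original 3-byte chunk
        long lengthAtTransferStart = 3;           // DataTransfer records this up front

        // Meanwhile, an append grows the same partial chunk and rewrites
        // its checksum in place on disk:
        lastChunkChecksumOnDisk[0] = 0x22;

        // The transfer sends `lengthAtTransferStart` bytes of data, but it
        // reads the checksum fresh from disk -- which no longer matches
        // that payload.
        return lastChunkChecksumOnDisk[0];
    }

    public static void main(String[] args) {
        byte expectedForOldChunk = 0x11;
        // false: the receiver sees data and checksum that disagree, hence
        // the "Unexpected checksum mismatch" on DN2
        System.out.println(checksumSentAfterAppend() == expectedForOldChunk);
    }
}
```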
[jira] [Created] (HDFS-6804) race condition between transferring block and appending block causes "Unexpected checksum mismatch exception"
Gordon Wang created HDFS-6804: - Summary: race condition between transferring block and appending block causes "Unexpected checksum mismatch exception" Key: HDFS-6804 URL: https://issues.apache.org/jira/browse/HDFS-6804 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: Gordon Wang We found some error log in the datanode. like this {noformat} 2014-07-22 01:49:51,338 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Ex ception for BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 java.io.IOException: Terminating due to a checksum error.java.io.IOException: Unexpected checksum mismatch while writing BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 from /192.168.2.101:39495 at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:536) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:703) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:575) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221) at java.lang.Thread.run(Thread.java:744) {noformat} While on the source datanode, the log says the block is transmitted. {noformat} 2014-07-22 01:49:50,805 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Da taTransfer: Transmitted BP-2072804351-192.168.2.104-1406008383435:blk_1073741997 _9248 (numBytes=16188152) to /192.168.2.103:50010 {noformat} When the destination datanode gets the checksum mismatch, it reports bad block to NameNode and NameNode marks the replica on the source datanode as corrupt. But actually, the replica on the source datanode is valid. Because the replica can pass the checksum verification. In all, the replica on the source data is wrongly marked as corrupted. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6802) Some tests in TestDFSClientFailover are missing @Test annotation
[ https://issues.apache.org/jira/browse/HDFS-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081921#comment-14081921 ] Akira AJISAKA commented on HDFS-6802: - Attached a patch to # add {{@Test}} annotation # fix {{testWrappedFailoverProxyProvider()}} failure by setting {{SecurityUtil}} not to use IP address for token service. > Some tests in TestDFSClientFailover are missing @Test annotation > > > Key: HDFS-6802 > URL: https://issues.apache.org/jira/browse/HDFS-6802 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.5.0 >Reporter: Akira AJISAKA > Labels: newbie > Attachments: HDFS-6802.patch > > > HDFS-6334 added new tests in TestDFSClientFailover but they are not executed > by Junit framework because they don't have {{@Test}} annotation. -- This message was sent by Atlassian JIRA (v6.2#6252)
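For context on why the unannotated tests silently pass: JUnit 4 runners discover test methods by the {{@Test}} annotation, not by the {{test*}} naming convention JUnit 3 used. A self-contained sketch (it uses a stand-in annotation instead of org.junit.Test so it runs without JUnit on the classpath):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.Arrays;

public class TestDiscovery {
    @Retention(RetentionPolicy.RUNTIME)
    @interface Test {}  // stand-in for org.junit.Test

    public void testForgotten() {}  // no annotation: a JUnit-4-style runner never finds it

    @Test
    public void testRuns() {}       // discovered via the annotation

    // Mimics annotation-based discovery: count methods carrying @Test.
    static long discovered() {
        return Arrays.stream(TestDiscovery.class.getDeclaredMethods())
                     .filter(m -> m.isAnnotationPresent(Test.class))
                     .count();
    }

    public static void main(String[] args) {
        System.out.println(discovered());  // 1 -- only testRuns is found
    }
}
```

So a method added without {{@Test}} compiles cleanly and is simply never executed, which is exactly how the HDFS-6334 tests went missing.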
[jira] [Updated] (HDFS-6802) Some tests in TestDFSClientFailover are missing @Test annotation
[ https://issues.apache.org/jira/browse/HDFS-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-6802: Assignee: Akira AJISAKA Target Version/s: 2.5.0 Status: Patch Available (was: Open) > Some tests in TestDFSClientFailover are missing @Test annotation > > > Key: HDFS-6802 > URL: https://issues.apache.org/jira/browse/HDFS-6802 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.5.0 >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA > Labels: newbie > Attachments: HDFS-6802.patch > > > HDFS-6334 added new tests in TestDFSClientFailover but they are not executed > by Junit framework because they don't have {{@Test}} annotation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6802) Some tests in TestDFSClientFailover are missing @Test annotation
[ https://issues.apache.org/jira/browse/HDFS-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-6802: Attachment: HDFS-6802.patch > Some tests in TestDFSClientFailover are missing @Test annotation > > > Key: HDFS-6802 > URL: https://issues.apache.org/jira/browse/HDFS-6802 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.5.0 >Reporter: Akira AJISAKA > Labels: newbie > Attachments: HDFS-6802.patch > > > HDFS-6334 added new tests in TestDFSClientFailover but they are not executed > by Junit framework because they don't have {{@Test}} annotation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6803) Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context
stack created HDFS-6803: --- Summary: Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context Key: HDFS-6803 URL: https://issues.apache.org/jira/browse/HDFS-6803 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.1 Reporter: stack Attachments: DocumentingDFSClientDFSInputStream (1).pdf Reviews of the patch posted to the parent task suggest that we be more explicit about how DFSIS is expected to behave when being read by contending threads. It is also suggested that internal presumptions be made explicit by documenting expectations. Before we put up a patch, we have made a document of assertions we'd like to make into tenets of DFSInputStream. If there is agreement, we'll attach to this issue a patch that weaves the assumptions into DFSIS as javadoc and class comments. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6803) Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context
[ https://issues.apache.org/jira/browse/HDFS-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-6803: Attachment: DocumentingDFSClientDFSInputStream (1).pdf First cut. Please review and advise if we overstep. Thanks. > Documenting DFSClient#DFSInputStream expectations reading and preading in > concurrent context > > > Key: HDFS-6803 > URL: https://issues.apache.org/jira/browse/HDFS-6803 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: 2.4.1 >Reporter: stack > Attachments: DocumentingDFSClientDFSInputStream (1).pdf > > > Reviews of the patch posted the parent task suggest that we be more explicit > about how DFSIS is expected to behave when being read by contending threads. > It is also suggested that presumptions made internally be made explicit > documenting expectations. > Before we put up a patch we've made a document of assertions we'd like to > make into tenets of DFSInputSteam. If agreement, we'll attach to this issue > a patch that weaves the assumptions into DFSIS as javadoc and class comments. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6798) Add test case for incorrect data node condition during balancing
[ https://issues.apache.org/jira/browse/HDFS-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081909#comment-14081909 ] Hadoop QA commented on HDFS-6798: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12659010/HDFS-6798.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7522//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7522//console This message is automatically generated. > Add test case for incorrect data node condition during balancing > > > Key: HDFS-6798 > URL: https://issues.apache.org/jira/browse/HDFS-6798 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.4.1 >Reporter: Benoy Antony >Assignee: Benoy Antony > Attachments: HDFS-6798.patch > > > The Balancer makes a check to see if a block's location is a known data node. > But the variable it uses to check is wrong. This issue was fixed in HDFS-6364. > There was no way to easily unit test it at that time. 
Since HDFS-6441 enables > one to simulate this case, it was decided to add the unit test once HDFS-6441 > is resolved. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-3482) hdfs balancer throws ArrayIndexOutOfBoundsException if option is specified without arguments
[ https://issues.apache.org/jira/browse/HDFS-3482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081902#comment-14081902 ] Hudson commented on HDFS-3482: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5993 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5993/]) HDFS-3482. Update CHANGES.txt. (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1615019) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > hdfs balancer throws ArrayIndexOutOfBoundsException if option is specified > without arguments > > > Key: HDFS-3482 > URL: https://issues.apache.org/jira/browse/HDFS-3482 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Affects Versions: 2.0.0-alpha >Reporter: Stephen Chu >Assignee: madhukara phatak >Priority: Minor > Labels: newbie > Fix For: 3.0.0, 2.6.0 > > Attachments: HDFS-3482-1.patch, HDFS-3482-2.patch, HDFS-3482-3.patch, > HDFS-3482-4.patch, HDFS-3482-4.patch, HDFS-3482.patch > > > When running the hdfs balancer with an option but no argument, we run into an > ArrayIndexOutOfBoundsException. It's preferable to print the usage. 
> {noformat} > bash-3.2$ hdfs balancer -threshold > Usage: java Balancer > [-policy ]the balancing policy: datanode or blockpool > [-threshold ] Percentage of disk capacity > Balancing took 261.0 milliseconds > 12/05/31 09:38:46 ERROR balancer.Balancer: Exiting balancer due an exception > java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.parse(Balancer.java:1505) > at > org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:1482) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at > org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1555) > bash-3.2$ hdfs balancer -policy > Usage: java Balancer > [-policy ]the balancing policy: datanode or blockpool > [-threshold ] Percentage of disk capacity > Balancing took 261.0 milliseconds > 12/05/31 09:39:03 ERROR balancer.Balancer: Exiting balancer due an exception > java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.parse(Balancer.java:1520) > at > org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:1482) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at > org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1555) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
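The fix boils down to a bounds check before consuming an option's argument. A hedged sketch of the pattern (hypothetical parser and method names; the real code lives in Balancer.Cli.parse):

```java
public class ArgCheck {
    // Parse "-threshold <value>", printing usage instead of throwing
    // ArrayIndexOutOfBoundsException when the value is missing.
    static String parseThreshold(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if ("-threshold".equalsIgnoreCase(args[i])) {
                if (i + 1 >= args.length) {  // the missing guard
                    System.err.println("Usage: balancer [-threshold <threshold>]");
                    return null;             // instead of args[i + 1] -> AIOOBE
                }
                return args[i + 1];
            }
        }
        return null;  // option not given
    }

    public static void main(String[] args) {
        System.out.println(parseThreshold(new String[]{"-threshold", "10"}));  // 10
        System.out.println(parseThreshold(new String[]{"-threshold"}));        // null
    }
}
```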
[jira] [Commented] (HDFS-6685) Balancer should preserve storage type of replicas
[ https://issues.apache.org/jira/browse/HDFS-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081900#comment-14081900 ] Hudson commented on HDFS-6685: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5993 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5993/]) HDFS-6685. Balancer should preserve storage type of replicas. (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1615015) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/StorageType.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/BalancingPolicy.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlocksWithLocations.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/EnumCounters.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/EnumDoubles.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocolPB/TestPBHelper.java > Balancer should preserve storage type of replicas > - > > Key: HDFS-6685 > URL: https://issues.apache.org/jira/browse/HDFS-6685 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > Fix For: 2.6.0 > > Attachments: 
h6685_20140728.patch, h6685_20140729.patch, > h6685_20140730.patch, h6685_20140730b.patch, h6685_20140731.patch > > > When Balancer moves replicas to balance the cluster, it should always move > replicas from a storage with any type to another storage with the same type, > i.e. it preserves storage type of replicas. It does not make sense to move > replicas to a different storage type. -- This message was sent by Atlassian JIRA (v6.2#6252)
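The rule the summary states reduces to a one-line predicate on each candidate move. An illustrative sketch (hypothetical enum values and method; the actual patch touches StorageType.java, Balancer.java, and BalancingPolicy.java):

```java
public class StorageMove {
    enum StorageType { DISK, SSD, ARCHIVE }  // illustrative values only

    // A balancer move is only legal when it preserves the replica's
    // storage type: same-type source and target storages.
    static boolean isLegalMove(StorageType source, StorageType target) {
        return source == target;
    }

    public static void main(String[] args) {
        System.out.println(isLegalMove(StorageType.DISK, StorageType.DISK));  // true
        System.out.println(isLegalMove(StorageType.DISK, StorageType.SSD));   // false
    }
}
```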
[jira] [Commented] (HDFS-6797) DataNode logs wrong layoutversion during upgrade
[ https://issues.apache.org/jira/browse/HDFS-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081901#comment-14081901 ] Hudson commented on HDFS-6797: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5993 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5993/]) HDFS-6797. DataNode logs wrong layoutversion during upgrade. (Contributed by Benoy Antony) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1615017) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java > DataNode logs wrong layoutversion during upgrade > > > Key: HDFS-6797 > URL: https://issues.apache.org/jira/browse/HDFS-6797 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.4.1 >Reporter: Benoy Antony >Assignee: Benoy Antony > Fix For: 3.0.0, 2.6.0 > > Attachments: HDFS-6797.patch > > > Before upgrade, data node version was -55. The new data node version remained > at -55. During upgrade we got the following messages: > {code} > 2014-07-15 12:59:55,253 INFO org.apache.hadoop.hdfs.server.common.Storage: > Data-node version: -55 and name-node layout version: -56 > ... > ... > ... > ... > 2014-07-15 12:59:56,479 INFO org.apache.hadoop.hdfs.server.common.Storage: > Upgrading block pool storage directory > /hadoop/1/data1/current/BP-825373266-xx.xxx.xxx.xx-1379095203239. >old LV = -55; old CTime = 1402508907789. 
>new LV = -56; new CTime = 1405453914270 > 2014-07-15 13:00:07,697 INFO org.apache.hadoop.hdfs.server.common.Storage: > Upgrade of block pool BP-825373266-xx.xxx.xxx.xx-1379095203239 at > /hadoop/12/data1/current/BP-825373266-xx.xxx.xxx.xx-1379095203239 is complete > 2014-07-15 13:00:07,839 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Setting up storage: > nsid=859725752;bpid=BP-825373266-xx.xxx.xxx.xx-1379095203239;lv=-55;nsInfo=lv=-56;cid=CID-303ee504-e03c-4a5e-bc59-2b275b308152;nsid=859725752;c=1405453914270;bpid=BP-82537326 > 6-xx.xxx.xxx.xx-1379095203239;dnuuid=b1011b87-d7cd-48ce-92cc-f7cca0e8cbae > {code} > after upgrade completing, restart of DN still shows message regarding version > difference: > {code} > INFO org.apache.hadoop.hdfs.server.common.Storage: Data-node version: -55 and > name-node layout version: -56 > {code} > This causes confusion to the operators as if upgrade did not succeed since > data node's layout version is not updated to the "new LV" value > Actually name node's layout version is displayed as the "new LV" value. > Since the data node and name node layout versions are separate now, the new > data node layout version should be shown as the “new LV”. > Thanks to [~ehf] who found and reported this issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5185) DN fails to startup if one of the data dir is full
[ https://issues.apache.org/jira/browse/HDFS-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-5185: Attachment: HDFS-5185-002.patch Attaching the updated patch. After recent changes, {{checkDiskError()}} triggers a periodic thread which checks for disk errors asynchronously. But this issue requires a synchronous check for errors before initializing block pools. Accordingly, the patch checks for errors synchronously before initializing block pools, excluding failed disks to avoid startup failures. Please review. > DN fails to startup if one of the data dir is full > -- > > Key: HDFS-5185 > URL: https://issues.apache.org/jira/browse/HDFS-5185 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Vinayakumar B >Assignee: Vinayakumar B >Priority: Critical > Attachments: HDFS-5185-002.patch, HDFS-5185.patch > > > DataNode fails to startup if one of the data dirs configured is out of space. > fails with following exception > {noformat}2013-09-11 17:48:43,680 FATAL > org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for > block pool Block pool (storage id > DS-308316523-xx.xx.xx.xx-64015-1378896293604) service to /nn1:65110 > java.io.IOException: Mkdirs failed to create > /opt/nish/data/current/BP-123456-1234567/tmp > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.(BlockPoolSlice.java:105) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:216) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.addBlockPool(FsVolumeList.java:155) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.addBlockPool(FsDatasetImpl.java:1593) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:834) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:311) > at > 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:217) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660) > at java.lang.Thread.run(Thread.java:662) > {noformat} > It should continue to start-up with other data dirs available. -- This message was sent by Atlassian JIRA (v6.2#6252)
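The behavior the patch aims for, in miniature (hypothetical types and method names, not the actual FsVolumeList code): probe each data dir synchronously, skip the ones that fail, and only abort startup when no volume survives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class VolumeInit {
    static List<String> initVolumes(List<String> dataDirs, Predicate<String> healthyCheck) {
        List<String> healthy = new ArrayList<>();
        for (String dir : dataDirs) {
            if (healthyCheck.test(dir)) {
                healthy.add(dir);  // block pool would be initialized on this volume
            } else {
                // log and skip instead of letting the IOException abort startup
                System.err.println("Skipping failed volume: " + dir);
            }
        }
        if (healthy.isEmpty()) {
            // only abort when every configured data dir has failed
            throw new IllegalStateException("All configured data dirs failed");
        }
        return healthy;
    }

    public static void main(String[] args) {
        List<String> dirs = List.of("/data1", "/data2-full", "/data3");
        System.out.println(initVolumes(dirs, d -> !d.contains("full")));  // [/data1, /data3]
    }
}
```

In the real DataNode the healthy-check would be the synchronous disk-error check the comment describes, run before block pool initialization rather than by the asynchronous periodic thread.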
[jira] [Created] (HDFS-6802) Some tests in TestDFSClientFailover are missing @Test annotation
Akira AJISAKA created HDFS-6802: --- Summary: Some tests in TestDFSClientFailover are missing @Test annotation Key: HDFS-6802 URL: https://issues.apache.org/jira/browse/HDFS-6802 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.5.0 Reporter: Akira AJISAKA HDFS-6334 added new tests in TestDFSClientFailover but they are not executed by Junit framework because they don't have {{@Test}} annotation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5185) DN fails to startup if one of the data dir is full
[ https://issues.apache.org/jira/browse/HDFS-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081885#comment-14081885 ] Hadoop QA commented on HDFS-5185: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12627603/HDFS-5185.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7521//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7521//console This message is automatically generated. 
> DN fails to startup if one of the data dir is full > -- > > Key: HDFS-5185 > URL: https://issues.apache.org/jira/browse/HDFS-5185 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Vinayakumar B >Assignee: Vinayakumar B >Priority: Critical > Attachments: HDFS-5185.patch > > > DataNode fails to startup if one of the data dirs configured is out of space. > fails with following exception > {noformat}2013-09-11 17:48:43,680 FATAL > org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for > block pool Block pool (storage id > DS-308316523-xx.xx.xx.xx-64015-1378896293604) service to /nn1:65110 > java.io.IOException: Mkdirs failed to create > /opt/nish/data/current/BP-123456-1234567/tmp > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.(BlockPoolSlice.java:105) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:216) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.addBlockPool(FsVolumeList.java:155) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.addBlockPool(FsDatasetImpl.java:1593) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:834) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:311) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:217) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660) > at java.lang.Thread.run(Thread.java:662) > {noformat} > It should continue to start-up with other data dirs available. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6791) A block could remain under replicated if all of its replicas are on decommissioned nodes
[ https://issues.apache.org/jira/browse/HDFS-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081882#comment-14081882 ] Hadoop QA commented on HDFS-6791: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12659022/HDFS-6791.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7520//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7520//console This message is automatically generated. > A block could remain under replicated if all of its replicas are on > decommissioned nodes > > > Key: HDFS-6791 > URL: https://issues.apache.org/jira/browse/HDFS-6791 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HDFS-6791.patch > > > Here is the scenario. > 1. Normally before NN transitions a DN to decommissioned state, enough > replicas have been copied to other "in service" DNs. 
However, in some rare situations, the cluster got into a state where a DN is in decommissioned state and a block's only replica is on that DN. In such a state, the replication count reported by fsck is 1; the block just stays in the under-replicated state; applications can still read the data, given that a decommissioned node can serve read traffic. > This can happen in some error situations such as DN failure or NN failover. For example: > a) a block's only replica is temporarily on node A. > b) The decommission process is started on node A. > c) While node A is in "decommission-in-progress" state, node A crashes. NN will mark node A as dead. > d) After node A rejoins the cluster, NN will mark node A as decommissioned. > 2. In theory, NN should take care of under-replicated blocks. But it doesn't for this special case where the only replica is on a decommissioned node, because NN has the policy that "a decommissioned node can't be picked as the source node for replication". > {noformat} > BlockManager.java > chooseSourceDatanode > // never use already decommissioned nodes > if(node.isDecommissioned()) > continue; > {noformat} > 3. Given that NN marks the node as decommissioned, admins will shut down the datanode. Under-replicated blocks then turn into missing blocks. > 4. The workaround is to recommission the node so that NN can start the replication from it. -- This message was sent by Atlassian JIRA (v6.2#6252)
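The policy quoted above could be relaxed so that a decommissioned node is still usable as a last-resort replication source. A minimal, hypothetical sketch — SourceChooser and its parameters are illustrative names, not the actual BlockManager API:

```java
// Hypothetical sketch (not the actual chooseSourceDatanode code): allow a
// decommissioned node as the replication source only when it holds the
// sole remaining replica, avoiding the stuck under-replicated state.
class SourceChooser {
    /**
     * @param decommissioned     whether the candidate node is decommissioned
     * @param otherLiveReplicas  replicas available on in-service nodes
     * @return true if the node may be picked as the replication source
     */
    static boolean mayUseAsSource(boolean decommissioned, int otherLiveReplicas) {
        if (decommissioned) {
            // Fall back to a decommissioned source only when nothing else
            // holds the block.
            return otherLiveReplicas == 0;
        }
        return true;
    }
}
```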
[jira] [Commented] (HDFS-4901) Site Scripting and Phishing Through Frames in browseDirectory.jsp
[ https://issues.apache.org/jira/browse/HDFS-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081864#comment-14081864 ] Hadoop QA commented on HDFS-4901: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12627367/HDFS-4901_branch-1.2.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7523//console This message is automatically generated. > Site Scripting and Phishing Through Frames in browseDirectory.jsp > - > > Key: HDFS-4901 > URL: https://issues.apache.org/jira/browse/HDFS-4901 > Project: Hadoop HDFS > Issue Type: Bug > Components: security, webhdfs >Affects Versions: 1.2.1 >Reporter: Jeffrey E Rodriguez >Assignee: Vivek Ganesan >Priority: Blocker > Attachments: HDFS-4901.patch, HDFS-4901.patch.1, > HDFS-4901_branch-1.2.patch > > Original Estimate: 24h > Time Spent: 24h > Remaining Estimate: 0h > > It is possible to steal or manipulate customer session and cookies, which > might be used to impersonate a legitimate user, > allowing the hacker to view or alter user records, and to perform > transactions as that user. > e.g. > GET /browseDirectory.jsp? dir=%2Fhadoop'"/>alert(759) > &namenodeInfoPort=50070 > Also; > Phishing Through Frames > Try: > GET /browseDirectory.jsp? > dir=%2Fhadoop%27%22%3E%3Ciframe+src%3Dhttp%3A%2F%2Fdemo.testfire.net%2Fphishing.html%3E > &namenodeInfoPort=50070 HTTP/1.1 > Cookie: JSESSIONID=qd9i8tuccuam1cme71swr9nfi > Accept-Language: en-US > Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*; -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6783) Fix HDFS CacheReplicationMonitor rescan logic
[ https://issues.apache.org/jira/browse/HDFS-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081859#comment-14081859 ] Hadoop QA commented on HDFS-6783: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12659005/HDFS-6783.003.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7518//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7518//console This message is automatically generated. 
> Fix HDFS CacheReplicationMonitor rescan logic > - > > Key: HDFS-6783 > URL: https://issues.apache.org/jira/browse/HDFS-6783 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Affects Versions: 3.0.0 >Reporter: Yi Liu >Assignee: Yi Liu > Attachments: HDFS-6783.001.patch, HDFS-6783.002.patch, > HDFS-6783.003.patch > > > In the monitor thread, needsRescan is set to false before the real scan starts, so {{waitForRescanIfNeeded}} will return at the first condition: > {code} > if (!needsRescan) { > return; > } > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6789) TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7
[ https://issues.apache.org/jira/browse/HDFS-6789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081849#comment-14081849 ] Akira AJISAKA commented on HDFS-6789: - I applied the patch and confirmed the tests passed in two environments: * Oracle JDK7u40 in Mac OS X 10.9 * Oracle JDK7u65 in CentOS 6.4 > TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and > TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7 > > > Key: HDFS-6789 > URL: https://issues.apache.org/jira/browse/HDFS-6789 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.5.0 > Environment: jdk7 >Reporter: Rushabh S Shah >Assignee: Akira AJISAKA > Attachments: HDFS-6789.patch > > > The following two tests are failing on jdk7. > org.apache.hadoop.hdfs.TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI > org.apache.hadoop.hdfs.TestDFSClientFailover.testDoesntDnsResolveLogicalURI > On jdk6 it just skips the tests . -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6788) Improve synchronization in BPOfferService with read write lock
[ https://issues.apache.org/jira/browse/HDFS-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081847#comment-14081847 ] Yongjun Zhang commented on HDFS-6788: - Hi Andrew and Arpit, Thanks a lot for reviewing and addressing my question, and even taking care of triggering the build! > Improve synchronization in BPOfferService with read write lock > -- > > Key: HDFS-6788 > URL: https://issues.apache.org/jira/browse/HDFS-6788 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.5.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6788.001.patch, HDFS-6788.002.patch > > > Threads in DN (DataXceiver, PacketResponder, Async disk worker etc) may block > at BPOfferService.getBlockPoolId() when calling BPOfferService.checkBlock(), > though they are just reading the same blockpool id. This is unnecessary > overhead and may cause performance hit when many threads compete. Filing this > jira to replace synchronized method with read write lock > (ReentrantReadWriteLock). -- This message was sent by Atlassian JIRA (v6.2#6252)
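The change described in this JIRA — replacing a synchronized getter with a ReentrantReadWriteLock so concurrent readers of the block pool id do not serialize on a single monitor — can be sketched roughly as follows. Class and field names here are simplified placeholders, not the actual BPOfferService code:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch of the proposed locking scheme: many reader threads
// (DataXceiver, PacketResponder, async disk workers) can hold the read
// lock simultaneously; only the rare update takes the exclusive write lock.
class BlockPoolIdHolder {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private String blockPoolId;

    String getBlockPoolId() {
        lock.readLock().lock();   // shared: readers do not block each other
        try {
            return blockPoolId;
        } finally {
            lock.readLock().unlock();
        }
    }

    void setBlockPoolId(String id) {
        lock.writeLock().lock();  // exclusive: blocks all readers and writers
        try {
            this.blockPoolId = id;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```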
[jira] [Updated] (HDFS-6789) TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7
[ https://issues.apache.org/jira/browse/HDFS-6789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-6789: Target Version/s: 2.5.0 Status: Patch Available (was: Open) > TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and > TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7 > > > Key: HDFS-6789 > URL: https://issues.apache.org/jira/browse/HDFS-6789 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.5.0 > Environment: jdk7 >Reporter: Rushabh S Shah >Assignee: Akira AJISAKA > Attachments: HDFS-6789.patch > > > The following two tests are failing on jdk7. > org.apache.hadoop.hdfs.TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI > org.apache.hadoop.hdfs.TestDFSClientFailover.testDoesntDnsResolveLogicalURI > On jdk6 it just skips the tests . -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6789) TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7
[ https://issues.apache.org/jira/browse/HDFS-6789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-6789: Component/s: test > TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and > TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7 > > > Key: HDFS-6789 > URL: https://issues.apache.org/jira/browse/HDFS-6789 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.5.0 > Environment: jdk7 >Reporter: Rushabh S Shah >Assignee: Akira AJISAKA > Attachments: HDFS-6789.patch > > > The following two tests are failing on jdk7. > org.apache.hadoop.hdfs.TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI > org.apache.hadoop.hdfs.TestDFSClientFailover.testDoesntDnsResolveLogicalURI > On jdk6 it just skips the tests . -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6789) TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7
[ https://issues.apache.org/jira/browse/HDFS-6789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-6789: Attachment: HDFS-6789.patch Attaching a patch to spy NameSpace after initializing FileSystem. > TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and > TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7 > > > Key: HDFS-6789 > URL: https://issues.apache.org/jira/browse/HDFS-6789 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.5.0 > Environment: jdk7 >Reporter: Rushabh S Shah >Assignee: Akira AJISAKA > Attachments: HDFS-6789.patch > > > The following two tests are failing on jdk7. > org.apache.hadoop.hdfs.TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI > org.apache.hadoop.hdfs.TestDFSClientFailover.testDoesntDnsResolveLogicalURI > On jdk6 it just skips the tests . -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6787) Remove duplicate code in FSDirectory#unprotectedConcat
[ https://issues.apache.org/jira/browse/HDFS-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081838#comment-14081838 ] Yi Liu commented on HDFS-6787: -- Thanks Uma; yes, I have confirmed while debugging that the inodes are cleaned up correctly. > Remove duplicate code in FSDirectory#unprotectedConcat > -- > > Key: HDFS-6787 > URL: https://issues.apache.org/jira/browse/HDFS-6787 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.0.0 >Reporter: Yi Liu >Assignee: Yi Liu > Attachments: HDFS-6787.001.patch > > > {code} > // update inodeMap > removeFromInodeMap(Arrays.asList(allSrcInodes)); > {code} > This snippet of code is duplicated, since we already have the same logic above it: > {code} > for(INodeFile nodeToRemove: allSrcInodes) { > if(nodeToRemove == null) continue; > > nodeToRemove.setBlocks(null); > trgParent.removeChild(nodeToRemove, trgLatestSnapshot); > inodeMap.remove(nodeToRemove); > count++; > } > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6794) Update BlockManager methods to use DatanodeStorageInfo where possible
[ https://issues.apache.org/jira/browse/HDFS-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6794: -- Component/s: namenode Priority: Minor (was: Major) Hadoop Flags: Reviewed +1 patch looks good. > Update BlockManager methods to use DatanodeStorageInfo where possible > - > > Key: HDFS-6794 > URL: https://issues.apache.org/jira/browse/HDFS-6794 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal >Priority: Minor > Attachments: HDFS-6794.01.patch, HDFS-6794.02.patch, > HDFS-6794.03.patch, HDFS-6794.03.patch > > > Post HDFS-2832, BlockManager methods can be updated to accept > DatanodeStorageInfo instead of (DatanodeDescriptor + StorageID). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6789) TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7
[ https://issues.apache.org/jira/browse/HDFS-6789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081837#comment-14081837 ] Akira AJISAKA commented on HDFS-6789: - The tests fail because {code} FileSystem fs = HATestUtil.configureFailoverFs(cluster, conf); {code} calls {{NameNode.getAddress(nameNodeUri)}} to get {{InetSocketAddress}} for initializing {{ProxyAndInfo}} after HDFS-6507. Since the tests are meant to ensure that {{FileSystem}} and {{FileContext}} do not resolve the logical hostname, I think it's fine to spy NameService after initializing {{FileSystem}}. > TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and > TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7 > > > Key: HDFS-6789 > URL: https://issues.apache.org/jira/browse/HDFS-6789 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.5.0 > Environment: jdk7 >Reporter: Rushabh S Shah >Assignee: Akira AJISAKA > > The following two tests are failing on jdk7. > org.apache.hadoop.hdfs.TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI > org.apache.hadoop.hdfs.TestDFSClientFailover.testDoesntDnsResolveLogicalURI > On jdk6 it just skips the tests. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6801) Archival Storage: Add a new data migration tool
Tsz Wo Nicholas Sze created HDFS-6801: - Summary: Archival Storage: Add a new data migration tool Key: HDFS-6801 URL: https://issues.apache.org/jira/browse/HDFS-6801 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze The tool is similar to Balancer. It periodically scans the blocks in HDFS and uses the path and/or other metadata (e.g. mtime) to determine if a block should be cooled down (i.e. hot => warm, or warm => cold) or warmed up (i.e. cold => warm, or warm => hot). In contrast to Balancer, the migration tool always moves replicas to a different storage type. Similar to Balancer, replicas are moved in such a way that the number of racks the block spans does not decrease. -- This message was sent by Atlassian JIRA (v6.2#6252)
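The cool-down/warm-up decision described above could be driven by a simple age-based rule. The following is purely an illustrative sketch — the names, tiers, and thresholds are hypothetical and not part of the actual tool's design:

```java
// Toy sketch of an mtime-based tier decision. Thresholds are illustrative
// parameters, not real configuration keys.
class TierPolicy {
    enum Tier { HOT, WARM, COLD }

    /**
     * Pick a target tier from a block's age (time since its file's mtime).
     * Older blocks are "cooled down"; a block whose age falls below a
     * threshold again would be "warmed up".
     */
    static Tier tierFor(long ageMillis, long warmAfterMillis, long coldAfterMillis) {
        if (ageMillis >= coldAfterMillis) return Tier.COLD;
        if (ageMillis >= warmAfterMillis) return Tier.WARM;
        return Tier.HOT;
    }
}
```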
[jira] [Updated] (HDFS-6794) Update BlockManager methods to use DatanodeStorageInfo where possible
[ https://issues.apache.org/jira/browse/HDFS-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6794: Attachment: HDFS-6794.03.patch I missed that from the last Jenkins run, thanks Nicholas. Updated patch attached. > Update BlockManager methods to use DatanodeStorageInfo where possible > - > > Key: HDFS-6794 > URL: https://issues.apache.org/jira/browse/HDFS-6794 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Attachments: HDFS-6794.01.patch, HDFS-6794.02.patch, > HDFS-6794.03.patch, HDFS-6794.03.patch > > > Post HDFS-2832, BlockManager methods can be updated to accept > DatanodeStorageInfo instead of (DatanodeDescriptor + StorageID). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6784) Avoid rescan twice in HDFS CacheReplicationMonitor for one FS Op if it calls setNeedsRescan multiple times.
[ https://issues.apache.org/jira/browse/HDFS-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081811#comment-14081811 ] Yi Liu commented on HDFS-6784: -- Thanks [~cmccabe], let's wait to see whether we need to handle it separately after HDFS-6783. > Avoid rescan twice in HDFS CacheReplicationMonitor for one FS Op if it calls > setNeedsRescan multiple times. > --- > > Key: HDFS-6784 > URL: https://issues.apache.org/jira/browse/HDFS-6784 > Project: Hadoop HDFS > Issue Type: Improvement > Components: caching >Affects Versions: 3.0.0 >Reporter: Yi Liu >Assignee: Yi Liu > Attachments: HDFS-6784.001.patch > > > In HDFS CacheReplicationMonitor, a rescan is expensive. Sometimes {{setNeedsRescan}} is called multiple times; for example, in FSNamesystem#modifyCacheDirective it is called 3 times. In the monitor thread of CacheReplicationMonitor, if it sees that {{needsRescan}} is true, a rescan will happen, but {{needsRescan}} is set to false before the real scan. Meanwhile, the 2nd or 3rd call to {{setNeedsRescan}} may set {{needsRescan}} back to true. So after the scan finishes, a new rescan will be triggered in the next loop; that second rescan is unnecessary and inefficient. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6777) Supporting consistent edit log reads when in-progress edit log segments are included
[ https://issues.apache.org/jira/browse/HDFS-6777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Thomas updated HDFS-6777: --- Status: Patch Available (was: Open) > Supporting consistent edit log reads when in-progress edit log segments are > included > > > Key: HDFS-6777 > URL: https://issues.apache.org/jira/browse/HDFS-6777 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: qjm >Reporter: James Thomas >Assignee: James Thomas > Attachments: 6777-design.pdf, HDFS-6777.patch > > > For inotify, we want to be able to read transactions from in-progress edit > log segments so we can serve transactions to listeners soon after they are > committed. This JIRA works toward ensuring that we do not send unsync'ed > transactions back to the client by 1) discarding in-progress segments if we > have a finalized segment starting at the same transaction ID and 2) if there > are no finalized segments at the same transaction ID, using only the > in-progress segments with the largest seen lastWriterEpoch. See the design > document for more background and details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-6789) TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7
[ https://issues.apache.org/jira/browse/HDFS-6789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA reassigned HDFS-6789: --- Assignee: Akira AJISAKA > TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and > TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7 > > > Key: HDFS-6789 > URL: https://issues.apache.org/jira/browse/HDFS-6789 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.5.0 > Environment: jdk7 >Reporter: Rushabh S Shah >Assignee: Akira AJISAKA > > The following two tests are failing on jdk7. > org.apache.hadoop.hdfs.TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI > org.apache.hadoop.hdfs.TestDFSClientFailover.testDoesntDnsResolveLogicalURI > On jdk6 it just skips the tests . -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6783) Fix HDFS CacheReplicationMonitor rescan logic
[ https://issues.apache.org/jira/browse/HDFS-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081803#comment-14081803 ] Yi Liu commented on HDFS-6783: -- Thanks [~cmccabe], no need to apologize; our goal is to resolve the issue together :) The new version of the patch resolves the issue in {{waitForRescanIfNeeded}}, but I don't see that it solves the issue in HDFS-6784. Look at the following steps: {code} init state completedScanCount = 0; curScanCount = -1; neededScanCount = 1; setNeedsRescan-- completedScanCount = 0; curScanCount = -1; neededScanCount = 1; in while loop-- completedScanCount = 0; curScanCount = 1; neededScanCount = 1; setNeedsRescan- completedScanCount = 0; curScanCount = 1; neededScanCount = 2; rescan- after rescan completedScanCount = 1; curScanCount = -1; neededScanCount = 2;<--- completedScanCount < neededScanCount, so there will still be another unnecessary rescan. {code} > Fix HDFS CacheReplicationMonitor rescan logic > - > > Key: HDFS-6783 > URL: https://issues.apache.org/jira/browse/HDFS-6783 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Affects Versions: 3.0.0 >Reporter: Yi Liu >Assignee: Yi Liu > Attachments: HDFS-6783.001.patch, HDFS-6783.002.patch, > HDFS-6783.003.patch > > > In the monitor thread, needsRescan is set to false before the real scan starts, so {{waitForRescanIfNeeded}} will return at the first condition: > {code} > if (!needsRescan) { > return; > } > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
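For reference, the counter scheme being traced above can be sketched as follows (condensed, hypothetical names — not the actual patch). In this sketch, multiple setNeedsRescan calls before a scan starts coalesce into one scan, while a call arriving mid-scan deliberately requests one follow-up scan, since the in-flight scan may already have passed the changed entries; whether that follow-up scan is necessary is precisely the point under debate in this thread:

```java
// Condensed sketch of a counter-based rescan scheme: scans are counted
// rather than flagged, so duplicate requests coalesce.
class RescanCounter {
    private long completedScanCount = 0;
    private long neededScanCount = 0;
    private long curScanCount = -1; // -1 means no scan is in flight

    synchronized void setNeedsRescan() {
        if (curScanCount >= 0) {
            // A scan is running; request the scan after it, since the
            // running scan may have already passed the changed entries.
            neededScanCount = curScanCount + 1;
        } else {
            // No scan running: the next scan suffices, however many times
            // this is called before it starts.
            neededScanCount = completedScanCount + 1;
        }
    }

    synchronized boolean needsRescan() {
        return completedScanCount < neededScanCount;
    }

    synchronized void beginScan() {
        curScanCount = completedScanCount + 1;
    }

    synchronized void endScan() {
        completedScanCount = curScanCount;
        curScanCount = -1;
    }
}
```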
[jira] [Commented] (HDFS-6757) Simplify lease manager with INodeID
[ https://issues.apache.org/jira/browse/HDFS-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081799#comment-14081799 ] Colin Patrick McCabe commented on HDFS-6757: [~daryn]: yeah, we could provide both an inode ID and a path in the close op. Maybe that's the best option here... > Simplify lease manager with INodeID > --- > > Key: HDFS-6757 > URL: https://issues.apache.org/jira/browse/HDFS-6757 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-6757.000.patch, HDFS-6757.001.patch, > HDFS-6757.002.patch, HDFS-6757.003.patch, HDFS-6757.004.patch > > > Currently the lease manager records leases based on path instead of inode > ids. Therefore, the lease manager needs to carefully keep track of the path > of active leases during renames and deletes. This can be a non-trivial task. > This jira proposes to simplify the logic by tracking leases using inodeids > instead of paths. -- This message was sent by Atlassian JIRA (v6.2#6252)
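The core idea of this JIRA — keying leases by immutable inode id so that renames and deletes do not force lease-table updates — can be shown with a toy sketch (hypothetical names, not the actual LeaseManager API):

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch only: a lease table keyed by inode id. Because the inode id
// is immutable, a rename of the file's path leaves the table untouched;
// with path keys, every rename would have to rewrite affected entries.
class LeaseTable {
    private final Map<Long, String> leaseHolderByInode = new HashMap<>();

    void addLease(long inodeId, String holder) {
        leaseHolderByInode.put(inodeId, holder);
    }

    String holderOf(long inodeId) {
        return leaseHolderByInode.get(inodeId); // null if no lease held
    }

    void removeLease(long inodeId) {
        leaseHolderByInode.remove(inodeId);
    }
}
```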
[jira] [Commented] (HDFS-6794) Update BlockManager methods to use DatanodeStorageInfo where possible
[ https://issues.apache.org/jira/browse/HDFS-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081798#comment-14081798 ] Tsz Wo Nicholas Sze commented on HDFS-6794: --- For the findbugs warnings, we should keep using dn instead of node in the code below. {code} if (node == null) { - throw new IOException("Cannot mark " + b - + " as corrupt because datanode " + dn + " (" + dn.getDatanodeUuid() + throw new IOException("Cannot mark " + blk + + " as corrupt because datanode " + node + " (" + node.getDatanodeUuid() + ") does not exist"); } {code} > Update BlockManager methods to use DatanodeStorageInfo where possible > - > > Key: HDFS-6794 > URL: https://issues.apache.org/jira/browse/HDFS-6794 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Attachments: HDFS-6794.01.patch, HDFS-6794.02.patch, > HDFS-6794.03.patch > > > Post HDFS-2832, BlockManager methods can be updated to accept > DatanodeStorageInfo instead of (DatanodeDescriptor + StorageID). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6685) Balancer should preserve storage type of replicas
[ https://issues.apache.org/jira/browse/HDFS-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6685: -- Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Arpit for reviewing the patches. I have committed this. > Balancer should preserve storage type of replicas > - > > Key: HDFS-6685 > URL: https://issues.apache.org/jira/browse/HDFS-6685 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > Fix For: 2.6.0 > > Attachments: h6685_20140728.patch, h6685_20140729.patch, > h6685_20140730.patch, h6685_20140730b.patch, h6685_20140731.patch > > > When Balancer moves replicas to balance the cluster, it should always move > replicas from a storage with any type to another storage with the same type, > i.e. it preserves storage type of replicas. It does not make sense to move > replicas to a different storage type. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (HDFS-6685) Balancer should preserve storage type of replicas
[ https://issues.apache.org/jira/browse/HDFS-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081793#comment-14081793 ] Tsz Wo Nicholas Sze edited comment on HDFS-6685 at 8/1/14 1:23 AM: --- Thanks Arpit and Vinay for reviewing the patches. I have committed this. was (Author: szetszwo): Thanks Arpit for reviewing the patches. I have committed this. > Balancer should preserve storage type of replicas > - > > Key: HDFS-6685 > URL: https://issues.apache.org/jira/browse/HDFS-6685 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > Fix For: 2.6.0 > > Attachments: h6685_20140728.patch, h6685_20140729.patch, > h6685_20140730.patch, h6685_20140730b.patch, h6685_20140731.patch > > > When Balancer moves replicas to balance the cluster, it should always move > replicas from a storage with any type to another storage with the same type, > i.e. it preserves storage type of replicas. It does not make sense to move > replicas to a different storage type. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6796) Improving the argument check during balancer command line parsing
[ https://issues.apache.org/jira/browse/HDFS-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6796: -- Priority: Minor (was: Major) Issue Type: Improvement (was: Bug) +1 patch looks good. (You are right that we should use "<" rather than "==". Thanks.) > Improving the argument check during balancer command line parsing > - > > Key: HDFS-6796 > URL: https://issues.apache.org/jira/browse/HDFS-6796 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.4.1 >Reporter: Benoy Antony >Assignee: Benoy Antony >Priority: Minor > Attachments: HDFS-6796.patch, HDFS-6796.patch > > > Currently the balancer CLI parser simply checks whether the total number of arguments is greater than 2 inside the loop. Since the check does not involve any loop variables, it is not a proper check when there are more than 2 arguments. -- This message was sent by Atlassian JIRA (v6.2#6252)
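The kind of bounds check being discussed — validating the loop index against the argument count with "<" before consuming a flag's value — can be sketched like this. ArgParser and parseThreshold are hypothetical names, not the actual Balancer CLI code:

```java
// Hypothetical sketch: a per-flag bounds check using the loop variable,
// so a flag given as the last argument without a value is caught cleanly.
class ArgParser {
    /** Returns the value following "-threshold", or null if absent. */
    static String parseThreshold(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if ("-threshold".equalsIgnoreCase(args[i])) {
                // "<" on the loop index: the value must actually exist,
                // regardless of how many total arguments were passed.
                if (i + 1 < args.length) {
                    return args[i + 1];
                }
                return null; // flag given without a value
            }
        }
        return null;
    }
}
```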
[jira] [Updated] (HDFS-6780) Batch the encryption zones listing API
[ https://issues.apache.org/jira/browse/HDFS-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-6780: -- Attachment: hdfs-6780.002.patch Fix a compile error > Batch the encryption zones listing API > -- > > Key: HDFS-6780 > URL: https://issues.apache.org/jira/browse/HDFS-6780 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: security >Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: hdfs-6780.001.patch, hdfs-6780.002.patch > > > To future-proof the API, it'd be better if the listEZs API returned a > RemoteIterator. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6788) Improve synchronization in BPOfferService with read write lock
[ https://issues.apache.org/jira/browse/HDFS-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081789#comment-14081789 ] Andrew Wang commented on HDFS-6788: --- It looks like precommit isn't picking this up for some reason. I manually triggered a build. > Improve synchronization in BPOfferService with read write lock > -- > > Key: HDFS-6788 > URL: https://issues.apache.org/jira/browse/HDFS-6788 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.5.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6788.001.patch, HDFS-6788.002.patch > > > Threads in DN (DataXceiver, PacketResponder, Async disk worker etc) may block > at BPOfferService.getBlockPoolId() when calling BPOfferService.checkBlock(), > though they are just reading the same blockpool id. This is unnecessary > overhead and may cause performance hit when many threads compete. Filing this > jira to replace synchronized method with read write lock > (ReentrantReadWriteLock). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6757) Simplify lease manager with INodeID
[ https://issues.apache.org/jira/browse/HDFS-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081786#comment-14081786 ] Hadoop QA commented on HDFS-6757: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658724/HDFS-6757.004.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7517//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7517//console This message is automatically generated. > Simplify lease manager with INodeID > --- > > Key: HDFS-6757 > URL: https://issues.apache.org/jira/browse/HDFS-6757 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-6757.000.patch, HDFS-6757.001.patch, > HDFS-6757.002.patch, HDFS-6757.003.patch, HDFS-6757.004.patch > > > Currently the lease manager records leases based on path instead of inode > ids. 
Therefore, the lease manager needs to carefully keep track of the path > of active leases during renames and deletes. This can be a non-trivial task. > This jira proposes to simplify the logic by tracking leases using inodeids > instead of paths. -- This message was sent by Atlassian JIRA (v6.2#6252)
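The benefit of keying by inode id can be shown with a toy model (hypothetical names, not the real LeaseManager API): an inode id is stable across renames, so a lease table keyed by it needs no bookkeeping when paths change, whereas a path-keyed table would have to rewrite every affected entry.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration: leases keyed by a stable inode id survive renames
// untouched; a path-keyed map would need per-rename fixups.
class InodeLeaseMap {
    private final Map<Long, String> leaseHolderByInode = new HashMap<>();

    void addLease(long inodeId, String holder) {
        leaseHolderByInode.put(inodeId, holder);
    }

    // A rename changes only the path, not the inode id, so lookups
    // after a rename need no adjustment at all.
    String holderAfterRename(long inodeId) {
        return leaseHolderByInode.get(inodeId);
    }
}
```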
[jira] [Updated] (HDFS-6797) DataNode logs wrong layoutversion during upgrade
[ https://issues.apache.org/jira/browse/HDFS-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6797: Resolution: Fixed Fix Version/s: 2.6.0 3.0.0 Target Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk and branch-2. Thanks for the contribution [~benoyantony]. > DataNode logs wrong layoutversion during upgrade > > > Key: HDFS-6797 > URL: https://issues.apache.org/jira/browse/HDFS-6797 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.4.1 >Reporter: Benoy Antony >Assignee: Benoy Antony > Fix For: 3.0.0, 2.6.0 > > Attachments: HDFS-6797.patch > > > Before upgrade, data node version was -55. The new data node version remained > at -55. During upgrade we got the following messages: > {code} > 2014-07-15 12:59:55,253 INFO org.apache.hadoop.hdfs.server.common.Storage: > Data-node version: -55 and name-node layout version: -56 > ... > ... > ... > ... > 2014-07-15 12:59:56,479 INFO org.apache.hadoop.hdfs.server.common.Storage: > Upgrading block pool storage directory > /hadoop/1/data1/current/BP-825373266-xx.xxx.xxx.xx-1379095203239. >old LV = -55; old CTime = 1402508907789. 
>new LV = -56; new CTime = 1405453914270 > 2014-07-15 13:00:07,697 INFO org.apache.hadoop.hdfs.server.common.Storage: > Upgrade of block pool BP-825373266-xx.xxx.xxx.xx-1379095203239 at > /hadoop/12/data1/current/BP-825373266-xx.xxx.xxx.xx-1379095203239 is complete > 2014-07-15 13:00:07,839 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Setting up storage: > nsid=859725752;bpid=BP-825373266-xx.xxx.xxx.xx-1379095203239;lv=-55;nsInfo=lv=-56;cid=CID-303ee504-e03c-4a5e-bc59-2b275b308152;nsid=859725752;c=1405453914270;bpid=BP-82537326 > 6-xx.xxx.xxx.xx-1379095203239;dnuuid=b1011b87-d7cd-48ce-92cc-f7cca0e8cbae > {code} > after upgrade completing, restart of DN still shows message regarding version > difference: > {code} > INFO org.apache.hadoop.hdfs.server.common.Storage: Data-node version: -55 and > name-node layout version: -56 > {code} > This causes confusion to the operators as if upgrade did not succeed since > data node's layout version is not updated to the "new LV" value > Actually name node's layout version is displayed as the "new LV" value. > Since the data node and name node layout versions are separate now, the new > data node layout version should be shown as the “new LV”. > Thanks to [~ehf] who found and reported this issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6797) DataNode logs wrong layoutversion during upgrade
[ https://issues.apache.org/jira/browse/HDFS-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6797: Summary: DataNode logs wrong layoutversion during upgrade (was: Misleading LayoutVersion information during data node upgrade) > DataNode logs wrong layoutversion during upgrade > > > Key: HDFS-6797 > URL: https://issues.apache.org/jira/browse/HDFS-6797 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.4.1 >Reporter: Benoy Antony >Assignee: Benoy Antony > Attachments: HDFS-6797.patch > > > Before upgrade, data node version was -55. The new data node version remained > at -55. During upgrade we got he following messages: > {code} > 2014-07-15 12:59:55,253 INFO org.apache.hadoop.hdfs.server.common.Storage: > Data-node version: -55 and name-node layout version: -56 > ... > ... > ... > ... > 2014-07-15 12:59:56,479 INFO org.apache.hadoop.hdfs.server.common.Storage: > Upgrading block pool storage directory > /hadoop/1/data1/current/BP-825373266-xx.xxx.xxx.xx-1379095203239. >old LV = -55; old CTime = 1402508907789. 
>new LV = -56; new CTime = 1405453914270 > 2014-07-15 13:00:07,697 INFO org.apache.hadoop.hdfs.server.common.Storage: > Upgrade of block pool BP-825373266-xx.xxx.xxx.xx-1379095203239 at > /hadoop/12/data1/current/BP-825373266-xx.xxx.xxx.xx-1379095203239 is complete > 2014-07-15 13:00:07,839 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Setting up storage: > nsid=859725752;bpid=BP-825373266-xx.xxx.xxx.xx-1379095203239;lv=-55;nsInfo=lv=-56;cid=CID-303ee504-e03c-4a5e-bc59-2b275b308152;nsid=859725752;c=1405453914270;bpid=BP-82537326 > 6-xx.xxx.xxx.xx-1379095203239;dnuuid=b1011b87-d7cd-48ce-92cc-f7cca0e8cbae > {code} > after upgrade completing, restart of DN still shows message regarding version > difference: > {code} > INFO org.apache.hadoop.hdfs.server.common.Storage: Data-node version: -55 and > name-node layout version: -56 > {code} > This causes confusion to the operators as if upgrade did not succeed since > data node's layout version is not updated to the "new LV" value > Actually name node's layout version is displayed as the "new LV" value. > Since the data node and name node layout versions are separate now, the new > data node layout version should be shown as the “new LV”. > Thanks to [~ehf] who found and reported this issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-4916) DataTransfer may mask the IOException during block transferring
[ https://issues.apache.org/jira/browse/HDFS-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081782#comment-14081782 ] Hadoop QA commented on HDFS-4916: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12588510/4916.v0.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7519//console This message is automatically generated. > DataTransfer may mask the IOException during block transferring > -- > > Key: HDFS-4916 > URL: https://issues.apache.org/jira/browse/HDFS-4916 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.0.4-alpha, 2.0.5-alpha >Reporter: Zesheng Wu >Priority: Critical > Attachments: 4916.v0.patch > > > When a new datanode is added to the pipeline, the client will trigger the > block transfer process. In the current implementation, the src datanode calls > the run() method of DataTransfer to transfer the block. This method masks > any IOException raised during the transfer, so the client never learns of > the failure and mistakes the failed transfer for a successful one. -- This message was sent by Atlassian JIRA (v6.2#6252)
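The failure mode described in HDFS-4916 can be illustrated in miniature (this is not the DataNode's actual DataTransfer class; names are hypothetical). A bare `Runnable` must swallow checked exceptions inside `run()`, whereas submitting the work as a `Callable` lets the caller observe the failure through `Future.get()`:

```java
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: running a transfer as a Callable surfaces its IOException to
// the caller via ExecutionException instead of masking it in run().
class TransferExample {
    static String transferBlock(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("transfer failed");
        }
        return "transmitted";
    }

    static boolean transferSucceeded(boolean fail) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> result = pool.submit(() -> transferBlock(fail));
            result.get();            // rethrows the IOException, wrapped
            return true;
        } catch (InterruptedException | ExecutionException e) {
            return false;            // the caller now sees the failure
        } finally {
            pool.shutdown();
        }
    }
}
```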
[jira] [Commented] (HDFS-6797) Misleading LayoutVersion information during data node upgrade
[ https://issues.apache.org/jira/browse/HDFS-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081778#comment-14081778 ] Hadoop QA commented on HDFS-6797: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658983/HDFS-6797.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7516//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7516//console This message is automatically generated. > Misleading LayoutVersion information during data node upgrade > - > > Key: HDFS-6797 > URL: https://issues.apache.org/jira/browse/HDFS-6797 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.4.1 >Reporter: Benoy Antony >Assignee: Benoy Antony > Attachments: HDFS-6797.patch > > > Before upgrade, data node version was -55. 
The new data node version remained > at -55. During upgrade we got he following messages: > {code} > 2014-07-15 12:59:55,253 INFO org.apache.hadoop.hdfs.server.common.Storage: > Data-node version: -55 and name-node layout version: -56 > ... > ... > ... > ... > 2014-07-15 12:59:56,479 INFO org.apache.hadoop.hdfs.server.common.Storage: > Upgrading block pool storage directory > /hadoop/1/data1/current/BP-825373266-xx.xxx.xxx.xx-1379095203239. >old LV = -55; old CTime = 1402508907789. >new LV = -56; new CTime = 1405453914270 > 2014-07-15 13:00:07,697 INFO org.apache.hadoop.hdfs.server.common.Storage: > Upgrade of block pool BP-825373266-xx.xxx.xxx.xx-1379095203239 at > /hadoop/12/data1/current/BP-825373266-xx.xxx.xxx.xx-1379095203239 is complete > 2014-07-15 13:00:07,839 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Setting up storage: > nsid=859725752;bpid=BP-825373266-xx.xxx.xxx.xx-1379095203239;lv=-55;nsInfo=lv=-56;cid=CID-303ee504-e03c-4a5e-bc59-2b275b308152;nsid=859725752;c=1405453914270;bpid=BP-82537326 > 6-xx.xxx.xxx.xx-1379095203239;dnuuid=b1011b87-d7cd-48ce-92cc-f7cca0e8cbae > {code} > after upgrade completing, restart of DN still shows message regarding version > difference: > {code} > INFO org.apache.hadoop.hdfs.server.common.Storage: Data-node version: -55 and > name-node layout version: -56 > {code} > This causes confusion to the operators as if upgrade did not succeed since > data node's layout version is not updated to the "new LV" value > Actually name node's layout version is displayed as the "new LV" value. > Since the data node and name node layout versions are separate now, the new > data node layout version should be shown as the “new LV”. > Thanks to [~ehf] who found and reported this issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6685) Balancer should preserve storage type of replicas
[ https://issues.apache.org/jira/browse/HDFS-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081762#comment-14081762 ] Arpit Agarwal commented on HDFS-6685: - +1 for the latest patch. > Balancer should preserve storage type of replicas > - > > Key: HDFS-6685 > URL: https://issues.apache.org/jira/browse/HDFS-6685 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > Attachments: h6685_20140728.patch, h6685_20140729.patch, > h6685_20140730.patch, h6685_20140730b.patch, h6685_20140731.patch > > > When Balancer moves replicas to balance the cluster, it should always move > replicas from a storage with any type to another storage with the same type, > i.e. it preserves storage type of replicas. It does not make sense to move > replicas to a different storage type. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6797) Misleading LayoutVersion information during data node upgrade
[ https://issues.apache.org/jira/browse/HDFS-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081755#comment-14081755 ] Arpit Agarwal commented on HDFS-6797: - Pending Jenkins. > Misleading LayoutVersion information during data node upgrade > - > > Key: HDFS-6797 > URL: https://issues.apache.org/jira/browse/HDFS-6797 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.4.1 >Reporter: Benoy Antony >Assignee: Benoy Antony > Attachments: HDFS-6797.patch > > > Before upgrade, data node version was -55. The new data node version remained > at -55. During upgrade we got he following messages: > {code} > 2014-07-15 12:59:55,253 INFO org.apache.hadoop.hdfs.server.common.Storage: > Data-node version: -55 and name-node layout version: -56 > ... > ... > ... > ... > 2014-07-15 12:59:56,479 INFO org.apache.hadoop.hdfs.server.common.Storage: > Upgrading block pool storage directory > /hadoop/1/data1/current/BP-825373266-xx.xxx.xxx.xx-1379095203239. >old LV = -55; old CTime = 1402508907789. 
>new LV = -56; new CTime = 1405453914270 > 2014-07-15 13:00:07,697 INFO org.apache.hadoop.hdfs.server.common.Storage: > Upgrade of block pool BP-825373266-xx.xxx.xxx.xx-1379095203239 at > /hadoop/12/data1/current/BP-825373266-xx.xxx.xxx.xx-1379095203239 is complete > 2014-07-15 13:00:07,839 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Setting up storage: > nsid=859725752;bpid=BP-825373266-xx.xxx.xxx.xx-1379095203239;lv=-55;nsInfo=lv=-56;cid=CID-303ee504-e03c-4a5e-bc59-2b275b308152;nsid=859725752;c=1405453914270;bpid=BP-82537326 > 6-xx.xxx.xxx.xx-1379095203239;dnuuid=b1011b87-d7cd-48ce-92cc-f7cca0e8cbae > {code} > after upgrade completing, restart of DN still shows message regarding version > difference: > {code} > INFO org.apache.hadoop.hdfs.server.common.Storage: Data-node version: -55 and > name-node layout version: -56 > {code} > This causes confusion to the operators as if upgrade did not succeed since > data node's layout version is not updated to the "new LV" value > Actually name node's layout version is displayed as the "new LV" value. > Since the data node and name node layout versions are separate now, the new > data node layout version should be shown as the “new LV”. > Thanks to [~ehf] who found and reported this issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6798) Add test case for incorrect data node condition during balancing
[ https://issues.apache.org/jira/browse/HDFS-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081753#comment-14081753 ] Arpit Agarwal commented on HDFS-6798: - +1 for the patch, pending Jenkins. Thanks for adding this test case Benoy! > Add test case for incorrect data node condition during balancing > > > Key: HDFS-6798 > URL: https://issues.apache.org/jira/browse/HDFS-6798 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.4.1 >Reporter: Benoy Antony >Assignee: Benoy Antony > Attachments: HDFS-6798.patch > > > The Balancer makes a check to see if a block's location is a known data node. > But the variable it uses to check is wrong. This issue was fixed in HDFS-6364. > There was no way to easily unit test it at that time. Since HDFS-6441 enables > one to simulate this case, it was decided to add the unit test once HDFS-6441 > is resolved. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6788) Improve synchronization in BPOfferService with read write lock
[ https://issues.apache.org/jira/browse/HDFS-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081748#comment-14081748 ] Arpit Agarwal commented on HDFS-6788: - Thanks Andrew/Yongjun for verifying the lock order correctness. Just reviewed and +1 from me also, pending Jenkins. > Improve synchronization in BPOfferService with read write lock > -- > > Key: HDFS-6788 > URL: https://issues.apache.org/jira/browse/HDFS-6788 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.5.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6788.001.patch, HDFS-6788.002.patch > > > Threads in DN (DataXceiver, PacketResponder, Async disk worker etc) may block > at BPOfferService.getBlockPoolId() when calling BPOfferService.checkBlock(), > though they are just reading the same blockpool id. This is unnecessary > overhead and may cause performance hit when many threads compete. Filing this > jira to replace synchronized method with read write lock > (ReentrantReadWriteLock). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6796) Improving the argument check during balancer command line parsing
[ https://issues.apache.org/jira/browse/HDFS-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated HDFS-6796: --- Attachment: HDFS-6796.patch Makes sense. Thanks for the review and suggestion, [~szetszwo]. I have updated the patch with individual checks for each option. > Improving the argument check during balancer command line parsing > - > > Key: HDFS-6796 > URL: https://issues.apache.org/jira/browse/HDFS-6796 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Affects Versions: 2.4.1 >Reporter: Benoy Antony >Assignee: Benoy Antony > Attachments: HDFS-6796.patch, HDFS-6796.patch > > > Currently the balancer CLI parser simply checks whether the total number of > arguments is greater than 2 inside the loop. Since the check does not involve > any loop variables, it is not a proper check when there are more than 2 > arguments. -- This message was sent by Atlassian JIRA (v6.2#6252)
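The "individual checks for each option" idea can be sketched like this (an illustrative toy parser, not the Balancer's real one): each flag that takes a value verifies, at its own loop position, that the value is actually present, instead of relying on one length check that ignores the loop index.

```java
// Sketch of per-option argument validation in a CLI loop.
// The option name "-threshold" mirrors the balancer's flag, but the
// parser itself is a simplified stand-in.
class ArgCheckExample {
    static double parseThreshold(String[] args) {
        double threshold = 10.0;                    // assumed default
        for (int i = 0; i < args.length; i++) {
            if ("-threshold".equalsIgnoreCase(args[i])) {
                // validate the value for *this* option at *this* index
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException(
                        "-threshold requires an argument");
                }
                threshold = Double.parseDouble(args[++i]);
            }
        }
        return threshold;
    }
}
```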
[jira] [Commented] (HDFS-6800) Determine how Datanode layout changes should interact with rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081743#comment-14081743 ] Colin Patrick McCabe commented on HDFS-6800: Just to give a bit of additional context here: [~james.thomas] did some benchmarks that showed that "creating 200k hard links (100k blocks) take just over a second on a dual core PC" when using the optimized hard link code. On a performance basis, I think it's very feasible to support rolling upgrade to new DN layout versions. If we don't choose to support this, we are going to make it very hard to evolve the DN code in the future. We would then require a major version change (i.e. Hadoop 3.0) to make any major DN changes. So I think we should just change the documentation a bit and support this in the obvious way... by having the users call {{datanode \-rollback}} during a rolling rollback if needed. What do you guys think? > Detemine how Datanode layout changes should interact with rolling upgrade > - > > Key: HDFS-6800 > URL: https://issues.apache.org/jira/browse/HDFS-6800 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.6.0 >Reporter: Colin Patrick McCabe > > We need to handle attempts to rolling-upgrade the DataNode to a new storage > directory layout. > One approach is to disallow such upgrades. If we choose this approach, we > should make sure that the system administrator gets a helpful error message > and a clean failure when trying to use rolling upgrade to a version that > doesn't support it. Based on the compatibility guarantees described in > HDFS-5535, this would mean that *any* future DataNode layout changes would > require a major version upgrade. > Another approach would be to support rolling upgrade from an old DN storage > layout to a new layout. 
This approach requires us to change our > documentation to explain to users that they should supply the {{\-rollback}} > command on the command-line when re-starting the DataNodes during rolling > rollback. Currently the documentation just says to restart the DataNode > normally. > Another issue here is that the DataNode's usage message describes rollback > options that no longer exist. The help text says that the DN supports > {{\-rollingupgrade rollback}}, but this option was removed by HDFS-6005. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Work started] (HDFS-6780) Batch the encryption zones listing API
[ https://issues.apache.org/jira/browse/HDFS-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-6780 started by Andrew Wang. > Batch the encryption zones listing API > -- > > Key: HDFS-6780 > URL: https://issues.apache.org/jira/browse/HDFS-6780 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: security >Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: hdfs-6780.001.patch > > > To future-proof the API, it'd be better if the listEZs API returned a > RemoteIterator. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6780) Batch the encryption zones listing API
[ https://issues.apache.org/jira/browse/HDFS-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-6780: -- Affects Version/s: fs-encryption (HADOOP-10150 and HDFS-6134) > Batch the encryption zones listing API > -- > > Key: HDFS-6780 > URL: https://issues.apache.org/jira/browse/HDFS-6780 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: security >Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: hdfs-6780.001.patch > > > To future-proof the API, it'd be better if the listEZs API returned a > RemoteIterator. -- This message was sent by Atlassian JIRA (v6.2#6252)
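Why a RemoteIterator future-proofs the listing API can be sketched as follows (hypothetical names; this is not the actual listEncryptionZones implementation): the server side can fetch results in batches behind the iterator, so paging can be added later without changing the client-facing signature.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.NoSuchElementException;

// Toy batched iterator: each "fetch" simulates one RPC that returns the
// next batch, hidden behind a plain hasNext()/next() interface.
class BatchedListing {
    interface RemoteIteratorSketch<T> {
        boolean hasNext();
        T next();
    }

    static RemoteIteratorSketch<String> listZones(List<String> all, int batchSize) {
        return new RemoteIteratorSketch<String>() {
            private final Deque<String> batch = new ArrayDeque<>();
            private int fetched = 0;

            public boolean hasNext() {
                if (batch.isEmpty() && fetched < all.size()) {
                    // simulate one RPC pulling the next batch
                    int end = Math.min(fetched + batchSize, all.size());
                    batch.addAll(all.subList(fetched, end));
                    fetched = end;
                }
                return !batch.isEmpty();
            }

            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return batch.poll();
            }
        };
    }
}
```

A client just iterates; whether the server answers in one response or many is invisible to it.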
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081739#comment-14081739 ] Colin Patrick McCabe commented on HDFS-6482: Hey guys, I filed HDFS-6800 to have the rolling upgrade discussion. I'm going to commit this to trunk (but *not* to any other branches) in a bit if nobody has any objections. > Use block ID-based block layout on datanodes > > > Key: HDFS-6482 > URL: https://issues.apache.org/jira/browse/HDFS-6482 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.5.0 >Reporter: James Thomas >Assignee: James Thomas > Attachments: 6482-design.doc, HDFS-6482.1.patch, HDFS-6482.2.patch, > HDFS-6482.3.patch, HDFS-6482.4.patch, HDFS-6482.5.patch, HDFS-6482.6.patch, > HDFS-6482.7.patch, HDFS-6482.8.patch, HDFS-6482.9.patch, HDFS-6482.patch, > hadoop-24-datanode-dir.tgz > > > Right now blocks are placed into directories that are split into many > subdirectories when capacity is reached. Instead we can use a block's ID to > determine the path it should go in. This eliminates the need for the LDir > data structure that facilitates the splitting of directories when they reach > capacity as well as fields in ReplicaInfo that keep track of a replica's > location. > An extension of the work in HDFS-3290. -- This message was sent by Atlassian JIRA (v6.2#6252)
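The core of the block ID-based layout can be sketched in a few lines: two directory levels are derived from fixed bit ranges of the block id, so the path is computable from the id alone, with no on-disk index or LDir bookkeeping. The exact bit ranges below are illustrative, not necessarily those chosen by the committed patch.

```java
// Sketch: map a block id to a two-level subdirectory path purely by
// arithmetic. 8 bits per level gives 256 x 256 possible directories.
class BlockIdLayout {
    static String idToBlockDir(long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xFF);   // 256 top-level dirs
        int d2 = (int) ((blockId >> 8) & 0xFF);    // 256 dirs per level
        return "subdir" + d1 + "/" + "subdir" + d2;
    }
}
```

Because the mapping is deterministic, a replica can be located (or placed) without consulting any per-directory capacity state.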
[jira] [Commented] (HDFS-6788) Improve synchronization in BPOfferService with read write lock
[ https://issues.apache.org/jira/browse/HDFS-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081736#comment-14081736 ] Andrew Wang commented on HDFS-6788: --- I don't see any cases where a thread with the read lock would try to get the write lock. The four methods that take the read lock are getNamespaceInfo, getActiveNN, getBlockPoolId, and toString, and they all look safe. +1 pending Jenkins from me. Yongjun, for the indirection, it's probably not a big deal since the JIT is pretty likely to inline it. Unless we see this crop up as an issue, I'm inclined not to bother since we have much bigger perf issues (for instance, that there is a global RW lock). > Improve synchronization in BPOfferService with read write lock > -- > > Key: HDFS-6788 > URL: https://issues.apache.org/jira/browse/HDFS-6788 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.5.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6788.001.patch, HDFS-6788.002.patch > > > Threads in DN (DataXceiver, PacketResponder, Async disk worker etc) may block > at BPOfferService.getBlockPoolId() when calling BPOfferService.checkBlock(), > though they are just reading the same blockpool id. This is unnecessary > overhead and may cause performance hit when many threads compete. Filing this > jira to replace synchronized method with read write lock > (ReentrantReadWriteLock). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6794) Update BlockManager methods to use DatanodeStorageInfo where possible
[ https://issues.apache.org/jira/browse/HDFS-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081734#comment-14081734 ] Hadoop QA commented on HDFS-6794: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658982/HDFS-6794.03.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7515//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7515//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7515//console This message is automatically generated. 
> Update BlockManager methods to use DatanodeStorageInfo where possible > - > > Key: HDFS-6794 > URL: https://issues.apache.org/jira/browse/HDFS-6794 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Attachments: HDFS-6794.01.patch, HDFS-6794.02.patch, > HDFS-6794.03.patch > > > Post HDFS-2832, BlockManager methods can be updated to accept > DatanodeStorageInfo instead of (DatanodeDescriptor + StorageID). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6800) Determine how Datanode layout changes should interact with rolling upgrade
Colin Patrick McCabe created HDFS-6800: -- Summary: Determine how Datanode layout changes should interact with rolling upgrade Key: HDFS-6800 URL: https://issues.apache.org/jira/browse/HDFS-6800 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe We need to handle attempts to rolling-upgrade the DataNode to a new storage directory layout. One approach is to disallow such upgrades. If we choose this approach, we should make sure that the system administrator gets a helpful error message and a clean failure when trying to use rolling upgrade to a version that doesn't support it. Based on the compatibility guarantees described in HDFS-5535, this would mean that *any* future DataNode layout changes would require a major version upgrade. Another approach would be to support rolling upgrade from an old DN storage layout to a new layout. This approach requires us to change our documentation to explain to users that they should supply the {{\-rollback}} command on the command-line when re-starting the DataNodes during rolling rollback. Currently the documentation just says to restart the DataNode normally. Another issue here is that the DataNode's usage message describes rollback options that no longer exist. The help text says that the DN supports {{\-rollingupgrade rollback}}, but this option was removed by HDFS-6005. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.
[ https://issues.apache.org/jira/browse/HDFS-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Megasthenis Asteris updated HDFS-6799: -- Attachment: (was: HDFS-6799.patch) > The invalidate method in SimulatedFSDataset.java failed to remove > (invalidate) blocks from the file system. > --- > > Key: HDFS-6799 > URL: https://issues.apache.org/jira/browse/HDFS-6799 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, test >Affects Versions: 2.4.1 >Reporter: Megasthenis Asteris >Priority: Minor > Attachments: HDFS-6799.patch > > > The invalidate(String bpid, Block[] invalidBlks) method in > SimulatedFSDataset.java should remove all invalidBlks from the simulated file > system. It currently fails to do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.
[ https://issues.apache.org/jira/browse/HDFS-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Megasthenis Asteris updated HDFS-6799: -- Attachment: HDFS-6799.patch > The invalidate method in SimulatedFSDataset.java failed to remove > (invalidate) blocks from the file system. > --- > > Key: HDFS-6799 > URL: https://issues.apache.org/jira/browse/HDFS-6799 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, test >Affects Versions: 2.4.1 >Reporter: Megasthenis Asteris >Priority: Minor > Attachments: HDFS-6799.patch > > > The invalidate(String bpid, Block[] invalidBlks) method in > SimulatedFSDataset.java should remove all invalidBlks from the simulated file > system. It currently fails to do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
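The HDFS-6799 bug class can be modeled in miniature (a simplified stand-in, not the real SimulatedFSDataset, which is keyed by block pool id and Block objects): an invalidate() that never removes entries from the backing map leaves the "deleted" blocks visible; the fix is to actually remove each one.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a simulated dataset: invalidate() must remove each
// invalid block from the map, which is the step the buggy version skipped.
class SimulatedDatasetSketch {
    private final Map<Long, byte[]> blocks = new HashMap<>();

    void addBlock(long blockId, byte[] data) {
        blocks.put(blockId, data);
    }

    void invalidate(long[] invalidBlockIds) {
        for (long id : invalidBlockIds) {
            blocks.remove(id);   // actually drop the replica's entry
        }
    }

    boolean contains(long blockId) {
        return blocks.containsKey(blockId);
    }
}
```

A unit test along these lines — add blocks, invalidate a subset, then assert only the survivors remain — is exactly the kind of check Benoy asks about in the comment above.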
[jira] [Commented] (HDFS-6788) Improve synchronization in BPOfferService with read write lock
[ https://issues.apache.org/jira/browse/HDFS-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081708#comment-14081708 ] Yongjun Zhang commented on HDFS-6788: - I thought I made a mistake in the earlier patch after seeing Arpit's comment, but after careful review, it looks fine to me. Hi [~andrew.wang], I uploaded a patch to address your second comment. For the extra newline in the imports section, it's added by eclipse to separate the com.google and org.apache imports into different sections, and I think it's not bad, so I didn't change it. Would you please take a look at the new revision? Thanks. Hi [~arpitagarwal], thanks for reviewing it earlier; if you have time, I would appreciate another look. BTW Andrew, I saw that both FSDirectory's and FSNamesystem's lock code have quite some indirection (thus a bit more runtime) when making a call, say, {code} void readLock() { this.dirLock.readLock().lock(); } {code} It goes through several "." dereferences, and each of them is an indirection. This can be improved. For example, we can create a class member, e.g., {{mReadLock = this.dirLock.readLock();}}, then modify the readLock() method to {code} void readLock() { mReadLock.lock(); } {code} I can create a jira to change both classes if you agree. Thanks. > Improve synchronization in BPOfferService with read write lock > -- > > Key: HDFS-6788 > URL: https://issues.apache.org/jira/browse/HDFS-6788 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.5.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6788.001.patch, HDFS-6788.002.patch > > > Threads in DN (DataXceiver, PacketResponder, Async disk worker etc) may block > at BPOfferService.getBlockPoolId() when calling BPOfferService.checkBlock(), > though they are just reading the same blockpool id. This is unnecessary > overhead and may cause performance hit when many threads compete. 
Filing this > jira to replace synchronized method with read write lock > (ReentrantReadWriteLock). -- This message was sent by Atlassian JIRA (v6.2#6252)
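The two ideas in this thread (guard a mostly-read field with a ReentrantReadWriteLock, and cache the readLock()/writeLock() views in fields so each call avoids repeated "." indirection) can be sketched together. This is an illustrative toy class, not the actual BPOfferService code; the class and field names below are hypothetical.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: many reader threads can call getBlockPoolId()
// concurrently under the shared read lock, while a writer still gets
// exclusive access. The Lock views are cached once in fields, so lock()
// does not re-traverse rwLock.readLock() on every invocation.
class BlockPoolIdHolder {
  private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
  private final Lock readLock = rwLock.readLock();   // cached once
  private final Lock writeLock = rwLock.writeLock(); // cached once

  private String blockPoolId;

  String getBlockPoolId() {
    readLock.lock();
    try {
      return blockPoolId;
    } finally {
      readLock.unlock();
    }
  }

  void setBlockPoolId(String id) {
    writeLock.lock();
    try {
      this.blockPoolId = id;
    } finally {
      writeLock.unlock();
    }
  }
}
```

Unlike a synchronized method, concurrent readers here do not serialize against each other; only writers block readers.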
[jira] [Commented] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.
[ https://issues.apache.org/jira/browse/HDFS-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081706#comment-14081706 ] Benoy Antony commented on HDFS-6799: Can you please make [~megas] a contributor so that he can assign this to himself? > The invalidate method in SimulatedFSDataset.java failed to remove > (invalidate) blocks from the file system. > --- > > Key: HDFS-6799 > URL: https://issues.apache.org/jira/browse/HDFS-6799 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, test >Affects Versions: 2.4.1 >Reporter: Megasthenis Asteris >Priority: Minor > Attachments: HDFS-6799.patch > > > The invalidate(String bpid, Block[] invalidBlks) method in > SimulatedFSDataset.java should remove all invalidBlks from the simulated file > system. It currently fails to do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.
[ https://issues.apache.org/jira/browse/HDFS-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081704#comment-14081704 ] Benoy Antony commented on HDFS-6799: Good catch, [~megas]. Is there a way to add a unit test for this? [~arpitagarwal], [~szetszwo], could you please take a look? > The invalidate method in SimulatedFSDataset.java failed to remove > (invalidate) blocks from the file system. > --- > > Key: HDFS-6799 > URL: https://issues.apache.org/jira/browse/HDFS-6799 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, test >Affects Versions: 2.4.1 >Reporter: Megasthenis Asteris >Priority: Minor > Attachments: HDFS-6799.patch > > > The invalidate(String bpid, Block[] invalidBlks) method in > SimulatedFSDataset.java should remove all invalidBlks from the simulated file > system. It currently fails to do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6780) Batch the encryption zones listing API
[ https://issues.apache.org/jira/browse/HDFS-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-6780: -- Attachment: hdfs-6780.001.patch Patch attached. Refactored listEncryptionZones to instead return a RemoteIterator, which wraps a BatchedRemoteIterator. > Batch the encryption zones listing API > -- > > Key: HDFS-6780 > URL: https://issues.apache.org/jira/browse/HDFS-6780 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: security >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: hdfs-6780.001.patch > > > To future-proof the API, it'd be better if the listEZs API returned a > RemoteIterator. -- This message was sent by Atlassian JIRA (v6.2#6252)
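The batching idea behind the patch above (return a RemoteIterator whose implementation pulls a bounded page of results per RPC, rather than one unbounded response) can be illustrated with a self-contained sketch. This is not the actual Hadoop BatchedRemoteIterator; the class and interface names are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Illustrative sketch of a batched iterator: listing N items costs roughly
// ceil(N / batchSize) "remote" fetches instead of one unbounded call, which
// keeps each response (and the lock hold time on the server) small.
class BatchedIterator<T> implements Iterator<T> {
  interface BatchFetcher<T> {
    // Returns up to batchSize elements starting after prevCount already
    // returned; an empty or short list signals the end of the listing.
    List<T> fetch(int prevCount, int batchSize);
  }

  private final BatchFetcher<T> fetcher;
  private final int batchSize;
  private List<T> batch = new ArrayList<>();
  private int posInBatch = 0;
  private int fetched = 0;
  private boolean exhausted = false;

  BatchedIterator(BatchFetcher<T> fetcher, int batchSize) {
    this.fetcher = fetcher;
    this.batchSize = batchSize;
  }

  @Override
  public boolean hasNext() {
    if (posInBatch < batch.size()) return true;
    if (exhausted) return false;
    batch = fetcher.fetch(fetched, batchSize); // one "remote call" per page
    posInBatch = 0;
    fetched += batch.size();
    exhausted = batch.size() < batchSize;      // short page => listing done
    return !batch.isEmpty();
  }

  @Override
  public T next() {
    if (!hasNext()) throw new NoSuchElementException();
    return batch.get(posInBatch++);
  }
}
```

The caller just sees a plain iterator; the paging is hidden, which is what makes the API future-proof against large listings.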
[jira] [Commented] (HDFS-573) Porting libhdfs to Windows
[ https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081694#comment-14081694 ] Colin Patrick McCabe commented on HDFS-573: --- [~stevebovy]: you'll be happy to hear that we dynamically load {{libjvm.so}} in the HADOOP-10388 branch. The main reason for doing it there is that the branch adds a pure native client that doesn't require {{libjvm.so}}, in addition to the existing JNI client. > Porting libhdfs to Windows > -- > > Key: HDFS-573 > URL: https://issues.apache.org/jira/browse/HDFS-573 > Project: Hadoop HDFS > Issue Type: Improvement > Components: libhdfs > Environment: Windows, Visual Studio 2008 >Reporter: Ziliang Guo >Assignee: Chris Nauroth > Attachments: HDFS-573.1.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > The current C code in libhdfs is written using C99 conventions and also uses > a few POSIX specific functions such as hcreate, hsearch, and pthread mutex > locks. To compile it using Visual Studio would require a conversion of the > code in hdfsJniHelper.c and hdfs.c to C89 and replacement/reimplementation of > the POSIX functions. The code also uses the stdint.h header, which is not > part of the original C89, but there exists what appears to be a BSD licensed > reimplementation written to be compatible with MSVC floating around. I have > already done the other necessary conversions, as well as created a simplistic > hash bucket for use with hcreate and hsearch and successfully built a DLL of > libhdfs. Further testing is needed to see if it is usable by other programs > to actually access hdfs, which will likely happen in the next few weeks as > the Condor Project continues with its file transfer work.
> In the process, I've removed a few what I believe are extraneous consts and > also fixed an incorrect array initialization where someone was attempting to > initialize with something like this: JavaVMOption options[noArgs]; where > noArgs was being incremented in the code above. This was in the > hdfsJniHelper.c file, in the getJNIEnv function. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6796) Improving the argument check during balancer command line parsing
[ https://issues.apache.org/jira/browse/HDFS-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081695#comment-14081695 ] Tsz Wo Nicholas Sze commented on HDFS-6796: --- Let's also add some meaningful error messages. The first checkArgument(..) can be replaced by checkArgument(..) in each individual case as below. {code} +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java (working copy) @@ -1691,9 +1691,9 @@ if (args != null) { try { for(int i = 0; i < args.length; i++) { -checkArgument(args.length >= 2, "args = " + Arrays.toString(args)); if ("-threshold".equalsIgnoreCase(args[i])) { - i++; + checkArgument(++i < args.length, + "Threshold value is missing: args = " + Arrays.toString(args)); try { threshold = Double.parseDouble(args[i]); if (threshold < 1 || threshold > 100) { @@ -1708,7 +1708,8 @@ throw e; } } else if ("-policy".equalsIgnoreCase(args[i])) { - i++; + checkArgument(++i < args.length, + "Policy value is missing: args = " + Arrays.toString(args)); try { policy = BalancingPolicy.parse(args[i]); } catch(IllegalArgumentException e) { ... {code} > Improving the argument check during balancer command line parsing > - > > Key: HDFS-6796 > URL: https://issues.apache.org/jira/browse/HDFS-6796 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Affects Versions: 2.4.1 >Reporter: Benoy Antony >Assignee: Benoy Antony > Attachments: HDFS-6796.patch > > > Currently balancer CLI parser simply checks if the total number of arguments > is greater than 2 inside the loop. Since the check does not include any loop > variables, it is not a proper check when there are more than 2 arguments. -- This message was sent by Atlassian JIRA (v6.2#6252)
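The bounds check suggested above (verify that an option's value actually exists, {{++i < args.length}}, before reading {{args[i]}}) can be shown in a self-contained parse loop. This is a simplified sketch, not the Balancer's actual parser; the {{checkArgument}} helper below merely mimics Guava's {{Preconditions.checkArgument}}, and the default threshold is a made-up value.

```java
import java.util.Arrays;

// Simplified sketch of the fix: advance the index and bounds-check it
// before consuming the option's value, failing with a message that names
// the offending argument list instead of an ArrayIndexOutOfBoundsException.
class BalancerArgs {
  // Stand-in for Guava's Preconditions.checkArgument.
  static void checkArgument(boolean ok, String msg) {
    if (!ok) throw new IllegalArgumentException(msg);
  }

  static double parseThreshold(String[] args) {
    double threshold = 10.0; // hypothetical default for illustration
    for (int i = 0; i < args.length; i++) {
      if ("-threshold".equalsIgnoreCase(args[i])) {
        checkArgument(++i < args.length,
            "Threshold value is missing: args = " + Arrays.toString(args));
        threshold = Double.parseDouble(args[i]);
        checkArgument(threshold >= 1 && threshold <= 100,
            "Threshold should be in [1, 100]: " + threshold);
      }
    }
    return threshold;
  }
}
```

With this shape, `-threshold` as the last argument fails fast with a clear message rather than indexing past the end of the array.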
[jira] [Commented] (HDFS-573) Porting libhdfs to Windows
[ https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081689#comment-14081689 ] Colin Patrick McCabe commented on HDFS-573: --- bq. I have just one question though. My initial inclination was to put TYPE_CHECKED_PRINTF_FORMAT in platform.h as well. However, I then backed that out and put the ifdef in exception.h, because it has never been clear to me if exception.h is part of the public API The only header file that's part of the public API is {{hdfs.h}}. That's the only one we export to end-users... nobody can even get access to the other ones without a Hadoop source tree. You should feel free to change, add, or remove things from any header file without worrying about compatibility, as long as that header is not {{hdfs.h}}. bq. BTW Colin, thanks for the code review. The work so far has been aimed at a straight port, warts and all, but I'm happy to roll in a few more small fixes for existing problems while I'm in here. I'll work on a v2 of the patch. Thanks, Chris. I think what you've got looks pretty good... I wish all libhdfs patches could be this good :) > Porting libhdfs to Windows > -- > > Key: HDFS-573 > URL: https://issues.apache.org/jira/browse/HDFS-573 > Project: Hadoop HDFS > Issue Type: Improvement > Components: libhdfs > Environment: Windows, Visual Studio 2008 >Reporter: Ziliang Guo >Assignee: Chris Nauroth > Attachments: HDFS-573.1.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > The current C code in libhdfs is written using C99 conventions and also uses > a few POSIX specific functions such as hcreate, hsearch, and pthread mutex > locks. To compile it using Visual Studio would require a conversion of the > code in hdfsJniHelper.c and hdfs.c to C89 and replacement/reimplementation of > the POSIX functions. 
The code also uses the stdint.h header, which is not > part of the original C89, but there exists what appears to be a BSD licensed > reimplementation written to be compatible with MSVC floating around. I have > already done the other necessary conversions, as well as created a simplistic > hash bucket for use with hcreate and hsearch and successfully built a DLL of > libhdfs. Further testing is needed to see if it is usable by other programs > to actually access hdfs, which will likely happen in the next few weeks as > the Condor Project continues with its file transfer work. > In the process, I've removed a few what I believe are extraneous consts and > also fixed an incorrect array initialization where someone was attempting to > initialize with something like this: JavaVMOption options[noArgs]; where > noArgs was being incremented in the code above. This was in the > hdfsJniHelper.c file, in the getJNIEnv function. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6788) Improve synchronization in BPOfferService with read write lock
[ https://issues.apache.org/jira/browse/HDFS-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6788: Attachment: HDFS-6788.002.patch > Improve synchronization in BPOfferService with read write lock > -- > > Key: HDFS-6788 > URL: https://issues.apache.org/jira/browse/HDFS-6788 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.5.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6788.001.patch, HDFS-6788.002.patch > > > Threads in DN (DataXceiver, PacketResponder, Async disk worker etc) may block > at BPOfferService.getBlockPoolId() when calling BPOfferService.checkBlock(), > though they are just reading the same blockpool id. This is unnecessary > overhead and may cause performance hit when many threads compete. Filing this > jira to replace synchronized method with read write lock > (ReentrantReadWriteLock). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.
[ https://issues.apache.org/jira/browse/HDFS-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Megasthenis Asteris updated HDFS-6799: -- Status: Patch Available (was: Open) > The invalidate method in SimulatedFSDataset.java failed to remove > (invalidate) blocks from the file system. > --- > > Key: HDFS-6799 > URL: https://issues.apache.org/jira/browse/HDFS-6799 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, test >Affects Versions: 2.4.1 >Reporter: Megasthenis Asteris >Priority: Minor > Attachments: HDFS-6799.patch > > > The invalidate(String bpid, Block[] invalidBlks) method in > SimulatedFSDataset.java should remove all invalidBlks from the simulated file > system. It currently fails to do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-573) Porting libhdfs to Windows
[ https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081674#comment-14081674 ] Stephen Bovy commented on HDFS-573: --- Thanks. I am an old-fashioned IBM main-framer; all my changes are backwards compatible :) Here is a slice for setting up dynamic load of the JVM: // begin JVM function set-up // new jvm function declarations typedef jint (*FGetVMS) ( JavaVM**, const jsize, jint* ); typedef jint (*FCreateVM) ( JavaVM**, void**, JavaVMInitArgs* ); #ifdef LOADJVM // dynamically loaded static FGetVMS hdfs_fpGetVM = NULL; static FCreateVM hdfs_fpCreateVM = NULL; #else // implicitly linked and auto-loaded (original default code) static FGetVMS hdfs_fpGetVM = JNI_GetCreatedJavaVMs; static FCreateVM hdfs_fpCreateVM = JNI_CreateJavaVM; #endif > Porting libhdfs to Windows > -- > > Key: HDFS-573 > URL: https://issues.apache.org/jira/browse/HDFS-573 > Project: Hadoop HDFS > Issue Type: Improvement > Components: libhdfs > Environment: Windows, Visual Studio 2008 >Reporter: Ziliang Guo >Assignee: Chris Nauroth > Attachments: HDFS-573.1.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > The current C code in libhdfs is written using C99 conventions and also uses > a few POSIX specific functions such as hcreate, hsearch, and pthread mutex > locks. To compile it using Visual Studio would require a conversion of the > code in hdfsJniHelper.c and hdfs.c to C89 and replacement/reimplementation of > the POSIX functions. The code also uses the stdint.h header, which is not > part of the original C89, but there exists what appears to be a BSD licensed > reimplementation written to be compatible with MSVC floating around. I have > already done the other necessary conversions, as well as created a simplistic > hash bucket for use with hcreate and hsearch and successfully built a DLL of > libhdfs.
Further testing is needed to see if it is usable by other programs > to actually access hdfs, which will likely happen in the next few weeks as > the Condor Project continues with its file transfer work. > In the process, I've removed a few what I believe are extraneous consts and > also fixed an incorrect array initialization where someone was attempting to > initialize with something like this: JavaVMOption options[noArgs]; where > noArgs was being incremented in the code above. This was in the > hdfsJniHelper.c file, in the getJNIEnv function. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.
[ https://issues.apache.org/jira/browse/HDFS-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Megasthenis Asteris updated HDFS-6799: -- Attachment: HDFS-6799.patch > The invalidate method in SimulatedFSDataset.java failed to remove > (invalidate) blocks from the file system. > --- > > Key: HDFS-6799 > URL: https://issues.apache.org/jira/browse/HDFS-6799 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, test >Affects Versions: 2.4.1 >Reporter: Megasthenis Asteris >Priority: Minor > Attachments: HDFS-6799.patch > > > The invalidate(String bpid, Block[] invalidBlks) method in > SimulatedFSDataset.java should remove all invalidBlks from the simulated file > system. It currently fails to do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.
[ https://issues.apache.org/jira/browse/HDFS-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Megasthenis Asteris updated HDFS-6799: -- Status: Open (was: Patch Available) > The invalidate method in SimulatedFSDataset.java failed to remove > (invalidate) blocks from the file system. > --- > > Key: HDFS-6799 > URL: https://issues.apache.org/jira/browse/HDFS-6799 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, test >Affects Versions: 2.4.1 >Reporter: Megasthenis Asteris >Priority: Minor > > The invalidate(String bpid, Block[] invalidBlks) method in > SimulatedFSDataset.java should remove all invalidBlks from the simulated file > system. It currently fails to do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.
[ https://issues.apache.org/jira/browse/HDFS-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Megasthenis Asteris updated HDFS-6799: -- Status: Patch Available (was: Open) > The invalidate method in SimulatedFSDataset.java failed to remove > (invalidate) blocks from the file system. > --- > > Key: HDFS-6799 > URL: https://issues.apache.org/jira/browse/HDFS-6799 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, test >Affects Versions: 2.4.1 >Reporter: Megasthenis Asteris > > The invalidate(String bpid, Block[] invalidBlks) method in > SimulatedFSDataset.java should remove all invalidBlks from the simulated file > system. It currently fails to do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.
[ https://issues.apache.org/jira/browse/HDFS-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Megasthenis Asteris updated HDFS-6799: -- Priority: Minor (was: Major) > The invalidate method in SimulatedFSDataset.java failed to remove > (invalidate) blocks from the file system. > --- > > Key: HDFS-6799 > URL: https://issues.apache.org/jira/browse/HDFS-6799 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, test >Affects Versions: 2.4.1 >Reporter: Megasthenis Asteris >Priority: Minor > > The invalidate(String bpid, Block[] invalidBlks) method in > SimulatedFSDataset.java should remove all invalidBlks from the simulated file > system. It currently fails to do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.
Megasthenis Asteris created HDFS-6799: - Summary: The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system. Key: HDFS-6799 URL: https://issues.apache.org/jira/browse/HDFS-6799 Project: Hadoop HDFS Issue Type: Bug Components: datanode, test Affects Versions: 2.4.1 Reporter: Megasthenis Asteris The invalidate(String bpid, Block[] invalidBlks) method in SimulatedFSDataset.java should remove all invalidBlks from the simulated file system. It currently fails to do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-573) Porting libhdfs to Windows
[ https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081665#comment-14081665 ] Chris Nauroth commented on HDFS-573: I guess I misread something on HADOOP-10388. I thought I saw a nice clean init function in the jni_helper.c over there. I may have incorrectly assumed that this cascaded all the way out to the client-facing API. :-) Thanks for sharing your experiences, Stephen. Unfortunately, I think we'd have a hard time incorporating those changes right now, given the compatibility concerns. I suppose backwards-incompatible changes like this could be considered on the 3.x release boundary. BTW Colin, thanks for the code review. The work so far has been aimed at a straight port, warts and all, but I'm happy to roll in a few more small fixes for existing problems while I'm in here. I'll work on a v2 of the patch. I have just one question though. My initial inclination was to put {{TYPE_CHECKED_PRINTF_FORMAT}} in platform.h as well. However, I then backed that out and put the ifdef in exception.h, because it has never been clear to me whether exception.h is part of the public API. Most of the functions can't reasonably be considered public, because of the dependence on passing a {{JNIEnv}}. However, then there is {{getExceptionInfo}}. As long as we agree that only hdfs.h is the public API, and not exception.h, then I'll move {{TYPE_CHECKED_PRINTF_FORMAT}} back to platform.h. If client applications ever include {{exception.h}}, then they'd also have the complexity of selecting the correct platform.h, which would be undesirable.
> Porting libhdfs to Windows > -- > > Key: HDFS-573 > URL: https://issues.apache.org/jira/browse/HDFS-573 > Project: Hadoop HDFS > Issue Type: Improvement > Components: libhdfs > Environment: Windows, Visual Studio 2008 >Reporter: Ziliang Guo >Assignee: Chris Nauroth > Attachments: HDFS-573.1.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > The current C code in libhdfs is written using C99 conventions and also uses > a few POSIX specific functions such as hcreate, hsearch, and pthread mutex > locks. To compile it using Visual Studio would require a conversion of the > code in hdfsJniHelper.c and hdfs.c to C89 and replacement/reimplementation of > the POSIX functions. The code also uses the stdint.h header, which is not > part of the original C89, but there exists what appears to be a BSD licensed > reimplementation written to be compatible with MSVC floating around. I have > already done the other necessary conversions, as well as created a simplistic > hash bucket for use with hcreate and hsearch and successfully built a DLL of > libhdfs. Further testing is needed to see if it is usable by other programs > to actually access hdfs, which will likely happen in the next few weeks as > the Condor Project continues with its file transfer work. > In the process, I've removed a few what I believe are extraneous consts and > also fixed an incorrect array initialization where someone was attempting to > initialize with something like this: JavaVMOption options[noArgs]; where > noArgs was being incremented in the code above. This was in the > hdfsJniHelper.c file, in the getJNIEnv function. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-573) Porting libhdfs to Windows
[ https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081663#comment-14081663 ] Stephen Bovy commented on HDFS-573: --- SAMPLE "Optional" INIT-LIB function :) // FLAG :: init-lib invoked (speed up jvm-init and avoid locks) extern short hdfs_JniInitLib; extern char hdfs_HadoopHome [2000]; extern char hdfs_JavaHome [2000]; // the following are used for no-threads support // use this flag to bypass thread logic // enable non-threaded speed-ups extern short hdfs_Threads; // Init the HDFS library int hdfsJNILibInit ( pHdfsInitParms parms ) { JNIEnv* env; // disable thread support for now. hdfs_Threads = 0; if ( parms ) { if ( parms->JavaHome ) { if ( strlen(parms->JavaHome) >= 2000 ) { fprintf ( stderr, "The JAVA_HOME variable is too long.\n" ); return 1; } strcpy ( hdfs_JavaHome, parms->JavaHome ); } if ( parms->HadoopHome ) { if ( strlen(parms->HadoopHome) >= 2000 ) { fprintf ( stderr, "The HADOOP_HOME variable is too long.\n" ); return 1; } strcpy ( hdfs_HadoopHome, parms->HadoopHome ); } if (parms->threads) hdfs_Threads = parms->threads; } env = getJNIEnv(); if (!env) return 1; hdfs_JniInitLib = 1; return 0; } > Porting libhdfs to Windows > -- > > Key: HDFS-573 > URL: https://issues.apache.org/jira/browse/HDFS-573 > Project: Hadoop HDFS > Issue Type: Improvement > Components: libhdfs > Environment: Windows, Visual Studio 2008 >Reporter: Ziliang Guo >Assignee: Chris Nauroth > Attachments: HDFS-573.1.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > The current C code in libhdfs is written using C99 conventions and also uses > a few POSIX specific functions such as hcreate, hsearch, and pthread mutex > locks. To compile it using Visual Studio would require a conversion of the > code in hdfsJniHelper.c and hdfs.c to C89 and replacement/reimplementation of > the POSIX functions.
The code also uses the stdint.h header, which is not > part of the original C89, but there exists what appears to be a BSD licensed > reimplementation written to be compatible with MSVC floating around. I have > already done the other necessary conversions, as well as created a simplistic > hash bucket for use with hcreate and hsearch and successfully built a DLL of > libhdfs. Further testing is needed to see if it is usable by other programs > to actually access hdfs, which will likely happen in the next few weeks as > the Condor Project continues with its file transfer work. > In the process, I've removed a few what I believe are extraneous consts and > also fixed an incorrect array initialization where someone was attempting to > initialize with something like this: JavaVMOption options[noArgs]; where > noArgs was being incremented in the code above. This was in the > hdfsJniHelper.c file, in the getJNIEnv function. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency
[ https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-6425: -- Summary: Large postponedMisreplicatedBlocks has impact on blockReport latency (was: reset postponedMisreplicatedBlocks and postponedMisreplicatedBlocksCount when NN becomes active) > Large postponedMisreplicatedBlocks has impact on blockReport latency > > > Key: HDFS-6425 > URL: https://issues.apache.org/jira/browse/HDFS-6425 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HDFS-6425.patch > > > Sometimes we have large number of over replicates when NN fails over. When > the new active NN took over, over replicated blocks will be put to > postponedMisreplicatedBlocks until all DNs for that block aren't stale > anymore. > We have a case where NNs flip flop. Before postponedMisreplicatedBlocks > became empty, NN fail over again and again. So postponedMisreplicatedBlocks > just kept increasing until the cluster is stable. > In addition, large postponedMisreplicatedBlocks could make > rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks > takes write lock. So it could slow down the block report processing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency
[ https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-6425: -- Status: Open (was: Patch Available) > Large postponedMisreplicatedBlocks has impact on blockReport latency > > > Key: HDFS-6425 > URL: https://issues.apache.org/jira/browse/HDFS-6425 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HDFS-6425.patch > > > Sometimes we have large number of over replicates when NN fails over. When > the new active NN took over, over replicated blocks will be put to > postponedMisreplicatedBlocks until all DNs for that block aren't stale > anymore. > We have a case where NNs flip flop. Before postponedMisreplicatedBlocks > became empty, NN fail over again and again. So postponedMisreplicatedBlocks > just kept increasing until the cluster is stable. > In addition, large postponedMisreplicatedBlocks could make > rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks > takes write lock. So it could slow down the block report processing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-573) Porting libhdfs to Windows
[ https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081640#comment-14081640 ] Stephen Bovy commented on HDFS-573: --- Thanks Chris, We have had some offline discussions before. Thanks for the explanation. I have indeed added many enhancements. I would need to get management permission to share these (sigh) :) I have added optional support for dynamically loading the JVM. This simplifies build issues, and solves a lot of configuration usage issues. I have indeed added an optional lib-init function and have also added support for using a global static for the JVM pointer. I have added support for a thread-flag, which can be statically set by the compiler or dynamically set in the lib-init. When the thread flag is not set I use a static global to save the thread-env pointer which gets created when the jvm is created, and I only need to utilize and access that one-pointer in one-place. When the thread flag is not set, all the special thread code is bypassed with IF statements I have tested this in thread-mode with the thread-tester, and of course I am using it with my app in non thread mode . Works great either way. > Porting libhdfs to Windows > -- > > Key: HDFS-573 > URL: https://issues.apache.org/jira/browse/HDFS-573 > Project: Hadoop HDFS > Issue Type: Improvement > Components: libhdfs > Environment: Windows, Visual Studio 2008 >Reporter: Ziliang Guo >Assignee: Chris Nauroth > Attachments: HDFS-573.1.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > The current C code in libhdfs is written using C99 conventions and also uses > a few POSIX specific functions such as hcreate, hsearch, and pthread mutex > locks. To compile it using Visual Studio would require a conversion of the > code in hdfsJniHelper.c and hdfs.c to C89 and replacement/reimplementation of > the POSIX functions. 
The code also uses the stdint.h header, which is not > part of the original C89, but there exists what appears to be a BSD licensed > reimplementation written to be compatible with MSVC floating around. I have > already done the other necessary conversions, as well as created a simplistic > hash bucket for use with hcreate and hsearch and successfully built a DLL of > libhdfs. Further testing is needed to see if it is usable by other programs > to actually access hdfs, which will likely happen in the next few weeks as > the Condor Project continues with its file transfer work. > In the process, I've removed a few what I believe are extraneous consts and > also fixed an incorrect array initialization where someone was attempting to > initialize with something like this: JavaVMOption options[noArgs]; where > noArgs was being incremented in the code above. This was in the > hdfsJniHelper.c file, in the getJNIEnv function. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-573) Porting libhdfs to Windows
[ https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081637#comment-14081637 ] Colin Patrick McCabe commented on HDFS-573: --- bq. I am probably exposing my ignorance, so please forgive me. Are you saying that using JNI automatically implies and requires thread support Yep, that's what I'm saying. bq. and that every JNI call is running on a thread? Not every JNI call runs in a different thread, but many HDFS JNI calls certainly do. For example, {{hdfsWrite}} uses {{DFSOutputStream}}, which ends up starting a thread to write to the pipeline. bq. From my very quick scan of the HADOOP-10388 branch, it looks like we'll be providing a clearer initialization sequence there. libhdfs likely will need to remain this way though. I agree 100% that libhdfs should have had an "init" function that created some kind of context we could pass around. But... we're going to try to keep the existing API in HADOOP-10388. :P Sorry, it's just really nice to keep compatibility where you can. > Porting libhdfs to Windows > -- > > Key: HDFS-573 > URL: https://issues.apache.org/jira/browse/HDFS-573 > Project: Hadoop HDFS > Issue Type: Improvement > Components: libhdfs > Environment: Windows, Visual Studio 2008 >Reporter: Ziliang Guo >Assignee: Chris Nauroth > Attachments: HDFS-573.1.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > The current C code in libhdfs is written using C99 conventions and also uses > a few POSIX specific functions such as hcreate, hsearch, and pthread mutex > locks. To compile it using Visual Studio would require a conversion of the > code in hdfsJniHelper.c and hdfs.c to C89 and replacement/reimplementation of > the POSIX functions. The code also uses the stdint.h header, which is not > part of the original C89, but there exists what appears to be a BSD licensed > reimplementation written to be compatible with MSVC floating around. 
I have > already done the other necessary conversions, as well as created a simplistic > hash bucket for use with hcreate and hsearch and successfully built a DLL of > libhdfs. Further testing is needed to see if it is usable by other programs > to actually access hdfs, which will likely happen in the next few weeks as > the Condor Project continues with its file transfer work. > In the process, I've removed a few what I believe are extraneous consts and > also fixed an incorrect array initialization where someone was attempting to > initialize with something like this: JavaVMOption options[noArgs]; where > noArgs was being incremented in the code above. This was in the > hdfsJniHelper.c file, in the getJNIEnv function. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-573) Porting libhdfs to Windows
[ https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081621#comment-14081621 ] Chris Nauroth commented on HDFS-573: I think there are 2 aspects to the question: # libhdfs embeds a JVM. The JVM itself always runs multiple internal threads, even if your libhdfs application code doesn't run multiple threads. This means that by extension, a libhdfs application is always multi-threaded, even if the application's code is entirely single-threaded/synchronous. This rules out things like linking to a single-threaded C runtime library for a supposed performance boost with single-core execution. A libhdfs application must always link to a C runtime library with multi-threading support. # As far as the data structures inside the libhdfs code itself, you're correct that there is no thread safety concern if the application runs entirely single-threaded and makes synchronous calls. Technically, we don't need a lock around the hash table in that case. However, it might just cause end user confusion if we publish thread-safe vs. non-thread-safe builds or some kind of configuration flag to skip the locking. The effects of running multiple threads without the locking would be catastrophic, probably a crash of some sort. I haven't personally seen contention on this lock cause a real-world performance bottleneck, so I wonder if such an optimization is necessary. For the scope of this patch, I'd prefer to focus on a straight-up port of the existing code to work on Windows. We're taking a big step here, moving from not even compiling on Windows to fully functional, and the patch is already pretty large. :-) Potential performance enhancements certainly are welcome in separate patches. FWIW, I think libhdfs has a weakness in that it has no clear-cut "initialize" function for the application to call during a single-threaded bootstrap sequence. 
This would have given us an easy place to start the {{JavaVM}} and pre-populate the mapping of class names to class references. Unfortunately, it would be backwards-incompatible to add that function now and demand existing applications change their code to call our initialize function. Instead, we have no choice but to do lazy initialization, and that drives a lot of the complexity in libhdfs with the mutexes and the thread-local storage. From my very quick scan of the HADOOP-10388 branch, it looks like we'll be providing a clearer initialization sequence there. libhdfs likely will need to remain this way though. > Porting libhdfs to Windows > -- > > Key: HDFS-573 > URL: https://issues.apache.org/jira/browse/HDFS-573 > Project: Hadoop HDFS > Issue Type: Improvement > Components: libhdfs > Environment: Windows, Visual Studio 2008 >Reporter: Ziliang Guo >Assignee: Chris Nauroth > Attachments: HDFS-573.1.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > The current C code in libhdfs is written using C99 conventions and also uses > a few POSIX specific functions such as hcreate, hsearch, and pthread mutex > locks. To compile it using Visual Studio would require a conversion of the > code in hdfsJniHelper.c and hdfs.c to C89 and replacement/reimplementation of > the POSIX functions. The code also uses the stdint.h header, which is not > part of the original C89, but there exists what appears to be a BSD licensed > reimplementation written to be compatible with MSVC floating around. I have > already done the other necessary conversions, as well as created a simplistic > hash bucket for use with hcreate and hsearch and successfully built a DLL of > libhdfs. Further testing is needed to see if it is usable by other programs > to actually access hdfs, which will likely happen in the next few weeks as > the Condor Project continues with its file transfer work. 
> In the process, I've removed a few what I believe are extraneous consts and > also fixed an incorrect array initialization where someone was attempting to > initialize with something like this: JavaVMOption options[noArgs]; where > noArgs was being incremented in the code above. This was in the > hdfsJniHelper.c file, in the getJNIEnv function. -- This message was sent by Atlassian JIRA (v6.2#6252)
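The lazy-initialization pattern described in the comments above — no explicit init() in the public API, so shared state must be created under a lock on first use — can be sketched in Java. This is an illustrative sketch with hypothetical names, not the actual libhdfs C code (which additionally needs thread-local storage for the per-thread JNIEnv):

```java
// Sketch: because callers never invoke a one-shot init(), every entry point
// funnels through get(), which initializes the shared context lazily under a
// lock. This is the complexity an explicit init() would have avoided.
class LazyRuntime {
    private static final Object LOCK = new Object();
    private static volatile LazyRuntime instance;  // shared JVM-like context

    private LazyRuntime() {
        // expensive one-time setup would go here, e.g. starting a VM and
        // pre-populating a class-name-to-class-reference map
    }

    static LazyRuntime get() {
        LazyRuntime r = instance;
        if (r == null) {                  // fast path: already initialized
            synchronized (LOCK) {         // slow path: first caller pays
                if (instance == null) {
                    instance = new LazyRuntime();
                }
                r = instance;
            }
        }
        return r;
    }
}
```

Note the double-checked locking only works here because `instance` is `volatile`; the C version has to build the equivalent guarantees by hand with mutexes, which is exactly the brittleness the comments point at.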
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081618#comment-14081618 ] Colin Patrick McCabe commented on HDFS-6482: Why don't we merge this to trunk and then open another JIRA to iron out any issues with rolling upgrades between different DN layout versions? At a minimum, we should decide whether we support rolling DN upgrades between different layout versions, and if we don't support it, give a clear failure message to admins. But this patch is big enough that I don't think cramming all that into here is a good idea. There also seem to be some issues with rolling DN downgrade now (for example, HDFS-6005 removed {{datanode \-rollingupgrade \-rollback}}, but not the usage text for it displayed in {{\-help}}.) > Use block ID-based block layout on datanodes > > > Key: HDFS-6482 > URL: https://issues.apache.org/jira/browse/HDFS-6482 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.5.0 >Reporter: James Thomas >Assignee: James Thomas > Attachments: 6482-design.doc, HDFS-6482.1.patch, HDFS-6482.2.patch, > HDFS-6482.3.patch, HDFS-6482.4.patch, HDFS-6482.5.patch, HDFS-6482.6.patch, > HDFS-6482.7.patch, HDFS-6482.8.patch, HDFS-6482.9.patch, HDFS-6482.patch, > hadoop-24-datanode-dir.tgz > > > Right now blocks are placed into directories that are split into many > subdirectories when capacity is reached. Instead we can use a block's ID to > determine the path it should go in. This eliminates the need for the LDir > data structure that facilitates the splitting of directories when they reach > capacity as well as fields in ReplicaInfo that keep track of a replica's > location. > An extension of the work in HDFS-3290. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081607#comment-14081607 ] James Thomas commented on HDFS-6482: Or [~kihwal]? > Use block ID-based block layout on datanodes > > > Key: HDFS-6482 > URL: https://issues.apache.org/jira/browse/HDFS-6482 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.5.0 >Reporter: James Thomas >Assignee: James Thomas > Attachments: 6482-design.doc, HDFS-6482.1.patch, HDFS-6482.2.patch, > HDFS-6482.3.patch, HDFS-6482.4.patch, HDFS-6482.5.patch, HDFS-6482.6.patch, > HDFS-6482.7.patch, HDFS-6482.8.patch, HDFS-6482.9.patch, HDFS-6482.patch, > hadoop-24-datanode-dir.tgz > > > Right now blocks are placed into directories that are split into many > subdirectories when capacity is reached. Instead we can use a block's ID to > determine the path it should go in. This eliminates the need for the LDir > data structure that facilitates the splitting of directories when they reach > capacity as well as fields in ReplicaInfo that keep track of a replica's > location. > An extension of the work in HDFS-3290. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081603#comment-14081603 ] James Thomas commented on HDFS-6482: [~sureshms] Is the documentation in the Rollback section at http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html correct? You are supposed to restart the DNs normally, without flags like "-rollback" or "-rollingupgrade rollback"? If you restart the DNs with "-rollback", everything should work normally and the previous directory should be restored with the old layout. [~arpitagarwal], any thoughts on this? > Use block ID-based block layout on datanodes > > > Key: HDFS-6482 > URL: https://issues.apache.org/jira/browse/HDFS-6482 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.5.0 >Reporter: James Thomas >Assignee: James Thomas > Attachments: 6482-design.doc, HDFS-6482.1.patch, HDFS-6482.2.patch, > HDFS-6482.3.patch, HDFS-6482.4.patch, HDFS-6482.5.patch, HDFS-6482.6.patch, > HDFS-6482.7.patch, HDFS-6482.8.patch, HDFS-6482.9.patch, HDFS-6482.patch, > hadoop-24-datanode-dir.tgz > > > Right now blocks are placed into directories that are split into many > subdirectories when capacity is reached. Instead we can use a block's ID to > determine the path it should go in. This eliminates the need for the LDir > data structure that facilitates the splitting of directories when they reach > capacity as well as fields in ReplicaInfo that keep track of a replica's > location. > An extension of the work in HDFS-3290. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6791) A block could remain under replicated if all of its replicas are on decommissioned nodes
[ https://issues.apache.org/jira/browse/HDFS-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-6791: -- Assignee: Ming Ma Status: Patch Available (was: Open) > A block could remain under replicated if all of its replicas are on > decommissioned nodes > > > Key: HDFS-6791 > URL: https://issues.apache.org/jira/browse/HDFS-6791 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HDFS-6791.patch > > > Here is the scenario. > 1. Normally, before NN transitions a DN to decommissioned state, enough > replicas have been copied to other "in service" DNs. However, in some rare > situations, the cluster got into a state where a DN is in decommissioned > state and a block's only replica is on that DN. In such a state, the replication > count reported by fsck is 1; the block just stays in under replicated > state; applications can still read the data, given that a decommissioned node can > serve read traffic. > This can happen in some error situations such as DN failure or NN failover. For > example > a) a block's only replica is on node A temporarily. > b) Start the decommission process on node A. > c) When node A is in "decommission-in-progress" state, node A crashed. NN > will mark node A as dead. > d) After node A rejoins the cluster, NN will mark node A as decommissioned. > 2. In theory, NN should take care of under replicated blocks. But it doesn't > for this special case where the only replica is on a decommissioned node. That > is because NN has the policy of "a decommissioned node can't be picked as the > source node for replication". > {noformat} > BlockManager.java > chooseSourceDatanode > // never use already decommissioned nodes > if(node.isDecommissioned()) > continue; > {noformat} > 3. Given that NN marks the node as decommissioned, admins will shut down the > datanode. Under replicated blocks then turn into missing blocks. > 4.
The workaround is to recommission the node so that NN can start the > replication from the node. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6791) A block could remain under replicated if all of its replicas are on decommissioned nodes
[ https://issues.apache.org/jira/browse/HDFS-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-6791: -- Attachment: HDFS-6791.patch The patch keeps the node in DECOMMISSION_INPROGRESS state if the node becomes dead during decommission. That way, the decommission can resume when the node rejoins the cluster later. > A block could remain under replicated if all of its replicas are on > decommissioned nodes > > > Key: HDFS-6791 > URL: https://issues.apache.org/jira/browse/HDFS-6791 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ming Ma > Attachments: HDFS-6791.patch > > > Here is the scenario. > 1. Normally, before NN transitions a DN to decommissioned state, enough > replicas have been copied to other "in service" DNs. However, in some rare > situations, the cluster got into a state where a DN is in decommissioned > state and a block's only replica is on that DN. In such a state, the replication > count reported by fsck is 1; the block just stays in under replicated > state; applications can still read the data, given that a decommissioned node can > serve read traffic. > This can happen in some error situations such as DN failure or NN failover. For > example > a) a block's only replica is on node A temporarily. > b) Start the decommission process on node A. > c) When node A is in "decommission-in-progress" state, node A crashed. NN > will mark node A as dead. > d) After node A rejoins the cluster, NN will mark node A as decommissioned. > 2. In theory, NN should take care of under replicated blocks. But it doesn't > for this special case where the only replica is on a decommissioned node. That > is because NN has the policy of "a decommissioned node can't be picked as the > source node for replication". > {noformat} > BlockManager.java > chooseSourceDatanode > // never use already decommissioned nodes > if(node.isDecommissioned()) > continue; > {noformat} > 3. Given that NN marks the node as decommissioned, admins will shut down the > datanode. Under replicated blocks then turn into missing blocks. > 4. The workaround is to recommission the node so that NN can start the > replication from the node. -- This message was sent by Atlassian JIRA (v6.2#6252)
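The behavior change the patch describes can be sketched as follows. This is an illustrative simplification with hypothetical names (`AdminState`, `stateAfterDeath`), not the actual NameNode code: the point is that a node that dies while decommissioning stays in DECOMMISSION_INPROGRESS instead of being finalized as DECOMMISSIONED, so it remains eligible as a replication source when it rejoins.

```java
// Simplified admin-state model (names roughly mirror, but are not, the real
// DatanodeInfo.AdminStates).
enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

class DecommissionSketch {
    // What state should the node be in after it dies and later rejoins?
    static AdminState stateAfterDeath(AdminState before, boolean withFix) {
        if (before == AdminState.DECOMMISSION_INPROGRESS) {
            // Without the fix, NN finalized the node as DECOMMISSIONED even
            // though its blocks were never fully re-replicated. With the fix,
            // it stays in progress and decommission resumes on rejoin.
            return withFix ? AdminState.DECOMMISSION_INPROGRESS
                           : AdminState.DECOMMISSIONED;
        }
        return before;
    }

    // chooseSourceDatanode-style filter: never use an already-decommissioned
    // node as a replication source, but a decommission-in-progress node is OK.
    static boolean usableAsReplicationSource(AdminState s) {
        return s != AdminState.DECOMMISSIONED;
    }
}
```

With the fix applied, the rejoined node passes the source-node filter, so the under-replicated block can be copied off it without the recommission workaround.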
[jira] [Commented] (HDFS-573) Porting libhdfs to Windows
[ https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081569#comment-14081569 ] Hadoop QA commented on HDFS-573: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658964/HDFS-573.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.TestLeaseRecovery2 {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7514//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7514//console This message is automatically generated. 
> Porting libhdfs to Windows > -- > > Key: HDFS-573 > URL: https://issues.apache.org/jira/browse/HDFS-573 > Project: Hadoop HDFS > Issue Type: Improvement > Components: libhdfs > Environment: Windows, Visual Studio 2008 >Reporter: Ziliang Guo >Assignee: Chris Nauroth > Attachments: HDFS-573.1.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > The current C code in libhdfs is written using C99 conventions and also uses > a few POSIX specific functions such as hcreate, hsearch, and pthread mutex > locks. To compile it using Visual Studio would require a conversion of the > code in hdfsJniHelper.c and hdfs.c to C89 and replacement/reimplementation of > the POSIX functions. The code also uses the stdint.h header, which is not > part of the original C89, but there exists what appears to be a BSD licensed > reimplementation written to be compatible with MSVC floating around. I have > already done the other necessary conversions, as well as created a simplistic > hash bucket for use with hcreate and hsearch and successfully built a DLL of > libhdfs. Further testing is needed to see if it is usable by other programs > to actually access hdfs, which will likely happen in the next few weeks as > the Condor Project continues with its file transfer work. > In the process, I've removed a few what I believe are extraneous consts and > also fixed an incorrect array initialization where someone was attempting to > initialize with something like this: JavaVMOption options[noArgs]; where > noArgs was being incremented in the code above. This was in the > hdfsJniHelper.c file, in the getJNIEnv function. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6796) Improving the argument check during balancer command line parsing
[ https://issues.apache.org/jira/browse/HDFS-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081571#comment-14081571 ] Hadoop QA commented on HDFS-6796: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658967/HDFS-6796.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7513//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7513//console This message is automatically generated. > Improving the argument check during balancer command line parsing > - > > Key: HDFS-6796 > URL: https://issues.apache.org/jira/browse/HDFS-6796 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Affects Versions: 2.4.1 >Reporter: Benoy Antony >Assignee: Benoy Antony > Attachments: HDFS-6796.patch > > > Currently balancer CLI parser simply checks if the total number of arguments > is greater than 2 inside the loop. 
Since the check does not include any loop > variables, it is not a proper check when there are more than 2 arguments. -- This message was sent by Atlassian JIRA (v6.2#6252)
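The kind of per-position bounds check the issue calls for can be sketched like this. The method and option names here are hypothetical, not the actual Balancer.Cli parser: the point is that an option consuming a following value must check against the loop index, not just the total argument count.

```java
class BalancerCliSketch {
    // Parse a "-threshold <value>" style option from the argument list.
    static double parseThreshold(String[] args) {
        double threshold = 10.0;  // assumed default for the sketch
        for (int i = 0; i < args.length; i++) {
            if ("-threshold".equalsIgnoreCase(args[i])) {
                // Correct check: a value must exist *after this position*.
                // Checking only args.length > 2 would pass for, say,
                // ["-foo", "-bar", "-threshold"] and then overrun the array.
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException(
                        "Threshold value is missing");
                }
                threshold = Double.parseDouble(args[++i]);
            }
        }
        return threshold;
    }
}
```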
[jira] [Commented] (HDFS-573) Porting libhdfs to Windows
[ https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081555#comment-14081555 ] Stephen Bovy commented on HDFS-573: --- Thanks, I am probably exposing my ignorance, so please forgive me. Are you saying that using JNI automatically implies and requires thread support, and that every JNI call is running on a thread? My hdfs client does not use threads, so each hdfs call is synchronous, and each jni call is also synchronous, and within the context the code accessing the hash table should also be synchronous. Please correct me gently if I am wrong :) > Porting libhdfs to Windows > -- > > Key: HDFS-573 > URL: https://issues.apache.org/jira/browse/HDFS-573 > Project: Hadoop HDFS > Issue Type: Improvement > Components: libhdfs > Environment: Windows, Visual Studio 2008 >Reporter: Ziliang Guo >Assignee: Chris Nauroth > Attachments: HDFS-573.1.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > The current C code in libhdfs is written using C99 conventions and also uses > a few POSIX specific functions such as hcreate, hsearch, and pthread mutex > locks. To compile it using Visual Studio would require a conversion of the > code in hdfsJniHelper.c and hdfs.c to C89 and replacement/reimplementation of > the POSIX functions. The code also uses the stdint.h header, which is not > part of the original C89, but there exists what appears to be a BSD licensed > reimplementation written to be compatible with MSVC floating around. I have > already done the other necessary conversions, as well as created a simplistic > hash bucket for use with hcreate and hsearch and successfully built a DLL of > libhdfs. Further testing is needed to see if it is usable by other programs > to actually access hdfs, which will likely happen in the next few weeks as > the Condor Project continues with its file transfer work. 
> In the process, I've removed a few what I believe are extraneous consts and > also fixed an incorrect array initialization where someone was attempting to > initialize with something like this: JavaVMOption options[noArgs]; where > noArgs was being incremented in the code above. This was in the > hdfsJniHelper.c file, in the getJNIEnv function. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6798) Add test case for incorrect data node condition during balancing
[ https://issues.apache.org/jira/browse/HDFS-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated HDFS-6798: --- Status: Patch Available (was: Open) > Add test case for incorrect data node condition during balancing > > > Key: HDFS-6798 > URL: https://issues.apache.org/jira/browse/HDFS-6798 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.4.1 >Reporter: Benoy Antony >Assignee: Benoy Antony > Attachments: HDFS-6798.patch > > > The Balancer makes a check to see if a block's location is a known data node. > But the variable it uses to check is wrong. This issue was fixed in HDFS-6364. > There was no way to easily unit test it at that time. Since HDFS-6441 enables > one to simulate this case, it was decided to add the unit test once HDFS-6441 > is resolved. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6798) Add test case for incorrect data node condition during balancing
[ https://issues.apache.org/jira/browse/HDFS-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated HDFS-6798: --- Attachment: HDFS-6798.patch The attached patch adds the unit test for HDFS-6364. > Add test case for incorrect data node condition during balancing > > > Key: HDFS-6798 > URL: https://issues.apache.org/jira/browse/HDFS-6798 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 2.4.1 >Reporter: Benoy Antony >Assignee: Benoy Antony > Attachments: HDFS-6798.patch > > > The Balancer makes a check to see if a block's location is a known data node. > But the variable it uses to check is wrong. This issue was fixed in HDFS-6364. > There was no way to easily unit test it at that time. Since HDFS-6441 enables > one to simulate this case, it was decided to add the unit test once HDFS-6441 > is resolved. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6798) Add test case for incorrect data node condition during balancing
Benoy Antony created HDFS-6798: -- Summary: Add test case for incorrect data node condition during balancing Key: HDFS-6798 URL: https://issues.apache.org/jira/browse/HDFS-6798 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Affects Versions: 2.4.1 Reporter: Benoy Antony Assignee: Benoy Antony The Balancer makes a check to see if a block's location is a known data node. But the variable it uses to check is wrong. This issue was fixed in HDFS-6364. There was no way to easily unit test it at that time. Since HDFS-6441 enables one to simulate this case, it was decided to add the unit test once HDFS-6441 is resolved. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-6786) Fix potential issue of cache refresh interval
[ https://issues.apache.org/jira/browse/HDFS-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe resolved HDFS-6786. Resolution: Not a Problem > Fix potential issue of cache refresh interval > - > > Key: HDFS-6786 > URL: https://issues.apache.org/jira/browse/HDFS-6786 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Affects Versions: 2.4.0 >Reporter: Yi Liu >Assignee: Yi Liu > > In {{CacheReplicationMonitor}}, the following code tries to check whether a > rescan is needed every interval ms; if a rescan takes n ms, it subtracts n ms from the > interval. But if the delta is <= 0, it breaks and starts a rescan, which is a > potential issue: if the user sets the interval to a small value, or a rescan finishes > after exceeding the interval, rescans will happen in a loop. > Furthermore, {{delta <= 0}} triggering the rescan should not be the intention, > since if a rescan is needed, {{needsRescan}} will be set. > {code} > while (true) { > if (shutdown) { > LOG.info("Shutting down CacheReplicationMonitor"); > return; > } > if (needsRescan) { > LOG.info("Rescanning because of pending operations"); > break; > } > long delta = (startTimeMs + intervalMs) - curTimeMs; > if (delta <= 0) { > LOG.info("Rescanning after " + (curTimeMs - startTimeMs) + > " milliseconds"); > break; > } > doRescan.await(delta, TimeUnit.MILLISECONDS); > curTimeMs = Time.monotonicNow(); > } > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6786) Fix potential issue of cache refresh interval
[ https://issues.apache.org/jira/browse/HDFS-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081513#comment-14081513 ] Colin Patrick McCabe commented on HDFS-6786: I agree with Andrew that a minimum interval seems unnecessary here. Sysadmins rarely adjust this value, and if they do, we assume that they know what they're doing... similar to a lot of the other tunables. The only case I can recall where we set a minimum is in block size. But we did that because block size can be controlled by the client creating a file (you don't need to be the sysadmin adjusting a configuration to set the block size). I'm going to close this one out since it's working as intended. > Fix potential issue of cache refresh interval > - > > Key: HDFS-6786 > URL: https://issues.apache.org/jira/browse/HDFS-6786 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Affects Versions: 2.4.0 >Reporter: Yi Liu >Assignee: Yi Liu > > In {{CacheReplicationMonitor}}, the following code tries to check whether a > rescan is needed every interval ms; if a rescan takes n ms, it subtracts n ms from the > interval. But if the delta is <= 0, it breaks and starts a rescan, which is a > potential issue: if the user sets the interval to a small value, or a rescan finishes > after exceeding the interval, rescans will happen in a loop. > Furthermore, {{delta <= 0}} triggering the rescan should not be the intention, > since if a rescan is needed, {{needsRescan}} will be set.
> {code} > while (true) { > if (shutdown) { > LOG.info("Shutting down CacheReplicationMonitor"); > return; > } > if (needsRescan) { > LOG.info("Rescanning because of pending operations"); > break; > } > long delta = (startTimeMs + intervalMs) - curTimeMs; > if (delta <= 0) { > LOG.info("Rescanning after " + (curTimeMs - startTimeMs) + > " milliseconds"); > break; > } > doRescan.await(delta, TimeUnit.MILLISECONDS); > curTimeMs = Time.monotonicNow(); > } > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6784) Avoid rescan twice in HDFS CacheReplicationMonitor for one FS Op if it calls setNeedsRescan multiple times.
[ https://issues.apache.org/jira/browse/HDFS-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081506#comment-14081506 ] Colin Patrick McCabe commented on HDFS-6784: I thought about this a little more, and I posted a patch on HDFS-6783 that I think solves this problem. Check it out. Sorry for all the confusion... sometimes it's tough to reason about these locking issues. > Avoid rescan twice in HDFS CacheReplicationMonitor for one FS Op if it calls > setNeedsRescan multiple times. > --- > > Key: HDFS-6784 > URL: https://issues.apache.org/jira/browse/HDFS-6784 > Project: Hadoop HDFS > Issue Type: Improvement > Components: caching >Affects Versions: 3.0.0 >Reporter: Yi Liu >Assignee: Yi Liu > Attachments: HDFS-6784.001.patch > > > In HDFS CacheReplicationMonitor, a rescan is expensive. Sometimes > {{setNeedsRescan}} is called multiple times for one FS op; for example, in > FSNamesystem#modifyCacheDirective it is called 3 times. In the monitor thread of > CacheReplicationMonitor, if {{needsRescan}} is true, a rescan will > happen, but {{needsRescan}} is set to false before the real scan starts. Meanwhile, the > 2nd or 3rd call to {{setNeedsRescan}} may set {{needsRescan}} back to true. So after > the scan finishes, a new rescan will be triggered in the next loop; that second > rescan is unnecessary and inefficient. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6783) Fix HDFS CacheReplicationMonitor rescan logic
[ https://issues.apache.org/jira/browse/HDFS-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081502#comment-14081502 ] Colin Patrick McCabe commented on HDFS-6783: Hi Yi. I didn't like v2 of the patch since I felt like 2 and 1 were magic numbers, not obvious when reading the code. I also feel like having all these flags and conditions and suchlike is kind of brittle. I would rather just have three numbers: the number of the last completed scan, the number of the scan in progress, and the number of the next scan that has been requested. The nice thing about this approach is that if we call {{setNeedsRescan}} multiple times in a row during the same scan, it just keeps setting {{neededScanCount}} to the same value. This also doesn't make any assumptions about whether we hold the FSN lock for the entire duration of {{CacheReplicationMonitor#rescan}}. I posted a v3 of the patch that implements this. I think this also solves the issue in HDFS-6784. I apologize for posting my patch on your JIRA but I really felt like this was an awesome solution. Let me know what you think! > Fix HDFS CacheReplicationMonitor rescan logic > - > > Key: HDFS-6783 > URL: https://issues.apache.org/jira/browse/HDFS-6783 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Affects Versions: 3.0.0 >Reporter: Yi Liu >Assignee: Yi Liu > Attachments: HDFS-6783.001.patch, HDFS-6783.002.patch, > HDFS-6783.003.patch > > > In the monitor thread, needsRescan is set to false before the real scan starts, so > {{waitForRescanIfNeeded}} will return on the first condition: > {code} > if (!needsRescan) { > return; > } > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
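The three-counter idea from the comment above can be sketched roughly as follows. This is a simplified illustration with assumed names, not the committed HDFS-6783 patch: the key property is that repeated `setNeedsRescan()` calls during one scan keep writing the same requested-scan number, so at most one follow-up rescan ever runs.

```java
// Sketch: track scans by number instead of a boolean flag. A rescan is needed
// whenever a higher-numbered scan has been requested than has completed.
class ScanTracker {
    long completedScanCount = 0; // number of the last finished scan
    long curScanCount = -1;      // number of the scan in progress, -1 if none
    long neededScanCount = 0;    // number of the scan callers are waiting for

    synchronized void setNeedsRescan() {
        if (curScanCount >= 0) {
            // A scan is running; changes made now may be missed by it, so
            // request the scan right after it. Repeated calls are idempotent.
            neededScanCount = curScanCount + 1;
        } else {
            neededScanCount = completedScanCount + 1;
        }
    }

    synchronized boolean rescanNeeded() {
        return neededScanCount > completedScanCount;
    }

    synchronized void beginScan() {
        curScanCount = completedScanCount + 1;
    }

    synchronized void finishScan() {
        completedScanCount = curScanCount;
        curScanCount = -1;
    }
}
```

Because the state is plain monotonically advancing numbers, no assumption is needed about holding an outer lock across the whole rescan, which is the brittleness the boolean-flag version suffered from.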
[jira] [Commented] (HDFS-4901) Site Scripting and Phishing Through Frames in browseDirectory.jsp
[ https://issues.apache.org/jira/browse/HDFS-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081493#comment-14081493 ] Allen Wittenauer commented on HDFS-4901: At this point, there is unlikely to be another 1.x release, given the last one will be a year ago tomorrow. > Site Scripting and Phishing Through Frames in browseDirectory.jsp > - > > Key: HDFS-4901 > URL: https://issues.apache.org/jira/browse/HDFS-4901 > Project: Hadoop HDFS > Issue Type: Bug > Components: security, webhdfs >Affects Versions: 1.2.1 >Reporter: Jeffrey E Rodriguez >Assignee: Vivek Ganesan >Priority: Blocker > Attachments: HDFS-4901.patch, HDFS-4901.patch.1, > HDFS-4901_branch-1.2.patch > > Original Estimate: 24h > Time Spent: 24h > Remaining Estimate: 0h > > It is possible to steal or manipulate customer session and cookies, which > might be used to impersonate a legitimate user, > allowing the hacker to view or alter user records, and to perform > transactions as that user. > e.g. > GET /browseDirectory.jsp? dir=%2Fhadoop'"/>alert(759) > &namenodeInfoPort=50070 > Also; > Phishing Through Frames > Try: > GET /browseDirectory.jsp? > dir=%2Fhadoop%27%22%3E%3Ciframe+src%3Dhttp%3A%2F%2Fdemo.testfire.net%2Fphishing.html%3E > &namenodeInfoPort=50070 HTTP/1.1 > Cookie: JSESSIONID=qd9i8tuccuam1cme71swr9nfi > Accept-Language: en-US > Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*; -- This message was sent by Atlassian JIRA (v6.2#6252)
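The standard fix for a reflected XSS like the one above is to HTML-escape the untrusted request parameter (here, `dir`) before echoing it into the page. A minimal sketch follows; the attached patches may instead use an existing utility such as commons-lang's StringEscapeUtils or Hadoop's own quoting helpers:

```java
// Minimal HTML escaping for untrusted request parameters (illustrative;
// real code should prefer a vetted library escaper over a hand-rolled one).
class HtmlEscape {
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

Escaping `<`, `>`, quotes, and `&` neutralizes both the script-injection and the injected-iframe (phishing-through-frames) payloads, since the attacker's markup is rendered as inert text.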
[jira] [Updated] (HDFS-4916) DataTransfer may mask the IOException during block transfering
[ https://issues.apache.org/jira/browse/HDFS-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-4916: --- Priority: Critical (was: Blocker) > DataTransfer may mask the IOException during block transfering > -- > > Key: HDFS-4916 > URL: https://issues.apache.org/jira/browse/HDFS-4916 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.0.4-alpha, 2.0.5-alpha >Reporter: Zesheng Wu >Priority: Critical > Attachments: 4916.v0.patch > > > When a new datanode is added to the pipeline, the client triggers the > block transfer process. In the current implementation, the src datanode calls > the run() method of DataTransfer to transfer the block; this method > masks any IOException thrown during the transfer, so the client does not > realize the transfer failed and, as a result, mistakes the failed > transfer for a successful one. -- This message was sent by Atlassian JIRA (v6.2#6252)
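One way to stop a `Runnable.run()` from swallowing the failure, as described above, is to record the exception in a field and let the initiating side re-throw it after the transfer thread completes. The sketch below is hypothetical (names and the simulated I/O are not the actual DataTransfer class or the attached patch):

```java
import java.io.IOException;

// Sketch: instead of logging and dropping the IOException inside run(),
// keep it in a volatile field so the caller can observe whether the
// transfer failed. Names are illustrative, not the real DataTransfer.
class Transfer implements Runnable {
    private volatile IOException failure;
    private final boolean simulateFailure; // stands in for real socket I/O

    Transfer(boolean simulateFailure) { this.simulateFailure = simulateFailure; }

    @Override
    public void run() {
        try {
            if (simulateFailure) {
                throw new IOException("connection reset during block transfer");
            }
            // ... send block data and checksums to the target datanode ...
        } catch (IOException e) {
            failure = e;   // remember the error instead of masking it
        }
    }

    // Caller joins the transfer thread, then checks and rethrows.
    void checkFailure() throws IOException {
        if (failure != null) {
            throw failure;
        }
    }
}
```

With this shape the client-driven pipeline-recovery path can distinguish a failed transfer from a successful one instead of silently treating both alike.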
[jira] [Updated] (HDFS-5185) DN fails to startup if one of the data dir is full
[ https://issues.apache.org/jira/browse/HDFS-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-5185: --- Priority: Critical (was: Blocker) > DN fails to startup if one of the data dir is full > -- > > Key: HDFS-5185 > URL: https://issues.apache.org/jira/browse/HDFS-5185 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Vinayakumar B >Assignee: Vinayakumar B >Priority: Critical > Attachments: HDFS-5185.patch > > > DataNode fails to start up if one of the configured data dirs is out of space. > It fails with the following exception > {noformat}2013-09-11 17:48:43,680 FATAL > org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for > block pool Block pool (storage id > DS-308316523-xx.xx.xx.xx-64015-1378896293604) service to /nn1:65110 > java.io.IOException: Mkdirs failed to create > /opt/nish/data/current/BP-123456-1234567/tmp > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.<init>(BlockPoolSlice.java:105) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:216) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.addBlockPool(FsVolumeList.java:155) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.addBlockPool(FsDatasetImpl.java:1593) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:834) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:311) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:217) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660) > at java.lang.Thread.run(Thread.java:662) > {noformat} > It should continue to start up with the other available data dirs. -- This message was sent by Atlassian JIRA (v6.2#6252)
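The proposed behavior, continuing startup with the remaining healthy data dirs, can be sketched as below. This is an assumption-laden illustration, not the attached HDFS-5185.patch; the class and interface names are invented for the example:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch of tolerating individual data-dir failures at startup: try each
// configured dir, collect the ones that initialize, and fail only when
// none of them is usable. Names are illustrative, not the actual patch.
class DataDirLoader {
    interface DirInitializer { void init(String dir) throws IOException; }

    static List<String> loadUsableDirs(List<String> dirs, DirInitializer init)
            throws IOException {
        List<String> usable = new ArrayList<>();
        for (String dir : dirs) {
            try {
                init.init(dir);    // e.g. mkdirs for the block pool's tmp dir
                usable.add(dir);
            } catch (IOException e) {
                // log and skip this dir instead of aborting the whole DataNode
            }
        }
        if (usable.isEmpty()) {
            throw new IOException("all configured data dirs failed to initialize");
        }
        return usable;
    }
}
```

This mirrors how the DataNode already tolerates a bounded number of failed volumes at block-report time; the JIRA asks for the same tolerance during block-pool initialization.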
[jira] [Updated] (HDFS-6783) Fix HDFS CacheReplicationMonitor rescan logic
[ https://issues.apache.org/jira/browse/HDFS-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-6783: --- Attachment: HDFS-6783.003.patch > Fix HDFS CacheReplicationMonitor rescan logic > - > > Key: HDFS-6783 > URL: https://issues.apache.org/jira/browse/HDFS-6783 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Affects Versions: 3.0.0 >Reporter: Yi Liu >Assignee: Yi Liu > Attachments: HDFS-6783.001.patch, HDFS-6783.002.patch, > HDFS-6783.003.patch > > > In the monitor thread, needsRescan is set to false before the real scan starts, so > {{waitForRescanIfNeeded}} will return at the first condition: > {code} > if (!needsRescan) { > return; > } > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)