[jira] [Commented] (HDFS-7523) Setting a socket receive buffer size in DFSClient

2014-12-30 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260901#comment-14260901
 ] 

Liang Xie commented on HDFS-7523:
-

Let me report numbers once the change is in, and I think we also need to make the 
buffer size configurable. For example, we hit a TCP incast issue in one of our 
Hadoop clusters where the rack switch buffer overflowed; in that case it would be 
great to be able to use a small enough buffer size to relieve that network 
issue :)
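
For illustration, here is a rough sketch of what a configurable receive buffer could 
look like on the client side. The configuration key and default below are made up for 
the example (the real name would be decided in a follow-up issue); note the buffer is 
set before connect() so a value larger than 64k can be honored via TCP window scaling.

{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

import org.apache.hadoop.conf.Configuration;

public class ReceiveBufferSketch {
  // Hypothetical key and default, for illustration only.
  static final String RECV_BUFFER_KEY = "dfs.client.socket.recv.buffer.size";
  static final int RECV_BUFFER_DEFAULT = 128 * 1024;

  static Socket newSocket(Configuration conf, InetSocketAddress addr, int timeoutMs)
      throws IOException {
    Socket sock = new Socket();
    int recvBuf = conf.getInt(RECV_BUFFER_KEY, RECV_BUFFER_DEFAULT);
    if (recvBuf > 0) {
      // Set before connect() so window scaling is negotiated for large buffers.
      sock.setReceiveBufferSize(recvBuf);
    }
    sock.connect(addr, timeoutMs);
    return sock;
  }
}
{code}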

> Setting a socket receive buffer size in DFSClient
> -
>
> Key: HDFS-7523
> URL: https://issues.apache.org/jira/browse/HDFS-7523
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfsclient
>Affects Versions: 2.6.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-7523-001 (1).txt, HDFS-7523-001 (1).txt, 
> HDFS-7523-001.txt, HDFS-7523-001.txt, HDFS-7523-001.txt
>
>
> It would be nice to set a socket receive buffer size when creating the 
> socket from the client (HBase) point of view. In older versions this would be in 
> DFSInputStream; in trunk it seems it should be here:
> {code}
>   @Override // RemotePeerFactory
>   public Peer newConnectedPeer(InetSocketAddress addr,
>   Token blockToken, DatanodeID datanodeId)
>   throws IOException {
> Peer peer = null;
> boolean success = false;
> Socket sock = null;
> try {
>   sock = socketFactory.createSocket();
>   NetUtils.connect(sock, addr,
> getRandomLocalInterfaceAddr(),
> dfsClientConf.socketTimeout);
>   peer = TcpPeerServer.peerFromSocketAndKey(saslClient, sock, this,
>   blockToken, datanodeId);
>   peer.setReadTimeout(dfsClientConf.socketTimeout);
> {code}
> e.g.: sock.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE);
> The default socket receive buffer size on Linux+JDK7 seems to be 8k, if I am not 
> wrong. That value is sometimes too small for HBase reading 64k blocks over a 10G 
> network (at the very least, it means more system calls).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6633) Support reading new data in a being written file until the file is closed

2014-12-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260906#comment-14260906
 ] 

Hadoop QA commented on HDFS-6633:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689437/HDFS-6633-003.patch
  against trunk revision 249cc90.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager
  org.apache.hadoop.hdfs.TestLeaseRecovery2

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestReadWhileWriting

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9129//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9129//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9129//console

This message is automatically generated.

> Support reading new data in a being written file until the file is closed
> -
>
> Key: HDFS-6633
> URL: https://issues.apache.org/jira/browse/HDFS-6633
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Vinayakumar B
> Attachments: HDFS-6633-001.patch, HDFS-6633-002.patch, 
> HDFS-6633-003.patch, h6633_20140707.patch, h6633_20140708.patch
>
>
> When a file is being written, the file length keeps increasing.  If the file 
> is opened for reading, the reader first gets the file length and then reads only 
> up to that length.  The reader will not be able to read the new data written 
> afterward.
> We propose adding a new feature so that readers will be able to read all the 
> data until the writer closes the file.
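
For context, with the current behavior the only way for a client to keep consuming a 
file under construction is to poll: read what is visible, then re-open the file to pick 
up any newly reported length. A rough sketch of that workaround using only the public 
FileSystem API (the path argument and the polling interval are illustrative):

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TailReadSketch {
  /**
   * Reads everything currently visible past 'offset' and returns the new offset.
   * Because the length is fixed when the stream is opened, the file has to be
   * re-opened on every round to observe newly visible data.
   */
  static long readVisibleTail(FileSystem fs, Path path, long offset) throws IOException {
    byte[] buf = new byte[64 * 1024];
    try (FSDataInputStream in = fs.open(path)) {
      in.seek(offset);
      int n;
      while ((n = in.read(buf)) > 0) {
        offset += n;          // hand buf[0..n) to the application here
      }
    }
    return offset;
  }

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path(args[0]);
    long offset = 0;
    // A real caller would stop once it knows the writer has closed the file.
    while (true) {
      offset = readVisibleTail(fs, path, offset);
      Thread.sleep(1000);
    }
  }
}
{code}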



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6633) Support reading new data in a being written file until the file is closed

2014-12-30 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260933#comment-14260933
 ] 

Vinayakumar B commented on HDFS-6633:
-

Tests {{TestDatanodeManager}} and {{TestLeaseRecovery2}} seem unrelated to my 
patch, as the patch takes effect only when the new client API is used.

{{TestReadWhileWriting}} passed locally for me without a timeout. 
I have triggered Jenkins again to confirm.

> Support reading new data in a being written file until the file is closed
> -
>
> Key: HDFS-6633
> URL: https://issues.apache.org/jira/browse/HDFS-6633
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Vinayakumar B
> Attachments: HDFS-6633-001.patch, HDFS-6633-002.patch, 
> HDFS-6633-003.patch, h6633_20140707.patch, h6633_20140708.patch
>
>
> When a file is being written, the file length keeps increasing.  If the file 
> is opened for reading, the reader first gets the file length and then reads only 
> up to that length.  The reader will not be able to read the new data written 
> afterward.
> We propose adding a new feature so that readers will be able to read all the 
> data until the writer closes the file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2014-12-30 Thread Lars Francke (JIRA)
Lars Francke created HDFS-7575:
--

 Summary: NameNode not handling heartbeats properly after HDFS-2832
 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lars Francke


Before HDFS-2832 each DataNode would have a unique storageId which included its 
IP address. Since HDFS-2832 the DataNodes have a unique storageId per storage 
directory which is just a random UUID.

They send reports per storage directory in their heartbeats. This heartbeat is 
processed on the NameNode in the {{DatanodeDescriptor#updateHeartbeatState}} 
method. Pre HDFS-2832 this would just store the information per Datanode. After 
the patch though each DataNode can have multiple different storages so it's 
stored in a map keyed by the storage Id.

This works fine for all clusters that have been installed post HDFS-2832 as 
they get a UUID for their storage Id. So a DN with 8 drives has a map with 8 
different keys. On each Heartbeat the Map is searched and updated 
({{DatanodeStorageInfo storage = storageMap.get(s.getStorageID());}}):

{code:title=DatanodeStorageInfo}
  void updateState(StorageReport r) {
capacity = r.getCapacity();
dfsUsed = r.getDfsUsed();
remaining = r.getRemaining();
blockPoolUsed = r.getBlockPoolUsed();
  }
{code}

On clusters that were upgraded from a pre-HDFS-2832 version, though, the storage 
Id has not been rewritten (at least not on the four clusters I checked), so each 
directory has the exact same storageId. That means there will be only a 
single entry in the {{storageMap}}, and it gets overwritten by a random 
{{StorageReport}} from the DataNode. This can be seen in the {{updateState}} 
method above: it just assigns the capacity from the received report; instead 
it should probably sum the values up per received heartbeat.

The Balancer seems to be one of the only things that actually uses this 
information so it now considers the utilization of a random drive per DataNode 
for balancing purposes.

Things get even worse when a drive has been added or replaced, as this will now 
get a new storage Id, so there will be two entries in the storageMap. As new 
drives are usually empty, this skews the Balancer's decision in a way that this 
node will never be considered over-utilized.

Another problem is that old StorageReports are never removed from the 
storageMap. So if I replace a drive and it gets a new storage Id the old one 
will still be in place and used for all calculations by the Balancer until a 
restart of the NameNode.

I can try providing a patch that does the following (see the sketch below):

* Instead of using a Map I could just store the array we receive, or instead of 
storing an array, sum up the values for reports with the same Id
* On each heartbeat, clear the map (so we know we have up-to-date information)

Does that sound sensible?
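
For illustration, a rough sketch of the summing/clearing option, with simplified 
stand-in types rather than the real {{StorageReport}}/{{DatanodeStorageInfo}} classes:

{code}
import java.util.HashMap;
import java.util.Map;

public class HeartbeatAggregationSketch {
  /** Minimal stand-in for the fields of a StorageReport. */
  static class Report {
    final String storageId;
    final long capacity, dfsUsed, remaining, blockPoolUsed;
    Report(String id, long cap, long used, long rem, long bpUsed) {
      storageId = id; capacity = cap; dfsUsed = used; remaining = rem; blockPoolUsed = bpUsed;
    }
  }

  /** Accumulated state for one storage id. */
  static class StorageState {
    long capacity, dfsUsed, remaining, blockPoolUsed;
    void add(Report r) {
      capacity += r.capacity;
      dfsUsed += r.dfsUsed;
      remaining += r.remaining;
      blockPoolUsed += r.blockPoolUsed;
    }
  }

  /** Rebuilds the map on every heartbeat (so stale entries disappear) and sums duplicates. */
  static Map<String, StorageState> updateHeartbeat(Report[] reports) {
    Map<String, StorageState> storageMap = new HashMap<>();
    for (Report r : reports) {
      StorageState state = storageMap.get(r.storageId);
      if (state == null) {
        state = new StorageState();
        storageMap.put(r.storageId, state);
      }
      state.add(r);   // sum instead of overwrite, so duplicate ids no longer lose data
    }
    return storageMap;
  }
}
{code}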



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2014-12-30 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261055#comment-14261055
 ] 

Lars Francke commented on HDFS-7575:


I worked around this by doing the following for each DataNode:
* Stop the DataNode
* Change the storageId in each storage directory (it's in the VERSION file, 
e.g. {{/mnt/disk1/dfs/dn/current/VERSION}}) to a unique value (see the sketch below)
* Start the DataNode

Afterwards I restarted the Standby NN (NN2), failed over manually, and 
restarted the new Standby NN (NN1).

The Balancer seems to have been running fine since then.
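
A rough sketch of the per-directory edit from step 2, assuming the VERSION file is the 
usual Java-properties-style file with a {{storageID}} key and that the DataNode is 
stopped while it is rewritten (the path and id format below are only illustrative):

{code}
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Properties;
import java.util.UUID;

public class RewriteStorageIdSketch {
  public static void main(String[] args) throws IOException {
    // e.g. /mnt/disk1/dfs/dn/current/VERSION -- run once per storage directory
    String versionFile = args[0];

    Properties props = new Properties();
    try (FileInputStream in = new FileInputStream(versionFile)) {
      props.load(in);
    }

    // Give this directory a fresh, unique storage id, using the same
    // "DS-<uuid>" shape that newly formatted directories get.
    props.setProperty("storageID", "DS-" + UUID.randomUUID());

    try (FileOutputStream out = new FileOutputStream(versionFile)) {
      props.store(out, null);
    }
  }
}
{code}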

> NameNode not handling heartbeats properly after HDFS-2832
> -
>
> Key: HDFS-7575
> URL: https://issues.apache.org/jira/browse/HDFS-7575
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lars Francke
>
> Before HDFS-2832 each DataNode would have a unique storageId which included 
> its IP address. Since HDFS-2832 the DataNodes have a unique storageId per 
> storage directory which is just a random UUID.
> They send reports per storage directory in their heartbeats. This heartbeat 
> is processed on the NameNode in the 
> {{DatanodeDescriptor#updateHeartbeatState}} method. Pre HDFS-2832 this would 
> just store the information per Datanode. After the patch though each DataNode 
> can have multiple different storages so it's stored in a map keyed by the 
> storage Id.
> This works fine for all clusters that have been installed post HDFS-2832 as 
> they get a UUID for their storage Id. So a DN with 8 drives has a map with 8 
> different keys. On each Heartbeat the Map is searched and updated 
> ({{DatanodeStorageInfo storage = storageMap.get(s.getStorageID());}}):
> {code:title=DatanodeStorageInfo}
>   void updateState(StorageReport r) {
> capacity = r.getCapacity();
> dfsUsed = r.getDfsUsed();
> remaining = r.getRemaining();
> blockPoolUsed = r.getBlockPoolUsed();
>   }
> {code}
> On clusters that were upgraded from a pre-HDFS-2832 version, though, the 
> storage Id has not been rewritten (at least not on the four clusters I 
> checked), so each directory has the exact same storageId. That means 
> there will be only a single entry in the {{storageMap}}, and it gets 
> overwritten by a random {{StorageReport}} from the DataNode. This can be seen 
> in the {{updateState}} method above: it just assigns the capacity from the 
> received report; instead it should probably sum the values up per received heartbeat.
> The Balancer seems to be one of the only things that actually uses this 
> information so it now considers the utilization of a random drive per 
> DataNode for balancing purposes.
> Things get even worse when a drive has been added or replaced, as this will 
> now get a new storage Id, so there will be two entries in the storageMap. As new 
> drives are usually empty, this skews the Balancer's decision in a way that this 
> node will never be considered over-utilized.
> Another problem is that old StorageReports are never removed from the 
> storageMap. So if I replace a drive and it gets a new storage Id the old one 
> will still be in place and used for all calculations by the Balancer until a 
> restart of the NameNode.
> I can try providing a patch that does the following:
> * Instead of using a Map I could just store the array we receive or instead 
> of storing an array sum up the values for reports with the same Id
> * On each heartbeat clear the map (so we know we have up to date information)
> Does that sound sensible?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6633) Support reading new data in a being written file until the file is closed

2014-12-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261059#comment-14261059
 ] 

Hadoop QA commented on HDFS-6633:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689437/HDFS-6633-003.patch
  against trunk revision 249cc90.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9130//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9130//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9130//console

This message is automatically generated.

> Support reading new data in a being written file until the file is closed
> -
>
> Key: HDFS-6633
> URL: https://issues.apache.org/jira/browse/HDFS-6633
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Vinayakumar B
> Attachments: HDFS-6633-001.patch, HDFS-6633-002.patch, 
> HDFS-6633-003.patch, h6633_20140707.patch, h6633_20140708.patch
>
>
> When a file is being written, the file length keeps increasing.  If the file 
> is opened for reading, the reader first gets the file length and then reads only 
> up to that length.  The reader will not be able to read the new data written 
> afterward.
> We propose adding a new feature so that readers will be able to read all the 
> data until the writer closes the file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7576) TestPipelinesFailover#testFailoverRightBeforeCommitSynchronization sometimes fails in Java 8 build

2014-12-30 Thread Ted Yu (JIRA)
Ted Yu created HDFS-7576:


 Summary: 
TestPipelinesFailover#testFailoverRightBeforeCommitSynchronization sometimes 
fails in Java 8 build
 Key: HDFS-7576
 URL: https://issues.apache.org/jira/browse/HDFS-7576
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


From https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/54/ :
{code}
REGRESSION:  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization

Error Message:
test timed out after 3 milliseconds

Stack Trace:
java.lang.Exception: test timed out after 3 milliseconds
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at 
org.apache.hadoop.test.GenericTestUtils$DelayAnswer.waitForCall(GenericTestUtils.java:226)
at 
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization(TestPipelinesFailover.java:386)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7083) TestDecommission#testIncludeByRegistrationName sometimes fails

2014-12-30 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261169#comment-14261169
 ] 

Ted Yu commented on HDFS-7083:
--

Failed in https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/54/ as well.

> TestDecommission#testIncludeByRegistrationName sometimes fails
> --
>
> Key: HDFS-7083
> URL: https://issues.apache.org/jira/browse/HDFS-7083
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Ted Yu
>Priority: Minor
>
> From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1874/ :
> {code}
> REGRESSION:  
> org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
> Error Message:
> test timed out after 36 milliseconds
> Stack Trace:
> java.lang.Exception: test timed out after 36 milliseconds
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7574) Make cmake work in Windows Visual Studio 2010

2014-12-30 Thread Thanh Do (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thanh Do updated HDFS-7574:
---
Attachment: HDFS-7574-branch-HDFS-6994-0.patch

Attached is a simple patch that allows cmake to generate a solution file in 
Visual Studio 2010. 

> Make cmake work in Windows Visual Studio 2010
> -
>
> Key: HDFS-7574
> URL: https://issues.apache.org/jira/browse/HDFS-7574
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Windows Visual Studio 2010
>Reporter: Thanh Do
>Assignee: Thanh Do
> Attachments: HDFS-7574-branch-HDFS-6994-0.patch
>
>
> Cmake should be able to generate a solution file in Windows Visual Studio 
> 2010. This is the first step in a series of steps toward building libhdfs3 
> successfully on Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7577) Add additional headers needed by Windows

2014-12-30 Thread Thanh Do (JIRA)
Thanh Do created HDFS-7577:
--

 Summary: Add additional headers needed by Windows
 Key: HDFS-7577
 URL: https://issues.apache.org/jira/browse/HDFS-7577
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Thanh Do
Assignee: Thanh Do


This jira involves adding a list of (mostly dummy) headers that are available on 
POSIX systems but not on Windows. This is one step toward building libhdfs3 on 
Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7574) Make cmake work in Windows Visual Studio 2010

2014-12-30 Thread Thanh Do (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261233#comment-14261233
 ] 

Thanh Do commented on HDFS-7574:


Could somebody please review this patch? Once it gets in, I can start 
submitting subsequent patches for issues such as HDFS-7577. Thanks!

> Make cmake work in Windows Visual Studio 2010
> -
>
> Key: HDFS-7574
> URL: https://issues.apache.org/jira/browse/HDFS-7574
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Windows Visual Studio 2010
>Reporter: Thanh Do
>Assignee: Thanh Do
> Attachments: HDFS-7574-branch-HDFS-6994-0.patch
>
>
> Cmake should be able to generate a solution file in Windows Visual Studio 
> 2010. This is the first step in a series of steps toward building libhdfs3 
> successfully on Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7574) Make cmake work in Windows Visual Studio 2010

2014-12-30 Thread Thanh Do (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thanh Do updated HDFS-7574:
---
Attachment: HDFS-7574-branch-HDFS-6994-1.patch

> Make cmake work in Windows Visual Studio 2010
> -
>
> Key: HDFS-7574
> URL: https://issues.apache.org/jira/browse/HDFS-7574
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Windows Visual Studio 2010
>Reporter: Thanh Do
>Assignee: Thanh Do
> Attachments: HDFS-7574-branch-HDFS-6994-1.patch
>
>
> Cmake should be able to generate a solution file in Windows Visual Studio 
> 2010. This is the first step in a series of steps toward building libhdfs3 
> successfully on Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7574) Make cmake work in Windows Visual Studio 2010

2014-12-30 Thread Thanh Do (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thanh Do updated HDFS-7574:
---
Attachment: (was: HDFS-7574-branch-HDFS-6994-0.patch)

> Make cmake work in Windows Visual Studio 2010
> -
>
> Key: HDFS-7574
> URL: https://issues.apache.org/jira/browse/HDFS-7574
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Windows Visual Studio 2010
>Reporter: Thanh Do
>Assignee: Thanh Do
> Attachments: HDFS-7574-branch-HDFS-6994-1.patch
>
>
> Cmake should be able to generate a solution file in Windows Visual Studio 
> 2010. This is the first step in a series of steps toward building libhdfs3 
> successfully on Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7523) Setting a socket receive buffer size in DFSClient

2014-12-30 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261299#comment-14261299
 ] 

stack commented on HDFS-7523:
-

[~xieliang007]

bq. Let me report numbers once the change is in

Before I commit here?

bq. ...and I think we also need to make the buffer size configurable.

Makes sense.  Do you want to do this in a new issue?

> Setting a socket receive buffer size in DFSClient
> -
>
> Key: HDFS-7523
> URL: https://issues.apache.org/jira/browse/HDFS-7523
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfsclient
>Affects Versions: 2.6.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-7523-001 (1).txt, HDFS-7523-001 (1).txt, 
> HDFS-7523-001.txt, HDFS-7523-001.txt, HDFS-7523-001.txt
>
>
> It would be nice to set a socket receive buffer size when creating the 
> socket from the client (HBase) point of view. In older versions this would be in 
> DFSInputStream; in trunk it seems it should be here:
> {code}
>   @Override // RemotePeerFactory
>   public Peer newConnectedPeer(InetSocketAddress addr,
>   Token blockToken, DatanodeID datanodeId)
>   throws IOException {
> Peer peer = null;
> boolean success = false;
> Socket sock = null;
> try {
>   sock = socketFactory.createSocket();
>   NetUtils.connect(sock, addr,
> getRandomLocalInterfaceAddr(),
> dfsClientConf.socketTimeout);
>   peer = TcpPeerServer.peerFromSocketAndKey(saslClient, sock, this,
>   blockToken, datanodeId);
>   peer.setReadTimeout(dfsClientConf.socketTimeout);
> {code}
> e.g.: sock.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE);
> The default socket receive buffer size on Linux+JDK7 seems to be 8k, if I am not 
> wrong. That value is sometimes too small for HBase reading 64k blocks over a 10G 
> network (at the very least, it means more system calls).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7270) Implementing congestion control in writing pipeline

2014-12-30 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261358#comment-14261358
 ] 

Arpit Agarwal commented on HDFS-7270:
-

Can we rename the Jira to something like _"Add congestion signaling capability 
to DataNode write protocol"_, since this change does not implement congestion 
control?

The approach looks fine. Minor comments:
# StatusFormat values can take StatusFormat.BITS in the constructor instead of 
StatusFormat. Then you don't need the null check. See 
{{INodeFile.HeaderFormat}} (and the sketch below).
# Why do we need the SUPPORTED2 value?
# Could you add a comment to enum {{Status}} in DataTransfer.proto noting that it 
is a 4-bit value? It is close to full, so if someone adds a few more status codes 
it can overflow.
# Would you add a short Javadoc for DataNode.getECN, since it will need to be 
extended in the next step?

Can we add test cases to ensure ECN.SUPPORTED and ECN.CONGESTED are propagated?
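
For reference, a simplified illustration of the {{INodeFile.HeaderFormat}}-style bit 
packing mentioned in point 1, showing how a 4-bit status and an ECN flag could share 
one header word; the field names and widths here are only for the example, not the 
actual patch:

{code}
public class StatusFormatSketch {
  /** Illustrative header layout: a 4-bit status code plus a 2-bit ECN signal. */
  enum Field {
    STATUS(0, 4),
    ECN(4, 2);

    final int offset;
    final int length;
    final int mask;

    Field(int offset, int length) {
      this.offset = offset;
      this.length = length;
      this.mask = (1 << length) - 1;
    }

    int get(int header) {
      return (header >>> offset) & mask;
    }

    int combine(int value, int header) {
      if ((value & ~mask) != 0) {
        // Mirrors the overflow concern from point 3: the field is only 'length' bits wide.
        throw new IllegalArgumentException("value does not fit in " + length + " bits");
      }
      return (header & ~(mask << offset)) | (value << offset);
    }
  }

  public static void main(String[] args) {
    int header = 0;
    header = Field.STATUS.combine(0xA, header); // some status code, at most 15
    header = Field.ECN.combine(1, header);      // e.g. "supported"
    System.out.println(Field.STATUS.get(header) + " " + Field.ECN.get(header));
  }
}
{code}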

> Implementing congestion control in writing pipeline
> ---
>
> Key: HDFS-7270
> URL: https://issues.apache.org/jira/browse/HDFS-7270
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-7270.000.patch, HDFS-7270.001.patch
>
>
> When a client writes to HDFS faster than the disk bandwidth of the DNs, it 
> saturates the disk bandwidth and makes the DNs unresponsive. The client only 
> backs off by aborting / recovering the pipeline, which leads to failed writes 
> and unnecessary pipeline recoveries.
> This jira proposes to add explicit congestion control mechanisms in the 
> writing pipeline. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7578) NFS WRITE and COMMIT responses should always use the channel pipeline

2014-12-30 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-7578:
-
Attachment: HDFS-7578.001.patch

> NFS WRITE and COMMIT responses should always use the channel pipeline
> -
>
> Key: HDFS-7578
> URL: https://issues.apache.org/jira/browse/HDFS-7578
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.7.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-7578.001.patch
>
>
> Write and Commit responses directly write data to the channel instead of 
> pushing it to the process pipeline. This could cause the NFS handler thread to 
> be blocked waiting for the response to be flushed to the network before it can 
> return to serve a different request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7578) NFS WRITE and COMMIT responses should always use the channel pipeline

2014-12-30 Thread Brandon Li (JIRA)
Brandon Li created HDFS-7578:


 Summary: NFS WRITE and COMMIT responses should always use the 
channel pipeline
 Key: HDFS-7578
 URL: https://issues.apache.org/jira/browse/HDFS-7578
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.7.0
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-7578.001.patch

Write and Commit responses directly write data to the channel instead of pushing 
it to the process pipeline. This could cause the NFS handler thread to be blocked 
waiting for the response to be flushed to the network before it can return to 
serve a different request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7578) NFS WRITE and COMMIT responses should always use the channel pipeline

2014-12-30 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-7578:
-
Status: Patch Available  (was: Open)

> NFS WRITE and COMMIT responses should always use the channel pipeline
> -
>
> Key: HDFS-7578
> URL: https://issues.apache.org/jira/browse/HDFS-7578
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.7.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-7578.001.patch
>
>
> Write and Commit responses directly write data to the channel instead of 
> pushing it to the process pipeline. This could cause the NFS handler thread to 
> be blocked waiting for the response to be flushed to the network before it can 
> return to serve a different request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7578) NFS WRITE and COMMIT responses should always use the channel pipeline

2014-12-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261467#comment-14261467
 ] 

Hadoop QA commented on HDFS-7578:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689549/HDFS-7578.001.patch
  against trunk revision 6621c35.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-nfs hadoop-hdfs-project/hadoop-hdfs-nfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9131//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9131//console

This message is automatically generated.

> NFS WRITE and COMMIT responses should always use the channel pipeline
> -
>
> Key: HDFS-7578
> URL: https://issues.apache.org/jira/browse/HDFS-7578
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.7.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-7578.001.patch
>
>
> Write and Commit responses directly write data to the channel instead of 
> pushing it to the process pipeline. This could cause the NFS handler thread to 
> be blocked waiting for the response to be flushed to the network before it can 
> return to serve a different request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7578) NFS WRITE and COMMIT responses should always use the channel pipeline

2014-12-30 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-7578:
-
Description: Write and Commit responses directly write data to the channel 
instead of pushing it to the channel pipeline. This could block the NFS handler 
thread waiting for the response to be flushed to the network before it can 
return to serve a different request.  (was: Write and Commit responses directly 
write data to the channel instead of pushing it to the process pipeline. This 
could cause the NFS handler thread to be blocked waiting for the response to be 
flushed to the network before it can return to serve a different request.)

> NFS WRITE and COMMIT responses should always use the channel pipeline
> -
>
> Key: HDFS-7578
> URL: https://issues.apache.org/jira/browse/HDFS-7578
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.7.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-7578.001.patch
>
>
> Write and Commit responses directly write data to the channel instead of 
> pushing it to the channel pipeline. This could block the NFS handler thread 
> waiting for the response to be flushed to the network before it can return to 
> serve a different request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7578) NFS WRITE and COMMIT responses should always use the channel pipeline

2014-12-30 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261485#comment-14261485
 ] 

Charles Lamb commented on HDFS-7578:


Hi [~brandonli],

Thanks for posting this patch. It looks pretty good. Did you see this problem 
in the field or did you notice it through code examination? Is it possible to 
create an actual test case that demonstrates the problem?

I have a few relatively minor comments.

In Nfs3Utils.java, you could change this:

{code}
XDR out = new XDR();
out = response.serialize(out, xid, new VerifierNone());
{code}

to

{code}
final XDR out = response.serialize(new XDR(), xid, new VerifierNone());
{code}

Also, the code in #writeChannelCommit and #writeChannel is pretty much 
identical except for the call to LOG.debug. You might want to refactor that a 
little, for example as sketched below. (Also, s/Commit done:/Commit done: /)
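
A minimal sketch of that refactor, with {{Channel}} and {{XDR}} as stand-ins for the 
real Netty/ONC-RPC types used by Nfs3Utils (names and messages are illustrative, not 
the actual patch):

{code}
import java.util.logging.Level;
import java.util.logging.Logger;

class ChannelWriteSketch {
  private static final Logger LOG = Logger.getLogger("ChannelWriteSketch");

  /** Stand-in for the Netty channel type. */
  interface Channel { void write(byte[] data); }
  /** Stand-in for the serialized XDR response buffer. */
  interface XDR { byte[] asByteArray(); }

  static void writeChannel(Channel ch, XDR out, int xid) {
    write(ch, out, xid, "Sending response");
  }

  static void writeChannelCommit(Channel ch, XDR out, int xid) {
    write(ch, out, xid, "Commit done: ");   // note the trailing space from the review
  }

  /** The two public methods only differ in their debug message. */
  private static void write(Channel ch, XDR out, int xid, String msg) {
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine(msg + " xid=" + xid);
    }
    ch.write(out.asByteArray());
  }
}
{code}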

Thanks.
Charles



> NFS WRITE and COMMIT responses should always use the channel pipeline
> -
>
> Key: HDFS-7578
> URL: https://issues.apache.org/jira/browse/HDFS-7578
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.7.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-7578.001.patch
>
>
> Write and Commit responses directly write data to the channel instead of 
> pushing it to the channel pipeline. This could block the NFS handler thread 
> waiting for the response to be flushed to the network before it can return to 
> serve a different request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-5782) BlockListAsLongs should take lists of Replicas rather than concrete classes

2014-12-30 Thread Joe Pallas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Pallas updated HDFS-5782:
-
Attachment: HDFS-5782.patch

Revised patch

> BlockListAsLongs should take lists of Replicas rather than concrete classes
> ---
>
> Key: HDFS-5782
> URL: https://issues.apache.org/jira/browse/HDFS-5782
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: David Powell
>Assignee: David Powell
>Priority: Minor
> Attachments: HDFS-5782.patch, HDFS-5782.patch
>
>
> From HDFS-5194:
> {quote}
> BlockListAsLongs's constructor takes a list of Blocks and a list of 
> ReplicaInfos.  On the surface, the former is mildly irritating because it is 
> a concrete class, while the latter is a greater concern due to being a 
> File-based implementation of Replica.
> On deeper inspection, BlockListAsLongs passes members of both to an internal 
> method that accepts just Blocks, which conditionally casts them *back* to 
> ReplicaInfos (this cast only happens to the latter, though this isn't 
> immediately obvious to the reader).
> Conveniently, all methods called on these objects are found in the Replica 
> interface, and all functional (i.e. non-test) consumers of this interface 
> pass in Replica subclasses.  If this constructor took Lists of Replicas 
> instead, it would be more generally useful and its implementation would be 
> cleaner as well.
> {quote}
> Fixing this indeed makes the business end of BlockListAsLongs cleaner while 
> requiring no changes to FsDatasetImpl.  As suggested by the above 
> description, though, the HDFS tests use BlockListAsLongs differently from the 
> production code -- they pretty much universally provide a list of actual 
> Blocks.  To handle this:
> - In the case of SimulatedFSDataset, providing a list of Replicas is actually 
> less work.
> - In the case of NNThroughputBenchmark, rewriting to use Replicas is fairly 
> invasive.  Instead, the patch creates a second constructor in 
> BlockListAsLongs specifically for the use of NNThroughputBenchmark.  It turns 
> the stomach a little, but is clearer and requires less code than the 
> alternatives (and isn't without precedent).
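
To illustrate the constructor change described above in isolation: once the encoder 
depends only on a {{Replica}}-style interface, any implementation (file-based, 
simulated, or a throwaway test stub) can be passed in without casts. A simplified, 
self-contained sketch, not the real BlockListAsLongs encoding (which also writes a 
header and per-state sections):

{code}
import java.util.ArrayList;
import java.util.List;

public class InterfaceParameterSketch {
  /** Minimal stand-in for the Replica interface: only what the encoder needs. */
  interface Replica {
    long getBlockId();
    long getNumBytes();
    long getGenerationStamp();
  }

  /** Accepts the interface, so no concrete class or downcast is required. */
  static long[] encode(List<? extends Replica> replicas) {
    long[] out = new long[replicas.size() * 3];
    int i = 0;
    for (Replica r : replicas) {
      out[i++] = r.getBlockId();
      out[i++] = r.getNumBytes();
      out[i++] = r.getGenerationStamp();
    }
    return out;
  }

  public static void main(String[] args) {
    // Any Replica implementation works, including a test stub.
    List<Replica> stubs = new ArrayList<>();
    stubs.add(new Replica() {
      public long getBlockId() { return 1L; }
      public long getNumBytes() { return 1024L; }
      public long getGenerationStamp() { return 1000L; }
    });
    System.out.println(encode(stubs).length); // 3
  }
}
{code}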



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2014-12-30 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261504#comment-14261504
 ] 

Colin Patrick McCabe commented on HDFS-7575:


This seems like an upgrade problem.  Each directory should have its own storage 
id.  It seems like we should fix the upgrade code to make sure that this is the 
case.  If necessary, that means we should generate new storage ids for some 
directories.

> NameNode not handling heartbeats properly after HDFS-2832
> -
>
> Key: HDFS-7575
> URL: https://issues.apache.org/jira/browse/HDFS-7575
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lars Francke
>
> Before HDFS-2832 each DataNode would have a unique storageId which included 
> its IP address. Since HDFS-2832 the DataNodes have a unique storageId per 
> storage directory which is just a random UUID.
> They send reports per storage directory in their heartbeats. This heartbeat 
> is processed on the NameNode in the 
> {{DatanodeDescriptor#updateHeartbeatState}} method. Pre HDFS-2832 this would 
> just store the information per Datanode. After the patch though each DataNode 
> can have multiple different storages so it's stored in a map keyed by the 
> storage Id.
> This works fine for all clusters that have been installed post HDFS-2832 as 
> they get a UUID for their storage Id. So a DN with 8 drives has a map with 8 
> different keys. On each Heartbeat the Map is searched and updated 
> ({{DatanodeStorageInfo storage = storageMap.get(s.getStorageID());}}):
> {code:title=DatanodeStorageInfo}
>   void updateState(StorageReport r) {
> capacity = r.getCapacity();
> dfsUsed = r.getDfsUsed();
> remaining = r.getRemaining();
> blockPoolUsed = r.getBlockPoolUsed();
>   }
> {code}
> On clusters that were upgraded from a pre-HDFS-2832 version, though, the 
> storage Id has not been rewritten (at least not on the four clusters I 
> checked), so each directory has the exact same storageId. That means 
> there will be only a single entry in the {{storageMap}}, and it gets 
> overwritten by a random {{StorageReport}} from the DataNode. This can be seen 
> in the {{updateState}} method above: it just assigns the capacity from the 
> received report; instead it should probably sum the values up per received heartbeat.
> The Balancer seems to be one of the only things that actually uses this 
> information so it now considers the utilization of a random drive per 
> DataNode for balancing purposes.
> Things get even worse when a drive has been added or replaced, as this will 
> now get a new storage Id, so there will be two entries in the storageMap. As new 
> drives are usually empty, this skews the Balancer's decision in a way that this 
> node will never be considered over-utilized.
> Another problem is that old StorageReports are never removed from the 
> storageMap. So if I replace a drive and it gets a new storage Id the old one 
> will still be in place and used for all calculations by the Balancer until a 
> restart of the NameNode.
> I can try providing a patch that does the following:
> * Instead of using a Map I could just store the array we receive or instead 
> of storing an array sum up the values for reports with the same Id
> * On each heartbeat clear the map (so we know we have up to date information)
> Does that sound sensible?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7188) support build libhdfs3 on windows

2014-12-30 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261509#comment-14261509
 ] 

Colin Patrick McCabe commented on HDFS-7188:


bq. Regarding mman library, I think the code is MIT licence, but it doesn't 
hurt to rewrite this.

Right, but we need to have that documented by the author in order to use this 
code.  Or we could just rewrite this functionality, as you pointed out.

bq. Now, I am convinced that we should break this into small jiras. Few I could 
think of.

That sounds good, but it seems like #3 and #4 need to come before #2.

thanks

> support build libhdfs3 on windows
> -
>
> Key: HDFS-7188
> URL: https://issues.apache.org/jira/browse/HDFS-7188
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Windows System, Visual Studio 2010
>Reporter: Zhanwei Wang
>Assignee: Thanh Do
> Attachments: HDFS-7188-branch-HDFS-6994-0.patch, 
> HDFS-7188-branch-HDFS-6994-1.patch
>
>
> libhdfs3 should work on windows



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client

2014-12-30 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261519#comment-14261519
 ] 

Colin Patrick McCabe commented on HDFS-6994:


Hi [~decster], that is an interesting idea.  I feel a bit confused about how it 
would save effort, though.  It seems like you're just reimplementing JNI, 
right?  The hassles in JNI are converting between Java and C types, and dealing 
with method signatures and exceptions.  But you have all those same hassles 
when you are making RPCs via JSON.  Plus you have the additional hassle of 
writing a JSON interface for any functionality you want to call, whereas JNI 
can just call any Java function without modifying the Java code.

I guess one nice thing is that you wouldn't have to deal with the Java server 
being attached to the same process as your test process.  But you could get 
that same benefit by calling fork() before calling the JNI functions.

> libhdfs3 - A native C/C++ HDFS client
> -
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ 
> HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol.
> libhdfs3 provides the libhdfs-style C interface and a C++ interface. It supports 
> both HADOOP RPC versions 8 and 9, as well as Namenode HA and Kerberos 
> authentication.
> libhdfs3 is currently used by HAWQ of Pivotal.
> I'd like to integrate libhdfs3 into HDFS source code to benefit others.
> You can find libhdfs3 code from github
> https://github.com/PivotalRD/libhdfs3
> http://pivotalrd.github.io/libhdfs3/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7188) support build libhdfs3 on windows

2014-12-30 Thread Thanh Do (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261555#comment-14261555
 ] 

Thanh Do commented on HDFS-7188:


Hi [~cmccabe]. Let me clarify the small jiras I have thought of (their 
order, what they do, and what the outcome is).

The overall goal is that once all of these JIRAs get in, libhdfs3 can be built 
and run successfully in Windows Visual Studio. Each individual JIRA will not 
achieve this big goal on its own; rather, it serves as a step toward the 
overall goal. One requirement for each JIRA is that it must not break the build 
on Linux or Mac.

With this overview, here is the list of the proposed JIRAs.

# make cmake generate a solution file for VS 2010 (HDFS-7574). This JIRA only 
contains changes to the CMakeLists files. _Outcome_: running "cmake -G Visual 
Studio 10 2010" will generate a solution file, loadable by VS 2010. Of course, 
the build on Windows will not yet succeed.
# add additional headers needed by Windows (HDFS-7577). This JIRA contains two 
sets of changes: (a) dummy header files that are missing on Windows, and (b) cmake 
changes to add the header dirs. _Outcome_: the build on Windows will still fail 
(with a smaller number of errors though, because now the missing headers are there).
# restructure the platform-specific functions. The goal here is to make POSIX-specific 
code (e.g., in logging, the stack printer, getting the local network address) 
platform aware. Some examples would be {{platform_vsnprintf}} and 
{{GetAdaptersAddresses}}, as you mentioned above. _Outcome_: the build will succeed 
on Windows, but the library will not function correctly, because the Windows 
counterparts are only placeholders.
# Implement the platform-specific functions on Windows. This JIRA simply fills in 
the placeholders from #3 with the large chunks of Windows-specific code I 
already have. _Outcome_: libhdfs3 can now be built successfully _and_ run 
correctly.

Please let me know your thoughts.


> support build libhdfs3 on windows
> -
>
> Key: HDFS-7188
> URL: https://issues.apache.org/jira/browse/HDFS-7188
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Windows System, Visual Studio 2010
>Reporter: Zhanwei Wang
>Assignee: Thanh Do
> Attachments: HDFS-7188-branch-HDFS-6994-0.patch, 
> HDFS-7188-branch-HDFS-6994-1.patch
>
>
> libhdfs3 should work on windows



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7572) TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows

2014-12-30 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261554#comment-14261554
 ] 

Arpit Agarwal commented on HDFS-7572:
-

Thanks for taking a look [~xyao]. Waiting for a committer +1 to commit.

> TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows
> ---
>
> Key: HDFS-7572
> URL: https://issues.apache.org/jira/browse/HDFS-7572
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.6.0
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-7572.001.patch
>
>
> *Error Message*
> Expected: is 
>  but: was 
> *Stacktrace*
> java.lang.AssertionError: 
> Expected: is 
>  but: was 
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
>   at org.junit.Assert.assertThat(Assert.java:865)
>   at org.junit.Assert.assertThat(Assert.java:832)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase.ensureFileReplicasOnStorageType(LazyPersistTestCase.java:129)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles.testDnRestartWithSavedReplicas(TestLazyPersistFiles.java:668)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-7572) TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows

2014-12-30 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261554#comment-14261554
 ] 

Arpit Agarwal edited comment on HDFS-7572 at 12/30/14 10:09 PM:


Thanks for reviewing it [~xyao]. Waiting for a committer +1 to commit.


was (Author: arpitagarwal):
Thanks for taking a look [~xyao]. Waiting for a committer +1 to commit.

> TestLazyPersistFiles#testDnRestartWithSavedReplicas is flaky on Windows
> ---
>
> Key: HDFS-7572
> URL: https://issues.apache.org/jira/browse/HDFS-7572
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.6.0
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-7572.001.patch
>
>
> *Error Message*
> Expected: is 
>  but: was 
> *Stacktrace*
> java.lang.AssertionError: 
> Expected: is 
>  but: was 
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
>   at org.junit.Assert.assertThat(Assert.java:865)
>   at org.junit.Assert.assertThat(Assert.java:832)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase.ensureFileReplicasOnStorageType(LazyPersistTestCase.java:129)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles.testDnRestartWithSavedReplicas(TestLazyPersistFiles.java:668)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7188) support build libhdfs3 on windows

2014-12-30 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261562#comment-14261562
 ] 

Colin Patrick McCabe commented on HDFS-7188:


OK, I misunderstood.  I thought you were proposing adding all the 
platform-specific code in #ifdefs up front, and then moving it into platform 
files later.  But actually you are proposing doing the restructuring a little 
more gradually.  I think your proposal makes sense.

> support build libhdfs3 on windows
> -
>
> Key: HDFS-7188
> URL: https://issues.apache.org/jira/browse/HDFS-7188
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Windows System, Visual Studio 2010
>Reporter: Zhanwei Wang
>Assignee: Thanh Do
> Attachments: HDFS-7188-branch-HDFS-6994-0.patch, 
> HDFS-7188-branch-HDFS-6994-1.patch
>
>
> libhdfs3 should work on windows



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7188) support build libhdfs3 on windows

2014-12-30 Thread Thanh Do (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261569#comment-14261569
 ] 

Thanh Do commented on HDFS-7188:


Hi [~cmccabe]. Glad that we are on the same page :). Could you review 
HDFS-7574? It is JIRA #1. Once that gets in, I can submit patches for the 
subsequent JIRAs.

> support build libhdfs3 on windows
> -
>
> Key: HDFS-7188
> URL: https://issues.apache.org/jira/browse/HDFS-7188
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Windows System, Visual Studio 2010
>Reporter: Zhanwei Wang
>Assignee: Thanh Do
> Attachments: HDFS-7188-branch-HDFS-6994-0.patch, 
> HDFS-7188-branch-HDFS-6994-1.patch
>
>
> libhdfs3 should work on windows



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7578) NFS WRITE and COMMIT responses should always use the channel pipeline

2014-12-30 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-7578:
-
Attachment: HDFS-7578.002.patch

> NFS WRITE and COMMIT responses should always use the channel pipeline
> -
>
> Key: HDFS-7578
> URL: https://issues.apache.org/jira/browse/HDFS-7578
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.7.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-7578.001.patch, HDFS-7578.002.patch
>
>
> Write and Commit responses directly write data to the channel instead of 
> propagating it to the next immediate handler in the channel pipeline. 
> Not following the Netty channel pipeline model could be problematic. We don't 
> know whether it could cause any resource leak or performance issue, especially 
> since the internal pipeline implementation keeps changing with newer Netty releases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7578) NFS WRITE and COMMIT responses should always use the channel pipeline

2014-12-30 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-7578:
-
Description: 
Write and Commit responses directly write data to the channel instead of 
propagating it to the next immediate handler in the channel pipeline. 
Not following the Netty channel pipeline model could be problematic. We don't know 
whether it could cause any resource leak or performance issue, especially since the 
internal pipeline implementation keeps changing with newer Netty releases.

  was:Write and Commit responses directly write data to the channel instead of 
pushing it to the channel pipeline. This could block the NFS handler thread 
waiting for the response to be flushed to the network before it can return to 
serve a different request.


> NFS WRITE and COMMIT responses should always use the channel pipeline
> -
>
> Key: HDFS-7578
> URL: https://issues.apache.org/jira/browse/HDFS-7578
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.7.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-7578.001.patch, HDFS-7578.002.patch
>
>
> Write and Commit responses directly write data to the channel instead of 
> propagating it to the next immediate handler in the channel pipeline. 
> Not following the Netty channel pipeline model could be problematic. We don't 
> know whether it could cause any resource leak or performance issue, especially 
> since the internal pipeline implementation keeps changing with newer Netty releases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7578) NFS WRITE and COMMIT responses should always use the channel pipeline

2014-12-30 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261588#comment-14261588
 ] 

Brandon Li commented on HDFS-7578:
--

Thank you, [~clamb]. 
I've updated the bug description to be more accurate and uploaded a new patch 
to address your comments.

> NFS WRITE and COMMIT responses should always use the channel pipeline
> -
>
> Key: HDFS-7578
> URL: https://issues.apache.org/jira/browse/HDFS-7578
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.7.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-7578.001.patch, HDFS-7578.002.patch
>
>
> Write and Commit responses directly write data to the channel instead of 
> propagating it to the next immediate handler in the channel pipeline. 
> Not following the Netty channel pipeline model could be problematic. We don't 
> know whether it could cause any resource leak or performance issue, especially 
> since the internal pipeline implementation keeps changing with newer Netty releases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7578) NFS WRITE and COMMIT responses should always use the channel pipeline

2014-12-30 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261612#comment-14261612
 ] 

Charles Lamb commented on HDFS-7578:


LGTM. Non-binding +1 from me.

Thanks Brandon.


> NFS WRITE and COMMIT responses should always use the channel pipeline
> -
>
> Key: HDFS-7578
> URL: https://issues.apache.org/jira/browse/HDFS-7578
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.7.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-7578.001.patch, HDFS-7578.002.patch
>
>
> Write and Commit responses directly write data to the channel instead of 
> propagating it to the next immediate handler in the channel pipeline. 
> Not following the Netty channel pipeline model could be problematic. We don't 
> know whether it could cause any resource leak or performance issue, especially 
> since the internal pipeline implementation keeps changing with newer Netty releases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7578) NFS WRITE and COMMIT responses should always use the channel pipeline

2014-12-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261635#comment-14261635
 ] 

Hadoop QA commented on HDFS-7578:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689584/HDFS-7578.002.patch
  against trunk revision 6621c35.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-nfs hadoop-hdfs-project/hadoop-hdfs-nfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9133//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9133//console

This message is automatically generated.

> NFS WRITE and COMMIT responses should always use the channel pipeline
> -
>
> Key: HDFS-7578
> URL: https://issues.apache.org/jira/browse/HDFS-7578
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.7.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-7578.001.patch, HDFS-7578.002.patch
>
>
> Write and Commit responses directly write data to the channel instead of 
> propagating it to the next immediate handler in the channel pipeline. 
> Not following the Netty channel pipeline model could be problematic. We don't 
> know whether it could cause any resource leak or performance issue, especially 
> since the internal pipeline implementation keeps changing with newer Netty releases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5782) BlockListAsLongs should take lists of Replicas rather than concrete classes

2014-12-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261712#comment-14261712
 ] 

Hadoop QA commented on HDFS-5782:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689565/HDFS-5782.patch
  against trunk revision 6621c35.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9132//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9132//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9132//console

This message is automatically generated.

> BlockListAsLongs should take lists of Replicas rather than concrete classes
> ---
>
> Key: HDFS-5782
> URL: https://issues.apache.org/jira/browse/HDFS-5782
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: David Powell
>Assignee: David Powell
>Priority: Minor
> Attachments: HDFS-5782.patch, HDFS-5782.patch
>
>
> From HDFS-5194:
> {quote}
> BlockListAsLongs's constructor takes a list of Blocks and a list of 
> ReplicaInfos.  On the surface, the former is mildly irritating because it is 
> a concrete class, while the latter is a greater concern due to being a 
> File-based implementation of Replica.
> On deeper inspection, BlockListAsLongs passes members of both to an internal 
> method that accepts just Blocks, which conditionally casts them *back* to 
> ReplicaInfos (this cast only happens to the latter, though this isn't 
> immediately obvious to the reader).
> Conveniently, all methods called on these objects are found in the Replica 
> interface, and all functional (i.e. non-test) consumers of this interface 
> pass in Replica subclasses.  If this constructor took Lists of Replicas 
> instead, it would be more generally useful and its implementation would be 
> cleaner as well.
> {quote}
> Fixing this indeed makes the business end of BlockListAsLongs cleaner while 
> requiring no changes to FsDatasetImpl.  As suggested by the above 
> description, though, the HDFS tests use BlockListAsLongs differently from the 
> production code -- they pretty much universally provide a list of actual 
> Blocks.  To handle this:
> - In the case of SimulatedFSDataset, providing a list of Replicas is actually 
> less work.
> - In the case of NNThroughputBenchmark, rewriting to use Replicas is fairly 
> invasive.  Instead, the patch creates a second constructor in 
> BlockListAsLongs specifically for the use of NNThroughputBenchmark.  It turns 
> the stomach a little, but is clearer and requires less code than the 
> alternatives (and isn't without precedent).  
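
As a rough illustration of the signature change described above, here is a minimal, 
self-contained sketch that accepts lists typed against a {{Replica}}-style interface 
rather than concrete classes. The interface, class, and method names below are 
simplified stand-ins chosen for this example, not the actual HDFS types or the patch 
itself:

{code}
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for the HDFS types discussed above.
interface Replica {
  long getBlockId();
  long getNumBytes();
  long getGenerationStamp();
}

class FinalizedReplica implements Replica {
  private final long id, bytes, genStamp;
  FinalizedReplica(long id, long bytes, long genStamp) {
    this.id = id; this.bytes = bytes; this.genStamp = genStamp;
  }
  public long getBlockId() { return id; }
  public long getNumBytes() { return bytes; }
  public long getGenerationStamp() { return genStamp; }
}

public class BlockListAsLongsSketch {
  private final long[] longs;

  // Accepting the interface (with a wildcard) lets callers pass any Replica
  // implementation, instead of forcing concrete Block/ReplicaInfo lists.
  public BlockListAsLongsSketch(List<? extends Replica> finalized) {
    longs = new long[finalized.size() * 3];
    int i = 0;
    for (Replica r : finalized) {
      longs[i++] = r.getBlockId();
      longs[i++] = r.getNumBytes();
      longs[i++] = r.getGenerationStamp();
    }
  }

  public long[] getLongs() { return longs; }

  public static void main(String[] args) {
    BlockListAsLongsSketch list = new BlockListAsLongsSketch(
        Arrays.asList(new FinalizedReplica(1L, 1024L, 100L)));
    System.out.println(Arrays.toString(list.getLongs()));
  }
}
{code}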



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5782) BlockListAsLongs should take lists of Replicas rather than concrete classes

2014-12-30 Thread Joe Pallas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261810#comment-14261810
 ] 

Joe Pallas commented on HDFS-5782:
--

Hmm, the report says Findbugs version 3.0.0 and cites a file that is not 
touched by the patch.

> BlockListAsLongs should take lists of Replicas rather than concrete classes
> ---
>
> Key: HDFS-5782
> URL: https://issues.apache.org/jira/browse/HDFS-5782
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: David Powell
>Assignee: David Powell
>Priority: Minor
> Attachments: HDFS-5782.patch, HDFS-5782.patch
>
>
> From HDFS-5194:
> {quote}
> BlockListAsLongs's constructor takes a list of Blocks and a list of 
> ReplicaInfos.  On the surface, the former is mildly irritating because it is 
> a concrete class, while the latter is a greater concern due to being a 
> File-based implementation of Replica.
> On deeper inspection, BlockListAsLongs passes members of both to an internal 
> method that accepts just Blocks, which conditionally casts them *back* to 
> ReplicaInfos (this cast only happens to the latter, though this isn't 
> immediately obvious to the reader).
> Conveniently, all methods called on these objects are found in the Replica 
> interface, and all functional (i.e. non-test) consumers of this interface 
> pass in Replica subclasses.  If this constructor took Lists of Replicas 
> instead, it would be more generally useful and its implementation would be 
> cleaner as well.
> {quote}
> Fixing this indeed makes the business end of BlockListAsLongs cleaner while 
> requiring no changes to FsDatasetImpl.  As suggested by the above 
> description, though, the HDFS tests use BlockListAsLongs differently from the 
> production code -- they pretty much universally provide a list of actual 
> Blocks.  To handle this:
> - In the case of SimulatedFSDataset, providing a list of Replicas is actually 
> less work.
> - In the case of NNThroughputBenchmark, rewriting to use Replicas is fairly 
> invasive.  Instead, the patch creates a second constructor in 
> BlockListAsLongs specifically for the use of NNThroughputBenchmark.  It turns 
> the stomach a little, but is clearer and requires less code than the 
> alternatives (and isn't without precedent).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client

2014-12-30 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261813#comment-14261813
 ] 

Binglin Chang commented on HDFS-6994:
-

The trick is to use reflection and JSON/Java type auto-mapping to create a 
generic method, so when I type in the CLI:
startDataNodes "{conf}" 3 true null ["rack0", "rack1"] [1,1] 
or
waitActive 1
or
stopDatanode 1
It will find the proper MiniDFSCluster method, automatically convert the argument 
types, and call the method.
By doing this, we can also start a minicluster and control its behavior manually, 
so it can also be used for manual debugging and testing.
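
To make the idea concrete, here is a minimal sketch of such a reflection-based 
dispatcher, assuming whitespace-separated tokens and only primitive argument types; 
a real implementation would use a JSON mapper for arrays, maps, and configuration 
objects. The class and helper names here are illustrative only, not the proposed 
patch:

{code}
import java.lang.reflect.Method;

/**
 * Minimal sketch: dispatch a whitespace-separated command line such as
 * "waitActive 1" or "stopDatanode 1" to a method on a target object via
 * reflection, converting the string tokens to the parameter types.
 */
public class ReflectiveCliDispatcher {
  private final Object target;

  public ReflectiveCliDispatcher(Object target) {
    this.target = target;
  }

  public Object invoke(String commandLine) throws Exception {
    String[] tokens = commandLine.trim().split("\\s+");
    String name = tokens[0];
    int argc = tokens.length - 1;
    // Look for a public method with a matching name and argument count.
    for (Method m : target.getClass().getMethods()) {
      if (!m.getName().equals(name) || m.getParameterTypes().length != argc) {
        continue;
      }
      Class<?>[] types = m.getParameterTypes();
      Object[] args = new Object[argc];
      for (int i = 0; i < argc; i++) {
        args[i] = convert(tokens[i + 1], types[i]);
      }
      return m.invoke(target, args);
    }
    throw new NoSuchMethodException(name + "/" + argc);
  }

  // Convert a single token to the expected parameter type.
  private static Object convert(String token, Class<?> type) {
    if ("null".equals(token)) {
      return null;
    } else if (type == int.class || type == Integer.class) {
      return Integer.valueOf(token);
    } else if (type == boolean.class || type == Boolean.class) {
      return Boolean.valueOf(token);
    } else if (type == String.class) {
      return token;
    }
    throw new IllegalArgumentException("Unsupported type: " + type);
  }
}
{code}

With a MiniDFSCluster instance as the target, a line such as {{waitActive 1}} would 
be resolved against its public methods and invoked, which is the behavior described 
above.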


> libhdfs3 - A native C/C++ HDFS client
> -
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
>
> Hi All
> I just got permission to open source libhdfs3, which is a native C/C++ 
> HDFS client based on the Hadoop RPC protocol and the HDFS Data Transfer Protocol.
> libhdfs3 provides the libhdfs-style C interface and a C++ interface. It supports 
> both Hadoop RPC versions 8 and 9, as well as NameNode HA and Kerberos 
> authentication.
> libhdfs3 is currently used by HAWQ at Pivotal.
> I'd like to integrate libhdfs3 into the HDFS source code to benefit others.
> You can find the libhdfs3 code on GitHub:
> https://github.com/PivotalRD/libhdfs3
> http://pivotalrd.github.io/libhdfs3/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client

2014-12-30 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261819#comment-14261819
 ] 

Binglin Chang commented on HDFS-6994:
-

This is more like a CLI (or REPL) than RPC. On the native side, we can wrap the 
REPL in an RPC interface, but that only requires serializing the C++ arguments to 
JSON strings (using sprintf should be enough). I see that the arguments and return 
values of the most commonly used methods are just simple primitive types. Methods 
with complex types are not likely to be used.


> libhdfs3 - A native C/C++ HDFS client
> -
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
>
> Hi All
> I just got permission to open source libhdfs3, which is a native C/C++ 
> HDFS client based on the Hadoop RPC protocol and the HDFS Data Transfer Protocol.
> libhdfs3 provides the libhdfs-style C interface and a C++ interface. It supports 
> both Hadoop RPC versions 8 and 9, as well as NameNode HA and Kerberos 
> authentication.
> libhdfs3 is currently used by HAWQ at Pivotal.
> I'd like to integrate libhdfs3 into the HDFS source code to benefit others.
> You can find the libhdfs3 code on GitHub:
> https://github.com/PivotalRD/libhdfs3
> http://pivotalrd.github.io/libhdfs3/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7496) Fix FsVolume removal race conditions on the DataNode

2014-12-30 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-7496:

Attachment: HDFS-7496.001.patch

Thanks for the reviews, [~cmccabe]. They are very helpful. I have made changes 
accordingly, detailed as follows:


bq. * Rather than having a boolean hasReference, let's have an actual pointer 
to the FsVolumeSpi object. 
bq. * We should release the reference count in close(), not in the finally block
bq. * We don't need this code any more:

Done

bq. This change seems unrelated to this JIRA... 

Yes, this is not related. I removed it from this patch. I will file a follow-up 
JIRA to use {{File#canonicalPath}} to compare the volumes. But it should not 
throw an {{IOE}} for {{File#getCanonicalPath()}}, as you mentioned above.

bq. How about using Preconditions.checkNonNull here... might look nicer

It is there to simplify the test code, i.e., the {{FsVolumeImpl}} tests do not 
need to construct a {{Datanode}} object.

bq. What I was envisioning was having getNextVolume increment the reference 
count when it retrieved the volume, 

{{getNextVolume}} is not the only place that hands out an _active_ volume. I have 
made the change so that if {{BlockReceiver}}'s constructor succeeds, it must hold a 
reference count obtained from 
{{FsDatasetImpl#createRbw/createTemporary/append/recoverRbw/recoverAppend}}. 
I also added a {{FsVolumeReference}} helper class to let the caller use 
{{try-with-resources}} on the reference count.
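
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of 
how an AutoCloseable reference helper can pair a reference-count increment with a 
guaranteed release via {{try-with-resources}}. The class and method names below are 
illustrative stand-ins, not the actual HDFS-7496 patch:

{code}
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative volume with a simple reference count; not the real FsVolumeImpl.
class Volume {
  private final AtomicInteger refCount = new AtomicInteger(0);

  void reference()   { refCount.incrementAndGet(); }
  void unreference() { refCount.decrementAndGet(); }
  int refCount()     { return refCount.get(); }
}

// Holding a VolumeReference keeps the volume "in use"; closing it releases the count.
class VolumeReference implements AutoCloseable {
  private final Volume volume;

  VolumeReference(Volume volume) {
    this.volume = volume;
    volume.reference();                 // count taken when the reference is handed out
  }

  Volume getVolume() { return volume; }

  @Override
  public void close() { volume.unreference(); }   // released by try-with-resources
}

public class VolumeReferenceExample {
  public static void main(String[] args) {
    Volume v = new Volume();
    // try-with-resources guarantees the count is released even if the body throws.
    try (VolumeReference ref = new VolumeReference(v)) {
      System.out.println("in use, refCount=" + ref.getVolume().refCount()); // 1
    }
    System.out.println("released, refCount=" + v.refCount());               // 0
  }
}
{code}

The point of the pattern is that the caller cannot forget to release: leaving the 
try block, normally or by exception, closes the reference exactly once.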


> Fix FsVolume removal race conditions on the DataNode 
> -
>
> Key: HDFS-7496
> URL: https://issues.apache.org/jira/browse/HDFS-7496
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Colin Patrick McCabe
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7496.000.patch, HDFS-7496.001.patch
>
>
> We discussed a few FsVolume removal race conditions on the DataNode in 
> HDFS-7489.  We should figure out a way to make removing an FsVolume safe.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema

2014-12-30 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261964#comment-14261964
 ] 

Kai Zheng commented on HDFS-7337:
-

We're discussing the design and code offline. When that is finished, we'll see how 
we're aligned and I will post an update here.

> Configurable and pluggable Erasure Codec and schema
> ---
>
> Key: HDFS-7337
> URL: https://issues.apache.org/jira/browse/HDFS-7337
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Kai Zheng
> Attachments: HDFS-7337-prototype-v1.patch, 
> HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, 
> PluggableErasureCodec.pdf
>
>
> According to HDFS-7285 and the design, this JIRA considers supporting multiple 
> Erasure Codecs via a pluggable approach. It allows defining and configuring 
> multiple codec schemas with different coding algorithms and parameters. The 
> resultant codec schemas can be utilized and specified via a command tool for 
> different file folders. While designing and implementing such a pluggable 
> framework, a concrete default codec (Reed-Solomon) will also be implemented to 
> prove the framework is useful and workable. A separate JIRA could be opened for 
> the RS codec implementation.
> Note that HDFS-7353 will focus on the very low level codec API and implementation 
> to make concrete vendor libraries transparent to the upper layer. This JIRA 
> focuses on the high level concerns that interact with configuration, schemas, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7496) Fix FsVolume removal race conditions on the DataNode

2014-12-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261980#comment-14261980
 ] 

Hadoop QA commented on HDFS-7496:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689628/HDFS-7496.001.patch
  against trunk revision e2351c7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9134//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9134//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9134//console

This message is automatically generated.

> Fix FsVolume removal race conditions on the DataNode 
> -
>
> Key: HDFS-7496
> URL: https://issues.apache.org/jira/browse/HDFS-7496
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Colin Patrick McCabe
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7496.000.patch, HDFS-7496.001.patch
>
>
> We discussed a few FsVolume removal race conditions on the DataNode in 
> HDFS-7489.  We should figure out a way to make removing an FsVolume safe.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7496) Fix FsVolume removal race conditions on the DataNode

2014-12-30 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261982#comment-14261982
 ] 

Lei (Eddy) Xu commented on HDFS-7496:
-

The findbugs message is not relevant. 

> Fix FsVolume removal race conditions on the DataNode 
> -
>
> Key: HDFS-7496
> URL: https://issues.apache.org/jira/browse/HDFS-7496
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Colin Patrick McCabe
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7496.000.patch, HDFS-7496.001.patch
>
>
> We discussed a few FsVolume removal race conditions on the DataNode in 
> HDFS-7489.  We should figure out a way to make removing an FsVolume safe.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7339) NameNode support for erasure coding block groups

2014-12-30 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-7339:

Attachment: HDFS-7339-002.patch

This patch is a work in progress and demonstrates the high-level structure.

> NameNode support for erasure coding block groups
> 
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, 
> Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a {_Striping+EC_} file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.
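
As a purely illustrative aid to the description above, here is a minimal sketch of 
what such a lightweight block-group record could look like: the original (data) 
block IDs, the parity block IDs, and a pointer to the codec schema. The field, 
class, and schema names are assumptions made for this example, not the classes 
introduced by the patch:

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Illustrative sketch of a lightweight block-group record: the original (data)
 * block IDs, the parity block IDs, and a pointer to the codec schema used to
 * encode the group. Not the actual HDFS-7339 classes.
 */
public class BlockGroupSketch {
  private final long groupId;
  private final String codecSchema;        // e.g. a schema name in a pluggable registry
  private final List<Long> dataBlockIds = new ArrayList<>();
  private final List<Long> parityBlockIds = new ArrayList<>();

  public BlockGroupSketch(long groupId, String codecSchema) {
    this.groupId = groupId;
    this.codecSchema = codecSchema;
  }

  public void addDataBlock(long blockId)   { dataBlockIds.add(blockId); }
  public void addParityBlock(long blockId) { parityBlockIds.add(blockId); }

  public long getGroupId()              { return groupId; }
  public String getCodecSchema()        { return codecSchema; }
  public List<Long> getDataBlockIds()   { return Collections.unmodifiableList(dataBlockIds); }
  public List<Long> getParityBlockIds() { return Collections.unmodifiableList(parityBlockIds); }
}
{code}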



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)