[jira] [Commented] (HDFS-4366) Block Replication Policy Implementation May Skip Higher-Priority Blocks for Lower-Priority Blocks

2013-01-21 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559443#comment-13559443
 ] 

Todd Lipcon commented on HDFS-4366:
---

This looks good to me. +1. Nice patch, Derek.

I'll wait till tomorrow to commit in case anyone else wants to take a look - 
this is pretty important code, so having a few eyes on it would be nice.

> Block Replication Policy Implementation May Skip Higher-Priority Blocks for 
> Lower-Priority Blocks
> -
>
> Key: HDFS-4366
> URL: https://issues.apache.org/jira/browse/HDFS-4366
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 0.23.5
>Reporter: Derek Dagit
>Assignee: Derek Dagit
> Attachments: HDFS-4366.patch, HDFS-4366.patch, HDFS-4366.patch, 
> hdfs-4366-unittest.patch
>
>
> In certain cases, higher-priority under-replicated blocks can be skipped by 
> the replication policy implementation.  The current implementation maintains, 
> for each priority level, an index into a list of blocks that are 
> under-replicated.  Together, the lists compose a priority queue (see note 
> later about branch-0.23).  In some cases when blocks are removed from a list, 
> the caller (BlockManager) properly handles the index into the list from which 
> it removed a block.  In some other cases, the index remains stationary while 
> the list changes.  Whenever this happens, and the removed block happened to 
> be at or before the index, the implementation will skip over a block when 
> selecting blocks for replication work.
> In situations when entire racks are decommissioned, leading to many 
> under-replicated blocks, loss of blocks can occur.
> Background: HDFS-1765
> This patch to trunk greatly improved the state of the replication policy 
> implementation.  Prior to the patch, the following details were true:
>   * The block "priority queue" was no such thing: it was really a set of 
> trees that held blocks in natural ordering, that being by the block's ID, 
> which resulted in iterator walks over the blocks in pseudo-random order.
>   * There was only a single index into an iteration over all of the 
> blocks...
>   * ... meaning the implementation was only successful in respecting 
> priority levels on the first pass.  Overall, the behavior was a 
> round-robin-type scheduling of blocks.
> After the patch
>   * A proper priority queue is implemented, preserving O(log n) operations 
> while iterating over blocks in the order they were added.
>   * A separate index for each priority level is kept...
>   * ... allowing for processing of the highest priority blocks first 
> regardless of which priority had last been processed.
> The change was suggested for branch-0.23 as well as trunk, but it does not 
> appear to have been pulled in.
> The problem:
> Although the indices are now tracked in a better way, there is a 
> synchronization issue, since the indices are managed outside of the methods that 
> modify the contents of the queue.
> Removal of a block from a priority level without adjusting the index can mean 
> that the index then points to the block after the block it originally pointed 
> to.  In the next round of scheduling for that priority level, the block 
> originally pointed to by the index is skipped.
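
A minimal, self-contained sketch of the indexing problem described above (illustrative only; the real code lives in the block manager's under-replicated block queues, and the names below are made up):

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StaleIndexSketch {
  public static void main(String[] args) {
    // One priority level of the "queue", with a saved index into it.
    List<String> level = new ArrayList<>(Arrays.asList("b0", "b1", "b2", "b3"));
    int bookmark = 2;                  // next scheduling pass should resume at "b2"

    level.remove("b1");                // a block at/before the bookmark is removed,
                                       // but the bookmark is not adjusted
    // level is now [b0, b2, b3]; resuming at the stale bookmark returns "b3",
    // so the higher-priority block "b2" is silently skipped.
    System.out.println(level.get(bookmark));   // prints b3
  }
}
{code}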

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4340) Update addBlock() to include inode id as additional argument

2013-01-21 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559413#comment-13559413
 ] 

Brandon Li commented on HDFS-4340:
--

@Nicholas, the new patch addresses your comments. I synchronized 
streamer.start() to avoid the findbugs warnings. Please let me know if you 
think it's sort of overkill to do so. Thanks!
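
For reference, a rough, self-contained sketch of the kind of change described (assumed shape, not the actual DFSOutputStream/DataStreamer code): starting the thread under the same lock that guards the field is what keeps findbugs from reporting inconsistent synchronization.

{code}
public class StreamerStartSketch {
  private Thread streamer;   // field findbugs wants accessed consistently under a lock

  synchronized void startStreamer(Runnable work) {
    streamer = new Thread(work, "DataStreamer");
    // Starting the thread inside the same synchronized block that assigns the
    // field avoids the "inconsistent synchronization" findbugs warning.
    streamer.start();
  }
}
{code}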

> Update addBlock() to include inode id as additional argument
> 
>
> Key: HDFS-4340
> URL: https://issues.apache.org/jira/browse/HDFS-4340
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, 
> HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-21 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-4403:
--

Release Note: The HDFS implementation of getFileChecksum() can now operate 
correctly against earlier-version datanodes which do not include the checksum 
type information in their checksum response. The checksum type is automatically 
inferred by issuing a read of the first byte of each block.

> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Fix For: 3.0.0, 2.0.3-alpha
>
> Attachments: hdfs-4403.txt, hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)
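
A self-contained illustration of the fallback idea (not the actual DFSClient code; the real patch works against OpBlockChecksumResponseProto and the datanode read path, and Optional here just stands in for an unset protobuf field):

{code}
import java.util.Optional;

public class InferChecksumSketch {
  enum ChecksumType { CRC32, CRC32C }

  // Older servers leave the field unset; Optional.empty() stands in for that.
  static ChecksumType checksumTypeOf(Optional<ChecksumType> responseField) {
    // Fall back only when the field is genuinely absent, instead of trusting
    // a protobuf default of CRC32 that may be wrong for CRC32C files.
    return responseField.orElseGet(InferChecksumSketch::inferByReadingFirstByte);
  }

  // Stand-in for issuing OP_READ_BLOCK for the first byte of the block and
  // taking the checksum type from that read response.
  static ChecksumType inferByReadingFirstByte() {
    return ChecksumType.CRC32C;
  }

  public static void main(String[] args) {
    System.out.println(checksumTypeOf(Optional.empty()));                 // CRC32C (inferred)
    System.out.println(checksumTypeOf(Optional.of(ChecksumType.CRC32)));  // CRC32 (explicit)
  }
}
{code}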

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-21 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559397#comment-13559397
 ] 

Suresh Srinivas commented on HDFS-4403:
---

Todd, sorry, I got busy with other things. +1 for the change as well.

Consider adding a brief release note describing the issue with the prior releases, 
to help users understand it.

> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Fix For: 3.0.0, 2.0.3-alpha
>
> Attachments: hdfs-4403.txt, hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4126) Add reading/writing snapshot information to FSImage

2013-01-21 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559393#comment-13559393
 ] 

Suresh Srinivas commented on HDFS-4126:
---

# DFSUtil#byte2String - add javadoc
# FSImageFormat.java
#* In the javadoc, SnapshotID under FSImage should be snapshotCounter or 
nextSnapshotID. Should we change the SnapshotManager#snapshotID to 
SnapshotManager#snapshotCounter?
#* As per our conversation, the INodeFile FSImage ContainsBlock will change 
when we do the file level diff and simplify the FSImage. Hence I am okay with 
the current code.
#* ComputedFileSize in javadoc could be called snapshotFileSize. The 
corresponding variable name could also be updated accordingly.
#* Snapshot in the javadoc is missing the snapshot name?
#* The javadoc could consolidate the snapshot-related fields together.
#* loadRoot should return void and numFiles-- should be used. Always returning 1 
just for the decrement does not seem intuitive.
#* Snapshot related methods should be moved to an inner class or a separate 
class. This can be done in a separate jira.
# FileWithSnapshot implementation: the #insertBefore and #removeSelf code seems to 
be repeated in the implementation?
# Add a summary of test information to the javadoc of test methods.
# For commented-out tests, please add a TODO and a brief description.


> Add reading/writing snapshot information to FSImage
> ---
>
> Key: HDFS-4126
> URL: https://issues.apache.org/jira/browse/HDFS-4126
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Suresh Srinivas
>Assignee: Jing Zhao
> Attachments: HDFS-4126.001.patch, HDFS-4126.002.patch, 
> HDFS-4126.002.patch
>
>
> After the changes proposed in HDFS-4125 is completed, reading and writing 
> snapshot related information from FSImage can be implemented. This jira 
> tracks changes required for:
> # Loading snapshot information from FSImage
> # Loading snapshot related operations from editlog
> # Writing snapshot information in FSImage
> # Unit tests related to this functionality

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Comment Edited] (HDFS-4126) Add reading/writing snapshot information to FSImage

2013-01-21 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559393#comment-13559393
 ] 

Suresh Srinivas edited comment on HDFS-4126 at 1/22/13 4:42 AM:


# DFSUtil#byte2String - add javadoc
# FSImageFormat.java
#* In the javadoc, SnapshotID under FSImage should be snapshotCounter or 
nextSnapshotID. Should we change the SnapshotManager#snapshotID to 
SnapshotManager#snapshotCounter?
#* As per our conversation, the INodeFile FSImage ContainsBlock will change 
when we do the file level diff and simplify the FSImage. Hence I am okay with 
the current code.
#* ComputedFileSize in javadoc could be called snapshotFileSize. The 
corresponding variable name could also be updated accordingly.
#* Snapshot in the javadoc is missing the snapshot name?
#* The javadoc could consolidate the snapshot-related fields together.
#* loadRoot should return void and numFiles-- should be used. Always returning 1 
just for the decrement does not seem intuitive.
#* Snapshot related methods should be moved to an inner class or a separate 
class. This can be done in a separate jira.
# FileWithSnapshot implementation: the #insertBefore and #removeSelf code seems to 
be repeated in the implementation?
# Add a summary of test information to the javadoc of test methods.
# For commented-out tests, please add a TODO and a brief description.


> Add reading/writing snapshot information to FSImage
> ---
>
> Key: HDFS-4126
> URL: https://issues.apache.org/jira/browse/HDFS-4126
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Suresh Srinivas
>Assignee: Jing Zhao
> Attachments: HDFS-4126.001.patch, HDFS-4126.002.patch, 
> HDFS-4126.002.patch
>
>
> After the changes proposed in HDFS-4125 is completed, reading and writing 
> snapshot related information from FSImage can be implemented. This jira 
> tracks changes required for:
> # Loading snapshot information from FSImage
> # Loading snapshot related operations from editlog
> # Writing snapshot information in FSImage
> # Unit tests related to this functionality

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4417) HDFS-347: fix case where local reads get disabled incorrectly

2013-01-21 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559333#comment-13559333
 ] 

Todd Lipcon commented on HDFS-4417:
---

{code}
+  @VisibleForTesting
+  public void killDataXceiverServer() {
+    if (dataXceiverServer != null) {
+      ((DataXceiverServer) this.dataXceiverServer.getRunnable()).kill();
+      this.dataXceiverServer.interrupt();
+      dataXceiverServer = null;
+    }
+  }
{code}

I think you forgot to delete this attempt that you didn't end up using. Also, the 
removal of the assert in {{kill}} shouldn't be in the patch anymore.



{code}
+  return Mockito.mock(DomainSocket.class,
+      new Answer() {
+        @Override
+        public Object answer(InvocationOnMock invocation) throws Throwable {
+          throw new RuntimeException("...");
+        }
+      });
{code}

Can you add a one-line comment explaining this, like 'Return a mock which 
always throws exceptions on any of its function calls'? Also, fill in the 
exception text with something like "Injected fault" instead of "..."
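
Something along these lines (a sketch of the suggested tweak to the mock above, assuming the usual Mockito imports; not the final patch):

{code}
// Return a mock which always throws exceptions on any of its function calls.
return Mockito.mock(DomainSocket.class, new Answer<Object>() {
  @Override
  public Object answer(InvocationOnMock invocation) throws Throwable {
    throw new RuntimeException("Injected fault");
  }
});
{code}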



Looks like your patch might be missing the new test case? I don't see anyone 
set the {{tcpReadsDisabledForTesting}} flag, nor the 
{{TestParallelShortCircuitReadUnCached}} class you mentioned.

> HDFS-347: fix case where local reads get disabled incorrectly
> -
>
> Key: HDFS-4417
> URL: https://issues.apache.org/jira/browse/HDFS-4417
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client, performance
>Reporter: Todd Lipcon
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-4417.002.patch, HDFS-4417.003.patch, hdfs-4417.txt
>
>
> In testing HDFS-347 against HBase (thanks [~jdcryans]) we ran into the 
> following case:
> - a workload is running which puts a bunch of local sockets in the PeerCache
> - the workload abates for a while, causing the sockets to go "stale" (ie the 
> DN side disconnects after the keepalive timeout)
> - the workload starts again
> In this case, the local socket retrieved from the cache failed the 
> newBlockReader call, and it incorrectly disabled local sockets on that host. 
> This is similar to an earlier bug HDFS-3376, but not quite the same.
> The next issue we ran into is that, once this happened, it never tried local 
> sockets again, because the cache held lots of TCP sockets. Since we always 
> managed to get a cached socket to the local node, it didn't bother trying 
> local read again.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559329#comment-13559329
 ] 

Hudson commented on HDFS-4403:
--

Integrated in Hadoop-trunk-Commit #3265 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3265/])
HDFS-4403. DFSClient can infer checksum type when not provided by reading 
first byte. Contributed by Todd Lipcon. (Revision 1436730)

 Result = SUCCESS
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436730
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileChecksumServlets.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto


> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Fix For: 3.0.0, 2.0.3-alpha
>
> Attachments: hdfs-4403.txt, hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-21 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-4403:
--

   Resolution: Fixed
Fix Version/s: 2.0.3-alpha
   3.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed. Thanks for reviewing, Aaron.

> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Fix For: 3.0.0, 2.0.3-alpha
>
> Attachments: hdfs-4403.txt, hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4340) Update addBlock() to include inode id as additional argument

2013-01-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559300#comment-13559300
 ] 

Hadoop QA commented on HDFS-4340:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12565880/HDFS-4340.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3862//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3862//console

This message is automatically generated.

> Update addBlock() to include inode id as additional argument
> 
>
> Key: HDFS-4340
> URL: https://issues.apache.org/jira/browse/HDFS-4340
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, 
> HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4417) HDFS-347: fix case where local reads get disabled incorrectly

2013-01-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559277#comment-13559277
 ] 

Colin Patrick McCabe commented on HDFS-4417:


bq. How about newTcpPeer? Remote is kind of vague.

Agree.

Using a mock for DomainSocket also worked out well.

For PeerCache, I tried out the two-cache solution, but it started getting 
pretty complicated, since we refer to the cache in many places.  Instead, I 
just added a boolean to the cache key.
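
A rough sketch of that cache-key shape (field names are assumptions for illustration, not the actual PeerCache code):

{code}
import java.util.Objects;

class PeerCacheKey {
  final String datanodeId;
  final boolean isDomainSocket;   // the extra boolean: domain-socket vs. TCP peer

  PeerCacheKey(String datanodeId, boolean isDomainSocket) {
    this.datanodeId = datanodeId;
    this.isDomainSocket = isDomainSocket;
  }

  // Keys for TCP peers and domain-socket peers to the same datanode never
  // collide, so cached TCP peers cannot shadow local (domain-socket) peers.
  @Override public boolean equals(Object o) {
    if (!(o instanceof PeerCacheKey)) return false;
    PeerCacheKey k = (PeerCacheKey) o;
    return datanodeId.equals(k.datanodeId) && isDomainSocket == k.isDomainSocket;
  }

  @Override public int hashCode() {
    return Objects.hash(datanodeId, isDomainSocket);
  }
}
{code}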

In {{TestParallelShortCircuitReadUnCached}}, since this *is* a regression test 
for HDFS-4417, I figured I needed some way to make sure that we were not 
falling back on TCP sockets to read.  So I added 
{{DFSInputStream#tcpReadsDisabledForTesting}}.

I considered several other solutions.  Any solution that makes TCP sockets 
unusable, like setting a bad {{SocketFactory}}, runs into trouble because the 
first part of the test needs to create the files that we're reading.  Killing 
the {{DataNode#dataXceiverServer}} thread after doing the writes seemed like a 
promising approach, but it caused exceptions in the {{DFSOutputStream}} worker 
threads, which led to the (only) {{DataNode}} getting kicked out of the 
cluster.  Another approach is to create a subclass for {{DFSInputStream}} that 
overrides {{DFSInputStream#newTcpPeer}} to throw an exception.  However, 
getting a {{DFSClient}} to return this subclass is difficult.  Possibly 
Mockito's partial mocks could help here.

> HDFS-347: fix case where local reads get disabled incorrectly
> -
>
> Key: HDFS-4417
> URL: https://issues.apache.org/jira/browse/HDFS-4417
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client, performance
>Reporter: Todd Lipcon
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-4417.002.patch, HDFS-4417.003.patch, hdfs-4417.txt
>
>
> In testing HDFS-347 against HBase (thanks [~jdcryans]) we ran into the 
> following case:
> - a workload is running which puts a bunch of local sockets in the PeerCache
> - the workload abates for a while, causing the sockets to go "stale" (ie the 
> DN side disconnects after the keepalive timeout)
> - the workload starts again
> In this case, the local socket retrieved from the cache failed the 
> newBlockReader call, and it incorrectly disabled local sockets on that host. 
> This is similar to an earlier bug HDFS-3376, but not quite the same.
> The next issue we ran into is that, once this happened, it never tried local 
> sockets again, because the cache held lots of TCP sockets. Since we always 
> managed to get a cached socket to the local node, it didn't bother trying 
> local read again.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4417) HDFS-347: fix case where local reads get disabled incorrectly

2013-01-21 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-4417:
---

Attachment: HDFS-4417.003.patch

> HDFS-347: fix case where local reads get disabled incorrectly
> -
>
> Key: HDFS-4417
> URL: https://issues.apache.org/jira/browse/HDFS-4417
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client, performance
>Reporter: Todd Lipcon
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-4417.002.patch, HDFS-4417.003.patch, hdfs-4417.txt
>
>
> In testing HDFS-347 against HBase (thanks [~jdcryans]) we ran into the 
> following case:
> - a workload is running which puts a bunch of local sockets in the PeerCache
> - the workload abates for a while, causing the sockets to go "stale" (ie the 
> DN side disconnects after the keepalive timeout)
> - the workload starts again
> In this case, the local socket retrieved from the cache failed the 
> newBlockReader call, and it incorrectly disabled local sockets on that host. 
> This is similar to an earlier bug HDFS-3376, but not quite the same.
> The next issue we ran into is that, once this happened, it never tried local 
> sockets again, because the cache held lots of TCP sockets. Since we always 
> managed to get a cached socket to the local node, it didn't bother trying 
> local read again.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4340) Update addBlock() to include inode id as additional argument

2013-01-21 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-4340:
-

Attachment: HDFS-4340.patch

> Update addBlock() to include inode id as additional argument
> 
>
> Key: HDFS-4340
> URL: https://issues.apache.org/jira/browse/HDFS-4340
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, 
> HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4237) Add unit tests for HTTP-based filesystems against secure MiniDFSCluster

2013-01-21 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559231#comment-13559231
 ] 

Andy Isaacson commented on HDFS-4237:
-

{noformat}
+  String address = "127.0.0.1:" + port;
{noformat}
This line grew some trailing whitespace.

{{SecureHdfsTestUtil.java}} license comment has trailing whitespace.

{noformat}
+ * Our unit tests use 127.0.0.1/localhost to address the host running
+ * the tests. However, WebHDFS secure authentication using localhost is
+ * not allowed (kerberos authentication will complain it can't find
+ * the server). The actual hostname must be used. Therefore, to run
+ * the secure WebHDFS tests in your test environment, make 127.0.0.1
+ * resolve to the actual hostname.
{noformat}

I'm not sure this is an acceptable requirement, but let's go ahead and get it 
checked in as-is.  Worst case, we just back out this code.
(It would be better to teach the tests how to run in a reasonable environment 
where the hostname resolves to the actual eth0 address or similar.  This may 
mean that it's impossible to do JUnit-style tests of Kerberized security.)

bq. How does me adding a section on running/developing secure unit tests in the 
Developer Documentation in http://wiki.apache.org/hadoop/ sound? Is there a 
better place for documentation?

A wiki page sounds like an excellent start.  I think it belongs on a new page 
but you can use your judgment if you find a page where it fits in.


> Add unit tests for HTTP-based filesystems against secure MiniDFSCluster
> ---
>
> Key: HDFS-4237
> URL: https://issues.apache.org/jira/browse/HDFS-4237
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: security, test, webhdfs
>Affects Versions: 2.0.0-alpha
>Reporter: Stephen Chu
>Assignee: Stephen Chu
> Attachments: HDFS-4237.patch.001, HDFS-4237.patch.007
>
>
> Now that we can start a secure MiniDFSCluster (HADOOP-9004), we need more 
> security unit tests.
> A good area to add secure tests is the HTTP-based filesystems (WebHDFS, 
> HttpFs).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4237) Add unit tests for HTTP-based filesystems against secure MiniDFSCluster

2013-01-21 Thread Stephen Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559226#comment-13559226
 ] 

Stephen Chu commented on HDFS-4237:
---

Woops, I forgot the Assume check in TestSecureWebHdfsFileSystemContract.

> Add unit tests for HTTP-based filesystems against secure MiniDFSCluster
> ---
>
> Key: HDFS-4237
> URL: https://issues.apache.org/jira/browse/HDFS-4237
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: security, test, webhdfs
>Affects Versions: 2.0.0-alpha
>Reporter: Stephen Chu
>Assignee: Stephen Chu
> Attachments: HDFS-4237.patch.001, HDFS-4237.patch.007
>
>
> Now that we can start a secure MiniDFSCluster (HADOOP-9004), we need more 
> security unit tests.
> A good area to add secure tests is the HTTP-based filesystems (WebHDFS, 
> HttpFs).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4237) Add unit tests for HTTP-based filesystems against secure MiniDFSCluster

2013-01-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559219#comment-13559219
 ] 

Hadoop QA commented on HDFS-4237:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12565859/HDFS-4237.patch.007
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.web.TestSecureWebHdfsFileSystemContract

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3861//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3861//console

This message is automatically generated.

> Add unit tests for HTTP-based filesystems against secure MiniDFSCluster
> ---
>
> Key: HDFS-4237
> URL: https://issues.apache.org/jira/browse/HDFS-4237
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: security, test, webhdfs
>Affects Versions: 2.0.0-alpha
>Reporter: Stephen Chu
>Assignee: Stephen Chu
> Attachments: HDFS-4237.patch.001, HDFS-4237.patch.007
>
>
> Now that we can start a secure MiniDFSCluster (HADOOP-9004), we need more 
> security unit tests.
> A good area to add secure tests is the HTTP-based filesystems (WebHDFS, 
> HttpFs).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4237) Add unit tests for HTTP-based filesystems against secure MiniDFSCluster

2013-01-21 Thread Stephen Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Chu updated HDFS-4237:
--

Attachment: HDFS-4237.patch.007

Thank you for the review, Andy. I've uploaded a new patch. In it...

I've removed the tab characters.

I used "200 * 1024 * 1024" instead of the bitshift.

I converted FileSystemContractBaseTest (and the classes that extend it) to 
JUnit 4. Previously, it was written in JUnit 3 style (extends TestCase), but the 
JUnit 3 TestCase and JUnit 4 Assume are incompatible, e.g. HDFS-3966.
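
For illustration, a minimal JUnit 4 Assume guard of the kind being discussed (the condition checked here is made up; the real test gates on whether a secure cluster environment is available):

{code}
import static org.junit.Assume.assumeTrue;
import org.junit.Before;
import org.junit.Test;

public class SecureContractSketchTest {
  @Before
  public void requireSecureEnvironment() {
    // Skip (rather than fail) the secure tests when no Kerberos setup is configured.
    assumeTrue("secure test environment not configured",
        System.getProperty("test.kdc.conf.dir") != null);
  }

  @Test
  public void testSomethingSecure() {
    // ... test body runs only when the assumption above holds ...
  }
}
{code}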

How does me adding a section on running/developing secure unit tests in the 
Developer Documentation in http://wiki.apache.org/hadoop/ sound? Is there a 
better place for documentation?

> Add unit tests for HTTP-based filesystems against secure MiniDFSCluster
> ---
>
> Key: HDFS-4237
> URL: https://issues.apache.org/jira/browse/HDFS-4237
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: security, test, webhdfs
>Affects Versions: 2.0.0-alpha
>Reporter: Stephen Chu
>Assignee: Stephen Chu
> Attachments: HDFS-4237.patch.001, HDFS-4237.patch.007
>
>
> Now that we can start a secure MiniDFSCluster (HADOOP-9004), we need more 
> security unit tests.
> A good area to add secure tests is the HTTP-based filesystems (WebHDFS, 
> HttpFs).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4417) HDFS-347: fix case where local reads get disabled incorrectly

2013-01-21 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559116#comment-13559116
 ] 

Todd Lipcon commented on HDFS-4417:
---

{code}
-  private Peer newPeer(InetSocketAddress addr) throws IOException {
+  private Peer newRemotePeer(InetSocketAddress addr) throws IOException {
{code}

How about {{newTcpPeer}}? Remote is kind of vague.



{code}
+  public static DomainSocket getClosedSocket() {
+return new DomainSocket("", -1);
+  }
{code}

This doesn't seem like a reasonable thing to expose. Instead, since it's just 
used from tests, could you just create a mock DomainSocket object which throws 
ClosedChannelException on write?



I think the changes to PeerCache are a little over-complicated... why not just 
have two separate PeerCaches, one for each type of peer?


> HDFS-347: fix case where local reads get disabled incorrectly
> -
>
> Key: HDFS-4417
> URL: https://issues.apache.org/jira/browse/HDFS-4417
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client, performance
>Reporter: Todd Lipcon
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-4417.002.patch, hdfs-4417.txt
>
>
> In testing HDFS-347 against HBase (thanks [~jdcryans]) we ran into the 
> following case:
> - a workload is running which puts a bunch of local sockets in the PeerCache
> - the workload abates for a while, causing the sockets to go "stale" (ie the 
> DN side disconnects after the keepalive timeout)
> - the workload starts again
> In this case, the local socket retrieved from the cache failed the 
> newBlockReader call, and it incorrectly disabled local sockets on that host. 
> This is similar to an earlier bug HDFS-3376, but not quite the same.
> The next issue we ran into is that, once this happened, it never tried local 
> sockets again, because the cache held lots of TCP sockets. Since we always 
> managed to get a cached socket to the local node, it didn't bother trying 
> local read again.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4414) Create a DiffReport class to represent the diff between snapshots to end users

2013-01-21 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559027#comment-13559027
 ] 

Aaron T. Myers commented on HDFS-4414:
--

This seems like a great feature to add a public-facing (unstable or evolving) 
programmatic API for. Given that, consider moving this API to the HdfsAdmin 
class instead of DistributedFileSystem, which is marked only LimitedPrivate to 
MR and HBase?

> Create a DiffReport class to represent the diff between snapshots to end users
> --
>
> Key: HDFS-4414
> URL: https://issues.apache.org/jira/browse/HDFS-4414
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-4414.001.patch, HDFS-4414+4131.002.patch
>
>
> HDFS-4131 computes the difference between two snapshots (or between a 
> snapshot and the current tree). In this jira we create a DiffReport class to 
> represent the diff to end users.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2554) Add separate metrics for missing blocks with desired replication level 1

2013-01-21 Thread Andy Isaacson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Isaacson updated HDFS-2554:


Target Version/s:   (was: )
  Status: Open  (was: Patch Available)

> Add separate metrics for missing blocks with desired replication level 1
> 
>
> Key: HDFS-2554
> URL: https://issues.apache.org/jira/browse/HDFS-2554
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Todd Lipcon
>Assignee: Andy Isaacson
>Priority: Minor
> Attachments: hdfs-2554-1.txt, hdfs-2554.txt
>
>
> Some users use replication level set to 1 for datasets which are unimportant 
> and can be lost with no worry (eg the output of terasort tests). But other 
> data on the cluster is important and should not be lost. It would be useful 
> to separate the metric for missing blocks by the desired replication level of 
> those blocks, so that one could ignore missing blocks at repl 1 while still 
> alerting on missing blocks with higher desired replication.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4416) change dfs.datanode.domain.socket.path to dfs.domain.socket.path

2013-01-21 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4416.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to branch. Thanks, Colin.

> change dfs.datanode.domain.socket.path to dfs.domain.socket.path
> 
>
> Key: HDFS-4416
> URL: https://issues.apache.org/jira/browse/HDFS-4416
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client, performance
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-4416.001.patch, HDFS-4416.002.patch, 
> HDFS-4416.003.patch, HDFS-4416.004.patch
>
>
> {{dfs.datanode.domain.socket.path}} is used by both clients and the DataNode, 
> so it might be best to avoid putting 'datanode' in the name.  Most of the 
> configuration keys that have 'datanode' in the name apply only to the DN.
> Also, should change __PORT__ to _PORT to be consistent with _HOST, etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4416) change dfs.datanode.domain.socket.path to dfs.domain.socket.path

2013-01-21 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559013#comment-13559013
 ] 

Todd Lipcon commented on HDFS-4416:
---

+1, committing momentarily

> change dfs.datanode.domain.socket.path to dfs.domain.socket.path
> 
>
> Key: HDFS-4416
> URL: https://issues.apache.org/jira/browse/HDFS-4416
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client, performance
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-4416.001.patch, HDFS-4416.002.patch, 
> HDFS-4416.003.patch, HDFS-4416.004.patch
>
>
> {{dfs.datanode.domain.socket.path}} is used by both clients and the DataNode, 
> so it might be best to avoid putting 'datanode' in the name.  Most of the 
> configuration keys that have 'datanode' in the name apply only to the DN.
> Also, should change __PORT__ to _PORT to be consistent with _HOST, etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HDFS-4131) Add a tool to print the diff between two snapshots and diff of a snapshot from the current tree

2013-01-21 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao reassigned HDFS-4131:
---

Assignee: Jing Zhao  (was: Suresh Srinivas)

> Add a tool to print the diff between two snapshots and diff of a snapshot 
> from the current tree
> ---
>
> Key: HDFS-4131
> URL: https://issues.apache.org/jira/browse/HDFS-4131
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Suresh Srinivas
>Assignee: Jing Zhao
> Attachments: HDFS-4131.001.patch, HDFS-4131.002.patch
>
>
> This jira tracks a tool to print the diff between two snapshots at a given path. 
> The tool will also print the difference between the current directory and the 
> given snapshot.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4131) Add a tool to print the diff between two snapshots and diff of a snapshot from the current tree

2013-01-21 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4131:


Attachment: HDFS-4131.002.patch

Update the patch based on "HDFS-4414+4131.002.patch" in HDFS-4414: fix the code 
for checking if the metadata of a directory has been changed between snapshots.

> Add a tool to print the diff between two snapshots and diff of a snapshot 
> from the current tree
> ---
>
> Key: HDFS-4131
> URL: https://issues.apache.org/jira/browse/HDFS-4131
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Attachments: HDFS-4131.001.patch, HDFS-4131.002.patch
>
>
> This jira tracks a tool to print the diff between two snapshots at a given path. 
> The tool will also print the difference between the current directory and the 
> given snapshot.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4350) Make enabling of stale marking on read and write paths independent

2013-01-21 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559010#comment-13559010
 ] 

Andrew Wang commented on HDFS-4350:
---

Todd's patch looks good to me. I ran the failed tests a couple of times locally 
and they passed, and the earlier runs on this jira were fine.

> Make enabling of stale marking on read and write paths independent
> --
>
> Key: HDFS-4350
> URL: https://issues.apache.org/jira/browse/HDFS-4350
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-4350-1.patch, hdfs-4350-2.patch, hdfs-4350-3.patch, 
> hdfs-4350-4.patch, hdfs-4350.txt
>
>
> Marking of datanodes as stale for the read and write path was introduced in 
> HDFS-3703 and HDFS-3912 respectively. This is enabled using two new keys, 
> {{DFS_NAMENODE_CHECK_STALE_DATANODE_KEY}} and 
> {{DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_WRITE_KEY}}. However, there currently 
> exists a dependency, since you cannot enable write marking without also 
> enabling read marking, since the first key enables both checking of staleness 
> and read marking.
> I propose renaming the first key to 
> {{DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_READ_KEY}}, and make checking enabled 
> if either of the keys are set. This will allow read and write marking to be 
> enabled independently.
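
A minimal sketch of the proposed key handling (the READ/WRITE key names follow the description above; the surrounding code is illustrative, not the actual DatanodeManager code):

{code}
// Staleness checking is on if either marking is requested, so the read and
// write paths can be enabled independently.
boolean avoidStaleForRead = conf.getBoolean(
    DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_READ_KEY, false);
boolean avoidStaleForWrite = conf.getBoolean(
    DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_WRITE_KEY, false);
boolean checkForStaleDataNodes = avoidStaleForRead || avoidStaleForWrite;
{code}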

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4340) Update addBlock() to include inode id as additional argument

2013-01-21 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558885#comment-13558885
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-4340:
--

Some more comments below:

- Since startFileInternal(..) is not changed, appendFileInt(..) does not need to 
return the file status.

- the old ClientProtocol.addBlock(..) should be removed.

- checkLease(String src, String holder, INode file) is not needed.  Only 
getAdditionalBlock(..) calls it and fileId is in the parameter list.
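
For context, a hypothetical shape of the extended signature (parameter names and order are assumptions for illustration; see the attached patches for the actual change):

{code}
// addBlock() also carries the inode id of the file being written, so the
// namenode can match the lease and file even if the path has since been reused.
LocatedBlock addBlock(String src, String clientName, ExtendedBlock previous,
    DatanodeInfo[] excludeNodes, long fileId) throws IOException;
{code}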




> Update addBlock() to include inode id as additional argument
> 
>
> Key: HDFS-4340
> URL: https://issues.apache.org/jira/browse/HDFS-4340
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, 
> HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira