[jira] [Updated] (HDFS-5386) Add feature documentation for datanode caching.

2013-10-28 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-5386:


 Target Version/s: 3.0.0  (was: HDFS-4949)
Affects Version/s: 3.0.0  (was: HDFS-4949)

> Add feature documentation for datanode caching.
> ---
>
> Key: HDFS-5386
> URL: https://issues.apache.org/jira/browse/HDFS-5386
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Chris Nauroth
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5386-caching.001.patch, HDFS-5386-caching.002.patch
>
>
> Write feature documentation for datanode caching, covering all of the 
> following:
> * high-level architecture
> * OS/native code requirements
> * OS configuration (ulimit -l)
> * new configuration properties for namenode and datanode
> * cache admin CLI commands
> * pointers to API for programmatic control of caching directives



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5386) Add feature documentation for datanode caching.

2013-10-28 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-5386:


Attachment: HDFS-5386-caching.002.patch

I'm attaching patch version 2.  This folds in all of my prior feedback, except 
for two items:

* I didn't include the high-level architecture diagram.  This may be easier for 
Colin or Andrew to add if they have access to the original source document.
* I didn't add references to the {{DistributedFileSystem}} API, because that 
class isn't included in the generated JavaDocs (a rough illustration of the API 
follows below).
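
For context only (this is not part of the patch): the kind of programmatic 
control that the documentation should point to looks roughly like the sketch 
below. The class and method names ({{addCachePool}}, {{addCacheDirective}}, 
{{CacheDirectiveInfo}}) follow the caching API as exposed through 
{{DistributedFileSystem}} in later releases and are an assumption here, not 
necessarily the exact API of this branch.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;
import org.apache.hadoop.hdfs.protocol.CachePoolInfo;

public class CacheDirectiveExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    if (!(fs instanceof DistributedFileSystem)) {
      throw new IllegalStateException("Caching directives require HDFS");
    }
    DistributedFileSystem dfs = (DistributedFileSystem) fs;

    // Create a cache pool, then ask the NameNode to cache a path in that pool.
    // The DataNodes holding the replicas mmap/mlock the corresponding blocks.
    dfs.addCachePool(new CachePoolInfo("examplePool"));
    dfs.addCacheDirective(new CacheDirectiveInfo.Builder()
        .setPath(new Path("/user/example/hot-data"))
        .setPool("examplePool")
        .build());
  }
}
{code}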

> Add feature documentation for datanode caching.
> ---
>
> Key: HDFS-5386
> URL: https://issues.apache.org/jira/browse/HDFS-5386
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: HDFS-4949
>Reporter: Chris Nauroth
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5386-caching.001.patch, HDFS-5386-caching.002.patch
>
>
> Write feature documentation for datanode caching, covering all of the 
> following:
> * high-level architecture
> * OS/native code requirements
> * OS configuration (ulimit -l)
> * new configuration properties for namenode and datanode
> * cache admin CLI commands
> * pointers to API for programmatic control of caching directives



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5441) Wrong use of catalina opts in httpfs.sh

2013-10-28 Thread Dridi Boukelmoune (JIRA)
Dridi Boukelmoune created HDFS-5441:
---

 Summary: Wrong use of catalina opts in httpfs.sh
 Key: HDFS-5441
 URL: https://issues.apache.org/jira/browse/HDFS-5441
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Dridi Boukelmoune


Hey there,

There is a comment in httpfs.sh mentioning a bug in catalina.sh (Tomcat):
https://github.com/apache/hadoop-common/blob/1f2a21f/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/sbin/httpfs.sh#L51

However, this behavior (not using those opts when stopping) is the very purpose 
of the CATALINA_OPTS variable, as documented in catalina.sh:
https://github.com/apache/tomcat/blob/d88ad9e/bin/catalina.sh#L36



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5252) Stable write is not handled correctly in someplace

2013-10-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807709#comment-13807709
 ] 

Hadoop QA commented on HDFS-5252:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12610767/HDFS-5252.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs-nfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5303//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5303//console

This message is automatically generated.

> Stable write is not handled correctly in someplace
> --
>
> Key: HDFS-5252
> URL: https://issues.apache.org/jira/browse/HDFS-5252
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: nfs
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-5252.001.patch
>
>
> When the client asks for a stable write but the prerequisite writes have not 
> been transferred to the NFS gateway, the requested stability can't be honored. 
> The NFS gateway has to treat the write as an unstable write and set the flag 
> to UNSTABLE in the write response.
> One bug was found during testing with an Ubuntu client while copying a 1KB 
> file. For small files like this, the Ubuntu client does one stable write (with 
> the FILE_SYNC flag). However, the NFS gateway missed one place 
> (OpenFileCtx#doSingleWrite) where it sends the response with the flag NOT 
> updated to UNSTABLE.
> With this bug, the client thinks the write is on disk and thus doesn't send a 
> COMMIT anymore. The subsequent test tries to read the data back and of course 
> fails to do so, since the data was not synced.
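
To make the description above concrete, here is a minimal sketch of the 
downgrade rule (a hypothetical helper, not the actual {{OpenFileCtx}} code, 
assuming the {{WriteStableHow}} enum from the hadoop-nfs module): when the 
bytes preceding a WRITE have not yet reached the gateway, the response must 
report UNSTABLE regardless of what the client requested, so that the client 
still sends a COMMIT later.

{code}
import org.apache.hadoop.nfs.nfs3.Nfs3Constant.WriteStableHow;

public final class StableWritePolicySketch {
  private StableWritePolicySketch() {}

  /**
   * @param requested   stable_how flag requested by the NFS client
   * @param writeOffset offset of this WRITE request in the file
   * @param nextOffset  next offset the gateway can flush sequentially
   * @return the stable_how value that may be reported in the WRITE response
   */
  public static WriteStableHow responseStableHow(WriteStableHow requested,
      long writeOffset, long nextOffset) {
    if (writeOffset > nextOffset) {
      // Prerequisite writes are still missing, so the data cannot be synced
      // yet; claiming FILE_SYNC or DATA_SYNC would make the client skip COMMIT.
      return WriteStableHow.UNSTABLE;
    }
    return requested;
  }
}
{code}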



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5252) Stable write is not handled correctly in someplace

2013-10-28 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5252:
-

Status: Patch Available  (was: Open)

> Stable write is not handled correctly in someplace
> --
>
> Key: HDFS-5252
> URL: https://issues.apache.org/jira/browse/HDFS-5252
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: nfs
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-5252.001.patch
>
>
> When the client asks for a stable write but the prerequisite writes have not 
> been transferred to the NFS gateway, the requested stability can't be honored. 
> The NFS gateway has to treat the write as an unstable write and set the flag 
> to UNSTABLE in the write response.
> One bug was found during testing with an Ubuntu client while copying a 1KB 
> file. For small files like this, the Ubuntu client does one stable write (with 
> the FILE_SYNC flag). However, the NFS gateway missed one place 
> (OpenFileCtx#doSingleWrite) where it sends the response with the flag NOT 
> updated to UNSTABLE.
> With this bug, the client thinks the write is on disk and thus doesn't send a 
> COMMIT anymore. The subsequent test tries to read the data back and of course 
> fails to do so, since the data was not synced.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (HDFS-5252) Stable write is not handled correctly in someplace

2013-10-28 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li reassigned HDFS-5252:


Assignee: Brandon Li

> Stable write is not handled correctly in someplace
> --
>
> Key: HDFS-5252
> URL: https://issues.apache.org/jira/browse/HDFS-5252
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: nfs
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-5252.001.patch
>
>
> When the client asks for a stable write but the prerequisite writes have not 
> been transferred to the NFS gateway, the requested stability can't be honored. 
> The NFS gateway has to treat the write as an unstable write and set the flag 
> to UNSTABLE in the write response.
> One bug was found during testing with an Ubuntu client while copying a 1KB 
> file. For small files like this, the Ubuntu client does one stable write (with 
> the FILE_SYNC flag). However, the NFS gateway missed one place 
> (OpenFileCtx#doSingleWrite) where it sends the response with the flag NOT 
> updated to UNSTABLE.
> With this bug, the client thinks the write is on disk and thus doesn't send a 
> COMMIT anymore. The subsequent test tries to read the data back and of course 
> fails to do so, since the data was not synced.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5252) Stable write is not handled correctly in someplace

2013-10-28 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5252:
-

Attachment: HDFS-5252.001.patch

> Stable write is not handled correctly in someplace
> --
>
> Key: HDFS-5252
> URL: https://issues.apache.org/jira/browse/HDFS-5252
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: nfs
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-5252.001.patch
>
>
> When the client asks for a stable write but the prerequisite writes have not 
> been transferred to the NFS gateway, the requested stability can't be honored. 
> The NFS gateway has to treat the write as an unstable write and set the flag 
> to UNSTABLE in the write response.
> One bug was found during testing with an Ubuntu client while copying a 1KB 
> file. For small files like this, the Ubuntu client does one stable write (with 
> the FILE_SYNC flag). However, the NFS gateway missed one place 
> (OpenFileCtx#doSingleWrite) where it sends the response with the flag NOT 
> updated to UNSTABLE.
> With this bug, the client thinks the write is on disk and thus doesn't send a 
> COMMIT anymore. The subsequent test tries to read the data back and of course 
> fails to do so, since the data was not synced.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5252) Stable write is not handled correctly in someplace

2013-10-28 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5252:
-

Summary: Stable write is not handled correctly in someplace  (was: Stable 
write is handled correctly in someplace)

> Stable write is not handled correctly in someplace
> --
>
> Key: HDFS-5252
> URL: https://issues.apache.org/jira/browse/HDFS-5252
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: nfs
>Reporter: Brandon Li
>
> When the client asks for a stable write but the prerequisite writes have not 
> been transferred to the NFS gateway, the requested stability can't be honored. 
> The NFS gateway has to treat the write as an unstable write and set the flag 
> to UNSTABLE in the write response.
> One bug was found during testing with an Ubuntu client while copying a 1KB 
> file. For small files like this, the Ubuntu client does one stable write (with 
> the FILE_SYNC flag). However, the NFS gateway missed one place 
> (OpenFileCtx#doSingleWrite) where it sends the response with the flag NOT 
> updated to UNSTABLE.
> With this bug, the client thinks the write is on disk and thus doesn't send a 
> COMMIT anymore. The subsequent test tries to read the data back and of course 
> fails to do so, since the data was not synced.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS

2013-10-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807681#comment-13807681
 ] 

Hadoop QA commented on HDFS-2832:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12610761/h2832_20131028b.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5302//console

This message is automatically generated.

> Enable support for heterogeneous storages in HDFS
> -
>
> Key: HDFS-2832
> URL: https://issues.apache.org/jira/browse/HDFS-2832
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 0.24.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Attachments: 20130813-HeterogeneousStorage.pdf, 
> h2832_20131023b.patch, h2832_20131023.patch, h2832_20131025.patch, 
> h2832_20131028b.patch, h2832_20131028.patch
>
>
> HDFS currently supports a configuration where storages are a list of 
> directories. Typically, each of these directories corresponds to a volume with 
> its own file system. All these directories are homogeneous and are therefore 
> identified as a single storage at the namenode. I propose changing the current 
> model, where a Datanode *is a* storage, to one where a Datanode *is a 
> collection of* storages.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-2832) Enable support for heterogeneous storages in HDFS

2013-10-28 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-2832:
-

Attachment: h2832_20131028b.patch

Jenkins failed to produce a test result, so I renamed the patch and submitted it 
again.

> Enable support for heterogeneous storages in HDFS
> -
>
> Key: HDFS-2832
> URL: https://issues.apache.org/jira/browse/HDFS-2832
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 0.24.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Attachments: 20130813-HeterogeneousStorage.pdf, 
> h2832_20131023b.patch, h2832_20131023.patch, h2832_20131025.patch, 
> h2832_20131028b.patch, h2832_20131028.patch
>
>
> HDFS currently supports a configuration where storages are a list of 
> directories. Typically, each of these directories corresponds to a volume with 
> its own file system. All these directories are homogeneous and are therefore 
> identified as a single storage at the namenode. I propose changing the current 
> model, where a Datanode *is a* storage, to one where a Datanode *is a 
> collection of* storages.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5433) When reloading fsimage during checkpointing, we should clear existing snapshottable directories

2013-10-28 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807600#comment-13807600
 ] 

Todd Lipcon commented on HDFS-5433:
---

looks good to me too, +1 with Vinay's comments addressed

> When reloading fsimage during checkpointing, we should clear existing 
> snapshottable directories
> ---
>
> Key: HDFS-5433
> URL: https://issues.apache.org/jira/browse/HDFS-5433
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 2.2.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Critical
> Attachments: HDFS-5433.patch
>
>
> The complete set of snapshottable directories are referenced both via the 
> file system tree and in the SnapshotManager class. It's possible that when 
> the 2NN performs a checkpoint, it will reload its in-memory state based on a 
> new fsimage from the NN, but will not clear the set of snapshottable 
> directories referenced by the SnapshotManager. In this case, the 2NN will 
> write out an fsimage that cannot be loaded, since the integer written to the 
> fsimage indicating the number of snapshottable directories will be out of 
> sync with the actual number of snapshottable directories serialized to the 
> fsimage.
> This is basically the same as HDFS-3835, but for snapshottable directories 
> instead of delegation tokens.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching

2013-10-28 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807586#comment-13807586
 ] 

Colin Patrick McCabe commented on HDFS-5394:


Andrew, if I understand your proposal correctly, you're proposing to split the 
{{replicaMap}} into three maps: {{beingCachedReplicaMap}}, 
{{cachedReplicaMap}}, and {{beingUncachedReplicaMap}}, and protect all three 
with a big lock.  It seems like this will actually result in more code, since 
we'll have to check multiple maps in many cases (e.g., we don't want to 
advertise something that is being uncached, and we don't want to start caching 
something that is already cached or currently being uncached).  We could 
combine it into 2 maps with some funky booleans, but I think it would get 
pretty confusing.  I really just wanted a unified map that tells me where 
everything is, not 2 or 3 maps.

From an efficiency point of view, 3 maps is also worse than 1... as you know :) 
This is particularly annoying with {{HashMap}}, since its memory consumption 
never shrinks, but only grows as needed.
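
A minimal sketch of the single-map idea (hypothetical types, not the actual 
{{FsDatasetCache}} code): one map from block to an explicit state, guarded by 
one lock, makes "is this cached / being cached / being uncached?" a single 
lookup.

{code}
import java.util.HashMap;
import java.util.Map;

class UnifiedCacheMapSketch {
  enum State { CACHING, CACHED, UNCACHING }

  private final Map<Long, State> blockStates = new HashMap<Long, State>();

  synchronized boolean startCaching(long blockId) {
    // Refuse to start if the block is already cached, being cached,
    // or being uncached.
    if (blockStates.containsKey(blockId)) {
      return false;
    }
    blockStates.put(blockId, State.CACHING);
    return true;
  }

  synchronized boolean shouldAdvertise(long blockId) {
    // Only fully cached blocks (not ones being uncached) are advertised.
    return blockStates.get(blockId) == State.CACHED;
  }
}
{code}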

I think a big part of why the complexity exists today is that we have to drop 
the (conceptual) lock when doing the mmap or munmap operation.  This is a 
requirement, since they are potentially long-running operations.  This in turn 
results in some complexity since once we finish the mmap, we have to retake the 
lock and figure out if the world changed underneath us.  For example, someone 
could have cancelled the caching operation while we released the lock and 
started doing our thing.  This complexity doesn't go away when you split the 
maps-- in fact, it gets worse, since you have to remember to check all of them. 
 If you think the compare-and-swap stuff is too complex, I could use a mutex 
for that, but again, it's going to be a similar amount of code, since it's 
doing a similar thing.
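
The re-check described above can be sketched as follows (hypothetical names; 
only the locking pattern matters): the mmap runs outside the lock, and the 
cancellation flag is examined again once the lock is retaken.

{code}
class CachingTaskSketch {
  private final Object lock = new Object();
  private boolean cancelled = false;

  void cancel() {
    synchronized (lock) {
      cancelled = true;
    }
  }

  void cacheBlock(MappableBlockStub block) throws Exception {
    // mmap/mlock can take a long time, so it runs outside the lock.
    block.map();
    synchronized (lock) {
      // The world may have changed while the lock was released: the caching
      // request could have been cancelled in the meantime.
      if (cancelled) {
        block.unmap();
        return;
      }
      // ... record the block as cached while holding the lock ...
    }
  }

  interface MappableBlockStub {
    void map() throws Exception;
    void unmap();
  }
}
{code}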

Re: background sweeper thread.  Isn't that pretty much equivalent to having a 
single Executor in {{FsDatasetCache}} like this patch adds?  I kind of like the 
{{Executor}} approach since it will tear down the thread after a few minutes of 
inactivity.  But perhaps I could be convinced otherwise.  Anyway, I'd rather do 
that refactoring later if possible.

bq. FsDatasetCache#Key#equals: This uses a string comparison of the class name. 
Should it do a reference-equals of the Class objects instead?

Sure.
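
For illustration, a reference-equality check of the {{Class}} objects looks 
like this (a hypothetical {{Key}}, not the real one):

{code}
final class Key {
  private final long blockId;
  private final String bpid;

  Key(long blockId, String bpid) {
    this.blockId = blockId;
    this.bpid = bpid;
  }

  @Override
  public boolean equals(Object o) {
    if (o == null) {
      return false;
    }
    // Reference-equality of the Class objects; no class-name strings involved.
    if (o.getClass() != this.getClass()) {
      return false;
    }
    Key other = (Key) o;
    return blockId == other.blockId && bpid.equals(other.bpid);
  }

  @Override
  public int hashCode() {
    return (int) (blockId ^ (blockId >>> 32)) * 31 + bpid.hashCode();
  }
}
{code}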

bq. FsDatasetCache#getCachedBlocks: This method is no longer filtering by block 
pool. The bpid argument is unused.

Fixed.

bq. FsDatasetCache#cacheBlock: Does it make sense to move all I/O, including 
opening the streams, behind the CachingTask? If so, then this would also 
simplify the error handling, because you wouldn't need to decrement usedBytes 
and close the streams here.

I think that's a good idea.  I'll see if I can reorganize it along those lines.

bq. MappableBlock#mlocker: Can you please annotate this as @VisibleForTesting?

OK

bq. MappableBlock#load: Regarding the null check of blockChannel, is it 
actually possible for FileInputStream#getChannel to return null, or was this 
done for defensive coding purposes? (No objection if it's just defensive 
coding. I'm just curious if you know of a particular condition that causes 
this.)

I checked out the JDK source, and I don't think {{FileInputStream#getChannel}} 
can ever return null.  I guess when I wrote this, I was thinking of the 
{{Socket}} API, where {{getChannel}} sometimes does return null.  It's probably 
best to remove this null check since the API documentation is pretty clear, and 
Java catches such conditions anyway.
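
A quick way to see the difference between the two APIs mentioned above 
(standalone demo, unrelated to the patch; pass the path of an existing file as 
the first argument):

{code}
import java.io.FileInputStream;
import java.net.Socket;

public class GetChannelDemo {
  public static void main(String[] args) throws Exception {
    try (FileInputStream fis = new FileInputStream(args[0])) {
      // FileInputStream#getChannel lazily creates a channel; it is never null.
      System.out.println("FileInputStream channel: " + fis.getChannel());
    }
    try (Socket s = new Socket()) {
      // Socket#getChannel is null unless the socket came from a SocketChannel.
      System.out.println("Socket channel: " + s.getChannel());
    }
  }
}
{code}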

bq. MappableBlock#verifyChecksum: This is now passing a hard-coded file name to 
DataChecksum#verifyChunkedSums. Should this be switched back to the block file 
name?

I was having some difficulty getting at the block file name.  It's not provided 
by {{getBlockInputStream}} or {{getMetaDataInputStream}}.  It turns out that 
it's available through the {{ReplicaInfo}}, though.  Will fix.

bq. {{TestFsDatasetCache#testUncachingBlocksBeforeCachingFinishes}}...

I guess I don't really have a great solution to this.  The problem is that we 
currently don't really know when the {{DNA_UNCACHE}} messages reach the DN.  
Setting the heartbeat responses is one thing, but these responses won't be sent 
until the DN sends its own heartbeat to the NN.  It's an async process.  We 
could perhaps hook into the heartbeat handling code in the DN, but a simpler 
solution might just be using a delay 2x or 3x longer than the configured 
heartbeat.  In practice that would be 3 seconds or so.
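
As a sketch of that simpler approach (test-only; a hypothetical helper that 
derives the wait from the configured heartbeat interval rather than 
hard-coding a sleep):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;

public class HeartbeatWaitSketch {
  static void waitForUncacheCommands(Configuration conf)
      throws InterruptedException {
    long heartbeatSeconds = conf.getLong(
        DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY,
        DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT);
    // DNA_UNCACHE commands ride on heartbeat responses, so waiting about three
    // heartbeats gives the DN a chance to receive and act on them.
    Thread.sleep(3 * heartbeatSeconds * 1000L);
  }
}
{code}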

> fix race conditions in DN caching and uncaching
> ---
>
> Key: HDFS-5394
> URL: https://issues.apache.org/jira/browse/HDFS-5394
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode,

[jira] [Commented] (HDFS-5438) Flaws in block report processing can cause data loss

2013-10-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807564#comment-13807564
 ] 

Hadoop QA commented on HDFS-5438:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12610713/HDFS-5438-1.trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.TestCorruptFilesJsp
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5300//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5300//console

This message is automatically generated.

> Flaws in block report processing can cause data loss
> 
>
> Key: HDFS-5438
> URL: https://issues.apache.org/jira/browse/HDFS-5438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.23.9, 2.2.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-5438-1.trunk.patch, HDFS-5438.trunk.patch
>
>
> The incremental block reports from data nodes and block commits are 
> asynchronous. This becomes troublesome when the gen stamp for a block is 
> changed during a write pipeline recovery.
> * If an incremental block report is delayed from a node but NN had enough 
> replicas already, a report with the old gen stamp may be received after block 
> completion. This replica will be correctly marked corrupt. But if the node 
> had participated in the pipeline recovery, a new (delayed) report with the 
> correct gen stamp will come soon. However, this report won't have any effect 
> on the corrupt state of the replica.
> * If block reports are received while the block is still under construction 
> (i.e. client's call to make block committed has not been received by NN), 
> they are blindly accepted regardless of the gen stamp. If a failed node 
> reports in with the old gen stamp while pipeline recovery is on-going, it 
> will be accepted and counted as valid during commit of the block.
> Due to the above two problems, correct replicas can be marked corrupt and 
> corrupt replicas can be accepted during commit.  So far we have observed two 
> cases in production.
> * The client hangs forever to close a file. All replicas are marked corrupt.
> * After the successful close of a file, read fails. Corrupt replicas are 
> accepted during commit and valid replicas are marked corrupt afterward.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5333) Improvement of current HDFS Web UI

2013-10-28 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807561#comment-13807561
 ] 

Haohui Mai commented on HDFS-5333:
--

The browser accesses the page, and then fetches the data from JMX directly. The 
HTTP requests look like the following:

{noformat}
http:///static/hadoop.css
http:///dfshealth.html
http:///jmx/foobar
{noformat}

The HTTP requests of accessing the old web UI look like the following:

{noformat}
http:///dfshealth.jsp
http:///static/hadoop.css
{noformat}

Therefore:

* You can access the new web UI if you can access the old one, regardless of 
the settings of the port-based firewall.
* If you access the old web UI through a proxy, the setup for the new web UI is 
similar.

Hope that answers your question.

> Improvement of current HDFS Web UI
> --
>
> Key: HDFS-5333
> URL: https://issues.apache.org/jira/browse/HDFS-5333
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Haohui Mai
>
> This is an umbrella jira for improving the current JSP-based HDFS Web UI. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5333) Improvement of current HDFS Web UI

2013-10-28 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807555#comment-13807555
 ] 

Larry McCay commented on HDFS-5333:
---

Okay, I may be off base then.
Are the REST APIs being invoked from the browser or not?
If they are, then they won't be able to get to the services.

> Improvement of current HDFS Web UI
> --
>
> Key: HDFS-5333
> URL: https://issues.apache.org/jira/browse/HDFS-5333
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Haohui Mai
>
> This is an umbrella jira for improving the current JSP-based HDFS Web UI. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5333) Improvement of current HDFS Web UI

2013-10-28 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807554#comment-13807554
 ] 

Haohui Mai commented on HDFS-5333:
--

The server serves both the old and the new web UI on exactly the same HTTP/HTTPS 
port. You're accessing the JSP pages and the new web UI through the same port, 
so I believe this is a non-issue.


> Improvement of current HDFS Web UI
> --
>
> Key: HDFS-5333
> URL: https://issues.apache.org/jira/browse/HDFS-5333
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Haohui Mai
>
> This is an umbrella jira for improving the current JSP-based HDFS Web UI. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5333) Improvement of current HDFS Web UI

2013-10-28 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807548#comment-13807548
 ] 

Larry McCay commented on HDFS-5333:
---

Well, I think it is important to consider that server-side code executes within 
the cluster (on the other side of the firewall) and therefore has direct access 
to the service endpoints. So, in that respect, the old web UI will work - 
assuming that the port is open to reach it from the outside.

In the new UI, the connections will be made from the client, which will need to 
go through the gateway to reach the services.
Unless I am missing something.

> Improvement of current HDFS Web UI
> --
>
> Key: HDFS-5333
> URL: https://issues.apache.org/jira/browse/HDFS-5333
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Haohui Mai
>
> This is an umbrella jira for improving the current JSP-based HDFS Web UI. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-4949) Centralized cache management in HDFS

2013-10-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807547#comment-13807547
 ] 

Hudson commented on HDFS-4949:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4664 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4664/])
Merge HDFS-4949 branch back into trunk (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536572)
* /hadoop/common/trunk
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/docs
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/BatchedRemoteIterator.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ByteBufferUtil.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/HasEnhancedByteBufferAccess.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ReadOption.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ZeroCopyUnavailableException.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/FsPermission.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/ByteBufferPool.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/ElasticByteBufferPool.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/Text.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/IdentityHashStore.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/IntrusiveCollection.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LightWeightCache.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LightWeightGSet.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/StringUtils.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/core
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/nativeio/TestNativeIO.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestIdentityHashStore.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestLightWeightGSet.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/client/ClientMmap.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/client/ClientMmapManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/AddPathBasedCacheDirectiveException.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/CachePoolInfo.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/DatanodeInfo.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LayoutVersion.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlock.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/PathBasedCacheDescriptor.java

[jira] [Updated] (HDFS-5320) Add datanode caching metrics

2013-10-28 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5320:
---

 Target Version/s: 3.0.0  (was: HDFS-4949)
Affects Version/s: 3.0.0  (was: HDFS-4949)

> Add datanode caching metrics
> 
>
> Key: HDFS-5320
> URL: https://issues.apache.org/jira/browse/HDFS-5320
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Minor
> Attachments: hdfs-5320-1.patch, hdfs-5320-2.patch
>
>
> It'd be good to hook up datanode metrics for the number of blocks and bytes 
> cached, uncached, and failed to cache, over different time windows 
> (eternity/1hr/10min/1min).
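
As a rough sketch of the kind of counters meant here (a hypothetical metrics2 
source; the rolling 1hr/10min/1min windows would need additional support and 
are omitted):

{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

@Metrics(name = "DataNodeCacheMetricsSketch", context = "dfs")
class DataNodeCacheMetricsSketch {
  // These fields are instantiated by the metrics system when the source is
  // registered (e.g. via DefaultMetricsSystem.instance().register(...)).
  @Metric("Blocks cached since startup")
  MutableCounterLong blocksCached;
  @Metric("Blocks uncached since startup")
  MutableCounterLong blocksUncached;
  @Metric("Block cache attempts that failed")
  MutableCounterLong blocksFailedToCache;

  void onBlockCached() { blocksCached.incr(); }
  void onBlockUncached() { blocksUncached.incr(); }
  void onCacheFailure() { blocksFailedToCache.incr(); }
}
{code}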



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5326) add modifyDirective to cacheAdmin

2013-10-28 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5326:
---

 Target Version/s: 3.0.0  (was: HDFS-4949)
Affects Version/s: 3.0.0

> add modifyDirective to cacheAdmin
> -
>
> Key: HDFS-5326
> URL: https://issues.apache.org/jira/browse/HDFS-5326
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>
> We should add a way of modifying cache directives on the command-line, 
> similar to how modifyCachePool works.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5253) Add requesting user's name to PathBasedCacheEntry

2013-10-28 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5253:
---

Affects Version/s: (was: HDFS-4949)

> Add requesting user's name to PathBasedCacheEntry
> -
>
> Key: HDFS-5253
> URL: https://issues.apache.org/jira/browse/HDFS-5253
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>
> It'll be useful to have the requesting user's name in {{PathBasedCacheEntry}} 
> for tracking per-user statistics (e.g. amount of data cached by a user).



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5394) fix race conditions in DN caching and uncaching

2013-10-28 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5394:
---

 Target Version/s: 3.0.0  (was: HDFS-4949)
Affects Version/s: 3.0.0  (was: HDFS-4949)

> fix race conditions in DN caching and uncaching
> ---
>
> Key: HDFS-5394
> URL: https://issues.apache.org/jira/browse/HDFS-5394
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5394-caching.001.patch, 
> HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, 
> HDFS-5394-caching.004.patch
>
>
> The DN needs to handle situations where it is asked to cache the same replica 
> more than once.  (Currently, it can actually do two mmaps and mlocks.)  It 
> also needs to handle the situation where caching a replica is cancelled 
> before said caching completes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid

2013-10-28 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5182:
---

 Target Version/s: 3.0.0  (was: HDFS-4949)
Affects Version/s: 3.0.0  (was: HDFS-4949)

> BlockReaderLocal must allow zero-copy  reads only when the DN believes it's 
> valid
> -
>
> Key: HDFS-5182
> URL: https://issues.apache.org/jira/browse/HDFS-5182
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>
> BlockReaderLocal must allow zero-copy reads only when the DN believes it's 
> valid.  This implies adding a new field to the response to 
> REQUEST_SHORT_CIRCUIT_FDS.  We also need some kind of heartbeat from the 
> client to the DN, so that the DN can inform the client when the mapped region 
> is no longer locked into memory.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5438) Flaws in block report processing can cause data loss

2013-10-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807502#comment-13807502
 ] 

Hadoop QA commented on HDFS-5438:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12610696/HDFS-5438.trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestClientProtocolForPipelineRecovery
  org.apache.hadoop.hdfs.server.namenode.TestCorruptFilesJsp

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5299//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5299//console

This message is automatically generated.

> Flaws in block report processing can cause data loss
> 
>
> Key: HDFS-5438
> URL: https://issues.apache.org/jira/browse/HDFS-5438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.23.9, 2.2.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-5438-1.trunk.patch, HDFS-5438.trunk.patch
>
>
> The incremental block reports from data nodes and block commits are 
> asynchronous. This becomes troublesome when the gen stamp for a block is 
> changed during a write pipeline recovery.
> * If an incremental block report is delayed from a node but NN had enough 
> replicas already, a report with the old gen stamp may be received after block 
> completion. This replica will be correctly marked corrupt. But if the node 
> had participated in the pipeline recovery, a new (delayed) report with the 
> correct gen stamp will come soon. However, this report won't have any effect 
> on the corrupt state of the replica.
> * If block reports are received while the block is still under construction 
> (i.e. client's call to make block committed has not been received by NN), 
> they are blindly accepted regardless of the gen stamp. If a failed node 
> reports in with the old gen stamp while pipeline recovery is on-going, it 
> will be accepted and counted as valid during commit of the block.
> Due to the above two problems, correct replicas can be marked corrupt and 
> corrupt replicas can be accepted during commit.  So far we have observed two 
> cases in production.
> * The client hangs forever to close a file. All replicas are marked corrupt.
> * After the successful close of a file, read fails. Corrupt replicas are 
> accepted during commit and valid replicas are marked corrupt afterward.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5333) Improvement of current HDFS Web UI

2013-10-28 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807499#comment-13807499
 ] 

Haohui Mai commented on HDFS-5333:
--

Hi [~lmccay],

Thanks for the input! This is complementary to the web UI problem itself. I 
believe the old web UI does not work in the scenario you mentioned.

The new web UI won't work for now either, as there are a few places where the 
code uses absolute URLs. However, this can easily be fixed in the new web UI.

> Improvement of current HDFS Web UI
> --
>
> Key: HDFS-5333
> URL: https://issues.apache.org/jira/browse/HDFS-5333
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Haohui Mai
>
> This is an umbrella jira for improving the current JSP-based HDFS Web UI. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web

2013-10-28 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807491#comment-13807491
 ] 

Haohui Mai commented on HDFS-5436:
--

The planned support of HTTPS on hftp and webhdfs requires even more shared 
code. Putting all three filesystems in the same package allows us to limit the 
visibility of the code that is only used by these filesystems.

This refactoring should improve the readability and modularity of the hftp / 
hsftp / webhdfs implementations.
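
A tiny illustration of the visibility point (hypothetical helper; the package 
name is only an assumption for the example): once all three filesystems share 
a package, shared helpers can drop from public to package-private visibility.

{code}
// Assumed location for the example: the shared web filesystem package.
package org.apache.hadoop.hdfs.web;

class UrlConnectionHelperSketch {      // package-private class
  // Package-private method: callable from HftpFileSystem, HsftpFileSystem and
  // WebHdfsFileSystem in the same package, but not part of the public API.
  static int defaultTimeoutMillis() {
    return 60000;
  }
}
{code}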

> Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
> 
>
> Key: HDFS-5436
> URL: https://issues.apache.org/jira/browse/HDFS-5436
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5436.000.patch, HDFS-5436.001.patch, 
> HDFS-5436.002.patch
>
>
> Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in 
> different packages. This forces several methods in ByteInputStream and 
> URLConnectionFactory to be public.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web

2013-10-28 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807483#comment-13807483
 ] 

Brandon Li commented on HDFS-5436:
--

{quote}This forces several methods in ByteInputStream and URLConnectionFactory 
to be public.{quote}
The patch moves the HTTP access related classes from org.apache.hdfs into 
org.apache.hdfs.web. Are there any other reasons justifying the move besides 
the one you listed above?

> Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
> 
>
> Key: HDFS-5436
> URL: https://issues.apache.org/jira/browse/HDFS-5436
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5436.000.patch, HDFS-5436.001.patch, 
> HDFS-5436.002.patch
>
>
> Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in 
> different packages. This forces several methods in ByteInputStream and 
> URLConnectionFactory to be public.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5333) Improvement of current HDFS Web UI

2013-10-28 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807481#comment-13807481
 ] 

Larry McCay commented on HDFS-5333:
---

Interesting work!

It seems to me that we may need to consider deployments where a gateway such as 
Knox is between the UI client and the Hadoop cluster.
How are the relevant URLs configured for the deployment - are they easily 
configured for a particular deployment scenario such as this?

> Improvement of current HDFS Web UI
> --
>
> Key: HDFS-5333
> URL: https://issues.apache.org/jira/browse/HDFS-5333
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Haohui Mai
>
> This is an umbrella jira for improving the current JSP-based HDFS Web UI. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-2832) Enable support for heterogeneous storages in HDFS

2013-10-28 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-2832:


Attachment: h2832_20131028.patch

> Enable support for heterogeneous storages in HDFS
> -
>
> Key: HDFS-2832
> URL: https://issues.apache.org/jira/browse/HDFS-2832
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 0.24.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Attachments: 20130813-HeterogeneousStorage.pdf, 
> h2832_20131023b.patch, h2832_20131023.patch, h2832_20131025.patch, 
> h2832_20131028.patch
>
>
> HDFS currently supports a configuration where storages are a list of 
> directories. Typically, each of these directories corresponds to a volume with 
> its own file system. All these directories are homogeneous and are therefore 
> identified as a single storage at the namenode. I propose changing the current 
> model, where a Datanode *is a* storage, to one where a Datanode *is a 
> collection of* storages.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5437) TestBlockReport and TestBPOfferService fail due to test issues

2013-10-28 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5437:


Attachment: h5437.05.patch

Including a trivial fix in {{SimulatedFSDataset#getStorageReports}}.

> TestBlockReport and TestBPOfferService fail due to test issues
> --
>
> Key: HDFS-5437
> URL: https://issues.apache.org/jira/browse/HDFS-5437
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: h5437.03.patch, h5437.04.patch, h5437.05.patch
>
>
> There are a few more test issues in {{TestBlockReport}} caused by the earlier 
> changes.
> {{testBlockReport_07}} fails and it looks like a test issue.
> {code}
> Running org.apache.hadoop.hdfs.server.datanode.TestBlockReport
> Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.824 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestBlockReport
> blockReport_07(org.apache.hadoop.hdfs.server.datanode.TestBlockReport)  Time 
> elapsed: 19.314 sec  <<< FAILURE!
> java.lang.AssertionError: Wrong number of Corrupted blocks expected:<1> but 
> was:<0>
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.failNotEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:128)
> at org.junit.Assert.assertEquals(Assert.java:472)
> at 
> org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_07(TestBlockReport.java:461)
> {code}
> {{TestBPOfferService}} fails due to missing implementation of 
> {{SimulatedFSDataset#getStorageReports}}.
> {code}
> 2013-10-28 16:24:33,775 ERROR datanode.DataNode 
> (BPServiceActor.java:run(719)) - Exception in BPOfferService for Block pool 
> fake bpid (Datanode Uuid null) service to 0.0.0.0/0.0.0.0:0
> java.lang.UnsupportedOperationException  at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getStorageReports(SimulatedFSDataset.java:1005)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:478)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:566)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:717)
>   at java.lang.Thread.run(Thread.java:695)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5437) TestBlockReport and TestBPOfferService fail due to test issues

2013-10-28 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5437:


Description: 
There are a few more test issues in {{TestBlockReport}} caused by the earlier 
changes.

{{testBlockReport_07}} fails and it looks like a test issue.
{code}
Running org.apache.hadoop.hdfs.server.datanode.TestBlockReport
Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.824 sec <<< 
FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestBlockReport
blockReport_07(org.apache.hadoop.hdfs.server.datanode.TestBlockReport)  Time 
elapsed: 19.314 sec  <<< FAILURE!
java.lang.AssertionError: Wrong number of Corrupted blocks expected:<1> but 
was:<0>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at 
org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_07(TestBlockReport.java:461)
{code}

{{TestBPOfferService}} fails due to missing implementation of 
{{SimulatedFSDataset#getStorageReports}}.

{code}
2013-10-28 16:24:33,775 ERROR datanode.DataNode (BPServiceActor.java:run(719)) 
- Exception in BPOfferService for Block pool fake bpid (Datanode Uuid null) 
service to 0.0.0.0/0.0.0.0:0
java.lang.UnsupportedOperationException  at 
org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getStorageReports(SimulatedFSDataset.java:1005)
  at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:478)
  at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:566)
  at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:717)
  at java.lang.Thread.run(Thread.java:695)
{code}


  was:
There are a few more test issues in {{TestBlockReport}} caused by the earlier 
changes.

{{testBlockReport_07}} fails and it looks like a test issue.
{code}
Running org.apache.hadoop.hdfs.server.datanode.TestBlockReport
Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.824 sec <<< 
FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestBlockReport
blockReport_07(org.apache.hadoop.hdfs.server.datanode.TestBlockReport)  Time 
elapsed: 19.314 sec  <<< FAILURE!
java.lang.AssertionError: Wrong number of Corrupted blocks expected:<1> but 
was:<0>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at 
org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_07(TestBlockReport.java:461)
{code}


> TestBlockReport and TestBPOfferService fail due to test issues
> --
>
> Key: HDFS-5437
> URL: https://issues.apache.org/jira/browse/HDFS-5437
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: h5437.03.patch, h5437.04.patch
>
>
> There are a few more test issues in {{TestBlockReport}} caused by the earlier 
> changes.
> {{testBlockReport_07}} fails and it looks like a test issue.
> {code}
> Running org.apache.hadoop.hdfs.server.datanode.TestBlockReport
> Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.824 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestBlockReport
> blockReport_07(org.apache.hadoop.hdfs.server.datanode.TestBlockReport)  Time 
> elapsed: 19.314 sec  <<< FAILURE!
> java.lang.AssertionError: Wrong number of Corrupted blocks expected:<1> but 
> was:<0>
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.failNotEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:128)
> at org.junit.Assert.assertEquals(Assert.java:472)
> at 
> org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_07(TestBlockReport.java:461)
> {code}
> {{TestBPOfferService}} fails due to missing implementation of 
> {{SimulatedFSDataset#getStorageReports}}.
> {code}
> 2013-10-28 16:24:33,775 ERROR datanode.DataNode 
> (BPServiceActor.java:run(719)) - Exception in BPOfferService for Block pool 
> fake bpid (Datanode Uuid null) service to 0.0.0.0/0.0.0.0:0
> java.lang.UnsupportedOperationException  at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getStorageReports(SimulatedFSDataset.java:1005)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:478)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:566)
>   at 
> org.apache.hadoop.hdfs.server.datano

[jira] [Updated] (HDFS-5437) TestBlockReport and TestBPOfferService fail due to test issues

2013-10-28 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5437:


Summary: TestBlockReport and TestBPOfferService fail due to test issues  
(was: TestBlockReport fails due to test issues)

> TestBlockReport and TestBPOfferService fail due to test issues
> --
>
> Key: HDFS-5437
> URL: https://issues.apache.org/jira/browse/HDFS-5437
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: h5437.03.patch, h5437.04.patch
>
>
> There are a few more test issues in {{TestBlockReport}} caused by the earlier 
> changes.
> {{testBlockReport_07}} fails and it looks like a test issue.
> {code}
> Running org.apache.hadoop.hdfs.server.datanode.TestBlockReport
> Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.824 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestBlockReport
> blockReport_07(org.apache.hadoop.hdfs.server.datanode.TestBlockReport)  Time 
> elapsed: 19.314 sec  <<< FAILURE!
> java.lang.AssertionError: Wrong number of Corrupted blocks expected:<1> but 
> was:<0>
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.failNotEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:128)
> at org.junit.Assert.assertEquals(Assert.java:472)
> at 
> org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_07(TestBlockReport.java:461)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss

2013-10-28 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5438:
-

Attachment: HDFS-5438-1.trunk.patch

The new patch adds a gen stamp check for the case where the stored block state 
is UNDER_CONSTRUCTION and the reported replica state is FINALIZED.
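
For readers following along, here is a minimal, self-contained sketch of the 
kind of check described above: when the stored block is still UNDER_CONSTRUCTION 
and a replica reports in as FINALIZED, the reported gen stamp has to match the 
block's current gen stamp or the replica is treated as corrupt. The class and 
method names below are illustrative stand-ins, not the actual NameNode code.

{code}
// Illustrative sketch only -- not the real BlockManager code path.
public class GenStampCheckSketch {

  enum ReplicaState { FINALIZED, RBW, RWR }
  enum BlockUCState { COMPLETE, UNDER_CONSTRUCTION }

  /** Returns true if the reported replica should be marked corrupt. */
  static boolean isCorrupt(BlockUCState storedState, long storedGenStamp,
                           ReplicaState reportedState, long reportedGenStamp) {
    if (storedState == BlockUCState.UNDER_CONSTRUCTION
        && reportedState == ReplicaState.FINALIZED) {
      // A FINALIZED replica carrying a stale gen stamp came from a node that
      // dropped out before the pipeline recovery bumped the stamp.
      return reportedGenStamp != storedGenStamp;
    }
    return false; // other state combinations are handled elsewhere
  }

  public static void main(String[] args) {
    System.out.println(isCorrupt(BlockUCState.UNDER_CONSTRUCTION, 1002L,
        ReplicaState.FINALIZED, 1001L)); // true: stale gen stamp
    System.out.println(isCorrupt(BlockUCState.UNDER_CONSTRUCTION, 1002L,
        ReplicaState.FINALIZED, 1002L)); // false: stamps match
  }
}
{code}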

> Flaws in block report processing can cause data loss
> 
>
> Key: HDFS-5438
> URL: https://issues.apache.org/jira/browse/HDFS-5438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.23.9, 2.2.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-5438-1.trunk.patch, HDFS-5438.trunk.patch
>
>
> The incremental block reports from data nodes and block commits are 
> asynchronous. This becomes troublesome when the gen stamp for a block is 
> changed during a write pipeline recovery.
> * If an incremental block report is delayed from a node but NN had enough 
> replicas already, a report with the old gen stamp may be received after block 
> completion. This replica will be correctly marked corrupt. But if the node 
> had participated in the pipeline recovery, a new (delayed) report with the 
> correct gen stamp will come soon. However, this report won't have any effect 
> on the corrupt state of the replica.
> * If block reports are received while the block is still under construction 
> (i.e. client's call to make block committed has not been received by NN), 
> they are blindly accepted regardless of the gen stamp. If a failed node 
> reports in with the old gen stamp while pipeline recovery is on-going, it 
> will be accepted and counted as valid during commit of the block.
> Due to the above two problems, correct replicas can be marked corrupt and 
> corrupt replicas can be accepted during commit.  So far we have observed two 
> cases in production.
> * The client hangs forever to close a file. All replicas are marked corrupt.
> * After the successful close of a file, read fails. Corrupt replicas are 
> accepted during commit and valid replicas are marked corrupt afterward.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web

2013-10-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807446#comment-13807446
 ] 

Hadoop QA commented on HDFS-5436:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12610662/HDFS-5436.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 10 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs hadoop-tools/hadoop-extras.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5295//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5295//console

This message is automatically generated.

> Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
> 
>
> Key: HDFS-5436
> URL: https://issues.apache.org/jira/browse/HDFS-5436
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5436.000.patch, HDFS-5436.001.patch, 
> HDFS-5436.002.patch
>
>
> Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in 
> different packages. This forces several methods in ByteInputStream and 
> URLConnectionFactory to be public.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5437) TestBlockReport fails due to test issues

2013-10-28 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5437:


Attachment: h5437.04.patch

> TestBlockReport fails due to test issues
> 
>
> Key: HDFS-5437
> URL: https://issues.apache.org/jira/browse/HDFS-5437
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: h5437.03.patch, h5437.04.patch
>
>
> There are a few more test issues in {{TestBlockReport}} caused by the earlier 
> changes.
> {{testBlockReport_07}} fails and it looks like a test issue.
> {code}
> Running org.apache.hadoop.hdfs.server.datanode.TestBlockReport
> Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.824 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestBlockReport
> blockReport_07(org.apache.hadoop.hdfs.server.datanode.TestBlockReport)  Time 
> elapsed: 19.314 sec  <<< FAILURE!
> java.lang.AssertionError: Wrong number of Corrupted blocks expected:<1> but 
> was:<0>
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.failNotEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:128)
> at org.junit.Assert.assertEquals(Assert.java:472)
> at 
> org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_07(TestBlockReport.java:461)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5252) Stable write is handled correctly in someplace

2013-10-28 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5252:
-

Description: 
When the client asks for a stable write but the prerequisite writes are not 
transferred to NFS gateway, the stableness can't be honored. NFS gateway has to 
treat the write as unstable write and set the flag to UNSTABLE in the write 
response.

One bug was found during testing with the Ubuntu client when copying a 1KB file. 
For small files like a 1KB file, the Ubuntu client does one stable write (with 
the FILE_SYNC flag). However, the NFS gateway missed one place 
(OpenFileCtx#doSingleWrite) where it sends the response with the flag NOT updated 
to UNSTABLE.

With this bug, the client thinks the write is on disk and thus doesn't send 
COMMIT anymore. The following test tries to read the data back and of course 
fails to do so since the data was not synced. 

  was:
When the client asks for a stable write but the prerequisite writes are not 
transferred to NFS gateway, the stableness can't be honored. NFS gateway has to 
treat the write as unstable write and set the flag to UNSTABLE in the write 
response.

One bug was found during test with Ubuntu client when copying one 1KB file. For 
small files like 1KB file, Ubuntu client does one stable write (with FILE_SYNC 
flag). However, NFS gateway missed one place where it sends response with the 
flag NOT updated to UNSTABLE.

With this bug, the client thinks the write is on disk and thus doesn't send 
COMMIT anymore. The following test tries to read the data back and of course 
fails to do so since the data was not synced. 
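
One way to read the description above: a WRITE reply may only carry a stable 
flag once the data has actually been persisted; otherwise the gateway must 
answer UNSTABLE so the client still sends a COMMIT later. A minimal, 
self-contained sketch of that rule follows; the enum and method names are 
illustrative stand-ins, not the actual OpenFileCtx code.

{code}
// Illustrative sketch only -- not the real OpenFileCtx#doSingleWrite logic.
public class StableWriteResponseSketch {

  enum StableHow { UNSTABLE, DATA_SYNC, FILE_SYNC }

  /**
   * Pick the stability flag for the WRITE reply. A stable flag may only be
   * echoed back once the data has really been persisted; otherwise the reply
   * must say UNSTABLE so the client knows a COMMIT is still required.
   */
  static StableHow responseFlag(StableHow requested, boolean dataPersisted) {
    if (requested != StableHow.UNSTABLE && !dataPersisted) {
      return StableHow.UNSTABLE; // downgrade: data is only queued at the gateway
    }
    return requested;
  }

  public static void main(String[] args) {
    // The 1KB FILE_SYNC write from the description: still queued, so the
    // correct reply is UNSTABLE, which keeps the client's COMMIT coming.
    System.out.println(responseFlag(StableHow.FILE_SYNC, false)); // UNSTABLE
    System.out.println(responseFlag(StableHow.FILE_SYNC, true));  // FILE_SYNC
  }
}
{code}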


> Stable write is handled correctly in someplace
> --
>
> Key: HDFS-5252
> URL: https://issues.apache.org/jira/browse/HDFS-5252
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: nfs
>Reporter: Brandon Li
>
> When the client asks for a stable write but the prerequisite writes are not 
> transferred to NFS gateway, the stableness can't be honored. NFS gateway has 
> to treat the write as unstable write and set the flag to UNSTABLE in the 
> write response.
> One bug was found during test with Ubuntu client when copying one 1KB file. 
> For small files like 1KB file, Ubuntu client does one stable write (with 
> FILE_SYNC flag). However, NFS gateway missed one place 
> where (OpenFileCtx#doSingleWrite) it sends response with the flag NOT updated 
> to UNSTABLE.
> With this bug, the client thinks the write is on disk and thus doesn't send 
> COMMIT anymore. The following test tries to read the data back and of course 
> fails to do so since the data was not synced. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5252) Stable write is handled correctly in someplace

2013-10-28 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5252:
-

Description: 
When the client asks for a stable write but the prerequisite writes are not 
transferred to NFS gateway, the stableness can't be honored. NFS gateway has to 
treat the write as unstable write and set the flag to UNSTABLE in the write 
response.

One bug was found during test with Ubuntu client when copying one 1KB file. For 
small files like 1KB file, Ubuntu client does one stable write (with FILE_SYNC 
flag). However, NFS gateway missed one place where it sends response with the 
flag NOT updated to UNSTABLE.

With this bug, the client thinks the write is on disk and thus doesn't send 
COMMIT anymore. The following test tries to read the data back and of course 
fails to do so since the data was not synced. 

  was:When the client asks for a stable write but the prerequisite writes are 
not transferred to NFS gateway, the stableness can't be honored. NFS gateway 
has to treat the write as unstable write.


> Stable write is handled correctly in someplace
> --
>
> Key: HDFS-5252
> URL: https://issues.apache.org/jira/browse/HDFS-5252
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: nfs
>Reporter: Brandon Li
>
> When the client asks for a stable write but the prerequisite writes are not 
> transferred to NFS gateway, the stableness can't be honored. NFS gateway has 
> to treat the write as unstable write and set the flag to UNSTABLE in the 
> write response.
> One bug was found during test with Ubuntu client when copying one 1KB file. 
> For small files like 1KB file, Ubuntu client does one stable write (with 
> FILE_SYNC flag). However, NFS gateway missed one place where it sends 
> response with the flag NOT updated to UNSTABLE.
> With this bug, the client thinks the write is on disk and thus doesn't send 
> COMMIT anymore. The following test tries to read the data back and of course 
> fails to do so since the data was not synced. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5252) Stable write is handled correctly in someplace

2013-10-28 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5252:
-

Summary: Stable write is handled correctly in someplace  (was: Stable write 
is handled correctly)

> Stable write is handled correctly in someplace
> --
>
> Key: HDFS-5252
> URL: https://issues.apache.org/jira/browse/HDFS-5252
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: nfs
>Reporter: Brandon Li
>
> When the client asks for a stable write but the prerequisite writes are not 
> transferred to NFS gateway, the stableness can't be honored. NFS gateway has 
> to treat the write as unstable write.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5252) Stable write is handled correctly

2013-10-28 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5252:
-

Summary: Stable write is handled correctly  (was: Do unstable write only 
when stable write can't be honored)

> Stable write is handled correctly
> -
>
> Key: HDFS-5252
> URL: https://issues.apache.org/jira/browse/HDFS-5252
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: nfs
>Reporter: Brandon Li
>
> When the client asks for a stable write but the prerequisite writes are not 
> transferred to NFS gateway, the stableness can't be honored. NFS gateway has 
> to treat the write as unstable write.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5440) Extract the logic of handling delegation tokens in HftpFileSystem to the TokenAspect class

2013-10-28 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-5440:


 Summary: Extract the logic of handling delegation tokens in 
HftpFileSystem to the TokenAspect class
 Key: HDFS-5440
 URL: https://issues.apache.org/jira/browse/HDFS-5440
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai


The logic for handling delegation tokens in HftpFileSystem and WebHdfsFileSystem 
is mostly identical. To simplify the code, this jira proposes to extract the 
common code into a new class named TokenAspect.
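
As a rough illustration of the extraction idea only (the names below are 
hypothetical and do not describe the eventual TokenAspect API), the shared token 
bookkeeping could live in one small helper that both file systems hold:

{code}
// Hypothetical sketch of factoring out shared delegation-token handling.
import java.io.IOException;

interface TokenSource {
  String fetchDelegationToken() throws IOException;  // e.g. an HTTP call to the NN
  void cancelDelegationToken(String token) throws IOException;
}

class SharedTokenHandlingSketch {
  private final TokenSource source;
  private String cachedToken;   // opaque token string for this sketch

  SharedTokenHandlingSketch(TokenSource source) {
    this.source = source;
  }

  synchronized String getToken() throws IOException {
    if (cachedToken == null) {
      cachedToken = source.fetchDelegationToken();  // fetch lazily, then cache
    }
    return cachedToken;
  }

  synchronized void close() throws IOException {
    if (cachedToken != null) {
      source.cancelDelegationToken(cachedToken);    // best-effort cleanup
      cachedToken = null;
    }
  }
}
{code}

Both HftpFileSystem and WebHdfsFileSystem could then delegate to such a helper 
instead of duplicating the logic.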



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5439) Fix TestPendingReplications

2013-10-28 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5439:


Assignee: (was: Arpit Agarwal)

> Fix TestPendingReplications
> ---
>
> Key: HDFS-5439
> URL: https://issues.apache.org/jira/browse/HDFS-5439
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Arpit Agarwal
>
> {{TestPendingReplication}} fails with the following exception:
> {code}
> java.lang.AssertionError: expected:<4> but was:<3>
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.failNotEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:128)
> at org.junit.Assert.assertEquals(Assert.java:472)
> at org.junit.Assert.assertEquals(Assert.java:456)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication.testBlockReceived(TestPendingReplication.java:186)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5439) Fix TestPendingReplications

2013-10-28 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807418#comment-13807418
 ] 

Arpit Agarwal commented on HDFS-5439:
-

The same issue appears to cause a failure in {{TestBlockReport#blockReport_07}}.

> Fix TestPendingReplications
> ---
>
> Key: HDFS-5439
> URL: https://issues.apache.org/jira/browse/HDFS-5439
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>
> {{TestPendingReplication}} fails with the following exception:
> {code}
> java.lang.AssertionError: expected:<4> but was:<3>
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.failNotEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:128)
> at org.junit.Assert.assertEquals(Assert.java:472)
> at org.junit.Assert.assertEquals(Assert.java:456)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication.testBlockReceived(TestPendingReplication.java:186)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5437) TestBlockReport fails due to test issues

2013-10-28 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5437:


Attachment: h5437.03.patch

Some refactoring of the test case. Also added two {{@VisibleForTesting}} 
methods to {{BlockListAsLongs}}.
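
For readers unfamiliar with the pattern: {{@VisibleForTesting}} from Guava only 
documents that a member's visibility was widened for tests. A tiny, hypothetical 
illustration follows; the class and method below are not the real 
BlockListAsLongs.

{code}
import com.google.common.annotations.VisibleForTesting;

public class BlockListSketch {          // hypothetical, not BlockListAsLongs
  private long[] blockIds = new long[0];

  @VisibleForTesting
  long[] getBlockIdArray() {            // package-private so tests in the same
    return blockIds;                    // package can inspect internal state
  }
}
{code}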

{{TestBlockReport#blockReport_07}} will fail on a different assertion now due 
to HDFS-5439.

> TestBlockReport fails due to test issues
> 
>
> Key: HDFS-5437
> URL: https://issues.apache.org/jira/browse/HDFS-5437
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: h5437.03.patch
>
>
> There are a few more test issues in {{TestBlockReport}} caused by the earlier 
> changes.
> {{testBlockReport_07}} fails and it looks like a test issue.
> {code}
> Running org.apache.hadoop.hdfs.server.datanode.TestBlockReport
> Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.824 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestBlockReport
> blockReport_07(org.apache.hadoop.hdfs.server.datanode.TestBlockReport)  Time 
> elapsed: 19.314 sec  <<< FAILURE!
> java.lang.AssertionError: Wrong number of Corrupted blocks expected:<1> but 
> was:<0>
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.failNotEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:128)
> at org.junit.Assert.assertEquals(Assert.java:472)
> at 
> org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_07(TestBlockReport.java:461)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss

2013-10-28 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5438:
-

Attachment: HDFS-5438.trunk.patch

> Flaws in block report processing can cause data loss
> 
>
> Key: HDFS-5438
> URL: https://issues.apache.org/jira/browse/HDFS-5438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.23.9, 2.2.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-5438.trunk.patch
>
>
> The incremental block reports from data nodes and block commits are 
> asynchronous. This becomes troublesome when the gen stamp for a block is 
> changed during a write pipeline recovery.
> * If an incremental block report is delayed from a node but NN had enough 
> replicas already, a report with the old gen stamp may be received after block 
> completion. This replica will be correctly marked corrupt. But if the node 
> had participated in the pipeline recovery, a new (delayed) report with the 
> correct gen stamp will come soon. However, this report won't have any effect 
> on the corrupt state of the replica.
> * If block reports are received while the block is still under construction 
> (i.e. client's call to make block committed has not been received by NN), 
> they are blindly accepted regardless of the gen stamp. If a failed node 
> reports in with the old gen stamp while pipeline recovery is on-going, it 
> will be accepted and counted as valid during commit of the block.
> Due to the above two problems, correct replicas can be marked corrupt and 
> corrupt replicas can be accepted during commit.  So far we have observed two 
> cases in production.
> * The client hangs forever to close a file. All replicas are marked corrupt.
> * After the successful close of a file, read fails. Corrupt replicas are 
> accepted during commit and valid replicas are marked corrupt afterward.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss

2013-10-28 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5438:
-

Status: Patch Available  (was: Open)

> Flaws in block report processing can cause data loss
> 
>
> Key: HDFS-5438
> URL: https://issues.apache.org/jira/browse/HDFS-5438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0, 0.23.9
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-5438.trunk.patch
>
>
> The incremental block reports from data nodes and block commits are 
> asynchronous. This becomes troublesome when the gen stamp for a block is 
> changed during a write pipeline recovery.
> * If an incremental block report is delayed from a node but NN had enough 
> replicas already, a report with the old gen stamp may be received after block 
> completion. This replica will be correctly marked corrupt. But if the node 
> had participated in the pipeline recovery, a new (delayed) report with the 
> correct gen stamp will come soon. However, this report won't have any effect 
> on the corrupt state of the replica.
> * If block reports are received while the block is still under construction 
> (i.e. client's call to make block committed has not been received by NN), 
> they are blindly accepted regardless of the gen stamp. If a failed node 
> reports in with the old gen stamp while pipeline recovery is on-going, it 
> will be accepted and counted as valid during commit of the block.
> Due to the above two problems, correct replicas can be marked corrupt and 
> corrupt replicas can be accepted during commit.  So far we have observed two 
> cases in production.
> * The client hangs forever to close a file. All replicas are marked corrupt.
> * After the successful close of a file, read fails. Corrupt replicas are 
> accepted during commit and valid replicas are marked corrupt afterward.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss

2013-10-28 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5438:
-

Attachment: (was: HDFS-5438.trunk.patch)

> Flaws in block report processing can cause data loss
> 
>
> Key: HDFS-5438
> URL: https://issues.apache.org/jira/browse/HDFS-5438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.23.9, 2.2.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
>
> The incremental block reports from data nodes and block commits are 
> asynchronous. This becomes troublesome when the gen stamp for a block is 
> changed during a write pipeline recovery.
> * If an incremental block report is delayed from a node but NN had enough 
> replicas already, a report with the old gen stamp may be received after block 
> completion. This replica will be correctly marked corrupt. But if the node 
> had participated in the pipeline recovery, a new (delayed) report with the 
> correct gen stamp will come soon. However, this report won't have any effect 
> on the corrupt state of the replica.
> * If block reports are received while the block is still under construction 
> (i.e. client's call to make block committed has not been received by NN), 
> they are blindly accepted regardless of the gen stamp. If a failed node 
> reports in with the old gen stamp while pipeline recovery is on-going, it 
> will be accepted and counted as valid during commit of the block.
> Due to the above two problems, correct replicas can be marked corrupt and 
> corrupt replicas can be accepted during commit.  So far we have observed two 
> cases in production.
> * The client hangs forever to close a file. All replicas are marked corrupt.
> * After the successful close of a file, read fails. Corrupt replicas are 
> accepted during commit and valid replicas are marked corrupt afterward.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss

2013-10-28 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5438:
-

Status: Open  (was: Patch Available)

> Flaws in block report processing can cause data loss
> 
>
> Key: HDFS-5438
> URL: https://issues.apache.org/jira/browse/HDFS-5438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0, 0.23.9
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
>
> The incremental block reports from data nodes and block commits are 
> asynchronous. This becomes troublesome when the gen stamp for a block is 
> changed during a write pipeline recovery.
> * If an incremental block report is delayed from a node but NN had enough 
> replicas already, a report with the old gen stamp may be received after block 
> completion. This replica will be correctly marked corrupt. But if the node 
> had participated in the pipeline recovery, a new (delayed) report with the 
> correct gen stamp will come soon. However, this report won't have any effect 
> on the corrupt state of the replica.
> * If block reports are received while the block is still under construction 
> (i.e. client's call to make block committed has not been received by NN), 
> they are blindly accepted regardless of the gen stamp. If a failed node 
> reports in with the old gen stamp while pipeline recovery is on-going, it 
> will be accepted and counted as valid during commit of the block.
> Due to the above two problems, correct replicas can be marked corrupt and 
> corrupt replicas can be accepted during commit.  So far we have observed two 
> cases in production.
> * The client hangs forever to close a file. All replicas are marked corrupt.
> * After the successful close of a file, read fails. Corrupt replicas are 
> accepted during commit and valid replicas are marked corrupt afterward.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5439) Fix TestPendingReplications

2013-10-28 Thread Arpit Agarwal (JIRA)
Arpit Agarwal created HDFS-5439:
---

 Summary: Fix TestPendingReplications
 Key: HDFS-5439
 URL: https://issues.apache.org/jira/browse/HDFS-5439
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: Heterogeneous Storage (HDFS-2832)
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal


{{TestPendingReplication}} fails with the following exception:

{code}
java.lang.AssertionError: expected:<4> but was:<3>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at 
org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication.testBlockReceived(TestPendingReplication.java:186)
{code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5438) Flaws in block report processing can cause data loss

2013-10-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807243#comment-13807243
 ] 

Hadoop QA commented on HDFS-5438:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12610678/HDFS-5438.trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5297//console

This message is automatically generated.

> Flaws in block report processing can cause data loss
> 
>
> Key: HDFS-5438
> URL: https://issues.apache.org/jira/browse/HDFS-5438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.23.9, 2.2.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-5438.trunk.patch
>
>
> The incremental block reports from data nodes and block commits are 
> asynchronous. This becomes troublesome when the gen stamp for a block is 
> changed during a write pipeline recovery.
> * If an incremental block report is delayed from a node but NN had enough 
> replicas already, a report with the old gen stamp may be received after block 
> completion. This replica will be correctly marked corrupt. But if the node 
> had participated in the pipeline recovery, a new (delayed) report with the 
> correct gen stamp will come soon. However, this report won't have any effect 
> on the corrupt state of the replica.
> * If block reports are received while the block is still under construction 
> (i.e. client's call to make block committed has not been received by NN), 
> they are blindly accepted regardless of the gen stamp. If a failed node 
> reports in with the old gen stamp while pipeline recovery is on-going, it 
> will be accepted and counted as valid during commit of the block.
> Due to the above two problems, correct replicas can be marked corrupt and 
> corrupt replicas can be accepted during commit.  So far we have observed two 
> cases in production.
> * The client hangs forever to close a file. All replicas are marked corrupt.
> * After the successful close of a file, read fails. Corrupt replicas are 
> accepted during commit and valid replicas are marked corrupt afterward.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss

2013-10-28 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5438:
-

Description: 
The incremental block reports from data nodes and block commits are 
asynchronous. This becomes troublesome when the gen stamp for a block is 
changed during a write pipeline recovery.

* If an incremental block report is delayed from a node but NN had enough 
replicas already, a report with the old gen stamp may be received after block 
completion. This replica will be correctly marked corrupt. But if the node had 
participated in the pipeline recovery, a new (delayed) report with the correct 
gen stamp will come soon. However, this report won't have any effect on the 
corrupt state of the replica.

* If block reports are received while the block is still under construction 
(i.e. client's call to make block committed has not been received by NN), they 
are blindly accepted regardless of the gen stamp. If a failed node reports in 
with the old gen stamp while pipeline recovery is on-going, it will be accepted 
and counted as valid during commit of the block.

Due to the above two problems, correct replicas can be marked corrupt and 
corrupt replicas can be accepted during commit.  So far we have observed two 
cases in production.

* The client hangs forever to close a file. All replicas are marked corrupt.
* After the successful close of a file, read fails. Corrupt replicas are 
accepted during commit and valid replicas are marked corrupt afterward.


  was:
The incremental block reports from data nodes and block commits are 
asynchronous. This becomes troublesome when the gen stamp for a block is 
changed during a write pipeline recovery.

* If an incremental block report is delayed from a node but NN had enough 
replicas already, a report with the old gen stamp may be received after block 
completion. This replica will be correctly marked corrupt. But if the node had 
participated in the pipeline recovery, a new (delayed) report with the correct 
gen stamp will come soon. However, this report won't have any effect on the 
corrupt state of the replica.

* If block reports are received while the block is still under construction 
(i.e. client's call to make block committed has not been received by NN), they 
are blindly accepted regardless of the gen stamp. If a failed node reports in 
with the old gen stamp while pipeline recovery is on-going, it will be accepted 
and counted as valid during commit of the block.

Due to the above two problems, correct replicas can be marked corrupt and 
corrupt replicas can be accepted during commit.  So far we have observed two 
cases in production.

* The client hangs forever to close a file. All replicas are marked corrupt.
* After the successful close of a file, read fails. Corrupt replicas are 
accepted them during commit and valid replicas are marked corrupt afterward.



> Flaws in block report processing can cause data loss
> 
>
> Key: HDFS-5438
> URL: https://issues.apache.org/jira/browse/HDFS-5438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.23.9, 2.2.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-5438.trunk.patch
>
>
> The incremental block reports from data nodes and block commits are 
> asynchronous. This becomes troublesome when the gen stamp for a block is 
> changed during a write pipeline recovery.
> * If an incremental block report is delayed from a node but NN had enough 
> replicas already, a report with the old gen stamp may be received after block 
> completion. This replica will be correctly marked corrupt. But if the node 
> had participated in the pipeline recovery, a new (delayed) report with the 
> correct gen stamp will come soon. However, this report won't have any effect 
> on the corrupt state of the replica.
> * If block reports are received while the block is still under construction 
> (i.e. client's call to make block committed has not been received by NN), 
> they are blindly accepted regardless of the gen stamp. If a failed node 
> reports in with the old gen stamp while pipeline recovery is on-going, it 
> will be accepted and counted as valid during commit of the block.
> Due to the above two problems, correct replicas can be marked corrupt and 
> corrupt replicas can be accepted during commit.  So far we have observed two 
> cases in production.
> * The client hangs forever to close a file. All replicas are marked corrupt.
> * After the successful close of a file, read fails. Corrupt replicas are 
> accepted during commit and valid replicas are marked corrupt afterward.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5438) Flaws in block report processing can cause data loss

2013-10-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807241#comment-13807241
 ] 

Hadoop QA commented on HDFS-5438:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12610678/HDFS-5438.trunk.patch
  against trunk revision .

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5298//console

This message is automatically generated.

> Flaws in block report processing can cause data loss
> 
>
> Key: HDFS-5438
> URL: https://issues.apache.org/jira/browse/HDFS-5438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.23.9, 2.2.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-5438.trunk.patch
>
>
> The incremental block reports from data nodes and block commits are 
> asynchronous. This becomes troublesome when the gen stamp for a block is 
> changed during a write pipeline recovery.
> * If an incremental block report is delayed from a node but NN had enough 
> replicas already, a report with the old gen stamp may be received after block 
> completion. This replica will be correctly marked corrupt. But if the node 
> had participated in the pipeline recovery, a new (delayed) report with the 
> correct gen stamp will come soon. However, this report won't have any effect 
> on the corrupt state of the replica.
> * If block reports are received while the block is still under construction 
> (i.e. client's call to make block committed has not been received by NN), 
> they are blindly accepted regardless of the gen stamp. If a failed node 
> reports in with the old gen stamp while pipeline recovery is on-going, it 
> will be accepted and counted as valid during commit of the block.
> Due to the above two problems, correct replicas can be marked corrupt and 
> corrupt replicas can be accepted during commit.  So far we have observed two 
> cases in production.
> * The client hangs forever to close a file. All replicas are marked corrupt.
> * After the successful close of a file, read fails. Corrupt replicas are 
> accepted during commit and valid replicas are marked corrupt afterward.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss

2013-10-28 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5438:
-

Attachment: (was: HDFS-5438.trunk.patch)

> Flaws in block report processing can cause data loss
> 
>
> Key: HDFS-5438
> URL: https://issues.apache.org/jira/browse/HDFS-5438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.23.9, 2.2.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-5438.trunk.patch
>
>
> The incremental block reports from data nodes and block commits are 
> asynchronous. This becomes troublesome when the gen stamp for a block is 
> changed during a write pipeline recovery.
> * If an incremental block report is delayed from a node but NN had enough 
> replicas already, a report with the old gen stamp may be received after block 
> completion. This replica will be correctly marked corrupt. But if the node 
> had participated in the pipeline recovery, a new (delayed) report with the 
> correct gen stamp will come soon. However, this report won't have any effect 
> on the corrupt state of the replica.
> * If block reports are received while the block is still under construction 
> (i.e. client's call to make block committed has not been received by NN), 
> they are blindly accepted regardless of the gen stamp. If a failed node 
> reports in with the old gen stamp while pipeline recovery is on-going, it 
> will be accepted and counted as valid during commit of the block.
> Due to the above two problems, correct replicas can be marked corrupt and 
> corrupt replicas can be accepted during commit.  So far we have observed two 
> cases in production.
> * The client hangs forever to close a file. All replicas are marked corrupt.
> * After the successful close of a file, read fails. Corrupt replicas are 
> accepted during commit and valid replicas are marked corrupt afterward.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss

2013-10-28 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5438:
-

Status: Open  (was: Patch Available)

Oops. The critical line is commented out for testing in the patch.

> Flaws in block report processing can cause data loss
> 
>
> Key: HDFS-5438
> URL: https://issues.apache.org/jira/browse/HDFS-5438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0, 0.23.9
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-5438.trunk.patch, HDFS-5438.trunk.patch
>
>
> The incremental block reports from data nodes and block commits are 
> asynchronous. This becomes troublesome when the gen stamp for a block is 
> changed during a write pipeline recovery.
> * If an incremental block report is delayed from a node but NN had enough 
> replicas already, a report with the old gen stamp may be received after block 
> completion. This replica will be correctly marked corrupt. But if the node 
> had participated in the pipeline recovery, a new (delayed) report with the 
> correct gen stamp will come soon. However, this report won't have any effect 
> on the corrupt state of the replica.
> * If block reports are received while the block is still under construction 
> (i.e. client's call to make block committed has not been received by NN), 
> they are blindly accepted regardless of the gen stamp. If a failed node 
> reports in with the old gen stamp while pipeline recovery is on-going, it 
> will be accepted and counted as valid during commit of the block.
> Due to the above two problems, correct replicas can be marked corrupt and 
> corrupt replicas can be accepted during commit.  So far we have observed two 
> cases in production.
> * The client hangs forever to close a file. All replicas are marked corrupt.
> * After the successful close of a file, read fails. Corrupt replicas are 
> accepted during commit and valid replicas are marked corrupt afterward.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss

2013-10-28 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5438:
-

Status: Patch Available  (was: Open)

> Flaws in block report processing can cause data loss
> 
>
> Key: HDFS-5438
> URL: https://issues.apache.org/jira/browse/HDFS-5438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0, 0.23.9
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-5438.trunk.patch, HDFS-5438.trunk.patch
>
>
> The incremental block reports from data nodes and block commits are 
> asynchronous. This becomes troublesome when the gen stamp for a block is 
> changed during a write pipeline recovery.
> * If an incremental block report is delayed from a node but NN had enough 
> replicas already, a report with the old gen stamp may be received after block 
> completion. This replica will be correctly marked corrupt. But if the node 
> had participated in the pipeline recovery, a new (delayed) report with the 
> correct gen stamp will come soon. However, this report won't have any effect 
> on the corrupt state of the replica.
> * If block reports are received while the block is still under construction 
> (i.e. client's call to make block committed has not been received by NN), 
> they are blindly accepted regardless of the gen stamp. If a failed node 
> reports in with the old gen stamp while pipeline recovery is on-going, it 
> will be accepted and counted as valid during commit of the block.
> Due to the above two problems, correct replicas can be marked corrupt and 
> corrupt replicas can be accepted during commit.  So far we have observed two 
> cases in production.
> * The client hangs forever to close a file. All replicas are marked corrupt.
> * After the successful close of a file, read fails. Corrupt replicas are 
> accepted during commit and valid replicas are marked corrupt afterward.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss

2013-10-28 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5438:
-

Attachment: HDFS-5438.trunk.patch

Reposting the patch.

> Flaws in block report processing can cause data loss
> 
>
> Key: HDFS-5438
> URL: https://issues.apache.org/jira/browse/HDFS-5438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.23.9, 2.2.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-5438.trunk.patch, HDFS-5438.trunk.patch
>
>
> The incremental block reports from data nodes and block commits are 
> asynchronous. This becomes troublesome when the gen stamp for a block is 
> changed during a write pipeline recovery.
> * If an incremental block report is delayed from a node but NN had enough 
> replicas already, a report with the old gen stamp may be received after block 
> completion. This replica will be correctly marked corrupt. But if the node 
> had participated in the pipeline recovery, a new (delayed) report with the 
> correct gen stamp will come soon. However, this report won't have any effect 
> on the corrupt state of the replica.
> * If block reports are received while the block is still under construction 
> (i.e. client's call to make block committed has not been received by NN), 
> they are blindly accepted regardless of the gen stamp. If a failed node 
> reports in with the old gen stamp while pipeline recovery is on-going, it 
> will be accepted and counted as valid during commit of the block.
> Due to the above two problems, correct replicas can be marked corrupt and 
> corrupt replicas can be accepted during commit.  So far we have observed two 
> cases in production.
> * The client hangs forever to close a file. All replicas are marked corrupt.
> * After the successful close of a file, read fails. Corrupt replicas are 
> accepted during commit and valid replicas are marked corrupt afterward.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5438) Flaws in block report processing can cause data loss

2013-10-28 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807225#comment-13807225
 ] 

Kihwal Lee commented on HDFS-5438:
--

This is a high-level description of what the patch does.
* The patch makes NN save the list of already reported replicas when starting a 
pipeline recovery. If a new report with the new gen stamp has not been received 
for an existing replica by the time the recovery is done, that replica will be 
marked corrupt.
* If a block report is received for an existing corrupt replica and it is no 
longer corrupt, NN will remove it from the corrupt replicas map.
* If the client cannot close a file because the block does not have enough valid 
replicas, it eventually gives up rather than hanging forever. It already fails 
after a number of retries when adding a new block. It will use the same retry 
limit in completeFile(), but the timeout will double every time to make it try 
harder. With the default of 5 retries, a client will wait at least 4 minutes and 
then give up. If NN is not responding, it may wait longer. (A sketch of this 
back-off follows below.)
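
A minimal, self-contained sketch of that back-off behavior; the retry count and 
base wait below are illustrative assumptions, not the patch's actual constants.

{code}
// Illustrative sketch only -- not the actual DFSClient/completeFile code.
public class CompleteFileRetrySketch {

  interface Namenode {                 // stand-in for the completeFile RPC
    boolean complete(String src) throws Exception;
  }

  /** Retry complete() a bounded number of times, doubling the wait each time. */
  static boolean completeWithRetries(Namenode nn, String src,
                                     int maxRetries, long initialWaitMs)
      throws Exception {
    long waitMs = initialWaitMs;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      if (nn.complete(src)) {
        return true;                   // file closed successfully
      }
      if (attempt == maxRetries) {
        break;                         // give up instead of hanging forever
      }
      Thread.sleep(waitMs);            // wait before trying again
      waitMs *= 2;                     // double the timeout every retry
    }
    return false;
  }

  public static void main(String[] args) throws Exception {
    Namenode flaky = src -> Math.random() < 0.2;  // toy NN that rarely succeeds
    System.out.println(completeWithRetries(flaky, "/tmp/f", 5, 100L));
  }
}
{code}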

> Flaws in block report processing can cause data loss
> 
>
> Key: HDFS-5438
> URL: https://issues.apache.org/jira/browse/HDFS-5438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.23.9, 2.2.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-5438.trunk.patch
>
>
> The incremental block reports from data nodes and block commits are 
> asynchronous. This becomes troublesome when the gen stamp for a block is 
> changed during a write pipeline recovery.
> * If an incremental block report is delayed from a node but NN had enough 
> replicas already, a report with the old gen stamp may be received after block 
> completion. This replica will be correctly marked corrupt. But if the node 
> had participated in the pipeline recovery, a new (delayed) report with the 
> correct gen stamp will come soon. However, this report won't have any effect 
> on the corrupt state of the replica.
> * If block reports are received while the block is still under construction 
> (i.e. client's call to make block committed has not been received by NN), 
> they are blindly accepted regardless of the gen stamp. If a failed node 
> reports in with the old gen stamp while pipeline recovery is on-going, it 
> will be accepted and counted as valid during commit of the block.
> Due to the above two problems, correct replicas can be marked corrupt and 
> corrupt replicas can be accepted during commit.  So far we have observed two 
> cases in production.
> * The client hangs forever to close a file. All replicas are marked corrupt.
> * After the successful close of a file, read fails. Corrupt replicas are 
> accepted during commit and valid replicas are marked corrupt afterward.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web

2013-10-28 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5436:
-

Attachment: HDFS-5436.002.patch

> Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
> 
>
> Key: HDFS-5436
> URL: https://issues.apache.org/jira/browse/HDFS-5436
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5436.000.patch, HDFS-5436.001.patch, 
> HDFS-5436.002.patch
>
>
> Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in 
> different packages. This forces several methods in ByteInputStream and 
> URLConnectionFactory to be public.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss

2013-10-28 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5438:
-

Attachment: HDFS-5438.trunk.patch

> Flaws in block report processing can cause data loss
> 
>
> Key: HDFS-5438
> URL: https://issues.apache.org/jira/browse/HDFS-5438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.23.9, 2.2.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-5438.trunk.patch
>
>
> The incremental block reports from data nodes and block commits are 
> asynchronous. This becomes troublesome when the gen stamp for a block is 
> changed during a write pipeline recovery.
> * If an incremental block report is delayed from a node but NN had enough 
> replicas already, a report with the old gen stamp may be received after block 
> completion. This replica will be correctly marked corrupt. But if the node 
> had participated in the pipeline recovery, a new (delayed) report with the 
> correct gen stamp will come soon. However, this report won't have any effect 
> on the corrupt state of the replica.
> * If block reports are received while the block is still under construction 
> (i.e. client's call to make block committed has not been received by NN), 
> they are blindly accepted regardless of the gen stamp. If a failed node 
> reports in with the old gen stamp while pipeline recovery is on-going, it 
> will be accepted and counted as valid during commit of the block.
> Due to the above two problems, correct replicas can be marked corrupt and 
> corrupt replicas can be accepted during commit.  So far we have observed two 
> cases in production.
> * The client hangs forever to close a file. All replicas are marked corrupt.
> * After the successful close of a file, read fails. Corrupt replicas are 
> accepted during commit and valid replicas are marked corrupt afterward.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (HDFS-5438) Flaws in block report processing can cause data loss

2013-10-28 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee reassigned HDFS-5438:


Assignee: Kihwal Lee

> Flaws in block report processing can cause data loss
> 
>
> Key: HDFS-5438
> URL: https://issues.apache.org/jira/browse/HDFS-5438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.23.9, 2.2.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-5438.trunk.patch
>
>
> The incremental block reports from data nodes and block commits are 
> asynchronous. This becomes troublesome when the gen stamp for a block is 
> changed during a write pipeline recovery.
> * If an incremental block report is delayed from a node but NN had enough 
> replicas already, a report with the old gen stamp may be received after block 
> completion. This replica will be correctly marked corrupt. But if the node 
> had participated in the pipeline recovery, a new (delayed) report with the 
> correct gen stamp will come soon. However, this report won't have any effect 
> on the corrupt state of the replica.
> * If block reports are received while the block is still under construction 
> (i.e. client's call to make block committed has not been received by NN), 
> they are blindly accepted regardless of the gen stamp. If a failed node 
> reports in with the old gen stamp while pipeline recovery is on-going, it 
> will be accepted and counted as valid during commit of the block.
> Due to the above two problems, correct replicas can be marked corrupt and 
> corrupt replicas can be accepted during commit.  So far we have observed two 
> cases in production.
> * The client hangs forever to close a file. All replicas are marked corrupt.
> * After the successful close of a file, read fails. Corrupt replicas are 
> accepted during commit and valid replicas are marked corrupt afterward.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss

2013-10-28 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5438:
-

Status: Patch Available  (was: Open)

> Flaws in block report processing can cause data loss
> 
>
> Key: HDFS-5438
> URL: https://issues.apache.org/jira/browse/HDFS-5438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0, 0.23.9
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-5438.trunk.patch
>
>
> The incremental block reports from data nodes and block commits are 
> asynchronous. This becomes troublesome when the gen stamp for a block is 
> changed during a write pipeline recovery.
> * If an incremental block report is delayed from a node but NN had enough 
> replicas already, a report with the old gen stamp may be received after block 
> completion. This replica will be correctly marked corrupt. But if the node 
> had participated in the pipeline recovery, a new (delayed) report with the 
> correct gen stamp will come soon. However, this report won't have any effect 
> on the corrupt state of the replica.
> * If block reports are received while the block is still under construction 
> (i.e. client's call to make block committed has not been received by NN), 
> they are blindly accepted regardless of the gen stamp. If a failed node 
> reports in with the old gen stamp while pipeline recovery is on-going, it 
> will be accepted and counted as valid during commit of the block.
> Due to the above two problems, correct replicas can be marked corrupt and 
> corrupt replicas can be accepted during commit.  So far we have observed two 
> cases in production.
> * The client hangs forever to close a file. All replicas are marked corrupt.
> * After the successful close of a file, read fails. Corrupt replicas are 
> accepted during commit and valid replicas are marked corrupt afterward.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5438) Flaws in block report processing can cause data loss

2013-10-28 Thread Kihwal Lee (JIRA)
Kihwal Lee created HDFS-5438:


 Summary: Flaws in block report processing can cause data loss
 Key: HDFS-5438
 URL: https://issues.apache.org/jira/browse/HDFS-5438
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.2.0, 0.23.9
Reporter: Kihwal Lee
Priority: Critical


The incremental block reports from data nodes and block commits are 
asynchronous. This becomes troublesome when the gen stamp for a block is 
changed during a write pipeline recovery.

* If an incremental block report is delayed from a node but NN had enough 
replicas already, a report with the old gen stamp may be received after block 
completion. This replica will be correctly marked corrupt. But if the node had 
participated in the pipeline recovery, a new (delayed) report with the correct 
gen stamp will come soon. However, this report won't have any effect on the 
corrupt state of the replica.

* If block reports are received while the block is still under construction 
(i.e. client's call to make block committed has not been received by NN), they 
are blindly accepted regardless of the gen stamp. If a failed node reports in 
with the old gen stamp while pipeline recovery is on-going, it will be accepted 
and counted as valid during commit of the block.

Due to the above two problems, correct replicas can be marked corrupt and 
corrupt replicas can be accepted during commit.  So far we have observed two 
cases in production.

* The client hangs forever to close a file. All replicas are marked corrupt.
* After the successful close of a file, read fails. Corrupt replicas are 
accepted during commit and valid replicas are marked corrupt afterward.
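
To make the gen stamp comparison concrete, here is a minimal, purely schematic 
sketch; the method and parameter names are invented for this example and are 
not the actual BlockManager API.

{code:java}
// Schematic only: illustrates the gen stamp check described above.
// Neither the method nor the parameter names come from the real BlockManager.
static boolean reportedReplicaMatchesBlock(long reportedGenStamp,
                                           long storedGenStamp) {
  // A report carrying an older gen stamp comes from before the pipeline
  // recovery; counting it as valid during commit, or letting it permanently
  // mark the replica corrupt, is exactly the problem described above.
  return reportedGenStamp == storedGenStamp;
}
{code}

Either a stale report should be rejected outright, or a later report carrying 
the recovered gen stamp should be allowed to clear the corrupt mark.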




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5432) TestDatanodeJsp fails on Windows due to assumption that loopback address resolves to host name localhost.

2013-10-28 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807184#comment-13807184
 ] 

Arpit Agarwal commented on HDFS-5432:
-

+1 for the updated patch also.

> TestDatanodeJsp fails on Windows due to assumption that loopback address 
> resolves to host name localhost.
> -
>
> Key: HDFS-5432
> URL: https://issues.apache.org/jira/browse/HDFS-5432
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, test
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Trivial
> Attachments: HDFS-5432.1.patch, HDFS-5432.2.patch
>
>
> As discussed in many previous issues, Windows differs from Unixes in that it 
> does not resolve the loopback address to hostname "localhost".  Instead, the 
> host name remains unresolved as "127.0.0.1".  {{TestDatanodeJsp}} fails on 
> Windows, because it attempts to assert a string match containing "localhost" 
> as the host name.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web

2013-10-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807189#comment-13807189
 ] 

Hadoop QA commented on HDFS-5436:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12610614/HDFS-5436.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 10 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs hadoop-tools/hadoop-extras:

  org.apache.hadoop.hdfs.security.TestDelegationToken

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5293//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5293//console

This message is automatically generated.

> Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
> 
>
> Key: HDFS-5436
> URL: https://issues.apache.org/jira/browse/HDFS-5436
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5436.000.patch, HDFS-5436.001.patch
>
>
> Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in 
> different packages. This forces several methods in ByteInputStream and 
> URLConnectionFactory to be public methods.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5432) TestDatanodeJsp fails on Windows due to assumption that loopback address resolves to host name localhost.

2013-10-28 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-5432:


Attachment: HDFS-5432.2.patch

Thanks for the reviews, but I'm uploading a new version.  I think we can avoid 
the Windows-specific conditional by pulling the correct host name out of the 
{{InetSocketAddress}}.  I re-tested this successfully on Mac and Windows.
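
The gist, as a standalone sketch rather than the actual test change (the 
address and port below are placeholders):

{code:java}
import java.net.InetSocketAddress;

public class LoopbackHostExample {
  public static void main(String[] args) {
    // Placeholder address/port; the point is to derive the expected host from
    // the socket address rather than hard-coding "localhost".
    InetSocketAddress addr = new InetSocketAddress("127.0.0.1", 50075);
    // getHostName() reverse-resolves the loopback address: typically
    // "localhost" on Unixes, but it stays "127.0.0.1" on Windows.
    String expectedHost = addr.getHostName();
    System.out.println("expected URL prefix: http://" + expectedHost + ":"
        + addr.getPort());
  }
}
{code}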

Does this still look OK?

> TestDatanodeJsp fails on Windows due to assumption that loopback address 
> resolves to host name localhost.
> -
>
> Key: HDFS-5432
> URL: https://issues.apache.org/jira/browse/HDFS-5432
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, test
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Trivial
> Attachments: HDFS-5432.1.patch, HDFS-5432.2.patch
>
>
> As discussed in many previous issues, Windows differs from Unixes in that it 
> does not resolve the loopback address to hostname "localhost".  Instead, the 
> host name remains unresolved as "127.0.0.1".  {{TestDatanodeJsp}} fails on 
> Windows, because it attempts to assert a string match containing "localhost" 
> as the host name.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5435) File append fails to initialize storageIDs

2013-10-28 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807153#comment-13807153
 ] 

Junping Du commented on HDFS-5435:
--

Thanks, Arpit and Nicholas, for the quick response and review!

> File append fails to initialize storageIDs
> --
>
> Key: HDFS-5435
> URL: https://issues.apache.org/jira/browse/HDFS-5435
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: Heterogeneous Storage (HDFS-2832)
>
> Attachments: HDFS-5435.patch
>
>
> Several NPE exceptions in append-related operations occur because the 
> storageIDs are not set when initiating the DataStreamer.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5437) TestBlockReport fails due to test issues

2013-10-28 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5437:


Description: 
There are a few more test issues in {{TestBlockReport}} caused by the earlier 
changes.

{{testBlockReport_07}} fails and it looks like a test issue.
{code}
Running org.apache.hadoop.hdfs.server.datanode.TestBlockReport
Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.824 sec <<< 
FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestBlockReport
blockReport_07(org.apache.hadoop.hdfs.server.datanode.TestBlockReport)  Time 
elapsed: 19.314 sec  <<< FAILURE!
java.lang.AssertionError: Wrong number of Corrupted blocks expected:<1> but 
was:<0>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at 
org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_07(TestBlockReport.java:461)
{code}

  was:
There are a few more test failures in {{TestBlockReport}} caused by the earlier 
changes.

{{testBlockReport_07}} fails and it looks like a test issue.
{code}
Running org.apache.hadoop.hdfs.server.datanode.TestBlockReport
Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.824 sec <<< 
FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestBlockReport
blockReport_07(org.apache.hadoop.hdfs.server.datanode.TestBlockReport)  Time 
elapsed: 19.314 sec  <<< FAILURE!
java.lang.AssertionError: Wrong number of Corrupted blocks expected:<1> but 
was:<0>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at 
org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_07(TestBlockReport.java:461)
{code}


> TestBlockReport fails due to test issues
> 
>
> Key: HDFS-5437
> URL: https://issues.apache.org/jira/browse/HDFS-5437
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>
> There are a few more test issues in {{TestBlockReport}} caused by the earlier 
> changes.
> {{testBlockReport_07}} fails and it looks like a test issue.
> {code}
> Running org.apache.hadoop.hdfs.server.datanode.TestBlockReport
> Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.824 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestBlockReport
> blockReport_07(org.apache.hadoop.hdfs.server.datanode.TestBlockReport)  Time 
> elapsed: 19.314 sec  <<< FAILURE!
> java.lang.AssertionError: Wrong number of Corrupted blocks expected:<1> but 
> was:<0>
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.failNotEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:128)
> at org.junit.Assert.assertEquals(Assert.java:472)
> at 
> org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_07(TestBlockReport.java:461)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5437) TestBlockReport fails due to test issues

2013-10-28 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5437:


Description: 
There are a few more test failures in {{TestBlockReport}} caused by the earlier 
changes.

{{testBlockReport_07}} fails and it looks like a test issue.
{code}
Running org.apache.hadoop.hdfs.server.datanode.TestBlockReport
Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.824 sec <<< 
FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestBlockReport
blockReport_07(org.apache.hadoop.hdfs.server.datanode.TestBlockReport)  Time 
elapsed: 19.314 sec  <<< FAILURE!
java.lang.AssertionError: Wrong number of Corrupted blocks expected:<1> but 
was:<0>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at 
org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_07(TestBlockReport.java:461)
{code}

  was:There are a few more test failures in TestBlockReport caused by the 
earlier changes. testBlockReport_07 fails and it looks like a test issue.


> TestBlockReport fails due to test issues
> 
>
> Key: HDFS-5437
> URL: https://issues.apache.org/jira/browse/HDFS-5437
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>
> There are a few more test failures in {{TestBlockReport}} caused by the 
> earlier changes.
> {{testBlockReport_07}} fails and it looks like a test issue.
> {code}
> Running org.apache.hadoop.hdfs.server.datanode.TestBlockReport
> Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.824 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestBlockReport
> blockReport_07(org.apache.hadoop.hdfs.server.datanode.TestBlockReport)  Time 
> elapsed: 19.314 sec  <<< FAILURE!
> java.lang.AssertionError: Wrong number of Corrupted blocks expected:<1> but 
> was:<0>
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.failNotEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:128)
> at org.junit.Assert.assertEquals(Assert.java:472)
> at 
> org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_07(TestBlockReport.java:461)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-4949) Centralized cache management in HDFS

2013-10-28 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807135#comment-13807135
 ] 

Chris Nauroth commented on HDFS-4949:
-

+1 for the merge.  Thanks again, Andrew and Colin.

> Centralized cache management in HDFS
> 
>
> Key: HDFS-4949
> URL: https://issues.apache.org/jira/browse/HDFS-4949
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 3.0.0, 2.3.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: caching-design-doc-2013-07-02.pdf, 
> caching-design-doc-2013-08-09.pdf, caching-design-doc-2013-10-24.pdf, 
> caching-testplan.pdf, HDFS-4949-consolidated.patch
>
>
> HDFS currently has no support for managing or exposing in-memory caches at 
> datanodes. This makes it harder for higher level application frameworks like 
> Hive, Pig, and Impala to effectively use cluster memory, because they cannot 
> explicitly cache important datasets or place their tasks for memory locality.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5437) Fix TestBlockReport

2013-10-28 Thread Arpit Agarwal (JIRA)
Arpit Agarwal created HDFS-5437:
---

 Summary: Fix TestBlockReport
 Key: HDFS-5437
 URL: https://issues.apache.org/jira/browse/HDFS-5437
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Affects Versions: Heterogeneous Storage (HDFS-2832)
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal


There are a few more test failures in TestBlockReport caused by the earlier 
changes. testBlockReport_07 fails and it looks like a test issue.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5437) TestBlockReport fails due to test issues

2013-10-28 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5437:


Summary: TestBlockReport fails due to test issues  (was: Fix 
TestBlockReport)

> TestBlockReport fails due to test issues
> 
>
> Key: HDFS-5437
> URL: https://issues.apache.org/jira/browse/HDFS-5437
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>
> There are a few more test failures in TestBlockReport caused by the earlier 
> changes. testBlockReport_07 fails and it looks like a test issue.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5402) Deprecate the JSP web uis in HDFS

2013-10-28 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807085#comment-13807085
 ] 

Haohui Mai commented on HDFS-5402:
--

Just to clarify, the old and the new Web UIs can coexist. You can still access 
the old web UI using the same URLs until the JSPs are removed.

> Deprecate the JSP web uis in HDFS
> -
>
> Key: HDFS-5402
> URL: https://issues.apache.org/jira/browse/HDFS-5402
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>
> This JIRA tracks the discussion of transitioning from old, JSP web UIs to the 
> HTML 5 based web UIs.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5402) Deprecate the JSP web uis in HDFS

2013-10-28 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807081#comment-13807081
 ] 

Haohui Mai commented on HDFS-5402:
--

bq. If we're going to remove the old web UI, I think the new web UI has
to have the same level of unit testing. We shouldn't go backwards in
terms of unit testing.

I took a look at TestNamenodeJspHelper / TestDatanodeJspHelper / 
TestClusterJspHelper. It seems to me that we can merge these tests into the 
unit tests on JMX.

bq. If we are going to
remove this capability, we need to add some other command-line tools
to get the same functionality. These tools could use REST if we have
that, or JMX, but they need to exist before we can consider removing
the old UI.

This is a good point. Since all of the information is available through JMX, 
the easiest way to approach it is to write some scripts using Node.js. The 
architecture of the new Web UIs is ready for this.
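
For example, a bare-bones fetch of the NameNode's /jmx JSON, shown in Java here 
only for brevity (any HTTP client works; the host, port, and query are 
placeholders):

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class NameNodeJmxFetch {
  public static void main(String[] args) throws Exception {
    // Placeholder host/port; the /jmx servlet serves the same data the JSP pages render.
    URL url = new URL(
        "http://namenode.example.com:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);  // raw JSON; feed into any JSON parser as needed
      }
    }
  }
}
{code}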

> Deprecate the JSP web uis in HDFS
> -
>
> Key: HDFS-5402
> URL: https://issues.apache.org/jira/browse/HDFS-5402
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>
> This JIRA tracks the discussion of transitioning from old, JSP web UIs to the 
> HTML 5 based web UIs.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5434) Write resiliency for replica count 1

2013-10-28 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807064#comment-13807064
 ] 

Colin Patrick McCabe commented on HDFS-5434:


Most users I am aware of who use replication factor 1 do it because they don't 
want the overhead of a pipeline that writes to multiple datanodes.  If we give 
them such a pipeline anyway, it's contrary to what replication factor 1 has 
always meant.

If your proposed solution works for you, there is a way to do it without 
modifying HDFS at all.  Simply write with replication=2, and then call 
setReplication on the file after closing it.
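
For illustration, a minimal sketch of that workaround (the path, buffer size, 
and payload are placeholders):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationWorkaround {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path p = new Path("/tmp/example.dat");  // placeholder path

    // Write through a two-datanode pipeline so a single datanode failure is recoverable.
    FSDataOutputStream out = fs.create(p, true,
        conf.getInt("io.file.buffer.size", 4096),
        (short) 2,
        fs.getDefaultBlockSize(p));
    out.writeBytes("example payload");
    out.close();

    // Drop back to the intended replication; the NameNode prunes the extra replica.
    fs.setReplication(p, (short) 1);
  }
}
{code}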

It seems like maybe your concern has more to do with how gracefully we handle 
pipeline failures (currently, not very gracefully).  But that's a separate 
issue (see HDFS-4504 for details.)

> Write resiliency for replica count 1
> 
>
> Key: HDFS-5434
> URL: https://issues.apache.org/jira/browse/HDFS-5434
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0
>Reporter: Buddy
>Priority: Minor
>
> If a file has a replica count of one, the HDFS client is exposed to write 
> failures if the data node fails during a write. With a pipeline of size of 
> one, no recovery is possible if the sole data node dies.
> A simple fix is to force a minimum pipeline size of 2, while leaving the 
> replication count as 1. The implementation for this is fairly non-invasive.
> Although the replica count is one, the block will be written to two data 
> nodes instead of one. If one of the data nodes fails during the write, normal 
> pipeline recovery will ensure that the write succeeds to the surviving data 
> node.
> The existing code in the name node will prune the extra replica when it 
> receives the block received reports for the finalized block from both data 
> nodes. This results in the intended replica count of one for the block.
> This behavior should be controlled by a configuration option such as 
> {{dfs.namenode.minPipelineSize}}.
> This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by 
> ensuring that the pipeline size passed to 
> {{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is:
> {code}
> max(replication, ${dfs.namenode.minPipelineSize})
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web

2013-10-28 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5436:
-

Attachment: HDFS-5436.001.patch

> Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
> 
>
> Key: HDFS-5436
> URL: https://issues.apache.org/jira/browse/HDFS-5436
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5436.000.patch, HDFS-5436.001.patch
>
>
> Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in 
> different packages. This forces several methods in ByteInputStream and 
> URLConnectionFactory to be public methods.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5402) Deprecate the JSP web uis in HDFS

2013-10-28 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807045#comment-13807045
 ] 

Colin Patrick McCabe commented on HDFS-5402:


This is a really interesting project, Haohui.  I think it will make
our web UI much nicer.

I have a few concerns about removing the old web UI, however:

* If we're going to remove the old web UI, I think the new web UI has
to have the same level of unit testing.  We shouldn't go backwards in
terms of unit testing.

* Most of the deployments of elinks and links out there don't support
Javascript.  This is just a reality of life when using CentOS 5 or 6,
which many users are still using.  I have used "links" to diagnose
problems through the web UI in the past, in systems where access to
the cluster was available only through telnet.  If we are going to
remove this capability, we need to add some other command-line tools
to get the same functionality.  These tools could use REST if we have
that, or JMX, but they need to exist before we can consider removing
the old UI.

> Deprecate the JSP web uis in HDFS
> -
>
> Key: HDFS-5402
> URL: https://issues.apache.org/jira/browse/HDFS-5402
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>
> This JIRA tracks the discussion of transitioning from old, JSP web UIs to the 
> HTML 5 based web UIs.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5435) File append fails to initialize storageIDs

2013-10-28 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5435:
-

Hadoop Flags: Reviewed

+1 patch looks good.

> File append fails to initialize storageIDs
> --
>
> Key: HDFS-5435
> URL: https://issues.apache.org/jira/browse/HDFS-5435
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: Heterogeneous Storage (HDFS-2832)
>
> Attachments: HDFS-5435.patch
>
>
> Several NPE exceptions in append-related operations occur because the 
> storageIDs are not set when initiating the DataStreamer.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5413) hdfs.cmd does not support passthrough to any arbitrary class.

2013-10-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807026#comment-13807026
 ] 

Hudson commented on HDFS-5413:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4662 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4662/])
HDFS-5413. hdfs.cmd does not support passthrough to any arbitrary class. 
Contributed by Chris Nauroth. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536448)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs.cmd


> hdfs.cmd does not support passthrough to any arbitrary class.
> -
>
> Key: HDFS-5413
> URL: https://issues.apache.org/jira/browse/HDFS-5413
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Fix For: 3.0.0, 2.2.1
>
> Attachments: HDFS-5413.1.patch, HDFS-5413.2.patch
>
>
> The hdfs shell script supports passthrough to calling any arbitrary class if 
> the first argument is not one of the pre-defined sub-commands.  The 
> equivalent cmd script does not implement this and instead fails trying to do 
> a labeled goto to the first argument.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web

2013-10-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807018#comment-13807018
 ] 

Hadoop QA commented on HDFS-5436:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12610601/HDFS-5436.000.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 10 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5292//console

This message is automatically generated.

> Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
> 
>
> Key: HDFS-5436
> URL: https://issues.apache.org/jira/browse/HDFS-5436
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5436.000.patch
>
>
> Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in 
> different packages. This forces several methods in ByteInputStream and 
> URLConnectionFactory to be public methods.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5413) hdfs.cmd does not support passthrough to any arbitrary class.

2013-10-28 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-5413:


   Resolution: Fixed
Fix Version/s: 2.2.1
   3.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've committed this patch to trunk, branch-2, and branch-2.2.  Thank you to 
Chuan and Arpit for the code reviews.

> hdfs.cmd does not support passthrough to any arbitrary class.
> -
>
> Key: HDFS-5413
> URL: https://issues.apache.org/jira/browse/HDFS-5413
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Fix For: 3.0.0, 2.2.1
>
> Attachments: HDFS-5413.1.patch, HDFS-5413.2.patch
>
>
> The hdfs shell script supports passthrough to calling any arbitrary class if 
> the first argument is not one of the pre-defined sub-commands.  The 
> equivalent cmd script does not implement this and instead fails trying to do 
> a labeled goto to the first argument.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5432) TestDatanodeJsp fails on Windows due to assumption that loopback address resolves to host name localhost.

2013-10-28 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807004#comment-13807004
 ] 

Arpit Agarwal commented on HDFS-5432:
-

+1 for the patch.

Verified results with and without your patch on Windows.

> TestDatanodeJsp fails on Windows due to assumption that loopback address 
> resolves to host name localhost.
> -
>
> Key: HDFS-5432
> URL: https://issues.apache.org/jira/browse/HDFS-5432
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, test
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Trivial
> Attachments: HDFS-5432.1.patch
>
>
> As discussed in many previous issues, Windows differs from Unixes in that it 
> does not resolve the loopback address to hostname "localhost".  Instead, the 
> host name remains unresolved as "127.0.0.1".  {{TestDatanodeJsp}} fails on 
> Windows, because it attempts to assert a string match containing "localhost" 
> as the host name.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web

2013-10-28 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5436:
-

Attachment: (was: HDFS-5436.000.patch)

> Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
> 
>
> Key: HDFS-5436
> URL: https://issues.apache.org/jira/browse/HDFS-5436
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5436.000.patch
>
>
> Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in 
> different packages. This forces several methods in ByteInputStream and 
> URLConnectionFactory to be public methods.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web

2013-10-28 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5436:
-

Attachment: HDFS-5436.000.patch

There are no functionality changes in this patch.

> Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
> 
>
> Key: HDFS-5436
> URL: https://issues.apache.org/jira/browse/HDFS-5436
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5436.000.patch
>
>
> Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in 
> different packages. This forces several methods in ByteInputStream and 
> URLConnectionFactory to be public methods.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web

2013-10-28 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5436:
-

Status: Patch Available  (was: Open)

> Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
> 
>
> Key: HDFS-5436
> URL: https://issues.apache.org/jira/browse/HDFS-5436
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5436.000.patch
>
>
> Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in 
> different packages. This forces several methods in ByteInputStream and 
> URLConnectionFactory to be public methods.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web

2013-10-28 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5436:
-

Attachment: HDFS-5436.000.patch

> Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
> 
>
> Key: HDFS-5436
> URL: https://issues.apache.org/jira/browse/HDFS-5436
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5436.000.patch
>
>
> Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in 
> different packages. This forces several methods in ByteInputStream and 
> URLConnectionFactory to be public methods.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web

2013-10-28 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-5436:


 Summary: Move HsFtpFileSystem and HFtpFileSystem into 
org.apache.hdfs.web
 Key: HDFS-5436
 URL: https://issues.apache.org/jira/browse/HDFS-5436
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai


Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in 
different packages. This forces several methods in ByteInputStream and 
URLConnectionFactory to be public methods.
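
The gist, schematically (the class and method below are simplified stand-ins, 
not the real URLConnectionFactory):

{code:java}
package org.apache.hadoop.hdfs.web;

import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

// Schematic stand-in, not the real class: once the Hftp/Hsftp and WebHdfs file
// systems share this package, helpers like this can drop from public to
// package-private visibility.
class ConnectionFactoryExample {
  // package-private is enough when every caller lives in the same package
  URLConnection openConnection(URL url) throws IOException {
    return url.openConnection();
  }
}
{code}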



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5435) File append fails to initialize storageIDs

2013-10-28 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5435:


Release Note:   (was: Thanks for the patch Junping. I have committed it to 
branch HDFS-2832.)

> File append fails to initialize storageIDs
> --
>
> Key: HDFS-5435
> URL: https://issues.apache.org/jira/browse/HDFS-5435
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: Heterogeneous Storage (HDFS-2832)
>
> Attachments: HDFS-5435.patch
>
>
> Several NPE exceptions in append-related operations occur because the 
> storageIDs are not set when initiating the DataStreamer.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (HDFS-5435) File append fails to initialize storageIDs

2013-10-28 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal resolved HDFS-5435.
-

   Resolution: Fixed
Fix Version/s: Heterogeneous Storage (HDFS-2832)
 Release Note: Thanks for the patch Junping. I have committed it to branch 
HDFS-2832.

> File append fails to initialize storageIDs
> --
>
> Key: HDFS-5435
> URL: https://issues.apache.org/jira/browse/HDFS-5435
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: Heterogeneous Storage (HDFS-2832)
>
> Attachments: HDFS-5435.patch
>
>
> Several NPE exceptions in append-related operations occur because the 
> storageIDs are not set when initiating the DataStreamer.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5435) File append fails to initialize storageIDs

2013-10-28 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13806968#comment-13806968
 ] 

Arpit Agarwal commented on HDFS-5435:
-

Thanks for the patch Junping. I have committed it to branch HDFS-2832.

> File append fails to initialize storageIDs
> --
>
> Key: HDFS-5435
> URL: https://issues.apache.org/jira/browse/HDFS-5435
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: Heterogeneous Storage (HDFS-2832)
>
> Attachments: HDFS-5435.patch
>
>
> Several NPE exceptions in append-related operations occur because the 
> storageIDs are not set when initiating the DataStreamer.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5435) File append fails to initialize storageIDs

2013-10-28 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5435:


Summary: File append fails to initialize storageIDs  (was: Fix file append 
without setting storageIDs)

> File append fails to initialize storageIDs
> --
>
> Key: HDFS-5435
> URL: https://issues.apache.org/jira/browse/HDFS-5435
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: HDFS-5435.patch
>
>
> Several NPE exceptions in append-related operations occur because the 
> storageIDs are not set when initiating the DataStreamer.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5435) Fix file append without setting storageIDs

2013-10-28 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13806959#comment-13806959
 ] 

Arpit Agarwal commented on HDFS-5435:
-

+1 for the patch, I will commit it shortly.

> Fix file append without setting storageIDs
> --
>
> Key: HDFS-5435
> URL: https://issues.apache.org/jira/browse/HDFS-5435
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: HDFS-5435.patch
>
>
> Several NPE exceptions in append-related operations occur because the 
> storageIDs are not set when initiating the DataStreamer.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5435) Fix file append without setting storageIDs

2013-10-28 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13806923#comment-13806923
 ] 

Junping Du commented on HDFS-5435:
--

Verified that it fixes several failures in the append-related unit tests.

> Fix file append without setting storageIDs
> --
>
> Key: HDFS-5435
> URL: https://issues.apache.org/jira/browse/HDFS-5435
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: HDFS-5435.patch
>
>
> Several NPE exceptions in append-related operations occur because the 
> storageIDs are not set when initiating the DataStreamer.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5435) Fix file append without setting storageIDs

2013-10-28 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-5435:
-

Attachment: HDFS-5435.patch

> Fix file append without setting storageIDs
> --
>
> Key: HDFS-5435
> URL: https://issues.apache.org/jira/browse/HDFS-5435
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: HDFS-5435.patch
>
>
> Several NPE exceptions in append-related operations occur because the 
> storageIDs are not set when initiating the DataStreamer.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5435) Fix file append without setting storageIDs

2013-10-28 Thread Junping Du (JIRA)
Junping Du created HDFS-5435:


 Summary: Fix file append without setting storageIDs
 Key: HDFS-5435
 URL: https://issues.apache.org/jira/browse/HDFS-5435
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: Heterogeneous Storage (HDFS-2832)
Reporter: Junping Du
Assignee: Junping Du


Several NPE exceptions in append-related operations occur because the 
storageIDs are not set when initiating the DataStreamer.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5434) Write resiliency for replica count 1

2013-10-28 Thread Eric Sirianni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Sirianni updated HDFS-5434:


Description: 
If a file has a replica count of one, the HDFS client is exposed to write 
failures if the data node fails during a write. With a pipeline of size of one, 
no recovery is possible if the sole data node dies.

A simple fix is to force a minimum pipeline size of 2, while leaving the 
replication count as 1. The implementation for this is fairly non-invasive.

Although the replica count is one, the block will be written to two data nodes 
instead of one. If one of the data nodes fails during the write, normal 
pipeline recovery will ensure that the write succeeds to the other data node.

The existing code in the name node will prune the extra replica when it 
receives the block received reports for the finalized block from both data 
nodes. This results in the intended replica count of one for the block.

This behavior should be controlled by a configuration option such as 
{{dfs.namenode.minPipelineSize}}.

This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by 
ensuring that the pipeline size passed to 
{{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is:

{code}
max(replication, ${dfs.namenode.minPipelineSize})
{code}


  was:
If a file has a replica count of one, the HDFS client is exposed to write 
failures if the data node fails during a write. With a pipeline of size of one, 
no recovery is possible if the sole data node dies.

A simple fix is to force a minimum pipeline size of 2, while leaving the 
replication count as 1. The implementation for this is fairly non-invasive.

Although the replica count is one, the block will be written to two data nodes 
instead of one. If one of the data nodes fails during the write, normal 
pipeline recovery will ensure that the write succeeds to the other data node.

The existing code in the name node will prune the extra replica when it 
receives the block received reports for the finalized block from both data 
nodes. This results in the intended replica count of one for the block.

This behavior should be controlled by a configuration option such as 
{{dfs.namenode.minPipelineSize}}.

This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by 
ensuring that the pipeline size passed to 
{{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is:

{code}
max(replication, ${dfs.namenode.minPipelineSize})
{code}



> Write resiliency for replica count 1
> 
>
> Key: HDFS-5434
> URL: https://issues.apache.org/jira/browse/HDFS-5434
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0
>Reporter: Buddy
>Priority: Minor
>
> If a file has a replica count of one, the HDFS client is exposed to write 
> failures if the data node fails during a write. With a pipeline of size of 
> one, no recovery is possible if the sole data node dies.
> A simple fix is to force a minimum pipeline size of 2, while leaving the 
> replication count as 1. The implementation for this is fairly non-invasive.
> Although the replica count is one, the block will be written to two data 
> nodes instead of one. If one of the data nodes fails during the write, normal 
> pipeline recovery will ensure that the write succeeds to the other data node.
> The existing code in the name node will prune the extra replica when it 
> receives the block received reports for the finalized block from both data 
> nodes. This results in the intended replica count of one for the block.
> This behavior should be controlled by a configuration option such as 
> {{dfs.namenode.minPipelineSize}}.
> This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by 
> ensuring that the pipeline size passed to 
> {{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is:
> {code}
> max(replication, ${dfs.namenode.minPipelineSize})
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5434) Write resiliency for replica count 1

2013-10-28 Thread Eric Sirianni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Sirianni updated HDFS-5434:


Description: 
If a file has a replica count of one, the HDFS client is exposed to write 
failures if the data node fails during a write. With a pipeline of size of one, 
no recovery is possible if the sole data node dies.

A simple fix is to force a minimum pipeline size of 2, while leaving the 
replication count as 1. The implementation for this is fairly non-invasive.

Although the replica count is one, the block will be written to two data nodes 
instead of one. If one of the data nodes fails during the write, normal 
pipeline recovery will ensure that the write succeeds to the surviving data 
node.

The existing code in the name node will prune the extra replica when it 
receives the block received reports for the finalized block from both data 
nodes. This results in the intended replica count of one for the block.

This behavior should be controlled by a configuration option such as 
{{dfs.namenode.minPipelineSize}}.

This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by 
ensuring that the pipeline size passed to 
{{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is:

{code}
max(replication, ${dfs.namenode.minPipelineSize})
{code}


  was:
If a file has a replica count of one, the HDFS client is exposed to write 
failures if the data node fails during a write. With a pipeline of size of one, 
no recovery is possible if the sole data node dies.

A simple fix is to force a minimum pipeline size of 2, while leaving the 
replication count as 1. The implementation for this is fairly non-invasive.

Although the replica count is one, the block will be written to two data nodes 
instead of one. If one of the data nodes fails during the write, normal 
pipeline recovery will ensure that the write succeeds to the other data node.

The existing code in the name node will prune the extra replica when it 
receives the block received reports for the finalized block from both data 
nodes. This results in the intended replica count of one for the block.

This behavior should be controlled by a configuration option such as 
{{dfs.namenode.minPipelineSize}}.

This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by 
ensuring that the pipeline size passed to 
{{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is:

{code}
max(replication, ${dfs.namenode.minPipelineSize})
{code}



> Write resiliency for replica count 1
> 
>
> Key: HDFS-5434
> URL: https://issues.apache.org/jira/browse/HDFS-5434
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0
>Reporter: Buddy
>Priority: Minor
>
> If a file has a replica count of one, the HDFS client is exposed to write 
> failures if the data node fails during a write. With a pipeline of size of 
> one, no recovery is possible if the sole data node dies.
> A simple fix is to force a minimum pipeline size of 2, while leaving the 
> replication count as 1. The implementation for this is fairly non-invasive.
> Although the replica count is one, the block will be written to two data 
> nodes instead of one. If one of the data nodes fails during the write, normal 
> pipeline recovery will ensure that the write succeeds to the surviving data 
> node.
> The existing code in the name node will prune the extra replica when it 
> receives the block received reports for the finalized block from both data 
> nodes. This results in the intended replica count of one for the block.
> This behavior should be controlled by a configuration option such as 
> {{dfs.namenode.minPipelineSize}}.
> This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by 
> ensuring that the pipeline size passed to 
> {{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is:
> {code}
> max(replication, ${dfs.namenode.minPipelineSize})
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5434) Write resiliency for replica count 1

2013-10-28 Thread Eric Sirianni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Sirianni updated HDFS-5434:


Description: 
If a file has a replica count of one, the HDFS client is exposed to write 
failures if the data node fails during a write. With a pipeline of size of one, 
no recovery is possible if the sole data node dies.

A simple fix is to force a minimum pipeline size of 2, while leaving the 
replication count as 1. The implementation for this is fairly non-invasive.

Although the replica count is one, the block will be written to two data nodes 
instead of one. If one of the data nodes fails during the write, normal 
pipeline recovery will ensure that the write succeeds to the other data node.

The existing code in the name node will prune the extra replica when it 
receives the block received reports for the finalized block from both data 
nodes. This results in the intended replica count of one for the block.

This behavior should be controlled by a configuration option such as 
{{dfs.namenode.minPipelineSize}}.

This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by 
ensuring that the pipeline size passed to 
{{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is:

{code}
max(replication, ${dfs.namenode.minPipelineSize})
{code}


  was:
If a file has a replica count of one, the HDFS client is exposed to write 
failures if the data node fails during a write. With a pipeline of size of one, 
no recovery is possible if the sole data node dies.

A simple fix is to force a minimum pipeline size of 2, while leaving the 
replication count as 1. The implementation for this is fairly non-invasive.

Although the replica count is one, the block will be written to two data nodes 
instead of one. If one of the data nodes fails during the write, normal 
pipeline recovery will ensure that the write succeeds to the other data node.

The existing code in the name node will prune the extra replica when it 
receives the block received reports for the finalized block from both data 
nodes. This results in the intended replica count of one for the block.

This behavior should be controlled by a configuration option such as 
dfs.namenode.minPipelineSize.

This behavior can be implemented in FSNameSystem.getAdditionalBlock by ensuring 
that the pipeline size passed to BlockPlacementPolicy.chooseTarget in the 
replication parameter is at least:

{code:java}
max(replication, ${dfs.namenode.minPipelineSize})
{code}



> Write resiliency for replica count 1
> 
>
> Key: HDFS-5434
> URL: https://issues.apache.org/jira/browse/HDFS-5434
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0
>Reporter: Buddy
>Priority: Minor
>
> If a file has a replica count of one, the HDFS client is exposed to write 
> failures if the data node fails during a write. With a pipeline of size of 
> one, no recovery is possible if the sole data node dies.
> A simple fix is to force a minimum pipeline size of 2, while leaving the 
> replication count as 1. The implementation for this is fairly non-invasive.
> Although the replica count is one, the block will be written to two data 
> nodes instead of one. If one of the data nodes fails during the write, normal 
> pipeline recovery will ensure that the write succeeds to the other data node.
> The existing code in the name node will prune the extra replica when it 
> receives the block received reports for the finalized block from both data 
> nodes. This results in the intended replica count of one for the block.
> This behavior should be controlled by a configuration option such as 
> {{dfs.namenode.minPipelineSize}}.
> This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by 
> ensuring that the pipeline size passed to 
> {{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is:
> {code}
> max(replication, ${dfs.namenode.minPipelineSize})
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5434) Write resiliency for replica count 1

2013-10-28 Thread Buddy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Buddy updated HDFS-5434:


Description: 
If a file has a replica count of one, the HDFS client is exposed to write 
failures if the data node fails during a write. With a pipeline of size of one, 
no recovery is possible if the sole data node dies.

A simple fix is to force a minimum pipeline size of 2, while leaving the 
replication count as 1. The implementation for this is fairly non-invasive.

Although the replica count is one, the block will be written to two data nodes 
instead of one. If one of the data nodes fails during the write, normal 
pipeline recovery will ensure that the write succeeds to the other data node.

The existing code in the name node will prune the extra replica when it 
receives the block received reports for the finalized block from both data 
nodes. This results in the intended replica count of one for the block.

This behavior should be controlled by a configuration option such as 
dfs.namenode.minPipelineSize.

This behavior can be implemented in FSNameSystem.getAdditionalBlock by ensuring 
that the pipeline size passed to BlockPlacementPolicy.chooseTarget in the 
replication parameter is at least:

{code:java}
max(replication, ${dfs.namenode.minPipelineSize})
{code}


  was:
If a file has a replica count of one, the HDFS client is exposed to write 
failures if the data node fails during a write. With a pipeline of size of one, 
no recovery is possible if the sole data node dies.

A simple fix is to force a minimum pipeline size of 2, while leaving the 
replication count as 1. The implementation for this is fairly non-invasive.

Although the replica count is one, the block will be written to two data nodes 
instead of one. If one of the data nodes fails during the write, normal 
pipeline recovery will ensure that the write succeeds to the other data node.

The existing code in the name node will prune the extra replica when it 
receives the block received reports for the finalized block from both data 
nodes. This results in the intended replica count of one for the block.

This behavior should be controlled by a configuration option such as 
dfs.namenode.minPipelineSize.

This behavior can be implemented in FSNameSystem.getAdditionalBlock by ensuring 
that the pipeline size passed to BlockPlacementPolicy.chooseTarget in the 
replication parameter is at least:

max(replication, ${dfs.namenode.minPipelineSize})




> Write resiliency for replica count 1
> 
>
> Key: HDFS-5434
> URL: https://issues.apache.org/jira/browse/HDFS-5434
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0
>Reporter: Buddy
>Priority: Minor
>
> If a file has a replica count of one, the HDFS client is exposed to write 
> failures if the data node fails during a write. With a pipeline of size of 
> one, no recovery is possible if the sole data node dies.
> A simple fix is to force a minimum pipeline size of 2, while leaving the 
> replication count as 1. The implementation for this is fairly non-invasive.
> Although the replica count is one, the block will be written to two data 
> nodes instead of one. If one of the data nodes fails during the write, normal 
> pipeline recovery will ensure that the write succeeds to the other data node.
> The existing code in the name node will prune the extra replica when it 
> receives the block received reports for the finalized block from both data 
> nodes. This results in the intended replica count of one for the block.
> This behavior should be controlled by a configuration option such as 
> dfs.namenode.minPipelineSize.
> This behavior can be implemented in FSNameSystem.getAdditionalBlock by 
> ensuring that the pipeline size passed to BlockPlacementPolicy.chooseTarget 
> in the replication parameter is at least:
> {code:java}
> max(replication, ${dfs.namenode.minPipelineSize})
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5434) Write resiliency for replica count 1

2013-10-28 Thread Buddy (JIRA)
Buddy created HDFS-5434:
---

 Summary: Write resiliency for replica count 1
 Key: HDFS-5434
 URL: https://issues.apache.org/jira/browse/HDFS-5434
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.2.0
Reporter: Buddy
Priority: Minor


If a file has a replica count of one, the HDFS client is exposed to write 
failures if the data node fails during a write. With a pipeline of size of one, 
no recovery is possible if the sole data node dies.

A simple fix is to force a minimum pipeline size of 2, while leaving the 
replication count as 1. The implementation for this is fairly non-invasive.

Although the replica count is one, the block will be written to two data nodes 
instead of one. If one of the data nodes fails during the write, normal 
pipeline recovery will ensure that the write succeeds to the other data node.

The existing code in the name node will prune the extra replica when it 
receives the block received reports for the finalized block from both data 
nodes. This results in the intended replica count of one for the block.

This behavior should be controlled by a configuration option such as 
dfs.namenode.minPipelineSize.

This behavior can be implemented in FSNameSystem.getAdditionalBlock by ensuring 
that the pipeline size passed to BlockPlacementPolicy.chooseTarget in the 
replication parameter is at least:

max(replication, ${dfs.namenode.minPipelineSize})
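
For context only, the failure scenario described above can be reproduced from the client side with a plain single-replica write; the path below is just an example, and nothing in this sketch depends on the proposed change.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SingleReplicaWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path p = new Path("/tmp/replica-one-example.txt"); // example path only

    // Replication factor 1: today the write pipeline contains a single
    // datanode, so that datanode failing mid-write fails the whole write.
    try (FSDataOutputStream out =
             fs.create(p, true, 4096, (short) 1, fs.getDefaultBlockSize(p))) {
      out.writeUTF("replica count 1, pipeline size 1");
    }
    // Under the proposal the pipeline would hold two datanodes during the
    // write, while the finalized block keeps its replication factor of 1.
  }
}
{code}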





--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5215) dfs.datanode.du.reserved is not taking effect as it's not considered while getting the available space

2013-10-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13806756#comment-13806756
 ] 

Hadoop QA commented on HDFS-5215:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12610538/HDFS-5215.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5291//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5291//console

This message is automatically generated.

> dfs.datanode.du.reserved is not taking effect as it's not considered while 
> getting the available space
> --
>
> Key: HDFS-5215
> URL: https://issues.apache.org/jira/browse/HDFS-5215
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Fix For: 3.0.0
>
> Attachments: HDFS-5215.patch
>
>
> {code}public long getAvailable() throws IOException {
> long remaining = getCapacity()-getDfsUsed();
> long available = usage.getAvailable();
> if (remaining > available) {
>   remaining = available;
> }
> return (remaining > 0) ? remaining : 0;
>   } 
> {code}
> Here we are not considering the reserved space while getting the Available 
> Space.
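
The attached HDFS-5215.patch is not reproduced in this thread, so purely as an illustration, here is one possible way to account for the reserved bytes; the field names below are simplified stand-ins, not the datanode's actual FsVolumeImpl code.

{code:java}
// Illustrative sketch only, with simplified stand-in fields.
public class AvailableSpaceSketch {
  private final long capacity;  // volume capacity as reported by getCapacity()
  private final long dfsUsed;   // bytes already used by HDFS block files
  private final long reserved;  // dfs.datanode.du.reserved
  private final long diskFree;  // free space reported by the underlying filesystem

  AvailableSpaceSketch(long capacity, long dfsUsed, long reserved, long diskFree) {
    this.capacity = capacity;
    this.dfsUsed = dfsUsed;
    this.reserved = reserved;
    this.diskFree = diskFree;
  }

  /** Available space with the reserved bytes excluded from the free-space figure. */
  long getAvailable() {
    long remaining = capacity - dfsUsed;
    // Subtract the reserved space from what the filesystem reports as free,
    // so the reserved bytes are never handed out as available to HDFS.
    long available = diskFree - reserved;
    if (remaining > available) {
      remaining = available;
    }
    return remaining > 0 ? remaining : 0;
  }
}
{code}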



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5215) dfs.datanode.du.reserved is not taking effect as it's not considered while getting the available space

2013-10-28 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-5215:
---

Attachment: HDFS-5215.patch

> dfs.datanode.du.reserved is not taking effect as it's not considered while 
> getting the available space
> --
>
> Key: HDFS-5215
> URL: https://issues.apache.org/jira/browse/HDFS-5215
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Fix For: 3.0.0
>
> Attachments: HDFS-5215.patch
>
>
> {code}public long getAvailable() throws IOException {
> long remaining = getCapacity()-getDfsUsed();
> long available = usage.getAvailable();
> if (remaining > available) {
>   remaining = available;
> }
> return (remaining > 0) ? remaining : 0;
>   } 
> {code}
> Here we are not considering the reserved space while getting the Available 
> Space.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

