[jira] [Updated] (HDFS-4273) Problem in DFSInputStream read retry logic may cause early failure

2013-12-17 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated HDFS-4273:


Attachment: HDFS-4273.v6.patch

Updated the patch because of a conflict with a recent commit. Changes:
Changed getStorageID to getDatanodeUuid.
[~umamaheswararao], if you are reviewing, please use the latest patch.

> Problem in DFSInputStream read retry logic may cause early failure
> --
>
> Key: HDFS-4273
> URL: https://issues.apache.org/jira/browse/HDFS-4273
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.2-alpha
>Reporter: Binglin Chang
>Assignee: Binglin Chang
>Priority: Minor
> Attachments: HDFS-4273-v2.patch, HDFS-4273.patch, HDFS-4273.v3.patch, 
> HDFS-4273.v4.patch, HDFS-4273.v5.patch, HDFS-4273.v6.patch, 
> TestDFSInputStream.java
>
>
> Assume the following call logic
> {noformat} 
> readWithStrategy()
>   -> blockSeekTo()
>   -> readBuffer()
>  -> reader.doRead()
>  -> seekToNewSource() adds currentNode to deadNodes, hoping to get a 
> different datanode
> -> blockSeekTo()
>    -> chooseDataNode()
>   -> block missing, clear deadNodes and pick the currentNode again
> seekToNewSource() returns false
>  readBuffer() re-throws the exception and quits the loop
> readWithStrategy() gets the exception and may fail the read call before 
> MaxBlockAcquireFailures attempts have been tried.
> {noformat} 
> Some issues with this logic:
> 1. The seekToNewSource() logic is broken, because it may clear deadNodes in 
> the middle.
> 2. The variable "int retries=2" in readWithStrategy seems to conflict with 
> MaxBlockAcquireFailures; should it be removed?
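For illustration, a minimal sketch of the loop shape being described (a hypothetical simplification, not the actual DFSInputStream code):

{code}
// Hypothetical simplification of the flawed retry loop (not the real code).
class RetrySketch {
  int readBuffer() throws java.io.IOException {    // stands in for the helper
    throw new java.io.IOException("read failed");  // simulate a bad replica
  }

  int readWithStrategy() throws java.io.IOException {
    java.io.IOException last = null;
    int retries = 2;                   // separate, smaller budget than
    while (retries > 0) {              // dfs.client.max.block.acquire.failures
      try {
        return readBuffer();           // may call seekToNewSource() internally
      } catch (java.io.IOException e) {
        // seekToNewSource() added currentNode to deadNodes, but on "block
        // missing" chooseDataNode() cleared deadNodes and picked the same
        // node again, so seekToNewSource() returned false and the exception
        // was re-thrown to here.
        last = e;
        retries--;                     // can give up before the block-acquire
      }                                // failure budget is exhausted
    }
    throw last;
  }
}
{code}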



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5676) fix inconsistent synchronization of CachingStrategy

2013-12-17 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5676:
---

Attachment: (was: HDFS-5676.002.patch)

> fix inconsistent synchronization of CachingStrategy
> ---
>
> Key: HDFS-5676
> URL: https://issues.apache.org/jira/browse/HDFS-5676
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-5676.001.patch
>
>
> Currently, the synchronization for {{CachingStrategy}} is a little 
> inconsistent.  DFSOutputStream#setDropBehind modifies the strategy object, 
> but there's nothing protecting that object against concurrent use in 
> {{createBlockOutputStream}}.  Similarly, {{DFSInputStream#setDropBehind}} is 
> synchronized, but not all the uses of {{cachingStrategy}} are.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5676) fix inconsistent synchronization of CachingStrategy

2013-12-17 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851485#comment-13851485
 ] 

Colin Patrick McCabe commented on HDFS-5676:


Actually, looking at this again, I'm not sure {{synchronized}} would work here. 
I don't think it would be good for the worker thread to block on the 
DFSOutputStream lock.  I think either an atomic reference or another lock 
protecting just the {{cachingStrategy}} is the way to go here.
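For illustration, a minimal sketch of the atomic-reference option (class and field names are hypothetical; the real CachingStrategy and stream fields differ):

{code}
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable strategy; the real CachingStrategy has more state.
final class Strategy {
  final Boolean dropBehind;
  final Long readahead;
  Strategy(Boolean dropBehind, Long readahead) {
    this.dropBehind = dropBehind;
    this.readahead = readahead;
  }
}

class StreamSketch {
  // Writers publish a fresh immutable object; readers just load the
  // reference, so neither side blocks on the DFSOutputStream lock.
  private final AtomicReference<Strategy> cachingStrategy =
      new AtomicReference<>(new Strategy(null, null));

  void setDropBehind(Boolean dropBehind) {
    Strategy prev;
    do {  // copy-on-write update
      prev = cachingStrategy.get();
    } while (!cachingStrategy.compareAndSet(
        prev, new Strategy(dropBehind, prev.readahead)));
  }

  Strategy snapshotForBlockOutputStream() {
    return cachingStrategy.get();  // consistent snapshot, never blocks
  }
}
{code}

The trade-off is one small allocation per update, which is rare, in exchange for lock-free reads on the worker thread.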

> fix inconsistent synchronization of CachingStrategy
> ---
>
> Key: HDFS-5676
> URL: https://issues.apache.org/jira/browse/HDFS-5676
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-5676.001.patch, HDFS-5676.002.patch
>
>
> Currently, the synchronization for {{CachingStrategy}} is a little 
> inconsistent.  DFSOutputStream#setDropBehind modifies the strategy object, 
> but there's nothing protecting that object against concurrent use in 
> {{createBlockOutputStream}}.  Similarly, {{DFSInputStream#setDropBehind}} is 
> synchronized, but not all the uses of {{cachingStrategy}} are.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5675) Add Mkdirs operation to NNThroughputBenchmark

2013-12-17 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851471#comment-13851471
 ] 

Konstantin Shvachko commented on HDFS-5675:
---

"mkdirsr" is a typo. OP_MKDIRS_USAGE is sticking out beyond 80 width. And just 
saw the comment "Do file create" is irrelevant for mkdirs.

> Add Mkdirs operation to NNThroughputBenchmark
> -
>
> Key: HDFS-5675
> URL: https://issues.apache.org/jira/browse/HDFS-5675
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: benchmarks
>Reporter: Plamen Jeliazkov
>Assignee: Plamen Jeliazkov
>Priority: Minor
> Attachments: mkdirsBenchmarkPatchTrunk.patch, 
> mkdirsBenchmarkPatchTrunk_2.patch
>
>
> I did some work to extend NNThroughputBenchmark that I would like to 
> contribute to the community. It is pretty straightforward; just adding a 
> Mkdir operation to the test in order to see the operations per second of 
> multiple 'mkdir' commands.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5574) Remove buffer copy in BlockReader.skip

2013-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851424#comment-13851424
 ] 

Hadoop QA commented on HDFS-5574:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12619250/HDFS-5574.v3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup
  org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5755//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5755//console

This message is automatically generated.

> Remove buffer copy in BlockReader.skip
> --
>
> Key: HDFS-5574
> URL: https://issues.apache.org/jira/browse/HDFS-5574
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Binglin Chang
>Assignee: Binglin Chang
>Priority: Trivial
> Attachments: HDFS-5574.v1.patch, HDFS-5574.v2.patch, 
> HDFS-5574.v3.patch
>
>
> BlockReaderLocal.skip and RemoteBlockReader.skip use a temp buffer to read 
> data into; this is not necessary. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5662) Can't decommission a DataNode due to file's replication factor larger than the rest of the cluster size

2013-12-17 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851408#comment-13851408
 ] 

Arpit Agarwal commented on HDFS-5662:
-

Minor comment - the contents of the {{if}} block in BlockManager.java have an 
extra level of indentation.

+1 otherwise.


> Can't decommission a DataNode due to file's replication factor larger than 
> the rest of the cluster size
> ---
>
> Key: HDFS-5662
> URL: https://issues.apache.org/jira/browse/HDFS-5662
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-5662.001.patch
>
>
> A datanode can't be decommissioned if it has a replica that belongs to a file 
> with a replication factor larger than the rest of the cluster size.
> One way to fix this is to have some kind of minimum replication factor 
> setting so that any datanode can be decommissioned regardless of the largest 
> replication factor it's associated with. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5676) fix inconsistent synchronization of CachingStrategy

2013-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851388#comment-13851388
 ] 

Hadoop QA commented on HDFS-5676:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12619237/HDFS-5676.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5754//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5754//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5754//console

This message is automatically generated.

> fix inconsistent synchronization of CachingStrategy
> ---
>
> Key: HDFS-5676
> URL: https://issues.apache.org/jira/browse/HDFS-5676
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-5676.001.patch, HDFS-5676.002.patch
>
>
> Currently, the synchronization for {{CachingStrategy}} is a little 
> inconsistent.  DFSOutputStream#setDropBehind modifies the strategy object, 
> but there's nothing protecting that object against concurrent use in 
> {{createBlockOutputStream}}.  Similarly, {{DFSInputStream#setDropBehind}} is 
> synchronized, but not all the uses of {{cachingStrategy}} are.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5675) Add Mkdirs operation to NNThroughputBenchmark

2013-12-17 Thread Plamen Jeliazkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Plamen Jeliazkov updated HDFS-5675:
---

Attachment: mkdirsBenchmarkPatchTrunk_2.patch

New patch addressing Konstantin's comments.

> Add Mkdirs operation to NNThroughputBenchmark
> -
>
> Key: HDFS-5675
> URL: https://issues.apache.org/jira/browse/HDFS-5675
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: benchmarks
>Reporter: Plamen Jeliazkov
>Assignee: Plamen Jeliazkov
>Priority: Minor
> Attachments: mkdirsBenchmarkPatchTrunk.patch, 
> mkdirsBenchmarkPatchTrunk_2.patch
>
>
> I did some work to extend NNThroughputBenchmark that I would like to 
> contribute to the community. It is pretty straightforward; just adding a 
> Mkdir operation to the test in order to see the operations per second of 
> multiple 'mkdir' commands.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5662) Can't decommission a DataNode due to file's replication factor larger than the rest of the cluster size

2013-12-17 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851375#comment-13851375
 ] 

Brandon Li commented on HDFS-5662:
--

The patch basically allows a datanode to be decommissioned as long as either 
the block's replication factor or the default replication factor is satisfied. 
Replication tasks are still created for the under-replicated blocks on the 
decommissioning datanode. 

A unit test is added to validate the fix.
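For illustration, a minimal sketch of the check described above (hypothetical names, not the actual BlockManager code):

{code}
// Hypothetical sketch: a block stops stalling decommission once it has
// min(expected, default) live replicas, even if expected > cluster size.
class DecommissionCheckSketch {
  static boolean stillBlocksDecommission(int liveReplicas,
                                         int expectedReplication,
                                         int defaultReplication) {
    int required = Math.min(expectedReplication, defaultReplication);
    return liveReplicas < required;  // true -> keep the node decommissioning
  }
}
{code}

For example, on a 3-node cluster with a default replication of 3, a block of a replication-10 file that has 3 live replicas would no longer block decommission, while its replication tasks stay queued.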

> Can't decommission a DataNode due to file's replication factor larger than 
> the rest of the cluster size
> ---
>
> Key: HDFS-5662
> URL: https://issues.apache.org/jira/browse/HDFS-5662
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-5662.001.patch
>
>
> A datanode can't be decommissioned if it has a replica that belongs to a file 
> with a replication factor larger than the rest of the cluster size.
> One way to fix this is to have some kind of minimum replication factor 
> setting so that any datanode can be decommissioned regardless of the largest 
> replication factor it's associated with. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5675) Add Mkdirs operation to NNThroughputBenchmark

2013-12-17 Thread Plamen Jeliazkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Plamen Jeliazkov updated HDFS-5675:
---

Target Version/s: 3.0.0

> Add Mkdirs operation to NNThroughputBenchmark
> -
>
> Key: HDFS-5675
> URL: https://issues.apache.org/jira/browse/HDFS-5675
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: benchmarks
>Reporter: Plamen Jeliazkov
>Assignee: Plamen Jeliazkov
>Priority: Minor
> Attachments: mkdirsBenchmarkPatchTrunk.patch
>
>
> I did some work to extend NNThroughputBenchmark that I would like to 
> contribute to the community. It is pretty straightforward; just adding a 
> Mkdir operation to the test in order to see the operations per second of 
> multiple 'mkdir' commands.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5675) Add Mkdirs operation to NNThroughputBenchmark

2013-12-17 Thread Plamen Jeliazkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Plamen Jeliazkov updated HDFS-5675:
---

Fix Version/s: (was: 3.0.0)

> Add Mkdirs operation to NNThroughputBenchmark
> -
>
> Key: HDFS-5675
> URL: https://issues.apache.org/jira/browse/HDFS-5675
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: benchmarks
>Reporter: Plamen Jeliazkov
>Assignee: Plamen Jeliazkov
>Priority: Minor
> Attachments: mkdirsBenchmarkPatchTrunk.patch
>
>
> I did some work to extend NNThroughputBenchmark that I would like to 
> contribute to the community. It is pretty straightforward; just adding a 
> Mkdir operation to the test in order to see the operations per second of 
> multiple 'mkdir' commands.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5653) Log namenode hostname in various exceptions being thrown in a HA setup

2013-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851362#comment-13851362
 ] 

Hadoop QA commented on HDFS-5653:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12619224/HDFS-5653.003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5753//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5753//console

This message is automatically generated.

> Log namenode hostname in various exceptions being thrown in a HA setup
> --
>
> Key: HDFS-5653
> URL: https://issues.apache.org/jira/browse/HDFS-5653
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha
>Affects Versions: 2.2.0
>Reporter: Arpit Gupta
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-5653.000.patch, HDFS-5653.001.patch, 
> HDFS-5653.002.patch, HDFS-5653.003.patch
>
>
> In an HA setup, any time we see an exception such as safemode or namenode in 
> standby, we don't know which namenode it came from. The user has to go to 
> the logs of the namenode and determine which one was active and/or standby 
> around the same time.
> I think it would help with debugging if any such exceptions could include the 
> namenode hostname, so the user could know exactly which namenode served the 
> request.
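For illustration, a minimal sketch of the kind of tagging the description asks for (a hypothetical helper; the actual patch may do this differently):

{code}
import java.io.IOException;
import java.net.InetSocketAddress;

// Hypothetical sketch: prefix the failing namenode's address onto the
// exception message so client logs show which NN served the request.
class NnExceptionTagSketch {
  static IOException tag(IOException e, InetSocketAddress nnAddr) {
    return new IOException("Exception from namenode " + nnAddr + ": "
        + e.getMessage(), e);
  }
}
{code}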



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5662) Can't decommission a DataNode due to file's replication factor larger than the rest of the cluster size

2013-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851342#comment-13851342
 ] 

Hadoop QA commented on HDFS-5662:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12619219/HDFS-5662.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5752//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5752//console

This message is automatically generated.

> Can't decommission a DataNode due to file's replication factor larger than 
> the rest of the cluster size
> ---
>
> Key: HDFS-5662
> URL: https://issues.apache.org/jira/browse/HDFS-5662
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-5662.001.patch
>
>
> A datanode can't be decommissioned if it has a replica that belongs to a file 
> with a replication factor larger than the rest of the cluster size.
> One way to fix this is to have some kind of minimum replication factor 
> setting so that any datanode can be decommissioned regardless of the largest 
> replication factor it's associated with. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5574) Remove buffer copy in BlockReader.skip

2013-12-17 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated HDFS-5574:


Attachment: HDFS-5574.v3.patch

Thanks for the comments, Colin. The new v3 patch just removes the 
BlockReaderLocal changes from the v2 patch. Since RemoteBlockReader is 
deprecated, I keep it unchanged. 
Another change is adding synchronized to RemoteBlockReader2.read(ByteBuffer 
buf), since there is a findbugs warning.
I also added a test for DFSInputStream.skip, which calls BlockReader.skip. 
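For context, a minimal sketch of the skip-without-copy idea (field names are hypothetical, not the actual reader code): consume what is already buffered by moving positions, instead of reading the skipped bytes into a throwaway buffer.

{code}
import java.nio.ByteBuffer;

// Hypothetical sketch: skip by advancing positions, no temp-buffer copy.
class SkipSketch {
  private final ByteBuffer buffered;  // bytes already read from the block
  private long dataPos;               // offset of the next read in the block

  SkipSketch(ByteBuffer buffered, long dataPos) {
    this.buffered = buffered;
    this.dataPos = dataPos;
  }

  long skip(long n) {
    // 1) consume bytes that are already buffered: just move the position
    int fromBuffer = (int) Math.min(n, buffered.remaining());
    buffered.position(buffered.position() + fromBuffer);
    // 2) advance the block offset for the rest; the next read seeks there
    dataPos += n - fromBuffer;
    return n;
  }
}
{code}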

> Remove buffer copy in BlockReader.skip
> --
>
> Key: HDFS-5574
> URL: https://issues.apache.org/jira/browse/HDFS-5574
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Binglin Chang
>Assignee: Binglin Chang
>Priority: Trivial
> Attachments: HDFS-5574.v1.patch, HDFS-5574.v2.patch, 
> HDFS-5574.v3.patch
>
>
> BlockReaderLocal.skip and RemoteBlockReader.skip use a temp buffer to read 
> data into; this is not necessary. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5579) Under construction files make DataNode decommission take very long hours

2013-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851322#comment-13851322
 ] 

Hadoop QA commented on HDFS-5579:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12617130/HDFS-5579.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5750//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5750//console

This message is automatically generated.

> Under construction files make DataNode decommission take very long hours
> 
>
> Key: HDFS-5579
> URL: https://issues.apache.org/jira/browse/HDFS-5579
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 1.2.0, 2.2.0
>Reporter: zhaoyunjiong
>Assignee: zhaoyunjiong
> Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch
>
>
> We noticed that sometimes decommissioning DataNodes takes a very long time, 
> even exceeding 100 hours.
> After checking the code, I found that 
> BlockManager:computeReplicationWorkForBlocks(List<List<Block>> 
> blocksToReplicate) won't replicate blocks which belong to 
> under-construction files; however, in 
> BlockManager:isReplicationInProgress(DatanodeDescriptor srcNode), if there 
> is a block that needs replication, no matter whether it belongs to an 
> under-construction file or not, the decommissioning stays in progress.
> That's the reason the decommission sometimes takes a very long time.
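To make the mismatch concrete, a hypothetical simplification of the two paths (the helper names are made up for illustration):

{code}
// Hypothetical sketch of the two inconsistent BlockManager paths.
abstract class DecommissionProgressSketch {
  abstract boolean isUnderConstruction(long blockId);
  abstract boolean isUnderReplicated(long blockId);

  // Replication work generator: skips blocks of under-construction files.
  boolean shouldScheduleReplication(long blockId) {
    return !isUnderConstruction(blockId) && isUnderReplicated(blockId);
  }

  // Decommission progress check: no under-construction filter, so it keeps
  // waiting on blocks the generator will never schedule.
  boolean isReplicationInProgress(long[] blocksOnNode) {
    for (long blockId : blocksOnNode) {
      if (isUnderReplicated(blockId)) {
        return true;
      }
    }
    return false;
  }
}
{code}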



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5674) Editlog code cleanup

2013-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851309#comment-13851309
 ] 

Hudson commented on HDFS-5674:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4907 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4907/])
HDFS-5674. Editlog code cleanup: remove @SuppressWarnings("deprecation") in 
FSEditLogOp; change FSEditLogOpCodes.fromByte(..) to be more efficient; and 
change Some fields in FSEditLog to final. (szetszwo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1551812)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java


> Editlog code cleanup
> 
>
> Key: HDFS-5674
> URL: https://issues.apache.org/jira/browse/HDFS-5674
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Minor
> Fix For: 2.4.0
>
> Attachments: h5674_20131217.patch, h5674_20131218.patch
>
>
> A few minor improvements:
> - \@SuppressWarnings("deprecation") in FSEditLogOp can be removed.
> - FSEditLogOpCodes.fromByte(..) can be more efficient.
> - Some fields in FSEditLog can be final.
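For illustration, one common way to make a byte-to-enum fromByte(..) more efficient (a hypothetical sketch, not necessarily the committed change): index a cached table by opcode instead of scanning values() on every call.

{code}
// Hypothetical sketch: O(1) byte -> enum decode via a cached table.
enum OpCodeSketch {
  OP_ADD((byte) 0), OP_DELETE((byte) 2), OP_MKDIR((byte) 3);

  private final byte opCode;
  OpCodeSketch(byte opCode) { this.opCode = opCode; }

  private static final OpCodeSketch[] TABLE = new OpCodeSketch[256];
  static {
    for (OpCodeSketch op : values()) {
      TABLE[op.opCode & 0xff] = op;  // built once, reused for every decode
    }
  }

  static OpCodeSketch fromByte(byte b) {
    return TABLE[b & 0xff];          // null for unknown opcodes
  }
}
{code}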



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5676) fix inconsistent synchronization of CachingStrategy

2013-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851300#comment-13851300
 ] 

Hadoop QA commented on HDFS-5676:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12619204/HDFS-5676.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5749//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5749//console

This message is automatically generated.

> fix inconsistent synchronization of CachingStrategy
> ---
>
> Key: HDFS-5676
> URL: https://issues.apache.org/jira/browse/HDFS-5676
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-5676.001.patch, HDFS-5676.002.patch
>
>
> Currently, the synchronization for {{CachingStrategy}} is a little 
> inconsistent.  DFSOutputStream#setDropBehind modifies the strategy object, 
> but there's nothing protecting that object against concurrent use in 
> {{createBlockOutputStream}}.  Similarly, {{DFSInputStream#setDropBehind}} is 
> synchronized, but not all the uses of {{cachingStrategy}} are.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5674) Editlog code cleanup

2013-12-17 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5674:
-

   Resolution: Fixed
Fix Version/s: 2.4.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I have committed this.

> Editlog code cleanup
> 
>
> Key: HDFS-5674
> URL: https://issues.apache.org/jira/browse/HDFS-5674
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Minor
> Fix For: 2.4.0
>
> Attachments: h5674_20131217.patch, h5674_20131218.patch
>
>
> A few minor improvements:
> - \@SuppressWarnings("deprecation") in FSEditLogOp can be removed.
> - FSEditLogOpCodes.fromByte(..) can be more efficient.
> - Some fields in FSEditLog can be final.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5676) fix inconsistent synchronization of CachingStrategy

2013-12-17 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5676:
---

Attachment: HDFS-5676.002.patch

> fix inconsistent synchronization of CachingStrategy
> ---
>
> Key: HDFS-5676
> URL: https://issues.apache.org/jira/browse/HDFS-5676
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-5676.001.patch, HDFS-5676.002.patch
>
>
> Currently, the synchronization for {{CachingStrategy}} is a little 
> inconsistent.  DFSOutputStream#setDropBehind modifies the strategy object, 
> but there's nothing protecting that object against concurrent use in 
> {{createBlockOutputStream}}.  Similarly, {{DFSInputStream#setDropBehind}} is 
> synchronized, but not all the uses of {{cachingStrategy}} are.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5676) fix inconsistent synchronization of CachingStrategy

2013-12-17 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851292#comment-13851292
 ] 

Colin Patrick McCabe commented on HDFS-5676:


The issue is more that in createBlockOutputStream, there is no synchronized 
block, so there's no obvious place to put a new one.  But there was an atomic 
boolean, so this seemed like the obvious pattern to follow.  Anyway, I can make 
it synchronized.

> fix inconsistent synchronization of CachingStrategy
> ---
>
> Key: HDFS-5676
> URL: https://issues.apache.org/jira/browse/HDFS-5676
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-5676.001.patch
>
>
> Currently, the synchronization for {{CachingStrategy}} is a little 
> inconsistent.  DFSOutputStream#setDropBehind modifies the strategy object, 
> but there's nothing protecting that object against concurrent use in 
> {{createBlockOutputStream}}.  Similarly, {{DFSInputStream#setDropBehind}} is 
> synchronized, but not all the uses of {{cachingStrategy}} are.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5674) Editlog code cleanup

2013-12-17 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5674:
-

Attachment: h5674_20131218.patch

Thanks Jing for reviewing this.

h5674_20131218.patch: add a comment for OP_INVALID.

> Editlog code cleanup
> 
>
> Key: HDFS-5674
> URL: https://issues.apache.org/jira/browse/HDFS-5674
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Minor
> Attachments: h5674_20131217.patch, h5674_20131218.patch
>
>
> A few minor improvements:
> - \@SuppressWarnings("deprecation") in FSEditLogOp can be removed.
> - FSEditLogOpCodes.fromByte(..) can be more efficient.
> - Some fields in FSEditLog can be final.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5453) Support fine grain locking in FSNamesystem

2013-12-17 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851274#comment-13851274
 ] 

Konstantin Shvachko commented on HDFS-5453:
---

Daryn, just wanted to remind you of HADOOP-3860, where we decided some years 
ago that fine-grained locking is not worth pursuing. [See summary 
here|https://issues.apache.org/jira/browse/HADOOP-3860?focusedCommentId=12618038&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12618038]
There are three reasons:
# Performance gains are minimal. ~2-15% is almost within the measurement 
threshold, and write operations (those using the write lock) are IO-bound, not 
CPU-bound, because of journalling.
# Increased code complexity: more locks means more errors.
# The actual NameNode load on a busy cluster never reaches its maximum 
execution capacity. The load used to be two orders of magnitude lower than 
what the NN can handle.

Performance numbers and use cases have changed since then, but the arguments 
may still be valid.
Checking peak and average load on one of your clusters could answer some of 
these questions.

> Support fine grain locking in FSNamesystem
> --
>
> Key: HDFS-5453
> URL: https://issues.apache.org/jira/browse/HDFS-5453
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>
> The namesystem currently uses a coarse-grained lock to control access.  This 
> prevents concurrent writers in different branches of the tree, and prevents 
> readers from accessing branches that writers aren't using.
> Features that introduce latency to namesystem operations, such as cold 
> storage of inodes, will need fine grain locking to avoid degrading the entire 
> namesystem's throughput.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5653) Log namenode hostname in various exceptions being thrown in a HA setup

2013-12-17 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5653:
-

Attachment: HDFS-5653.003.patch

> Log namenode hostname in various exceptions being thrown in a HA setup
> --
>
> Key: HDFS-5653
> URL: https://issues.apache.org/jira/browse/HDFS-5653
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha
>Affects Versions: 2.2.0
>Reporter: Arpit Gupta
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-5653.000.patch, HDFS-5653.001.patch, 
> HDFS-5653.002.patch, HDFS-5653.003.patch
>
>
> In an HA setup, any time we see an exception such as safemode or namenode in 
> standby, we don't know which namenode it came from. The user has to go to 
> the logs of the namenode and determine which one was active and/or standby 
> around the same time.
> I think it would help with debugging if any such exceptions could include the 
> namenode hostname, so the user could know exactly which namenode served the 
> request.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5442) Zero loss HDFS data replication for multiple datacenters

2013-12-17 Thread Jerry Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851219#comment-13851219
 ] 

Jerry Chen commented on HDFS-5442:
--

{quote}It might be good to break up the work into two major features.{quote}
Logically, yes. And just as you mentioned, the user will have the flexibility 
to choose between the sync and async features based on their needs. On the 
other hand, from a design perspective, the two features share some common 
concepts and facilities, and serve the common requirement of cross-datacenter 
replication. We also see the need for sync replication and async replication 
to be used at the same time, complementing each other for different data 
characteristics.

{quote}There seems to be assumption of replication of entire namespace at few 
places. This might not be desirable in many cases. Enabling this feature per 
directory or list of directories would be very useful.{quote}
Since the namespace replication is based on namespace journaling, replicating 
the entire namespace is conceptually straightforward and simple. Per-directory 
namespace replication can be done by filtering, but that would complicate the 
whole thing, since the edit logs for a directory don't form a closure in 
namespace journaling. On the other hand, the data plays a critical role in 
cross-datacenter replication: users can configure a list of directories whose 
data is replicated synchronously, while the data of other directories is 
replicated asynchronously. We will target entire-namespace replication in the 
phase-1 work, and can consider partial namespace replication in phase-2 once 
we understand its exact impact.
{quote}There seems to be assumption of primary cluster and secondary cluster. 
Can this be chained to having something A->B and B->C. Or even the use case of 
A->B or B->A. Calling out those with configuration options would be very useful 
for cluster admins.{quote}
In the design, the secondary and primary clusters operate differently. To 
support a chain like A->B and B->C, a secondary cluster would have to act as a 
primary cluster for C. This needs extra chaining-specific work, so I would 
suggest considering it as a future improvement. When we talk about chained 
clusters, I would tend to consider asynchronous replication only; this would 
simplify things a little. Reversing/switching the primary and secondary 
cluster roles is supported, but this doesn't mean two-way replication at the 
same time.
{quote}Another place which would need more information is about primary cluster 
NN tracking datanode information from secondary cluster (via secondary cluster 
NN). This needs to be thought to see if this is really scalable.{quote}
We should assume that the datanode information tracked by the primary cluster 
is kept to a minimum, and that this information is updated in batches via the 
secondary cluster NN. For network communication, our goal is to send the 
secondary cluster details only when there is a real change in DN state, and to 
do so in batches. For example, when a DN expires on the secondary cluster, or 
a DN's space is completely filled and no new data can be written to it, we 
report those DNs. We skip reporting DNs which are already registered and still 
qualify for writes. Let's communicate the other "how to" details through 
patches.
{quote}How would ReplicationManager or changing replication of files work in 
general with this policy?{quote}
At a high level, we assume the original replication in each local cluster 
still works as it did. The original replication factor applies to the local 
blocks only. The added part is remote block replication, which is triggered 
by the secondary cluster's NameNode.

> Zero loss HDFS data replication for multiple datacenters
> 
>
> Key: HDFS-5442
> URL: https://issues.apache.org/jira/browse/HDFS-5442
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Avik Dey
> Attachments: Disaster Recovery Solution for Hadoop.pdf
>
>
> Hadoop is architected to operate efficiently at scale for normal hardware 
> failures within a datacenter. Hadoop is not designed today to handle 
> datacenter failures. Although HDFS is not designed for nor deployed in 
> configurations spanning multiple datacenters, replicating data from one 
> location to another is common practice for disaster recovery and global 
> service availability. There are current solutions available for batch 
> replication using data copy/export tools. However, while providing some 
> backup capability for HDFS data, they do not provide the capability to 
> recover all your HDFS data from a datacenter failure and be up and running 
> again with a fully operational Hadoop cluster in another datacenter in a 
> matter of minutes. For disaster recovery from a datacenter failure, we should 
> provide a fully distributed, zero data loss, low latency, high throughput and 
> secure HDFS data replication solution for a multiple-datacenter setup.
> Design and code for Phase-1 to follow soon.

[jira] [Commented] (HDFS-5653) Log namenode hostname in various exceptions being thrown in a HA setup

2013-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851220#comment-13851220
 ] 

Hadoop QA commented on HDFS-5653:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12619218/HDFS-5653.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5751//console

This message is automatically generated.

> Log namenode hostname in various exceptions being thrown in a HA setup
> --
>
> Key: HDFS-5653
> URL: https://issues.apache.org/jira/browse/HDFS-5653
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha
>Affects Versions: 2.2.0
>Reporter: Arpit Gupta
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-5653.000.patch, HDFS-5653.001.patch, 
> HDFS-5653.002.patch
>
>
> In an HA setup, any time we see an exception such as safemode or namenode in 
> standby, we don't know which namenode it came from. The user has to go to 
> the logs of the namenode and determine which one was active and/or standby 
> around the same time.
> I think it would help with debugging if any such exceptions could include the 
> namenode hostname, so the user could know exactly which namenode served the 
> request.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5662) Can't decommission a DataNode due to file's replication factor larger than the rest of the cluster size

2013-12-17 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5662:
-

Status: Patch Available  (was: Open)

> Can't decommission a DataNode due to file's replication factor larger than 
> the rest of the cluster size
> ---
>
> Key: HDFS-5662
> URL: https://issues.apache.org/jira/browse/HDFS-5662
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-5662.001.patch
>
>
> A datanode can't be decommissioned if it has a replica that belongs to a file 
> with a replication factor larger than the rest of the cluster size.
> One way to fix this is to have some kind of minimum replication factor 
> setting so that any datanode can be decommissioned regardless of the largest 
> replication factor it's associated with. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5662) Can't decommission a DataNode due to file's replication factor larger than the rest of the cluster size

2013-12-17 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5662:
-

Attachment: HDFS-5662.001.patch

> Can't decommission a DataNode due to file's replication factor larger than 
> the rest of the cluster size
> ---
>
> Key: HDFS-5662
> URL: https://issues.apache.org/jira/browse/HDFS-5662
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-5662.001.patch
>
>
> A datanode can't be decommissioned if it has a replica that belongs to a file 
> with a replication factor larger than the rest of the cluster size.
> One way to fix this is to have some kind of minimum replication factor 
> setting so that any datanode can be decommissioned regardless of the largest 
> replication factor it's associated with. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5653) Log namenode hostname in various exceptions being thrown in a HA setup

2013-12-17 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5653:
-

Attachment: HDFS-5653.002.patch

> Log namenode hostname in various exceptions being thrown in a HA setup
> --
>
> Key: HDFS-5653
> URL: https://issues.apache.org/jira/browse/HDFS-5653
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha
>Affects Versions: 2.2.0
>Reporter: Arpit Gupta
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-5653.000.patch, HDFS-5653.001.patch, 
> HDFS-5653.002.patch
>
>
> In an HA setup, any time we see an exception such as safemode or namenode in 
> standby, we don't know which namenode it came from. The user has to go to 
> the logs of the namenode and determine which one was active and/or standby 
> around the same time.
> I think it would help with debugging if any such exceptions could include the 
> namenode hostname, so the user could know exactly which namenode served the 
> request.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5677) Need error checking for HA cluster configuration

2013-12-17 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851213#comment-13851213
 ] 

Jing Zhao commented on HDFS-5677:
-

I also think this HA configuration error checking will be very useful. 
[~vsheffer], do you want to contribute a patch on this?

> Need error checking for HA cluster configuration
> 
>
> Key: HDFS-5677
> URL: https://issues.apache.org/jira/browse/HDFS-5677
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, ha
>Affects Versions: 2.0.6-alpha
> Environment: centos6.5, oracle jdk6 45, 
>Reporter: Vincent Sheffer
>Priority: Minor
>
> If a node is declared in the *dfs.ha.namenodes.myCluster* but is _not_ later 
> defined in subsequent *dfs.namenode.servicerpc-address.myCluster.nodename* or 
> *dfs.namenode.rpc-address.myCluster.XXX* properties, no error or warning 
> message is provided to indicate that.
> The only indication of a problem is a log message like the following:
> {code}
> WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to 
> server: myCluster:8020
> {code}
> Another way to look at this is that no error or warning is provided when a 
> servicerpc-address/rpc-address property is defined for a node without a 
> corresponding node declared in *dfs.ha.namenodes.myCluster*.
> This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for 
> one of my node names.  It would be very helpful to have at least a warning 
> message on startup if there is a configuration problem like this.
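A minimal sketch of the kind of startup check being requested (hypothetical code; no such validation exists yet):

{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch of the requested validation, not existing code.
class HaConfCheckSketch {
  static void warnOnMissingAddresses(Configuration conf, String nsId) {
    for (String nnId : conf.getTrimmedStrings("dfs.ha.namenodes." + nsId)) {
      String rpcKey = "dfs.namenode.rpc-address." + nsId + "." + nnId;
      String serviceKey =
          "dfs.namenode.servicerpc-address." + nsId + "." + nnId;
      if (conf.get(rpcKey) == null && conf.get(serviceKey) == null) {
        System.err.println("WARN: namenode '" + nnId + "' is declared in "
            + "dfs.ha.namenodes." + nsId + " but has no rpc-address or "
            + "servicerpc-address configured");
      }
    }
  }
}
{code}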



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5675) Add Mkdirs operation to NNThroughputBenchmark

2013-12-17 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851211#comment-13851211
 ] 

Konstantin Shvachko commented on HDFS-5675:
---

mkdirs is indeed an interesting operation for benchmarking, as one of the 
simplest modifications of the namespace.
Comments on the patch:
# The operation should be called mkdirs, not mkdir. And the Stats class should 
be {{MkdirsStats}}, no "File". Don't forget to update the op name and usage 
constants.
# You are now creating a flat collection of directories. It would be good to 
support a multi-level structure, same as {{-filesPerDir}} in 
{{CreateFileStats}}. Too many entries in one directory can be a performance 
issue by itself, so it is good to have the flexibility.

Also, please adjust the jira fields.

> Add Mkdirs operation to NNThroughputBenchmark
> -
>
> Key: HDFS-5675
> URL: https://issues.apache.org/jira/browse/HDFS-5675
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: benchmarks
>Reporter: Plamen Jeliazkov
>Assignee: Plamen Jeliazkov
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: mkdirsBenchmarkPatchTrunk.patch
>
>
> I did some work to extend NNThroughputBenchmark that I would like to 
> contribute to the community. It is pretty straightforward; just adding a 
> Mkdir operation to the test in order to see the operations per second of 
> multiple 'mkdir' commands.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5442) Zero loss HDFS data replication for multiple datacenters

2013-12-17 Thread Jerry Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851209#comment-13851209
 ] 

Jerry Chen commented on HDFS-5442:
--

{quote}There are two clusters in your design document: the primary cluster and 
the secondary cluster. I think we only need one cluster.{quote}
We think it is important to have a clear communication and collaboration 
boundary between the regions (datacenters), for the following reasons:

1. When one datacenter fails, another datacenter should take over with a 
symmetric HA cluster, rather than leave a single cluster with reduced 
resources. 

2. With a single-cluster approach, the impact on the existing HDFS deployment 
concept is huge. An HDFS cluster is no longer one Active NameNode and one 
Standby NameNode: it would span multiple "regions", with two Standby NameNodes 
in each region. The DataNodes would also be split into regions, and the block 
locations would not be shared between NameNodes of different regions, yet they 
would all belong to a single HDFS cluster. These conceptual changes would 
further impact existing Hadoop operations and tooling. 

3. With a single-cluster approach, operations must manage all the nodes across 
datacenters. This may cause unnecessary cross-site communication, and it also 
loses the flexibility of managing each datacenter separately. 

4. We avoid larger questions, such as how upper-level components like HBase 
and YARN could be deployed and run over a single HDFS system across multiple 
sites.

As to QJM, although it can in theory span datacenters, the peers in the backup 
datacenters could easily be out of date when the primary fails. This is 
because only a majority needs to agree, without consideration of location. 

> Zero loss HDFS data replication for multiple datacenters
> 
>
> Key: HDFS-5442
> URL: https://issues.apache.org/jira/browse/HDFS-5442
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Avik Dey
> Attachments: Disaster Recovery Solution for Hadoop.pdf
>
>
> Hadoop is architected to operate efficiently at scale for normal hardware 
> failures within a datacenter. Hadoop is not designed today to handle 
> datacenter failures. Although HDFS is not designed for nor deployed in 
> configurations spanning multiple datacenters, replicating data from one 
> location to another is common practice for disaster recovery and global 
> service availability. There are current solutions available for batch 
> replication using data copy/export tools. However, while providing some 
> backup capability for HDFS data, they do not provide the capability to 
> recover all your HDFS data from a datacenter failure and be up and running 
> again with a fully operational Hadoop cluster in another datacenter in a 
> matter of minutes. For disaster recovery from a datacenter failure, we should 
> provide a fully distributed, zero data loss, low latency, high throughput and 
> secure HDFS data replication solution for a multiple-datacenter setup.
> Design and code for Phase-1 to follow soon.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5242) Reduce contention on DatanodeInfo instances

2013-12-17 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851208#comment-13851208
 ] 

Daryn Sharp commented on HDFS-5242:
---

For the impact, it's been so long that I don't have the numbers handy.  I was 
profiling an 85-node cluster being hammered by s-live, and 
{{getNetworkLocation}} was showing up as a hot spot.

> Reduce contention on DatanodeInfo instances
> ---
>
> Key: HDFS-5242
> URL: https://issues.apache.org/jira/browse/HDFS-5242
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-5242.patch
>
>
> Synchronization in {{DatanodeInfo}} instances causes unnecessary contention 
> between call handlers.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5653) Log namenode hostname in various exceptions being thrown in a HA setup

2013-12-17 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5653:


Assignee: Haohui Mai

> Log namenode hostname in various exceptions being thrown in a HA setup
> --
>
> Key: HDFS-5653
> URL: https://issues.apache.org/jira/browse/HDFS-5653
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha
>Affects Versions: 2.2.0
>Reporter: Arpit Gupta
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-5653.000.patch, HDFS-5653.001.patch
>
>
> In an HA setup, any time we see an exception such as safemode or namenode in 
> standby, we don't know which namenode it came from. The user has to go to 
> the logs of the namenode and determine which one was active and/or standby 
> around the same time.
> I think it would help with debugging if any such exceptions could include the 
> namenode hostname, so the user could know exactly which namenode served the 
> request.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5579) Under construction files make DataNode decommission take very long hours

2013-12-17 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5579:


Status: Patch Available  (was: Open)

> Under construction files make DataNode decommission take very long hours
> 
>
> Key: HDFS-5579
> URL: https://issues.apache.org/jira/browse/HDFS-5579
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0, 1.2.0
>Reporter: zhaoyunjiong
>Assignee: zhaoyunjiong
> Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch
>
>
> We noticed that decommissioning DataNodes sometimes takes a very long time, 
> even exceeding 100 hours.
> After checking the code, I found that 
> BlockManager:computeReplicationWorkForBlocks(List<List<Block>> 
> blocksToReplicate) won't replicate blocks which belong to under-construction 
> files; however, BlockManager:isReplicationInProgress(DatanodeDescriptor 
> srcNode) keeps the decommission in progress whenever a block still needs 
> replication, no matter whether it belongs to an under-construction file or 
> not.
> That's the reason the decommission sometimes takes a very long time.
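> A minimal sketch of the mismatch (illustrative pseudocode only, not the 
> attached patch; {{isUnderConstruction}} is an assumed helper):
> {code}
> // computeReplicationWorkForBlocks(): blocks of files still being written
> // are skipped, so they are never actually replicated.
> if (isUnderConstruction(block)) {
>   continue;
> }
>
> // isReplicationInProgress(): every under-replicated block counts, including
> // under-construction ones, so the node stays in DECOMMISSION_INPROGRESS.
> if (curReplicas < expectedReplicas) {
>   underReplicatedBlocks++;
> }
> {code}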



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5242) Reduce contention on DatanodeInfo instances

2013-12-17 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851206#comment-13851206
 ] 

Daryn Sharp commented on HDFS-5242:
---

An atomic reference would be fine, but I'm not sure it adds value.  The setter 
is called when adding the node to the network topology, after which it is 
never changed.  Another thread can't get the instance w/o going through the 
topology, and the instance isn't in the topology until after the network 
location is set.
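
To illustrate the publication argument (a minimal sketch with assumed names, 
not the attached patch):

{code}
DatanodeDescriptor node = ...;
// The only write happens before the node is published:
node.setNetworkLocation(resolvedLocation);
// Publication point: other threads can only reach the node through the
// topology, whose internal synchronization makes the earlier write visible.
networktopology.add(node);
{code}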

> Reduce contention on DatanodeInfo instances
> ---
>
> Key: HDFS-5242
> URL: https://issues.apache.org/jira/browse/HDFS-5242
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-5242.patch
>
>
> Synchronization in {{DatanodeInfo}} instances causes unnecessary contention 
> between call handlers.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5677) Need error checking for HA cluster configuration

2013-12-17 Thread Vincent Sheffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent Sheffer updated HDFS-5677:
--

Description: 
If a node is declared in the *dfs.ha.namenodes.myCluster* property but is _not_ 
later defined in the subsequent *dfs.namenode.servicerpc-address.myCluster.nodename* 
or *dfs.namenode.rpc-address.myCluster.XXX* properties, no error or warning 
message is provided to indicate that.

The only indication of a problem is a log message like the following:

{code}
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to 
server: myCluster:8020
{code}

Another way to look at this is that no error or warning is provided when a 
servicerpc-address/rpc-address property is defined for a node without a 
corresponding node declared in *dfs.ha.namenodes.myCluster*.

This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for 
one of my node names.  It would be very helpful to have at least a warning 
message on startup if there is a configuration problem like this.
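
A sketch of the kind of startup check that would have caught this (illustrative 
only; the {{Configuration}} accessors exist today, the surrounding variables are 
assumed):

{code}
// For each nameservice, verify every declared namenode ID has an RPC address.
Collection<String> nnIds =
    conf.getTrimmedStringCollection("dfs.ha.namenodes." + nsId);
for (String nnId : nnIds) {
  String rpcKey = "dfs.namenode.rpc-address." + nsId + "." + nnId;
  String serviceKey = "dfs.namenode.servicerpc-address." + nsId + "." + nnId;
  if (conf.get(rpcKey) == null && conf.get(serviceKey) == null) {
    LOG.warn("Namenode " + nnId + " is declared in dfs.ha.namenodes." + nsId
        + " but neither " + rpcKey + " nor " + serviceKey + " is set");
  }
}
{code}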

  was:
If a node is declared in the *dfs.ha.namenodes.myCluster* but is _not_ later 
defined in subsequent *dfs.namenode.servicerpc-address.myCluster.nodename* or 
*dfs.namenode.rpc-address.myCluster.XXX* properties no error or warning message 
is provided to indicate that.

The only indication of a problem is a log message like the following:

{{
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to 
server: myCluster:8020
}}

Another way to look at this is that no error or warning is provided when a 
servicerpc-address/rpc-address property is defined for a node without a 
corresponding node declared in *dfs.ha.namenodes.myCluster*.

This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for 
one of my node names.  It would be very helpful to have at least a warning 
message on startup if there is a configuration problem like this.


> Need error checking for HA cluster configuration
> 
>
> Key: HDFS-5677
> URL: https://issues.apache.org/jira/browse/HDFS-5677
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, ha
>Affects Versions: 2.0.6-alpha
> Environment: centos6.5, oracle jdk6 45, 
>Reporter: Vincent Sheffer
>Priority: Minor
>
> If a node is declared in the *dfs.ha.namenodes.myCluster* property but is _not_ 
> later defined in the subsequent *dfs.namenode.servicerpc-address.myCluster.nodename* 
> or *dfs.namenode.rpc-address.myCluster.XXX* properties, no error or warning 
> message is provided to indicate that.
> The only indication of a problem is a log message like the following:
> {code}
> WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to 
> server: myCluster:8020
> {code}
> Another way to look at this is that no error or warning is provided when a 
> servicerpc-address/rpc-address property is defined for a node without a 
> corresponding node declared in *dfs.ha.namenodes.myCluster*.
> This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for 
> one of my node names.  It would be very helpful to have at least a warning 
> message on startup if there is a configuration problem like this.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (HDFS-5677) Need error checking for HA cluster configuration

2013-12-17 Thread Vincent Sheffer (JIRA)
Vincent Sheffer created HDFS-5677:
-

 Summary: Need error checking for HA cluster configuration
 Key: HDFS-5677
 URL: https://issues.apache.org/jira/browse/HDFS-5677
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, ha
Affects Versions: 2.0.6-alpha
 Environment: centos6.5, oracle jdk6 45, 
Reporter: Vincent Sheffer
Priority: Minor


If a node is declared in the *dfs.ha.namenodes.myCluster* property but is _not_ 
later defined in the subsequent dfs.namenode.servicerpc-address.myCluster.nodename 
or dfs.namenode.rpc-address.myCluster.XXX properties, no error or warning message 
is provided to indicate that.

The only indication of a problem is a log message like the following:

{{
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to 
server: myCluster:8020
}}

Another way to look at this is that no error or warning is provided when a 
servicerpc-address/rpc-address property is defined for a node without a 
corresponding node declared in *dfs.ha.namenodes.myCluster*.

This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for 
one of my node names.  It would be very helpful to have at least a warning 
message on startup if there is a configuration problem like this.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5677) Need error checking for HA cluster configuration

2013-12-17 Thread Vincent Sheffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent Sheffer updated HDFS-5677:
--

Description: 
If a node is declared in the *dfs.ha.namenodes.myCluster* property but is _not_ 
later defined in the subsequent *dfs.namenode.servicerpc-address.myCluster.nodename* 
or *dfs.namenode.rpc-address.myCluster.XXX* properties, no error or warning 
message is provided to indicate that.

The only indication of a problem is a log message like the following:

{{
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to 
server: myCluster:8020
}}

Another way to look at this is that no error or warning is provided when a 
servicerpc-address/rpc-address property is defined for a node without a 
corresponding node declared in *dfs.ha.namenodes.myCluster*.

This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for 
one of my node names.  It would be very helpful to have at least a warning 
message on startup if there is a configuration problem like this.

  was:
If a node is declared in the *dfs.ha.namenodes.myCluster* but is _not_ later 
defined in subsequent dfs.namenode.servicerpc-address.myCluster.nodename or 
dfs.namenode.rpc-address.myCluster.XXX properties no error or warning message 
is provided to indicate that.

The only indication of a problem is a log message like the following:

{{
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to 
server: myCluster:8020
}}

Another way to look at this is that no error or warning is provided when a 
servicerpc-address/rpc-address property is defined for a node without a 
corresponding node declared in *dfs.ha.namenodes.myCluster*.

This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for 
one of my node names.  It would be very helpful to have at least a warning 
message on startup if there is a configuration problem like this.


> Need error checking for HA cluster configuration
> 
>
> Key: HDFS-5677
> URL: https://issues.apache.org/jira/browse/HDFS-5677
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, ha
>Affects Versions: 2.0.6-alpha
> Environment: centos6.5, oracle jdk6 45, 
>Reporter: Vincent Sheffer
>Priority: Minor
>
> If a node is declared in the *dfs.ha.namenodes.myCluster* property but is _not_ 
> later defined in the subsequent *dfs.namenode.servicerpc-address.myCluster.nodename* 
> or *dfs.namenode.rpc-address.myCluster.XXX* properties, no error or warning 
> message is provided to indicate that.
> The only indication of a problem is a log message like the following:
> {{
> WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to 
> server: myCluster:8020
> }}
> Another way to look at this is that no error or warning is provided when a 
> servicerpc-address/rpc-address property is defined for a node without a 
> corresponding node declared in *dfs.ha.namenodes.myCluster*.
> This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for 
> one of my node names.  It would be very helpful to have at least a warning 
> message on startup if there is a configuration problem like this.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5676) fix inconsistent synchronization of CachingStrategy

2013-12-17 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851189#comment-13851189
 ] 

Andrew Wang commented on HDFS-5676:
---

Makes sense, good catch Colin. One comment and one question:

* Need to add caching strategy to method javadoc at DFSInputStream:1062
* Any reason you went with an AtomicReference swap for DFSOutputStream rather 
than just using synchronized? I'd prefer synchronized if it's the same to you; 
it's simpler and probably the same perf-wise, since this is unlikely to be 
contended (see the sketch below).
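
For reference, the synchronized version would look something like this (sketch; 
field and accessor names assumed):

{code}
private CachingStrategy cachingStrategy;

public synchronized void setDropBehind(Boolean dropBehind) {
  // Replace the object rather than mutating it in place.
  cachingStrategy = new CachingStrategy(dropBehind,
      cachingStrategy.getReadahead());
}

private synchronized CachingStrategy getCachingStrategy() {
  return cachingStrategy;  // every reader goes through this
}
{code}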

+1 once addressed and pending Jenkins.

> fix inconsistent synchronization of CachingStrategy
> ---
>
> Key: HDFS-5676
> URL: https://issues.apache.org/jira/browse/HDFS-5676
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-5676.001.patch
>
>
> Currently, the synchronization for {{CachingStrategy}} is a little 
> inconsistent.  DFSOutputStream#setDropBehind modifies the strategy object, 
> but there's nothing protecting that object against concurrent use in 
> {{createBlockOutputStream}}.  Similarly, {{DFSInputStream#setDropBehind}} is 
> synchronized, but not all the uses of {{cachingStrategy}} are.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Work started] (HDFS-5676) fix inconsistent synchronization of CachingStrategy

2013-12-17 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-5676 started by Colin Patrick McCabe.

> fix inconsistent synchronization of CachingStrategy
> ---
>
> Key: HDFS-5676
> URL: https://issues.apache.org/jira/browse/HDFS-5676
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-5676.001.patch
>
>
> Currently, the synchronization for {{CachingStrategy}} is a little 
> inconsistent.  DFSOutputStream#setDropBehind modifies the strategy object, 
> but there's nothing protecting that object against concurrent use in 
> {{createBlockOutputStream}}.  Similarly, {{DFSInputStream#setDropBehind}} is 
> synchronized, but not all the uses of {{cachingStrategy}} are.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5676) fix inconsistent synchronization of CachingStrategy

2013-12-17 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5676:
---

Status: Patch Available  (was: In Progress)

> fix inconsistent synchronization of CachingStrategy
> ---
>
> Key: HDFS-5676
> URL: https://issues.apache.org/jira/browse/HDFS-5676
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-5676.001.patch
>
>
> Currently, the synchronization for {{CachingStrategy}} is a little 
> inconsistent.  DFSOutputStream#setDropBehind modifies the strategy object, 
> but there's nothing protecting that object against concurrent use in 
> {{createBlockOutputStream}}.  Similarly, {{DFSInputStream#setDropBehind}} is 
> synchronized, but not all the uses of {{cachingStrategy}} are.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5676) fix inconsistent synchronization of CachingStrategy

2013-12-17 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5676:
---

Attachment: HDFS-5676.001.patch

Also make CachingStrategy immutable.  This makes it easier to reason about the 
synchronization.
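
Roughly the shape of it (sketch, not the exact patch):

{code}
// All fields final; "modifying" the strategy swaps in a new instance.
public class CachingStrategy {
  private final Boolean dropBehind;  // null means "use the default"
  private final Long readahead;

  public CachingStrategy(Boolean dropBehind, Long readahead) {
    this.dropBehind = dropBehind;
    this.readahead = readahead;
  }
  public Boolean getDropBehind() { return dropBehind; }
  public Long getReadahead() { return readahead; }
}

// In the stream (java.util.concurrent.atomic.AtomicReference):
private final AtomicReference<CachingStrategy> cachingStrategy;

public void setDropBehind(Boolean dropBehind) {
  CachingStrategy prev = cachingStrategy.get();
  cachingStrategy.set(new CachingStrategy(dropBehind, prev.getReadahead()));
}
{code}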

> fix inconsistent synchronization of CachingStrategy
> ---
>
> Key: HDFS-5676
> URL: https://issues.apache.org/jira/browse/HDFS-5676
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-5676.001.patch
>
>
> Currently, the synchronization for {{CachingStrategy}} is a little 
> inconsistent.  DFSOutputStream#setDropBehind modifies the strategy object, 
> but there's nothing protecting that object against concurrent use in 
> {{createBlockOutputStream}}.  Similarly, {{DFSInputStream#setDropBehind}} is 
> synchronized, but not all the uses of {{cachingStrategy}} are.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (HDFS-5676) fix inconsistent synchronization of CachingStrategy

2013-12-17 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-5676:
--

 Summary: fix inconsistent synchronization of CachingStrategy
 Key: HDFS-5676
 URL: https://issues.apache.org/jira/browse/HDFS-5676
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 2.4.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor


Currently, the synchronization for {{CachingStrategy}} is a little 
inconsistent.  DFSOutputStream#setDropBehind modifies the strategy object, but 
there's nothing protecting that object against concurrent use in 
{{createBlockOutputStream}}.  Similarly, {{DFSInputStream#setDropBehind}} is 
synchronized, but not all the uses of {{cachingStrategy}} are.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5673) Implement logic for modification of ACLs.

2013-12-17 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-5673:


Attachment: HDFS-5673.1.patch

I'm attaching the patch that implements the core ACL modification logic.  This 
turned out to be some fairly tricky logic.  It's probably the most complex 
logic in the HDFS ACLs project (certainly a lot more complex than the 
permission enforcement is going to be).

{{Acl}} - This class is back.  This time, it's private to HDFS, intended for 
use inside the namenode.  It's mostly the same as the old version that we used 
to have in Common before the API changes, but with the addition of a few more 
helper methods.

{{AclTransformation}} - This class defines the 5 different operations that can 
change an ACL.  There are quite a few edge cases to consider related to 
validation rules, mask calculation and inferring default entries if not 
specified.  I needed multiple readings of the POSIX ACL docs to fully 
understand all of the edge cases.

http://users.suse.com/~agruen/acl/linux-acls/online/
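
For example, the mask calculation boils down to unioning the permissions of the 
group-class entries (sketch written from memory of the new ACL API, not a 
verbatim excerpt of the patch):

{code}
// POSIX: mask = union of the permissions of all named users, named groups,
// and the owning group entry (the "group class").
FsAction mask = FsAction.NONE;
for (AclEntry e : accessEntries) {
  boolean inGroupClass =
      e.getType() == AclEntryType.GROUP ||                        // group entries
      (e.getType() == AclEntryType.USER && e.getName() != null);  // named users
  if (inGroupClass) {
    mask = mask.or(e.getPermission());
  }
}
{code}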

{{TestAclTransformation}} - Approximately 2/3 of this patch is tests.  For 
every test in this suite, I've run the same scenario against Linux 
setfacl/getfacl to confirm that the {{AclTransformation}} code yields the same 
result.  I ran a coverage report, and I saw 100% coverage of the new code in 
this patch.


> Implement logic for modification of ACLs.
> -
>
> Key: HDFS-5673
> URL: https://issues.apache.org/jira/browse/HDFS-5673
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HDFS-5673.1.patch
>
>
> This patch will include the core logic for modification of ACLs.  This 
> includes support for all user-facing APIs that modify ACLs.  This will cover 
> access ACLs, default ACLs, automatic mask calculations, automatic inference 
> of unprovided default ACL entries, and validation to prevent creation of an 
> invalid ACL.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5651) remove dfs.namenode.caching.enabled

2013-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851134#comment-13851134
 ] 

Hadoop QA commented on HDFS-5651:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12619167/HDFS-5651.004.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5747//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5747//console

This message is automatically generated.

> remove dfs.namenode.caching.enabled
> ---
>
> Key: HDFS-5651
> URL: https://issues.apache.org/jira/browse/HDFS-5651
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5651.001.patch, HDFS-5651.002.patch, 
> HDFS-5651.003.patch, HDFS-5651.004.patch
>
>
> We can remove dfs.namenode.caching.enabled and simply always enable caching, 
> similar to how we do with snapshots and other features.  The main overhead is 
> the size of the cachedBlocks GSet.  However, we can simply make the size of 
> this GSet configurable, and people who don't want caching can set it to a 
> very small value.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5653) Log namenode hostname in various exceptions being thrown in a HA setup

2013-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851127#comment-13851127
 ] 

Hadoop QA commented on HDFS-5653:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12619197/HDFS-5653.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5748//console

This message is automatically generated.

> Log namenode hostname in various exceptions being thrown in a HA setup
> --
>
> Key: HDFS-5653
> URL: https://issues.apache.org/jira/browse/HDFS-5653
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha
>Affects Versions: 2.2.0
>Reporter: Arpit Gupta
>Priority: Minor
> Attachments: HDFS-5653.000.patch, HDFS-5653.001.patch
>
>
> In an HA setup, any time we see an exception such as safemode or namenode in 
> standby, we don't know which namenode it came from. The user has to go to 
> the logs of the namenodes and determine which one was active and/or standby 
> around the same time.
> I think it would help with debugging if any such exceptions could include the 
> namenode hostname, so the user could know exactly which namenode served the 
> request.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5653) Log namenode hostname in various exceptions being thrown in a HA setup

2013-12-17 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5653:
-

Attachment: HDFS-5653.001.patch

> Log namenode hostname in various exceptions being thrown in a HA setup
> --
>
> Key: HDFS-5653
> URL: https://issues.apache.org/jira/browse/HDFS-5653
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha
>Affects Versions: 2.2.0
>Reporter: Arpit Gupta
>Priority: Minor
> Attachments: HDFS-5653.000.patch, HDFS-5653.001.patch
>
>
> In an HA setup, any time we see an exception such as safemode or namenode in 
> standby, we don't know which namenode it came from. The user has to go to 
> the logs of the namenodes and determine which one was active and/or standby 
> around the same time.
> I think it would help with debugging if any such exceptions could include the 
> namenode hostname, so the user could know exactly which namenode served the 
> request.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5653) Log namenode hostname in various exceptions being thrown in a HA setup

2013-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851112#comment-13851112
 ] 

Hadoop QA commented on HDFS-5653:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12619165/HDFS-5653.000.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5746//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5746//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5746//console

This message is automatically generated.

> Log namenode hostname in various exceptions being thrown in a HA setup
> --
>
> Key: HDFS-5653
> URL: https://issues.apache.org/jira/browse/HDFS-5653
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha
>Affects Versions: 2.2.0
>Reporter: Arpit Gupta
>Priority: Minor
> Attachments: HDFS-5653.000.patch
>
>
> In an HA setup, any time we see an exception such as safemode or namenode in 
> standby, we don't know which namenode it came from. The user has to go to 
> the logs of the namenodes and determine which one was active and/or standby 
> around the same time.
> I think it would help with debugging if any such exceptions could include the 
> namenode hostname, so the user could know exactly which namenode served the 
> request.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Resolved] (HDFS-5305) Add https support in HDFS

2013-12-17 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao resolved HDFS-5305.
-

   Resolution: Fixed
Fix Version/s: 2.4.0

Yes, we can resolve the jira now. Thanks for the excellent work [~wheat9]!

> Add https support in HDFS
> -
>
> Key: HDFS-5305
> URL: https://issues.apache.org/jira/browse/HDFS-5305
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.2-alpha
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Fix For: 2.4.0
>
>
> This is the HDFS part of HADOOP-10022. This will serve as the umbrella jira 
> for all the https related cleanup in HDFS.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5305) Add https support in HDFS

2013-12-17 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851027#comment-13851027
 ] 

Haohui Mai commented on HDFS-5305:
--

The work has been committed in branch-2. Is it a good time to resolve this jira?

> Add https support in HDFS
> -
>
> Key: HDFS-5305
> URL: https://issues.apache.org/jira/browse/HDFS-5305
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.2-alpha
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
>
> This is the HDFS part of HADOOP-10022. This will serve as the umbrella jira 
> for all the https related cleanup in HDFS.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5536) Implement HTTP policy for Namenode and DataNode

2013-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850961#comment-13850961
 ] 

Hudson commented on HDFS-5536:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4903 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4903/])
Move HDFS-5538, HDFS-5545, HDFS-5536, HDFS-5312, and HDFS-5629 from trunk to 
2.4.0 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1551724)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Implement HTTP policy for Namenode and DataNode
> ---
>
> Key: HDFS-5536
> URL: https://issues.apache.org/jira/browse/HDFS-5536
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.4.0
>
> Attachments: HDFS-5536.000.patch, HDFS-5536.001.patch, 
> HDFS-5536.002.patch, HDFS-5536.003.patch, HDFS-5536.004.patch, 
> HDFS-5536.005.patch, HDFS-5536.006.patch, HDFS-5536.007.patch, 
> HDFS-5536.008.patch, HDFS-5536.009.patch, HDFS-5536.010.patch
>
>
> This jira implements the HTTP and HTTPS policy in the NameNode and the 
> DataNode.
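> The policy is essentially a three-way switch (sketch; constant names assumed):
> {code}
> enum Policy { HTTP_ONLY, HTTPS_ONLY, HTTP_AND_HTTPS }
>
> boolean httpEnabled  = policy != Policy.HTTPS_ONLY;
> boolean httpsEnabled = policy != Policy.HTTP_ONLY;
> {code}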



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5545) Allow specifying endpoints for listeners in HttpServer

2013-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850962#comment-13850962
 ] 

Hudson commented on HDFS-5545:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4903 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4903/])
Move HDFS-5538, HDFS-5545, HDFS-5536, HDFS-5312, and HDFS-5629 from trunk to 
2.4.0 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1551724)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Allow specifying endpoints for listeners in HttpServer
> --
>
> Key: HDFS-5545
> URL: https://issues.apache.org/jira/browse/HDFS-5545
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.4.0
>
> Attachments: HDFS-5545.000.patch, HDFS-5545.001.patch, 
> HDFS-5545.002.patch, HDFS-5545.003.patch
>
>
> Currently HttpServer listens on an HTTP port and provides a method that allows 
> users to add an SSL listener after the server starts. This complicates the 
> logic if the client needs to set up HTTP / HTTPS servers.
> This jira proposes to replace these two methods with the concept of listener 
> endpoints. A listener endpoint is a URI (i.e., scheme + host + port) that 
> the HttpServer should listen on. This concept simplifies the task of managing 
> the HTTP server from HDFS / YARN.
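> In other words, callers describe each listener as a URI up front; something 
> like (sketch; builder method names assumed):
> {code}
> HttpServer server = new HttpServer.Builder()
>     .setName("namenode")
>     .addEndpoint(URI.create("http://0.0.0.0:50070"))   // plain HTTP listener
>     .addEndpoint(URI.create("https://0.0.0.0:50470"))  // SSL listener
>     .build();
> {code}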



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5312) Generate HTTP / HTTPS URL in DFSUtil#getInfoServer() based on the configured http policy

2013-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850960#comment-13850960
 ] 

Hudson commented on HDFS-5312:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4903 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4903/])
Move HDFS-5538, HDFS-5545, HDFS-5536, HDFS-5312, and HDFS-5629 from trunk to 
2.4.0 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1551724)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Generate HTTP / HTTPS URL in DFSUtil#getInfoServer() based on the configured 
> http policy
> 
>
> Key: HDFS-5312
> URL: https://issues.apache.org/jira/browse/HDFS-5312
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.4.0
>
> Attachments: HDFS-5312.000.patch, HDFS-5312.001.patch, 
> HDFS-5312.002.patch, HDFS-5312.003.patch, HDFS-5312.004.patch, 
> HDFS-5312.005.patch, HDFS-5312.006.patch, HDFS-5312.007.patch, 
> HDFS-5312.008.patch
>
>
> DFSUtil#getInfoServer() returns only the authority (i.e., host+port) when 
> searching for the http / https server. This is insufficient because HDFS-5536 
> and related jiras allow NN / DN / JN to open HTTPS only, using the HTTPS_ONLY 
> policy.
> This JIRA addresses two issues. First, DFSUtil#getInfoServer() should return 
> a URI instead of a string, so that the scheme is an inherent part of the 
> return value, which eliminates the task of figuring out the scheme by design. 
> Second, it introduces a new function to choose whether http or https should 
> be used to connect to the remote server, based on dfs.http.policy.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5629) Support HTTPS in JournalNode and SecondaryNameNode

2013-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850964#comment-13850964
 ] 

Hudson commented on HDFS-5629:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4903 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4903/])
Move HDFS-5538, HDFS-5545, HDFS-5536, HDFS-5312, and HDFS-5629 from trunk to 
2.4.0 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1551724)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Support HTTPS in JournalNode and SecondaryNameNode
> --
>
> Key: HDFS-5629
> URL: https://issues.apache.org/jira/browse/HDFS-5629
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.4.0
>
> Attachments: HDFS-5629.000.patch, HDFS-5629.001.patch, 
> HDFS-5629.002.patch, HDFS-5629.003.patch
>
>
> Currently JournalNode has HTTP support only. This jira tracks the effort 
> to add HTTPS support to JournalNode.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5538) URLConnectionFactory should pick up the SSL related configuration by default

2013-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850965#comment-13850965
 ] 

Hudson commented on HDFS-5538:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4903 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4903/])
Move HDFS-5538, HDFS-5545, HDFS-5536, HDFS-5312, and HDFS-5629 from trunk to 
2.4.0 section. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1551724)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> URLConnectionFactory should pick up the SSL related configuration by default
> 
>
> Key: HDFS-5538
> URL: https://issues.apache.org/jira/browse/HDFS-5538
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.4.0
>
> Attachments: HDFS-5538.000.patch, HDFS-5538.001.patch, 
> HDFS-5538.002.patch, HDFS-5538.003.patch
>
>
> The default instance of URLConnectionFactory, DEFAULT_CONNECTION_FACTORY, does 
> not pick up any hadoop-specific, SSL-related configuration. Its customers 
> have to set up the ConnectionConfigurator explicitly in order to pick up 
> these configurations. This is less than ideal for HTTPS because whenever the 
> code needs to make an HTTPS connection, it is forced to go through that 
> setup.
> This jira refactors URLConnectionFactory to ease the handling of HTTPS 
> connections (compared to the DEFAULT_CONNECTION_FACTORY we have right now). 
> In particular, instead of loading the SSL configurator statically in 
> SecurityUtil (based on a global configuration about SSL) and determining 
> whether we should set up SSL for a given connection based on whether the SSL 
> configurator is null, we now load the SSL configurator in 
> URLConnectionFactory and decide whether to use it to set up an SSL connection 
> based on whether the given URL/connection is HTTPS.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5431) support cachepool-based limit management in path-based caching

2013-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850771#comment-13850771
 ] 

Hudson commented on HDFS-5431:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4900 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4900/])
HDFS-5431. Support cachepool-based limit management in path-based caching. 
(awang via cmccabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1551651)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/CacheFlag.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/client/HdfsAdmin.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/CachePoolInfo.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/CachePoolStats.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CacheReplicationMonitor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CachePool.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageSerialization.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/CacheAdmin.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientNamenodeProtocol.proto
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/OfflineEditsViewerHelper.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRetryCacheWithHA.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testCacheAdminConf.xml


> support cachepool-based limit management in path-based caching
> --
>
> Key: HDFS-5431
> URL: https://issues.apache.org/jira/browse/HDFS-5431
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Andrew Wang
> Fix For: 3.0.0
>
> Attachments: hdfs-5431-1.patch, hdfs-5431-2.patch, hdfs-5431-3.patch, 
> hdfs-5431-4.patch, hdfs-5431-5.patch, hdfs-5431-6.patch, hdfs-5431-7.patch
>
>
> We should support cachepool-based limit management in path-based caching.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5661) Browsing FileSystem via web ui, should use datanode's fqdn instead of ip address

2013-12-17 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5661:


Assignee: Benoy Antony  (was: Jing Zhao)
  Status: Patch Available  (was: In Progress)

> Browsing FileSystem via web ui, should use datanode's fqdn instead of ip 
> address
> 
>
> Key: HDFS-5661
> URL: https://issues.apache.org/jira/browse/HDFS-5661
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: HDFS-5661.patch, HDFS-5661.patch
>
>
> If authentication is enabled on the web UI, then a cookie is used to keep 
> track of the authentication information. There is normally a domain 
> associated with the cookie. Since an IP address doesn't have a domain, the 
> cookie will not be sent by the browser when making HTTP calls with the IP 
> address as the destination server.
> This will break browsing the file system via the web UI if authentication is 
> enabled.
> Browsing the FileSystem via the web UI should use the datanode's FQDN instead 
> of its IP address. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5629) Support HTTPS in JournalNode and SecondaryNameNode

2013-12-17 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5629:


Fix Version/s: (was: 3.0.0)
   2.4.0

I've merged this to branch-2.

> Support HTTPS in JournalNode and SecondaryNameNode
> --
>
> Key: HDFS-5629
> URL: https://issues.apache.org/jira/browse/HDFS-5629
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.4.0
>
> Attachments: HDFS-5629.000.patch, HDFS-5629.001.patch, 
> HDFS-5629.002.patch, HDFS-5629.003.patch
>
>
> Currently JournalNode has HTTP support only. This jira tracks the effort 
> to add HTTPS support to JournalNode.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5431) support cachepool-based limit management in path-based caching

2013-12-17 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5431:
---

  Resolution: Fixed
   Fix Version/s: 3.0.0
Target Version/s: 3.0.0
  Status: Resolved  (was: Patch Available)

> support cachepool-based limit management in path-based caching
> --
>
> Key: HDFS-5431
> URL: https://issues.apache.org/jira/browse/HDFS-5431
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Andrew Wang
> Fix For: 3.0.0
>
> Attachments: hdfs-5431-1.patch, hdfs-5431-2.patch, hdfs-5431-3.patch, 
> hdfs-5431-4.patch, hdfs-5431-5.patch, hdfs-5431-6.patch, hdfs-5431-7.patch
>
>
> We should support cachepool-based limit management in path-based caching.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Assigned] (HDFS-5667) StorageType and State in DatanodeStorageInfo in NameNode is not accurate

2013-12-17 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reassigned HDFS-5667:
---

Assignee: Arpit Agarwal

> StorageType and State in DatanodeStorageInfo in NameNode is not accurate
> 
>
> Key: HDFS-5667
> URL: https://issues.apache.org/jira/browse/HDFS-5667
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Eric Sirianni
>Assignee: Arpit Agarwal
> Fix For: Heterogeneous Storage (HDFS-2832)
>
>
> The fix for HDFS-5484 was accidentally regressed by the following change made 
> via HDFS-5542
> {code}
> +  DatanodeStorageInfo updateStorage(DatanodeStorage s) {
>  synchronized (storageMap) {
>DatanodeStorageInfo storage = storageMap.get(s.getStorageID());
>if (storage == null) {
> @@ -670,8 +658,6 @@
>   " for DN " + getXferAddr());
>  storage = new DatanodeStorageInfo(this, s);
>  storageMap.put(s.getStorageID(), storage);
> -  } else {
> -storage.setState(s.getState());
>}
>return storage;
>  }
> {code}
> By removing the 'else' and no longer updating the state in the BlockReport 
> processing path, we effectively get the bogus state & type that is set via 
> the first heartbeat (see the fix for HDFS-5455):
> {code}
> +  if (storage == null) {
> +// This is seen during cluster initialization when the heartbeat
> +// is received before the initial block reports from each storage.
> +storage = updateStorage(new DatanodeStorage(report.getStorageID()));
> {code}
> Even reverting the change and reintroducing the 'else' leaves the state & 
> type temporarily inaccurate until the first block report. 
> As discussed with [~arpitagarwal], a better fix would be to simply include 
> the full {{DatanodeStorage}} object in the {{StorageReport}} (as opposed to 
> only the Storage ID).  This requires adding the {{DatanodeStorage}} object to 
> {{StorageReportProto}}. It needs to be a new optional field and we cannot 
> remove the existing {{StorageUuid}} for protocol compatibility.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Work started] (HDFS-5667) StorageType and State in DatanodeStorageInfo in NameNode is not accurate

2013-12-17 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-5667 started by Arpit Agarwal.

> StorageType and State in DatanodeStorageInfo in NameNode is not accurate
> 
>
> Key: HDFS-5667
> URL: https://issues.apache.org/jira/browse/HDFS-5667
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Eric Sirianni
>Assignee: Arpit Agarwal
> Fix For: Heterogeneous Storage (HDFS-2832)
>
>
> The fix for HDFS-5484 was accidentally regressed by the following change made 
> via HDFS-5542
> {code}
> +  DatanodeStorageInfo updateStorage(DatanodeStorage s) {
>  synchronized (storageMap) {
>DatanodeStorageInfo storage = storageMap.get(s.getStorageID());
>if (storage == null) {
> @@ -670,8 +658,6 @@
>   " for DN " + getXferAddr());
>  storage = new DatanodeStorageInfo(this, s);
>  storageMap.put(s.getStorageID(), storage);
> -  } else {
> -storage.setState(s.getState());
>}
>return storage;
>  }
> {code}
> By removing the 'else' and no longer updating the state in the BlockReport 
> processing path, we effectively get the bogus state & type that is set via 
> the first heartbeat (see the fix for HDFS-5455):
> {code}
> +  if (storage == null) {
> +// This is seen during cluster initialization when the heartbeat
> +// is received before the initial block reports from each storage.
> +storage = updateStorage(new DatanodeStorage(report.getStorageID()));
> {code}
> Even reverting the change and reintroducing the 'else' leaves the state & 
> type temporarily inaccurate until the first block report. 
> As discussed with [~arpitagarwal], a better fix would be to simply include 
> the full {{DatanodeStorage}} object in the {{StorageReport}} (as opposed to 
> only the Storage ID).  This requires adding the {{DatanodeStorage}} object to 
> {{StorageReportProto}}. It needs to be a new optional field and we cannot 
> remove the existing {{StorageUuid}} for protocol compatibility.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-4746) ClassCastException in BlockManager.addStoredBlock() due to that blockReceived came after file was closed.

2013-12-17 Thread Plamen Jeliazkov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850944#comment-13850944
 ] 

Plamen Jeliazkov commented on HDFS-4746:


Konstantin,

HDFS-5285 addresses your issue by removing the class cast and simply returning 
if the block is already completed.
BlockManager.addStoredBlock no longer tries to cast the BlockCollection.
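
Roughly (sketch from memory, not a verbatim quote of the HDFS-5285 change):

{code}
// addStoredBlock(): a replica reported after the file is closed is no longer
// treated as under construction; bail out instead of casting.
if (storedBlock.isComplete()) {
  return storedBlock;  // already completed, nothing to update
}
{code}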

> ClassCastException in BlockManager.addStoredBlock() due to that blockReceived 
> came after file was closed.
> -
>
> Key: HDFS-4746
> URL: https://issues.apache.org/jira/browse/HDFS-4746
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.3-alpha
>Reporter: Konstantin Shvachko
>
> In some cases the last block replica of a file can be reported after the file 
> was closed. In this case file inode is of type INodeFile. 
> BlockManager.addStoredBlock() though expects it to be 
> INodeFileUnderConstruction, and therefore class cast to 
> MutableBlockCollection fails.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5651) remove dfs.namenode.caching.enabled

2013-12-17 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5651:
---

Attachment: HDFS-5651.004.patch

> remove dfs.namenode.caching.enabled
> ---
>
> Key: HDFS-5651
> URL: https://issues.apache.org/jira/browse/HDFS-5651
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5651.001.patch, HDFS-5651.002.patch, 
> HDFS-5651.003.patch, HDFS-5651.004.patch
>
>
> We can remove dfs.namenode.caching.enabled and simply always enable caching, 
> similar to how we do with snapshots and other features.  The main overhead is 
> the size of the cachedBlocks GSet.  However, we can simply make the size of 
> this GSet configurable, and people who don't want caching can set it to a 
> very small value.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5657) race condition causes writeback state error in NFS gateway

2013-12-17 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850755#comment-13850755
 ] 

Jing Zhao commented on HDFS-5657:
-

The patch looks good to me. One nit is that startOffset can be declared as 
final. +1 after addressing the comment.
{code}
long startOffset = asyncWriteBackStartOffset;
{code}
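
For example (sketch only; the task-submission shape here is assumed, not taken 
from the patch):

{code}
// Snapshot the field into a final local before handing it to the writeback
// task, so the task sees one consistent value even if the field changes.
final long startOffset = asyncWriteBackStartOffset;
asyncDataService.execute(new Runnable() {
  @Override
  public void run() {
    executeWriteBack(startOffset);
  }
});
{code}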

> race condition causes writeback state error in NFS gateway
> --
>
> Key: HDFS-5657
> URL: https://issues.apache.org/jira/browse/HDFS-5657
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-5657.001.patch, HDFS-5657.002.patch, 
> HDFS-5657.new.001.patch
>
>
> A race condition between NFS gateway writeback executor thread and new write 
> handler thread can cause writeback state check failure, e.g.,
> {noformat}
> 2013-11-26 10:34:07,859 DEBUG nfs3.RpcProgramNfs3 
> (Nfs3Utils.java:writeChannel(113)) - WRITE_RPC_CALL_END__957880843
> 2013-11-26 10:34:07,863 DEBUG nfs3.OpenFileCtx 
> (OpenFileCtx.java:offerNextToWrite(832)) - The asyn write task has no pending 
> writes, fileId: 30938
> 2013-11-26 10:34:07,871 ERROR nfs3.AsyncDataService 
> (AsyncDataService.java:run(136)) - Asyn data service got 
> error:java.lang.IllegalStateException: The openFileCtx has false async status
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:145)
> at 
> org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx.executeWriteBack(OpenFileCtx.java:890)
> at 
> org.apache.hadoop.hdfs.nfs.nfs3.AsyncDataService$WriteBackTask.run(AsyncDataService.java:134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 2013-11-26 10:34:07,901 DEBUG nfs3.RpcProgramNfs3 
> (RpcProgramNfs3.java:write(707)) - requesed offset=917504 and current 
> filesize=917504
> 2013-11-26 10:34:07,902 DEBUG nfs3.WriteManager 
> (WriteManager.java:handleWrite(131)) - handleWrite fileId: 30938 offset: 
> 917504 length:65536 stableHow:0
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5657) race condition causes writeback state error in NFS gateway

2013-12-17 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5657:
-

Fix Version/s: 2.3.0

> race condition causes writeback state error in NFS gateway
> --
>
> Key: HDFS-5657
> URL: https://issues.apache.org/jira/browse/HDFS-5657
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Reporter: Brandon Li
>Assignee: Brandon Li
> Fix For: 2.3.0
>
> Attachments: HDFS-5657.001.patch, HDFS-5657.002.patch, 
> HDFS-5657.new.001.patch, HDFS-5657.new.002.patch
>
>
> A race condition between NFS gateway writeback executor thread and new write 
> handler thread can cause writeback state check failure, e.g.,
> {noformat}
> 2013-11-26 10:34:07,859 DEBUG nfs3.RpcProgramNfs3 
> (Nfs3Utils.java:writeChannel(113)) - WRITE_RPC_CALL_END__957880843
> 2013-11-26 10:34:07,863 DEBUG nfs3.OpenFileCtx 
> (OpenFileCtx.java:offerNextToWrite(832)) - The asyn write task has no pending 
> writes, fileId: 30938
> 2013-11-26 10:34:07,871 ERROR nfs3.AsyncDataService 
> (AsyncDataService.java:run(136)) - Asyn data service got 
> error:java.lang.IllegalStateException: The openFileCtx has false async status
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:145)
> at 
> org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx.executeWriteBack(OpenFileCtx.java:890)
> at 
> org.apache.hadoop.hdfs.nfs.nfs3.AsyncDataService$WriteBackTask.run(AsyncDataService.java:134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 2013-11-26 10:34:07,901 DEBUG nfs3.RpcProgramNfs3 
> (RpcProgramNfs3.java:write(707)) - requesed offset=917504 and current 
> filesize=917504
> 2013-11-26 10:34:07,902 DEBUG nfs3.WriteManager 
> (WriteManager.java:handleWrite(131)) - handleWrite fileId: 30938 offset: 
> 917504 length:65536 stableHow:0
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5574) Remove buffer copy in BlockReader.skip

2013-12-17 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850902#comment-13850902
 ] 

Colin Patrick McCabe commented on HDFS-5574:


Let's close this as a duplicate since, after HDFS-5634, we no longer do a 
buffer copy in {{BlockReader#skip}}.
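
For context, post-HDFS-5634 a skip amounts to advancing the reader's position 
(sketch with assumed field names, not the actual code):

{code}
@Override
public synchronized long skip(long n) throws IOException {
  // No temp-buffer read: just move the logical position forward, bounded by
  // what is left in the block.
  long skipped = Math.min(n, blockLength - pos);
  pos += skipped;
  return skipped;
}
{code}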

> Remove buffer copy in BlockReader.skip
> --
>
> Key: HDFS-5574
> URL: https://issues.apache.org/jira/browse/HDFS-5574
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Binglin Chang
>Assignee: Binglin Chang
>Priority: Trivial
> Attachments: HDFS-5574.v1.patch, HDFS-5574.v2.patch
>
>
> BlockReaderLocal.skip and RemoteBlockReader.skip use a temporary buffer to 
> read data into; this is not necessary. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5634) allow BlockReaderLocal to switch between checksumming and not

2013-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850910#comment-13850910
 ] 

Hudson commented on HDFS-5634:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4902 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4902/])
HDFS-5634. Allow BlockReaderLocal to switch between checksumming and not 
(cmccabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1551701)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocalLegacy.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockMetadataHeader.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java


> allow BlockReaderLocal to switch between checksumming and not
> -
>
> Key: HDFS-5634
> URL: https://issues.apache.org/jira/browse/HDFS-5634
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5634.001.patch, HDFS-5634.002.patch, 
> HDFS-5634.003.patch, HDFS-5634.004.patch, HDFS-5634.005.patch, 
> HDFS-5634.006.patch, HDFS-5634.007.patch, HDFS-5634.008.patch
>
>
> BlockReaderLocal should be able to switch between checksumming and 
> non-checksumming, so that when we get notifications that something is mlocked 
> (see HDFS-5182), we can avoid checksumming when reading from that block.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5657) race condition causes writeback state error in NFS gateway

2013-12-17 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5657:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> race condition causes writeback state error in NFS gateway
> --
>
> Key: HDFS-5657
> URL: https://issues.apache.org/jira/browse/HDFS-5657
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-5657.001.patch, HDFS-5657.002.patch, 
> HDFS-5657.new.001.patch, HDFS-5657.new.002.patch
>
>
> A race condition between the NFS gateway writeback executor thread and a new 
> write handler thread can cause a writeback state check failure, e.g.,
> {noformat}
> 2013-11-26 10:34:07,859 DEBUG nfs3.RpcProgramNfs3 
> (Nfs3Utils.java:writeChannel(113)) - WRITE_RPC_CALL_END__957880843
> 2013-11-26 10:34:07,863 DEBUG nfs3.OpenFileCtx 
> (OpenFileCtx.java:offerNextToWrite(832)) - The asyn write task has no pending 
> writes, fileId: 30938
> 2013-11-26 10:34:07,871 ERROR nfs3.AsyncDataService 
> (AsyncDataService.java:run(136)) - Asyn data service got 
> error:java.lang.IllegalStateException: The openFileCtx has false async status
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:145)
> at 
> org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx.executeWriteBack(OpenFileCtx.java:890)
> at 
> org.apache.hadoop.hdfs.nfs.nfs3.AsyncDataService$WriteBackTask.run(AsyncDataService.java:134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 2013-11-26 10:34:07,901 DEBUG nfs3.RpcProgramNfs3 
> (RpcProgramNfs3.java:write(707)) - requesed offset=917504 and current 
> filesize=917504
> 2013-11-26 10:34:07,902 DEBUG nfs3.WriteManager 
> (WriteManager.java:handleWrite(131)) - handleWrite fileId: 30938 offset: 
> 917504 length:65536 stableHow:0
> {noformat}
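
One generic way to close this kind of race (a sketch under assumed names, not 
the attached patch) is to turn the "async writeback active" check into a 
single atomic check-and-set, so the handler thread and the writeback executor 
can never both claim the flag:
{code}
import java.util.concurrent.atomic.AtomicBoolean;

class WritebackState {
  // true while a writeback task is scheduled or running
  private final AtomicBoolean asyncActive = new AtomicBoolean(false);

  // Exactly one of the racing threads wins; the loser simply does not
  // schedule a second task, so a checkState() assertion can never trip.
  boolean tryStartWriteback() {
    return asyncActive.compareAndSet(false, true);
  }

  void finishWriteback() {
    asyncActive.set(false);
  }
}
{code}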



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5242) Reduce contention on DatanodeInfo instances

2013-12-17 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850805#comment-13850805
 ] 

Suresh Srinivas commented on HDFS-5242:
---

Alternatively, this can be done in a lockless fashion using AtomicReference. 
Let me know if you want me to upload a patch for that.
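
A minimal sketch of the lockless approach (field names are illustrative, not 
the actual DatanodeInfo layout): publish the mutable stats as one immutable 
snapshot behind an AtomicReference, so readers never contend on a monitor:
{code}
import java.util.concurrent.atomic.AtomicReference;

class DatanodeStats {
  final long capacity, dfsUsed, remaining;
  DatanodeStats(long capacity, long dfsUsed, long remaining) {
    this.capacity = capacity; this.dfsUsed = dfsUsed; this.remaining = remaining;
  }
}

class LocklessDatanodeInfo {
  private final AtomicReference<DatanodeStats> stats =
      new AtomicReference<DatanodeStats>(new DatanodeStats(0, 0, 0));

  // Readers get a consistent snapshot without blocking call handlers.
  DatanodeStats getStats() { return stats.get(); }

  // Writers swap in a new immutable snapshot in one atomic step.
  void updateStats(long capacity, long dfsUsed, long remaining) {
    stats.set(new DatanodeStats(capacity, dfsUsed, remaining));
  }
}
{code}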

> Reduce contention on DatanodeInfo instances
> ---
>
> Key: HDFS-5242
> URL: https://issues.apache.org/jira/browse/HDFS-5242
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-5242.patch
>
>
> Synchronization in {{DatanodeInfo}} instances causes unnecessary contention 
> between call handlers.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5574) Remove buffer copy in BlockReader.skip

2013-12-17 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850907#comment-13850907
 ] 

Colin Patrick McCabe commented on HDFS-5574:


actually, maybe I spoke too soon.  This patch removes a buffer copy from 
{{RemoteBlockReader2#skip}}, which HDFS-5634 doesn't change.  HDFS-5634 only 
affects {{BlockReaderLocal}}, not either of the remote block readers.  So 
Binglin, if you want to prepare a new version, it would be worth looking at.
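
For reference, the copy-free idea looks roughly like this (a sketch under 
assumed names, not the actual patch): consume already-buffered bytes by moving 
the buffer position, and account for the rest by advancing the logical offset, 
so nothing is read into a throwaway array:
{code}
import java.nio.ByteBuffer;

class SkippableReader {
  private final ByteBuffer curBuffer; // bytes already fetched from the DN
  private long pos;                   // logical offset within the block
  private final long blockLen;

  SkippableReader(ByteBuffer buffered, long pos, long blockLen) {
    this.curBuffer = buffered; this.pos = pos; this.blockLen = blockLen;
  }

  long skip(long n) {
    long toSkip = Math.min(n, blockLen - pos);
    // Drop buffered bytes by repositioning, without copying them anywhere.
    int fromBuffer = (int) Math.min(toSkip, curBuffer.remaining());
    curBuffer.position(curBuffer.position() + fromBuffer);
    // Any remainder is handled lazily: the next read() re-positions the
    // underlying source at pos instead of draining it through a temp buffer.
    pos += toSkip;
    return toSkip;
  }
}
{code}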

> Remove buffer copy in BlockReader.skip
> --
>
> Key: HDFS-5574
> URL: https://issues.apache.org/jira/browse/HDFS-5574
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Binglin Chang
>Assignee: Binglin Chang
>Priority: Trivial
> Attachments: HDFS-5574.v1.patch, HDFS-5574.v2.patch
>
>
> BlockReaderLocal.skip and RemoteBlockReader.skip use a temporary buffer to 
> read data into; this is not necessary. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Resolved] (HDFS-5564) Refactor tests in TestCacheDirectives

2013-12-17 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe resolved HDFS-5564.


Resolution: Duplicate

> Refactor tests in TestCacheDirectives
> -
>
> Key: HDFS-5564
> URL: https://issues.apache.org/jira/browse/HDFS-5564
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Trivial
>
> Some of the tests in TestCacheDirectives start their own MiniDFSCluster to 
> get a new config, even though we already start a cluster in the @Before 
> function. This contributes to longer test runs and code duplication.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5564) Refactor tests in TestCacheDirectives

2013-12-17 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850911#comment-13850911
 ] 

Colin Patrick McCabe commented on HDFS-5564:


we did this refactor as part of HDFS-5431

> Refactor tests in TestCacheDirectives
> -
>
> Key: HDFS-5564
> URL: https://issues.apache.org/jira/browse/HDFS-5564
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Trivial
>
> Some of the tests in TestCacheDirectives start their own MiniDFSCluster to 
> get a new config, even though we already start a cluster in the @Before 
> function. This contributes to longer test runs and code duplication.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5312) Generate HTTP / HTTPS URL in DFSUtil#getInfoServer() based on the configured http policy

2013-12-17 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5312:


Fix Version/s: (was: 3.0.0)
   2.4.0

I've merged this to branch-2.

> Generate HTTP / HTTPS URL in DFSUtil#getInfoServer() based on the configured 
> http policy
> 
>
> Key: HDFS-5312
> URL: https://issues.apache.org/jira/browse/HDFS-5312
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.4.0
>
> Attachments: HDFS-5312.000.patch, HDFS-5312.001.patch, 
> HDFS-5312.002.patch, HDFS-5312.003.patch, HDFS-5312.004.patch, 
> HDFS-5312.005.patch, HDFS-5312.006.patch, HDFS-5312.007.patch, 
> HDFS-5312.008.patch
>
>
> DFSUtil#getInfoServer() returns only the authority (i.e., host+port) when 
> searching for the http / https server. This is insufficient because HDFS-5536 
> and related jiras allow NN / DN / JN to open HTTPS only using the HTTPS_ONLY 
> policy.
> This JIRA addresses two issues. First, DFSUtil#getInfoServer() should return 
> a URI instead of a string, so that the scheme is an inherent part of the 
> return value, which eliminates the task of figuring out the scheme by design. 
> Second, it introduces a new function to choose whether http or https should 
> be used to connect to the remote server based on dfs.http.policy.
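
The proposed shape is roughly the following (a sketch; the method name and 
the boolean policy flag are illustrative): return a complete URI whose scheme 
is derived from the configured policy, so callers never have to guess:
{code}
import java.net.URI;
import java.net.URISyntaxException;

public class InfoServerUri {
  static URI getInfoServerUri(String authority, boolean httpsOnly)
      throws URISyntaxException {
    // The scheme is part of the return value by construction, so no caller
    // has to re-derive it from dfs.http.policy.
    String scheme = httpsOnly ? "https" : "http";
    return new URI(scheme, authority, null, null, null);
  }
}
{code}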



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5667) StorageType and State in DatanodeStorageInfo in NameNode is not accurate

2013-12-17 Thread Eric Sirianni (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850917#comment-13850917
 ] 

Eric Sirianni commented on HDFS-5667:
-

Haven't had a chance yet - please go ahead and pick it up.  Thanks!

> StorageType and State in DatanodeStorageInfo in NameNode is not accurate
> 
>
> Key: HDFS-5667
> URL: https://issues.apache.org/jira/browse/HDFS-5667
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Eric Sirianni
> Fix For: Heterogeneous Storage (HDFS-2832)
>
>
> The fix for HDFS-5484 was accidentally regressed by the following change made 
> via HDFS-5542
> {code}
> +  DatanodeStorageInfo updateStorage(DatanodeStorage s) {
>  synchronized (storageMap) {
>DatanodeStorageInfo storage = storageMap.get(s.getStorageID());
>if (storage == null) {
> @@ -670,8 +658,6 @@
>   " for DN " + getXferAddr());
>  storage = new DatanodeStorageInfo(this, s);
>  storageMap.put(s.getStorageID(), storage);
> -  } else {
> -storage.setState(s.getState());
>}
>return storage;
>  }
> {code}
> By removing the 'else' and no longer updating the state in the BlockReport 
> processing path, we effectively get the bogus state & type that is set via 
> the first heartbeat (see the fix for HDFS-5455):
> {code}
> +  if (storage == null) {
> +// This is seen during cluster initialization when the heartbeat
> +// is received before the initial block reports from each storage.
> +storage = updateStorage(new DatanodeStorage(report.getStorageID()));
> {code}
> Even reverting the change and reintroducing the 'else' leaves the state & 
> type temporarily inaccurate until the first block report. 
> As discussed with [~arpitagarwal], a better fix would be to simply include 
> the full {{DatanodeStorage}} object in the {{StorageReport}} (as opposed to 
> only the Storage ID).  This requires adding the {{DatanodeStorage}} object to 
> {{StorageReportProto}}. It needs to be a new optional field and we cannot 
> remove the existing {{StorageUuid}} for protocol compatibility.
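
For comparison, the narrower (and, as noted above, still imperfect) fix of 
restoring the 'else' would look like this sketch, which keeps the state fresh 
on every report but still leaves it wrong until the first block report:
{code}
DatanodeStorageInfo updateStorage(DatanodeStorage s) {
  synchronized (storageMap) {
    DatanodeStorageInfo storage = storageMap.get(s.getStorageID());
    if (storage == null) {
      storage = new DatanodeStorageInfo(this, s);
      storageMap.put(s.getStorageID(), storage);
    } else {
      // Refresh the state on every report, not only on first contact.
      storage.setState(s.getState());
    }
    return storage;
  }
}
{code}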



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5661) Browsing FileSystem via web ui, should use datanode's fqdn instead of ip address

2013-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850923#comment-13850923
 ] 

Hadoop QA commented on HDFS-5661:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12619135/HDFS-5661.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5744//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5744//console

This message is automatically generated.

> Browsing FileSystem via web ui, should use datanode's fqdn instead of ip 
> address
> 
>
> Key: HDFS-5661
> URL: https://issues.apache.org/jira/browse/HDFS-5661
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: HDFS-5661.patch, HDFS-5661.patch
>
>
> If authentication is enabled on the web ui, then a cookie is used to keep 
> track of the authentication information. There is normally a domain 
> associated with the cookie. Since an ip address doesn't have a domain, the 
> cookie will not be sent by the browser when making http calls with the ip 
> address as the destination server.
> This breaks browsing the file system via the web ui if authentication is 
> enabled.
> Browsing the file system via the web ui should therefore use the datanode's 
> fqdn instead of its ip address. 
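
A minimal sketch of the resolution step, assuming a plain JDK reverse lookup 
(the helper name is illustrative):
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;

public class FqdnResolver {
  static String toFqdn(String ipAddr) {
    try {
      // getCanonicalHostName() returns the FQDN when reverse DNS works and
      // falls back to the literal IP otherwise, so links stay usable.
      return InetAddress.getByName(ipAddr).getCanonicalHostName();
    } catch (UnknownHostException e) {
      return ipAddr;
    }
  }
}
{code}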



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5634) allow BlockReaderLocal to switch between checksumming and not

2013-12-17 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5634:
---

Attachment: HDFS-5634.008.patch

> allow BlockReaderLocal to switch between checksumming and not
> -
>
> Key: HDFS-5634
> URL: https://issues.apache.org/jira/browse/HDFS-5634
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5634.001.patch, HDFS-5634.002.patch, 
> HDFS-5634.003.patch, HDFS-5634.004.patch, HDFS-5634.005.patch, 
> HDFS-5634.006.patch, HDFS-5634.007.patch, HDFS-5634.008.patch
>
>
> BlockReaderLocal should be able to switch between checksumming and 
> non-checksumming, so that when we get notifications that something is mlocked 
> (see HDFS-5182), we can avoid checksumming when reading from that block.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5671) When Hbase RegionServer request block to DataNode and "java.io.IOException" occurs, the fail TCP socket is not closed (in status "CLOSE_WAIT" with port 1004 of DataNode)

2013-12-17 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850753#comment-13850753
 ] 

Colin Patrick McCabe commented on HDFS-5671:


It seems like we should move this section:
{code}
// Will be getting a new BlockReader.
if (blockReader != null) {
  blockReader.close();
  blockReader = null;
}
{code}

to be before the section which assigns a new value to {{blockReader}}, rather 
than special-casing one kind of error.  Also, it might make sense to use 
{{IOUtils#cleanup}} here, although I'm not aware of any {{BlockReader}} objects 
that throw an exception from {{close}}.
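
Sketched out, the reordering plus {{IOUtils#cleanup}} would look roughly like 
this (a fragment using the existing blockSeekTo names, not a drop-in patch):
{code}
// Always release the previous reader first, on every path that is about to
// construct a replacement; cleanup() logs and swallows close() failures.
IOUtils.cleanup(DFSClient.LOG, blockReader);
blockReader = null;

blockReader = getBlockReader(targetAddr, chosenNode, src, blk,
    accessToken, offsetIntoBlock, blk.getNumBytes() - offsetIntoBlock,
    buffersize, verifyChecksum, dfsClient.clientName);
{code}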

> When Hbase RegionServer request block to DataNode and "java.io.IOException" 
> occurs, the fail TCP socket is not closed (in status "CLOSE_WAIT" with port 
> 1004 of DataNode)
> -
>
> Key: HDFS-5671
> URL: https://issues.apache.org/jira/browse/HDFS-5671
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.2.0
> Environment: hadoop-2.2.0
> java version "1.6.0_31"
> Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
> Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
> Linux 2.6.32-358.14.1.el6.x86_64 #1 SMP Tue Jul 16 23:51:20 UTC 2013 x86_64 
> x86_64 x86_64 GNU/Linux
>Reporter: JamesLi
>Priority: Critical
> Attachments: 5671.patch
>
>
> lsof -i TCP:1004 | grep -c CLOSE_WAIT
> 18235
> When an HBase regionserver requests a file's block from DataNode:1004 and 
> the request fails because "java.io.IOException: Got error for OP_READ_BLOCK, 
> Block token is expired." occurs, the TCP socket the regionserver was using 
> is not closed.
> I think the problem above is in DatanodeInfo blockSeekTo(long target) of 
> class DFSInputStream.
> The connection the regionserver uses is a BlockReader:
> blockReader = getBlockReader(targetAddr, chosenNode, src, blk,
> accessToken, offsetIntoBlock, blk.getNumBytes() - offsetIntoBlock,
> buffersize, verifyChecksum, dfsClient.clientName);
> If this connection fails, the regionserver will fetch a new access token, 
> but the old connection is not closed here.
> A small piece of code is needed to close the old connection when the 
> exception happens:
>   if (blockReader != null) {
>     try {
>       blockReader.close();
>     } catch (IOException exc) {
>       DFSClient.LOG.error("Close connection to " + targetAddr + " failed");
>     }
>     blockReader = null;
>   }



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5675) Add Mkdirs operation to NNThroughputBenchmark

2013-12-17 Thread Plamen Jeliazkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Plamen Jeliazkov updated HDFS-5675:
---

Attachment: mkdirsBenchmarkPatchTrunk.patch

Attaching a patch for trunk. The operation is done in the same style as the 
other operations; it is most similar to the CreateOp.

> Add Mkdirs operation to NNThroughputBenchmark
> -
>
> Key: HDFS-5675
> URL: https://issues.apache.org/jira/browse/HDFS-5675
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: benchmarks
>Reporter: Plamen Jeliazkov
>Assignee: Plamen Jeliazkov
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: mkdirsBenchmarkPatchTrunk.patch
>
>
> I did some work to extend NNThroughputBenchmark that I would like to 
> contribute to the community. It is pretty straightforward; just adding a 
> Mkdir operation to the test in order to see the operations per second of 
> multiple 'mkdir' commands.
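
As a rough illustration of what the new op measures (not the 
NNThroughputBenchmark internals, just the plain FileSystem API):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MkdirsRate {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    int n = 10000;
    long start = System.nanoTime();
    for (int i = 0; i < n; i++) {
      fs.mkdirs(new Path("/bench/dir-" + i));  // one namenode op per call
    }
    double secs = (System.nanoTime() - start) / 1e9;
    System.out.printf("%d mkdirs in %.1fs = %.0f ops/sec%n", n, secs, n / secs);
  }
}
{code}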



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Work started] (HDFS-5675) Add Mkdirs operation to NNThroughputBenchmark

2013-12-17 Thread Plamen Jeliazkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-5675 started by Plamen Jeliazkov.

> Add Mkdirs operation to NNThroughputBenchmark
> -
>
> Key: HDFS-5675
> URL: https://issues.apache.org/jira/browse/HDFS-5675
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: benchmarks
>Reporter: Plamen Jeliazkov
>Assignee: Plamen Jeliazkov
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: mkdirsBenchmarkPatchTrunk.patch
>
>
> I did some work to extend NNThroughputBenchmark that I would like to 
> contribute to the community. It is pretty straightforward; just adding a 
> Mkdir operation to the test in order to see the operations per second of 
> multiple 'mkdir' commands.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5653) Log namenode hostname in various exceptions being thrown in a HA setup

2013-12-17 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5653:
-

Status: Patch Available  (was: Open)

> Log namenode hostname in various exceptions being thrown in a HA setup
> --
>
> Key: HDFS-5653
> URL: https://issues.apache.org/jira/browse/HDFS-5653
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha
>Affects Versions: 2.2.0
>Reporter: Arpit Gupta
>Priority: Minor
> Attachments: HDFS-5653.000.patch
>
>
> In an HA setup, any time we see an exception such as safemode or namenode in 
> standby, we don't know which namenode it came from. The user has to go to 
> the logs of the namenodes and determine which one was active and/or standby 
> around the same time.
> I think it would help with debugging if any such exceptions could include the 
> namenode hostname so the user could know exactly which namenode served the 
> request.
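
One simple shape for this (a sketch; the proxy method and the nnAddr field 
are assumptions, not the actual HA failover code) is to re-wrap the 
RemoteException with the originating namenode's hostname:
{code}
try {
  return invokeOnNamenode(method, args);
} catch (RemoteException e) {
  // Same exception class, but the message now names the NN that threw it.
  throw new RemoteException(e.getClassName(),
      e.getMessage() + " (namenode: " + nnAddr.getHostName() + ")");
}
{code}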



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-4710) SCR should honor dfs.client.read.shortcircuit.buffer.size even when checksums are off

2013-12-17 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-4710:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

This was resolved in HDFS-5634 by having BlockReaderLocal honor the 
{{dfs.client.cache.readahead}} setting.  If it is set to a non-zero value, we 
will buffer rather than reading directly into the user-supplied buffer, even 
when checksums are off.
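
The effect is the same one visible at the JDK level: unbuffered byte-at-a-time 
reads each hit the file descriptor, while an internal buffer amortizes them 
into large reads. A small illustration with plain java.io (not the HDFS code 
path; assumes data.bin holds at least a few bytes):
{code}
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;

public class ReadIntDemo {
  public static void main(String[] args) throws Exception {
    try (DataInputStream slow = new DataInputStream(
             new FileInputStream("data.bin"));
         DataInputStream fast = new DataInputStream(
             new BufferedInputStream(new FileInputStream("data.bin"), 65536))) {
      slow.readInt(); // four 1-byte read(2) syscalls, as in the strace below
      fast.readInt(); // served from the 64 KB buffer after one large read
    }
  }
}
{code}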

> SCR should honor dfs.client.read.shortcircuit.buffer.size even when checksums 
> are off
> -
>
> Key: HDFS-4710
> URL: https://issues.apache.org/jira/browse/HDFS-4710
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.4-alpha
> Environment: Centos (EC2) + short-circuit reads on
>Reporter: Gopal V
>Assignee: Colin Patrick McCabe
>Priority: Minor
>  Labels: perfomance
> Attachments: HDFS-4710.001.patch, HDFS-4710.002.patch
>
>
> When short-circuit reads are on, HDFS client slows down when checksums are 
> turned off.
> With checksums on, the query takes 45.341 seconds; with them turned off, it 
> takes 56.345 seconds. This is slower than the speeds observed when 
> short-circuiting is turned off.
> The issue seems to be that FSDataInputStream.readByte() calls are directly 
> transferred to the disk fd when the checksums are turned off.
> Even though all the columns are integers, the data being read will be read 
> via DataInputStream which does
> {code}
> public final int readInt() throws IOException {
> int ch1 = in.read();
> int ch2 = in.read();
> int ch3 = in.read();
> int ch4 = in.read();
> {code}
> To confirm, an strace of the Yarn container shows
> {code}
> 26690 read(154, "B", 1) = 1
> 26690 read(154, "\250", 1)  = 1
> 26690 read(154, ".", 1) = 1
> 26690 read(154, "\24", 1)   = 1
> {code}
> To emulate this without the entirety of Hive code, I have written a simpler 
> test app 
> https://github.com/t3rmin4t0r/shortcircuit-reader
> The jar will read a file in -bs-sized buffers. Running it with 1-byte 
> blocks gives similar results to the Hive test run.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5657) race condition causes writeback state error in NFS gateway

2013-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850899#comment-13850899
 ] 

Hudson commented on HDFS-5657:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4901 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4901/])
HDFS-5657. race condition causes writeback state error in NFS gateway. 
Contributed by Brandon Li (brandonli: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1551691)
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> race condition causes writeback state error in NFS gateway
> --
>
> Key: HDFS-5657
> URL: https://issues.apache.org/jira/browse/HDFS-5657
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-5657.001.patch, HDFS-5657.002.patch, 
> HDFS-5657.new.001.patch, HDFS-5657.new.002.patch
>
>
> A race condition between the NFS gateway writeback executor thread and a new 
> write handler thread can cause a writeback state check failure, e.g.,
> {noformat}
> 2013-11-26 10:34:07,859 DEBUG nfs3.RpcProgramNfs3 
> (Nfs3Utils.java:writeChannel(113)) - WRITE_RPC_CALL_END__957880843
> 2013-11-26 10:34:07,863 DEBUG nfs3.OpenFileCtx 
> (OpenFileCtx.java:offerNextToWrite(832)) - The asyn write task has no pending 
> writes, fileId: 30938
> 2013-11-26 10:34:07,871 ERROR nfs3.AsyncDataService 
> (AsyncDataService.java:run(136)) - Asyn data service got 
> error:java.lang.IllegalStateException: The openFileCtx has false async status
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:145)
> at 
> org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx.executeWriteBack(OpenFileCtx.java:890)
> at 
> org.apache.hadoop.hdfs.nfs.nfs3.AsyncDataService$WriteBackTask.run(AsyncDataService.java:134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 2013-11-26 10:34:07,901 DEBUG nfs3.RpcProgramNfs3 
> (RpcProgramNfs3.java:write(707)) - requesed offset=917504 and current 
> filesize=917504
> 2013-11-26 10:34:07,902 DEBUG nfs3.WriteManager 
> (WriteManager.java:handleWrite(131)) - handleWrite fileId: 30938 offset: 
> 917504 length:65536 stableHow:0
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5653) Log namenode hostname in various exceptions being thrown in a HA setup

2013-12-17 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5653:
-

Attachment: HDFS-5653.000.patch

> Log namenode hostname in various exceptions being thrown in a HA setup
> --
>
> Key: HDFS-5653
> URL: https://issues.apache.org/jira/browse/HDFS-5653
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha
>Affects Versions: 2.2.0
>Reporter: Arpit Gupta
>Priority: Minor
> Attachments: HDFS-5653.000.patch
>
>
> In an HA setup, any time we see an exception such as safemode or namenode in 
> standby, we don't know which namenode it came from. The user has to go to 
> the logs of the namenodes and determine which one was active and/or standby 
> around the same time.
> I think it would help with debugging if any such exceptions could include the 
> namenode hostname so the user could know exactly which namenode served the 
> request.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5667) StorageType and State in DatanodeStorageInfo in NameNode is not accurate

2013-12-17 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850914#comment-13850914
 ] 

Arpit Agarwal commented on HDFS-5667:
-

Eric, are you looking at this? I'll pick it up otherwise.

> StorageType and State in DatanodeStorageInfo in NameNode is not accurate
> 
>
> Key: HDFS-5667
> URL: https://issues.apache.org/jira/browse/HDFS-5667
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: Heterogeneous Storage (HDFS-2832)
>Reporter: Eric Sirianni
> Fix For: Heterogeneous Storage (HDFS-2832)
>
>
> The fix for HDFS-5484 was accidentally regressed by the following change made 
> via HDFS-5542
> {code}
> +  DatanodeStorageInfo updateStorage(DatanodeStorage s) {
>  synchronized (storageMap) {
>DatanodeStorageInfo storage = storageMap.get(s.getStorageID());
>if (storage == null) {
> @@ -670,8 +658,6 @@
>   " for DN " + getXferAddr());
>  storage = new DatanodeStorageInfo(this, s);
>  storageMap.put(s.getStorageID(), storage);
> -  } else {
> -storage.setState(s.getState());
>}
>return storage;
>  }
> {code}
> By removing the 'else' and no longer updating the state in the BlockReport 
> processing path, we effectively get the bogus state & type that is set via 
> the first heartbeat (see the fix for HDFS-5455):
> {code}
> +  if (storage == null) {
> +// This is seen during cluster initialization when the heartbeat
> +// is received before the initial block reports from each storage.
> +storage = updateStorage(new DatanodeStorage(report.getStorageID()));
> {code}
> Even reverting the change and reintroducing the 'else' leaves the state & 
> type temporarily inaccurate until the first block report. 
> As discussed with [~arpitagarwal], a better fix would be to simply include 
> the full {{DatanodeStorage}} object in the {{StorageReport}} (as opposed to 
> only the Storage ID).  This requires adding the {{DatanodeStorage}} object to 
> {{StorageReportProto}}. It needs to be a new optional field and we cannot 
> remove the existing {{StorageUuid}} for protocol compatibility.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5545) Allow specifying endpoints for listeners in HttpServer

2013-12-17 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5545:


Fix Version/s: (was: 3.0.0)
   2.4.0

I've merged this to branch-2.

> Allow specifying endpoints for listeners in HttpServer
> --
>
> Key: HDFS-5545
> URL: https://issues.apache.org/jira/browse/HDFS-5545
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.4.0
>
> Attachments: HDFS-5545.000.patch, HDFS-5545.001.patch, 
> HDFS-5545.002.patch, HDFS-5545.003.patch
>
>
> Currently HttpServer listens to an HTTP port and provides a method to allow 
> users to add SSL listeners after the server starts. This complicates the 
> logic if the client needs to set up HTTP / HTTPS servers.
> This jira proposes to replace these two methods with the concept of listener 
> endpoints. A listener endpoint is a URI (i.e., scheme + host + port) that 
> the HttpServer should listen to. This concept simplifies the task of managing 
> the HTTP server from HDFS / YARN.
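
Conceptually, that looks like the following fragment (a sketch; the builder 
method names are assumptions based on the proposal, not a confirmed API):
{code}
import java.net.URI;

// Each listener is just a URI, so HTTP and HTTPS are configured the same
// way at build time and no listener needs to be added after startup.
URI http  = URI.create("http://0.0.0.0:50070");
URI https = URI.create("https://0.0.0.0:50470");
HttpServer server = new HttpServer.Builder()
    .addEndpoint(http)
    .addEndpoint(https)
    .build();
{code}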



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5674) Editlog code cleanup

2013-12-17 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850809#comment-13850809
 ] 

Jing Zhao commented on HDFS-5674:
-

+1. Do we also want to add a simple javadoc for OP_INVALID and mention that it 
must be placed at the end?

> Editlog code cleanup
> 
>
> Key: HDFS-5674
> URL: https://issues.apache.org/jira/browse/HDFS-5674
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Minor
> Attachments: h5674_20131217.patch
>
>
> A few minor improvements:
> - \@SuppressWarnings("deprecation") in FSEditLogOp can be removed.
> - FSEditLogOpCodes.fromByte(..) can be more efficient.
> - Some fields in FSEditLog can be final.
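
On the fromByte(..) point above, the usual constant-time shape is to build the 
byte-to-opcode map once (a fragment inside the enum; this sketch assumes the 
enum exposes its wire byte via getOpCode()):
{code}
private static final Map<Byte, FSEditLogOpCodes> BY_BYTE =
    new HashMap<Byte, FSEditLogOpCodes>();
static {
  for (FSEditLogOpCodes op : FSEditLogOpCodes.values()) {
    BY_BYTE.put(op.getOpCode(), op);
  }
}

public static FSEditLogOpCodes fromByte(byte code) {
  // O(1) lookup instead of scanning values() on every edit log record.
  return BY_BYTE.get(code);
}
{code}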



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (HDFS-5675) Add Mkdirs operation to NNThroughputBenchmark

2013-12-17 Thread Plamen Jeliazkov (JIRA)
Plamen Jeliazkov created HDFS-5675:
--

 Summary: Add Mkdirs operation to NNThroughputBenchmark
 Key: HDFS-5675
 URL: https://issues.apache.org/jira/browse/HDFS-5675
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: benchmarks
Reporter: Plamen Jeliazkov
Assignee: Plamen Jeliazkov
Priority: Minor
 Fix For: 3.0.0


I did some work to extend NNThroughputBenchmark that I would like to contribute 
to the community. It is pretty straightforward; just adding a Mkdir operation 
to the test in order to see the operations per second of a multiple 'mkdir' 
commands.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5634) allow BlockReaderLocal to switch between checksumming and not

2013-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850926#comment-13850926
 ] 

Hadoop QA commented on HDFS-5634:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12619139/HDFS-5634.008.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
-14 warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.web.TestWebHDFS

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5743//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5743//console

This message is automatically generated.

> allow BlockReaderLocal to switch between checksumming and not
> -
>
> Key: HDFS-5634
> URL: https://issues.apache.org/jira/browse/HDFS-5634
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5634.001.patch, HDFS-5634.002.patch, 
> HDFS-5634.003.patch, HDFS-5634.004.patch, HDFS-5634.005.patch, 
> HDFS-5634.006.patch, HDFS-5634.007.patch, HDFS-5634.008.patch
>
>
> BlockReaderLocal should be able to switch between checksumming and 
> non-checksumming, so that when we get notifications that something is mlocked 
> (see HDFS-5182), we can avoid checksumming when reading from that block.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5536) Implement HTTP policy for Namenode and DataNode

2013-12-17 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5536:


Fix Version/s: (was: 3.0.0)
   2.4.0

I've merged this to branch-2.

> Implement HTTP policy for Namenode and DataNode
> ---
>
> Key: HDFS-5536
> URL: https://issues.apache.org/jira/browse/HDFS-5536
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.4.0
>
> Attachments: HDFS-5536.000.patch, HDFS-5536.001.patch, 
> HDFS-5536.002.patch, HDFS-5536.003.patch, HDFS-5536.004.patch, 
> HDFS-5536.005.patch, HDFS-5536.006.patch, HDFS-5536.007.patch, 
> HDFS-5536.008.patch, HDFS-5536.009.patch, HDFS-5536.010.patch
>
>
> This jira implements the HTTP and HTTPS policy in the namenode and the 
> datanode.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5657) race condition causes writeback state error in NFS gateway

2013-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850815#comment-13850815
 ] 

Hadoop QA commented on HDFS-5657:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12619147/HDFS-5657.new.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs-nfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5745//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5745//console

This message is automatically generated.

> race condition causes writeback state error in NFS gateway
> --
>
> Key: HDFS-5657
> URL: https://issues.apache.org/jira/browse/HDFS-5657
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-5657.001.patch, HDFS-5657.002.patch, 
> HDFS-5657.new.001.patch, HDFS-5657.new.002.patch
>
>
> A race condition between the NFS gateway writeback executor thread and a new 
> write handler thread can cause a writeback state check failure, e.g.,
> {noformat}
> 2013-11-26 10:34:07,859 DEBUG nfs3.RpcProgramNfs3 
> (Nfs3Utils.java:writeChannel(113)) - WRITE_RPC_CALL_END__957880843
> 2013-11-26 10:34:07,863 DEBUG nfs3.OpenFileCtx 
> (OpenFileCtx.java:offerNextToWrite(832)) - The asyn write task has no pending 
> writes, fileId: 30938
> 2013-11-26 10:34:07,871 ERROR nfs3.AsyncDataService 
> (AsyncDataService.java:run(136)) - Asyn data service got 
> error:java.lang.IllegalStateException: The openFileCtx has false async status
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:145)
> at 
> org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx.executeWriteBack(OpenFileCtx.java:890)
> at 
> org.apache.hadoop.hdfs.nfs.nfs3.AsyncDataService$WriteBackTask.run(AsyncDataService.java:134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 2013-11-26 10:34:07,901 DEBUG nfs3.RpcProgramNfs3 
> (RpcProgramNfs3.java:write(707)) - requesed offset=917504 and current 
> filesize=917504
> 2013-11-26 10:34:07,902 DEBUG nfs3.WriteManager 
> (WriteManager.java:handleWrite(131)) - handleWrite fileId: 30938 offset: 
> 917504 length:65536 stableHow:0
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Assigned] (HDFS-5649) Unregister NFS and Mount service when NFS gateway is shutting down

2013-12-17 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li reassigned HDFS-5649:


Assignee: Brandon Li

> Unregister NFS and Mount service when NFS gateway is shutting down
> --
>
> Key: HDFS-5649
> URL: https://issues.apache.org/jira/browse/HDFS-5649
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
>
> The services should be unregistered if the gateway is asked to shut down 
> gracefully.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-17 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5496:


Target Version/s: HDFS-5535 (Rolling upgrades)  (was: 2.4.0)

> Make replication queue initialization asynchronous
> --
>
> Key: HDFS-5496
> URL: https://issues.apache.org/jira/browse/HDFS-5496
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Vinay
> Attachments: HDFS-5496.patch, HDFS-5496.patch, HDFS-5496.patch, 
> HDFS-5496.patch, HDFS-5496.patch
>
>
> Today, initialization of replication queues blocks safe mode exit and certain 
> HA state transitions. For a big name space, this can take hundreds of seconds 
> with the FSNamesystem write lock held.  During this time, important requests 
> (e.g. initial block reports, heartbeats, etc.) are blocked.
> The effect of delaying the initialization would be not starting replication 
> right away, but I think the benefit outweighs the cost. If we make it 
> asynchronous, the work per iteration should be limited, so that the lock 
> duration is capped.
> If full/incremental block reports and any other requests that modify block 
> state properly perform replication checks while the blocks are scanned and 
> the queues are populated in the background, every block will be processed. 
> (Some may be processed twice.)  The replication monitor should run even 
> before all blocks are processed.
> This will allow the namenode to exit safe mode and start serving immediately 
> even with a big name space. It will also reduce the HA failover latency.
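
The bounded-iteration idea can be sketched like this (names such as 
namesystem, BlockInfo, and processMisReplicatedBlock are illustrative, not 
the attached patch):
{code}
static final int BLOCKS_PER_ITERATION = 2048;

void initializeReplQueuesAsync(final Iterator<BlockInfo> blocks) {
  new Thread(new Runnable() {
    public void run() {
      while (blocks.hasNext()) {
        namesystem.writeLock();
        try {
          // A bounded batch per acquisition keeps each lock hold short.
          for (int i = 0; i < BLOCKS_PER_ITERATION && blocks.hasNext(); i++) {
            processMisReplicatedBlock(blocks.next());
          }
        } finally {
          namesystem.writeUnlock();
        }
      }
    }
  }, "Replication Queue Initializer").start();
}
{code}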



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5634) allow BlockReaderLocal to switch between checksumming and not

2013-12-17 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850743#comment-13850743
 ] 

Colin Patrick McCabe commented on HDFS-5634:


yeah, I suppose it should be needed >= maxReadaheadLength.  Being able to read 
exactly the readahead length through the fast path should be allowed.

> allow BlockReaderLocal to switch between checksumming and not
> -
>
> Key: HDFS-5634
> URL: https://issues.apache.org/jira/browse/HDFS-5634
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5634.001.patch, HDFS-5634.002.patch, 
> HDFS-5634.003.patch, HDFS-5634.004.patch, HDFS-5634.005.patch, 
> HDFS-5634.006.patch, HDFS-5634.007.patch, HDFS-5634.008.patch
>
>
> BlockReaderLocal should be able to switch between checksumming and 
> non-checksumming, so that when we get notifications that something is mlocked 
> (see HDFS-5182), we can avoid checksumming when reading from that block.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5449) WebHdfs compatibility broken between 2.2 and 1.x / 23.x

2013-12-17 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850920#comment-13850920
 ] 

Jason Lowe commented on HDFS-5449:
--

Patch looks pretty good to me.  One question about the toDatanodeInfo change: 
should we do anything different if ipAddr == null but colonIdx <= 0?  Looks 
like we'll just NPE in that case.  Granted we _shouldn't_ see that in practice, 
but I noticed that the old 0.23 DatanodeInfo code that parsed the name string 
would assume a default port of 50010 if it was missing.  Wasn't sure if we 
should either default the port if missing in a similar manner or provide a more 
descriptive error than the resulting NPE if this does somehow happen.
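
The defaulting option could look like this sketch (the helper and constant 
names are hypothetical; 50010 is the old default transfer port):
{code}
static final int DEFAULT_XFER_PORT = 50010;

// Parse the legacy "name" (ip:port) field, tolerating a missing port
// instead of letting a later substring/parse call throw an NPE.
static String[] parseLegacyName(String name) {
  int colonIdx = name.indexOf(':');
  if (colonIdx > 0) {
    return new String[] { name.substring(0, colonIdx),
                          name.substring(colonIdx + 1) };
  }
  return new String[] { name, String.valueOf(DEFAULT_XFER_PORT) };
}
{code}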

> WebHdfs compatibility broken between 2.2 and 1.x / 23.x
> ---
>
> Key: HDFS-5449
> URL: https://issues.apache.org/jira/browse/HDFS-5449
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-5449.patch, HDFS-5449.patch
>
>
> Similarly to HDFS-5403, getFileBlockLocations() fails between old (1.x, 
> 0.23.x) and new (2.x) versions, but this is worse since both directions 
> won't work.  This is caused by the removal of the "name" field from the 
> serialized json format of DatanodeInfo. 
> The 2.x namenode should include "name" (ip:port) in the response, and the 
> 2.x webhdfs client should use "name" if "ipAddr" and "xferPort" don't exist 
> in the response. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-17 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850811#comment-13850811
 ] 

Jing Zhao commented on HDFS-5496:
-

I will commit this patch to the HDFS-5535 branch late today or early tomorrow 
if there is no objection.

> Make replication queue initialization asynchronous
> --
>
> Key: HDFS-5496
> URL: https://issues.apache.org/jira/browse/HDFS-5496
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Vinay
> Attachments: HDFS-5496.patch, HDFS-5496.patch, HDFS-5496.patch, 
> HDFS-5496.patch, HDFS-5496.patch
>
>
> Today, initialization of replication queues blocks safe mode exit and certain 
> HA state transitions. For a big name space, this can take hundreds of seconds 
> with the FSNamesystem write lock held.  During this time, important requests 
> (e.g. initial block reports, heartbeat, etc) are blocked.
> The effect of delaying the initialization would be not starting replication 
> right away, but I think the benefit outweighs. If we make it asynchronous, 
> the work per iteration should be limited, so that the lock duration is 
> capped. 
> If full/incremental block reports and any other requests that modifies block 
> state properly performs replication checks while the blocks are scanned and 
> the queues populated in background, every block will be processed. (Some may 
> be done twice)  The replication monitor should run even before all blocks are 
> processed.
> This will allow namenode to exit safe mode and start serving immediately even 
> with a big name space. It will also reduce the HA failover latency.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5657) race condition causes writeback state error in NFS gateway

2013-12-17 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5657:
-

Attachment: HDFS-5657.new.002.patch

> race condition causes writeback state error in NFS gateway
> --
>
> Key: HDFS-5657
> URL: https://issues.apache.org/jira/browse/HDFS-5657
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-5657.001.patch, HDFS-5657.002.patch, 
> HDFS-5657.new.001.patch, HDFS-5657.new.002.patch
>
>
> A race condition between the NFS gateway writeback executor thread and a new 
> write handler thread can cause a writeback state check failure, e.g.,
> {noformat}
> 2013-11-26 10:34:07,859 DEBUG nfs3.RpcProgramNfs3 
> (Nfs3Utils.java:writeChannel(113)) - WRITE_RPC_CALL_END__957880843
> 2013-11-26 10:34:07,863 DEBUG nfs3.OpenFileCtx 
> (OpenFileCtx.java:offerNextToWrite(832)) - The asyn write task has no pending 
> writes, fileId: 30938
> 2013-11-26 10:34:07,871 ERROR nfs3.AsyncDataService 
> (AsyncDataService.java:run(136)) - Asyn data service got 
> error:java.lang.IllegalStateException: The openFileCtx has false async status
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:145)
> at 
> org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx.executeWriteBack(OpenFileCtx.java:890)
> at 
> org.apache.hadoop.hdfs.nfs.nfs3.AsyncDataService$WriteBackTask.run(AsyncDataService.java:134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 2013-11-26 10:34:07,901 DEBUG nfs3.RpcProgramNfs3 
> (RpcProgramNfs3.java:write(707)) - requesed offset=917504 and current 
> filesize=917504
> 2013-11-26 10:34:07,902 DEBUG nfs3.WriteManager 
> (WriteManager.java:handleWrite(131)) - handleWrite fileId: 30938 offset: 
> 917504 length:65536 stableHow:0
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5657) race condition causes writeback state error in NFS gateway

2013-12-17 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850793#comment-13850793
 ] 

Brandon Li commented on HDFS-5657:
--

Uploaded a new patch to address Jing's last comment.


> race condition causes writeback state error in NFS gateway
> --
>
> Key: HDFS-5657
> URL: https://issues.apache.org/jira/browse/HDFS-5657
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-5657.001.patch, HDFS-5657.002.patch, 
> HDFS-5657.new.001.patch, HDFS-5657.new.002.patch
>
>
> A race condition between the NFS gateway writeback executor thread and a new 
> write handler thread can cause a writeback state check failure, e.g.,
> {noformat}
> 2013-11-26 10:34:07,859 DEBUG nfs3.RpcProgramNfs3 
> (Nfs3Utils.java:writeChannel(113)) - WRITE_RPC_CALL_END__957880843
> 2013-11-26 10:34:07,863 DEBUG nfs3.OpenFileCtx 
> (OpenFileCtx.java:offerNextToWrite(832)) - The asyn write task has no pending 
> writes, fileId: 30938
> 2013-11-26 10:34:07,871 ERROR nfs3.AsyncDataService 
> (AsyncDataService.java:run(136)) - Asyn data service got 
> error:java.lang.IllegalStateException: The openFileCtx has false async status
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:145)
> at 
> org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx.executeWriteBack(OpenFileCtx.java:890)
> at 
> org.apache.hadoop.hdfs.nfs.nfs3.AsyncDataService$WriteBackTask.run(AsyncDataService.java:134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 2013-11-26 10:34:07,901 DEBUG nfs3.RpcProgramNfs3 
> (RpcProgramNfs3.java:write(707)) - requesed offset=917504 and current 
> filesize=917504
> 2013-11-26 10:34:07,902 DEBUG nfs3.WriteManager 
> (WriteManager.java:handleWrite(131)) - handleWrite fileId: 30938 offset: 
> 917504 length:65536 stableHow:0
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5477) Block manager as a service

2013-12-17 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated HDFS-5477:
-

Attachment: Standalone BM.pdf

Re-attaching the standalone PDF to fix the graphics.

> Block manager as a service
> --
>
> Key: HDFS-5477
> URL: https://issues.apache.org/jira/browse/HDFS-5477
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: Proposal.pdf, Proposal.pdf, Standalone BM.pdf, 
> Standalone BM.pdf
>
>
> The block manager needs to evolve towards having the ability to run as a 
> standalone service to improve NN vertical and horizontal scalability.  The 
> goal is reducing the memory footprint of the NN proper to support larger 
> namespaces, and improve overall performance by decoupling the block manager 
> from the namespace and its lock.  Ideally, a distinct BM will be transparent 
> to clients and DNs.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5658) Implement ACL as a INode feature

2013-12-17 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5658:
-

Attachment: HDFS-5658.001.patch

Rebased on 12/17.

> Implement ACL as a INode feature
> 
>
> Key: HDFS-5658
> URL: https://issues.apache.org/jira/browse/HDFS-5658
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode, security
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5658.000.patch, HDFS-5658.001.patch
>
>
> HDFS-5284 introduces features as generic abstractions to extend the 
> functionality of the inodes. The implementation of ACL should leverage the 
> new abstractions.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5619) NameNode: record ACL modifications to edit log.

2013-12-17 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5619:
-

Attachment: HDFS-5619.000.patch

> NameNode: record ACL modifications to edit log.
> ---
>
> Key: HDFS-5619
> URL: https://issues.apache.org/jira/browse/HDFS-5619
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Chris Nauroth
>Assignee: Haohui Mai
> Attachments: HDFS-5619.000.patch
>
>
> Implement a new edit log opcode, {{OP_SET_ACL}}, which fully replaces the ACL 
> of a specific inode.  For ACL operations that perform partial modification of 
> the ACL, the NameNode must merge the modifications with the existing ACL to 
> produce the final resulting ACL and encode it into an {{OP_SET_ACL}}.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

