[jira] [Commented] (HDFS-8839) Erasure Coding: client occasionally gets less block locations when some datanodes fail

2015-08-03 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651524#comment-14651524
 ] 

Walter Su commented on HDFS-8839:
-

bq. Otherwise, the client writing can't go on.
Yes, it hangs. It's a problem.

bq. the namenode should still allocate 9 locations even if it knows one of them 
is invalid.
It's not the best solution. Please check my last comment on HDFS-8220; we can 
continue the discussion there.

> Erasure Coding: client occasionally gets less block locations when some 
> datanodes fail 
> ---
>
> Key: HDFS-8839
> URL: https://issues.apache.org/jira/browse/HDFS-8839
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Li Bo
>Assignee: Li Bo
>
> 9 datanodes, write two block groups. A datanode dies while the first block 
> group is being written. When the client retrieves the second block group from 
> the namenode, the returned block group occasionally contains only 8 locations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7601) Operations(e.g. balance) failed due to deficient configuration parsing

2015-08-03 Thread Doris Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doris Gu updated HDFS-7601:
---
Attachment: (was: 0001-for-hdfs-7601.patch)

> Operations(e.g. balance) failed due to deficient configuration parsing
> --
>
> Key: HDFS-7601
> URL: https://issues.apache.org/jira/browse/HDFS-7601
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.3.0, 2.6.0
>Reporter: Doris Gu
>Assignee: Doris Gu
>Priority: Minor
>  Labels: BB2015-05-TBR
>
> Some operations, for example balance, parse the configuration (from 
> core-site.xml and hdfs-site.xml) to get the NameService URIs to connect to.
> The current method treats URIs that differ only by a trailing "/" as two 
> different URIs, so subsequent operations may fail.
> bq. [hdfs://haCluster, hdfs://haCluster/] are considered to be two different 
> URIs, which are actually the same.
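A minimal sketch of the kind of normalization the fix implies (illustrative 
only, not the actual patch; the lookup presumably sits around 
{{DFSUtil#getNameServiceUris}}):
{code}
import java.net.URI;

/** Sketch: normalize away a trailing "/" so equivalent URIs compare equal. */
class UriNormalizer {
  static URI trimTrailingSlash(URI uri) {
    String path = uri.getPath();
    if (path != null && path.endsWith("/")) {
      return URI.create(uri.getScheme() + "://" + uri.getAuthority()
          + path.substring(0, path.length() - 1));
    }
    return uri;
  }
}
// With this, hdfs://haCluster/ and hdfs://haCluster map to the same URI.
{code}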



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7601) Operations(e.g. balance) failed due to deficient configuration parsing

2015-08-03 Thread Doris Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doris Gu updated HDFS-7601:
---
Attachment: 0001-for-hdfs-7601.patch

> Operations(e.g. balance) failed due to deficient configuration parsing
> --
>
> Key: HDFS-7601
> URL: https://issues.apache.org/jira/browse/HDFS-7601
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.3.0, 2.6.0
>Reporter: Doris Gu
>Assignee: Doris Gu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: 0001-for-hdfs-7601.patch
>
>
> Some operations, for example balance, parse the configuration (from 
> core-site.xml and hdfs-site.xml) to get the NameService URIs to connect to.
> The current method treats URIs that differ only by a trailing "/" as two 
> different URIs, so subsequent operations may fail.
> bq. [hdfs://haCluster, hdfs://haCluster/] are considered to be two different 
> URIs, which are actually the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small

2015-08-03 Thread Li Bo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651555#comment-14651555
 ] 

Li Bo commented on HDFS-8838:
-

The number of datanodes is set to 9 in the unit test. Due to the problems 
described in HDFS-8220 and HDFS-8839, I think we should use at least 10 
datanodes when testing a single datanode failure.

> Tolerate datanode failures in DFSStripedOutputStream when the data length is 
> small
> --
>
> Key: HDFS-8838
> URL: https://issues.apache.org/jira/browse/HDFS-8838
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h8838_20150729.patch, h8838_20150731.patch
>
>
> Currently, DFSStripedOutputStream cannot tolerate datanode failures when the 
> data length is small.  We fix the bugs here and add more tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7601) Operations(e.g. balance) failed due to deficient configuration parsing

2015-08-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651561#comment-14651561
 ] 

Hadoop QA commented on HDFS-7601:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 47s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:red}-1{color} | javac |   1m 44s | The patch appears to cause the 
build to fail. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12748405/0001-for-hdfs-7601.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 90b5104 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11888/console |


This message was automatically generated.

> Operations(e.g. balance) failed due to deficient configuration parsing
> --
>
> Key: HDFS-7601
> URL: https://issues.apache.org/jira/browse/HDFS-7601
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.3.0, 2.6.0
>Reporter: Doris Gu
>Assignee: Doris Gu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: 0001-for-hdfs-7601.patch
>
>
> Some operations, for example balance, parse the configuration (from 
> core-site.xml and hdfs-site.xml) to get the NameService URIs to connect to.
> The current method treats URIs that differ only by a trailing "/" as two 
> different URIs, so subsequent operations may fail.
> bq. [hdfs://haCluster, hdfs://haCluster/] are considered to be two different 
> URIs, which are actually the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8784) BlockInfo#numNodes should be numStorages

2015-08-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651595#comment-14651595
 ] 

Hadoop QA commented on HDFS-8784:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 15s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 40s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 36s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 34s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 28s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m  4s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests | 162m  7s | Tests passed in hadoop-hdfs. 
|
| | | 203m 16s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12748391/HDFS-8784-01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 90b5104 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11887/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11887/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11887/console |


This message was automatically generated.

> BlockInfo#numNodes should be numStorages
> 
>
> Key: HDFS-8784
> URL: https://issues.apache.org/jira/browse/HDFS-8784
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Jagadesh Kiran N
> Attachments: HDFS-8784-00.patch, HDFS-8784-01.patch
>
>
> The method actually returns the number of storages holding a block.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8784) BlockInfo#numNodes should be numStorages

2015-08-03 Thread Jagadesh Kiran N (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651638#comment-14651638
 ] 

Jagadesh Kiran N commented on HDFS-8784:


The pre-patch failure is not related to the changes made in the patch.

> BlockInfo#numNodes should be numStorages
> 
>
> Key: HDFS-8784
> URL: https://issues.apache.org/jira/browse/HDFS-8784
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Jagadesh Kiran N
> Attachments: HDFS-8784-00.patch, HDFS-8784-01.patch
>
>
> The method actually returns the number of storages holding a block.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8841) Catch throwable return null

2015-08-03 Thread Jagadesh Kiran N (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651694#comment-14651694
 ] 

Jagadesh Kiran N commented on HDFS-8841:


I fail to see how an Error (like NoClassDefFoundError) could occur in the 
following code. Could you please point it out? Thanks!

{code}
try {
  final Path tmp = new Path(job.get(TMP_DIR_LABEL), relativedst);
  if (destFileSys.delete(tmp, true))
    break;
} catch (Throwable ex) {
  // ignore, we are just cleaning up
  LOG.debug("Ignoring cleanup exception", ex);
}
{code}

> Catch throwable return null
> ---
>
> Key: HDFS-8841
> URL: https://issues.apache.org/jira/browse/HDFS-8841
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: songwanging
>Assignee: Jagadesh Kiran N
>Priority: Minor
>
> In the method {{map}} of the class 
> \hadoop-2.7.1-src\hadoop-tools\hadoop-extras\src\main\java\org\apache\hadoop\tools\DistCpV1.java, 
> there is this code:
> {code}
> public void map(LongWritable key,
>                 FilePair value,
>                 OutputCollector<WritableComparable<?>, Text> out,
>                 Reporter reporter) throws IOException {
>   ...
>   } catch (Throwable ex) {
>     // ignore, we are just cleaning up
>     LOG.debug("Ignoring cleanup exception", ex);
>   }
>   ...
> }
> {code}
> Throwable is the parent type of Exception and Error, so catching Throwable 
> means catching both Exceptions and Errors. An Exception is something you can 
> recover from (like IOException); an Error is something more serious that you 
> usually can't recover from easily (like NoClassDefFoundError), so it doesn't 
> make much sense to catch an Error.
> We should convert this to catch Exception instead.
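A sketch of the proposed narrowing, paraphrasing the cleanup block quoted above 
(names such as TMP_DIR_LABEL and relativedst come from DistCpV1; this is not 
the actual patch):
{code}
try {
  final Path tmp = new Path(job.get(TMP_DIR_LABEL), relativedst);
  if (destFileSys.delete(tmp, true))
    break;
} catch (Exception ex) {
  // Swallow only recoverable exceptions during cleanup; Errors such as
  // OutOfMemoryError now propagate instead of being silently ignored.
  LOG.debug("Ignoring cleanup exception", ex);
}
{code}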



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8848) Support OAuth2 in libwebhdfs

2015-08-03 Thread Puneeth P (JIRA)
Puneeth P created HDFS-8848:
---

 Summary: Support OAuth2 in libwebhdfs
 Key: HDFS-8848
 URL: https://issues.apache.org/jira/browse/HDFS-8848
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: webhdfs
Reporter: Puneeth P
Assignee: Puneeth P


As per [https://issues.apache.org/jira/browse/HDFS-8155] there is a patch for 
the WebHDFS Java client. It would be good to bring libwebhdfs up to par as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8848) Support OAuth2 in libwebhdfs

2015-08-03 Thread Puneeth P (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Puneeth P updated HDFS-8848:

Issue Type: New Feature  (was: Improvement)

> Support OAuth2 in libwebhdfs
> 
>
> Key: HDFS-8848
> URL: https://issues.apache.org/jira/browse/HDFS-8848
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: webhdfs
>Reporter: Puneeth P
>Assignee: Puneeth P
>
> As per [https://issues.apache.org/jira/browse/HDFS-8155] there is a patch for 
> the WebHDFS Java client. It would be good to bring libwebhdfs up to par as 
> well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize

2015-08-03 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-8220:
---
Attachment: HDFS-8220-HDFS-7285-09.patch

> Erasure Coding: StripedDataStreamer fails to handle the blocklocations which 
> doesn't satisfy BlockGroupSize
> ---
>
> Key: HDFS-8220
> URL: https://issues.apache.org/jira/browse/HDFS-8220
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, 
> HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, 
> HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, 
> HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, 
> HDFS-8220-HDFS-7285.008.patch
>
>
> During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to 
> validate the available datanodes against the {{BlockGroupSize}}. Please see 
> the exception to understand more:
> {code}
> 2015-04-22 14:56:11,313 WARN  hdfs.DFSClient (DataStreamer.java:run(538)) - 
> DataStreamer Exception
> java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> 2015-04-22 14:56:11,313 INFO  hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
> 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient 
> (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
> java.io.IOException: DataStreamer Exception: 
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> Caused by: java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   ... 1 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize

2015-08-03 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-8220:
---
Attachment: (was: HDFS-8220-HDFS-7285-09.patch)

> Erasure Coding: StripedDataStreamer fails to handle the blocklocations which 
> doesn't satisfy BlockGroupSize
> ---
>
> Key: HDFS-8220
> URL: https://issues.apache.org/jira/browse/HDFS-8220
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, 
> HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285.005.patch, 
> HDFS-8220-HDFS-7285.006.patch, HDFS-8220-HDFS-7285.007.patch, 
> HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.008.patch
>
>
> During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to 
> validate the available datanodes against the {{BlockGroupSize}}. Please see 
> the exception to understand more:
> {code}
> 2015-04-22 14:56:11,313 WARN  hdfs.DFSClient (DataStreamer.java:run(538)) - 
> DataStreamer Exception
> java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> 2015-04-22 14:56:11,313 INFO  hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
> 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient 
> (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
> java.io.IOException: DataStreamer Exception: 
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> Caused by: java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   ... 1 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize

2015-08-03 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-8220:
---
Attachment: HDFS-8220-HDFS-7285-09.patch

> Erasure Coding: StripedDataStreamer fails to handle the blocklocations which 
> doesn't satisfy BlockGroupSize
> ---
>
> Key: HDFS-8220
> URL: https://issues.apache.org/jira/browse/HDFS-8220
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, 
> HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, 
> HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, 
> HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, 
> HDFS-8220-HDFS-7285.008.patch
>
>
> During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to 
> validate the available datanodes against the {{BlockGroupSize}}. Please see 
> the exception to understand more:
> {code}
> 2015-04-22 14:56:11,313 WARN  hdfs.DFSClient (DataStreamer.java:run(538)) - 
> DataStreamer Exception
> java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> 2015-04-22 14:56:11,313 INFO  hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
> 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient 
> (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
> java.io.IOException: DataStreamer Exception: 
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> Caused by: java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   ... 1 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize

2015-08-03 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652228#comment-14652228
 ] 

Rakesh R commented on HDFS-8220:


bq. I saw if ( numOfDNs >= NUM_DATA_BLOCKS && numOfDNs < GROUP_SIZE ), the 
OutputStream hangs and stops writing, even if the file is smaller than a 
cellSize. We should fix that.
Good catch! I've added a test case to simulate this. Attached a patch where I 
close the streamers that don't have block locations available.

After executing {{StripedDataStreamer.super.locateFollowingBlock()}}, the patch 
validates the number of data-block locations and then checks for {{(blocks == 
null)}}. I could see that a {{LocatedBlock}} will be null when there is no 
datanode available for that index. Since we require at least as many datanodes 
as data blocks, the data-block {{LocatedBlock}}s will never be null; if there 
are no locations available for the parity blocks, those blocks become null. 
I've tried an approach of closing the respective parity streamers; any thoughts?
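A hypothetical sketch of that idea (only {{getStripedDataStreamer}} appears in 
the patch excerpts on this issue; the other names and the close call are 
assumed for illustration):
{code}
// Hypothetical sketch: if the namenode returned fewer than BlockGroupSize
// locations, the trailing (parity) indices have null blocks; shut those
// streamers down so the rest can make progress instead of hanging.
void closeStreamersWithoutLocations(LocatedBlock[] blockGroup) {
  for (int i = 0; i < blockGroup.length; i++) {
    StripedDataStreamer si = coordinator.getStripedDataStreamer(i);
    if (blockGroup[i] == null) {
      si.close(true);  // assumed shutdown API; the real patch may differ
    }
  }
}
{code}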

> Erasure Coding: StripedDataStreamer fails to handle the blocklocations which 
> doesn't satisfy BlockGroupSize
> ---
>
> Key: HDFS-8220
> URL: https://issues.apache.org/jira/browse/HDFS-8220
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, 
> HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, 
> HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, 
> HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, 
> HDFS-8220-HDFS-7285.008.patch
>
>
> During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to 
> validate the available datanodes against the {{BlockGroupSize}}. Please see 
> the exception to understand more:
> {code}
> 2015-04-22 14:56:11,313 WARN  hdfs.DFSClient (DataStreamer.java:run(538)) - 
> DataStreamer Exception
> java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> 2015-04-22 14:56:11,313 INFO  hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
> 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient 
> (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
> java.io.IOException: DataStreamer Exception: 
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> Caused by: java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   ... 1 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8823) Move replication factor into individual blocks

2015-08-03 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-8823:
-
Attachment: HDFS-8823.001.patch

> Move replication factor into individual blocks
> --
>
> Key: HDFS-8823
> URL: https://issues.apache.org/jira/browse/HDFS-8823
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-8823.000.patch, HDFS-8823.001.patch
>
>
> This jira proposes to record the replication factor in the {{BlockInfo}} 
> class. The changes have two advantages:
> * Decoupling the namespace and the block management layer. It is a 
> prerequisite step to move block management off the heap or to a separate 
> process.
> * Increased flexibility in replicating blocks. Currently the replication 
> factors of all blocks in a file have to be the same, equal to the highest 
> replication factor across all snapshots. The changes will allow blocks in a 
> file to have different replication factors, potentially saving some space.
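A minimal sketch of what the proposal amounts to (illustrative only; the real 
{{BlockInfo}} has many more members):
{code}
// Each block carries its own replication factor, decoupling replication
// bookkeeping from the INodeFile in the namespace layer.
class BlockInfo extends Block {
  private short replication;

  short getReplication() { return replication; }
  void setReplication(short repl) { this.replication = repl; }
}
{code}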



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8499) Refactor BlockInfo class hierarchy with static helper class

2015-08-03 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652325#comment-14652325
 ] 

Zhe Zhang commented on HDFS-8499:
-

[~szetszwo] I wonder if you've had a chance to work on reverting or reworking 
this change? Thanks.

> Refactor BlockInfo class hierarchy with static helper class
> ---
>
> Key: HDFS-8499
> URL: https://issues.apache.org/jira/browse/HDFS-8499
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Fix For: 2.8.0
>
> Attachments: HDFS-8499.00.patch, HDFS-8499.01.patch, 
> HDFS-8499.02.patch, HDFS-8499.03.patch, HDFS-8499.04.patch, 
> HDFS-8499.05.patch, HDFS-8499.06.patch, HDFS-8499.07.patch, 
> HDFS-8499.UCFeature.patch, HDFS-bistriped.patch
>
>
> In HDFS-7285 branch, the {{BlockInfoUnderConstruction}} interface provides a 
> common abstraction for striped and contiguous UC blocks. This JIRA aims to 
> merge it to trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8849) fsck should report number of missing blocks with replication factor 1

2015-08-03 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-8849:
---

 Summary: fsck should report number of missing blocks with 
replication factor 1
 Key: HDFS-8849
 URL: https://issues.apache.org/jira/browse/HDFS-8849
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Affects Versions: 2.7.1
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Minor


HDFS-7165 supports reporting the number of blocks with replication factor 1 in 
{{dfsadmin}} and NN metrics, but it didn't extend {{fsck}} with the same 
support, which is the aim of this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8046) Allow better control of getContentSummary

2015-08-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HDFS-8046:
--
Labels: 2.6.1-candidate  (was: )

> Allow better control of getContentSummary
> -
>
> Key: HDFS-8046
> URL: https://issues.apache.org/jira/browse/HDFS-8046
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>  Labels: 2.6.1-candidate
> Fix For: 2.8.0
>
> Attachments: HDFS-8046.v1.patch
>
>
> On busy clusters, users performing quota checks against a big directory 
> structure can affect namenode performance. It has become a lot better after 
> HDFS-4995, but as clusters get bigger and busier, it is apparent that we need 
> finer-grained control to avoid a long read lock causing a throughput drop.
> Even with the unfair namesystem lock setting, a long read lock (tens of 
> milliseconds) can starve many readers and especially writers. So the locking 
> duration should be reduced, which can be done by imposing a lower 
> count-per-iteration limit in the existing implementation. But HDFS-4995 came 
> with a fixed amount of sleep between locks. This needs to be made 
> configurable so that {{getContentSummary()}} doesn't get exceedingly slow.
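A sketch of the resulting knobs (property names assumed from HDFS-4995 and this 
change, with illustrative values; verify against hdfs-default.xml for your 
release):
{code}
<!-- inodes processed per read-lock iteration (introduced by HDFS-4995) -->
<property>
  <name>dfs.content-summary.limit</name>
  <value>5000</value>
</property>
<!-- sleep between iterations, made configurable instead of fixed -->
<property>
  <name>dfs.content-summary.sleep-microsec</name>
  <value>500</value>
</property>
{code}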



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8046) Allow better control of getContentSummary

2015-08-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HDFS-8046:
--
Labels: 2.6.1-candidate 2.7.2-candidate  (was: 2.6.1-candidate)

> Allow better control of getContentSummary
> -
>
> Key: HDFS-8046
> URL: https://issues.apache.org/jira/browse/HDFS-8046
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>  Labels: 2.6.1-candidate, 2.7.2-candidate
> Fix For: 2.8.0
>
> Attachments: HDFS-8046.v1.patch
>
>
> On busy clusters, users performing quota checks against a big directory 
> structure can affect namenode performance. It has become a lot better after 
> HDFS-4995, but as clusters get bigger and busier, it is apparent that we need 
> finer-grained control to avoid a long read lock causing a throughput drop.
> Even with the unfair namesystem lock setting, a long read lock (tens of 
> milliseconds) can starve many readers and especially writers. So the locking 
> duration should be reduced, which can be done by imposing a lower 
> count-per-iteration limit in the existing implementation. But HDFS-4995 came 
> with a fixed amount of sleep between locks. This needs to be made 
> configurable so that {{getContentSummary()}} doesn't get exceedingly slow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7446) HDFS inotify should have the ability to determine what txid it has read up to

2015-08-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HDFS-7446:
--
Labels: 2.6.1-candidate  (was: )

> HDFS inotify should have the ability to determine what txid it has read up to
> -
>
> Key: HDFS-7446
> URL: https://issues.apache.org/jira/browse/HDFS-7446
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.6.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0
>
> Attachments: HDFS-7446.001.patch, HDFS-7446.002.patch, 
> HDFS-7446.003.patch
>
>
> HDFS inotify should have the ability to determine what txid it has read up 
> to.  This will allow users who want to avoid missing any events to record 
> this txid and use it to resume reading events at the spot they left off.
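A short sketch of how a consumer could use this (assuming the post-HDFS-7446 
client API where {{poll()}} returns an {{EventBatch}} carrying a txid; the 
NameNode URI is a placeholder):
{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSInotifyEventInputStream;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.inotify.Event;
import org.apache.hadoop.hdfs.inotify.EventBatch;

public class InotifyResumeSketch {
  public static void main(String[] args) throws Exception {
    long lastReadTxid = Long.parseLong(args[0]);  // txid saved by a previous run
    HdfsAdmin admin = new HdfsAdmin(URI.create("hdfs://namenode:8020"),
        new Configuration());
    DFSInotifyEventInputStream stream =
        admin.getInotifyEventStream(lastReadTxid);
    EventBatch batch;
    while ((batch = stream.poll()) != null) {
      for (Event event : batch.getEvents()) {
        System.out.println(event.getEventType());  // application-specific handling
      }
      lastReadTxid = batch.getTxid();  // persist so no events are missed on restart
    }
  }
}
{code}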



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7894) Rolling upgrade readiness is not updated in jmx until query command is issued.

2015-08-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HDFS-7894:
--
Labels: 2.6.1-candidate  (was: )

> Rolling upgrade readiness is not updated in jmx until query command is issued.
> --
>
> Key: HDFS-7894
> URL: https://issues.apache.org/jira/browse/HDFS-7894
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Brahma Reddy Battula
>Priority: Critical
>  Labels: 2.6.1-candidate
> Fix For: 2.7.1
>
> Attachments: HDFS-7894-002.patch, HDFS-7894-003.patch, HDFS-7894.patch
>
>
> When an HDFS rolling upgrade is started and a rollback image is 
> created/uploaded, the active NN does not update its {{rollingUpgradeInfo}} 
> until it receives a query command via RPC. This results in inconsistent info 
> showing up in the web UI and its jmx page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7929) inotify unable fetch pre-upgrade edit log segments once upgrade starts

2015-08-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HDFS-7929:
--
Labels: 2.6.1-candidate  (was: )

> inotify unable fetch pre-upgrade edit log segments once upgrade starts
> --
>
> Key: HDFS-7929
> URL: https://issues.apache.org/jira/browse/HDFS-7929
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0
>
> Attachments: HDFS-7929-000.patch, HDFS-7929-001.patch, 
> HDFS-7929-002.patch, HDFS-7929-003.patch
>
>
> inotify is often used to periodically poll HDFS events. However, once an HDFS 
> upgrade has started, edit logs are moved to /previous on the NN, which is not 
> accessible. Moreover, once the upgrade is finalized, /previous is currently 
> lost forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7916) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop

2015-08-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HDFS-7916:
--
Labels: 2.6.1-candidate  (was: )

> 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for 
> infinite loop
> --
>
> Key: HDFS-7916
> URL: https://issues.apache.org/jira/browse/HDFS-7916
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.0
>Reporter: Vinayakumar B
>Assignee: Rushabh S Shah
>Priority: Critical
>  Labels: 2.6.1-candidate
> Fix For: 2.7.1
>
> Attachments: HDFS-7916-01.patch, HDFS-7916-1.patch
>
>
> If any bad block is found, the BPServiceActor for the standby node will retry 
> reporting it indefinitely.
> {noformat}2015-03-11 19:43:41,528 WARN 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to report bad block 
> BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: 
> stobdtserver3/10.224.54.70:18010
> org.apache.hadoop.hdfs.server.datanode.BPServiceActorActionException: Failed 
> to report bad block 
> BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode:
> at 
> org.apache.hadoop.hdfs.server.datanode.ReportBadBlockAction.reportTo(ReportBadBlockAction.java:63)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processQueueMessages(BPServiceActor.java:1020)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:762)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:856)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8480) Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them

2015-08-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HDFS-8480:
--
Labels: 2.6.1-candidate  (was: )

> Fix performance and timeout issues in HDFS-7929 by using hard-links to 
> preserve old edit logs instead of copying them
> -
>
> Key: HDFS-8480
> URL: https://issues.apache.org/jira/browse/HDFS-8480
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
>Priority: Critical
>  Labels: 2.6.1-candidate
> Fix For: 2.7.1
>
> Attachments: HDFS-8480.00.patch, HDFS-8480.01.patch, 
> HDFS-8480.02.patch, HDFS-8480.03.patch
>
>
> HDFS-7929 copies existing edit logs to the storage directory of the upgraded 
> {{NameNode}}. This slows down the upgrade process. This JIRA aims to use 
> hard-linking instead of per-op copying to achieve the same goal.
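The core idea in a few lines (an illustrative sketch with hypothetical paths, 
not the patch itself):
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HardLinkSketch {
  public static void main(String[] args) throws IOException {
    // Hypothetical segment paths inside a NameNode storage directory.
    Path oldSegment = Paths.get("/nn/previous/edits_0000001-0000100");
    Path newSegment = Paths.get("/nn/current/edits_0000001-0000100");
    // A hard link is O(1) per segment regardless of its size,
    // unlike a byte-by-byte copy.
    Files.createLink(newSegment, oldSegment);
  }
}
{code}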



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small

2015-08-03 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652403#comment-14652403
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8838:
---

[~walter.k.su], thanks for showing a detailed failure case.  It is a multiple 
failure case.  I need to think about how to handle it.  Will work on it in 
HDFS-8383.  Or are you interested in working on HDFS-8383?

[~libo-intel], thanks for the suggestion.  A datanode is started in each test, 
so we already have 10 datanodes.

> Tolerate datanode failures in DFSStripedOutputStream when the data length is 
> small
> --
>
> Key: HDFS-8838
> URL: https://issues.apache.org/jira/browse/HDFS-8838
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h8838_20150729.patch, h8838_20150731.patch
>
>
> Currently, DFSStripedOutputStream cannot tolerate datanode failures when the 
> data length is small.  We fix the bugs here and add more tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7182) JMX metrics aren't accessible when NN is busy

2015-08-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HDFS-7182:
--
Labels: 2.6.1-candidate  (was: )

> JMX metrics aren't accessible when NN is busy
> -
>
> Key: HDFS-7182
> URL: https://issues.apache.org/jira/browse/HDFS-7182
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Ming Ma
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0
>
> Attachments: HDFS-7182-2.patch, HDFS-7182-3.patch, HDFS-7182.patch
>
>
> HDFS-5693 addressed all NN JMX metrics in Hadoop 2.0.5. Since then, a couple 
> of new metrics have been added. It turns out "RollingUpgradeStatus" requires 
> the FSNamesystem read lock.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7314) When the DFSClient lease cannot be renewed, abort open-for-write files rather than the entire DFSClient

2015-08-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HDFS-7314:
--
Labels: 2.6.1-candidate 2.7.2-candidate BB2015-05-TBR  (was: BB2015-05-TBR)

> When the DFSClient lease cannot be renewed, abort open-for-write files rather 
> than the entire DFSClient
> ---
>
> Key: HDFS-7314
> URL: https://issues.apache.org/jira/browse/HDFS-7314
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Ming Ma
>  Labels: 2.6.1-candidate, 2.7.2-candidate, BB2015-05-TBR
> Fix For: 2.8.0
>
> Attachments: HDFS-7314-2.patch, HDFS-7314-3.patch, HDFS-7314-4.patch, 
> HDFS-7314-5.patch, HDFS-7314-6.patch, HDFS-7314-7.patch, HDFS-7314-8.patch, 
> HDFS-7314-9.patch, HDFS-7314.patch
>
>
> It happened in a YARN nodemanager scenario, but it could happen to any 
> long-running service that uses a cached instance of DistributedFileSystem.
> 1. The active NN is under heavy load, so it became unavailable for 10 
> minutes; any DFSClient request will get ConnectTimeoutException.
> 2. The YARN nodemanager uses DFSClient for certain write operations such as 
> the log aggregator or the shared cache in YARN-1492. The renewLease RPC of 
> the DFSClient used by the YARN NM got ConnectTimeoutException.
> {noformat}
> 2014-10-29 01:36:19,559 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to 
> renew lease for [DFSClient_NONMAPREDUCE_-550838118_1] for 372 seconds.  
> Aborting ...
> {noformat}
> 3. After DFSClient is in Aborted state, YARN NM can't use that cached 
> instance of DistributedFileSystem.
> {noformat}
> 2014-10-29 20:26:23,991 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Failed to download rsrc...
> java.io.IOException: Filesystem closed
> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727)
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
> at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:237)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:340)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:57)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> We can make YARN or DFSClient more tolerant of temporary NN unavailability. 
> Given that the call stack is YARN -> DistributedFileSystem -> DFSClient, this 
> can be addressed at different layers.
> * YARN closes the DistributedFileSystem object when it receives some 
> well-defined exception. Then the next HDFS call will create a new instance of 
> DistributedFileSystem. We would have to fix all the places in YARN, plus 
> other HDFS applications would need to address this as well.
> * DistributedFileSystem detects an aborted DFSClient and creates a new 
> instance of DFSClient. We would need to fix all the places where 
> DistributedFileSystem calls DFSClient.
> * After DFSClient gets into the aborted state, it doesn't have to reject all 
> requests; instead it can retry, and if the NN is available again it can 
> transition back to the healthy state.
> Comments?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7916) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop

2015-08-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HDFS-7916:
--
Labels:   (was: 2.6.1-candidate)

> 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for 
> infinite loop
> --
>
> Key: HDFS-7916
> URL: https://issues.apache.org/jira/browse/HDFS-7916
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.0
>Reporter: Vinayakumar B
>Assignee: Rushabh S Shah
>Priority: Critical
> Fix For: 2.7.1
>
> Attachments: HDFS-7916-01.patch, HDFS-7916-1.patch
>
>
> If any bad block is found, the BPServiceActor for the standby node will retry 
> reporting it indefinitely.
> {noformat}2015-03-11 19:43:41,528 WARN 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to report bad block 
> BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: 
> stobdtserver3/10.224.54.70:18010
> org.apache.hadoop.hdfs.server.datanode.BPServiceActorActionException: Failed 
> to report bad block 
> BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode:
> at 
> org.apache.hadoop.hdfs.server.datanode.ReportBadBlockAction.reportTo(ReportBadBlockAction.java:63)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processQueueMessages(BPServiceActor.java:1020)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:762)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:856)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8849) fsck should report number of missing blocks with replication factor 1

2015-08-03 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652421#comment-14652421
 ] 

Allen Wittenauer commented on HDFS-8849:


That's pretty much covered already.  fsck will already report the number of 
blocks that don't have the minimum replication (whether that be 1 or some 
higher number).

> fsck should report number of missing blocks with replication factor 1
> -
>
> Key: HDFS-8849
> URL: https://issues.apache.org/jira/browse/HDFS-8849
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
>Priority: Minor
>
> HDFS-7165 supports reporting the number of blocks with replication factor 1 
> in {{dfsadmin}} and NN metrics, but it didn't extend {{fsck}} with the same 
> support, which is the aim of this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8849) fsck should report number of missing blocks with replication factor 1

2015-08-03 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652427#comment-14652427
 ] 

Zhe Zhang commented on HDFS-8849:
-

Thanks for the input, Allen. I guess there's still a small gap: even when we 
know 1) the number of missing blocks and 2) the number of blocks below min 
replication, it's not always possible to calculate the number of blocks meeting 
both conditions. So agreed that it's partially covered; this JIRA will just 
fill in the small gap.

> fsck should report number of missing blocks with replication factor 1
> -
>
> Key: HDFS-8849
> URL: https://issues.apache.org/jira/browse/HDFS-8849
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
>Priority: Minor
>
> HDFS-7165 supports reporting the number of blocks with replication factor 1 
> in {{dfsadmin}} and NN metrics, but it didn't extend {{fsck}} with the same 
> support, which is the aim of this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize

2015-08-03 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652430#comment-14652430
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8220:
---

Some minor comment:
{code}
   if (!coordinator.getStripedDataStreamer(i).isFailed()) {
+    StripedDataStreamer curStreamer = coordinator
+        .getStripedDataStreamer(i);
{code}
Let's call getStripedDataStreamer before the if.  How about renaming 
curStreamer to si?  CurrentStreamer has a different meaning in 
DFSStripedOutputStream.

> Erasure Coding: StripedDataStreamer fails to handle the blocklocations which 
> doesn't satisfy BlockGroupSize
> ---
>
> Key: HDFS-8220
> URL: https://issues.apache.org/jira/browse/HDFS-8220
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, 
> HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, 
> HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, 
> HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, 
> HDFS-8220-HDFS-7285.008.patch
>
>
> During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to 
> validate the available datanodes against the {{BlockGroupSize}}. Please see 
> the exception to understand more:
> {code}
> 2015-04-22 14:56:11,313 WARN  hdfs.DFSClient (DataStreamer.java:run(538)) - 
> DataStreamer Exception
> java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> 2015-04-22 14:56:11,313 INFO  hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
> 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient 
> (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
> java.io.IOException: DataStreamer Exception: 
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> Caused by: java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   ... 1 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8849) fsck should report number of missing blocks with replication factor 1

2015-08-03 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652439#comment-14652439
 ] 

Allen Wittenauer commented on HDFS-8849:


I'm not sure what benefit that number provides.  If I'm missing a block below 
min rep, I'm still going through the full fsck output to try and find it.

> fsck should report number of missing blocks with replication factor 1
> -
>
> Key: HDFS-8849
> URL: https://issues.apache.org/jira/browse/HDFS-8849
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
>Priority: Minor
>
> HDFS-7165 supports reporting the number of blocks with replication factor 1 
> in {{dfsadmin}} and NN metrics, but it didn't extend {{fsck}} with the same 
> support, which is the aim of this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8849) fsck should report number of missing blocks with replication factor 1

2015-08-03 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652451#comment-14652451
 ] 

Zhe Zhang commented on HDFS-8849:
-

I guess the motivation is the same as HDFS-7165. A replication factor of 1 
indicates the data is "disposable", so when running {{fsck}} on a directory 
the user might want to consider this metric separately (e.g., be less alarmed 
about the amount of disposable data that's missing).

> fsck should report number of missing blocks with replication factor 1
> -
>
> Key: HDFS-8849
> URL: https://issues.apache.org/jira/browse/HDFS-8849
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
>Priority: Minor
>
> HDFS-7165 supports reporting the number of blocks with replication factor 1 
> in {{dfsadmin}} and NN metrics, but it didn't extend {{fsck}} with the same 
> support, which is the aim of this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8849) fsck should report number of missing blocks with replication factor 1

2015-08-03 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652471#comment-14652471
 ] 

Allen Wittenauer commented on HDFS-8849:


bq.A replication factor of 1 indicates the data is "disposable". So when 
checking fsck on a directory the user might want to separately consider this 
metric (e.g., less alarmed about the number of disposable data that's missing).

Meanwhile, back in real life, users set a repl factor of 1 to avoid quota 
problems. I've seen it over and over and over. It's why a lot of us are 
starting to use a min repl of 2. Special-casing 1 is a dangerous capitulation 
to a bad practice that should be outlawed on production systems.

> fsck should report number of missing blocks with replication factor 1
> -
>
> Key: HDFS-8849
> URL: https://issues.apache.org/jira/browse/HDFS-8849
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
>Priority: Minor
>
> HDFS-7165 supports reporting the number of blocks with replication factor 1 
> in {{dfsadmin}} and NN metrics, but it didn't extend {{fsck}} with the same 
> support, which is the aim of this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8804) Erasure Coding: use DirectBufferPool in DFSStripedInputStream for buffer allocation

2015-08-03 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-8804:

Attachment: HDFS-8804.001.patch

Thanks Nicholas for the review! Updated the patch to address the comments. I 
did not add synchronized to {{getParityBuffer}} because it is only used in 
StatefulStripeReader, which is already protected by the lock.

> Erasure Coding: use DirectBufferPool in DFSStripedInputStream for buffer 
> allocation
> ---
>
> Key: HDFS-8804
> URL: https://issues.apache.org/jira/browse/HDFS-8804
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-8804.000.patch, HDFS-8804.001.patch
>
>
> Currently we directly allocate direct ByteBuffer in DFSStripedInputstream for 
> the stripe buffer and the buffers holding parity data. It's better to get 
> ByteBuffer from DirectBufferPool.
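A small sketch of the pattern (using {{org.apache.hadoop.util.DirectBufferPool}}; 
{{cellSize}} and {{dataBlkNum}} stand in for the EC schema parameters):
{code}
import java.nio.ByteBuffer;
import org.apache.hadoop.util.DirectBufferPool;

public class StripeBufferSketch {
  private static final DirectBufferPool POOL = new DirectBufferPool();

  static void readStripe(int cellSize, int dataBlkNum) {
    // Borrow a direct buffer sized for one stripe instead of allocating anew.
    ByteBuffer stripeBuf = POOL.getBuffer(cellSize * dataBlkNum);
    try {
      // ... fill and drain the stripe buffer ...
    } finally {
      stripeBuf.clear();
      POOL.returnBuffer(stripeBuf);  // hand the buffer back for reuse
    }
  }
}
{code}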



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8846) Create edit log files with old layout version for upgrade testing

2015-08-03 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652515#comment-14652515
 ] 

Zhe Zhang commented on HDFS-8846:
-

Thanks Ming for the feedback!

I was planning to add only edit log files, but I think creating an entire NN 
dir structure with an old layout version is a good idea. It could support a 
broader range of upgrade tests.

> Create edit log files with old layout version for upgrade testing
> -
>
> Key: HDFS-8846
> URL: https://issues.apache.org/jira/browse/HDFS-8846
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
>
> Per discussion under HDFS-8480, we should create some edit log files with old 
> layout version, to test whether they can be correctly handled in upgrades.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp

2015-08-03 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated HDFS-8828:
---
Attachment: HDFS-8828.003.patch

> Utilize Snapshot diff report to build copy list in distcp
> -
>
> Key: HDFS-8828
> URL: https://issues.apache.org/jira/browse/HDFS-8828
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, snapshots
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, 
> HDFS-8828.003.patch
>
>
> Some users reported a huge time cost to build the file copy list in distcp 
> (30 hours for 1.6M files). We can leverage the snapshot diff report to build 
> a file copy list including only the files/dirs that changed between two 
> snapshots (or a snapshot and a normal dir). This speeds up the process in 
> two ways: 1. less copy-list building time; 2. fewer file copy MR jobs.
> The HDFS snapshot diff report provides information about file/directory 
> creation, deletion, rename and modification between two snapshots or between 
> a snapshot and a normal directory. HDFS-7535 synchronizes deletion and 
> rename, then falls back to the default distcp, so it still relies on the 
> default distcp to build the complete list of files under the source dir. 
> This patch puts only created and modified files into the copy list based on 
> the snapshot diff report, minimizing the number of files to copy. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp

2015-08-03 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652525#comment-14652525
 ] 

Yufei Gu commented on HDFS-8828:


Hi Yongjun,

Thank you very much for the detailed code review and all the nice suggestions. 
I've uploaded a new patch (HDFS-8828.003.patch) addressing the above comments.

> Utilize Snapshot diff report to build copy list in distcp
> -
>
> Key: HDFS-8828
> URL: https://issues.apache.org/jira/browse/HDFS-8828
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, snapshots
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, 
> HDFS-8828.003.patch
>
>
> Some users reported a huge time cost to build the file copy list in distcp 
> (30 hours for 1.6M files). We can leverage the snapshot diff report to build 
> a file copy list including only the files/dirs that changed between two 
> snapshots (or a snapshot and a normal dir). This speeds up the process in 
> two ways: 1. less copy-list building time; 2. fewer file copy MR jobs.
> The HDFS snapshot diff report provides information about file/directory 
> creation, deletion, rename and modification between two snapshots or between 
> a snapshot and a normal directory. HDFS-7535 synchronizes deletion and 
> rename, then falls back to the default distcp, so it still relies on the 
> default distcp to build the complete list of files under the source dir. 
> This patch puts only created and modified files into the copy list based on 
> the snapshot diff report, minimizing the number of files to copy. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8747) Provide Better "Scratch Space" and "Soft Delete" Support for HDFS Encryption Zones

2015-08-03 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652534#comment-14652534
 ] 

Andrew Wang commented on HDFS-8747:
---

From our side, we have some customers using encryption who want Trash as a 
safety mechanism. So simply using -skipTrash means they lose this safety. My 
advice has been to use snapshots, since snapshots provide similar (if not 
superior) properties to trash. That's also why I'm willing to accept some of 
the compromises regarding the proposed design; while not perfect, it's better 
than what we've got now.

I do think though that nested encryption zones would make this better yet (for 
reasons even besides trash), and would not be too difficult to implement.

> Provide Better "Scratch Space" and "Soft Delete" Support for HDFS Encryption 
> Zones
> --
>
> Key: HDFS-8747
> URL: https://issues.apache.org/jira/browse/HDFS-8747
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption
>Affects Versions: 2.6.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-8747-07092015.pdf, HDFS-8747-07152015.pdf, 
> HDFS-8747-07292015.pdf
>
>
> HDFS Transparent Data Encryption At-Rest was introduced in Hadoop 2.6 to 
> allow creating an encryption zone on top of a single HDFS directory. Files 
> under the root directory of the encryption zone are encrypted/decrypted 
> transparently upon HDFS client write or read operations. 
> Generally, it does not support rename (without data copying) across 
> encryption zones or between an encryption zone and a non-encryption zone, 
> because of the different security settings of encryption zones. However, 
> there are certain use cases where efficient rename support is desired. This 
> JIRA proposes better support for two such use cases, “Scratch Space” (a.k.a. 
> staging area) and “Soft Delete” (a.k.a. trash), with HDFS encryption zones.
> “Scratch Space” is widely used in Hadoop jobs and requires efficient rename 
> support. Temporary files from MR jobs are usually stored in a staging area 
> outside the encryption zone, such as the “/tmp” directory, and then renamed 
> to the target directories once the data is ready to be further processed. 
> Below is a summary of supported/unsupported cases in the latest Hadoop:
> * Rename within an encryption zone is supported.
> * Renaming an entire encryption zone by moving the root directory of the 
> zone is allowed.
> * Renaming a sub-directory/file from an encryption zone to a non-encryption 
> zone is not allowed.
> * Renaming a sub-directory/file from encryption zone A to encryption zone B 
> is not allowed.
> * Renaming from a non-encryption zone to an encryption zone is not allowed.
> “Soft delete” (a.k.a. trash) is a client-side feature that helps prevent 
> accidental deletion of files and directories. If trash is enabled and a file 
> or directory is deleted using the Hadoop shell, the file is moved to the 
> .Trash directory of the user's home directory instead of being deleted.  
> Deleted files are initially moved (renamed) to the Current sub-directory of 
> the .Trash directory with the original path preserved. Files and directories 
> in the trash can be restored simply by moving them to a location outside the 
> .Trash directory.
> Due to the limited rename support, deleting a sub-directory/file within an 
> encryption zone with the trash feature is not allowed. Clients have to use 
> the -skipTrash option to work around this. HADOOP-10902 and HDFS-6767 
> improved the error message but did not completely solve the problem. 
> We propose to solve the problem by generalizing the mapping between an 
> encryption zone and its underlying HDFS directories from 1:1 today to 1:N. 
> The encryption zone should allow non-overlapping directories, such as 
> scratch space or soft-delete "trash" locations, to be added/removed 
> dynamically after creation. This way, rename for "scratch space" and "soft 
> delete" can be better supported without breaking the assumption that rename 
> is only supported "within the zone". 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8804) Erasure Coding: use DirectBufferPool in DFSStripedInputStream for buffer allocation

2015-08-03 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652536#comment-14652536
 ] 

Zhe Zhang commented on HDFS-8804:
-

Thanks Jing for the work! The patch looks good to me. My only minor comment is 
that the section below could use some assertions to avoid overlapped allocation 
in the {{parityBuf}}:
{code}
  ByteBuffer buf = getParityBuffer().duplicate();
  buf.position(cellSize * decodeIndex);
  buf.limit(cellSize * decodeIndex + (int) alignedStripe.range.spanInBlock);
  decodeInputs[decodeIndex] = buf.slice();
{code}

For example, since this is stateful read, we can at least assert 
{{alignedStripe.range.spanInBlock}} is no larger than {{cellSize}}. Ideally we 
should assert {{decodeIndex}} has not been allocated yet but it doesn't seem 
easy.

As a follow-on, we can think about how to do the same for pread.
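
To make the suggestion concrete, here is a minimal sketch of the guarded 
slicing, written as a standalone helper. The parameter names mirror the 
snippet above ({{cellSize}}, {{decodeIndex}}, {{decodeInputs}}, with 
{{alignedStripe.range.spanInBlock}} passed as {{spanInBlock}}); the helper 
itself is hypothetical and only illustrates where the assertions would sit:
{code}
import java.nio.ByteBuffer;

class ParityBufferSketch {
  // Illustrative only: slice one parity cell out of the shared parity
  // buffer, with the two assertions suggested above added up front.
  static ByteBuffer sliceParityCell(ByteBuffer parityBuf, int cellSize,
      int decodeIndex, long spanInBlock, ByteBuffer[] decodeInputs) {
    // A stateful read should never span more than one cell.
    assert spanInBlock <= cellSize
        : "span " + spanInBlock + " exceeds cell size " + cellSize;
    // Each decode slot should be allocated at most once per stripe.
    assert decodeInputs[decodeIndex] == null
        : "decode input " + decodeIndex + " already allocated";
    ByteBuffer buf = parityBuf.duplicate();
    buf.position(cellSize * decodeIndex);
    buf.limit(cellSize * decodeIndex + (int) spanInBlock);
    return buf.slice();
  }
}
{code}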

> Erasure Coding: use DirectBufferPool in DFSStripedInputStream for buffer 
> allocation
> ---
>
> Key: HDFS-8804
> URL: https://issues.apache.org/jira/browse/HDFS-8804
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-8804.000.patch, HDFS-8804.001.patch
>
>
> Currently we directly allocate direct ByteBuffers in DFSStripedInputStream 
> for the stripe buffer and the buffers holding parity data. It's better to 
> get ByteBuffers from a DirectBufferPool.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-08-03 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652549#comment-14652549
 ] 

Andrew Wang commented on HDFS-7966:
---

I guess my question here is similar to what [~stack] and [~tlipcon] posed at 
the beginning. What's the upside of this new implementation? It seems to be 
10% to 30% slower than the current implementation, which is not good. If it 
were the same performance but had other redeeming qualities (e.g. less code) 
then it would still be worth consideration.

> New Data Transfer Protocol via HTTP/2
> -
>
> Key: HDFS-7966
> URL: https://issues.apache.org/jira/browse/HDFS-7966
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Haohui Mai
>Assignee: Qianqian Shi
>  Labels: gsoc, gsoc2015, mentor
> Attachments: GSoC2015_Proposal.pdf, 
> TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg, 
> TestHttp2ReadBlockInsideEventLoop.svg
>
>
> The current Data Transfer Protocol (DTP) implements a rich set of features 
> that span multiple layers, including:
> * Connection pooling and authentication (session layer)
> * Encryption (presentation layer)
> * Data writing pipeline (application layer)
> All these features are HDFS-specific and defined by the implementation. As a 
> result, it requires a non-trivial amount of work to implement HDFS clients 
> and servers.
> This jira explores delegating the responsibilities of the session and 
> presentation layers to the HTTP/2 protocol. In particular, HTTP/2 handles 
> connection multiplexing, QoS, authentication and encryption, reducing the 
> scope of DTP to the application layer only. By leveraging an existing HTTP/2 
> library, it should simplify the implementation of both HDFS clients and 
> servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp

2015-08-03 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652559#comment-14652559
 ] 

Jing Zhao commented on HDFS-8828:
-

Thanks for working on this, Yufei! One quick comment is about the following 
change:
{code}
-if ((!syncFolder || !deleteMissing) && useDiff) {
+if ((!syncFolder || deleteMissing) && useDiff) {
   throw new IllegalArgumentException(
-  "Diff is valid only with update and delete options");
+  "Diff is valid only with update options");
 }
{code}

Currently we already delete files/directories according to the DELETE diff. 
That looks consistent with the "deleteMissing" option to me. Is there any 
specific reason to change the semantics here?

> Utilize Snapshot diff report to build copy list in distcp
> -
>
> Key: HDFS-8828
> URL: https://issues.apache.org/jira/browse/HDFS-8828
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, snapshots
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, 
> HDFS-8828.003.patch
>
>
> Some users reported a huge time cost to build the file copy list in distcp 
> (30 hours for 1.6M files). We can leverage the snapshot diff report to build 
> a file copy list including only the files/dirs that changed between two 
> snapshots (or a snapshot and a normal dir). This speeds up the process in 
> two ways: 1. less copy-list building time; 2. fewer file copy MR jobs.
> The HDFS snapshot diff report provides information about file/directory 
> creation, deletion, rename and modification between two snapshots or between 
> a snapshot and a normal directory. HDFS-7535 synchronizes deletion and 
> rename, then falls back to the default distcp, so it still relies on the 
> default distcp to build the complete list of files under the source dir. 
> This patch puts only created and modified files into the copy list based on 
> the snapshot diff report, minimizing the number of files to copy. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-08-03 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652579#comment-14652579
 ] 

Haohui Mai commented on HDFS-7966:
--

bq. What's the upside of this new implementation? 

Performance is definitely one important factor. One of the motivations is to 
improve the efficiency of the DN when there are hundreds of thousands of reads, 
by reducing the overhead of context switches. [~Apache9], do you have any 
performance numbers for this scenario?

An HTTP/2-based DTP also serves as a building block for the next level of 
innovation; to quote the description in the jira:

{quote}
This jira explores delegating the responsibilities of the session and 
presentation layers to the HTTP/2 protocol. In particular, HTTP/2 handles 
connection multiplexing, QoS, authentication and encryption, reducing the scope 
of DTP to the application layer only. By leveraging an existing HTTP/2 
library, it should simplify the implementation of both HDFS clients and servers.
{quote}

bq. If it were the same performance but had other redeeming qualities (e.g. 
less code) then it would still be worth consideration.

This is designed to be a new code path so that it stays compatible with older 
releases. You can still rely on the old DTP depending on the application 
scenario.

> New Data Transfer Protocol via HTTP/2
> -
>
> Key: HDFS-7966
> URL: https://issues.apache.org/jira/browse/HDFS-7966
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Haohui Mai
>Assignee: Qianqian Shi
>  Labels: gsoc, gsoc2015, mentor
> Attachments: GSoC2015_Proposal.pdf, 
> TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg, 
> TestHttp2ReadBlockInsideEventLoop.svg
>
>
> The current Data Transfer Protocol (DTP) implements a rich set of features 
> that span multiple layers, including:
> * Connection pooling and authentication (session layer)
> * Encryption (presentation layer)
> * Data writing pipeline (application layer)
> All these features are HDFS-specific and defined by the implementation. As a 
> result, it requires a non-trivial amount of work to implement HDFS clients 
> and servers.
> This jira explores delegating the responsibilities of the session and 
> presentation layers to the HTTP/2 protocol. In particular, HTTP/2 handles 
> connection multiplexing, QoS, authentication and encryption, reducing the 
> scope of DTP to the application layer only. By leveraging an existing HTTP/2 
> library, it should simplify the implementation of both HDFS clients and 
> servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8761) Windows HDFS daemon - datanode.DirectoryScanner: Error compiling report (...) XXX is not a prefix of YYY

2015-08-03 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652581#comment-14652581
 ] 

Chris Nauroth commented on HDFS-8761:
-

[~odelalleau], glad to hear this helped!

bq. I wonder how this is not a bug though, even if there exists a workaround. 
But not a big deal.

I agree that the configuration file can end up looking non-intuitive on 
Windows.  Unfortunately, I don't see a way to do any better while maintaining 
the feature that everything defaults to using {{hadoop.tmp.dir}} for quick dev 
deployments.  This is a side effect of the fact that a Windows file system path 
is not always valid as a URL.  On Linux, a file system path will always be a 
valid URL (assuming the individual path names stick to the characters that 
don't require escaping).  I typically advise using a full {{file:}} URL in 
production configurations to make everything clearer for operators.
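
For instance, a hedged sketch of that advice in code form; 
{{dfs.datanode.data.dir}} is the standard property, but the drive letter and 
path below are made up for illustration:
{code}
import org.apache.hadoop.conf.Configuration;

public class DataDirUrlExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Use a full file: URL, including the drive letter, rather than a bare
    // Windows path like \tmp\...\dfs\data, so the value parses
    // unambiguously as a URI.
    conf.set("dfs.datanode.data.dir",
        "file:///D:/tmp/hadoop-example/dfs/data");  // hypothetical location
    System.out.println(conf.get("dfs.datanode.data.dir"));
  }
}
{code}
The same {{file:///D:/...}} form is what would go into {{hdfs-site.xml}} in a 
real deployment.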

> Windows HDFS daemon - datanode.DirectoryScanner: Error compiling report (...) 
> XXX is not a prefix of YYY
> 
>
> Key: HDFS-8761
> URL: https://issues.apache.org/jira/browse/HDFS-8761
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Affects Versions: 2.7.1
> Environment: Windows 7, Java SDK 1.8.0_45
>Reporter: Olivier Delalleau
>Priority: Minor
>
> I'm periodically seeing errors like the one below output by the HDFS daemon 
> (started with start-dfs.cmd). This is with the default settings for data 
> location (=not specified in my hdfs-site.xml). I assume it may be fixable by 
> specifying a path with the drive letter in the config file; however, I 
> haven't been able to do that (see 
> http://stackoverflow.com/questions/31353226/setting-hadoop-tmp-dir-on-windows-gives-error-uri-has-an-authority-component).
> 15/07/11 17:29:57 ERROR datanode.DirectoryScanner: Error compiling report
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> \tmp\hadoop-odelalleau\dfs\data is not a prefix of 
> D:\tmp\hadoop-odelalleau\dfs\data\current\BP-1474392971-10.128.22.110-1436634926842\current\finalized\subdir0\subdir0\blk_1073741825
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.getDiskReport(DirectoryScanner.java:566)
> at 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:425)
> at 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:406)
> at 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:362)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8823) Move replication factor into individual blocks

2015-08-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652639#comment-14652639
 ] 

Hadoop QA commented on HDFS-8823:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m  3s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 7 new or modified test files. |
| {color:green}+1{color} | javac |   7m 39s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 30s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 22s | The applied patch generated  5 
new checkstyle issues (total was 577, now 573). |
| {color:green}+1{color} | whitespace |   0m  6s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 26s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   2m 34s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m  3s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 159m 12s | Tests failed in hadoop-hdfs. |
| | | 202m 55s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| Failed unit tests | hadoop.hdfs.server.namenode.ha.TestStandbyIsHot |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12748510/HDFS-8823.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 469cfcd |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/11890/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11890/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11890/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11890/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11890/console |


This message was automatically generated.

> Move replication factor into individual blocks
> --
>
> Key: HDFS-8823
> URL: https://issues.apache.org/jira/browse/HDFS-8823
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-8823.000.patch, HDFS-8823.001.patch
>
>
> This jira proposes to record the replication factor in the {{BlockInfo}} 
> class. The changes have two advantages:
> * Decoupling the namespace and the block management layer. It is a 
> prerequisite step to move block management off the heap or to a separate 
> process.
> * Increased flexibility in replicating blocks. Currently the replication 
> factors of all blocks in a file have to be the same, equal to the highest 
> replication factor across all snapshots. The changes will allow blocks in a 
> file to have different replication factors, potentially saving some space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize

2015-08-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652645#comment-14652645
 ] 

Hadoop QA commented on HDFS-8220:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 36s | Findbugs (version ) appears to 
be broken on HDFS-7285. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 44s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 51s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 15s | The applied patch generated 
1 release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 38s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 41s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   3m 28s | The patch appears to introduce 5 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 21s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 177m 12s | Tests failed in hadoop-hdfs. |
| | | 220m 24s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| Failed unit tests | hadoop.hdfs.TestWriteStripedFileWithFailure |
|   | hadoop.hdfs.server.namenode.TestFileTruncate |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12748509/HDFS-8220-HDFS-7285-09.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | HDFS-7285 / ba90c02 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11889/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11889/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11889/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11889/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11889/console |


This message was automatically generated.

> Erasure Coding: StripedDataStreamer fails to handle the blocklocations which 
> doesn't satisfy BlockGroupSize
> ---
>
> Key: HDFS-8220
> URL: https://issues.apache.org/jira/browse/HDFS-8220
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, 
> HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, 
> HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, 
> HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, 
> HDFS-8220-HDFS-7285.008.patch
>
>
> During write operations, {{StripedDataStreamer#locateFollowingBlock}} fails 
> to validate the available datanodes against the {{BlockGroupSize}}. Please 
> see the exception below for more detail:
> {code}
> 2015-04-22 14:56:11,313 WARN  hdfs.DFSClient (DataStreamer.java:run(538)) - 
> DataStreamer Exception
> java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> 2015-04-22 14:56:11,313 INFO  hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
> 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient 
> (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
> java.io.IOException: DataStreamer Exception: 
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> Caused by: java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.j

[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-08-03 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652657#comment-14652657
 ] 

Andrew Wang commented on HDFS-7966:
---

Agreed, there are potentially performance advantages, but it looks like all the 
benchmarks thus far show worse performance. I'd be very happy to see positive 
results, since erasure coding will lead to a lot more remote reads and thus 
possibly hit this code path.

There has to be some upside, though, for this to be merged. The existing DTP 
already implements a number of the features mentioned, so I'm not sure how much 
we gain there. And if perf isn't as good or better, then we're increasing our 
maintenance burden for something that won't get used.

> New Data Transfer Protocol via HTTP/2
> -
>
> Key: HDFS-7966
> URL: https://issues.apache.org/jira/browse/HDFS-7966
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Haohui Mai
>Assignee: Qianqian Shi
>  Labels: gsoc, gsoc2015, mentor
> Attachments: GSoC2015_Proposal.pdf, 
> TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg, 
> TestHttp2ReadBlockInsideEventLoop.svg
>
>
> The current Data Transfer Protocol (DTP) implements a rich set of features 
> that span multiple layers, including:
> * Connection pooling and authentication (session layer)
> * Encryption (presentation layer)
> * Data writing pipeline (application layer)
> All these features are HDFS-specific and defined by the implementation. As a 
> result, it requires a non-trivial amount of work to implement HDFS clients 
> and servers.
> This jira explores delegating the responsibilities of the session and 
> presentation layers to the HTTP/2 protocol. In particular, HTTP/2 handles 
> connection multiplexing, QoS, authentication and encryption, reducing the 
> scope of DTP to the application layer only. By leveraging an existing HTTP/2 
> library, it should simplify the implementation of both HDFS clients and 
> servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8499) Refactor BlockInfo class hierarchy with static helper class

2015-08-03 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652684#comment-14652684
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8499:
---

Not yet.  Should be able to try it on Wednesday.

> Refactor BlockInfo class hierarchy with static helper class
> ---
>
> Key: HDFS-8499
> URL: https://issues.apache.org/jira/browse/HDFS-8499
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Fix For: 2.8.0
>
> Attachments: HDFS-8499.00.patch, HDFS-8499.01.patch, 
> HDFS-8499.02.patch, HDFS-8499.03.patch, HDFS-8499.04.patch, 
> HDFS-8499.05.patch, HDFS-8499.06.patch, HDFS-8499.07.patch, 
> HDFS-8499.UCFeature.patch, HDFS-bistriped.patch
>
>
> In HDFS-7285 branch, the {{BlockInfoUnderConstruction}} interface provides a 
> common abstraction for striped and contiguous UC blocks. This JIRA aims to 
> merge it to trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8849) fsck should report number of missing blocks with replication factor 1

2015-08-03 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652695#comment-14652695
 ] 

Zhe Zhang commented on HDFS-8849:
-

Thanks Allen for the advice. I think we can report the *number of missing 
blocks with min replication* instead.

> fsck should report number of missing blocks with replication factor 1
> -
>
> Key: HDFS-8849
> URL: https://issues.apache.org/jira/browse/HDFS-8849
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
>Priority: Minor
>
> HDFS-7165 supports reporting the number of blocks with replication factor 1 
> in {{dfsadmin}} and NN metrics, but it didn't extend {{fsck}} with the same 
> support, which is the aim of this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-08-03 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652698#comment-14652698
 ] 

Duo Zhang commented on HDFS-7966:
-

I do not have enough machines to test the scenario... What I see, if I create 
lots of threads to read from a datanode concurrently, is that HTTP/2 will start 
the requests almost at the same time, but TCP will start the requests one by 
one (maybe tens by tens, where the number is the CPU count). So there won't be 
a situation where the DN really handles lots of concurrent reads from clients, 
and the context-switch overhead may be smaller than in the HTTP/2 
implementation, since we also have a ThreadPool besides the EventLoopGroup in 
the HTTP/2 connection. What makes things worse is that our client is not event 
driven, so we cannot reduce the client's thread count...
Let me see if I can construct a scenario where HTTP/2 is faster than TCP...
Thanks.

> New Data Transfer Protocol via HTTP/2
> -
>
> Key: HDFS-7966
> URL: https://issues.apache.org/jira/browse/HDFS-7966
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Haohui Mai
>Assignee: Qianqian Shi
>  Labels: gsoc, gsoc2015, mentor
> Attachments: GSoC2015_Proposal.pdf, 
> TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg, 
> TestHttp2ReadBlockInsideEventLoop.svg
>
>
> The current Data Transfer Protocol (DTP) implements a rich set of features 
> that span multiple layers, including:
> * Connection pooling and authentication (session layer)
> * Encryption (presentation layer)
> * Data writing pipeline (application layer)
> All these features are HDFS-specific and defined by the implementation. As a 
> result, it requires a non-trivial amount of work to implement HDFS clients 
> and servers.
> This jira explores delegating the responsibilities of the session and 
> presentation layers to the HTTP/2 protocol. In particular, HTTP/2 handles 
> connection multiplexing, QoS, authentication and encryption, reducing the 
> scope of DTP to the application layer only. By leveraging an existing HTTP/2 
> library, it should simplify the implementation of both HDFS clients and 
> servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp

2015-08-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652728#comment-14652728
 ] 

Hadoop QA commented on HDFS-8828:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 22s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 33s | There were no new javac warning 
messages. |
| {color:red}-1{color} | javadoc |   9m 37s | The applied patch generated  2  
additional warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 25s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  4s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 22s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 47s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | tools/hadoop tests |   6m 19s | Tests passed in 
hadoop-distcp. |
| | |  42m 32s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12748535/HDFS-8828.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 469cfcd |
| javadoc | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11891/artifact/patchprocess/diffJavadocWarnings.txt
 |
| hadoop-distcp test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11891/artifact/patchprocess/testrun_hadoop-distcp.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11891/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11891/console |


This message was automatically generated.

> Utilize Snapshot diff report to build copy list in distcp
> -
>
> Key: HDFS-8828
> URL: https://issues.apache.org/jira/browse/HDFS-8828
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, snapshots
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, 
> HDFS-8828.003.patch
>
>
> Some users reported a huge time cost to build the file copy list in distcp 
> (30 hours for 1.6M files). We can leverage the snapshot diff report to build 
> a file copy list including only the files/dirs that changed between two 
> snapshots (or a snapshot and a normal dir). This speeds up the process in 
> two ways: 1. less copy-list building time; 2. fewer file copy MR jobs.
> The HDFS snapshot diff report provides information about file/directory 
> creation, deletion, rename and modification between two snapshots or between 
> a snapshot and a normal directory. HDFS-7535 synchronizes deletion and 
> rename, then falls back to the default distcp, so it still relies on the 
> default distcp to build the complete list of files under the source dir. 
> This patch puts only created and modified files into the copy list based on 
> the snapshot diff report, minimizing the number of files to copy. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp

2015-08-03 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652736#comment-14652736
 ] 

Yufei Gu commented on HDFS-8828:


Hi Jing Zhao,

Thank you for reviewing the code. We changed the option here for the following 
reason. This patch builds the diff file list instead of the complete file list; 
in other words, only changed/created files/directories will be in the copy 
file list. With the "-delete" option on, the MR jobs will delete every 
file/directory in the target that is not in the copy file list, so it would 
delete files we intend to keep. 

> Utilize Snapshot diff report to build copy list in distcp
> -
>
> Key: HDFS-8828
> URL: https://issues.apache.org/jira/browse/HDFS-8828
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, snapshots
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, 
> HDFS-8828.003.patch
>
>
> Some users reported a huge time cost to build the file copy list in distcp 
> (30 hours for 1.6M files). We can leverage the snapshot diff report to build 
> a file copy list including only the files/dirs that changed between two 
> snapshots (or a snapshot and a normal dir). This speeds up the process in 
> two ways: 1. less copy-list building time; 2. fewer file copy MR jobs.
> The HDFS snapshot diff report provides information about file/directory 
> creation, deletion, rename and modification between two snapshots or between 
> a snapshot and a normal directory. HDFS-7535 synchronizes deletion and 
> rename, then falls back to the default distcp, so it still relies on the 
> default distcp to build the complete list of files under the source dir. 
> This patch puts only created and modified files into the copy list based on 
> the snapshot diff report, minimizing the number of files to copy. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp

2015-08-03 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652789#comment-14652789
 ] 

Jing Zhao commented on HDFS-8828:
-

Thanks for the explanation, Yufei! Yes, you're right that our current code uses 
the file list to check if a file is in the source. In that sense excluding 
"-delete" may be our only option here. But we may need to provide more details 
in the documentation about the behavior, as also suggested by Yongjun.

> Utilize Snapshot diff report to build copy list in distcp
> -
>
> Key: HDFS-8828
> URL: https://issues.apache.org/jira/browse/HDFS-8828
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, snapshots
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, 
> HDFS-8828.003.patch
>
>
> Some users reported a huge time cost to build the file copy list in distcp 
> (30 hours for 1.6M files). We can leverage the snapshot diff report to build 
> a file copy list including only the files/dirs that changed between two 
> snapshots (or a snapshot and a normal dir). This speeds up the process in 
> two ways: 1. less copy-list building time; 2. fewer file copy MR jobs.
> The HDFS snapshot diff report provides information about file/directory 
> creation, deletion, rename and modification between two snapshots or between 
> a snapshot and a normal directory. HDFS-7535 synchronizes deletion and 
> rename, then falls back to the default distcp, so it still relies on the 
> default distcp to build the complete list of files under the source dir. 
> This patch puts only created and modified files into the copy list based on 
> the snapshot diff report, minimizing the number of files to copy. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8804) Erasure Coding: use DirectBufferPool in DFSStripedInputStream for buffer allocation

2015-08-03 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652808#comment-14652808
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8804:
---

+1 the new patch looks good.

> Erasure Coding: use DirectBufferPool in DFSStripedInputStream for buffer 
> allocation
> ---
>
> Key: HDFS-8804
> URL: https://issues.apache.org/jira/browse/HDFS-8804
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-8804.000.patch, HDFS-8804.001.patch
>
>
> Currently we directly allocate direct ByteBuffers in DFSStripedInputStream 
> for the stripe buffer and the buffers holding parity data. It's better to 
> get ByteBuffers from a DirectBufferPool.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-8804) Erasure Coding: use DirectBufferPool in DFSStripedInputStream for buffer allocation

2015-08-03 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao resolved HDFS-8804.
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-7285

I've committed this to the feature branch. Thank you guys for the review!

bq. we can at least assert alignedStripe.range.spanInBlock is no larger than 
cellSize

This is guaranteed by the logic in {{readOneStripe}}, so my feeling is that the 
assertion is unnecessary. Also, we don't have this assertion for the data block 
buffer.

> Erasure Coding: use DirectBufferPool in DFSStripedInputStream for buffer 
> allocation
> ---
>
> Key: HDFS-8804
> URL: https://issues.apache.org/jira/browse/HDFS-8804
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: HDFS-7285
>
> Attachments: HDFS-8804.000.patch, HDFS-8804.001.patch
>
>
> Currently we directly allocate direct ByteBuffers in DFSStripedInputStream 
> for the stripe buffer and the buffers holding parity data. It's better to 
> get ByteBuffers from a DirectBufferPool.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8850) VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks

2015-08-03 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-8850:
--

 Summary: VolumeScanner thread exits with exception if there is no 
block pool to be scanned but there are suspicious blocks
 Key: HDFS-8850
 URL: https://issues.apache.org/jira/browse/HDFS-8850
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.7.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe


The VolumeScanner threads inside the BlockScanner exit with an exception if 
there is no block pool to be scanned but there are suspicious blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8850) VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks

2015-08-03 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-8850:
---
Attachment: HDFS-8850.001.patch

> VolumeScanner thread exits with exception if there is no block pool to be 
> scanned but there are suspicious blocks
> -
>
> Key: HDFS-8850
> URL: https://issues.apache.org/jira/browse/HDFS-8850
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-8850.001.patch
>
>
> The VolumeScanner threads inside the BlockScanner exit with an exception if 
> there is no block pool to be scanned but there are suspicious blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8850) VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks

2015-08-03 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-8850:
---
Status: Patch Available  (was: Open)

> VolumeScanner thread exits with exception if there is no block pool to be 
> scanned but there are suspicious blocks
> -
>
> Key: HDFS-8850
> URL: https://issues.apache.org/jira/browse/HDFS-8850
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-8850.001.patch
>
>
> The VolumeScanner threads inside the BlockScanner exit with an exception if 
> there is no block pool to be scanned but there are suspicious blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8850) VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks

2015-08-03 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652932#comment-14652932
 ] 

Yi Liu commented on HDFS-8850:
--

Yes, you are right.  +1 pending Jenkins.

> VolumeScanner thread exits with exception if there is no block pool to be 
> scanned but there are suspicious blocks
> -
>
> Key: HDFS-8850
> URL: https://issues.apache.org/jira/browse/HDFS-8850
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-8850.001.patch
>
>
> The VolumeScanner threads inside the BlockScanner exit with an exception if 
> there is no block pool to be scanned but there are suspicious blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-488) Implement moveToLocal HDFS command

2015-08-03 Thread Steven Capo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Capo updated HDFS-488:
-
 Hadoop Flags: Reviewed
Fix Version/s: 2.7.1
Affects Version/s: 2.7.1
 Target Version/s: 2.7.1
 Tags: MoveToLocal
   Status: Patch Available  (was: Open)

> Implement moveToLocal  HDFS command
> ---
>
> Key: HDFS-488
> URL: https://issues.apache.org/jira/browse/HDFS-488
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Ravi Phulari
>Assignee: Steven Capo
>  Labels: newbie
> Fix For: 2.7.1
>
> Attachments: Screen Shot 2014-07-23 at 12.28.23 PM 1.png
>
>
> Surprisingly, executing the HDFS FsShell command -moveToLocal outputs 
> "Option '-moveToLocal' is not implemented yet."
>  
> {code}
> statepick-lm:Hadoop rphulari$ bin/hadoop fs -moveToLocal bt t
> Option '-moveToLocal' is not implemented yet.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-488) Implement moveToLocal HDFS command

2015-08-03 Thread Steven Capo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Capo updated HDFS-488:
-
Attachment: HDFS-488.patch

> Implement moveToLocal  HDFS command
> ---
>
> Key: HDFS-488
> URL: https://issues.apache.org/jira/browse/HDFS-488
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Ravi Phulari
>Assignee: Steven Capo
>  Labels: newbie
> Fix For: 2.7.1
>
> Attachments: HDFS-488.patch, Screen Shot 2014-07-23 at 12.28.23 PM 
> 1.png
>
>
> Surprisingly, executing the HDFS FsShell command -moveToLocal outputs 
> "Option '-moveToLocal' is not implemented yet."
>  
> {code}
> statepick-lm:Hadoop rphulari$ bin/hadoop fs -moveToLocal bt t
> Option '-moveToLocal' is not implemented yet.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8849) fsck should report number of missing blocks with replication factor 1

2015-08-03 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652970#comment-14652970
 ] 

Allen Wittenauer commented on HDFS-8849:


This is one of those times where I feel that no matter what I say, it's pretty 
clear the dev is hell bent on putting in some useless feature that doesn't 
actually benefit anyone.  

That said, I'll also remind you that putting this into 2.x is a breaking change 
by the compatibility requirements since changing the output of fsck isn't 
allowed.

> fsck should report number of missing blocks with replication factor 1
> -
>
> Key: HDFS-8849
> URL: https://issues.apache.org/jira/browse/HDFS-8849
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
>Priority: Minor
>
> HDFS-7165 supports reporting the number of blocks with replication factor 1 
> in {{dfsadmin}} and NN metrics, but it didn't extend {{fsck}} with the same 
> support, which is the aim of this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-488) Implement moveToLocal HDFS command

2015-08-03 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-488:
--
 Priority: Minor  (was: Major)
 Hadoop Flags:   (was: Reviewed)
Fix Version/s: (was: 2.7.1)

> Implement moveToLocal  HDFS command
> ---
>
> Key: HDFS-488
> URL: https://issues.apache.org/jira/browse/HDFS-488
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Ravi Phulari
>Assignee: Steven Capo
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-488.patch, Screen Shot 2014-07-23 at 12.28.23 PM 
> 1.png
>
>
> Surprisingly, executing the HDFS FsShell command -moveToLocal outputs 
> "Option '-moveToLocal' is not implemented yet."
>  
> {code}
> statepick-lm:Hadoop rphulari$ bin/hadoop fs -moveToLocal bt t
> Option '-moveToLocal' is not implemented yet.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-488) Implement moveToLocal HDFS command

2015-08-03 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-488:
--
Target Version/s:   (was: 2.7.1)

> Implement moveToLocal  HDFS command
> ---
>
> Key: HDFS-488
> URL: https://issues.apache.org/jira/browse/HDFS-488
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Ravi Phulari
>Assignee: Steven Capo
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-488.patch, Screen Shot 2014-07-23 at 12.28.23 PM 
> 1.png
>
>
> Surprisingly, executing the HDFS FsShell command -moveToLocal outputs 
> "Option '-moveToLocal' is not implemented yet."
>  
> {code}
> statepick-lm:Hadoop rphulari$ bin/hadoop fs -moveToLocal bt t
> Option '-moveToLocal' is not implemented yet.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-488) Implement moveToLocal HDFS command

2015-08-03 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-488:
--
Tags:   (was: MoveToLocal)

> Implement moveToLocal  HDFS command
> ---
>
> Key: HDFS-488
> URL: https://issues.apache.org/jira/browse/HDFS-488
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Ravi Phulari
>Assignee: Steven Capo
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-488.patch, Screen Shot 2014-07-23 at 12.28.23 PM 
> 1.png
>
>
> Surprisingly, executing the HDFS FsShell command -moveToLocal outputs 
> "Option '-moveToLocal' is not implemented yet."
>  
> {code}
> statepick-lm:Hadoop rphulari$ bin/hadoop fs -moveToLocal bt t
> Option '-moveToLocal' is not implemented yet.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-488) Implement moveToLocal HDFS command

2015-08-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652995#comment-14652995
 ] 

Hadoop QA commented on HDFS-488:


\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 42s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:red}-1{color} | javac |   0m 32s | The patch appears to cause the 
build to fail. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12748586/HDFS-488.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c3364ca |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11893/console |


This message was automatically generated.

> Implement moveToLocal  HDFS command
> ---
>
> Key: HDFS-488
> URL: https://issues.apache.org/jira/browse/HDFS-488
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Ravi Phulari
>Assignee: Steven Capo
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-488.patch, Screen Shot 2014-07-23 at 12.28.23 PM 
> 1.png
>
>
> Surprisingly, executing the HDFS FsShell command -moveToLocal outputs 
> "Option '-moveToLocal' is not implemented yet."
>  
> {code}
> statepick-lm:Hadoop rphulari$ bin/hadoop fs -moveToLocal bt t
> Option '-moveToLocal' is not implemented yet.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8808) dfs.image.transfer.bandwidthPerSec should not apply to -bootstrapStandby

2015-08-03 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653015#comment-14653015
 ] 

Ajith S commented on HDFS-8808:
---

Hi [~ggop]

Why not bootstrap the standby without that property, and once it completes, add 
dfs.image.transfer.bandwidthPerSec back before starting the standby?

> dfs.image.transfer.bandwidthPerSec should not apply to -bootstrapStandby
> 
>
> Key: HDFS-8808
> URL: https://issues.apache.org/jira/browse/HDFS-8808
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Gautam Gopalakrishnan
>
> The parameter {{dfs.image.transfer.bandwidthPerSec}} can be used to limit the 
> speed with which the fsimage is copied between the namenodes during regular 
> use. However, as a side effect, this also limits transfers when the 
> {{-bootstrapStandby}} option is used. This option is often used during 
> upgrades and could potentially slow down the entire workflow. The request 
> here is to ensure {{-bootstrapStandby}} is unaffected by this bandwidth 
> setting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize

2015-08-03 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653034#comment-14653034
 ] 

Walter Su commented on HDFS-8220:
-

When I ran tests, I ran into some NPEs. Could you add {{si.isFailed()}} guard 
to {{updateBlockForPipeline}} and {{updatePipeline}} as well?
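
For clarity, the requested guard would look roughly like this (a sketch, not 
the final code; {{si}} is the streamer being coordinated, as elsewhere in the 
patch):

{code:java}
// sketch: bail out early for a streamer that has already failed, so the
// coordination step doesn't touch state the failed streamer never set up
if (si.isFailed()) {
  return;
}
// ... existing updateBlockForPipeline / updatePipeline handling ...
{code}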

> Erasure Coding: StripedDataStreamer fails to handle the blocklocations which 
> doesn't satisfy BlockGroupSize
> ---
>
> Key: HDFS-8220
> URL: https://issues.apache.org/jira/browse/HDFS-8220
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, 
> HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, 
> HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, 
> HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, 
> HDFS-8220-HDFS-7285.008.patch
>
>
> During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to 
> validate the available datanodes against the {{BlockGroupSize}}. Please see 
> the exception to understand more:
> {code}
> 2015-04-22 14:56:11,313 WARN  hdfs.DFSClient (DataStreamer.java:run(538)) - 
> DataStreamer Exception
> java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> 2015-04-22 14:56:11,313 INFO  hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
> 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient 
> (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
> java.io.IOException: DataStreamer Exception: 
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> Caused by: java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   ... 1 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8704) Erasure Coding: client fails to write large file when one datanode fails

2015-08-03 Thread Li Bo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Bo updated HDFS-8704:

Status: Patch Available  (was: Open)

> Erasure Coding: client fails to write large file when one datanode fails
> 
>
> Key: HDFS-8704
> URL: https://issues.apache.org/jira/browse/HDFS-8704
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Li Bo
>Assignee: Li Bo
> Attachments: HDFS-8704-000.patch, HDFS-8704-HDFS-7285-002.patch, 
> HDFS-8704-HDFS-7285-003.patch, HDFS-8704-HDFS-7285-004.patch
>
>
I tested the current code on a 5-node cluster using RS(3,2). When a datanode is 
corrupt, the client succeeds in writing a file smaller than a block group but 
fails to write a larger one. {{TestDFSStripeOutputStreamWithFailure}} only tests 
files smaller than a block group; this jira will add more test situations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8851) datanode fails to start due to a bad disk

2015-08-03 Thread Wang Hao (JIRA)
Wang Hao created HDFS-8851:
--

 Summary: datanode fails to start due to a bad disk
 Key: HDFS-8851
 URL: https://issues.apache.org/jira/browse/HDFS-8851
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.5.1
Reporter: Wang Hao


Data node cannot start due to a bad disk. I found that a similar issue, 
HDFS-6245, has been reported, but our situation is different.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8704) Erasure Coding: client fails to write large file when one datanode fails

2015-08-03 Thread Li Bo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Bo updated HDFS-8704:

Attachment: HDFS-8704-HDFS-7285-004.patch

> Erasure Coding: client fails to write large file when one datanode fails
> 
>
> Key: HDFS-8704
> URL: https://issues.apache.org/jira/browse/HDFS-8704
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Li Bo
>Assignee: Li Bo
> Attachments: HDFS-8704-000.patch, HDFS-8704-HDFS-7285-002.patch, 
> HDFS-8704-HDFS-7285-003.patch, HDFS-8704-HDFS-7285-004.patch
>
>
> I tested the current code on a 5-node cluster using RS(3,2). When a datanode is 
> corrupt, the client succeeds in writing a file smaller than a block group but 
> fails to write a larger one. {{TestDFSStripeOutputStreamWithFailure}} only tests 
> files smaller than a block group; this jira will add more test situations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8851) datanode fails to start due to a bad disk

2015-08-03 Thread Wang Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653048#comment-14653048
 ] 

Wang Hao commented on HDFS-8851:



15/08/04 12:01:24 INFO common.Storage: Analyzing storage directories for bpid 
BP-454299492-10.84.100.171-1416301904728
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Locking is disabled
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 INFO common.Storage: Restored 0 block files from trash.
15/08/04 12:01:24 FATAL datanode.DataNode: Initialization failed for Block pool 
 (Datanode Uuid unassigned) service to 
hadoop001.dx.momo.com/10.84.100.171:8022. Exiting.
java.io.IOException: Input/output error
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:243)
at java.util.Properties$LineReader.readLine(Properties.java:434)
at java.util.Properties.load0(Properties.java:353)
at java.util.Properties.load(Properties.java:341)
at 
org.apache.hadoop.hdfs.server.common.StorageInfo.readPropertiesFile(StorageInfo.java:247)
at 
org.apache.hadoop.hdfs.server.common.StorageInfo.readProperties(StorageInfo.java:227)
at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:256)
at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:155)
at 
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:269)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:975)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:946)
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:278)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:812)
at java.lang.Thread.run(Thread.java:745)
15/08/04 12:01:24 WARN datanode.DataNode: Ending block pool service for: Block 
pool  (Datanode Uuid unassigned) service to 
hadoop001.dx.momo.com/10.84.100.171:8022
15/08/04 12:01:24 INFO datanode.DataNode: Removed Block pool  
(Datanode Uuid unassigned)
15/08/04 12:01:26 WARN datanode.DataNode: Exiting Datanode
15/08/04 12:01:26 INFO util.ExitUtil: Exiting with status 0
15/08/04 12:01:26 INFO datanode.DataNode: SHUTDOWN_MSG:


> datanode fails to start due to a bad disk
> -
>
> Key: HDFS-8851
> URL: https://issues.apache.org/jira/browse/HDFS-8851
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.1
>Reporter: Wang Hao
>
> Data node cannot start due to a bad disk. I found that a similar issue, 
> HDFS-6245, has been reported, but our situation is different.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8851) datanode fails to start due to a bad disk

2015-08-03 Thread Wang Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653056#comment-14653056
 ] 

Wang Hao commented on HDFS-8851:


There is an IOException when reading VERSION because the disk is bad, which 
causes the datanode to fail to start. I think we should handle the exception 
during storage initialization.
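
A minimal sketch of the suggested handling (loop and helper names are 
illustrative, not the actual {{BlockPoolSliceStorage}} code):

{code:java}
// sketch: tolerate a failed VERSION read on one volume instead of aborting
// the whole block pool initialization
for (StorageDirectory sd : storageDirs) {
  try {
    doTransition(datanode, sd, nsInfo, startOpt);  // reads VERSION internally
  } catch (IOException ioe) {
    LOG.warn("Failed to read VERSION on " + sd.getRoot()
        + "; treating volume as failed", ioe);
    failedVolumes.add(sd);  // let volume-failure tolerance decide the outcome
  }
}
{code}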

> datanode fails to start due to a bad disk
> -
>
> Key: HDFS-8851
> URL: https://issues.apache.org/jira/browse/HDFS-8851
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.1
>Reporter: Wang Hao
>
> Data node can not start due to a bad disk. I found a similar issue HDFS-6245 
> is reported, but our situation is different.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8827) Erasure Coding: When namenode processes over replicated striped block, NPE will be occur in ReplicationMonitor

2015-08-03 Thread Takuya Fukudome (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Fukudome updated HDFS-8827:
--
Attachment: HDFS-8827.1.patch

Thanks for the comment, [~zhz]! I attached an initial patch which adds a unit 
test that reproduces this issue. It processes a small EC file which doesn't 
have a full set of internal blocks and whose internal blocks are 
over-replicated.
If I understood correctly, when some indices of internal blocks are missing and 
the internal blocks are over-replicated, 
{{BlockPlacementPolicyDefault#chooseReplicaToDelete}} will return null. I think 
the cause is that {{excessTypes}} in {{chooseExcessReplicasStriped}} is empty 
while such blocks are being processed.
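
For illustration, a defensive check along these lines would avoid the NPE 
while the root cause (the empty {{excessTypes}}) is addressed; a sketch with 
illustrative argument names, not the final fix:

{code:java}
// sketch: chooseReplicaToDelete may return null when excessTypes is empty;
// skip the block group instead of dereferencing a null choice
DatanodeStorageInfo chosen = placementPolicy.chooseReplicaToDelete(
    moreThanOne, exactlyOne, excessTypes);
if (chosen == null) {
  break;  // nothing can safely be deleted for this striped block group
}
{code}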

> Erasure Coding: When namenode processes over replicated striped block, NPE 
> will be occur in ReplicationMonitor
> --
>
> Key: HDFS-8827
> URL: https://issues.apache.org/jira/browse/HDFS-8827
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Takuya Fukudome
>Assignee: Takuya Fukudome
> Attachments: HDFS-8827.1.patch, processing-over-replica-npe.log
>
>
> In our test cluster, when the namenode processed over-replicated striped 
> blocks, a null pointer exception (NPE) occurred. This happened in the 
> following situation: 1) some datanodes shut down. 2) the namenode recovers 
> block groups which lost internal blocks. 3) the stopped datanodes are 
> restarted. 4) the namenode processes over-replicated striped blocks. 5) an 
> NPE occurs.
> I think BlockPlacementPolicyDefault#chooseReplicaToDelete will return null in 
> this situation, which causes this NPE problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8850) VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks

2015-08-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653076#comment-14653076
 ] 

Hadoop QA commented on HDFS-8850:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 11s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 40s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 40s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 20s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 23s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m  2s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 158m 28s | Tests failed in hadoop-hdfs. |
| | | 202m 14s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.namenode.ha.TestStandbyIsHot |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12748576/HDFS-8850.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c3364ca |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11892/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11892/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11892/console |


This message was automatically generated.

> VolumeScanner thread exits with exception if there is no block pool to be 
> scanned but there are suspicious blocks
> -
>
> Key: HDFS-8850
> URL: https://issues.apache.org/jira/browse/HDFS-8850
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-8850.001.patch
>
>
> The VolumeScanner threads inside the BlockScanner exit with an exception if 
> there is no block pool to be scanned but there are suspicious blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8852) Documentation of Hadoop 2.x is outdated about append write support

2015-08-03 Thread Hong Dai Thanh (JIRA)
Hong Dai Thanh created HDFS-8852:


 Summary: Documentation of Hadoop 2.x is outdated about append 
write support
 Key: HDFS-8852
 URL: https://issues.apache.org/jira/browse/HDFS-8852
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Reporter: Hong Dai Thanh


In the [latest version of the 
documentation|http://hadoop.apache.org/docs/current2/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Simple_Coherency_Model],
 and also in the documentation for all releases with version 2, it’s mentioned that “A 
file once created, written, and closed need not be changed. “ and “There is a 
plan to support appending-writes to files in the future.” 
 
However, as far as I know, HDFS has supported append write since 0.21, based on 
[HDFS-265|https://issues.apache.org/jira/browse/HDFS-265] and [the old version 
of the documentation in 
2012|https://web.archive.org/web/20121221171824/http://hadoop.apache.org/docs/hdfs/current/hdfs_design.html#Appending-Writes+and+File+Syncs]

Various posts on the Internet also suggest that append write has been 
available in HDFS, and will remain available in the Hadoop version 2 branch.
 
Can we update the documentation to reflect the current status?

(Please also review whether the documentation should also be updated for 
version 0.21 and above, and the version 1.x branch)
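
For reference, append is exposed through the public {{FileSystem}} API; a 
minimal sketch (the path is illustrative):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// open an existing file and append to it
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
try (FSDataOutputStream out = fs.append(new Path("/tmp/append-demo.txt"))) {
  out.writeBytes("appended line\n");
  out.hsync();  // make the new bytes visible to readers
}
{code}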



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8852) HDFS architecture documentation of version 2.x is outdated about append write support

2015-08-03 Thread Hong Dai Thanh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Dai Thanh updated HDFS-8852:
-
Summary: HDFS architecture documentation of version 2.x is outdated about 
append write support  (was: Documentation of Hadoop 2.x is outdated about 
append write support)

> HDFS architecture documentation of version 2.x is outdated about append write 
> support
> -
>
> Key: HDFS-8852
> URL: https://issues.apache.org/jira/browse/HDFS-8852
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Hong Dai Thanh
>
> In the [latest version of the 
> documentation|http://hadoop.apache.org/docs/current2/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Simple_Coherency_Model],
>  and also in the documentation for all releases with version 2, it’s mentioned that 
> “A file once created, written, and closed need not be changed. “ and “There 
> is a plan to support appending-writes to files in the future.” 
>  
> However, as far as I know, HDFS has supported append write since 0.21, based 
> on [HDFS-265|https://issues.apache.org/jira/browse/HDFS-265] and [the old 
> version of the documentation in 
> 2012|https://web.archive.org/web/20121221171824/http://hadoop.apache.org/docs/hdfs/current/hdfs_design.html#Appending-Writes+and+File+Syncs]
> Various posts on the Internet also suggest that append write has been 
> available in HDFS, and will remain available in the Hadoop version 2 branch.
>  
> Can we update the documentation to reflect the current status?
> (Please also review whether the documentation should also be updated for 
> version 0.21 and above, and the version 1.x branch)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8663) sys cpu usage high on namenode server

2015-08-03 Thread tangjunjie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653114#comment-14653114
 ] 

tangjunjie commented on HDFS-8663:
--

For HDFS, the mapping of users to groups is performed on the NameNode. Thus, 
the host system configuration of the NameNode determines the group mappings for 
the users. So the users should be created on the NameNode. More detailed info 
can be found at
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html#Group_Mapping
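
For example, once the NameNode log shows which account is missing (sem_410 in 
the stack trace below), creating it on the NameNode host stops the failing 
shell-based lookups; a sketch:

{code}
# on the NameNode host: create the account whose `id` lookup is failing
sudo useradd -M sem_410
id sem_410   # should now print uid/gid/groups instead of "No such user"
{code}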

> sys cpu usage high on namenode server
> -
>
> Key: HDFS-8663
> URL: https://issues.apache.org/jira/browse/HDFS-8663
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, namenode
>Affects Versions: 2.3.0
> Environment: hadoop 2.3.0 centos5.8
>Reporter: tangjunjie
>
> High sys cpu usage on the namenode server leads to jobs running very slowly.
> Using ps -elf, I see many zombie processes.
> Checking the hdfs logs, I found many exceptions like:
> org.apache.hadoop.util.Shell$ExitCodeException: id: sem_410: No such user
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
>   at org.apache.hadoop.util.Shell.run(Shell.java:418)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
>   at 
> org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:83)
>   at 
> org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:52)
>   at org.apache.hadoop.security.Groups.getGroups(Groups.java:139)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1409)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.(FSPermissionChecker.java:81)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getPermissionChecker(FSNamesystem.java:3310)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3491)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:764)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:764)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
> Then I created all the users, such as sem_410, that appear in the exceptions, 
> and the sys cpu usage on the namenode went down.
> BTW, my hadoop 2.3.0 enables hadoop ACLs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8704) Erasure Coding: client fails to write large file when one datanode fails

2015-08-03 Thread Li Bo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Bo updated HDFS-8704:

Attachment: HDFS-8704-HDFS-7285-005.patch

> Erasure Coding: client fails to write large file when one datanode fails
> 
>
> Key: HDFS-8704
> URL: https://issues.apache.org/jira/browse/HDFS-8704
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Li Bo
>Assignee: Li Bo
> Attachments: HDFS-8704-000.patch, HDFS-8704-HDFS-7285-002.patch, 
> HDFS-8704-HDFS-7285-003.patch, HDFS-8704-HDFS-7285-004.patch, 
> HDFS-8704-HDFS-7285-005.patch
>
>
> I tested the current code on a 5-node cluster using RS(3,2). When a datanode is 
> corrupt, the client succeeds in writing a file smaller than a block group but 
> fails to write a larger one. {{TestDFSStripeOutputStreamWithFailure}} only tests 
> files smaller than a block group; this jira will add more test situations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8853) Erasure Coding: Provide ECSchema validation when creating ECZone

2015-08-03 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8853:
--

 Summary: Erasure Coding: Provide ECSchema validation when creating 
ECZone
 Key: HDFS-8853
 URL: https://issues.apache.org/jira/browse/HDFS-8853
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R


Presently {{DFS#createErasureCodingZone(path, ecSchema, cellSize)}} doesn't 
have any validation that the given {{ecSchema}} is available in the 
{{ErasureCodingSchemaManager#activeSchemas}} list. Now, if it doesn't exist, it 
will create the ECZone with a {{null}} schema. IMHO we could improve this by 
doing the necessary basic sanity checks.
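
A sketch of the kind of sanity check being proposed (the exception type and 
lookup method are illustrative):

{code:java}
// sketch: reject an ECZone request whose schema is not an active schema
ECSchema active = ecSchemaManager.getSchema(ecSchema.getSchemaName());
if (active == null) {
  throw new HadoopIllegalArgumentException("Schema '"
      + ecSchema.getSchemaName() + "' is not in the active schemas list");
}
// proceed to create the zone with the validated schema
{code}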



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-8853) Erasure Coding: Provide ECSchema validation when creating ECZone

2015-08-03 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R reassigned HDFS-8853:
--

Assignee: Rakesh R

> Erasure Coding: Provide ECSchema validation when creating ECZone
> 
>
> Key: HDFS-8853
> URL: https://issues.apache.org/jira/browse/HDFS-8853
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>
> Presently {{DFS#createErasureCodingZone(path, ecSchema, cellSize)}} 
> doesn't have any validation that the given {{ecSchema}} is available in the 
> {{ErasureCodingSchemaManager#activeSchemas}} list. Now, if it doesn't exist, 
> it will create the ECZone with a {{null}} schema. IMHO we could improve this 
> by doing the necessary basic sanity checks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8853) Erasure Coding: Provide ECSchema validation when creating ECZone

2015-08-03 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-8853:
---
Assignee: J.Andreina  (was: Rakesh R)

> Erasure Coding: Provide ECSchema validation when creating ECZone
> 
>
> Key: HDFS-8853
> URL: https://issues.apache.org/jira/browse/HDFS-8853
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: J.Andreina
>
> Presently {{DFS#createErasureCodingZone(path, ecSchema, cellSize)}} 
> doesn't have any validation that the given {{ecSchema}} is available in the 
> {{ErasureCodingSchemaManager#activeSchemas}} list. Now, if it doesn't exist, 
> it will create the ECZone with a {{null}} schema. IMHO we could improve this 
> by doing the necessary basic sanity checks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small

2015-08-03 Thread Li Bo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653136#comment-14653136
 ] 

Li Bo commented on HDFS-8838:
-

Hi [~walter.k.su] and [~szetszwo], could you help me review the patch for 
HDFS-8704 if you have time?

> Tolerate datanode failures in DFSStripedOutputStream when the data length is 
> small
> --
>
> Key: HDFS-8838
> URL: https://issues.apache.org/jira/browse/HDFS-8838
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h8838_20150729.patch, h8838_20150731.patch
>
>
> Currently, DFSStripedOutputStream cannot tolerate datanode failures when the 
> data length is small.  We fix the bugs here and add more tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8693) refreshNamenodes does not support adding a new standby to a running DN

2015-08-03 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653147#comment-14653147
 ] 

Ajith S commented on HDFS-8693:
---

Hi [~kihwal]

I tested with a federated HA cluster: when adding a new nameservice, the 
command works. Is there any special scenario you had in mind when you said it 
doesn't work for a federated HA cluster?

> refreshNamenodes does not support adding a new standby to a running DN
> --
>
> Key: HDFS-8693
> URL: https://issues.apache.org/jira/browse/HDFS-8693
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 2.6.0
>Reporter: Jian Fang
>Priority: Critical
>
> I tried to run the following command on a Hadoop 2.6.0 cluster with HA 
> support 
> $ hdfs dfsadmin -refreshNamenodes datanode-host:port
> to refresh name nodes on data nodes after I replaced one name node with a new 
> one so that I don't need to restart the data nodes. However, I got the 
> following error:
> refreshNamenodes: HA does not currently support adding a new standby to a 
> running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.
> I checked the 2.6.0 code and the error was thrown by the following code 
> snippet, which led me to this JIRA.
> {code}
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>   Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>   for (BPServiceActor actor : bpServices) {
>     oldAddrs.add(actor.getNNSocketAddress());
>   }
>   Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>   if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>     // Keep things simple for now -- we can implement this at a later date.
>     throw new IOException("HA does not currently support adding a new standby "
>         + "to a running DN. Please do a rolling restart of DNs to reconfigure "
>         + "the list of NNs.");
>   }
> }
> {code}
> Looks like the refreshNamenodes command is an incomplete feature. 
> Unfortunately, the new name node on a replacement is critical for auto 
> provisioning a hadoop cluster with HDFS HA support. Without this support, the 
> HA feature could not really be used. I also observed that the new standby 
> name node on the replacement instance could get stuck in safe mode because no 
> data nodes check in with it. Even with a rolling restart, it may take quite 
> some time to restart all data nodes if we have a big cluster, for example, 
> with 4000 data nodes, not to mention that restarting DNs is way too intrusive 
> and not a preferable operation in production. It also increases the chance of 
> a double failure because the standby name node is not really ready for a 
> failover in the case that the current active name node fails. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small

2015-08-03 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-8838:

Attachment: h8838_20150731-HDFS-7285.patch

LGTM. +1. Uploading the same patch for [~szetszwo] to trigger Jenkins.

> Tolerate datanode failures in DFSStripedOutputStream when the data length is 
> small
> --
>
> Key: HDFS-8838
> URL: https://issues.apache.org/jira/browse/HDFS-8838
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h8838_20150729.patch, h8838_20150731-HDFS-7285.patch, 
> h8838_20150731.patch
>
>
> Currently, DFSStripedOutputStream cannot tolerate datanode failures when the 
> data length is small.  We fix the bugs here and add more tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small

2015-08-03 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-8838:

Status: Patch Available  (was: Open)

> Tolerate datanode failures in DFSStripedOutputStream when the data length is 
> small
> --
>
> Key: HDFS-8838
> URL: https://issues.apache.org/jira/browse/HDFS-8838
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h8838_20150729.patch, h8838_20150731-HDFS-7285.patch, 
> h8838_20150731.patch
>
>
> Currently, DFSStripedOutputStream cannot tolerate datanode failures when the 
> data length is small.  We fix the bugs here and add more tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-8693) refreshNamenodes does not support adding a new standby to a running DN

2015-08-03 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S reassigned HDFS-8693:
-

Assignee: Ajith S

> refreshNamenodes does not support adding a new standby to a running DN
> --
>
> Key: HDFS-8693
> URL: https://issues.apache.org/jira/browse/HDFS-8693
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 2.6.0
>Reporter: Jian Fang
>Assignee: Ajith S
>Priority: Critical
>
> I tried to run the following command on a Hadoop 2.6.0 cluster with HA 
> support 
> $ hdfs dfsadmin -refreshNamenodes datanode-host:port
> to refresh name nodes on data nodes after I replaced one name node with a new 
> one so that I don't need to restart the data nodes. However, I got the 
> following error:
> refreshNamenodes: HA does not currently support adding a new standby to a 
> running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.
> I checked the 2.6.0 code and the error was thrown by the following code 
> snippet, which led me to this JIRA.
> {code}
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>   Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>   for (BPServiceActor actor : bpServices) {
>     oldAddrs.add(actor.getNNSocketAddress());
>   }
>   Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>   if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>     // Keep things simple for now -- we can implement this at a later date.
>     throw new IOException("HA does not currently support adding a new standby "
>         + "to a running DN. Please do a rolling restart of DNs to reconfigure "
>         + "the list of NNs.");
>   }
> }
> {code}
> Looks like the refreshNamenodes command is an incomplete feature. 
> Unfortunately, the new name node on a replacement is critical for auto 
> provisioning a hadoop cluster with HDFS HA support. Without this support, the 
> HA feature could not really be used. I also observed that the new standby 
> name node on the replacement instance could get stuck in safe mode because no 
> data nodes check in with it. Even with a rolling restart, it may take quite 
> some time to restart all data nodes if we have a big cluster, for example, 
> with 4000 data nodes, not to mention that restarting DNs is way too intrusive 
> and not a preferable operation in production. It also increases the chance of 
> a double failure because the standby name node is not really ready for a 
> failover in the case that the current active name node fails. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize

2015-08-03 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-8220:
---
Attachment: HDFS-8220-HDFS-7285-10.patch

> Erasure Coding: StripedDataStreamer fails to handle the blocklocations which 
> doesn't satisfy BlockGroupSize
> ---
>
> Key: HDFS-8220
> URL: https://issues.apache.org/jira/browse/HDFS-8220
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, 
> HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, 
> HDFS-8220-HDFS-7285-10.patch, HDFS-8220-HDFS-7285.005.patch, 
> HDFS-8220-HDFS-7285.006.patch, HDFS-8220-HDFS-7285.007.patch, 
> HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.008.patch
>
>
> During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to 
> validate the available datanodes against the {{BlockGroupSize}}. Please see 
> the exception to understand more:
> {code}
> 2015-04-22 14:56:11,313 WARN  hdfs.DFSClient (DataStreamer.java:run(538)) - 
> DataStreamer Exception
> java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> 2015-04-22 14:56:11,313 INFO  hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
> 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient 
> (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
> java.io.IOException: DataStreamer Exception: 
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> Caused by: java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   ... 1 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8693) refreshNamenodes does not support adding a new standby to a running DN

2015-08-03 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653150#comment-14653150
 ] 

Ajith S commented on HDFS-8693:
---

Hi [~john.jian.fang] and [~kihwal]

Agreed, we need to fix refreshNamenodes. In refreshNNList, can we just add a 
new NN actor and replace the old NN actor in the block pool service? 
I would like to work on this issue :)
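
Roughly, the idea sketched (helper names are illustrative; {{Sets.difference}} 
is from Guava, which {{refreshNNList}} already uses):

{code:java}
// sketch: diff old and new NN addresses instead of throwing
Set<InetSocketAddress> toAdd = Sets.difference(newAddrs, oldAddrs);
Set<InetSocketAddress> toRemove = Sets.difference(oldAddrs, newAddrs);
for (InetSocketAddress nn : toRemove) {
  stopActor(nn);       // retire the BPServiceActor for the removed NN
}
for (InetSocketAddress nn : toAdd) {
  startNewActor(nn);   // start a BPServiceActor for the new standby
}
{code}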

> refreshNamenodes does not support adding a new standby to a running DN
> --
>
> Key: HDFS-8693
> URL: https://issues.apache.org/jira/browse/HDFS-8693
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 2.6.0
>Reporter: Jian Fang
>Priority: Critical
>
> I tried to run the following command on a Hadoop 2.6.0 cluster with HA 
> support 
> $ hdfs dfsadmin -refreshNamenodes datanode-host:port
> to refresh name nodes on data nodes after I replaced one name node with a new 
> one so that I don't need to restart the data nodes. However, I got the 
> following error:
> refreshNamenodes: HA does not currently support adding a new standby to a 
> running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.
> I checked the 2.6.0 code and the error was thrown by the following code 
> snippet, which led me to this JIRA.
> {code}
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>   Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>   for (BPServiceActor actor : bpServices) {
>     oldAddrs.add(actor.getNNSocketAddress());
>   }
>   Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>   if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>     // Keep things simple for now -- we can implement this at a later date.
>     throw new IOException("HA does not currently support adding a new standby "
>         + "to a running DN. Please do a rolling restart of DNs to reconfigure "
>         + "the list of NNs.");
>   }
> }
> {code}
> Looks like the refreshNamenodes command is an incomplete feature. 
> Unfortunately, the new name node on a replacement is critical for auto 
> provisioning a hadoop cluster with HDFS HA support. Without this support, the 
> HA feature could not really be used. I also observed that the new standby 
> name node on the replacement instance could get stuck in safe mode because no 
> data nodes check in with it. Even with a rolling restart, it may take quite 
> some time to restart all data nodes if we have a big cluster, for example, 
> with 4000 data nodes, not to mention that restarting DNs is way too intrusive 
> and not a preferable operation in production. It also increases the chance of 
> a double failure because the standby name node is not really ready for a 
> failover in the case that the current active name node fails. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize

2015-08-03 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653155#comment-14653155
 ] 

Rakesh R commented on HDFS-8220:


Thanks [~szetszwo], [~walter.k.su]. Attached another patch addressing the 
comments.

> Erasure Coding: StripedDataStreamer fails to handle the blocklocations which 
> doesn't satisfy BlockGroupSize
> ---
>
> Key: HDFS-8220
> URL: https://issues.apache.org/jira/browse/HDFS-8220
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, 
> HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, 
> HDFS-8220-HDFS-7285-10.patch, HDFS-8220-HDFS-7285.005.patch, 
> HDFS-8220-HDFS-7285.006.patch, HDFS-8220-HDFS-7285.007.patch, 
> HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.008.patch
>
>
> During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to 
> validate the available datanodes against the {{BlockGroupSize}}. Please see 
> the exception to understand more:
> {code}
> 2015-04-22 14:56:11,313 WARN  hdfs.DFSClient (DataStreamer.java:run(538)) - 
> DataStreamer Exception
> java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> 2015-04-22 14:56:11,313 INFO  hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
> 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient 
> (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
> java.io.IOException: DataStreamer Exception: 
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> Caused by: java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   ... 1 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones

2015-08-03 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653161#comment-14653161
 ] 

Zhe Zhang commented on HDFS-8833:
-

Thanks Andrew for the suggestion. I think the {{*-on-create}} flags are a good 
solution to the future compatibility concern.

[~walter.k.su] also mentioned some ideas in an offline discussion.

To summarize, below is the proposed design for this change:

# ErasureCodingPolicy table
#* Create {{ErasureCodingPolicy}} class, with {{ECSchema}} and {{cellSize}}.
#* Create {{ErasureCodingPolicySuite}} class to manage a table of supported 
policies (or extend {{ErasureCodingSchemaManager}}). Something like:
{code}
0: RS-6-3 (schema), 64KB (cellSize)
1: RS-6-3 (schema), 128KB (cellSize)
2: RS-10-4 (schema), 1MB (cellSize)
{code}
#* [follow-on] Allow customized policies to be stored in XAttrs
# File header change
#* Remove {{isStriped}} from {{INodeFile}} header and reduce replication factor 
to 6 bits.
{code}
  /** 
   * Bit format:
   * [4-bit storagePolicyID][6-bit erasureCodingPolicy]
   * [6-bit replication][48-bit preferredBlockSize]
   */
{code}
#* Store the ID of the ECPolicy with 6 bits in the header -- 64 policies 
allowed (a bit-packing sketch follows after this list)
#* The ECPolicy is *always set* when creating a file, taking its value from its 
ancestors; {{0}} can be used to represent the contiguous layout.
#* [follow-on] Add {{inherit-on-create}} flag as Andrew suggested above
# Directory XAttr change
#* A directory's ECPolicy XAttr can be empty, indicating the ECPolicy is the 
same as its ancestor's. Otherwise its own XAttr determines the policy for newly 
created files under the directory.
# Renaming
#* A renamed file keeps the ECPolicy in its header.
#* Therefore, a directory can have files with different ECPolicies.
#* Conversion is not explicitly supported. If needed, a file can be converted 
by cp+rm.
#* When renamed, a directory carries over its ECPolicy if it's set (XAttr 
non-empty). Otherwise its XAttr remains empty (and newly created files under 
the moved directory will use the policy from the new ancestors). 
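
To make the proposed header layout concrete, here is a bit-packing sketch 
(constant names and masking are illustrative):

{code:java}
// [4-bit storagePolicyID][6-bit erasureCodingPolicy][6-bit replication]
// [48-bit preferredBlockSize]
static final int BLOCK_SIZE_BITS = 48;
static final int REPLICATION_BITS = 6;
static final int EC_POLICY_BITS = 6;

static long buildHeader(byte storagePolicy, byte ecPolicyId,
    short replication, long preferredBlockSize) {
  long h = preferredBlockSize & ((1L << BLOCK_SIZE_BITS) - 1);
  h |= (replication & 0x3FL) << BLOCK_SIZE_BITS;
  h |= (ecPolicyId & 0x3FL) << (BLOCK_SIZE_BITS + REPLICATION_BITS);
  h |= (storagePolicy & 0xFL)
      << (BLOCK_SIZE_BITS + REPLICATION_BITS + EC_POLICY_BITS);
  return h;
}
{code}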

Questions / comments are very welcome.

> Erasure coding: store EC schema and cell size in INodeFile and eliminate 
> notion of EC zones
> ---
>
> Key: HDFS-8833
> URL: https://issues.apache.org/jira/browse/HDFS-8833
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-7285
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
>
> We have [discussed | 
> https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754]
>  storing EC schema with files instead of EC zones and recently revisited the 
> discussion under HDFS-8059.
> As a recap, the _zone_ concept has severe limitations including renaming and 
> nested configuration. Those limitations are valid in encryption for security 
> reasons and it doesn't make sense to carry them over in EC.
> This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For 
> simplicity, we should first implement it as an xattr and consider memory 
> optimizations (such as moving it to file header) as a follow-on. We should 
> also disable changing EC policy on a non-empty file / dir in the first phase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small

2015-08-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653156#comment-14653156
 ] 

Hadoop QA commented on HDFS-8838:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12748613/h8838_20150731-HDFS-7285.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c3364ca |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11896/console |


This message was automatically generated.

> Tolerate datanode failures in DFSStripedOutputStream when the data length is 
> small
> --
>
> Key: HDFS-8838
> URL: https://issues.apache.org/jira/browse/HDFS-8838
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h8838_20150729.patch, h8838_20150731-HDFS-7285.patch, 
> h8838_20150731.patch
>
>
> Currently, DFSStripedOutputStream cannot tolerate datanode failures when the 
> data length is small.  We fix the bugs here and add more tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8829) DataNode sets SO_RCVBUF explicitly is disabling tcp auto-tuning

2015-08-03 Thread kanaka kumar avvaru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653164#comment-14653164
 ] 

kanaka kumar avvaru commented on HDFS-8829:
---

Hi [~He Tianyi], we have added similar configuration changes in our cluster 
too. If you have a patch available, feel free to assign the issue to yourself 
and submit the patch.
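
A sketch of the configurable variant (the property name is hypothetical; a 
value of 0 or less would keep the OS default so TCP auto-tuning still works):

{code:java}
// sketch: only set SO_RCVBUF when an explicit size is configured
int rcvBufSize = conf.getInt(
    "dfs.datanode.transfer.socket.recv.buffer.size",  // hypothetical key
    HdfsConstants.DEFAULT_DATA_SOCKET_SIZE);
if (rcvBufSize > 0) {
  tcpPeerServer.setReceiveBufferSize(rcvBufSize);
}
{code}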

> DataNode sets SO_RCVBUF explicitly is disabling tcp auto-tuning
> ---
>
> Key: HDFS-8829
> URL: https://issues.apache.org/jira/browse/HDFS-8829
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.3.0, 2.6.0
>Reporter: He Tianyi
>Assignee: kanaka kumar avvaru
>
> {code:java}
>   private void initDataXceiver(Configuration conf) throws IOException {
> // find free port or use privileged port provided
> TcpPeerServer tcpPeerServer;
> if (secureResources != null) {
>   tcpPeerServer = new TcpPeerServer(secureResources);
> } else {
>   tcpPeerServer = new TcpPeerServer(dnConf.socketWriteTimeout,
>   DataNode.getStreamingAddr(conf));
> }
> 
> tcpPeerServer.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE);
> {code}
> The last line sets SO_RCVBUF explicitly, thus disabling tcp auto-tuning on 
> some systems.
> Shall we make this behavior configurable?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)