[jira] [Commented] (HDFS-9516) truncate file fails with data dirs on multiple disks

2015-12-15 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057550#comment-15057550
 ] 

Konstantin Shvachko commented on HDFS-9516:
---

+1 on the latest patch.
The Jenkins report is quite confusing. I don't know if there is any value in 
rerunning it. Still:
- No new tests, because existing tests should fail due to the new assert 
statement.
- The checkstyle issues seem to be the same 123 as before.
- Failed tests are reported incorrectly in the JIRA. I ran the 8 failed tests 
locally with no problems.
- No new files in the patch, so ASF warnings are not related.

Will commit shortly.
Also should we target it for any of the upcoming releases? Seems like a 
critical bug.

> truncate file fails with data dirs on multiple disks
> 
>
> Key: HDFS-9516
> URL: https://issues.apache.org/jira/browse/HDFS-9516
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Bogdan Raducanu
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-9516_1.patch, HDFS-9516_2.patch, HDFS-9516_3.patch, 
> HDFS-9516_testFailures.patch, Main.java, truncate.dn.log
>
>
> FileSystem.truncate returns false (no exception), but the file is never closed 
> and is not writable after this.
> It seems to be caused by copy-on-truncate, which is used because the system 
> is in an upgrade state. In this case a rename between devices is attempted.
> See attached log and repro code.
> This probably also affects truncating a snapshotted file, where 
> copy-on-truncate is also used.
> Possibly it affects not only truncate but any block recovery.
> I think the problem is in updateReplicaUnderRecovery:
> {code}
> ReplicaBeingWritten newReplicaInfo = new ReplicaBeingWritten(
>     newBlockId, recoveryId, rur.getVolume(),
>     blockFile.getParentFile(),
>     newlength);
> {code}
> blockFile is created with copyReplicaWithNewBlockIdAndGS, which is allowed to 
> choose any volume, so rur.getVolume() is not necessarily where the block is 
> located.
>  
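
For illustration, a minimal sketch of the fix direction, assuming a hypothetical 
volume-lookup helper (not the committed patch verbatim): construct the recovered 
replica with the volume that actually holds the copied block file, instead of 
reusing rur.getVolume().
{code}
// Hedged sketch: resolve the volume from the file that
// copyReplicaWithNewBlockIdAndGS actually wrote to.
FsVolumeSpi targetVolume = getVolumeForFile(blockFile); // hypothetical helper
ReplicaBeingWritten newReplicaInfo = new ReplicaBeingWritten(
    newBlockId, recoveryId, targetVolume,
    blockFile.getParentFile(), newlength);
{code}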



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9516) truncate file fails with data dirs on multiple disks

2015-12-15 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-9516:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.0
   Status: Resolved  (was: Patch Available)

I just committed this to trunk and branch-2. Thank you Plamen.

> truncate file fails with data dirs on multiple disks
> 
>
> Key: HDFS-9516
> URL: https://issues.apache.org/jira/browse/HDFS-9516
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Bogdan Raducanu
>Assignee: Plamen Jeliazkov
> Fix For: 2.9.0
>
> Attachments: HDFS-9516_1.patch, HDFS-9516_2.patch, HDFS-9516_3.patch, 
> HDFS-9516_testFailures.patch, Main.java, truncate.dn.log
>
>
> FileSystem.truncate returns false (no exception), but the file is never closed 
> and is not writable after this.
> It seems to be caused by copy-on-truncate, which is used because the system 
> is in an upgrade state. In this case a rename between devices is attempted.
> See attached log and repro code.
> This probably also affects truncating a snapshotted file, where 
> copy-on-truncate is also used.
> Possibly it affects not only truncate but any block recovery.
> I think the problem is in updateReplicaUnderRecovery:
> {code}
> ReplicaBeingWritten newReplicaInfo = new ReplicaBeingWritten(
>     newBlockId, recoveryId, rur.getVolume(),
>     blockFile.getParentFile(),
>     newlength);
> {code}
> blockFile is created with copyReplicaWithNewBlockIdAndGS, which is allowed to 
> choose any volume, so rur.getVolume() is not necessarily where the block is 
> located.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9516) truncate file fails with data dirs on multiple disks

2015-12-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057729#comment-15057729
 ] 

Hudson commented on HDFS-9516:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8968 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8968/])
HDFS-9516. Truncate file fails with data dirs on multiple disks. (shv: rev 
96d307e1e320eafb470faf7bd47af3341c399d55)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> truncate file fails with data dirs on multiple disks
> 
>
> Key: HDFS-9516
> URL: https://issues.apache.org/jira/browse/HDFS-9516
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Bogdan Raducanu
>Assignee: Plamen Jeliazkov
> Fix For: 2.9.0
>
> Attachments: HDFS-9516_1.patch, HDFS-9516_2.patch, HDFS-9516_3.patch, 
> HDFS-9516_testFailures.patch, Main.java, truncate.dn.log
>
>
> FileSystem.truncate returns false (no exception), but the file is never closed 
> and is not writable after this.
> It seems to be caused by copy-on-truncate, which is used because the system 
> is in an upgrade state. In this case a rename between devices is attempted.
> See attached log and repro code.
> This probably also affects truncating a snapshotted file, where 
> copy-on-truncate is also used.
> Possibly it affects not only truncate but any block recovery.
> I think the problem is in updateReplicaUnderRecovery:
> {code}
> ReplicaBeingWritten newReplicaInfo = new ReplicaBeingWritten(
>     newBlockId, recoveryId, rur.getVolume(),
>     blockFile.getParentFile(),
>     newlength);
> {code}
> blockFile is created with copyReplicaWithNewBlockIdAndGS, which is allowed to 
> choose any volume, so rur.getVolume() is not necessarily where the block is 
> located.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )

2015-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057763#comment-15057763
 ] 

Hadoop QA commented on HDFS-9494:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 55s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 0s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 23m 43s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12777682/HDFS-9494-origin-trunk.02.patch
 |
| JIRA Issue | HDFS-9494 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux da0c9ad7d29a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/prec

[jira] [Commented] (HDFS-9393) After choosing favored nodes, choosing nodes for remaining replicas should go through BlockPlacementPolicy

2015-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057939#comment-15057939
 ] 

Hadoop QA commented on HDFS-9393:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
49s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 4 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 6s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 52m 54s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 19s 
{color} | {color:red} Patch generated 58 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 140m 27s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
|   | hadoop.hdfs.server.namenode.TestFileTruncate |
|   | hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits |
|   | hadoop.hdfs.server.datanode.TestBlockReplacement |
|   | hadoop.hdfs.server.datanode.TestBlockScanner |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.hdfs.server.datanode.TestBlockReplacement |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12777687/HDFS-9393.2.patch |
| JIRA Issue | HDFS-9393 |
| Optional Tests |  

[jira] [Commented] (HDFS-9371) Code cleanup for DatanodeManager

2015-12-15 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058026#comment-15058026
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9371:
---

There are some empty lines added to BlockReportLeaseManager.  Patch looks good 
other than that.
+1

> Code cleanup for DatanodeManager
> 
>
> Key: HDFS-9371
> URL: https://issues.apache.org/jira/browse/HDFS-9371
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-9371.000.patch, HDFS-9371.001.patch, 
> HDFS-9371.002.patch, HDFS-9371.003.patch, HDFS-9371.004.patch
>
>
> Some code cleanup for DatanodeManager. The main changes include:
> # make the synchronization of {{datanodeMap}} and 
> {{datanodesSoftwareVersions}} consistent
> # remove unnecessary lock in {{handleHeartbeat}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel

2015-12-15 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058031#comment-15058031
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8578:
---

There are a lot of test failures in the previous report.  However, if we click 
the link https://builds.apache.org/job/PreCommit-HDFS-Build/13854/testReport/ , 
only two tests failed.

[~aw], do you know why?

> On upgrade, Datanode should process all storage/data dirs in parallel
> -
>
> Key: HDFS-8578
> URL: https://issues.apache.org/jira/browse/HDFS-8578
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Raju Bairishetti
>Assignee: Vinayakumar B
>Priority: Critical
> Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch, 
> HDFS-8578-03.patch, HDFS-8578-04.patch, HDFS-8578-05.patch, 
> HDFS-8578-06.patch, HDFS-8578-07.patch, HDFS-8578-08.patch, 
> HDFS-8578-09.patch, HDFS-8578-10.patch, HDFS-8578-11.patch, 
> HDFS-8578-12.patch, HDFS-8578-13.patch, HDFS-8578-14.patch, 
> HDFS-8578-15.patch, HDFS-8578-16.patch, HDFS-8578-17.patch, 
> HDFS-8578-branch-2.6.0.patch, HDFS-8578-branch-2.7-001.patch, 
> HDFS-8578-branch-2.7-002.patch, HDFS-8578-branch-2.7-003.patch, 
> h8578_20151210.patch, h8578_20151211.patch, h8578_20151211b.patch, 
> h8578_20151212.patch, h8578_20151213.patch
>
>
> Right now, during upgrades the datanode processes all the storage dirs 
> sequentially. Assuming it takes ~20 minutes to process a single storage dir, a 
> datanode with ~10 disks will take around 3 hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
>    for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>      doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>      assert getCTime() == nsInfo.getCTime()
>          : "Data-node and name-node CTimes must be the same.";
>    }
> {code}
> It would save a lot of time during major upgrades if the datanode processed 
> all storage dirs/disks in parallel.
> Can we make the datanode process all storage dirs in parallel?
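
To make the suggestion concrete, a minimal sketch of the parallel variant using 
java.util.concurrent, assuming the doTransition/getStorageDir/nsInfo names from 
the loop quoted above; the attached patches may structure this differently.
{code}
// Hedged sketch: run each storage dir's doTransition on a thread pool,
// then wait for all of them and propagate the first failure.
ExecutorService pool = Executors.newFixedThreadPool(getNumStorageDirs());
List<Future<Void>> results = new ArrayList<>();
for (int idx = 0; idx < getNumStorageDirs(); idx++) {
  final StorageDirectory sd = getStorageDir(idx);
  results.add(pool.submit(() -> {
    doTransition(datanode, sd, nsInfo, startOpt);
    assert getCTime() == nsInfo.getCTime()
        : "Data-node and name-node CTimes must be the same.";
    return null;
  }));
}
for (Future<Void> r : results) {
  r.get(); // rethrows any exception from doTransition
}
pool.shutdown();
{code}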



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )

2015-12-15 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058060#comment-15058060
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9494:
---

+1 the new patch looks good.

(Need to board a flight now.  Will commit later.)

> Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
> 
>
> Key: HDFS-9494
> URL: https://issues.apache.org/jira/browse/HDFS-9494
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: GAO Rui
>Assignee: GAO Rui
>Priority: Minor
> Attachments: HDFS-9494-origin-trunk.00.patch, 
> HDFS-9494-origin-trunk.01.patch, HDFS-9494-origin-trunk.02.patch
>
>
> Currently, in DFSStripedOutputStream#flushAllInternals( ), we trigger and 
> wait for flushInternal( ) in sequence. So the runtime flow is like:
> {code}
> Streamer0#flushInternal( )
> Streamer0#waitForAckedSeqno( )
> Streamer1#flushInternal( )
> Streamer1#waitForAckedSeqno( )
> …
> Streamer8#flushInternal( )
> Streamer8#waitForAckedSeqno( )
> {code}
> It would be better to trigger flushInternal( ) on all the streamers, wait for 
> all of them to return from waitForAckedSeqno( ), and then return from 
> flushAllInternals( ).
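
As a rough illustration of that flow, a minimal sketch against a hypothetical 
Streamer interface; the real patch operates on DFSStripedOutputStream's internal 
streamers.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

interface Streamer {
  void flushInternal() throws Exception; // flush + waitForAckedSeqno()
}

class ParallelFlushSketch {
  // Trigger every streamer first, then wait for all acks.
  static void flushAllInternals(List<Streamer> streamers, ExecutorService pool)
      throws Exception {
    List<Future<Void>> pending = new ArrayList<>();
    for (Streamer s : streamers) {
      pending.add(pool.submit(() -> { s.flushInternal(); return null; }));
    }
    for (Future<Void> f : pending) {
      f.get(); // surfaces the first streamer failure, if any
    }
  }
}
{code}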



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9516) truncate file fails with data dirs on multiple disks

2015-12-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058065#comment-15058065
 ] 

Hudson commented on HDFS-9516:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #694 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/694/])
HDFS-9516. Truncate file fails with data dirs on multiple disks. (shv: rev 
96d307e1e320eafb470faf7bd47af3341c399d55)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> truncate file fails with data dirs on multiple disks
> 
>
> Key: HDFS-9516
> URL: https://issues.apache.org/jira/browse/HDFS-9516
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Bogdan Raducanu
>Assignee: Plamen Jeliazkov
> Fix For: 2.9.0
>
> Attachments: HDFS-9516_1.patch, HDFS-9516_2.patch, HDFS-9516_3.patch, 
> HDFS-9516_testFailures.patch, Main.java, truncate.dn.log
>
>
> FileSystem.truncate returns false (no exception), but the file is never closed 
> and is not writable after this.
> It seems to be caused by copy-on-truncate, which is used because the system 
> is in an upgrade state. In this case a rename between devices is attempted.
> See attached log and repro code.
> This probably also affects truncating a snapshotted file, where 
> copy-on-truncate is also used.
> Possibly it affects not only truncate but any block recovery.
> I think the problem is in updateReplicaUnderRecovery:
> {code}
> ReplicaBeingWritten newReplicaInfo = new ReplicaBeingWritten(
>     newBlockId, recoveryId, rur.getVolume(),
>     blockFile.getParentFile(),
>     newlength);
> {code}
> blockFile is created with copyReplicaWithNewBlockIdAndGS, which is allowed to 
> choose any volume, so rur.getVolume() is not necessarily where the block is 
> located.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7661) Support read when an EC file is being written

2015-12-15 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058171#comment-15058171
 ] 

Vinayakumar B commented on HDFS-7661:
-

bq. Vinayakumar B, are you still working on it?
Please go ahead and merge together. Thanks for taking this up [~demongaorui].

> Support read when an EC file is being written
> 
>
> Key: HDFS-7661
> URL: https://issues.apache.org/jira/browse/HDFS-7661
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Tsz Wo Nicholas Sze
>Assignee: GAO Rui
> Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png, 
> HDFS-7661-unitTest-wip-trunk.patch, 
> HDFS-EC-file-flush-sync-design-version1.1.pdf
>
>
> We also need to support hflush/hsync and visible length. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2015-12-15 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058212#comment-15058212
 ] 

Daryn Sharp commented on HDFS-8674:
---

[~mingma], comments?

> Improve performance of postponed block scans
> 
>
> Key: HDFS-8674
> URL: https://issues.apache.org/jira/browse/HDFS-8674
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-8674.patch
>
>
> When a standby goes active, it marks all nodes as "stale", which causes 
> block invalidations for over-replicated blocks to be queued until full block 
> reports are received from the nodes with the block.  The replication monitor 
> scans the queue with O(N) runtime.  It picks a random offset and iterates 
> through the set to randomize the blocks scanned.
> The result is devastating when a cluster loses multiple nodes during a 
> rolling upgrade. Re-replication occurs, the nodes come back, and the excess 
> block invalidations are postponed. Rescanning just 2k blocks out of millions 
> of postponed blocks may take multiple seconds. During the scan, the write 
> lock is held, which stalls all other processing.
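
To illustrate the cost being described, a toy sketch, assuming the postponed 
blocks live in an insertion-ordered set (hypothetical names; rescan stands in 
for the real per-block processing): reaching a random offset in such a set 
requires walking the iterator, so every 2k-block window re-pays the O(N) 
traversal.
{code}
// Toy illustration: each scan pays O(N) just to reach its random offset
// before examining ~2000 blocks.
Iterator<Block> it = postponedMisreplicatedBlocks.iterator();
for (long i = 0; i < startOffset; i++) {
  it.next(); // linear walk; with millions of blocks this dominates
}
for (int i = 0; i < 2000 && it.hasNext(); i++) {
  rescan(it.next()); // hypothetical per-block rescan
}
{code}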



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9198) Coalesce IBR processing in the NN

2015-12-15 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058216#comment-15058216
 ] 

Daryn Sharp commented on HDFS-9198:
---

Comments?  We have been running successfully with this feature since early 
November.  I will update the patch if necessary, if the approach is otherwise 
acceptable.  I seem to recall the test failures were general build instability.

> Coalesce IBR processing in the NN
> -
>
> Key: HDFS-9198
> URL: https://issues.apache.org/jira/browse/HDFS-9198
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-9198-branch2.patch, HDFS-9198-trunk.patch, 
> HDFS-9198-trunk.patch, HDFS-9198-trunk.patch, HDFS-9198-trunk.patch
>
>
> IBRs from thousands of DNs under load will degrade NN performance due to 
> excessive write-lock contention from multiple IPC handler threads.  The IBR 
> processing is quick, so the lock contention may be reduced by coalescing 
> multiple IBRs into a single write-lock transaction.  The handlers will also 
> be freed up faster for other operations.
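
A minimal sketch of the coalescing idea, with hypothetical queue and report 
types (the attached patches define the real structure): handlers enqueue IBRs, 
and a single thread drains the queue so one write-lock acquisition covers a 
whole batch.
{code}
// Hedged sketch: coalesce pending IBRs into one write-lock transaction.
BlockingQueue<IncrementalBlockReport> ibrQueue = new LinkedBlockingQueue<>();

void processQueuedReports() throws InterruptedException {
  List<IncrementalBlockReport> batch = new ArrayList<>();
  batch.add(ibrQueue.take());      // block for the first report
  ibrQueue.drainTo(batch);         // coalesce whatever else is pending
  namesystem.writeLock();
  try {
    for (IncrementalBlockReport ibr : batch) {
      processIncrementalBlockReport(ibr); // hypothetical existing handler
    }
  } finally {
    namesystem.writeUnlock();
  }
}
{code}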



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7964) Add support for async edit logging

2015-12-15 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058219#comment-15058219
 ] 

Daryn Sharp commented on HDFS-7964:
---

Comments?

> Add support for async edit logging
> --
>
> Key: HDFS-7964
> URL: https://issues.apache.org/jira/browse/HDFS-7964
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch
>
>
> Edit logging is a major source of contention within the NN.  logEdit is 
> called within the namespace write lock, while logSync is called outside of 
> the lock to allow greater concurrency.  The handler thread remains busy until 
> logSync returns, to provide the client with a durability guarantee for the 
> response.
> Write-heavy RPC load and/or slow IO causes handlers to stall in logSync.  
> Although the write lock is not held, readers are limited/starved and the call 
> queue fills.  Combining an edit log thread with postponed RPC responses from 
> HADOOP-10300 will provide the same durability guarantee but immediately free 
> up the handlers.
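
As a rough sketch of the proposed combination, with hypothetical names 
throughout (syncBatch, deferredResponses, and Call are illustrative, not the 
HDFS API): the handler logs the edit and parks the response; a dedicated thread 
batches syncs and releases every response whose edit is durable.
{code}
// Handler thread: log the edit, park the RPC response, return immediately.
ConcurrentNavigableMap<Long, Call> deferredResponses =
    new ConcurrentSkipListMap<>();
long txid = editLog.logEdit(op);            // under the namespace write lock
deferredResponses.put(txid, call);          // HADOOP-10300-style postponed RPC

// Dedicated edit-log thread: batch syncs, then answer durable calls.
void syncLoop() {
  while (running) {
    long durableTxid = editLog.syncBatch(); // hypothetical: last synced txid
    ConcurrentNavigableMap<Long, Call> done =
        deferredResponses.headMap(durableTxid, true);
    for (Call c : done.values()) {
      c.sendResponse();                     // same durability guarantee
    }
    done.clear();
  }
}
{code}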



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9448) Enable valgrind for libhdfspp unit tests

2015-12-15 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058240#comment-15058240
 ] 

James Clampffer commented on HDFS-9448:
---

Other than needing a minor rebase (just the CMakeLists), this looks good to me, 
and I'd like to get it in later today.  [~aw], please let me know if anything is 
obviously broken or against convention to you; if I commit before then, I'll 
file a follow-up JIRA to get things in line with what you think it should be.

+1 after rebase.

> Enable valgrind for libhdfspp unit tests
> 
>
> Key: HDFS-9448
> URL: https://issues.apache.org/jira/browse/HDFS-9448
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9448.HDFS-8707.000.patch, 
> HDFS-9448.HDFS-8707.001.patch, HDFS-9448.HDFS-8707.002.patch, 
> HDFS-9448.HDFS-8707.003.patch, HDFS-9448.HDFS-8707.004.patch, 
> HDFS-9448.HDFS-8707.005.patch
>
>
> We should have a target that runs the unit tests under valgrind if it is 
> available on the target machine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9448) Enable valgrind for libhdfspp unit tests

2015-12-15 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058245#comment-15058245
 ] 

Allen Wittenauer commented on HDFS-9448:


Sorry, I've been busy with the Yetus 0.1.0 release.  I'll take a look at this 
today.

> Enable valgrind for libhdfspp unit tests
> 
>
> Key: HDFS-9448
> URL: https://issues.apache.org/jira/browse/HDFS-9448
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9448.HDFS-8707.000.patch, 
> HDFS-9448.HDFS-8707.001.patch, HDFS-9448.HDFS-8707.002.patch, 
> HDFS-9448.HDFS-8707.003.patch, HDFS-9448.HDFS-8707.004.patch, 
> HDFS-9448.HDFS-8707.005.patch
>
>
> We should have a target that runs the unit tests under valgrind if it is 
> available on the target machine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9524) libhdfs++ deadlocks in Filesystem::New if NN connection fails

2015-12-15 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9524:
-
Attachment: HDFS-9524.HDFS-8707.002.patch

> libhdfs++ deadlocks in Filesystem::New if NN connection fails
> -
>
> Key: HDFS-9524
> URL: https://issues.apache.org/jira/browse/HDFS-9524
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9524.HDFS-8707.000.patch, 
> HDFS-9524.HDFS-8707.001.patch, HDFS-9524.HDFS-8707.002.patch
>
>
> FileSystem::New attempts to free the new FileSystem if the connection fails.  
> Unfortunately, it does so in the middle of a callback from the filesystem's 
> threadpool, and so attempts to join the worker thread from within that same 
> worker thread.
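
The deadlock pattern generalizes beyond C++; for the record, a minimal Java 
illustration of the same self-join (illustration only, the real code is the 
libhdfs++ threadpool):
{code}
// A task running on a pool's sole worker thread must not wait for that
// pool to terminate: it would be waiting for itself.
ExecutorService worker = Executors.newSingleThreadExecutor();
worker.submit(() -> {
  worker.shutdown();   // roughly "free the FileSystem" inside the callback
  try {
    // Deadlocks: this task IS the worker thread being joined.
    worker.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
  } catch (InterruptedException ignored) {
  }
});
{code}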



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9524) libhdfs++ deadlocks in Filesystem::New if NN connection fails

2015-12-15 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058251#comment-15058251
 ] 

Bob Hansen commented on HDFS-9524:
--

New patch to address your comments.

bq. As far as I can tell you'd have the same problem if you deleted the 
FileSystem in the FileSystem::Connect callback on a failed connect. Maybe it's 
worth having a rule/comment about deleting the filesystem from within the 
context of a callback?

That's a good suggestion.  I added comments to the public interface.  I also 
spent some time thinking about how I would want to respond at run time to 
detecting the error, and the current default behavior of dumping core for easy 
debugging and terminating the app is probably the best one.  I added some 
output to stderr to make the error case even more explicit.

> libhdfs++ deadlocks in Filesystem::New if NN connection fails
> -
>
> Key: HDFS-9524
> URL: https://issues.apache.org/jira/browse/HDFS-9524
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9524.HDFS-8707.000.patch, 
> HDFS-9524.HDFS-8707.001.patch, HDFS-9524.HDFS-8707.002.patch
>
>
> FileSystem::New attempts to free the new FileSystem if the connection fails.  
> Unfortunately, it does so in the middle of a callback from the filesystem's 
> threadpool, and so attempts to join the worker thread from within that same 
> worker thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9538) libhdfs++: load configuration from files

2015-12-15 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058272#comment-15058272
 ] 

James Clampffer commented on HDFS-9538:
---

Committed to HDFS-8707.  Thanks for the contribution [~bobthansen]!

> libhdfs++: load configuration from files
> 
>
> Key: HDFS-9538
> URL: https://issues.apache.org/jira/browse/HDFS-9538
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9538.HDFS-8707.003.patch, 
> HDFS-9538.HDFS-9537.000.patch, HDFS-9538.HDFS-9537.001.patch, 
> HDFS-9538.HDFS-9537.002.patch
>
>
> One goal of the Configuration classes is to allow consumers of the 
> libhdfs++ library to deploy client applications onto Hadoop edge nodes and 
> have them pick up the Hadoop configuration that has been deployed there.
> Note that we also need to support the use case where the consumer application 
> will manage Hadoop configuration files itself, or will handle all 
> configuration out-of-band.
> libhdfs++ should be able to read files that are found in the field and easily 
> construct an instance that will communicate with the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9538) libhdfs++: load configuration from files

2015-12-15 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-9538:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> libhdfs++: load configuration from files
> 
>
> Key: HDFS-9538
> URL: https://issues.apache.org/jira/browse/HDFS-9538
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9538.HDFS-8707.003.patch, 
> HDFS-9538.HDFS-9537.000.patch, HDFS-9538.HDFS-9537.001.patch, 
> HDFS-9538.HDFS-9537.002.patch
>
>
> One goal of the Configuration classes is to allow consumers of the 
> libhdfs++ library to deploy client applications onto Hadoop edge nodes and 
> have them pick up the Hadoop configuration that has been deployed there.
> Note that we also need to support the use case where the consumer application 
> will manage Hadoop configuration files itself, or will handle all 
> configuration out-of-band.
> libhdfs++ should be able to read files that are found in the field and easily 
> construct an instance that will communicate with the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9556) libhdfs++: allow connection to defaultFS from configuration

2015-12-15 Thread Bob Hansen (JIRA)
Bob Hansen created HDFS-9556:


 Summary: libhdfs++: allow connection to defaultFS from 
configuration
 Key: HDFS-9556
 URL: https://issues.apache.org/jira/browse/HDFS-9556
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Bob Hansen






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8707) Implement an async pure c++ HDFS client

2015-12-15 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058279#comment-15058279
 ] 

James Clampffer commented on HDFS-8707:
---

No problem, thanks for helping out!

> Implement an async pure c++ HDFS client
> ---
>
> Key: HDFS-8707
> URL: https://issues.apache.org/jira/browse/HDFS-8707
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Reporter: Owen O'Malley
>Assignee: Xiaobing Zhou
>
> As part of working on the C++ ORC reader at ORC-3, we need an HDFS pure C++ 
> client that lets us do async io to HDFS. We want to start from the code that 
> Haohui's been working on at https://github.com/haohui/libhdfspp .



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel

2015-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058264#comment-15058264
 ] 

Hadoop QA commented on HDFS-8578:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 8s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 38s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
8s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 56s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 21s 
{color} | {color:red} Patch generated 3 new checkstyle issues in 
hadoop-hdfs-project/hadoop-hdfs (total was 558, now 543). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 10s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 35s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 31s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 37s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 38s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 23s 
{color} | {color:red} Patch generated 56 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 187m 39s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.hdfs.security.TestDelegationTokenForProxyUser |
|   | hadoop.hdfs.TestAclsEndToEnd |
|   | hadoop.tracing.TestTracing |
|   | hadoop.hdfs.TestCrcCorruption |
|   | hadoop.hdfs.TestEncryptionZones |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
|   | hadoop.hdfs.TestFileAppend |
|   | hadoop.hdfs.server.balancer.TestBalancer |
|   | hadoop.hdfs.TestEncryptionZones |
|   | hadoop.hdf

[jira] [Commented] (HDFS-8707) Implement an async pure c++ HDFS client

2015-12-15 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058290#comment-15058290
 ] 

James Clampffer commented on HDFS-8707:
---

This and the one above were supposed to end up in HDFS-9448.  No clue how they 
ended up here...

> Implement an async pure c++ HDFS client
> ---
>
> Key: HDFS-8707
> URL: https://issues.apache.org/jira/browse/HDFS-8707
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Reporter: Owen O'Malley
>Assignee: Xiaobing Zhou
>
> As part of working on the C++ ORC reader at ORC-3, we need an HDFS pure C++ 
> client that lets us do async io to HDFS. We want to start from the code that 
> Haohui's been working on at https://github.com/haohui/libhdfspp .



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8707) Implement an async pure c++ HDFS client

2015-12-15 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058280#comment-15058280
 ] 

James Clampffer commented on HDFS-8707:
---

No problem, thanks for helping out!

> Implement an async pure c++ HDFS client
> ---
>
> Key: HDFS-8707
> URL: https://issues.apache.org/jira/browse/HDFS-8707
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Reporter: Owen O'Malley
>Assignee: Xiaobing Zhou
>
> As part of working on the C++ ORC reader at ORC-3, we need an HDFS pure C++ 
> client that lets us do async io to HDFS. We want to start from the code that 
> Haohui's been working on at https://github.com/haohui/libhdfspp .



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9533) seen_txid in the shared edits directory is modified during bootstrapping

2015-12-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058283#comment-15058283
 ] 

Kihwal Lee commented on HDFS-9533:
--

The test failures are not related to this patch. Besides, there is no 
intersection between the sets of failed tests under JDK 7 and JDK 8.


> seen_txid in the shared edits directory is modified during bootstrapping
> 
>
> Key: HDFS-9533
> URL: https://issues.apache.org/jira/browse/HDFS-9533
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode
>Affects Versions: 2.6.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-9533.patch
>
>
> The last known transaction id is stored in the seen_txid file of all known 
> directories of a NNStorage when starting a new edit segment. However, we have 
> seen a case where it contained an id that falls in the middle of an edit 
> segment. This was the seen_txid file in the shared edits directory.  The 
> active namenode's local storage contained a valid-looking seen_txid.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel

2015-12-15 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058284#comment-15058284
 ] 

Allen Wittenauer edited comment on HDFS-8578 at 12/15/15 4:38 PM:
--

> 458m 22s 

From https://builds.apache.org/job/PreCommit-HDFS-Build/13854/console:

{code}
Build timed out (after 300 minutes). Marking the build as aborted.
{code}

The test run took too long.  Jenkins thinks it is killing it before it 
completes and therefore grabs partial output. But because it is wrapped in a 
docker container, Yetus still finishes.  There's a chance of potentially 
merging the output of multiple runs, but I'd have to do a lot more research on 
that. 


was (Author: aw):
> 458m 22s 

The test run took too long.  Jenkins thinks it is killing it before it 
completes and therefore grabs partial output. But because it is wrapped in a 
docker container, Yetus still finishes.  There's a chance of potentially 
merging the output of multiple runs, but I'd have to do a lot more research on 
that. 

> On upgrade, Datanode should process all storage/data dirs in parallel
> -
>
> Key: HDFS-8578
> URL: https://issues.apache.org/jira/browse/HDFS-8578
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Raju Bairishetti
>Assignee: Vinayakumar B
>Priority: Critical
> Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch, 
> HDFS-8578-03.patch, HDFS-8578-04.patch, HDFS-8578-05.patch, 
> HDFS-8578-06.patch, HDFS-8578-07.patch, HDFS-8578-08.patch, 
> HDFS-8578-09.patch, HDFS-8578-10.patch, HDFS-8578-11.patch, 
> HDFS-8578-12.patch, HDFS-8578-13.patch, HDFS-8578-14.patch, 
> HDFS-8578-15.patch, HDFS-8578-16.patch, HDFS-8578-17.patch, 
> HDFS-8578-branch-2.6.0.patch, HDFS-8578-branch-2.7-001.patch, 
> HDFS-8578-branch-2.7-002.patch, HDFS-8578-branch-2.7-003.patch, 
> h8578_20151210.patch, h8578_20151211.patch, h8578_20151211b.patch, 
> h8578_20151212.patch, h8578_20151213.patch
>
>
> Right now, during upgrades the datanode processes all the storage dirs 
> sequentially. Assuming it takes ~20 minutes to process a single storage dir, a 
> datanode with ~10 disks will take around 3 hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
>    for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>      doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>      assert getCTime() == nsInfo.getCTime()
>          : "Data-node and name-node CTimes must be the same.";
>    }
> {code}
> It would save a lot of time during major upgrades if the datanode processed 
> all storage dirs/disks in parallel.
> Can we make the datanode process all storage dirs in parallel?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9448) Enable valgrind for libhdfspp unit tests

2015-12-15 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058285#comment-15058285
 ] 

James Clampffer commented on HDFS-9448:
---

Awesome, thanks for the help!


> Enable valgrind for libhdfspp unit tests
> 
>
> Key: HDFS-9448
> URL: https://issues.apache.org/jira/browse/HDFS-9448
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9448.HDFS-8707.000.patch, 
> HDFS-9448.HDFS-8707.001.patch, HDFS-9448.HDFS-8707.002.patch, 
> HDFS-9448.HDFS-8707.003.patch, HDFS-9448.HDFS-8707.004.patch, 
> HDFS-9448.HDFS-8707.005.patch
>
>
> We should have a target that runs the unit tests under valgrind if it is 
> available on the target machine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9525) hadoop utilities need to support provided delegation tokens

2015-12-15 Thread HeeSoo Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HeeSoo Kim updated HDFS-9525:
-
Status: Patch Available  (was: Reopened)

> hadoop utilities need to support provided delegation tokens
> ---
>
> Key: HDFS-9525
> URL: https://issues.apache.org/jira/browse/HDFS-9525
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: security
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Assignee: HeeSoo Kim
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HDFS-7984.001.patch, HDFS-7984.002.patch, 
> HDFS-7984.003.patch, HDFS-7984.004.patch, HDFS-7984.005.patch, 
> HDFS-7984.006.patch, HDFS-7984.007.patch, HDFS-7984.patch, HDFS-9525.008.patch
>
>
> When using the webhdfs:// filesystem (especially from distcp), we need the 
> ability to inject a delegation token rather than webhdfs initialize its own.  
> This would allow for cross-authentication-zone file system accesses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel

2015-12-15 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058284#comment-15058284
 ] 

Allen Wittenauer edited comment on HDFS-8578 at 12/15/15 4:39 PM:
--

> 458m 22s 

From https://builds.apache.org/job/PreCommit-HDFS-Build/13854/console:

{code}
Build timed out (after 300 minutes). Marking the build as aborted.
{code}

The test run took too long.  Jenkins thinks it is killing it before it 
completes and therefore grabs partial output. But because it is wrapped in a 
docker container, Yetus still finishes.  There's a chance of potentially 
merging the output of multiple runs, but I'd have to do a lot more research on 
that. 

One other thing: the testReport page on Jenkins will only show the *last* JDK 
executed, not all of them.  So in this case, JDK8 errors are only visible via 
what Yetus reports back.


was (Author: aw):
> 458m 22s 

From https://builds.apache.org/job/PreCommit-HDFS-Build/13854/console:

{code}
Build timed out (after 300 minutes). Marking the build as aborted.
{code}

The test run took too long.  Jenkins thinks it is killing it before it 
completes and therefore grabs partial output. But because it is wrapped in a 
docker container, Yetus still finishes.  There's a chance of potentially 
merging the output of multiple runs, but I'd have to do a lot more research on 
that. 

> On upgrade, Datanode should process all storage/data dirs in parallel
> -
>
> Key: HDFS-8578
> URL: https://issues.apache.org/jira/browse/HDFS-8578
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Raju Bairishetti
>Assignee: Vinayakumar B
>Priority: Critical
> Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch, 
> HDFS-8578-03.patch, HDFS-8578-04.patch, HDFS-8578-05.patch, 
> HDFS-8578-06.patch, HDFS-8578-07.patch, HDFS-8578-08.patch, 
> HDFS-8578-09.patch, HDFS-8578-10.patch, HDFS-8578-11.patch, 
> HDFS-8578-12.patch, HDFS-8578-13.patch, HDFS-8578-14.patch, 
> HDFS-8578-15.patch, HDFS-8578-16.patch, HDFS-8578-17.patch, 
> HDFS-8578-branch-2.6.0.patch, HDFS-8578-branch-2.7-001.patch, 
> HDFS-8578-branch-2.7-002.patch, HDFS-8578-branch-2.7-003.patch, 
> h8578_20151210.patch, h8578_20151211.patch, h8578_20151211b.patch, 
> h8578_20151212.patch, h8578_20151213.patch
>
>
> Right now, during upgrades the datanode processes all the storage dirs 
> sequentially. Assuming it takes ~20 minutes to process a single storage dir, a 
> datanode with ~10 disks will take around 3 hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
>    for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>      doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>      assert getCTime() == nsInfo.getCTime()
>          : "Data-node and name-node CTimes must be the same.";
>    }
> {code}
> It would save a lot of time during major upgrades if the datanode processed 
> all storage dirs/disks in parallel.
> Can we make the datanode process all storage dirs in parallel?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9522) Cleanup o.a.h.hdfs.protocol.SnapshotDiffReport$DiffReportEntry

2015-12-15 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058317#comment-15058317
 ] 

John Zhuge commented on HDFS-9522:
--

There is a bug in DiffReportEntry.hashCode(). According to [Implementing 
hashCode|http://www.javapractices.com/topic/TopicAction.do?Id=28]:
* if a class overrides equals, it must override hashCode
* when they are both overridden, equals and hashCode must use the same set of 
fields

DiffReportEntry.equals() uses field "type", but DiffReportEntry.hashCode() 
doesn't.
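
For reference, a minimal sketch of the rule with hypothetical fields (not the 
actual DiffReportEntry code): equals and hashCode must cover the same fields.
{code}
import java.util.Objects;

class Entry {
  final int type;       // stands in for DiffReportEntry's "type" field
  final String path;

  Entry(int type, String path) { this.type = type; this.path = path; }

  @Override public boolean equals(Object o) {
    if (!(o instanceof Entry)) return false;
    Entry e = (Entry) o;
    return type == e.type && Objects.equals(path, e.path);
  }

  @Override public int hashCode() {
    // Must cover the same fields as equals(), including "type".
    return Objects.hash(type, path);
  }
}
{code}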

> Cleanup o.a.h.hdfs.protocol.SnapshotDiffReport$DiffReportEntry
> --
>
> Key: HDFS-9522
> URL: https://issues.apache.org/jira/browse/HDFS-9522
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The current DiffReportEntry is a C-style tagged-union-like data structure.  
> Recommend a subclass hierarchy, as is idiomatic in Java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8402) Fsck exit codes are not reliable

2015-12-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058286#comment-15058286
 ] 

Kihwal Lee commented on HDFS-8402:
--

The changes look good, but the patch does not apply to trunk anymore. Can you 
refresh it?

> Fsck exit codes are not reliable
> 
>
> Key: HDFS-8402
> URL: https://issues.apache.org/jira/browse/HDFS-8402
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-8402.patch
>
>
> HDFS-6663 added the ability to check specific blocks.  The exit code is 
> non-deterministic: it is based on the state (corrupt, healthy, etc.) of the 
> last displayed block's last storage location, instead of whether any of the 
> checked blocks' storages are corrupt.  Blocks with decommissioning or 
> decommissioned nodes should not be flagged as an error.
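
A minimal sketch of the intended aggregation, with hypothetical names 
(BlockCheckResult and its methods are illustrative): the exit code should 
reflect every checked block, not just the last one printed.
{code}
// Hedged sketch, inside a hypothetical method: accumulate corruption across
// all checked blocks instead of overwriting state with the last one seen.
int exitCode = 0;
for (BlockCheckResult res : checkedBlocks) {
  boolean corrupt = res.isCorrupt()
      && !res.allReplicasDecommissionedOrDecommissioning(); // not an error case
  if (corrupt) {
    exitCode = 1;   // sticky failure flag
  }
}
return exitCode;
{code}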



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9524) libhdfs++ deadlocks in Filesystem::New if NN connection fails

2015-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058318#comment-15058318
 ] 

Hadoop QA commented on HDFS-9524:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
17s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 20s 
{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 17s 
{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 16s 
{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 14s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 18s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 18s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 18s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 4m 54s {color} 
| {color:red} hadoop-hdfs-native-client in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 4m 46s {color} 
| {color:red} hadoop-hdfs-native-client in the patch failed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
27s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 38m 45s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/1268/HDFS-9524.HDFS-8707.002.patch
 |
| JIRA Issue | HDFS-9524 |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 65faaea3e6b6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | HDFS-8707 / 4a0eea7 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13878/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-native-client-jdk1.8.0_66.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13878/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-native-client-jdk1.7.0_91.txt
 |
| JDK v1.7.0_91  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13878/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: 
hadoop-hdfs-project/hadoop-hdfs-native-client |
| Max memory used | 75MB |
| Powered by | Apache Yetus 0.1.0   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13878/console |


This message was automatically generated.



> libhdfs++ deadlocks in Filesystem::New if NN connection fails

[jira] [Commented] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )

2015-12-15 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058325#comment-15058325
 ] 

Rakesh R commented on HDFS-9494:


Thanks [~demongaorui]. Overall the latest patch looks good apart from one 
comment. I'm adding a thought about the new exception handling logic; sorry for 
not identifying this point in my previous review.

With the proposed changes, per-streamer failures are no longer caught and 
handled for that streamer. Instead the code either throws an 
InterruptedIOException or just logs a warning. Is that intentional?
{code}
+  try {
+executorCompletionService.take().get();
+  } catch (InterruptedException ie) {
+throw DFSUtilClient.toInterruptedIOException(
+"Interrupted during waiting all streamer flush, ", ie);
+  } catch (ExecutionException ee) {
+LOG.warn(
+"Caught ExecutionException while waiting all streamer flush, ", 
ee);
+  }
{code}
Existing logic:

1# Iterate over each streamer.
2# If there is a failure in the current streamer, call 
{{handleStreamerFailure("flushInternal " + s, e);}}

New logic:

1# Iterate over each streamer.
2# Flush it without waiting for the ack.
3# Wait for any streamer to complete. If one streamer failed, either throw an 
IOException back to the caller or just warn and continue.

In step 3#, I think it would be good if we could handle each streamer's 
exception separately. One draft idea is to move the exception handling logic 
inside {{Callable#call()}} rather than letting {{s.waitForAckedSeqno(toWaitFor);}} 
throw its IOException out of this function. You would also need to refactor the 
{{handleStreamerFailure()}} function to take an extra {{StripedDataStreamer s}} 
parameter, like:

{code}
public Void call() throws Exception {
  try {
    s.waitForAckedSeqno(toWaitFor);
  } catch (IOException ioe) {
    handleStreamerFailure(s, "flushInternal ", ioe);
  }
  return null;
}
{code}

> Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
> 
>
> Key: HDFS-9494
> URL: https://issues.apache.org/jira/browse/HDFS-9494
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: GAO Rui
>Assignee: GAO Rui
>Priority: Minor
> Attachments: HDFS-9494-origin-trunk.00.patch, 
> HDFS-9494-origin-trunk.01.patch, HDFS-9494-origin-trunk.02.patch
>
>
> Currently, in DFSStripedOutputStream#flushAllInternals( ), we trigger and 
> wait for flushInternal( ) in sequence. So the runtime flow is like:
> {code}
> Streamer0#flushInternal( )
> Streamer0#waitForAckedSeqno( )
> Streamer1#flushInternal( )
> Streamer1#waitForAckedSeqno( )
> …
> Streamer8#flushInternal( )
> Streamer8#waitForAckedSeqno( )
> {code}
> It could be better to trigger all the streamers to flushInternal( ) and
> wait for all of them to return from waitForAckedSeqno( ),  and then 
> flushAllInternals( ) returns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel

2015-12-15 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058284#comment-15058284
 ] 

Allen Wittenauer commented on HDFS-8578:


> 458m 22s 

The test run took too long.  Jenkins kills it before it completes and therefore 
grabs partial output, but because the run is wrapped in a Docker container, 
Yetus still finishes.  There's a chance we could merge the output of multiple 
runs, but I'd have to do a lot more research on that. 

> On upgrade, Datanode should process all storage/data dirs in parallel
> -
>
> Key: HDFS-8578
> URL: https://issues.apache.org/jira/browse/HDFS-8578
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Raju Bairishetti
>Assignee: Vinayakumar B
>Priority: Critical
> Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch, 
> HDFS-8578-03.patch, HDFS-8578-04.patch, HDFS-8578-05.patch, 
> HDFS-8578-06.patch, HDFS-8578-07.patch, HDFS-8578-08.patch, 
> HDFS-8578-09.patch, HDFS-8578-10.patch, HDFS-8578-11.patch, 
> HDFS-8578-12.patch, HDFS-8578-13.patch, HDFS-8578-14.patch, 
> HDFS-8578-15.patch, HDFS-8578-16.patch, HDFS-8578-17.patch, 
> HDFS-8578-branch-2.6.0.patch, HDFS-8578-branch-2.7-001.patch, 
> HDFS-8578-branch-2.7-002.patch, HDFS-8578-branch-2.7-003.patch, 
> h8578_20151210.patch, h8578_20151211.patch, h8578_20151211b.patch, 
> h8578_20151212.patch, h8578_20151213.patch
>
>
> Right now, during upgrades the datanode processes all the storage dirs 
> sequentially. Assume it takes ~20 minutes to process a single storage dir; 
> then a datanode with ~10 disks will take around 3 hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
>for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>   doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>   assert getCTime() == nsInfo.getCTime() 
>   : "Data-node and name-node CTimes must be the same.";
> }
> {code}
> It would save lots of time during major upgrades if the datanode processed 
> all storage dirs/disks in parallel.
> Can we make the datanode process all storage dirs in parallel?
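
For illustration, a hedged sketch of parallelizing that loop with 
{{java.util.concurrent}} (pool sizing and error handling are assumptions here, 
not the committed patch):
{code}
// Sketch only: run doTransition() for every storage dir in parallel and
// fail fast on the first error. Checked exceptions surface via Future#get().
ExecutorService pool = Executors.newFixedThreadPool(getNumStorageDirs());
List<Future<?>> futures = new ArrayList<>();
for (int idx = 0; idx < getNumStorageDirs(); idx++) {
  final StorageDirectory sd = getStorageDir(idx);
  futures.add(pool.submit(() -> {
    doTransition(datanode, sd, nsInfo, startOpt);
    assert getCTime() == nsInfo.getCTime()
        : "Data-node and name-node CTimes must be the same.";
    return null;
  }));
}
pool.shutdown();
for (Future<?> f : futures) {
  f.get(); // rethrows the first failure as an ExecutionException
}
{code}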



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9198) Coalesce IBR processing in the NN

2015-12-15 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058467#comment-15058467
 ] 

Uma Maheswara Rao G commented on HDFS-9198:
---

Hi Daryn, apologies for the delay on this. Let me finish my final pass on this 
today. Thanks.

> Coalesce IBR processing in the NN
> -
>
> Key: HDFS-9198
> URL: https://issues.apache.org/jira/browse/HDFS-9198
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-9198-branch2.patch, HDFS-9198-trunk.patch, 
> HDFS-9198-trunk.patch, HDFS-9198-trunk.patch, HDFS-9198-trunk.patch
>
>
> IBRs from thousands of DNs under load will degrade NN performance due to 
> excessive write-lock contention from multiple IPC handler threads.  The IBR 
> processing is quick, so the lock contention may be reduced by coalescing 
> multiple IBRs into a single write-lock transaction.  The handlers will also 
> be freed up faster for other operations.
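
For illustration, a hedged sketch of the coalescing idea (the queue and method 
names below are placeholders, not the actual patch):
{code}
// Sketch: drain all queued incremental block reports and process the whole
// batch under a single namesystem write lock, instead of locking per IBR.
List<StorageReceivedDeletedBlocks> batch = new ArrayList<>();
ibrQueue.drainTo(batch);                // ibrQueue: hypothetical BlockingQueue
namesystem.writeLock();
try {
  for (StorageReceivedDeletedBlocks ibr : batch) {
    processIncrementalBlockReport(ibr); // hypothetical per-IBR handler
  }
} finally {
  namesystem.writeUnlock();
}
{code}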



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel

2015-12-15 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058494#comment-15058494
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8578:
---

The failed tests in the previous build do not seem related to the patch.

> On upgrade, Datanode should process all storage/data dirs in parallel
> -
>
> Key: HDFS-8578
> URL: https://issues.apache.org/jira/browse/HDFS-8578
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Raju Bairishetti
>Assignee: Vinayakumar B
>Priority: Critical
> Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch, 
> HDFS-8578-03.patch, HDFS-8578-04.patch, HDFS-8578-05.patch, 
> HDFS-8578-06.patch, HDFS-8578-07.patch, HDFS-8578-08.patch, 
> HDFS-8578-09.patch, HDFS-8578-10.patch, HDFS-8578-11.patch, 
> HDFS-8578-12.patch, HDFS-8578-13.patch, HDFS-8578-14.patch, 
> HDFS-8578-15.patch, HDFS-8578-16.patch, HDFS-8578-17.patch, 
> HDFS-8578-branch-2.6.0.patch, HDFS-8578-branch-2.7-001.patch, 
> HDFS-8578-branch-2.7-002.patch, HDFS-8578-branch-2.7-003.patch, 
> h8578_20151210.patch, h8578_20151211.patch, h8578_20151211b.patch, 
> h8578_20151212.patch, h8578_20151213.patch
>
>
> Right now, during upgrades the datanode processes all the storage dirs 
> sequentially. Assume it takes ~20 minutes to process a single storage dir; 
> then a datanode with ~10 disks will take around 3 hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
>for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>   doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>   assert getCTime() == nsInfo.getCTime() 
>   : "Data-node and name-node CTimes must be the same.";
> }
> {code}
> It would save lots of time during major upgrades if the datanode processed 
> all storage dirs/disks in parallel.
> Can we make the datanode process all storage dirs in parallel?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel

2015-12-15 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058498#comment-15058498
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8578:
---

[~aw], thanks a lot for explaining it. I also appreciate your effort on 
improving Jenkins!

> On upgrade, Datanode should process all storage/data dirs in parallel
> -
>
> Key: HDFS-8578
> URL: https://issues.apache.org/jira/browse/HDFS-8578
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Raju Bairishetti
>Assignee: Vinayakumar B
>Priority: Critical
> Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch, 
> HDFS-8578-03.patch, HDFS-8578-04.patch, HDFS-8578-05.patch, 
> HDFS-8578-06.patch, HDFS-8578-07.patch, HDFS-8578-08.patch, 
> HDFS-8578-09.patch, HDFS-8578-10.patch, HDFS-8578-11.patch, 
> HDFS-8578-12.patch, HDFS-8578-13.patch, HDFS-8578-14.patch, 
> HDFS-8578-15.patch, HDFS-8578-16.patch, HDFS-8578-17.patch, 
> HDFS-8578-branch-2.6.0.patch, HDFS-8578-branch-2.7-001.patch, 
> HDFS-8578-branch-2.7-002.patch, HDFS-8578-branch-2.7-003.patch, 
> h8578_20151210.patch, h8578_20151211.patch, h8578_20151211b.patch, 
> h8578_20151212.patch, h8578_20151213.patch
>
>
> Right now, during upgrades the datanode processes all the storage dirs 
> sequentially. Assume it takes ~20 minutes to process a single storage dir; 
> then a datanode with ~10 disks will take around 3 hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
>for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>   doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>   assert getCTime() == nsInfo.getCTime() 
>   : "Data-node and name-node CTimes must be the same.";
> }
> {code}
> It would save lots of time during major upgrades if the datanode processed 
> all storage dirs/disks in parallel.
> Can we make the datanode process all storage dirs in parallel?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9371) Code cleanup for DatanodeManager

2015-12-15 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-9371:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.0
   Status: Resolved  (was: Patch Available)

I've committed this to trunk and branch-2. Thanks for the review, Nicholas and 
Haohui!

> Code cleanup for DatanodeManager
> 
>
> Key: HDFS-9371
> URL: https://issues.apache.org/jira/browse/HDFS-9371
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.9.0
>
> Attachments: HDFS-9371.000.patch, HDFS-9371.001.patch, 
> HDFS-9371.002.patch, HDFS-9371.003.patch, HDFS-9371.004.patch
>
>
> Some code cleanup for DatanodeManager. The main changes include:
> # make the synchronization of {{datanodeMap}} and 
> {{datanodesSoftwareVersions}} consistent
> # remove unnecessary lock in {{handleHeartbeat}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-12-15 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058566#comment-15058566
 ] 

Chris Trezzo commented on HDFS-8791:


Thanks [~kihwal] for the info! For the sake of completeness, we also have this 
patch deployed on a busy multi-thousand node cluster. Our upgrade was from a 
pre-id-based layout to the 32x32 layout, so we did not use an additional tool.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, hadoop-56-layout-datanode-dir.tgz, 
> test-node-upgrade.txt
>
>
> We are seeing cases where the new directory layout basically causes the 
> disks to seek for tens of minutes. This can be when the datanode is running 
> du, and it can also be when it is performing a checkDirs(). Both of these 
> operations currently scan all directories in the block pool, and that's very 
> expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by Linux, and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks, and I'm not aware of 
> any way to tell Linux to favor buffer cache pages (even if it did, I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume; 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du-type scan will look at directories one at a 
> time, so the I/O scheduler can't optimize the corresponding seeks, meaning 
> the seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have, but it wasn't nearly as pronounced. The previous layout 
> would be a few hundred directory blocks. Even when completely cold, these 
> would only take a few hundred seeks, which would mean single-digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.
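
For a rough sense of scale, some back-of-the-envelope arithmetic from the 
numbers above (illustrative assumptions only):
{code}
// 256 first-level dirs, each with 256 subdirs:
int leafDirs = 256 * 256;            // 65,536 leaf directories
int totalDirBlocks = 256 + leafDirs; // ~64K directory blocks to read cold
// Assuming ~10 ms per cold random seek, 65,536 seeks take ~655 seconds,
// i.e. roughly 11 minutes, the same order of magnitude as the ~20-minute
// cold DU reported above.
{code}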



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9371) Code cleanup for DatanodeManager

2015-12-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058552#comment-15058552
 ] 

Hudson commented on HDFS-9371:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8970 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8970/])
HDFS-9371. Code cleanup for DatanodeManager. Contributed by Jing Zhao. (jing9: 
rev 8602692338d6f493647205e0241e4116211fab75)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java


> Code cleanup for DatanodeManager
> 
>
> Key: HDFS-9371
> URL: https://issues.apache.org/jira/browse/HDFS-9371
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.9.0
>
> Attachments: HDFS-9371.000.patch, HDFS-9371.001.patch, 
> HDFS-9371.002.patch, HDFS-9371.003.patch, HDFS-9371.004.patch
>
>
> Some code cleanup for DatanodeManager. The main changes include:
> # make the synchronization of {{datanodeMap}} and 
> {{datanodesSoftwareVersions}} consistent
> # remove unnecessary lock in {{handleHeartbeat}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9558) Replication requests always blames the source datanode in case of Checksum Exception.

2015-12-15 Thread Rushabh S Shah (JIRA)
Rushabh S Shah created HDFS-9558:


 Summary: Replication requests always blames the source datanode in 
case of Checksum Exception.
 Key: HDFS-9558
 URL: https://issues.apache.org/jira/browse/HDFS-9558
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Rushabh S Shah


Replication requests from a datanode (e.g., in a rack failure event) always 
blame the source datanode if any of the downstream nodes encounters a 
ChecksumException.
We saw this case recently in our cluster.
We lost 7 nodes in a rack.
There was only one replica of the block (say on dnA).
The namenode asks dnA to replicate to dnB and dnC.
{noformat}
2015-12-13 21:09:41,798 [DataNode:   heartbeating to NN:8020] INFO 
datanode.DataNode: DatanodeRegistration(dnA, 
datanodeUuid=bc1f183d-b74a-49c9-ab1a-d1d496ab77e9, infoPort=1006, 
infoSecurePort=0, ipcPort=8020, 
storageInfo=lv=-56;cid=CID-e7f736ac-158e-446e-9091-7e66f3cddf3c;nsid=358250775;c=1428471998571)
 Starting thread to transfer 
BP-1620678153--1351096255769:blk_3065507810_1107476861617 to dnB:1004 
dnC:1004 
{noformat}

All the packets going out from dnB's interface were getting corrupted.
So dnC received a corrupt block and reported it as a bad block (from dnA) to the namenode.
Following are the logs from dnC:
{noformat}
2015-12-13 21:09:43,444 [DataXceiver for client  at /dnB:34879 [Receiving block 
BP-1620678153--1351096255769:blk_3065507810_1107476861617]] WARN 
datanode.DataNode: Checksum error in block 
BP-1620678153--1351096255769:blk_3065507810_1107476861617 from /dnB:34879
org.apache.hadoop.fs.ChecksumException: Checksum error:  at 58368 exp: 
-1657951272 got: 856104973
at 
org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(Native 
Method)
at 
org.apache.hadoop.util.NativeCrc32.verifyChunkedSumsByteArray(NativeCrc32.java:69)
at 
org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:347)
at 
org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:294)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.verifyChunks(BlockReceiver.java:416)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:550)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:853)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:761)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:237)
at java.lang.Thread.run(Thread.java:745)
2015-12-13 21:09:43,445 [DataXceiver for client  at dnB:34879 [Receiving block 
BP-1620678153--1351096255769:blk_3065507810_1107476861617]] INFO 
datanode.DataNode: report corrupt 
BP-1620678153--1351096255769:blk_3065507810_1107476861617 from datanode 
dnA:1004 to namenode
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9557) Reduce object allocation in PB conversion

2015-12-15 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-9557:
--
Attachment: HDFS-9557.patch

very simple patch

> Reduce object allocation in PB conversion
> -
>
> Key: HDFS-9557
> URL: https://issues.apache.org/jira/browse/HDFS-9557
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-9557.patch
>
>
> PB conversions use {{ByteString.copyFrom}} to populate the builder.  
> Unfortunately this creates unique instances for empty arrays instead of 
> returning the singleton {{ByteString.EMPTY}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9557) Reduce object allocation in PB conversion

2015-12-15 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-9557:
--
Status: Patch Available  (was: Open)

> Reduce object allocation in PB conversion
> -
>
> Key: HDFS-9557
> URL: https://issues.apache.org/jira/browse/HDFS-9557
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-9557.patch
>
>
> PB conversions use {{ByteString.copyFrom}} to populate the builder.  
> Unfortunately this creates unique instances for empty arrays instead of 
> returning the singleton {{ByteString.EMPTY}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9557) Reduce object allocation in PB conversion

2015-12-15 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-9557:
-

 Summary: Reduce object allocation in PB conversion
 Key: HDFS-9557
 URL: https://issues.apache.org/jira/browse/HDFS-9557
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 2.8.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp


PB conversions use {{ByteString.copyFrom}} to populate the builder.  
Unfortunately this creates unique instances for empty arrays instead of 
returning the singleton {{ByteString.EMPTY}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9558) Replication requests from datanode always blames the source datanode in case of Checksum Exception.

2015-12-15 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated HDFS-9558:
-
Summary: Replication requests from datanode always blames the source 
datanode in case of Checksum Exception.  (was: Replication requests always 
blames the source datanode in case of Checksum Exception.)

> Replication requests from datanode always blames the source datanode in case 
> of Checksum Exception.
> ---
>
> Key: HDFS-9558
> URL: https://issues.apache.org/jira/browse/HDFS-9558
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Rushabh S Shah
>
> Replication requests from a datanode (e.g., in a rack failure event) always 
> blame the source datanode if any of the downstream nodes encounters a 
> ChecksumException.
> We saw this case recently in our cluster.
> We lost 7 nodes in a rack.
> There was only one replica of the block (say on dnA).
> The namenode asks dnA to replicate to dnB and dnC.
> {noformat}
> 2015-12-13 21:09:41,798 [DataNode:   heartbeating to NN:8020] INFO 
> datanode.DataNode: DatanodeRegistration(dnA, 
> datanodeUuid=bc1f183d-b74a-49c9-ab1a-d1d496ab77e9, infoPort=1006, 
> infoSecurePort=0, ipcPort=8020, 
> storageInfo=lv=-56;cid=CID-e7f736ac-158e-446e-9091-7e66f3cddf3c;nsid=358250775;c=1428471998571)
>  Starting thread to transfer 
> BP-1620678153--1351096255769:blk_3065507810_1107476861617 to dnB:1004 
> dnC:1004 
> {noformat}
> All the packets going out from dnB's interface were getting corrupted.
> So dnC received a corrupt block and reported it as a bad block (from dnA) to 
> the namenode.
> Following are the logs from dnC:
> {noformat}
> 2015-12-13 21:09:43,444 [DataXceiver for client  at /dnB:34879 [Receiving 
> block BP-1620678153--1351096255769:blk_3065507810_1107476861617]] WARN 
> datanode.DataNode: Checksum error in block 
> BP-1620678153--1351096255769:blk_3065507810_1107476861617 from /dnB:34879
> org.apache.hadoop.fs.ChecksumException: Checksum error:  at 58368 exp: 
> -1657951272 got: 856104973
> at 
> org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(Native 
> Method)
> at 
> org.apache.hadoop.util.NativeCrc32.verifyChunkedSumsByteArray(NativeCrc32.java:69)
> at 
> org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:347)
> at 
> org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:294)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.verifyChunks(BlockReceiver.java:416)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:550)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:853)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:761)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:237)
> at java.lang.Thread.run(Thread.java:745)
> 2015-12-13 21:09:43,445 [DataXceiver for client  at dnB:34879 [Receiving 
> block BP-1620678153--1351096255769:blk_3065507810_1107476861617]] INFO 
> datanode.DataNode: report corrupt 
> BP-1620678153--1351096255769:blk_3065507810_1107476861617 from datanode 
> dnA:1004 to namenode
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9557) Reduce object allocation in PB conversion

2015-12-15 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058610#comment-15058610
 ] 

Chris Nauroth commented on HDFS-9557:
-

+1 for the idea, but I see an infinite recursion here:

{code}
public static ByteString getByteString(byte[] bytes) {
  // return singleton to reduce object allocation
  return (bytes.length == 0) ? ByteString.EMPTY : getByteString(bytes); // <-- stack overflow for non-empty bytes
}
{code}
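
For reference, a hedged sketch of what the helper presumably intends, with the 
recursive call replaced by {{ByteString.copyFrom}} (not the committed fix):
{code}
public static ByteString getByteString(byte[] bytes) {
  // Return the shared singleton for empty arrays to avoid allocation;
  // fall back to a normal copy for the general, non-empty case.
  return (bytes.length == 0) ? ByteString.EMPTY : ByteString.copyFrom(bytes);
}
{code}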


> Reduce object allocation in PB conversion
> -
>
> Key: HDFS-9557
> URL: https://issues.apache.org/jira/browse/HDFS-9557
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-9557.patch
>
>
> PB conversions use {{ByteString.copyFrom}} to populate the builder.  
> Unfortunately this creates unique instances for empty arrays instead of 
> returning the singleton {{ByteString.EMPTY}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9557) Reduce object allocation in PB conversion

2015-12-15 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058611#comment-15058611
 ] 

Colin Patrick McCabe commented on HDFS-9557:


+1 pending jenkins.

Long-term, I think we might want to switch to a PB library that does less 
object allocation.

> Reduce object allocation in PB conversion
> -
>
> Key: HDFS-9557
> URL: https://issues.apache.org/jira/browse/HDFS-9557
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-9557.patch
>
>
> PB conversions use {{ByteString.copyFrom}} to populate the builder.  
> Unfortunately this creates unique instances for empty arrays instead of 
> returning the singleton {{ByteString.EMPTY}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9557) Reduce object allocation in PB conversion

2015-12-15 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058637#comment-15058637
 ] 

Mingliang Liu commented on HDFS-9557:
-

Does the general case fall into {{ByteString.copyFrom}}?

> Reduce object allocation in PB conversion
> -
>
> Key: HDFS-9557
> URL: https://issues.apache.org/jira/browse/HDFS-9557
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-9557.patch
>
>
> PB conversions use {{ByteString.copyFrom}} to populate the builder.  
> Unfortunately this creates unique instances for empty arrays instead of 
> returning the singleton {{ByteString.EMPTY}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9525) hadoop utilities need to support provided delegation tokens

2015-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058656#comment-15058656
 ] 

Hadoop QA commented on HDFS-9525:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
1s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 48s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 31s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
57s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 26s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 16s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 11s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
4s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 36s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 17m 59s 
{color} | {color:red} root-jdk1.8.0_66 with JDK v1.8.0_66 generated 2 new 
issues (was 729, now 729). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 26m 29s 
{color} | {color:red} root-jdk1.7.0_91 with JDK v1.7.0_91 generated 2 new 
issues (was 723, now 723). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 30s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 57s 
{color} | {color:red} Patch generated 1 new checkstyle issues in root (total 
was 345, now 346). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 16s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 11s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 31s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 49s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 4s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 3s {color} | 
{color:red} hadoop-common in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:gr

[jira] [Commented] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )

2015-12-15 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058657#comment-15058657
 ] 

Jing Zhao commented on HDFS-9494:
-

I agree with [~rakeshr]'s comment. Exceptions can happen in streamers after 
queueing the packet and while sending the remaining packets out. The exception 
can be thrown by {{waitForAckedSeqno}}. We should correctly handle these 
exceptions instead of only logging a warning message.

> Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
> 
>
> Key: HDFS-9494
> URL: https://issues.apache.org/jira/browse/HDFS-9494
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: GAO Rui
>Assignee: GAO Rui
>Priority: Minor
> Attachments: HDFS-9494-origin-trunk.00.patch, 
> HDFS-9494-origin-trunk.01.patch, HDFS-9494-origin-trunk.02.patch
>
>
> Currently, in DFSStripedOutputStream#flushAllInternals( ), we trigger and 
> wait for flushInternal( ) in sequence. So the runtime flow is like:
> {code}
> Streamer0#flushInternal( )
> Streamer0#waitForAckedSeqno( )
> Streamer1#flushInternal( )
> Streamer1#waitForAckedSeqno( )
> …
> Streamer8#flushInternal( )
> Streamer8#waitForAckedSeqno( )
> {code}
> It could be better to trigger all the streamers to flushInternal( ) and
> wait for all of them to return from waitForAckedSeqno( ),  and then 
> flushAllInternals( ) returns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9523) libhdfs++: failure to connect to ipv6 host causes CI unit tests to fail

2015-12-15 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9523:
-
Attachment: HDFS-9523.HDFS-8707.test1.patch

> libhdfs++: failure to connect to ipv6 host causes CI unit tests to fail
> ---
>
> Key: HDFS-9523
> URL: https://issues.apache.org/jira/browse/HDFS-9523
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9523.HDFS-8707.000.patch, 
> HDFS-9523.HDFS-8707.test1.patch, failed_docker_run.txt
>
>
> When run under Docker, libhdfs++ is not connecting to the mini DFS cluster.   
> This is the reason the CI tests have been failing in the 
> libhdfs_threaded_hdfspp_test_shim_static test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-9557) Reduce object allocation in PB conversion

2015-12-15 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058611#comment-15058611
 ] 

Colin Patrick McCabe edited comment on HDFS-9557 at 12/15/15 8:16 PM:
--

+1 for the idea, pending fixing the issue [~cnauroth] found.

Long-term, I think we might want to switch to a PB library that does less 
object allocation.


was (Author: cmccabe):
+1 pending jenkins.

Long-term, I think we might want to switch to a PB library that does less 
object allocation.

> Reduce object allocation in PB conversion
> -
>
> Key: HDFS-9557
> URL: https://issues.apache.org/jira/browse/HDFS-9557
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-9557.patch
>
>
> PB conversions use {{ByteString.copyFrom}} to populate the builder.  
> Unfortunately this creates unique instances for empty arrays instead of 
> returning the singleton {{ByteString.EMPTY}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8562) HDFS Performance is impacted by FileInputStream Finalizer

2015-12-15 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058718#comment-15058718
 ] 

Colin Patrick McCabe commented on HDFS-8562:


That's frustrating.  Unfortunately, I don't think there is any file-related 
class in standard Java which can be created from a {{FileDescriptor}} without 
using private APIs -- except {{FileInputStream}}.  {{FileChannel#open}} only 
takes paths.  {{RandomAccessFile}}'s constructors only take paths as well.  One 
approach would be to try to use the private APIs via reflection, and then fall 
back to creating a {{FileInputStream}} if that's not possible.  We can get a 
{{FileChannel}} from the {{FileInputStream}}.  In order to do that, we'd have 
to hold a reference to the {{FileInputStream}} somewhere, to prevent its 
finalizer from kicking in and closing the {{FileDescriptor}}.
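
A hedged sketch of that fallback idea (the wrapper name is illustrative; this 
is not the actual change):
{code}
import java.io.FileDescriptor;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

// Sketch: keep the FileInputStream strongly reachable so its finalizer
// cannot close the FileDescriptor while the FileChannel is still in use.
class FdChannel implements AutoCloseable {
  private final FileInputStream fis;    // held only to pin the fd open
  private final FileChannel channel;

  FdChannel(FileDescriptor fd) {
    this.fis = new FileInputStream(fd); // wraps the existing fd, no reopen
    this.channel = fis.getChannel();
  }

  FileChannel channel() {
    return channel;
  }

  @Override
  public void close() throws IOException {
    fis.close();                        // closes the underlying fd exactly once
  }
}
{code}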

> HDFS Performance is impacted by FileInputStream Finalizer
> -
>
> Key: HDFS-8562
> URL: https://issues.apache.org/jira/browse/HDFS-8562
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.5.0
> Environment: Impact any application that uses HDFS
>Reporter: Yanping Wang
> Attachments: HDFS-8562.002b.patch, HDFS-8562.01.patch
>
>
> While running HBase using HDFS as datanodes, we noticed excessive high GC 
> pause spikes. For example with jdk8 update 40 and G1 collector, we saw 
> datanode GC pauses spiked toward 160 milliseconds while they should be around 
> 20 milliseconds. 
> We tracked down to GC logs and found those long GC pauses were devoted to 
> process high number of final references. 
> For example, this Young GC:
> 2715.501: [GC pause (G1 Evacuation Pause) (young) 0.1529017 secs]
> 2715.572: [SoftReference, 0 refs, 0.0001034 secs]
> 2715.572: [WeakReference, 0 refs, 0.123 secs]
> 2715.572: [FinalReference, 8292 refs, 0.0748194 secs]
> 2715.647: [PhantomReference, 0 refs, 160 refs, 0.0001333 secs]
> 2715.647: [JNI Weak Reference, 0.140 secs]
> [Ref Proc: 122.3 ms]
> [Eden: 910.0M(910.0M)->0.0B(911.0M) Survivors: 11.0M->10.0M Heap: 
> 951.1M(1536.0M)->40.2M(1536.0M)]
> [Times: user=0.47 sys=0.01, real=0.15 secs]
> This young GC took 152.9 milliseconds STW pause, while spent 122.3 
> milliseconds in Ref Proc, which processed 8292 FinalReference in 74.8 
> milliseconds plus some overhead.
> We used JFR and JMAP with Memory Analyzer to track down and found those 
> FinalReference were all from FileInputStream.  We checked HDFS code and saw 
> the use of the FileInputStream in datanode:
> https://apache.googlesource.com/hadoop-common/+/refs/heads/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java
> {code}
> public static MappableBlock load(long length,
>     FileInputStream blockIn, FileInputStream metaIn,
>     String blockFileName) throws IOException {
>   MappableBlock mappableBlock = null;
>   MappedByteBuffer mmap = null;
>   FileChannel blockChannel = null;
>   try {
>     blockChannel = blockIn.getChannel();
>     if (blockChannel == null) {
>       throw new IOException("Block InputStream has no FileChannel.");
>     }
>     mmap = blockChannel.map(MapMode.READ_ONLY, 0, length);
>     NativeIO.POSIX.getCacheManipulator().mlock(blockFileName, mmap, length);
>     verifyChecksum(length, metaIn, blockChannel, blockFileName);
>     mappableBlock = new MappableBlock(mmap, length);
>   } finally {
>     IOUtils.closeQuietly(blockChannel);
>     if (mappableBlock == null) {
>       if (mmap != null) {
>         NativeIO.POSIX.munmap(mmap); // unmapping also unlocks
>       }
>     }
>   }
>   return mappableBlock;
> }
> {code}
> We looked up 
> https://docs.oracle.com/javase/7/docs/api/java/io/FileInputStream.html  and
> http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/23bdcede4e39/src/share/classes/java/io/FileInputStream.java
>  and noticed FileInputStream relies on the Finalizer to release its resource. 
> When an instance of a class that has a finalizer is created, an entry for 
> that instance is put on a queue in the JVM so the JVM knows it has a 
> finalizer that needs to be executed.
> The current issue is: even when programmers do call close() after using 
> FileInputStream, its finalize() method will still be called. In other words, 
> we still get the side effect of the FinalReference being registered at 
> FileInputStream allocation time, plus the reference processing needed to 
> reclaim the FinalReference during GC (any GC solution has to deal with this). 
> We can imagine that when running an industry-scale HDFS deployment, millions 
> of files could be opened and closed, resulting in a very large number of 
> finalizers being registered

[jira] [Updated] (HDFS-9541) Add hdfsStreamBuilder API to libhdfs to support defaultBlockSizes greater than 2 GB

2015-12-15 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-9541:
---
Status: In Progress  (was: Patch Available)

> Add hdfsStreamBuilder API to libhdfs to support defaultBlockSizes greater 
> than 2 GB
> ---
>
> Key: HDFS-9541
> URL: https://issues.apache.org/jira/browse/HDFS-9541
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs
>Affects Versions: 0.20.1
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-9541.001.patch
>
>
> We should have a new API in libhdfs which will support creating files with a 
> default block size that is more than 31 bits in size.  We should also make 
> this a builder API so that it is easy to add more options later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9038) DFS reserved space is erroneously counted towards non-DFS used.

2015-12-15 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058764#comment-15058764
 ] 

Chris Nauroth commented on HDFS-9038:
-

Thanks everyone for sticking with this.  This has turned out to be much 
trickier than I anticipated when I filed the issue.  I'd like to summarize 
current status.

Arpit and I are in agreement about my analysis of how the calculation changed 
after HDFS-5215.  However, we are not yet in agreement about which calculation 
is truly correct.  I believe the pre-HDFS-5215 calculation (subtracting 
{{dfs.datanode.du.reserved}}) is correct, because it allowed me to monitor for 
unexpected non-zero non-DFS usage and react.  Since this was an established 
operations workflow (at least for me), I argue that we have a responsibility to 
restore that behavior.  Arpit believes that it's correct to cancel out 
{{dfs.datanode.du.reserved}}, because then non-DFS used would report space used 
for non-HDFS purposes more accurately.  Essentially, it's a question of whether 
this metric means "Raw Non-DFS Used" or "Unplanned Non-DFS Used".
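
In pseudo-arithmetic, a hedged sketch of the two readings (variable names are 
illustrative, not actual DataNode fields):
{code}
// "Raw Non-DFS Used": everything on the volume that is not HDFS block data.
long rawNonDfsUsed = diskUsed - dfsUsed;

// "Unplanned Non-DFS Used": the same minus the planned reservation, so a
// non-zero value flags unexpected consumption an operator can alert on.
long unplannedNonDfsUsed = diskUsed - dfsUsed - reserved;
{code}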

We also discovered an interesting side issue about {{File#getUsableSpace}} vs. 
{{File#getFreeSpace}}.  Pre-HDFS-5215, it could be considered a bug that we did 
not account for system reserved space.  Interestingly, it seems in our testing 
that ext holds back 5% by default, but xfs does not.

I pushed pretty hard for restoring the pre-HDFS-5215 behavior in my earlier 
comments, but I'm just one voice.  I suggest that we leave this issue open for 
a while for others to comment.  I could be swayed if others think I'm 
approaching this incorrectly.  Meanwhile, [~brahmareddy], would you please hold 
off on posting more patches?  Let's wait for the discussion to settle a little 
more first.  Thanks for your patience.

> DFS reserved space is erroneously counted towards non-DFS used.
> ---
>
> Key: HDFS-9038
> URL: https://issues.apache.org/jira/browse/HDFS-9038
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Chris Nauroth
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9038-002.patch, HDFS-9038-003.patch, 
> HDFS-9038-004.patch, HDFS-9038-005.patch, HDFS-9038-006.patch, 
> HDFS-9038-007.patch, HDFS-9038.patch
>
>
> HDFS-5215 changed the DataNode volume available space calculation to consider 
> the reserved space held by the {{dfs.datanode.du.reserved}} configuration 
> property.  As a side effect, reserved space is now counted towards non-DFS 
> used.  I don't believe it was intentional to change the definition of non-DFS 
> used.  This issue proposes restoring the prior behavior: do not count 
> reserved space towards non-DFS used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9520) PeerCache evicts too frequently causing connection restablishments

2015-12-15 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058753#comment-15058753
 ] 

Colin Patrick McCabe commented on HDFS-9520:


Right.  {{dfs.client.socketcache.capacity}} is the capacity of the socket 
cache, not the number of distinct datanodes it holds.

The right size for {{dfs.client.socketcache.capacity}} depends on a few things. 
 The more HDFS input streams you have open at once, the more sockets you will 
use at once.  The more datanodes you have in your cluster, the larger you may 
want the cache to be, so that you get a better hit rate.

We could certainly raise the default value for this.  Probably it should be at 
least 32 or 64.
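
To illustrate the size() distinction the reporter hit (Guava's 
{{LinkedListMultimap}}; the keys and values below are placeholders):
{code}
import com.google.common.collect.LinkedListMultimap;

LinkedListMultimap<String, Integer> multimap = LinkedListMultimap.create();
multimap.put("dn1", 1);
multimap.put("dn1", 2);   // a second cached socket for the same datanode
multimap.put("dn2", 3);

multimap.size();          // 3: total key-value pairs (what the eviction check uses)
multimap.keySet().size(); // 2: distinct keys (datanodes)
{code}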

> PeerCache evicts too frequently causing connection restablishments
> --
>
> Key: HDFS-9520
> URL: https://issues.apache.org/jira/browse/HDFS-9520
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
> Attachments: HDFS-9520.png
>
>
> Env: 20 node setup
> dfs.client.socketcache.capacity = 16
> Issue:
> ==
> Monitored PeerCache and it was evicting lots of connections during close. Set 
> "dfs.client.socketcache.capacity=20" and tested again. Evictions still 
> happened. A screenshot from the profiler is attached to the JIRA.
> Workaround:
> ===
> Temp fix was to set "dfs.client.socketcache.capacity=1000" to prevent 
> eviction. 
> Adding more debug logs revealed that multimap.size() was 40 instead of 20. 
> LinkedListMultimap#size() returns the total number of values instead of the 
> number of keys, causing lots of evictions.
> {code}
>if (capacity == multimap.size()) {
>   evictOldest();
> }
> {code}
> Should this be (capacity == multimap.keySet().size()), or is it expected that 
> "dfs.client.socketcache.capacity" be set to a very high value?
> \cc [~gopalv], [~sseth]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9523) libhdfs++: failure to connect to ipv6 host causes CI unit tests to fail

2015-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058763#comment-15058763
 ] 

Hadoop QA commented on HDFS-9523:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
10s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 17s 
{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 21s 
{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 16s 
{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 27s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red} 22m 22s {color} | 
{color:red} hadoop-hdfs-project_hadoop-hdfs-native-client-jdk1.8.0_66 with JDK 
v1.8.0_66 generated 5 new issues (was 3, now 8). {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 29s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red} 26m 51s {color} | 
{color:red} hadoop-hdfs-project_hadoop-hdfs-native-client-jdk1.7.0_91 with JDK 
v1.7.0_91 generated 5 new issues (was 3, now 8). {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 9s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 4m 51s {color} 
| {color:red} hadoop-hdfs-native-client in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 4m 48s {color} 
| {color:red} hadoop-hdfs-native-client in the patch failed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
25s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 40m 16s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12777813/HDFS-9523.HDFS-8707.test1.patch
 |
| J

[jira] [Commented] (HDFS-9038) DFS reserved space is erroneously counted towards non-DFS used.

2015-12-15 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058805#comment-15058805
 ] 

Allen Wittenauer commented on HDFS-9038:


I'd really like someone to attach a simple test case so that we can run it and 
see the differences on different file systems.  In particular, every time we 
change this code, we only ever test on one or two Linux file systems and end up 
causing really oddball behavior on others, especially those that don't use the 
traditional disk->partition->filesystem layout (e.g., ZFS).  

> DFS reserved space is erroneously counted towards non-DFS used.
> ---
>
> Key: HDFS-9038
> URL: https://issues.apache.org/jira/browse/HDFS-9038
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Chris Nauroth
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9038-002.patch, HDFS-9038-003.patch, 
> HDFS-9038-004.patch, HDFS-9038-005.patch, HDFS-9038-006.patch, 
> HDFS-9038-007.patch, HDFS-9038.patch
>
>
> HDFS-5215 changed the DataNode volume available space calculation to consider 
> the reserved space held by the {{dfs.datanode.du.reserved}} configuration 
> property.  As a side effect, reserved space is now counted towards non-DFS 
> used.  I don't believe it was intentional to change the definition of non-DFS 
> used.  This issue proposes restoring the prior behavior: do not count 
> reserved space towards non-DFS used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6804) race condition between transferring block and appending block causes "Unexpected checksum mismatch exception"

2015-12-15 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058832#comment-15058832
 ] 

Aaron T. Myers commented on HDFS-6804:
--

[~max_datapath] - what version of Hadoop did you repro this on?

> race condition between transferring block and appending block causes 
> "Unexpected checksum mismatch exception" 
> --
>
> Key: HDFS-6804
> URL: https://issues.apache.org/jira/browse/HDFS-6804
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.2.0
>Reporter: Gordon Wang
>
> We found some error log in the datanode. like this
> {noformat}
> 2014-07-22 01:49:51,338 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Ex
> ception for BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248
> java.io.IOException: Terminating due to a checksum error.java.io.IOException: 
> Unexpected checksum mismatch while writing 
> BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 from 
> /192.168.2.101:39495
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:536)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:703)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:575)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
> at java.lang.Thread.run(Thread.java:744)
> {noformat}
> While on the source datanode, the log says the block is transmitted.
> {noformat}
> 2014-07-22 01:49:50,805 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Da
> taTransfer: Transmitted 
> BP-2072804351-192.168.2.104-1406008383435:blk_1073741997
> _9248 (numBytes=16188152) to /192.168.2.103:50010
> {noformat}
> When the destination datanode gets the checksum mismatch, it reports bad 
> block to NameNode and NameNode marks the replica on the source datanode as 
> corrupt. But actually, the replica on the source datanode is valid. Because 
> the replica can pass the checksum verification.
> In all, the replica on the source datanode is wrongly marked as corrupted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9502) DiskBalancer : Replace Node and Data Density with Weighted Mean and Variance

2015-12-15 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9502:
---
Attachment: HDFS-9502-HDFS-1312.001.patch

This patch changes VolumeDataDensity to VolumeWeightedVariance using the 
algorithm suggested by [~szetszwo].

[~szetszwo], please note that I have left NodeDataDensity as is, since it is 
only used by the command line tool to print out which nodes need the user's 
attention.

The real balancing computation uses the VolumeWeightedMean and 
VolumeWeightedVariance.
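
For reviewers, the computation amounts to something like the following (a 
minimal sketch; {{Volume}}, {{getCapacity()}} and {{getUsedRatio()}} are 
illustrative names, not the patch's actual API):
{code}
// Capacity-weighted mean and variance over a node's volumes.
static double[] weightedMeanAndVariance(List<Volume> volumes) {
  double totalCapacity = 0, weightedSum = 0;
  for (Volume v : volumes) {
    totalCapacity += v.getCapacity();
    weightedSum += v.getCapacity() * v.getUsedRatio();
  }
  double mean = weightedSum / totalCapacity;   // VolumeWeightedMean

  double varianceSum = 0;
  for (Volume v : volumes) {
    double d = v.getUsedRatio() - mean;
    varianceSum += v.getCapacity() * d * d;    // VolumeWeightedVariance
  }
  return new double[] { mean, varianceSum / totalCapacity };
}
{code}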

This patch depends on HDFS-9469, hence I am not submitting it for a Jenkins run 
now; I will do so after HDFS-9469 is committed. I am uploading the patch now 
since it is ready for code review.
 

> DiskBalancer : Replace Node and Data Density with Weighted Mean and Variance
> 
>
> Key: HDFS-9502
> URL: https://issues.apache.org/jira/browse/HDFS-9502
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9502-HDFS-1312.001.patch
>
>
> We use a notion called Data Density which is based on concepts similar to 
> weighted mean and variance. Make sure that computations map directly to these 
> concepts, since they are easier to understand than the density as currently 
> defined in Disk Balancer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9469) DiskBalancer : Add Planner

2015-12-15 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058854#comment-15058854
 ] 

Anu Engineer commented on HDFS-9469:


I have just updated the patch on HDFS-9502. I will submit it for a Jenkins run 
after this patch is committed.


> DiskBalancer : Add Planner 
> ---
>
> Key: HDFS-9469
> URL: https://issues.apache.org/jira/browse/HDFS-9469
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9469-HDFS-1312.001.patch, 
> HDFS-9469-HDFS-1312.002.patch, HDFS-9469-HDFS-1312.003.patch
>
>
> Disk Balancer reads the cluster data and then creates a plan for the data 
> moves based on the snap-shot of the data read from the nodes. This plan is 
> later submitted to data nodes for execution. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9038) DFS reserved space is erroneously counted towards non-DFS used.

2015-12-15 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9038:

Attachment: GetFree.java

tl; dr - We need agreement on the definition of non-DFS used.

IMO the original Jira description is inaccurate. The pre HDFS-5215 calculation 
had two bugs.
# It incorrectly subtracted reserved space from the non-DFS used. (net 
negative). Chris suggests this is not really an issue as non-DFS used should be 
shown as zero unless it exceeds the DFS reserved value.
# It used File#getUsableSpace to calculate the volume free space instead of 
File#getFreeSpace. (net positive)

The net effect was that non-DFS used was displayed as zero unless the actual 
non-DFS used exceeded {{DFS reserved - system reserved}}.

HDFS-5215 fixed the first issue and the value that is now erroneously counted 
towards non-DFS used is in fact the system reserved 5%. 

Also attached a trivial utility that dumps the free/available space.

From a mostly empty 40GB Ext4 partition:
{code}
$ java GetFree /mnt/sdb/hadoop/
Free space   : 42,090,229,760
Available space  : 39,925,968,896
{code}

Same partition reformatted as XFS:
{code}
Free space   : 42,894,983,168
Available space  : 42,894,983,168
{code}

So Ext derivatives hold back 5% free space while XFS does not.
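
For reference, the utility is essentially the following (a sketch that may 
differ in detail from the attached GetFree.java):
{code}
import java.io.File;
import java.text.NumberFormat;

public class GetFree {
  public static void main(String[] args) {
    File f = new File(args[0]);
    NumberFormat nf = NumberFormat.getIntegerInstance();
    // getFreeSpace: raw unallocated bytes; getUsableSpace: bytes this
    // JVM can actually use (excludes, e.g., the Ext root reserve).
    System.out.println("Free space   : " + nf.format(f.getFreeSpace()));
    System.out.println("Available space  : " + nf.format(f.getUsableSpace()));
  }
}
{code}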


> DFS reserved space is erroneously counted towards non-DFS used.
> ---
>
> Key: HDFS-9038
> URL: https://issues.apache.org/jira/browse/HDFS-9038
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Chris Nauroth
>Assignee: Brahma Reddy Battula
> Attachments: GetFree.java, HDFS-9038-002.patch, HDFS-9038-003.patch, 
> HDFS-9038-004.patch, HDFS-9038-005.patch, HDFS-9038-006.patch, 
> HDFS-9038-007.patch, HDFS-9038.patch
>
>
> HDFS-5215 changed the DataNode volume available space calculation to consider 
> the reserved space held by the {{dfs.datanode.du.reserved}} configuration 
> property.  As a side effect, reserved space is now counted towards non-DFS 
> used.  I don't believe it was intentional to change the definition of non-DFS 
> used.  This issue proposes restoring the prior behavior: do not count 
> reserved space towards non-DFS used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9523) libhdfs++: failure to connect to ipv6 host causes CI unit tests to fail

2015-12-15 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9523:
-
Attachment: HDFS-9523.HDFS-8707.001.patch

New patch: explicitly close failed connections before attempting to reconnect 
them.  Older kernels would choke if a stale file handle was still present.

> libhdfs++: failure to connect to ipv6 host causes CI unit tests to fail
> ---
>
> Key: HDFS-9523
> URL: https://issues.apache.org/jira/browse/HDFS-9523
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9523.HDFS-8707.000.patch, 
> HDFS-9523.HDFS-8707.001.patch, HDFS-9523.HDFS-8707.test1.patch, 
> failed_docker_run.txt
>
>
> When run under Docker, libhdfs++ is not connecting to the mini DFS cluster.   
> This is the reason the CI tests have been failing in the 
> libhdfs_threaded_hdfspp_test_shim_static test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-9038) DFS reserved space is erroneously counted towards non-DFS used.

2015-12-15 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058858#comment-15058858
 ] 

Arpit Agarwal edited comment on HDFS-9038 at 12/15/15 9:34 PM:
---

tl; dr - We need agreement on the definition of non-DFS used.

The pre HDFS-5215 calculation had two bugs.
# It incorrectly subtracted reserved space from the non-DFS used. (net 
negative). Chris suggests this is not really an issue as non-DFS used should be 
shown as zero unless it exceeds the DFS reserved value.
# It used File#getUsableSpace to calculate the volume free space instead of 
File#getFreeSpace. (net positive)

The net effect was that non-DFS used was displayed as zero unless the actual 
non-DFS used exceeded {{DFS reserved - system reserved}}.

HDFS-5215 fixed the first issue and the value that is now erroneously counted 
towards non-DFS used is in fact the system reserved 5%. 

Also attached a trivial utility that dumps the free/available space.

From a mostly empty 40GB Ext4 partition:
{code}
$ java GetFree /mnt/sdb/hadoop/
Free space   : 42,090,229,760
Available space  : 39,925,968,896
{code}

Same partition reformatted as XFS:
{code}
Free space   : 42,894,983,168
Available space  : 42,894,983,168
{code}

So Ext derivatives hold back 5% free space while XFS does not.

Edit: Removed statement about Jira description being inaccurate since it too 
depends on how we define non-DFS used. :)


was (Author: arpitagarwal):
tl; dr - We need agreement on the definition of non-DFS used.

IMO the original Jira description is inaccurate. The pre HDFS-5215 calculation 
had two bugs.
# It incorrectly subtracted reserved space from the non-DFS used. (net 
negative). Chris suggests this is not really an issue as non-DFS used should be 
shown as zero unless it exceeds the DFS reserved value.
# It used File#getUsableSpace to calculate the volume free space instead of 
File#getFreeSpace. (net positive)

The net effect was that non-DFS used was displayed as zero unless the actual 
non-DFS used exceeded {{DFS reserved - system reserved}}.

HDFS-5215 fixed the first issue and the value that is now erroneously counted 
towards non-DFS used is in fact the system reserved 5%. 

Also attached a trivial utility that dumps the free/available space.

From a mostly empty 40GB Ext4 partition:
{code}
$ java GetFree /mnt/sdb/hadoop/
Free space   : 42,090,229,760
Available space  : 39,925,968,896
{code}

Same partition reformatted as XFS:
{code}
Free space   : 42,894,983,168
Available space  : 42,894,983,168
{code}

So Ext derivatives hold back 5% free space while XFS does not.


> DFS reserved space is erroneously counted towards non-DFS used.
> ---
>
> Key: HDFS-9038
> URL: https://issues.apache.org/jira/browse/HDFS-9038
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Chris Nauroth
>Assignee: Brahma Reddy Battula
> Attachments: GetFree.java, HDFS-9038-002.patch, HDFS-9038-003.patch, 
> HDFS-9038-004.patch, HDFS-9038-005.patch, HDFS-9038-006.patch, 
> HDFS-9038-007.patch, HDFS-9038.patch
>
>
> HDFS-5215 changed the DataNode volume available space calculation to consider 
> the reserved space held by the {{dfs.datanode.du.reserved}} configuration 
> property.  As a side effect, reserved space is now counted towards non-DFS 
> used.  I don't believe it was intentional to change the definition of non-DFS 
> used.  This issue proposes restoring the prior behavior: do not count 
> reserved space towards non-DFS used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9487) libhdfs++ Enable builds with no compiler optimizations

2015-12-15 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9487:
-
Status: Patch Available  (was: Open)

> libhdfs++ Enable builds with no compiler optimizations
> --
>
> Key: HDFS-9487
> URL: https://issues.apache.org/jira/browse/HDFS-9487
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
> Attachments: HDFS-9487.HDFS-8707.000.patch
>
>
> The default build configuration uses -O2 -g.  To make 
> debugging easier it would be really nice to be able to produce builds with 
> -O0.
> I haven't found an existing flag to pass to maven or cmake to accomplish 
> this. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9557) Reduce object allocation in PB conversion

2015-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1505#comment-1505
 ] 

Hadoop QA commented on HDFS-9557:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 36s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 23s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 15s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 28s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 8m 19s {color} 
| {color:red} hadoop-hdfs-project-jdk1.8.0_66 with JDK v1.8.0_66 generated 3 
new issues (was 48, now 48). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 34s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 9m 53s {color} 
| {color:red} hadoop-hdfs-project-jdk1.7.0_91 with JDK v1.7.0_91 generated 3 
new issues (was 50, now 50). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 34s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 21s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-hdfs-project (total was 91, now 91). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 3s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs-client introduced 1 new 
FindBugs issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 11s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 51s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 34s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 58s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 48m 37s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 19s 
{color} | {color:red} Patch genera

[jira] [Commented] (HDFS-9371) Code cleanup for DatanodeManager

2015-12-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058910#comment-15058910
 ] 

Hudson commented on HDFS-9371:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #695 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/695/])
HDFS-9371. Code cleanup for DatanodeManager. Contributed by Jing Zhao. (jing9: 
rev 8602692338d6f493647205e0241e4116211fab75)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java


> Code cleanup for DatanodeManager
> 
>
> Key: HDFS-9371
> URL: https://issues.apache.org/jira/browse/HDFS-9371
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.9.0
>
> Attachments: HDFS-9371.000.patch, HDFS-9371.001.patch, 
> HDFS-9371.002.patch, HDFS-9371.003.patch, HDFS-9371.004.patch
>
>
> Some code cleanup for DatanodeManager. The main changes include:
> # make the synchronization of {{datanodeMap}} and 
> {{datanodesSoftwareVersions}} consistent
> # remove unnecessary lock in {{handleHeartbeat}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9523) libhdfs++: failure to connect to ipv6 host causes CI unit tests to fail

2015-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058937#comment-15058937
 ] 

Hadoop QA commented on HDFS-9523:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
9s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 23s 
{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 21s 
{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 16s 
{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 14s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 42s 
{color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 44s 
{color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
27s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 38m 21s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12777830/HDFS-9523.HDFS-8707.001.patch
 |
| JIRA Issue | HDFS-9523 |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux a02a36e79182 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | HDFS-8707 / 01087b0 |
| JDK v1.7.0_91  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13883/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: 
hadoop-hdfs-project/hadoop-hdfs-native-client |
| Max memory used | 76MB |
| Powered by | Apache Yetus 0.1.0   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13883/console |


This message was automatically generated.



> libhdfs++: failure to connect to ipv6 host causes CI unit tests to fail
> ---
>
> Key: HDFS-9523
> URL: https://issues.apache.org/jira/browse/HDFS-9523
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>

[jira] [Commented] (HDFS-6054) MiniQJMHACluster should not use static port to avoid binding failure in unit test

2015-12-15 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058957#comment-15058957
 ] 

Zhe Zhang commented on HDFS-6054:
-

Thanks Yongjun. +1 pending a fresh Jenkins run.

> MiniQJMHACluster should not use static port to avoid binding failure in unit 
> test
> -
>
> Key: HDFS-6054
> URL: https://issues.apache.org/jira/browse/HDFS-6054
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Reporter: Brandon Li
>Assignee: Yongjun Zhang
>  Labels: BB2015-05-TBR
> Attachments: HDFS-6054.001.patch, HDFS-6054.002.patch, 
> HDFS-6054.003.patch
>
>
> One example of the test failues: TestFailureToReadEdits
> {noformat}
> Error Message
> Port in use: localhost:10003
> Stacktrace
> java.net.BindException: Port in use: localhost:10003
>   at sun.nio.ch.Net.bind(Native Method)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>   at 
> org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
>   at 
> org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:845)
>   at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:786)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:132)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:593)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:492)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:650)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:635)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1283)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:966)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:851)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:697)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:374)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:355)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits.setUpCluster(TestFailureToReadEdits.java:108)
> {noformat}
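
One standard remedy (a sketch of the approach, not necessarily the exact 
patch) is to bind to port 0 so the OS assigns a free ephemeral port:
{code}
import java.io.IOException;
import java.net.ServerSocket;

public class FreePort {
  // Ask the OS for an unused ephemeral port. Note a small race remains
  // between closing the probe socket and re-binding the port.
  static int getFreePort() throws IOException {
    try (ServerSocket s = new ServerSocket(0)) {
      return s.getLocalPort();
    }
  }
}
{code}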



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9448) Enable valgrind for libhdfspp unit tests

2015-12-15 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058989#comment-15058989
 ] 

Allen Wittenauer commented on HDFS-9448:


OK, based upon the tests I've done, +1.  The -Drequire.valgrind flag works as 
expected when valgrind isn't present, and the Docker file properly builds a 
working environment.


> Enable valgrind for libhdfspp unit tests
> 
>
> Key: HDFS-9448
> URL: https://issues.apache.org/jira/browse/HDFS-9448
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9448.HDFS-8707.000.patch, 
> HDFS-9448.HDFS-8707.001.patch, HDFS-9448.HDFS-8707.002.patch, 
> HDFS-9448.HDFS-8707.003.patch, HDFS-9448.HDFS-8707.004.patch, 
> HDFS-9448.HDFS-8707.005.patch
>
>
> We should have a target that runs the unit tests under valgrind if it is 
> available on the target machine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9173) Erasure Coding: Lease recovery for striped file

2015-12-15 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058995#comment-15058995
 ] 

Zhe Zhang commented on HDFS-9173:
-

bq. Instead, we may also need to update the current patch to handle the 
possible failures from callInitReplicaRecovery.
I agree. That can be done separately though. The logic is not straightforward; 
it will be something similar to {{errorCount}} in contiguous recovery tasks.

For {{newLocs}} and {{newStorages}}: I think the basic logic is to populate 
them when {{callInitReplicaRecovery}} succeeds for an internal block. Their 
assignments should immediately follow {{callInitReplicaRecovery}}. Later, when 
we move {{callInitReplicaRecovery}} into a try statement, the assignments can 
stay there too. Not a major issue, just to avoid an n^2 search; we can change 
this after error handling is added.
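
A hypothetical sketch of that shape (the surrounding names are illustrative, 
not the actual patch):
{code}
for (int i = 0; i < locs.length; i++) {
  try {
    ReplicaRecoveryInfo info = callInitReplicaRecovery(locs[i], rBlock);
    if (info != null) {
      // Assign at the same index right away so no second O(n^2)
      // matching pass is needed once error handling is added.
      newLocs[i] = locs[i];
      newStorages[i] = storages[i];
    }
  } catch (IOException e) {
    // failure handling, analogous to errorCount in contiguous recovery
  }
}
{code}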

The patch needs another rebase.

> Erasure Coding: Lease recovery for striped file
> ---
>
> Key: HDFS-9173
> URL: https://issues.apache.org/jira/browse/HDFS-9173
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Walter Su
>Assignee: Walter Su
> Attachments: HDFS-9173.00.wip.patch, HDFS-9173.01.patch, 
> HDFS-9173.02.step125.patch, HDFS-9173.03.patch, HDFS-9173.04.patch, 
> HDFS-9173.05.patch, HDFS-9173.06.patch, HDFS-9173.07.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8020) Erasure Coding: restore BlockGroup and schema info from stripping coding command

2015-12-15 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059008#comment-15059008
 ] 

Zhe Zhang commented on HDFS-8020:
-

I think we can close this JIRA? {{ErasureCodingWorker}} can already get all 
recovery-related info.

> Erasure Coding: restore BlockGroup and schema info from stripping coding 
> command
> 
>
> Key: HDFS-8020
> URL: https://issues.apache.org/jira/browse/HDFS-8020
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Sasaki
>
> As a task of HDFS-7348, to process *striping* coding commands from NameNode 
> or other scheduler services/tools, we need to first be able to restore 
> BlockGroup and schema information in DataNode, which will be used to 
> construct and perform coding work using {{ErasureCoder}} API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9227) Check block checksums for integrity

2015-12-15 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-9227:
--
Attachment: HDFS-9227.HDFS-8707.000.patch

Initial checksum patch; I still need to figure out how to do a CI test.

The CRC implementations are copies of what the Java client uses, with a few 
small modifications.  I needed to get rid of the hardware accelerated 
implementations in order to do static linkage; those rely on a loader hook to 
patch a function pointer to the correct hardware implementation depending on 
the system architecture.

> Check block checksums for integrity
> ---
>
> Key: HDFS-9227
> URL: https://issues.apache.org/jira/browse/HDFS-9227
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: James Clampffer
> Attachments: HDFS-9227.HDFS-8707.000.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9227) Check block checksums for integrity

2015-12-15 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-9227:
--
Status: Patch Available  (was: Open)

> Check block checksums for integrity
> ---
>
> Key: HDFS-9227
> URL: https://issues.apache.org/jira/browse/HDFS-9227
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: James Clampffer
> Attachments: HDFS-9227.HDFS-8707.000.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7691) Handle hflush and hsync in the best optimal way possible during online Erasure encoding

2015-12-15 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-7691.
-
Resolution: Duplicate

Thanks Vinay for confirming. Let's track the hflush related efforts under 
HDFS-7661.

> Handle hflush and hsync in the best optimal way possible during online 
> Erasure encoding
> ---
>
> Key: HDFS-7691
> URL: https://issues.apache.org/jira/browse/HDFS-7691
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>
> As mentioned in the design doc, hsync and hflush tend to make online erasure 
> encoding complex.
> But these are critical features to ensure fault tolerance for some users.
> These operations should be supported in the best way possible during online 
> erasure encoding to preserve that fault tolerance.
> This Jira is a placeholder for the task. How to solve this will be discussed 
> later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7661) Erasure coding: support hflush and hsync

2015-12-15 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-7661:

Summary: Erasure coding: support hflush and hsync  (was: Support read when 
a EC file is being written)

> Erasure coding: support hflush and hsync
> 
>
> Key: HDFS-7661
> URL: https://issues.apache.org/jira/browse/HDFS-7661
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Tsz Wo Nicholas Sze
>Assignee: GAO Rui
> Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png, 
> HDFS-7661-unitTest-wip-trunk.patch, 
> HDFS-EC-file-flush-sync-design-version1.1.pdf
>
>
> We also need to support hflush/hsync and visible length. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8894) Set SO_KEEPALIVE on DN server sockets

2015-12-15 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-8894:
--
   Resolution: Fixed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Thanks for the contribution [~kanaka]! Committed to trunk, branch-2, branch-2.8 
for 2.8.0 release.

> Set SO_KEEPALIVE on DN server sockets
> -
>
> Key: HDFS-8894
> URL: https://issues.apache.org/jira/browse/HDFS-8894
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Kanaka Kumar Avvaru
> Fix For: 2.8.0
>
> Attachments: HDFS-8894-01.patch, HDFS-8894-01.patch, 
> HDFS-8894-02.patch, HDFS-8894-03.patch, HDFS-8894-04.patch
>
>
> SO_KEEPALIVE is not set on things like datastreamer sockets which can cause 
> lingering ESTABLISHED sockets when there is a network glitch.
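
For context, the fix boils down to enabling the option on the relevant 
sockets, roughly like this (illustrative only, not the exact patch):
{code}
import java.io.IOException;
import java.net.Socket;

public class KeepAlive {
  public static void main(String[] args) throws IOException {
    try (Socket sock = new Socket()) {
      // TCP keep-alive probes let the kernel detect and tear down
      // half-open connections left behind by a network glitch.
      sock.setKeepAlive(true);
    }
  }
}
{code}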



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9448) Enable valgrind for libhdfspp unit tests

2015-12-15 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059026#comment-15059026
 ] 

Bob Hansen commented on HDFS-9448:
--

[~aw] - thanks for helping me to find the right way to put it together for 
Hadoop, and for taking the time for multiple reviews and tests.  I appreciate 
it.

> Enable valgrind for libhdfspp unit tests
> 
>
> Key: HDFS-9448
> URL: https://issues.apache.org/jira/browse/HDFS-9448
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9448.HDFS-8707.000.patch, 
> HDFS-9448.HDFS-8707.001.patch, HDFS-9448.HDFS-8707.002.patch, 
> HDFS-9448.HDFS-8707.003.patch, HDFS-9448.HDFS-8707.004.patch, 
> HDFS-9448.HDFS-8707.005.patch
>
>
> We should have a target that runs the unit tests under valgrind if it is 
> available on the target machine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9557) Reduce object allocation in PB conversion

2015-12-15 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059024#comment-15059024
 ] 

Daryn Sharp commented on HDFS-9557:
---

Ug, when one last search-n-replace goes horribly wrong...  Yes.  Will repost.

> Reduce object allocation in PB conversion
> -
>
> Key: HDFS-9557
> URL: https://issues.apache.org/jira/browse/HDFS-9557
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-9557.patch
>
>
> PB conversions use {{ByteString.copyFrom}} to populate the builder.  
> Unfortunately this creates unique instances for empty arrays instead of 
> returning the singleton {{ByteString.EMPTY}}.
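
Illustratively, the guard being described amounts to something like this (a 
sketch, not the actual patch):
{code}
import com.google.protobuf.ByteString;

public class PBConversionSketch {
  // Reuse the EMPTY singleton instead of allocating a fresh ByteString
  // for every empty byte[] during PB conversion.
  static ByteString getByteString(byte[] bytes) {
    return (bytes == null || bytes.length == 0)
        ? ByteString.EMPTY : ByteString.copyFrom(bytes);
  }
}
{code}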



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9557) Reduce object allocation in PB conversion

2015-12-15 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059028#comment-15059028
 ] 

Daryn Sharp commented on HDFS-9557:
---

You'll soon be delighted to know I'll have a patch up this week that uses PB 
decoding in a far more efficient manner.

> Reduce object allocation in PB conversion
> -
>
> Key: HDFS-9557
> URL: https://issues.apache.org/jira/browse/HDFS-9557
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-9557.patch
>
>
> PB conversions use {{ByteString.copyFrom}} to populate the builder.  
> Unfortunately this creates unique instances for empty arrays instead of 
> returning the singleton {{ByteString.EMPTY}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9557) Reduce object allocation in PB conversion

2015-12-15 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-9557:
--
Attachment: HDFS-9557.patch

> Reduce object allocation in PB conversion
> -
>
> Key: HDFS-9557
> URL: https://issues.apache.org/jira/browse/HDFS-9557
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-9557.patch, HDFS-9557.patch
>
>
> PB conversions use {{ByteString.copyFrom}} to populate the builder.  
> Unfortunately this creates unique instances for empty arrays instead of 
> returning the singleton {{ByteString.EMPTY}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8562) HDFS Performance is impacted by FileInputStream Finalizer

2015-12-15 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059042#comment-15059042
 ] 

Kai Zheng commented on HDFS-8562:
-

Yeah, we may need some tweaking here anyway. Can we obtain the path in addition 
to the fd, e.g. by passing it from the DataNode, if we do need it? It may be OK 
if we don't have it, because I guess the path parameter of the method can be set 
to null. Ref. the following code from the source in OpenJDK 8:
{code}
// The path of the referenced file
// (null if the parent stream is created with a file descriptor)
private final String path;
{code}
Calling the method through reflection is a good idea; the logic could be:
1) Attempt to call FileChannelImpl.open(fd, true, true, null); // JDK 7
2) Attempt to call FileChannelImpl.open(fd, null, true, true, null); // JDK 
8, path can be null
3) Fall back to FileInputStream
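
Put together, a sketch of that logic (it relies on the JDK-internal 
sun.nio.ch.FileChannelImpl, so the signatures above may vary across JDK 
builds; illustrative only):
{code}
import java.io.FileDescriptor;
import java.io.FileInputStream;
import java.lang.reflect.Method;
import java.nio.channels.FileChannel;

public class ChannelOpener {
  static FileChannel openChannel(FileDescriptor fd, FileInputStream fallback) {
    try {
      Class<?> impl = Class.forName("sun.nio.ch.FileChannelImpl");
      try {
        // 1) JDK 7: open(FileDescriptor, boolean, boolean, Object)
        Method m = impl.getMethod("open", FileDescriptor.class,
            boolean.class, boolean.class, Object.class);
        return (FileChannel) m.invoke(null, fd, true, true, null);
      } catch (NoSuchMethodException e) {
        // 2) JDK 8: open(FileDescriptor, String, boolean, boolean, Object),
        //    with a null path
        Method m = impl.getMethod("open", FileDescriptor.class, String.class,
            boolean.class, boolean.class, Object.class);
        return (FileChannel) m.invoke(null, fd, null, true, true, null);
      }
    } catch (ReflectiveOperationException e) {
      // 3) Fall back to FileInputStream's channel (finalizer and all)
      return fallback.getChannel();
    }
  }
}
{code}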



> HDFS Performance is impacted by FileInputStream Finalizer
> -
>
> Key: HDFS-8562
> URL: https://issues.apache.org/jira/browse/HDFS-8562
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.5.0
> Environment: Impact any application that uses HDFS
>Reporter: Yanping Wang
> Attachments: HDFS-8562.002b.patch, HDFS-8562.01.patch
>
>
> While running HBase using HDFS for the datanodes, we noticed excessively high 
> GC pause spikes. For example, with jdk8 update 40 and the G1 collector, we saw 
> datanode GC pauses spike toward 160 milliseconds while they should be around 
> 20 milliseconds. 
> We tracked it down in the GC logs and found those long GC pauses were 
> devoted to processing a high number of final references. 
> For example, this Young GC:
> 2715.501: [GC pause (G1 Evacuation Pause) (young) 0.1529017 secs]
> 2715.572: [SoftReference, 0 refs, 0.0001034 secs]
> 2715.572: [WeakReference, 0 refs, 0.123 secs]
> 2715.572: [FinalReference, 8292 refs, 0.0748194 secs]
> 2715.647: [PhantomReference, 0 refs, 160 refs, 0.0001333 secs]
> 2715.647: [JNI Weak Reference, 0.140 secs]
> [Ref Proc: 122.3 ms]
> [Eden: 910.0M(910.0M)->0.0B(911.0M) Survivors: 11.0M->10.0M Heap: 
> 951.1M(1536.0M)->40.2M(1536.0M)]
> [Times: user=0.47 sys=0.01, real=0.15 secs]
> This young GC took a 152.9 millisecond STW pause, of which 122.3 
> milliseconds were spent in Ref Proc, which processed 8292 FinalReferences in 
> 74.8 milliseconds plus some overhead.
> We used JFR and JMAP with Memory Analyzer to track this down and found those 
> FinalReferences were all from FileInputStream.  We checked the HDFS code and 
> saw the use of FileInputStream in the datanode:
> https://apache.googlesource.com/hadoop-common/+/refs/heads/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java
> {code}
> public static MappableBlock load(long length,
>     FileInputStream blockIn, FileInputStream metaIn,
>     String blockFileName) throws IOException {
>   MappableBlock mappableBlock = null;
>   MappedByteBuffer mmap = null;
>   FileChannel blockChannel = null;
>   try {
>     blockChannel = blockIn.getChannel();
>     if (blockChannel == null) {
>       throw new IOException("Block InputStream has no FileChannel.");
>     }
>     mmap = blockChannel.map(MapMode.READ_ONLY, 0, length);
>     NativeIO.POSIX.getCacheManipulator().mlock(blockFileName, mmap, length);
>     verifyChecksum(length, metaIn, blockChannel, blockFileName);
>     mappableBlock = new MappableBlock(mmap, length);
>   } finally {
>     IOUtils.closeQuietly(blockChannel);
>     if (mappableBlock == null) {
>       if (mmap != null) {
>         NativeIO.POSIX.munmap(mmap); // unmapping also unlocks
>       }
>     }
>   }
>   return mappableBlock;
> }
> {code}
> We looked up 
> https://docs.oracle.com/javase/7/docs/api/java/io/FileInputStream.html  and
> http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/23bdcede4e39/src/share/classes/java/io/FileInputStream.java
>  and noticed FileInputStream relies on the Finalizer to release its resource. 
> When an instance of a class that has a finalizer is created, an entry for 
> that instance is put on a queue in the JVM so the JVM knows it has a 
> finalizer that needs to be executed.   
> The current issue is: even when programmers do call close() after using 
> FileInputStream, its finalize() method will still be called. In other words, 
> we still get the side effect of the FinalReference being registered at 
> FileInputStream allocation time, and also reference processing to reclaim the 
> FinalReference during GC (any GC solution has to deal with this). 
> We can imagine that when running HDFS in industry deployments, millions of 
> files could be opened and closed, resulting in a very large number of finali

[jira] [Commented] (HDFS-9448) Enable valgrind for libhdfspp unit tests

2015-12-15 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059041#comment-15059041
 ] 

Allen Wittenauer commented on HDFS-9448:


No, thank you. :)  I'm actually really pleased to have valgrind tests for some 
of the native code and I'm hoping this spreads haha.

> Enable valgrind for libhdfspp unit tests
> 
>
> Key: HDFS-9448
> URL: https://issues.apache.org/jira/browse/HDFS-9448
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9448.HDFS-8707.000.patch, 
> HDFS-9448.HDFS-8707.001.patch, HDFS-9448.HDFS-8707.002.patch, 
> HDFS-9448.HDFS-8707.003.patch, HDFS-9448.HDFS-8707.004.patch, 
> HDFS-9448.HDFS-8707.005.patch
>
>
> We should have a target that runs the unit tests under valgrind if it is 
> available on the target machine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8894) Set SO_KEEPALIVE on DN server sockets

2015-12-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059064#comment-15059064
 ] 

Hudson commented on HDFS-8894:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #8971 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8971/])
HDFS-8894. Set SO_KEEPALIVE on DN server sockets. Contributed by Kanaka (wang: 
rev 49949a4bb03aa81cbb9115e91ab1c61cc6dc8a62)
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Set SO_KEEPALIVE on DN server sockets
> -
>
> Key: HDFS-8894
> URL: https://issues.apache.org/jira/browse/HDFS-8894
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Kanaka Kumar Avvaru
> Fix For: 2.8.0
>
> Attachments: HDFS-8894-01.patch, HDFS-8894-01.patch, 
> HDFS-8894-02.patch, HDFS-8894-03.patch, HDFS-8894-04.patch
>
>
> SO_KEEPALIVE is not set on things like datastreamer sockets which can cause 
> lingering ESTABLISHED sockets when there is a network glitch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9487) libhdfs++ Enable builds with no compiler optimizations

2015-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059075#comment-15059075
 ] 

Hadoop QA commented on HDFS-9487:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
42s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 7s 
{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 57s 
{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s 
{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
32s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 59s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 59s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 59s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 5s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 5s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 5s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 42s {color} 
| {color:red} hadoop-hdfs-native-client in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 17s {color} 
| {color:red} hadoop-hdfs-native-client in the patch failed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 
3s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 77m 19s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12777574/HDFS-9487.HDFS-8707.000.patch
 |
| JIRA Issue | HDFS-9487 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  xml  cc  |
| uname | Linux 1cb7c05c672d 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | HDFS-8707 / 01087b0 |
| unit | 
https://builds.apache.org/job/PreCo

[jira] [Commented] (HDFS-8562) HDFS Performance is impacted by FileInputStream Finalizer

2015-12-15 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059094#comment-15059094
 ] 

Kai Zheng commented on HDFS-8562:
-

Hi [~ywang261], per the above discussion, the blocking issue is that we need a 
public Java API to create a *FileChannel* from a *FileDescriptor* without a 
file path. Could you make a feature request to Oracle for this? I think it's 
doable because the path isn't essentially needed, and we do have a strong 
requirement here. If we're lucky, it may be done in a Java update and we won't 
have to wait for the next major version. Thanks!

> HDFS Performance is impacted by FileInputStream Finalizer
> -
>
> Key: HDFS-8562
> URL: https://issues.apache.org/jira/browse/HDFS-8562
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.5.0
> Environment: Impact any application that uses HDFS
>Reporter: Yanping Wang
> Attachments: HDFS-8562.002b.patch, HDFS-8562.01.patch
>
>
> While running HBase on HDFS, we noticed excessively high GC pause spikes on 
> the datanodes. For example, with jdk8 update 40 and the G1 collector, we saw 
> datanode GC pauses spike toward 160 milliseconds when they should be around 
> 20 milliseconds. 
> We dug into the GC logs and found those long GC pauses were devoted to 
> processing a high number of final references. 
> For example, this Young GC:
> 2715.501: [GC pause (G1 Evacuation Pause) (young) 0.1529017 secs]
> 2715.572: [SoftReference, 0 refs, 0.0001034 secs]
> 2715.572: [WeakReference, 0 refs, 0.0000123 secs]
> 2715.572: [FinalReference, 8292 refs, 0.0748194 secs]
> 2715.647: [PhantomReference, 0 refs, 160 refs, 0.0001333 secs]
> 2715.647: [JNI Weak Reference, 0.0000140 secs]
> [Ref Proc: 122.3 ms]
> [Eden: 910.0M(910.0M)->0.0B(911.0M) Survivors: 11.0M->10.0M Heap: 
> 951.1M(1536.0M)->40.2M(1536.0M)]
> [Times: user=0.47 sys=0.01, real=0.15 secs]
> This young GC took a 152.9 millisecond STW pause, of which 122.3 
> milliseconds were spent in Ref Proc, which in turn processed 8292 
> FinalReferences in 74.8 milliseconds plus some overhead.
> We used JFR and JMAP with Memory Analyzer to track this down and found those 
> FinalReferences were all from FileInputStream.  We checked the HDFS code and 
> saw the use of FileInputStream in the datanode:
> https://apache.googlesource.com/hadoop-common/+/refs/heads/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java
> {code}
> public static MappableBlock load(long length,
>     FileInputStream blockIn, FileInputStream metaIn,
>     String blockFileName) throws IOException {
>   MappableBlock mappableBlock = null;
>   MappedByteBuffer mmap = null;
>   FileChannel blockChannel = null;
>   try {
>     blockChannel = blockIn.getChannel();
>     if (blockChannel == null) {
>       throw new IOException("Block InputStream has no FileChannel.");
>     }
>     mmap = blockChannel.map(MapMode.READ_ONLY, 0, length);
>     NativeIO.POSIX.getCacheManipulator().mlock(blockFileName, mmap, length);
>     verifyChecksum(length, metaIn, blockChannel, blockFileName);
>     mappableBlock = new MappableBlock(mmap, length);
>   } finally {
>     IOUtils.closeQuietly(blockChannel);
>     if (mappableBlock == null) {
>       if (mmap != null) {
>         NativeIO.POSIX.munmap(mmap); // unmapping also unlocks
>       }
>     }
>   }
>   return mappableBlock;
> }
> {code}
> We looked up 
> https://docs.oracle.com/javase/7/docs/api/java/io/FileInputStream.html  and
> http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/23bdcede4e39/src/share/classes/java/io/FileInputStream.java
>  and noticed FileInputStream relies on a finalizer to release its resource. 
> When an instance of a class that has a finalizer is created, an entry for 
> that instance is put on a queue in the JVM so the JVM knows it has a 
> finalizer that needs to be executed.   
> The current issue is: even when programmers do call close() after using a 
> FileInputStream, its finalize() method will still be called. In other words, 
> we still get the side effect of the FinalReference being registered at 
> FileInputStream allocation time, plus the reference processing needed to 
> reclaim the FinalReference during GC (any GC implementation has to deal with 
> this). 
> We can imagine that when running HDFS in industry deployments, millions of 
> files could be opened and closed, resulting in a very large number of 
> finalizers being registered and subsequently executed.  That could cause 
> very long GC pause times.
> We tried to use Files.newInputStream() to replace FileInputStream, but it was 
> clear we could not replace FileInputStream in 
> hdfs/server/datanode/fsdataset/impl/MappableBlock.java 
> We notified

[jira] [Commented] (HDFS-9448) Enable valgrind for libhdfspp unit tests

2015-12-15 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059109#comment-15059109
 ] 

James Clampffer commented on HDFS-9448:
---

Thanks for checking it out [~aw].

I've committed this to HDFS-8707.  Thanks for the patch [~bobthansen]!

> Enable valgrind for libhdfspp unit tests
> 
>
> Key: HDFS-9448
> URL: https://issues.apache.org/jira/browse/HDFS-9448
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9448.HDFS-8707.000.patch, 
> HDFS-9448.HDFS-8707.001.patch, HDFS-9448.HDFS-8707.002.patch, 
> HDFS-9448.HDFS-8707.003.patch, HDFS-9448.HDFS-8707.004.patch, 
> HDFS-9448.HDFS-8707.005.patch
>
>
> We should have a target that runs the unit tests under valgrind if it is 
> available on the target machine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9448) Enable valgrind for libhdfspp unit tests

2015-12-15 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-9448:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Enable valgrind for libhdfspp unit tests
> 
>
> Key: HDFS-9448
> URL: https://issues.apache.org/jira/browse/HDFS-9448
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9448.HDFS-8707.000.patch, 
> HDFS-9448.HDFS-8707.001.patch, HDFS-9448.HDFS-8707.002.patch, 
> HDFS-9448.HDFS-8707.003.patch, HDFS-9448.HDFS-8707.004.patch, 
> HDFS-9448.HDFS-8707.005.patch
>
>
> We should have a target that runs the unit tests under valgrind if it is 
> available on the target machine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9198) Coalesce IBR processing in the NN

2015-12-15 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059188#comment-15059188
 ] 

Uma Maheswara Rao G commented on HDFS-9198:
---

[~daryn], the latest patch looks good to me. Could you please rebase the patch 
on the latest trunk code? It seems to be failing to apply cleanly. Sorry for 
the delayed review on this. Rebasing will also give Jenkins a chance to run on 
the latest code base. Thanks

> Coalesce IBR processing in the NN
> -
>
> Key: HDFS-9198
> URL: https://issues.apache.org/jira/browse/HDFS-9198
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-9198-branch2.patch, HDFS-9198-trunk.patch, 
> HDFS-9198-trunk.patch, HDFS-9198-trunk.patch, HDFS-9198-trunk.patch
>
>
> IBRs from thousands of DNs under load will degrade NN performance due to 
> excessive write-lock contention from multiple IPC handler threads.  The IBR 
> processing is quick, so the lock contention may be reduced by coalescing 
> multiple IBRs into a single write-lock transaction.  The handlers will also 
> be freed up faster for other operations.
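
For illustration, a hedged sketch of the coalescing idea described above 
follows; the class and method names are assumptions made for this sketch, not 
the actual HDFS-9198 patch (which queues reports for batch processing). The 
point is that each write-lock acquisition drains everything queued so far, so 
under load one acquisition applies many reports and later acquisitions find an 
empty queue.

{code}
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class IbrCoalescer {
  private final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock();
  private final Queue<Runnable> pending = new ArrayDeque<>();

  public void processIbr(Runnable ibr) {
    synchronized (pending) {
      pending.add(ibr);
    }
    nsLock.writeLock().lock();
    try {
      // Drain the whole batch: IBRs queued by other handlers while we
      // waited for the lock are applied under this single acquisition,
      // so those handlers' own acquisitions become cheap no-ops.
      Runnable next;
      while ((next = poll()) != null) {
        next.run();
      }
    } finally {
      nsLock.writeLock().unlock();
    }
  }

  private Runnable poll() {
    synchronized (pending) {
      return pending.poll();
    }
  }
}
{code}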



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9227) Check block checksums for integrity

2015-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059208#comment-15059208
 ] 

Hadoop QA commented on HDFS-9227:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
40s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 28s 
{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 31s 
{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s 
{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
34s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 33s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red} 47m 3s {color} | 
{color:red} hadoop-hdfs-project_hadoop-hdfs-native-client-jdk1.8.0_66 with JDK 
v1.8.0_66 generated 2 new issues (was 3, now 5). {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red} 54m 31s {color} | 
{color:red} hadoop-hdfs-project_hadoop-hdfs-native-client-jdk1.7.0_91 with JDK 
v1.7.0_91 generated 2 new issues (was 3, now 5). {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1026 line(s) that end in whitespace. Use 
git apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 41s 
{color} | {color:red} The patch has 4 line(s) with tabs. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 23s {color} 
| {color:red} hadoop-hdfs-native-client in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 56s {color} 
| {color:red} hadoop-hdfs-native-client in the patch failed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
55s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 79m 28s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12777846/HDFS-9227.HDFS-8707.000.patch
 |
| JIRA Issue | HDFS-9227 |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux fb1db9d536ae 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | HDFS-8707 / 01087b0 |
| cc | hadoop-hdfs-project_hadoop-hdfs-native-client-jdk1.8.0_66: 
https://builds.apache.org/job/PreCommit-HDFS-Build/13887/artifact/patchprocess/diff-compile-cc-hadoop-hdfs-project_hadoop-hd

[jira] [Commented] (HDFS-9173) Erasure Coding: Lease recovery for striped file

2015-12-15 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059237#comment-15059237
 ] 

Jing Zhao commented on HDFS-9173:
-

I spent some more time on the current patch as well as the 01 patch, which 
includes step #3. Some thoughts:
# For step 3, the 01 patch directly appends data to internal blocks whose 
lengths are less than the safe length (a sketch of one way to compute that 
safe length follows below). This should work, though we also need to handle 
cases like corrupted data. As mentioned by Walter, directly using the 
writeBlock protocol requires the replica to be in RBW state.
# In the meantime, we could also invent a new protocol for this kind of 
recovery, one that should be able to handle RUR. A simpler approach could even 
be rewriting these internal blocks based on the safe length.
# Since we're actually not doing step 3 in this jira, I'm thinking that maybe 
we can make only one {{callInitReplicaRecovery}} call and put all the internal 
blocks into RUR state. That way we can simplify the current implementation, 
especially the failure handling logic.
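
As referenced above, here is a minimal sketch of computing the safe length of 
a striped block group, assuming a Reed-Solomon style layout with a fixed cell 
size; the method and names are illustrative, not taken from the patch.

{code}
import java.util.Arrays;

public class SafeLengthSketch {
  /**
   * The longest prefix of a striped block group that is guaranteed
   * recoverable, i.e. covered by full cells on at least dataBlkNum of the
   * internal blocks. blockLens holds the on-disk lengths of all internal
   * blocks (data + parity).
   */
  static long getSafeLength(long[] blockLens, int dataBlkNum, int cellSize) {
    long[] sorted = Arrays.copyOf(blockLens, blockLens.length);
    Arrays.sort(sorted);
    // At least dataBlkNum internal blocks are at least this long, so this
    // many full stripes can be reconstructed from any dataBlkNum of them.
    long nthLongest = sorted[sorted.length - dataBlkNum];
    long fullStripes = nthLongest / cellSize;
    return fullStripes * cellSize * (long) dataBlkNum;
  }

  public static void main(String[] args) {
    // RS(6,3) with 64KB cells; two internal blocks lag behind the others.
    long[] lens = {196608, 196608, 196608, 196608, 131072, 65536,
                   196608, 196608, 196608};
    // Prints 1179648: three full stripes of 6 x 64KB data cells.
    System.out.println(getSafeLength(lens, 6, 65536));
  }
}
{code}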

> Erasure Coding: Lease recovery for striped file
> ---
>
> Key: HDFS-9173
> URL: https://issues.apache.org/jira/browse/HDFS-9173
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Walter Su
>Assignee: Walter Su
> Attachments: HDFS-9173.00.wip.patch, HDFS-9173.01.patch, 
> HDFS-9173.02.step125.patch, HDFS-9173.03.patch, HDFS-9173.04.patch, 
> HDFS-9173.05.patch, HDFS-9173.06.patch, HDFS-9173.07.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8020) Erasure Coding: restore BlockGroup and schema info from stripping coding command

2015-12-15 Thread Kai Sasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059242#comment-15059242
 ] 

Kai Sasaki commented on HDFS-8020:
--

[~zhz] I agree with you. I also found that the erasure coding command was 
already implemented, so we can close this.

> Erasure Coding: restore BlockGroup and schema info from stripping coding 
> command
> 
>
> Key: HDFS-8020
> URL: https://issues.apache.org/jira/browse/HDFS-8020
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Sasaki
>
> As a task of HDFS-7348, to process *striping* coding commands from the 
> NameNode or other scheduler services/tools, we first need to be able to 
> restore BlockGroup and schema information in the DataNode, which will be 
> used to construct and perform coding work using the {{ErasureCoder}} API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8020) Erasure Coding: restore BlockGroup and schema info from stripping coding command

2015-12-15 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059308#comment-15059308
 ] 

Kai Zheng commented on HDFS-8020:
-

Hi Zhe and Kai, we may still need this. Please see my comment to Uma above. 
Thanks.

> Erasure Coding: restore BlockGroup and schema info from stripping coding 
> command
> 
>
> Key: HDFS-8020
> URL: https://issues.apache.org/jira/browse/HDFS-8020
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Sasaki
>
> As a task of HDFS-7348, to process *striping* coding commands from the 
> NameNode or other scheduler services/tools, we first need to be able to 
> restore BlockGroup and schema information in the DataNode, which will be 
> used to construct and perform coding work using the {{ErasureCoder}} API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8030) HDFS Erasure Coding Phase II -- EC with contiguous layout

2015-12-15 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059317#comment-15059317
 ] 

Kai Zheng commented on HDFS-8030:
-

Thanks [~zhz] for initiating the phase II work. I guess we'll need a new 
branch to start the work?

> HDFS Erasure Coding Phase II -- EC with contiguous layout
> -
>
> Key: HDFS-8030
> URL: https://issues.apache.org/jira/browse/HDFS-8030
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: erasure-coding
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFSErasureCodingPhaseII-20151204.pdf
>
>
> The data redundancy form (replication or erasure coding) should be 
> orthogonal to the block layout (contiguous or striped). This JIRA explores 
> the combination of {{Erasure Coding}} + {{Contiguous}} block layout.
> As will be detailed in the design document, key benefits include preserving 
> block locality and easy conversion between hot and cold modes. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8020) Erasure Coding: restore BlockGroup and schema info from stripping coding command

2015-12-15 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059330#comment-15059330
 ] 

Uma Maheswara Rao G commented on HDFS-8020:
---

I think it's better to refine the title to represent the exact task we are 
planning to cover in this JIRA. It will avoid further confusion.

> Erasure Coding: restore BlockGroup and schema info from stripping coding 
> command
> 
>
> Key: HDFS-8020
> URL: https://issues.apache.org/jira/browse/HDFS-8020
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Sasaki
>
> As a task of HDFS-7348, to process *striping* coding commands from the 
> NameNode or other scheduler services/tools, we first need to be able to 
> restore BlockGroup and schema information in the DataNode, which will be 
> used to construct and perform coding work using the {{ErasureCoder}} API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8020) Erasure Coding: restore BlockGroup and schema info from stripping coding command

2015-12-15 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059337#comment-15059337
 ] 

Kai Zheng commented on HDFS-8020:
-

Yes, this issue needs to be updated, since much of its work is already covered 
by *ErasureCodingWorker*, as Zhe mentioned. Let me refine it. Thanks Uma for 
the suggestion!

> Erasure Coding: restore BlockGroup and schema info from stripping coding 
> command
> 
>
> Key: HDFS-8020
> URL: https://issues.apache.org/jira/browse/HDFS-8020
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Sasaki
>
> As a task of HDFS-7348, to process *striping* coding commands from the 
> NameNode or other scheduler services/tools, we first need to be able to 
> restore BlockGroup and schema information in the DataNode, which will be 
> used to construct and perform coding work using the {{ErasureCoder}} API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8020) Erasure Coding: constructing and mapping BlockGroup info to utilize and call ErasureCoder API

2015-12-15 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HDFS-8020:

Summary: Erasure Coding: constructing and mapping BlockGroup info to 
utilize and call ErasureCoder API  (was: Erasure Coding: restore BlockGroup and 
schema info from stripping coding command)

> Erasure Coding: constructing and mapping BlockGroup info to utilize and call 
> ErasureCoder API
> -
>
> Key: HDFS-8020
> URL: https://issues.apache.org/jira/browse/HDFS-8020
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Sasaki
>
> As a task of HDFS-7348, to process *striping* coding commands from the 
> NameNode or other scheduler services/tools, we first need to be able to 
> restore BlockGroup and schema information in the DataNode, which will be 
> used to construct and perform coding work using the {{ErasureCoder}} API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8020) Erasure Coding: constructing and mapping BlockGroup info to utilize and call ErasureCoder API

2015-12-15 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HDFS-8020:

Description: Currently *ErasureCodingWorker* gets recovery-related info and 
performs the erasure decoding task using the raw erasure coder API. In this 
follow-on issue, the recovery-related info can be further mapped to construct 
a BlockGroup object to utilize and call the higher-level {{ErasureCoder}} API.  
(was: As a task of HDFS-7348, to process *striping* coding commands from the 
NameNode or other scheduler services/tools, we first need to be able to 
restore BlockGroup and schema information in the DataNode, which will be used 
to construct and perform coding work using the {{ErasureCoder}} API.)

> Erasure Coding: constructing and mapping BlockGroup info to utilize and call 
> ErasureCoder API
> -
>
> Key: HDFS-8020
> URL: https://issues.apache.org/jira/browse/HDFS-8020
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Sasaki
>
> Currently *ErasureCodingWorker* gets recovery-related info and performs the 
> erasure decoding task using the raw erasure coder API. In this follow-on 
> issue, the recovery-related info can be further mapped to construct a 
> BlockGroup object to utilize and call the higher-level {{ErasureCoder}} API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8020) Erasure Coding: constructing and mapping BlockGroup info to utilize and call ErasureCoder API

2015-12-15 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059352#comment-15059352
 ] 

Kai Zheng commented on HDFS-8020:
-

I just made the refinement. Please let me know if it's still not clear. 
Thanks.

> Erasure Coding: constructing and mapping BlockGroup info to utilize and call 
> ErasureCoder API
> -
>
> Key: HDFS-8020
> URL: https://issues.apache.org/jira/browse/HDFS-8020
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Sasaki
>
> Currently *ErasureCodingWorker* gets recovery-related info and performs the 
> erasure decoding task using the raw erasure coder API. In this follow-on 
> issue, the recovery-related info can be further mapped to construct a 
> BlockGroup object to utilize and call the higher-level {{ErasureCoder}} API.
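
To make the proposed mapping concrete, a schematic sketch follows. It is a 
rough shape only, under the assumption of a recovery command carrying block 
indices, lengths, and liveness; none of these class names or signatures come 
from the patch or from the actual Hadoop coder API.

{code}
// Schematic: map recovery info into a block-group view and delegate the
// decoding math to a higher-level coder, instead of driving the raw
// erasure coder directly from the worker. All names are illustrative.
public class BlockGroupMappingSketch {
  static class InternalBlock {
    final int indexInGroup;  // position within the stripe layout
    final long length;       // bytes present on disk
    final boolean missing;   // true if this unit needs reconstruction
    InternalBlock(int indexInGroup, long length, boolean missing) {
      this.indexInGroup = indexInGroup;
      this.length = length;
      this.missing = missing;
    }
  }

  static class BlockGroup {
    final InternalBlock[] dataUnits;
    final InternalBlock[] parityUnits;
    BlockGroup(InternalBlock[] dataUnits, InternalBlock[] parityUnits) {
      this.dataUnits = dataUnits;
      this.parityUnits = parityUnits;
    }
  }

  /** Stand-in for the higher-level ErasureCoder-style API. */
  interface GroupCoder {
    void decode(BlockGroup group);
  }

  // The worker builds the group from the recovery command's fields and
  // hands the whole group to the coder in one call.
  static void recover(GroupCoder coder,
                      InternalBlock[] data, InternalBlock[] parity) {
    coder.decode(new BlockGroup(data, parity));
  }
}
{code}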



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9011) Support splitting BlockReport of a storage into multiple RPC

2015-12-15 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059359#comment-15059359
 ] 

Ajith S commented on HDFS-9011:
---

This would cause https://issues.apache.org/jira/browse/HDFS-8610.

> Support splitting BlockReport of a storage into multiple RPC
> 
>
> Key: HDFS-9011
> URL: https://issues.apache.org/jira/browse/HDFS-9011
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-9011.000.patch, HDFS-9011.001.patch, 
> HDFS-9011.002.patch
>
>
> Currently, if a DataNode has too many blocks (more than 1m by default), it 
> sends multiple RPCs to the NameNode for the block report, where each RPC 
> contains the report for a single storage. However, in practice we've seen 
> that sometimes even a single storage can contain a large number of blocks, 
> and its report can exceed the max RPC data length. It may be helpful to 
> support sending multiple RPCs for the block report of a single storage. 
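
For illustration, a minimal sketch of the splitting idea, assuming the cap is 
expressed as a maximum number of blocks per RPC; the class and method names 
are made up for this sketch, not taken from the patch.

{code}
import java.util.ArrayList;
import java.util.List;

public class BlockReportSplitter {
  /** Split one storage's block list into chunks of at most maxBlocksPerRpc. */
  static <B> List<List<B>> split(List<B> blocks, int maxBlocksPerRpc) {
    List<List<B>> chunks = new ArrayList<>();
    for (int i = 0; i < blocks.size(); i += maxBlocksPerRpc) {
      int end = Math.min(i + maxBlocksPerRpc, blocks.size());
      chunks.add(new ArrayList<>(blocks.subList(i, end)));
    }
    return chunks;  // each chunk becomes one block report RPC payload
  }
}
{code}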



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

