[jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS

2015-08-14 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698093#comment-14698093
 ] 

Vinayakumar B commented on HDFS-7285:
-

Thanks for starting the Jenkins job. 
I prefer to wait for some time so that others can also take a look at the rebase. 
Then we can decide about moving this to the HDFS-7285 branch.

> Erasure Coding Support inside HDFS
> --
>
> Key: HDFS-7285
> URL: https://issues.apache.org/jira/browse/HDFS-7285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Weihua Jiang
>Assignee: Zhe Zhang
> Attachments: Consolidated-20150707.patch, 
> Consolidated-20150806.patch, Consolidated-20150810.patch, ECAnalyzer.py, 
> ECParser.py, HDFS-7285-initial-PoC.patch, 
> HDFS-7285-merge-consolidated-01.patch, 
> HDFS-7285-merge-consolidated-trunk-01.patch, 
> HDFS-7285-merge-consolidated.trunk.03.patch, 
> HDFS-7285-merge-consolidated.trunk.04.patch, 
> HDFS-EC-Merge-PoC-20150624.patch, HDFS-EC-merge-consolidated-01.patch, 
> HDFS-bistriped.patch, HDFSErasureCodingDesign-20141028.pdf, 
> HDFSErasureCodingDesign-20141217.pdf, HDFSErasureCodingDesign-20150204.pdf, 
> HDFSErasureCodingDesign-20150206.pdf, HDFSErasureCodingPhaseITestPlan.pdf, 
> fsimage-analysis-20150105.pdf
>
>
> Erasure Coding (EC) can greatly reduce storage overhead without sacrificing 
> data reliability, compared to the existing HDFS 3-replica approach. For 
> example, with a 10+4 Reed-Solomon coding we can tolerate the loss of 4 blocks, 
> with a storage overhead of only 40% (4 parity blocks per 10 data blocks, 
> versus 200% for 3-way replication). This makes EC a quite attractive 
> alternative for big data storage, particularly for cold data. 
> Facebook had a related open source project called HDFS-RAID. It used to be 
> one of the contrib packages in HDFS but was removed in Hadoop 2.0 for 
> maintenance reasons. Its drawbacks are: 1) it sits on top of HDFS and depends 
> on MapReduce to do encoding and decoding tasks; 2) it can only be used for 
> cold files that will not be appended to anymore; 3) the pure Java EC 
> coding implementation is extremely slow in practical use. For these reasons, 
> it might not be a good idea to simply bring HDFS-RAID back.
> We (Intel and Cloudera) are working on a design to build EC into HDFS that 
> eliminates external dependencies, making it self-contained and 
> independently maintainable. This design layers the EC feature on top of the 
> storage type support and keeps it compatible with existing HDFS features such 
> as caching, snapshots, encryption, and high availability. This design will 
> also support different EC coding schemes, implementations, and policies for 
> different deployment scenarios. By utilizing advanced libraries (e.g. the 
> Intel ISA-L library), an implementation can greatly improve the performance of 
> EC encoding/decoding, making the EC solution even more attractive. We will 
> post the design document soon. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8883) NameNode Metrics : Add FSNameSystem lock Queue Length

2015-08-14 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698089#comment-14698089
 ] 

Xiaoyu Yao commented on HDFS-8883:
--

Thanks [~anu] for working on this. 
Patch LGTM. +1.

> NameNode Metrics : Add FSNameSystem lock Queue Length
> -
>
> Key: HDFS-8883
> URL: https://issues.apache.org/jira/browse/HDFS-8883
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS
>Affects Versions: 2.7.1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: 2.8.0
>
> Attachments: HDFS-8883.001.patch
>
>
> FSNameSystemLock can have contention when the NameNode is under load. This 
> patch adds LockQueueLength -- the number of threads waiting on 
> FSNameSystemLock -- as a metric in the NameNode. 
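
For context, a minimal sketch of how such a metric can be derived, assuming the 
namesystem guards its state with a fair {{ReentrantReadWriteLock}} (class and 
method names below are illustrative, not the actual patch):
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch: ReentrantReadWriteLock#getQueueLength() returns an
// estimate of the number of threads waiting to acquire either lock, which
// is exactly the quantity the metric reports.
public class FSNamesystemLockSketch {
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);

  /** Estimate of the number of threads queued on the namesystem lock. */
  public int getLockQueueLength() {
    return fsLock.getQueueLength();
  }

  public static void main(String[] args) {
    System.out.println(new FSNamesystemLockSketch().getLockQueueLength()); // 0
  }
}
{code}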



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-08-14 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698064#comment-14698064
 ] 

Sean Busbey commented on HDFS-8344:
---

The issue where the other reports have been showing up is HDFS-8406. I believe 
in several cases we're doing burn-in tests against cluster deployments, so if 
the config isn't something we'd have people run with, we ought not to use it.

IIRC, the lease failures lasted over pretty extended periods of time: more than 
15 minutes for the cases where they caused my HBase failures.

> NameNode doesn't recover lease for files with missing blocks
> 
>
> Key: HDFS-8344
> URL: https://issues.apache.org/jira/browse/HDFS-8344
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Fix For: 2.8.0
>
> Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
> HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
> HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch, HDFS-8344.09.patch
>
>
> I found another(?) instance in which the lease is not recovered. This is 
> easily reproducible on a pseudo-distributed single-node cluster.
> # Before you start, it helps if you set the following. This is not necessary, 
> but it simply reduces how long you have to wait:
> {code}
>   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
>   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
> LEASE_SOFTLIMIT_PERIOD;
> {code}
> # Client starts to write a file. (It could be less than 1 block, but it is 
> hflushed, so some of the data has landed on the datanodes.) (I'm copying the 
> client code I am using. I generate a jar and run it using $ hadoop jar 
> TestHadoop.jar.)
> # Client crashes. (I simulate this by kill -9'ing the $(hadoop jar 
> TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter".)
> # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
> only 1.)
> I believe the lease should be recovered and the block should be marked 
> missing. However, this is not happening: the lease is never recovered.
> The effect of this bug for us was that nodes could not be decommissioned 
> cleanly. Although we knew that the client had crashed, the Namenode never 
> released the leases (even after restarting the Namenode, even months 
> afterwards). There are actually several other cases too where we don't 
> consider what happens if ALL the datanodes die while the file is being 
> written, but I am going to punt on those for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-08-14 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698054#comment-14698054
 ] 

Ravi Prakash commented on HDFS-8344:


Thanks for the report, Sean! The recovery time can be shortened for tests by 
setting the timeouts super low. For example, in the unit test 
{{testLeaseRecoveryWithMissingBlocks}} I use these 2 lines:
{noformat}
cluster.setLeasePeriod(LEASE_PERIOD, LEASE_PERIOD);
BlockInfoUnderConstruction.setRecoveryTimeout(1);
{noformat}
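
For reference, a minimal sketch of how those hooks can be combined in a test, 
assuming a {{MiniDFSCluster}} and the {{setRecoveryTimeout}} hook introduced by 
this patch (the surrounding scaffolding is illustrative, not the actual test):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class LeaseRecoveryTimeoutSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HdfsConfiguration();
    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
    try {
      cluster.waitActive();
      // Shrink both the soft and hard lease limits so the NN starts lease
      // recovery almost immediately after the client disappears.
      cluster.setLeasePeriod(1000L, 1000L);
      // Hook added by this patch (assumed API, so left commented out here):
      // drop the block recovery timeout to 1 ms to avoid waiting.
      // BlockInfoUnderConstruction.setRecoveryTimeout(1);
      // ... write a file, kill the writer and the datanode, then assert
      // that the lease is eventually recovered ...
    } finally {
      cluster.shutdown();
    }
  }
}
{code}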
 
Haohui! By recovery, I meant recovery of a replica (not lease recovery). Please 
let me know if you think this sequence cannot happen:
1. Client writes data
2. Client dies
3. Datanodes A, B and C (on which the data was written) die
4. Lease recovery tries to recover data from Datanodes A and B but fails 
(because the datanodes are dead)
5. Cluster is taken down
6. Datanode A or B is resurrected
7. The cluster is brought back up after {{RECOVERY_TIMEOUT}}
8. {{FSNamesystem.getShouldForciblyCompleteMissingUCBlock}} returns true 
because RECOVERY_TIMEOUT has expired
9. Block is forcibly marked complete and the file is labeled as having missing 
blocks

If Datanode A or B came back, the data would be recovered. The only difference 
would be that the lease would have been forcefully recovered. 

> NameNode doesn't recover lease for files with missing blocks
> 
>
> Key: HDFS-8344
> URL: https://issues.apache.org/jira/browse/HDFS-8344
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Fix For: 2.8.0
>
> Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
> HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
> HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch, HDFS-8344.09.patch
>
>
> I found another(?) instance in which the lease is not recovered. This is 
> easily reproducible on a pseudo-distributed single-node cluster.
> # Before you start, it helps if you set the following. This is not necessary, 
> but it simply reduces how long you have to wait:
> {code}
>   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
>   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
> LEASE_SOFTLIMIT_PERIOD;
> {code}
> # Client starts to write a file. (It could be less than 1 block, but it is 
> hflushed, so some of the data has landed on the datanodes.) (I'm copying the 
> client code I am using. I generate a jar and run it using $ hadoop jar 
> TestHadoop.jar.)
> # Client crashes. (I simulate this by kill -9'ing the $(hadoop jar 
> TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter".)
> # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
> only 1.)
> I believe the lease should be recovered and the block should be marked 
> missing. However, this is not happening: the lease is never recovered.
> The effect of this bug for us was that nodes could not be decommissioned 
> cleanly. Although we knew that the client had crashed, the Namenode never 
> released the leases (even after restarting the Namenode, even months 
> afterwards). There are actually several other cases too where we don't 
> consider what happens if ALL the datanodes die while the file is being 
> written, but I am going to punt on those for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-08-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698040#comment-14698040
 ] 

Hadoop QA commented on HDFS-8344:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12750625/HDFS-8344.09.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / dc7a061 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12003/console |


This message was automatically generated.

> NameNode doesn't recover lease for files with missing blocks
> 
>
> Key: HDFS-8344
> URL: https://issues.apache.org/jira/browse/HDFS-8344
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Fix For: 2.8.0
>
> Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
> HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
> HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch, HDFS-8344.09.patch
>
>
> I found another(?) instance in which the lease is not recovered. This is 
> easily reproducible on a pseudo-distributed single-node cluster.
> # Before you start, it helps if you set the following. This is not necessary, 
> but it simply reduces how long you have to wait:
> {code}
>   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
>   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
> LEASE_SOFTLIMIT_PERIOD;
> {code}
> # Client starts to write a file. (It could be less than 1 block, but it is 
> hflushed, so some of the data has landed on the datanodes.) (I'm copying the 
> client code I am using. I generate a jar and run it using $ hadoop jar 
> TestHadoop.jar.)
> # Client crashes. (I simulate this by kill -9'ing the $(hadoop jar 
> TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter".)
> # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
> only 1.)
> I believe the lease should be recovered and the block should be marked 
> missing. However, this is not happening: the lease is never recovered.
> The effect of this bug for us was that nodes could not be decommissioned 
> cleanly. Although we knew that the client had crashed, the Namenode never 
> released the leases (even after restarting the Namenode, even months 
> afterwards). There are actually several other cases too where we don't 
> consider what happens if ALL the datanodes die while the file is being 
> written, but I am going to punt on those for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8801) Convert BlockInfoUnderConstruction as a feature

2015-08-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698031#comment-14698031
 ] 

Hadoop QA commented on HDFS-8801:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 19s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 10 new or modified test files. |
| {color:green}+1{color} | javac |   7m 47s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 52s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 23s | The applied patch generated  
21 new checkstyle issues (total was 1114, now 1100). |
| {color:green}+1{color} | whitespace |   0m 12s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 22s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 31s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 52s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 52s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 180m  9s | Tests failed in hadoop-hdfs. |
| | | 225m 48s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.fs.viewfs.TestViewFileSystemHdfs |
| Timed out tests | org.apache.hadoop.cli.TestHDFSCLI |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12750602/HDFS-8801.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 2bc0a4f |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/12001/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12001/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12001/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12001/console |


This message was automatically generated.

> Convert BlockInfoUnderConstruction as a feature
> ---
>
> Key: HDFS-8801
> URL: https://issues.apache.org/jira/browse/HDFS-8801
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Zhe Zhang
>Assignee: Jing Zhao
> Attachments: HDFS-8801.000.patch
>
>
> Per discussion under HDFS-8499, with the erasure coding feature there will 
> be 4 types of {{BlockInfo}} forming a multiple-inheritance hierarchy: 
> {{complete+contiguous}}, {{complete+striped}}, {{UC+contiguous}}, 
> {{UC+striped}}. We had the same challenge with {{INodeFile}}, and the solution 
> was building feature classes like {{FileUnderConstructionFeature}}. This JIRA 
> aims to implement the same idea on {{BlockInfo}}. 
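
For illustration, a hedged sketch of the feature pattern this JIRA proposes 
(names are illustrative, not the actual patch): rather than subclassing 
{{BlockInfo}} four ways, the under-construction state hangs off any 
{{BlockInfo}} as an optional feature object, so contiguous and striped blocks 
can share it.
{code}
// Illustrative sketch of the feature pattern; not the committed code.
abstract class BlockInfoSketch {
  private UnderConstructionFeature uc; // null once the block is complete

  boolean isComplete() { return uc == null; }

  void convertToUnderConstruction() { uc = new UnderConstructionFeature(); }

  void completeBlock() { uc = null; } // drop the UC-only state
}

class BlockInfoContiguousSketch extends BlockInfoSketch { }
class BlockInfoStripedSketch extends BlockInfoSketch { }

// UC-only state (expected replica locations, recovery id, ...) lives here,
// so complete blocks pay nothing for it and no UC subclasses are needed.
class UnderConstructionFeature { }
{code}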



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize

2015-08-14 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698009#comment-14698009
 ] 

Walter Su commented on HDFS-8220:
-

bq. Should we always do endBlock() and close when setting a streamer as failed?
No. We set a streamer to failed to prevent the OutputStream from writing data 
to it. I think the failed flag should be unset when moving on to the next block 
group, so no close() should be called.
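
To illustrate the intended semantics, a minimal hedged sketch (names are 
illustrative, not the actual {{StripedDataStreamer}} code):
{code}
// Illustrative sketch: "failed" only gates writes for the current block
// group and is reset when the streamer moves on, so the streamer is reused
// rather than closed.
class StreamerSketch {
  private boolean failed;

  void write(byte[] packet) {
    if (failed) {
      return; // drop writes for the rest of this block group
    }
    // ... queue the packet for the datanode ...
  }

  void markFailed() { failed = true; }

  void moveToNextBlockGroup() {
    failed = false; // unset the flag instead of calling close()
  }
}
{code}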

The patch doesn't handle the situation where the first LocatedStripedBlock 
satisfies the GroupSize but the second doesn't. That is possible when some DNs 
become busy. Since this JIRA is already closed, and that case is more 
complicated, we can fix it later. Thanks [~rakeshr] for fixing the first step.

[~szetszwo], would you mind solving the conflicts in HDFS-8838?

> Erasure Coding: StripedDataStreamer fails to handle the blocklocations which 
> doesn't satisfy BlockGroupSize
> ---
>
> Key: HDFS-8220
> URL: https://issues.apache.org/jira/browse/HDFS-8220
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: HDFS-7285
>
> Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, 
> HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, 
> HDFS-8220-HDFS-7285-10.patch, HDFS-8220-HDFS-7285-merge-10.patch, 
> HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, 
> HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, 
> HDFS-8220-HDFS-7285.008.patch
>
>
> During write operations, {{StripedDataStreamer#locateFollowingBlock}} fails to 
> validate the available datanodes against the {{BlockGroupSize}}. Please see 
> the exception below for more detail:
> {code}
> 2015-04-22 14:56:11,313 WARN  hdfs.DFSClient (DataStreamer.java:run(538)) - 
> DataStreamer Exception
> java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> 2015-04-22 14:56:11,313 INFO  hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
> 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient 
> (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
> java.io.IOException: DataStreamer Exception: 
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> Caused by: java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   ... 1 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-08-14 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697981#comment-14697981
 ] 

Haohui Mai commented on HDFS-8344:
--

bq. Even if its simpler, there's a chance that recovery is never attempted, and 
that is not acceptable IMHO.

Can you explain how the NN would never try to recover the lease? All leases are 
periodically checked in {{LeaseManager#checkLease()}}, and that is where the 
recovery happens.
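
For context, a simplified sketch of that periodic check (not the actual 
{{LeaseManager}} source; the interfaces and release call are illustrative):
{code}
// Illustrative sketch of the monitor loop the comment refers to: a daemon
// thread wakes up at a fixed interval, finds leases past the hard limit,
// and triggers recovery for the files they hold.
class LeaseMonitorSketch implements Runnable {
  interface Lease {
    boolean expiredHardLimit();
  }
  interface Namesystem {
    void internalReleaseLease(Lease lease);
  }

  private final java.util.List<Lease> sortedLeases;
  private final Namesystem fsn;

  LeaseMonitorSketch(java.util.List<Lease> leases, Namesystem fsn) {
    this.sortedLeases = leases;
    this.fsn = fsn;
  }

  @Override
  public void run() {
    while (!Thread.interrupted()) {
      for (Lease lease : sortedLeases) {
        if (lease.expiredHardLimit()) {
          fsn.internalReleaseLease(lease); // lease recovery happens here
        }
      }
      try {
        Thread.sleep(2000); // check interval
      } catch (InterruptedException e) {
        return;
      }
    }
  }
}
{code}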

> NameNode doesn't recover lease for files with missing blocks
> 
>
> Key: HDFS-8344
> URL: https://issues.apache.org/jira/browse/HDFS-8344
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Fix For: 2.8.0
>
> Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
> HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
> HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch, HDFS-8344.09.patch
>
>
> I found another(?) instance in which the lease is not recovered. This is 
> easily reproducible on a pseudo-distributed single-node cluster.
> # Before you start, it helps if you set the following. This is not necessary, 
> but it simply reduces how long you have to wait:
> {code}
>   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
>   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
> LEASE_SOFTLIMIT_PERIOD;
> {code}
> # Client starts to write a file. (It could be less than 1 block, but it is 
> hflushed, so some of the data has landed on the datanodes.) (I'm copying the 
> client code I am using. I generate a jar and run it using $ hadoop jar 
> TestHadoop.jar.)
> # Client crashes. (I simulate this by kill -9'ing the $(hadoop jar 
> TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter".)
> # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
> only 1.)
> I believe the lease should be recovered and the block should be marked 
> missing. However, this is not happening: the lease is never recovered.
> The effect of this bug for us was that nodes could not be decommissioned 
> cleanly. Although we knew that the client had crashed, the Namenode never 
> released the leases (even after restarting the Namenode, even months 
> afterwards). There are actually several other cases too where we don't 
> consider what happens if ALL the datanodes die while the file is being 
> written, but I am going to punt on those for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-08-14 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697977#comment-14697977
 ] 

Sean Busbey commented on HDFS-8344:
---

This has been a recurring problem for both HBase and Accumulo in test rigs. I 
don't think we care whether the value is configurable, so long as recovery is 
guaranteed to terminate and does so in a reasonably short (single-digit-seconds) 
period of time, since it is in our recovery paths.

> NameNode doesn't recover lease for files with missing blocks
> 
>
> Key: HDFS-8344
> URL: https://issues.apache.org/jira/browse/HDFS-8344
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Fix For: 2.8.0
>
> Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
> HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
> HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch, HDFS-8344.09.patch
>
>
> I found another(?) instance in which the lease is not recovered. This is 
> easily reproducible on a pseudo-distributed single-node cluster.
> # Before you start, it helps if you set the following. This is not necessary, 
> but it simply reduces how long you have to wait:
> {code}
>   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
>   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
> LEASE_SOFTLIMIT_PERIOD;
> {code}
> # Client starts to write a file. (It could be less than 1 block, but it is 
> hflushed, so some of the data has landed on the datanodes.) (I'm copying the 
> client code I am using. I generate a jar and run it using $ hadoop jar 
> TestHadoop.jar.)
> # Client crashes. (I simulate this by kill -9'ing the $(hadoop jar 
> TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter".)
> # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
> only 1.)
> I believe the lease should be recovered and the block should be marked 
> missing. However, this is not happening: the lease is never recovered.
> The effect of this bug for us was that nodes could not be decommissioned 
> cleanly. Although we knew that the client had crashed, the Namenode never 
> released the leases (even after restarting the Namenode, even months 
> afterwards). There are actually several other cases too where we don't 
> consider what happens if ALL the datanodes die while the file is being 
> written, but I am going to punt on those for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS

2015-08-14 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697956#comment-14697956
 ] 

Zhe Zhang commented on HDFS-7285:
-

Many thanks to Vinay for the great effort! I just cherry-picked the 2 new 
commits (HDFS-8854 and HDFS-8220) to the branch, and also created a Jenkins 
[job | https://builds.apache.org/job/Hadoop-HDFS-7285-nightly/].

I'll also compare {{HDFS-7285-REBASE}} with the consolidated patch. After that, 
and after verifying the Jenkins results, I'll push it as {{HDFS-7285}} so we can 
better proceed with pending subtasks. I'll also move the current {{HDFS-7285}} 
branch aside as a backup, in case we want to reconcile differences in 
individual commits.

> Erasure Coding Support inside HDFS
> --
>
> Key: HDFS-7285
> URL: https://issues.apache.org/jira/browse/HDFS-7285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Weihua Jiang
>Assignee: Zhe Zhang
> Attachments: Consolidated-20150707.patch, 
> Consolidated-20150806.patch, Consolidated-20150810.patch, ECAnalyzer.py, 
> ECParser.py, HDFS-7285-initial-PoC.patch, 
> HDFS-7285-merge-consolidated-01.patch, 
> HDFS-7285-merge-consolidated-trunk-01.patch, 
> HDFS-7285-merge-consolidated.trunk.03.patch, 
> HDFS-7285-merge-consolidated.trunk.04.patch, 
> HDFS-EC-Merge-PoC-20150624.patch, HDFS-EC-merge-consolidated-01.patch, 
> HDFS-bistriped.patch, HDFSErasureCodingDesign-20141028.pdf, 
> HDFSErasureCodingDesign-20141217.pdf, HDFSErasureCodingDesign-20150204.pdf, 
> HDFSErasureCodingDesign-20150206.pdf, HDFSErasureCodingPhaseITestPlan.pdf, 
> fsimage-analysis-20150105.pdf
>
>
> Erasure Coding (EC) can greatly reduce storage overhead without sacrificing 
> data reliability, compared to the existing HDFS 3-replica approach. For 
> example, with a 10+4 Reed-Solomon coding we can tolerate the loss of 4 blocks, 
> with a storage overhead of only 40% (4 parity blocks per 10 data blocks, 
> versus 200% for 3-way replication). This makes EC a quite attractive 
> alternative for big data storage, particularly for cold data. 
> Facebook had a related open source project called HDFS-RAID. It used to be 
> one of the contrib packages in HDFS but was removed in Hadoop 2.0 for 
> maintenance reasons. Its drawbacks are: 1) it sits on top of HDFS and depends 
> on MapReduce to do encoding and decoding tasks; 2) it can only be used for 
> cold files that will not be appended to anymore; 3) the pure Java EC 
> coding implementation is extremely slow in practical use. For these reasons, 
> it might not be a good idea to simply bring HDFS-RAID back.
> We (Intel and Cloudera) are working on a design to build EC into HDFS that 
> eliminates external dependencies, making it self-contained and 
> independently maintainable. This design layers the EC feature on top of the 
> storage type support and keeps it compatible with existing HDFS features such 
> as caching, snapshots, encryption, and high availability. This design will 
> also support different EC coding schemes, implementations, and policies for 
> different deployment scenarios. By utilizing advanced libraries (e.g. the 
> Intel ISA-L library), an implementation can greatly improve the performance of 
> EC encoding/decoding, making the EC solution even more attractive. We will 
> post the design document soon. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-08-14 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-8344:
---
Attachment: HDFS-8344.09.patch

Here's a patch with both a timeout and a number of retries.

> NameNode doesn't recover lease for files with missing blocks
> 
>
> Key: HDFS-8344
> URL: https://issues.apache.org/jira/browse/HDFS-8344
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Fix For: 2.8.0
>
> Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
> HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
> HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch, HDFS-8344.09.patch
>
>
> I found another(?) instance in which the lease is not recovered. This is 
> easily reproducible on a pseudo-distributed single-node cluster.
> # Before you start, it helps if you set the following. This is not necessary, 
> but it simply reduces how long you have to wait:
> {code}
>   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
>   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
> LEASE_SOFTLIMIT_PERIOD;
> {code}
> # Client starts to write a file. (It could be less than 1 block, but it is 
> hflushed, so some of the data has landed on the datanodes.) (I'm copying the 
> client code I am using. I generate a jar and run it using $ hadoop jar 
> TestHadoop.jar.)
> # Client crashes. (I simulate this by kill -9'ing the $(hadoop jar 
> TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter".)
> # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
> only 1.)
> I believe the lease should be recovered and the block should be marked 
> missing. However, this is not happening: the lease is never recovered.
> The effect of this bug for us was that nodes could not be decommissioned 
> cleanly. Although we knew that the client had crashed, the Namenode never 
> released the leases (even after restarting the Namenode, even months 
> afterwards). There are actually several other cases too where we don't 
> consider what happens if ALL the datanodes die while the file is being 
> written, but I am going to punt on those for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small

2015-08-14 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697943#comment-14697943
 ] 

Walter Su commented on HDFS-8838:
-

I saw HDFS-8220 just get committed; would you mind rebasing this to solve the 
conflicts?

> Tolerate datanode failures in DFSStripedOutputStream when the data length is 
> small
> --
>
> Key: HDFS-8838
> URL: https://issues.apache.org/jira/browse/HDFS-8838
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: HDFS-8838-HDFS-7285-000.patch, 
> HDFS-8838-HDFS-7285-20150809-test.patch, HDFS-8838-HDFS-7285-20150809.patch, 
> h8838_20150729.patch, h8838_20150731-HDFS-7285.patch, h8838_20150731.log, 
> h8838_20150731.patch, h8838_20150804-HDFS-7285.patch, h8838_20150809.patch
>
>
> Currently, DFSStripedOutputStream cannot tolerate datanode failures when the 
> data length is small.  We fix the bugs here and add more tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize

2015-08-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697917#comment-14697917
 ] 

Hadoop QA commented on HDFS-8220:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 11s | Findbugs (version ) appears to 
be broken on HDFS-7285. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 30s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 38s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 16s | The applied patch generated 
1 release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 37s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   3m 27s | The patch appears to introduce 5 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 16s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 196m 49s | Tests failed in hadoop-hdfs. |
| | | 239m  2s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| Failed unit tests | hadoop.hdfs.server.namenode.TestFileTruncate |
|   | hadoop.hdfs.TestWriteStripedFileWithFailure |
| Timed out tests | org.apache.hadoop.cli.TestHDFSCLI |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12750580/HDFS-8220-HDFS-7285-merge-10.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | HDFS-7285 / 1d37a88 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12000/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12000/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12000/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12000/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12000/console |


This message was automatically generated.

> Erasure Coding: StripedDataStreamer fails to handle the blocklocations which 
> doesn't satisfy BlockGroupSize
> ---
>
> Key: HDFS-8220
> URL: https://issues.apache.org/jira/browse/HDFS-8220
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: HDFS-7285
>
> Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, 
> HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, 
> HDFS-8220-HDFS-7285-10.patch, HDFS-8220-HDFS-7285-merge-10.patch, 
> HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, 
> HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, 
> HDFS-8220-HDFS-7285.008.patch
>
>
> During write operations, {{StripedDataStreamer#locateFollowingBlock}} fails to 
> validate the available datanodes against the {{BlockGroupSize}}. Please see 
> the exception below for more detail:
> {code}
> 2015-04-22 14:56:11,313 WARN  hdfs.DFSClient (DataStreamer.java:run(538)) - 
> DataStreamer Exception
> java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> 2015-04-22 14:56:11,313 INFO  hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
> 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient 
> (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
> java.io.IOException: DataStreamer Exception: 
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
>   at 
> org.apache.hadoop.hdfs.StripedDataS

[jira] [Commented] (HDFS-8801) Convert BlockInfoUnderConstruction as a feature

2015-08-14 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697904#comment-14697904
 ] 

Zhe Zhang commented on HDFS-8801:
-

Thanks for initiating the work Jing! The overall structure in the patch looks 
good to me.

Should we take the chance to change {{replicas}} from a List to an array? This 
can offset some of the memory overhead from the feature pointer, and also help 
us reconcile trunk with the striped UC code later.
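
For illustration, a hedged sketch of the suggestion (names are illustrative, 
not the actual patch): keeping the replicas as a plain array avoids the 
{{ArrayList}} wrapper object, which partially offsets the extra feature pointer.
{code}
// Illustrative sketch: UC state in a feature object, replicas as an array.
class BlockUnderConstructionFeatureSketch {
  private ReplicaSketch[] replicas; // array instead of List<ReplicaSketch>

  void setExpectedLocations(ReplicaSketch[] targets) {
    this.replicas = targets;
  }

  int getNumExpectedLocations() {
    return replicas == null ? 0 : replicas.length;
  }
}

class ReplicaSketch { /* per-replica under-construction state */ }
{code}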

> Convert BlockInfoUnderConstruction as a feature
> ---
>
> Key: HDFS-8801
> URL: https://issues.apache.org/jira/browse/HDFS-8801
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Zhe Zhang
>Assignee: Jing Zhao
> Attachments: HDFS-8801.000.patch
>
>
> Per discussion under HDFS-8499, with the erasure coding feature there will 
> be 4 types of {{BlockInfo}} forming a multiple-inheritance hierarchy: 
> {{complete+contiguous}}, {{complete+striped}}, {{UC+contiguous}}, 
> {{UC+striped}}. We had the same challenge with {{INodeFile}}, and the solution 
> was building feature classes like {{FileUnderConstructionFeature}}. This JIRA 
> aims to implement the same idea on {{BlockInfo}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8853) Erasure Coding: Provide ECSchema validation when creating ECZone

2015-08-14 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697873#comment-14697873
 ] 

Zhe Zhang commented on HDFS-8853:
-

Thanks [~andreina] for the patch. Do you mind rebasing it?

I was also thinking about this issue when creating the HDFS-8833 patch. In the 
long term, it might be better for the client to pass a {{String}} to the NN 
instead of the actual policy/schema object.

> Erasure Coding: Provide ECSchema validation when creating ECZone
> 
>
> Key: HDFS-8853
> URL: https://issues.apache.org/jira/browse/HDFS-8853
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: J.Andreina
> Attachments: HDFS-8853-HDFS-7285-01.patch
>
>
> Presently, {{DFS#createErasureCodingZone(path, ecSchema, cellSize)}} 
> doesn't have any validation that the given {{ecSchema}} is available in the 
> {{ErasureCodingSchemaManager#activeSchemas}} list. Now, if it doesn't exist, 
> it will create the ECZone with a {{null}} schema. IMHO we could improve this 
> by doing the necessary basic sanity checks.
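
For illustration, a hedged sketch of the kind of sanity check the description 
asks for (the method and class names are illustrative, not the actual patch):
{code}
import java.io.IOException;
import java.util.List;

// Illustrative sketch: reject a schema that is not in the active list
// instead of silently creating a zone with a null schema.
class ECSchemaValidatorSketch {
  static <S> S validateSchema(S requested, List<S> activeSchemas)
      throws IOException {
    if (requested == null) {
      throw new IOException("EC schema must not be null");
    }
    for (S schema : activeSchemas) {
      if (schema.equals(requested)) {
        return schema; // use the canonical active instance
      }
    }
    throw new IOException("Unknown EC schema: " + requested);
  }
}
{code}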



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize

2015-08-14 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-8220:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-7285
   Status: Resolved  (was: Patch Available)

+1 on the latest patch; I just committed it to {{HDFS-7285}} and 
{{HDFS-7285-merge}}. Thanks Rakesh for the contribution, and thanks to Walter 
and Nicholas for the reviews.

I see the following follow-ons:
# Should we always do {{endBlock()}} and {{close}} when setting a streamer as 
failed? If so, should we change {{DFSStripedOutputStream#handleStreamerFailure}}?
# We should add another test to emulate null locations. Right now we are only 
testing a short list of locations.

> Erasure Coding: StripedDataStreamer fails to handle the blocklocations which 
> doesn't satisfy BlockGroupSize
> ---
>
> Key: HDFS-8220
> URL: https://issues.apache.org/jira/browse/HDFS-8220
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: HDFS-7285
>
> Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, 
> HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, 
> HDFS-8220-HDFS-7285-10.patch, HDFS-8220-HDFS-7285-merge-10.patch, 
> HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, 
> HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, 
> HDFS-8220-HDFS-7285.008.patch
>
>
> During write operations, {{StripedDataStreamer#locateFollowingBlock}} fails to 
> validate the available datanodes against the {{BlockGroupSize}}. Please see 
> the exception below for more detail:
> {code}
> 2015-04-22 14:56:11,313 WARN  hdfs.DFSClient (DataStreamer.java:run(538)) - 
> DataStreamer Exception
> java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> 2015-04-22 14:56:11,313 INFO  hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
> 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient 
> (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
> java.io.IOException: DataStreamer Exception: 
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> Caused by: java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   ... 1 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8891) HDFS concat should keep srcs order

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697842#comment-14697842
 ] 

Hudson commented on HDFS-8891:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8309 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8309/])
HDFS-8891. HDFS concat should keep srcs order. Contributed by Yong Zhang. 
(jing9: rev dc7a061668a3f4d86fe1b07a40d46774b5386938)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestHDFSConcat.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirConcatOp.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> HDFS concat should keep srcs order
> --
>
> Key: HDFS-8891
> URL: https://issues.apache.org/jira/browse/HDFS-8891
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yong Zhang
>Assignee: Yong Zhang
> Fix For: 2.8.0
>
> Attachments: HDFS-8891.001.patch, HDFS-8891.002.patch
>
>
> FSDirConcatOp.verifySrcFiles may change the src files' order, but it should 
> keep their order as given in the input.
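
For illustration, a hedged sketch of the kind of fix implied (not necessarily 
the committed patch): deduplicate the srcs with an insertion-ordered set rather 
than an unordered {{HashSet}}, so the input order survives verification.
{code}
import java.util.LinkedHashSet;
import java.util.Set;

class ConcatOrderSketch {
  static String[] dedupPreservingOrder(String[] srcs) {
    Set<String> ordered = new LinkedHashSet<>();
    for (String src : srcs) {
      ordered.add(src); // LinkedHashSet keeps first-seen order
    }
    return ordered.toArray(new String[0]);
  }
}
{code}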



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones

2015-08-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697844#comment-14697844
 ] 

Hadoop QA commented on HDFS-8833:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12750606/HDFS-8833-HDFS-7285-merge.00.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | HDFS-7285 / 1d37a88 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12002/console |


This message was automatically generated.

> Erasure coding: store EC schema and cell size in INodeFile and eliminate 
> notion of EC zones
> ---
>
> Key: HDFS-8833
> URL: https://issues.apache.org/jira/browse/HDFS-8833
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-7285
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-8833-HDFS-7285-merge.00.patch
>
>
> We have [discussed | 
> https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754]
>  storing EC schema with files instead of EC zones and recently revisited the 
> discussion under HDFS-8059.
> As a recap, the _zone_ concept has severe limitations including renaming and 
> nested configuration. Those limitations are valid in encryption for security 
> reasons and it doesn't make sense to carry them over in EC.
> This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For 
> simplicity, we should first implement it as an xattr and consider memory 
> optimizations (such as moving it to file header) as a follow-on. We should 
> also disable changing EC policy on a non-empty file / dir in the first phase.
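
For illustration, a hedged sketch of the xattr-based approach (the xattr name 
and encoding are illustrative, not the actual implementation): the schema name 
and cell size are serialized into a single system xattr on the file's inode.
{code}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class EcPolicyXAttrSketch {
  // Illustrative xattr key; the real constant lives in the NN code.
  static final String EC_POLICY_XATTR = "system.hdfs.erasurecoding.policy";

  /** Encode the cell size followed by the UTF-8 schema name. */
  static byte[] encode(String schemaName, int cellSize) {
    byte[] name = schemaName.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(4 + name.length)
        .putInt(cellSize).put(name).array();
  }

  static int decodeCellSize(byte[] value) {
    return ByteBuffer.wrap(value).getInt();
  }

  static String decodeSchemaName(byte[] value) {
    return new String(value, 4, value.length - 4, StandardCharsets.UTF_8);
  }
}
{code}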



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8824) Do not use small blocks for balancing the cluster

2015-08-14 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-8824:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Thanks Jitendra for reviewing the patch.

I have committed this.

> Do not use small blocks for balancing the cluster
> -
>
> Key: HDFS-8824
> URL: https://issues.apache.org/jira/browse/HDFS-8824
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Fix For: 2.8.0
>
> Attachments: h8824_20150727b.patch, h8824_20150811b.patch
>
>
> Balancer gets datanode block lists from the NN and then moves blocks in order 
> to balance the cluster.  It should not use small blocks, since moving small 
> blocks generates a lot of overhead and small blocks do not help balance the 
> cluster much.
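
For illustration, a hedged sketch of the idea (the threshold is illustrative, 
not the committed patch): skip blocks below a size threshold when picking 
candidates to move, since each moved block pays a fixed per-block cost 
regardless of how much data it carries.
{code}
class BalancerCandidateSketch {
  // Illustrative policy: ignore blocks smaller than 10% of the default
  // block size when selecting blocks to move.
  static boolean isGoodCandidate(long blockSizeBytes, long defaultBlockSizeBytes) {
    return blockSizeBytes >= defaultBlockSizeBytes / 10;
  }
}
{code}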



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones

2015-08-14 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-8833:

Attachment: HDFS-8833-HDFS-7285-merge.00.patch

Attaching an initial patch to demonstrate the idea. I'm working on some 
additional refactors, including renaming {{INodeFile#isStriped}} to 
{{erasureCodingPolicy}}. Meanwhile, please let me know if you see any issues in 
the main logic.

[~jingzhao] This patch removes the constraint of setting the policy on 
non-empty directories (it's a simple change anyway). When you have a chance, 
could you take a look at my comment above? Thanks much.

I plan to file a separate JIRA for code cleanup, including:
* Consolidate the different getter and setter methods in {{FSDirErasureCodingOp}}.
* Examine the differences between the EC policy and storage policy logic:
** The client passes an {{ErasureCodingPolicy}} object to the NN when setting a 
policy, while in SP it's a string
** Do we still need a separate {{FSDirErasureCodingOp}} class?
** Calling different methods when setting directory XAttrs

> Erasure coding: store EC schema and cell size in INodeFile and eliminate 
> notion of EC zones
> ---
>
> Key: HDFS-8833
> URL: https://issues.apache.org/jira/browse/HDFS-8833
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-7285
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-8833-HDFS-7285-merge.00.patch
>
>
> We have [discussed | 
> https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754]
>  storing EC schema with files instead of EC zones and recently revisited the 
> discussion under HDFS-8059.
> As a recap, the _zone_ concept has severe limitations including renaming and 
> nested configuration. Those limitations are valid in encryption for security 
> reasons and it doesn't make sense to carry them over in EC.
> This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For 
> simplicity, we should first implement it as an xattr and consider memory 
> optimizations (such as moving it to file header) as a follow-on. We should 
> also disable changing EC policy on a non-empty file / dir in the first phase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-08-14 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697817#comment-14697817
 ] 

Ravi Prakash commented on HDFS-8344:


bq. If you take down the cluster and bring it back up. All writing pipeline 
will fail and should fail.
That is correct. This JIRA is for the case where data loss has already 
occurred, i.e. the client died and the DNs to which it wrote also died. We are 
trying to recover the lease in this JIRA. My argument was that after the client 
and DNs have died, if I only have a timeout, I could take down the cluster. 
When I bring the cluster back up after the timeout has passed, the lease will 
be recovered without trying all the DNs.
bq. This is internal implementation details and I'm very reluctant to make it 
configurable 
Perhaps I should have said an "internal hard-coded" configuration? Similar to 
{{recoveryAttemptsBeforeMarkingBlockMissing}} in version 8 of the patch.

bq.  Having only one concept for detecting failures (i.e., time out) is simpler 
than two (i.e., time out and number of retries).
Even if it's simpler, there's a chance that recovery is never attempted, and 
that is not acceptable IMHO.


> NameNode doesn't recover lease for files with missing blocks
> 
>
> Key: HDFS-8344
> URL: https://issues.apache.org/jira/browse/HDFS-8344
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Fix For: 2.8.0
>
> Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
> HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
> HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch
>
>
> I found another(?) instance in which the lease is not recovered. This is 
> easily reproducible on a pseudo-distributed single-node cluster.
> # Before you start, it helps if you set the following. This is not necessary, 
> but it simply reduces how long you have to wait:
> {code}
>   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
>   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
> LEASE_SOFTLIMIT_PERIOD;
> {code}
> # Client starts to write a file. (It could be less than 1 block, but it is 
> hflushed, so some of the data has landed on the datanodes.) (I'm copying the 
> client code I am using. I generate a jar and run it using $ hadoop jar 
> TestHadoop.jar.)
> # Client crashes. (I simulate this by kill -9'ing the $(hadoop jar 
> TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter".)
> # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
> only 1.)
> I believe the lease should be recovered and the block should be marked 
> missing. However, this is not happening: the lease is never recovered.
> The effect of this bug for us was that nodes could not be decommissioned 
> cleanly. Although we knew that the client had crashed, the Namenode never 
> released the leases (even after restarting the Namenode, even months 
> afterwards). There are actually several other cases too where we don't 
> consider what happens if ALL the datanodes die while the file is being 
> written, but I am going to punt on those for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8891) HDFS concat should keep srcs order

2015-08-14 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-8891:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

I've committed this to trunk and branch-2. Thanks Yong for the contribution!

> HDFS concat should keep srcs order
> --
>
> Key: HDFS-8891
> URL: https://issues.apache.org/jira/browse/HDFS-8891
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yong Zhang
>Assignee: Yong Zhang
> Fix For: 2.8.0
>
> Attachments: HDFS-8891.001.patch, HDFS-8891.002.patch
>
>
> FSDirConcatOp.verifySrcFiles may change the src files' order, but it should 
> keep their order as given in the input.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8891) HDFS concat should keep srcs order

2015-08-14 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-8891:

Issue Type: Bug  (was: Improvement)

> HDFS concat should keep srcs order
> --
>
> Key: HDFS-8891
> URL: https://issues.apache.org/jira/browse/HDFS-8891
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yong Zhang
>Assignee: Yong Zhang
> Attachments: HDFS-8891.001.patch, HDFS-8891.002.patch
>
>
> FSDirConcatOp.verifySrcFiles may change the src files' order, but it should 
> keep their order as given in the input.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8891) HDFS concat should keep srcs order

2015-08-14 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697814#comment-14697814
 ] 

Jing Zhao commented on HDFS-8891:
-

+1. I will commit the patch shortly.

> HDFS concat should keep srcs order
> --
>
> Key: HDFS-8891
> URL: https://issues.apache.org/jira/browse/HDFS-8891
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Yong Zhang
>Assignee: Yong Zhang
> Attachments: HDFS-8891.001.patch, HDFS-8891.002.patch
>
>
> FSDirConcatOp.verifySrcFiles may change the src files' order, but it should 
> keep their order as given in the input.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones

2015-08-14 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-8833:

Status: Patch Available  (was: Open)

> Erasure coding: store EC schema and cell size in INodeFile and eliminate 
> notion of EC zones
> ---
>
> Key: HDFS-8833
> URL: https://issues.apache.org/jira/browse/HDFS-8833
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-7285
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
>
> We have [discussed | 
> https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754]
>  storing EC schema with files instead of EC zones and recently revisited the 
> discussion under HDFS-8059.
> As a recap, the _zone_ concept has severe limitations, including restrictions 
> on renaming and nested configuration. Those limitations are justified in 
> encryption for security reasons, but it doesn't make sense to carry them over 
> to EC.
> This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For 
> simplicity, we should first implement it as an xattr and consider memory 
> optimizations (such as moving it to file header) as a follow-on. We should 
> also disable changing EC policy on a non-empty file / dir in the first phase.
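
As a rough sketch of the xattr route, here is what tagging a file could look 
like through the public {{FileSystem}} xattr API. The attribute name and value 
encoding below are made up for the example; the real patch would store the 
attribute internally on the {{INodeFile}} (in the reserved system namespace), 
not through the client API.

{code}
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class EcXattrSketch {
  // Hypothetical attribute name, for this sketch only.
  static final String EC_XATTR = "user.hdfs.ec.policy";

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/ec/file");
    // Tag the file itself with its EC schema + cell size, instead of
    // deriving them from an enclosing zone.
    fs.setXAttr(file, EC_XATTR,
        "RS-6-3-64k".getBytes(StandardCharsets.UTF_8));
    byte[] v = fs.getXAttr(file, EC_XATTR);
    System.out.println(new String(v, StandardCharsets.UTF_8));
  }
}
{code}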



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8801) Convert BlockInfoUnderConstruction as a feature

2015-08-14 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-8801:

Attachment: HDFS-8801.000.patch

Initial patch to demo the idea.

> Convert BlockInfoUnderConstruction as a feature
> ---
>
> Key: HDFS-8801
> URL: https://issues.apache.org/jira/browse/HDFS-8801
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Zhe Zhang
> Attachments: HDFS-8801.000.patch
>
>
> Per discussion under HDFS-8499, with the erasure coding feature, there will 
> be 4 types of {{BlockInfo}} forming a multi-inheritance: 
> {{complete+contiguous}}, {{complete+striping}}, {{UC+contiguous}}, 
> {{UC+striped}}. We had the same challenge with {{INodeFile}} and the solution 
> was building feature classes like {{FileUnderConstructionFeature}}. This JIRA 
> aims to implement the same idea on {{BlockInfo}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-8801) Convert BlockInfoUnderConstruction as a feature

2015-08-14 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao reassigned HDFS-8801:
---

Assignee: Jing Zhao

> Convert BlockInfoUnderConstruction as a feature
> ---
>
> Key: HDFS-8801
> URL: https://issues.apache.org/jira/browse/HDFS-8801
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Zhe Zhang
>Assignee: Jing Zhao
> Attachments: HDFS-8801.000.patch
>
>
> Per discussion under HDFS-8499, with the erasure coding feature, there will 
> be 4 types of {{BlockInfo}} forming a multi-inheritance: 
> {{complete+contiguous}}, {{complete+striping}}, {{UC+contiguous}}, 
> {{UC+striped}}. We had the same challenge with {{INodeFile}} and the solution 
> was building feature classes like {{FileUnderConstructionFeature}}. This JIRA 
> aims to implement the same idea on {{BlockInfo}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8801) Convert BlockInfoUnderConstruction as a feature

2015-08-14 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-8801:

Affects Version/s: (was: 2.7.1)
   Status: Patch Available  (was: Open)

> Convert BlockInfoUnderConstruction as a feature
> ---
>
> Key: HDFS-8801
> URL: https://issues.apache.org/jira/browse/HDFS-8801
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Zhe Zhang
>Assignee: Jing Zhao
> Attachments: HDFS-8801.000.patch
>
>
> Per discussion under HDFS-8499, with the erasure coding feature, there will 
> be 4 types of {{BlockInfo}} forming a multi-inheritance: 
> {{complete+contiguous}}, {{complete+striping}}, {{UC+contiguous}}, 
> {{UC+striped}}. We had the same challenge with {{INodeFile}} and the solution 
> was building feature classes like {{FileUnderConstructionFeature}}. This JIRA 
> aims to implement the same idea on {{BlockInfo}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8801) Convert BlockInfoUnderConstruction as a feature

2015-08-14 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697787#comment-14697787
 ] 

Jing Zhao commented on HDFS-8801:
-

Actually, converting BlockInfoUnderConstruction can bring us some benefits. 
Currently, when processing block reports, if a finalized replica is reported, 
we may replace the corresponding UC BlockInfo object with a newly created 
complete BlockInfo object inside the INodeFile. This replacement mixes the 
state of block storage management with NameSystem management, and forces the 
block report processing to take the Namesystem write lock.

Converting BlockInfoUC into a feature can avoid the BlockInfo object 
replacement. It helps separate the storage level from the file system level, 
and allows us to make further block report processing improvements (e.g., 
separating the locks for the namesystem and the block manager).
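
A standalone sketch of the feature idea (the names loosely mirror the HDFS 
classes, but this is not the attached patch): completing a block just drops 
the feature, so the {{BlockInfo}} object itself, and every reference to it, 
survives.

{code}
// Tracks state that only exists while the block is under construction.
class UnderConstructionFeatureSketch {
  final int expectedLocations;
  UnderConstructionFeatureSketch(int expectedLocations) {
    this.expectedLocations = expectedLocations;
  }
}

class BlockInfoSketch {
  private UnderConstructionFeatureSketch uc; // null once complete

  void convertToUnderConstruction(int expectedLocations) {
    uc = new UnderConstructionFeatureSketch(expectedLocations);
  }

  // Completing the block drops the feature in place; no new BlockInfo object
  // is created, so the INodeFile's reference never has to be swapped.
  void convertToComplete() {
    uc = null;
  }

  boolean isComplete() {
    return uc == null;
  }
}
{code}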

> Convert BlockInfoUnderConstruction as a feature
> ---
>
> Key: HDFS-8801
> URL: https://issues.apache.org/jira/browse/HDFS-8801
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>
> Per discussion under HDFS-8499, with the erasure coding feature, there will 
> be 4 types of {{BlockInfo}} forming a multi-inheritance: 
> {{complete+contiguous}}, {{complete+striping}}, {{UC+contiguous}}, 
> {{UC+striped}}. We had the same challenge with {{INodeFile}} and the solution 
> was building feature classes like {{FileUnderConstructionFeature}}. This JIRA 
> aims to implement the same idea on {{BlockInfo}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-08-14 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697728#comment-14697728
 ] 

Haohui Mai commented on HDFS-8344:
--

If you take down the cluster and bring it back up, all writing pipelines will 
fail, and they should fail.

bq. I can add one more configuration for the timeout (in addition to the number 
of retries)

This is exactly where my previous -1 comes from. These are internal 
implementation details, and I'm very reluctant to make them configurable 
because (1) it's difficult to determine the right value, and (2) users can 
easily shoot themselves in the foot and cause data loss when these numbers are 
misconfigured.

bq.  It feels like we are over-designing now.

I disagree. Having only one concept for detecting failures (i.e., timeout) is 
simpler than having two (i.e., timeout and number of retries).

> NameNode doesn't recover lease for files with missing blocks
> 
>
> Key: HDFS-8344
> URL: https://issues.apache.org/jira/browse/HDFS-8344
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Fix For: 2.8.0
>
> Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
> HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
> HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch
>
>
> I found another\(?) instance in which the lease is not recovered. This is 
> easily reproducible on a pseudo-distributed single-node cluster.
> # Before you start, it helps if you set the following. This is not necessary, 
> but it reduces how long you have to wait.
> {code}
>   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
>   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
> LEASE_SOFTLIMIT_PERIOD;
> {code}
> # Client starts to write a file. (It could be less than 1 block, but it 
> hflushed, so some of the data has landed on the datanodes.) (I'm copying the 
> client code I am using. I generate a jar and run it using $ hadoop jar 
> TestHadoop.jar.)
> # Client crashes. (I simulate this by running kill -9 on the $(hadoop jar 
> TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter".)
> # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
> only 1)
> I believe the lease should be recovered and the block should be marked 
> missing. However this is not happening. The lease is never recovered.
> The effect of this bug for us was that nodes could not be decommissioned 
> cleanly. Although we knew that the client had crashed, the Namenode never 
> released the leases (even after restarting the Namenode) (even months 
> afterwards). There are actually several other cases too where we don't 
> consider what happens if ALL the datanodes die while the file is being 
> written, but I am going to punt on that for another time.
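
For reference, a minimal stand-in client with the same shape as the 
reproduction above. This is a hypothetical sketch, not the reporter's actual 
TestHadoop code: it creates a file, hflushes so data lands on the datanodes, 
then holds the file open until the process is killed.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical stand-in for the reporter's client, for illustration only.
public class TestHadoop {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/tmp/lease-test"));
    out.writeBytes("some data");
    out.hflush(); // data reaches the datanodes; the file stays open (UC)
    System.out.println("Wrote to the bufferedWriter");
    Thread.sleep(Long.MAX_VALUE); // hold the lease until the kill -9
  }
}
{code}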



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7446) HDFS inotify should have the ability to determine what txid it has read up to

2015-08-14 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697725#comment-14697725
 ] 

Ming Ma commented on HDFS-7446:
---

For the 2.6.1 effort, the backport is straightforward. But the API has changed 
compared to 2.6.0. This incompatibility only impacts folks who have been using 
inotify functionality introduced in 2.6.0.

> HDFS inotify should have the ability to determine what txid it has read up to
> -
>
> Key: HDFS-7446
> URL: https://issues.apache.org/jira/browse/HDFS-7446
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.6.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0
>
> Attachments: HDFS-7446.001.patch, HDFS-7446.002.patch, 
> HDFS-7446.003.patch
>
>
> HDFS inotify should have the ability to determine what txid it has read up 
> to.  This will allow users who want to avoid missing any events to record 
> this txid and use it to resume reading events at the spot they left off.
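
A hedged usage sketch of the resulting client pattern (method and class names 
as I understand the 2.7 API; the NameNode URI is a placeholder): record the 
txid of each batch, and pass it back to resume.

{code}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSInotifyEventInputStream;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.inotify.EventBatch;

public class InotifyResumeSketch {
  public static void main(String[] args) throws Exception {
    HdfsAdmin admin =
        new HdfsAdmin(URI.create("hdfs://nn:8020"), new Configuration());
    DFSInotifyEventInputStream stream = admin.getInotifyEventStream();

    long lastReadTxid = 0;
    EventBatch batch;
    while ((batch = stream.poll()) != null) {
      lastReadTxid = batch.getTxid(); // remember how far we have read
      // ... process batch.getEvents() ...
    }

    // Later (e.g. after a restart), resume from the recorded txid so no
    // events are missed:
    DFSInotifyEventInputStream resumed =
        admin.getInotifyEventStream(lastReadTxid);
  }
}
{code}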



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6244) Make Trash Interval configurable for each of the namespaces

2015-08-14 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated HDFS-6244:
--
Status: Open  (was: Patch Available)

> Make Trash Interval configurable for each of the namespaces
> ---
>
> Key: HDFS-6244
> URL: https://issues.apache.org/jira/browse/HDFS-6244
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Assignee: Siqi Li
>  Labels: BB2015-05-TBR
> Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, 
> HDFS-6244.v3.patch, HDFS-6244.v4.patch, HDFS-6244.v5.patch
>
>
> Somehow we need to avoid the cluster filling up.
> One solution is to have a different trash policy per namespace. However, if 
> we can simply make the property configurable per namespace, then the same 
> config can be rolled everywhere and we'd be done. This seems simple enough.
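
For illustration, one way the per-namespace lookup could work. The suffixed 
key scheme below is an assumption for the sketch, not necessarily what the 
patch implements.

{code}
import org.apache.hadoop.conf.Configuration;

public class TrashIntervalSketch {
  // Hypothetical key layout: a per-nameservice override falling back to the
  // cluster-wide fs.trash.interval.
  static long trashIntervalFor(Configuration conf, String nsId) {
    long global = conf.getLong("fs.trash.interval", 0);
    return conf.getLong("fs.trash.interval." + nsId, global);
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.setLong("fs.trash.interval", 1440);    // cluster-wide default
    conf.setLong("fs.trash.interval.ns1", 360); // tighter policy for ns1
    System.out.println(trashIntervalFor(conf, "ns1")); // 360
    System.out.println(trashIntervalFor(conf, "ns2")); // 1440
  }
}
{code}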



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6244) Make Trash Interval configurable for each of the namespaces

2015-08-14 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated HDFS-6244:
--
Status: Patch Available  (was: Open)

> Make Trash Interval configurable for each of the namespaces
> ---
>
> Key: HDFS-6244
> URL: https://issues.apache.org/jira/browse/HDFS-6244
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Assignee: Siqi Li
>  Labels: BB2015-05-TBR
> Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, 
> HDFS-6244.v3.patch, HDFS-6244.v4.patch, HDFS-6244.v5.patch
>
>
> Somehow we need to avoid the cluster filling up.
> One solution is to have a different trash policy per namespace. However, if 
> we can simply make the property configurable per namespace, then the same 
> config can be rolled everywhere and we'd be done. This seems simple enough.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations

2015-08-14 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697693#comment-14697693
 ] 

Arpit Agarwal commented on HDFS-7649:
-

Thanks for catching and taking care of this Nicholas.

> Multihoming docs should emphasize using hostnames in configurations
> ---
>
> Key: HDFS-7649
> URL: https://issues.apache.org/jira/browse/HDFS-7649
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Arpit Agarwal
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: HDFS-7649.patch
>
>
> The docs should emphasize that master and slave configurations should use 
> hostnames wherever possible.
> Link to current docs: 
> https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8824) Do not use small blocks for balancing the cluster

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697679#comment-14697679
 ] 

Hudson commented on HDFS-8824:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8308 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8308/])
HDFS-8824. Do not use small blocks for balancing the cluster. (szetszwo: rev 
2bc0a4f299fbd8035e29f62ce9cd22e209a62805)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java


> Do not use small blocks for balancing the cluster
> -
>
> Key: HDFS-8824
> URL: https://issues.apache.org/jira/browse/HDFS-8824
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h8824_20150727b.patch, h8824_20150811b.patch
>
>
> Balancer gets datanode block lists from the NN and then moves the blocks in 
> order to balance the cluster.  It should not use small blocks, since moving 
> small blocks generates a lot of overhead while they do not help balance the 
> cluster much.
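
A toy sketch of the filtering idea (the 10 MB threshold is illustrative only; 
the committed default and configuration key may differ):

{code}
public class SmallBlockFilterSketch {
  // Illustrative threshold; not necessarily the committed default.
  static final long MIN_BLOCK_SIZE = 10L * 1024 * 1024; // 10 MB

  // Moving a tiny block costs roughly the same scheduling and RPC overhead
  // as a large one but contributes almost nothing to balancing, so skip it.
  static boolean shouldMove(long blockNumBytes) {
    return blockNumBytes >= MIN_BLOCK_SIZE;
  }

  public static void main(String[] args) {
    System.out.println(shouldMove(4096L));              // false
    System.out.println(shouldMove(128L * 1024 * 1024)); // true
  }
}
{code}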



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-08-14 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697681#comment-14697681
 ] 

Ravi Prakash commented on HDFS-8344:


Hi Haohui! There are arguments on both sides (time-based vs. count-based). 
E.g., I may take down the cluster and bring it back up after enough time has 
passed to expire the timeout, in which case we wouldn't have retried enough 
times. 
Please let me know if you feel strongly though, and I can add one more 
configuration for the timeout (in addition to the number of retries). It feels 
like we are over-designing now. This is a rare enough event (the client dies, 
and before the lease expires, so do the nodes it wrote to).

> NameNode doesn't recover lease for files with missing blocks
> 
>
> Key: HDFS-8344
> URL: https://issues.apache.org/jira/browse/HDFS-8344
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Fix For: 2.8.0
>
> Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
> HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
> HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch
>
>
> I found another\(?) instance in which the lease is not recovered. This is 
> easily reproducible on a pseudo-distributed single-node cluster.
> # Before you start, it helps if you set the following. This is not necessary, 
> but it reduces how long you have to wait.
> {code}
>   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
>   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
> LEASE_SOFTLIMIT_PERIOD;
> {code}
> # Client starts to write a file. (It could be less than 1 block, but it 
> hflushed, so some of the data has landed on the datanodes.) (I'm copying the 
> client code I am using. I generate a jar and run it using $ hadoop jar 
> TestHadoop.jar.)
> # Client crashes. (I simulate this by running kill -9 on the $(hadoop jar 
> TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter".)
> # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
> only 1)
> I believe the lease should be recovered and the block should be marked 
> missing. However this is not happening. The lease is never recovered.
> The effect of this bug for us was that nodes could not be decommissioned 
> cleanly. Although we knew that the client had crashed, the Namenode never 
> released the leases (even after restarting the Namenode) (even months 
> afterwards). There are actually several other cases too where we don't 
> consider what happens if ALL the datanodes die while the file is being 
> written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations

2015-08-14 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697668#comment-14697668
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7649:
---

It seems that this was only committed to trunk but not yet merged to branch-2.

> Multihoming docs should emphasize using hostnames in configurations
> ---
>
> Key: HDFS-7649
> URL: https://issues.apache.org/jira/browse/HDFS-7649
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Arpit Agarwal
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: HDFS-7649.patch
>
>
> The docs should emphasize that master and slave configurations should use 
> hostnames wherever possible.
> Link to current docs: 
> https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations

2015-08-14 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697669#comment-14697669
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7649:
---

Merged this to branch-2.

> Multihoming docs should emphasize using hostnames in configurations
> ---
>
> Key: HDFS-7649
> URL: https://issues.apache.org/jira/browse/HDFS-7649
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Arpit Agarwal
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: HDFS-7649.patch
>
>
> The docs should emphasize that master and slave configurations should use 
> hostnames wherever possible.
> Link to current docs: 
> https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8565) Typo in dfshealth.html - "Decomissioning"

2015-08-14 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-8565:
-
Fix Version/s: 2.8.0

> Typo in dfshealth.html - "Decomissioning"
> -
>
> Key: HDFS-8565
> URL: https://issues.apache.org/jira/browse/HDFS-8565
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: HDFS-8565.patch
>
>
> Decomissioning
> change to 
> Decommissioning
> in dfshealth.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8565) Typo in dfshealth.html - "Decomissioning"

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697646#comment-14697646
 ] 

Hudson commented on HDFS-8565:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8307 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8307/])
HDFS-8565. Typo in dfshealth.html - Decomissioning. (nijel via xyao) (xyao: rev 
1569228ec9090823186f062257fdf1beb5ee1781)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.html


> Typo in dfshealth.html - "Decomissioning"
> -
>
> Key: HDFS-8565
> URL: https://issues.apache.org/jira/browse/HDFS-8565
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
> Attachments: HDFS-8565.patch
>
>
> Decomissioning
> change to 
> Decommissioning
> in dfshealth.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8565) Typo in dfshealth.html - "Decomissioning"

2015-08-14 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-8565:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks [~nijel] for the contribution. The patch has been committed to trunk and 
branch-2.

> Typo in dfshealth.html - "Decomissioning"
> -
>
> Key: HDFS-8565
> URL: https://issues.apache.org/jira/browse/HDFS-8565
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
> Attachments: HDFS-8565.patch
>
>
> Decomissioning
> change to 
> Decommissioning
> in dfshealth.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8565) Typo in dfshealth.html - "Decomissioning"

2015-08-14 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697592#comment-14697592
 ] 

Xiaoyu Yao commented on HDFS-8565:
--

+1. Patch LGTM. I will commit it shortly.

> Typo in dfshealth.html - "Decomissioning"
> -
>
> Key: HDFS-8565
> URL: https://issues.apache.org/jira/browse/HDFS-8565
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
> Attachments: HDFS-8565.patch
>
>
> Decomissioning
> change to 
> Decommissioning
> in dfshealth.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6244) Make Trash Interval configurable for each of the namespaces

2015-08-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697564#comment-14697564
 ] 

Hadoop QA commented on HDFS-6244:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 24s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:red}-1{color} | javac |   1m 37s | The patch appears to cause the 
build to fail. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12750579/HDFS-6244.v5.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 84bf712 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11999/console |


This message was automatically generated.

> Make Trash Interval configurable for each of the namespaces
> ---
>
> Key: HDFS-6244
> URL: https://issues.apache.org/jira/browse/HDFS-6244
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Assignee: Siqi Li
>  Labels: BB2015-05-TBR
> Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, 
> HDFS-6244.v3.patch, HDFS-6244.v4.patch, HDFS-6244.v5.patch
>
>
> Somehow we need to avoid the cluster filling up.
> One solution is to have a different trash policy per namespace. However, if 
> we can simply make the property configurable per namespace, then the same 
> config can be rolled everywhere and we'd be done. This seems simple enough.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize

2015-08-14 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697551#comment-14697551
 ] 

Rakesh R commented on HDFS-8220:


I've rebased the previous patch on the {{HDFS-7285-merge}} branch and attached 
it here.

> Erasure Coding: StripedDataStreamer fails to handle the blocklocations which 
> doesn't satisfy BlockGroupSize
> ---
>
> Key: HDFS-8220
> URL: https://issues.apache.org/jira/browse/HDFS-8220
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, 
> HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, 
> HDFS-8220-HDFS-7285-10.patch, HDFS-8220-HDFS-7285-merge-10.patch, 
> HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, 
> HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, 
> HDFS-8220-HDFS-7285.008.patch
>
>
> During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to 
> validate the available datanodes against the {{BlockGroupSize}}. Please see 
> the exception below for more detail:
> {code}
> 2015-04-22 14:56:11,313 WARN  hdfs.DFSClient (DataStreamer.java:run(538)) - 
> DataStreamer Exception
> java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> 2015-04-22 14:56:11,313 INFO  hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
> 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient 
> (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
> java.io.IOException: DataStreamer Exception: 
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> Caused by: java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   ... 1 more
> {code}
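
A standalone sketch of the missing validation (the real check belongs in 
{{StripedDataStreamer#locateFollowingBlock}} and works on {{LocatedBlock}}; 
the 6+3 numbers below are a hypothetical schema for illustration):

{code}
import java.io.IOException;

public class BlockGroupSizeCheckSketch {
  static void verifyBlockGroup(int allocatedLocations, int numDataBlocks,
      int numParityBlocks) throws IOException {
    int groupSize = numDataBlocks + numParityBlocks;
    if (allocatedLocations < groupSize) {
      // Failing fast here avoids the downstream NullPointerException when a
      // missing follow-up block is offered to a per-streamer queue.
      throw new IOException("Allocated only " + allocatedLocations
          + " datanode locations but the block group needs " + groupSize);
    }
  }

  public static void main(String[] args) throws IOException {
    verifyBlockGroup(9, 6, 3); // enough locations, passes
    verifyBlockGroup(5, 6, 3); // too few, throws IOException
  }
}
{code}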



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8898) Create API and command-line argument to get quota without need to get file and directory counts

2015-08-14 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697547#comment-14697547
 ] 

Jason Lowe commented on HDFS-8898:
--

This would solve a significant annoyance with computing quotas on a shared 
tree.  However, I think it has security implications.  If one can get the 
quota totals for the entire tree, then they can calculate what must be used by 
the parts they cannot access via quota_usage - usage_visible.  If what is 
being stored in the restricted area is sensitive (e.g., records related to 
financials), then knowing how many files there are or the size of the 
restricted data could leak sensitive information.
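
To make the arithmetic concrete, a toy example (all numbers hypothetical):

{code}
public class QuotaLeakSketch {
  public static void main(String[] args) {
    long quotaUsage   = 10_000_000L; // whole-tree usage exposed via the quota
    long usageVisible =  7_500_000L; // what this user can enumerate directly
    // The difference is exactly the usage of the restricted sub-directories:
    System.out.println(quotaUsage - usageVisible); // 2500000
  }
}
{code}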

> Create API and command-line argument to get quota without need to get file 
> and directory counts
> ---
>
> Key: HDFS-8898
> URL: https://issues.apache.org/jira/browse/HDFS-8898
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs
>Reporter: Joep Rottinghuis
>
> On large directory structures it takes significant time to iterate through 
> the file and directory counts recursively to get a complete ContentSummary.
> When you want to just check for the quota on a higher level directory it 
> would be good to have an option to skip the file and directory counts.
> Moreover, currently one can only check the quota if you have access to all 
> the directories underneath. For example, if I have a large home directory 
> under /user/joep and I host some files for another user in a sub-directory, 
> the moment they create an unreadable sub-directory under my home I can no 
> longer check what my quota is. Understood that I cannot check the current 
> file counts unless I can iterate through all the usage, but for 
> administrative purposes it is nice to be able to get the current quota 
> setting on a directory without the need to iterate through and run into 
> permission issues on sub-directories.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize

2015-08-14 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-8220:
---
Attachment: HDFS-8220-HDFS-7285-merge-10.patch

> Erasure Coding: StripedDataStreamer fails to handle the blocklocations which 
> doesn't satisfy BlockGroupSize
> ---
>
> Key: HDFS-8220
> URL: https://issues.apache.org/jira/browse/HDFS-8220
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, 
> HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, 
> HDFS-8220-HDFS-7285-10.patch, HDFS-8220-HDFS-7285-merge-10.patch, 
> HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, 
> HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, 
> HDFS-8220-HDFS-7285.008.patch
>
>
> During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to 
> validate the available datanodes against the {{BlockGroupSize}}. Please see 
> the exception below for more detail:
> {code}
> 2015-04-22 14:56:11,313 WARN  hdfs.DFSClient (DataStreamer.java:run(538)) - 
> DataStreamer Exception
> java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> 2015-04-22 14:56:11,313 INFO  hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
> 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient 
> (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
> java.io.IOException: DataStreamer Exception: 
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> Caused by: java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   ... 1 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6244) Make Trash Interval configurable for each of the namespaces

2015-08-14 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated HDFS-6244:
--
Attachment: HDFS-6244.v5.patch

> Make Trash Interval configurable for each of the namespaces
> ---
>
> Key: HDFS-6244
> URL: https://issues.apache.org/jira/browse/HDFS-6244
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Assignee: Siqi Li
>  Labels: BB2015-05-TBR
> Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, 
> HDFS-6244.v3.patch, HDFS-6244.v4.patch, HDFS-6244.v5.patch
>
>
> Somehow we need to avoid the cluster filling up.
> One solution is to have a different trash policy per namespace. However, if 
> we can simply make the property configurable per namespace, then the same 
> config can be rolled everywhere and we'd be done. This seems simple enough.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6244) Make Trash Interval configurable for each of the namespaces

2015-08-14 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated HDFS-6244:
--
Attachment: (was: HDFS-6244.v5.patch)

> Make Trash Interval configurable for each of the namespaces
> ---
>
> Key: HDFS-6244
> URL: https://issues.apache.org/jira/browse/HDFS-6244
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Assignee: Siqi Li
>  Labels: BB2015-05-TBR
> Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, 
> HDFS-6244.v3.patch, HDFS-6244.v4.patch
>
>
> Somehow we need to avoid the cluster filling up.
> One solution is to have a different trash policy per namespace. However, if 
> we can simply make the property configurable per namespace, then the same 
> config can be rolled everywhere and we'd be done. This seems simple enough.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6244) Make Trash Interval configurable for each of the namespaces

2015-08-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697529#comment-14697529
 ] 

Hadoop QA commented on HDFS-6244:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  1s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12750575/HDFS-6244.v5.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 84bf712 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11998/console |


This message was automatically generated.

> Make Trash Interval configurable for each of the namespaces
> ---
>
> Key: HDFS-6244
> URL: https://issues.apache.org/jira/browse/HDFS-6244
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Assignee: Siqi Li
>  Labels: BB2015-05-TBR
> Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, 
> HDFS-6244.v3.patch, HDFS-6244.v4.patch, HDFS-6244.v5.patch
>
>
> Somehow we need to avoid the cluster filling up.
> One solution is to have a different trash policy per namespace. However, if 
> we can simply make the property configurable per namespace, then the same 
> config can be rolled everywhere and we'd be done. This seems simple enough.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6244) Make Trash Interval configurable for each of the namespaces

2015-08-14 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated HDFS-6244:
--
Attachment: HDFS-6244.v5.patch

> Make Trash Interval configurable for each of the namespaces
> ---
>
> Key: HDFS-6244
> URL: https://issues.apache.org/jira/browse/HDFS-6244
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Assignee: Siqi Li
>  Labels: BB2015-05-TBR
> Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, 
> HDFS-6244.v3.patch, HDFS-6244.v4.patch, HDFS-6244.v5.patch
>
>
> Somehow we need to avoid the cluster filling up.
> One solution is to have a different trash policy per namespace. However, if 
> we can simply make the property configurable per namespace, then the same 
> config can be rolled everywhere and we'd be done. This seems simple enough.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8899) Erasure Coding: use threadpool for EC recovery tasks

2015-08-14 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8899:
--

 Summary: Erasure Coding: use threadpool for EC recovery tasks
 Key: HDFS-8899
 URL: https://issues.apache.org/jira/browse/HDFS-8899
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


The idea is to use a thread pool for processing erasure coding recovery tasks 
at the datanode, instead of spawning a new daemon thread per task:

{code}
new Daemon(new ReconstructAndTransferBlock(recoveryInfo)).start();
{code}
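
A minimal sketch of the replacement (the pool bounds are placeholders; a real 
pool would be sized via configuration):

{code}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class EcWorkerPoolSketch {
  // Placeholder bounds; a real pool would be sized via configuration.
  private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
      2, 8, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());

  // Instead of one Daemon thread per recovery, hand the task to a shared,
  // bounded pool so concurrent recoveries don't pile up threads.
  void submitRecovery(Runnable reconstructAndTransferBlock) {
    pool.execute(reconstructAndTransferBlock);
  }

  void shutdown() {
    pool.shutdown();
  }
}
{code}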



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6955) DN should reserve disk space for a full block when creating tmp files

2015-08-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697397#comment-14697397
 ] 

Hadoop QA commented on HDFS-6955:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 23s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   7m 47s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 49s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 22s | The applied patch generated  3 
new checkstyle issues (total was 154, now 155). |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 23s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 41s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m  3s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 55s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 146m  6s | Tests failed in hadoop-hdfs. |
| | | 191m 58s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.blockmanagement.TestNodeCount |
| Timed out tests | 
org.apache.hadoop.hdfs.server.namenode.ha.TestFailureOfSharedDir |
|   | org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12750505/HDFS-6955-01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 84bf712 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/11997/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11997/artifact/patchprocess/whitespace.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11997/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11997/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11997/console |


This message was automatically generated.

> DN should reserve disk space for a full block when creating tmp files
> -
>
> Key: HDFS-6955
> URL: https://issues.apache.org/jira/browse/HDFS-6955
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Arpit Agarwal
>Assignee: kanaka kumar avvaru
> Attachments: HDFS-6955-01.patch
>
>
> HDFS-6898 is introducing disk space reservation for RBW files to avoid 
> running out of disk space midway through block creation.
> This Jira is to introduce similar reservation for tmp files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8093) BP does not exist or is not under Constructionnull

2015-08-14 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697395#comment-14697395
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8093:
---

The file /system/balancer.id seems to have been deleted.  Could you grep for 
/system/balancer.id in the NN log?

Also, are there any other log messages between 2015-08-14 00:30:03,843 and 
2015-08-14 00:30:04,000?

> BP does not exist or is not under Constructionnull
> --
>
> Key: HDFS-8093
> URL: https://issues.apache.org/jira/browse/HDFS-8093
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.6.0
> Environment: Centos 6.5
>Reporter: LINTE
>
> The HDFS balancer ran for several hours balancing blocks between datanodes, 
> then ended by failing with the following error.
> The getStoredBlock function returned a null BlockInfo.
> java.io.IOException: Bad response ERROR for block 
> BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 from 
> datanode 192.168.0.18:1004
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:897)
> 15/04/08 05:52:51 WARN hdfs.DFSClient: Error Recovery for block 
> BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 in pipeline 
> 192.168.0.63:1004, 192.168.0.1:1004, 192.168.0.18:1004: bad datanode 
> 192.168.0.18:1004
> 15/04/08 05:52:51 WARN hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
> BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 does not 
> exist or is not under Constructionnull
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6913)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6980)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:717)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:931)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> at org.apache.hadoop.ipc.Client.call(Client.java:1468)
> at org.apache.hadoop.ipc.Client.call(Client.java:1399)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy11.updateBlockForPipeline(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolTranslatorPB.java:877)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy12.updateBlockForPipeline(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1266)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1004)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:548)
> 15/04/08 05:52:51 ERROR hdfs.DFSClient: Failed to close inode 19801755
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
> BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 does not 
> exist or is not under Constructionnull
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6913)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSName

[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize

2015-08-14 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697377#comment-14697377
 ] 

Zhe Zhang commented on HDFS-8220:
-

[~rakeshr] Yes, it'd be great if you can create a patch for 
{{HDFS-7285-merge}}. I don't think there will be much conflict, since this 
change is on the client side.

> Erasure Coding: StripedDataStreamer fails to handle the blocklocations which 
> doesn't satisfy BlockGroupSize
> ---
>
> Key: HDFS-8220
> URL: https://issues.apache.org/jira/browse/HDFS-8220
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, 
> HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, 
> HDFS-8220-HDFS-7285-10.patch, HDFS-8220-HDFS-7285.005.patch, 
> HDFS-8220-HDFS-7285.006.patch, HDFS-8220-HDFS-7285.007.patch, 
> HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.008.patch
>
>
> During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to 
> validate the available datanodes against the {{BlockGroupSize}}. Please see 
> the exception below for more detail:
> {code}
> 2015-04-22 14:56:11,313 WARN  hdfs.DFSClient (DataStreamer.java:run(538)) - 
> DataStreamer Exception
> java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> 2015-04-22 14:56:11,313 INFO  hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
> 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient 
> (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
> java.io.IOException: DataStreamer Exception: 
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> Caused by: java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   ... 1 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small

2015-08-14 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697330#comment-14697330
 ] 

Walter Su commented on HDFS-8838:
-

The failed tests are not related.
+1 for the last patch (20150809.patch).

> Tolerate datanode failures in DFSStripedOutputStream when the data length is 
> small
> --
>
> Key: HDFS-8838
> URL: https://issues.apache.org/jira/browse/HDFS-8838
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: HDFS-8838-HDFS-7285-000.patch, 
> HDFS-8838-HDFS-7285-20150809-test.patch, HDFS-8838-HDFS-7285-20150809.patch, 
> h8838_20150729.patch, h8838_20150731-HDFS-7285.patch, h8838_20150731.log, 
> h8838_20150731.patch, h8838_20150804-HDFS-7285.patch, h8838_20150809.patch
>
>
> Currently, DFSStripedOutputStream cannot tolerate datanode failures when the 
> data length is small.  We fix the bugs here and add more tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize

2015-08-14 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697324#comment-14697324
 ] 

Rakesh R commented on HDFS-8220:


Any more comments on the attached patch. Hi [~zhz], I hope {{HDFS-7285-merge}} 
is the active branch, should I create another patch now?

> Erasure Coding: StripedDataStreamer fails to handle the blocklocations which 
> doesn't satisfy BlockGroupSize
> ---
>
> Key: HDFS-8220
> URL: https://issues.apache.org/jira/browse/HDFS-8220
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, 
> HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, 
> HDFS-8220-HDFS-7285-10.patch, HDFS-8220-HDFS-7285.005.patch, 
> HDFS-8220-HDFS-7285.006.patch, HDFS-8220-HDFS-7285.007.patch, 
> HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.008.patch
>
>
> During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to 
> validate the available datanodes against the {{BlockGroupSize}}. Please see 
> the exception below for more detail:
> {code}
> 2015-04-22 14:56:11,313 WARN  hdfs.DFSClient (DataStreamer.java:run(538)) - 
> DataStreamer Exception
> java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> 2015-04-22 14:56:11,313 INFO  hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
> 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient 
> (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
> java.io.IOException: DataStreamer Exception: 
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
> Caused by: java.lang.NullPointerException
>   at 
> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
>   ... 1 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8896) DataNode object isn't GCed when shutdown, because it has GC root in ShutdownHookManager

2015-08-14 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697316#comment-14697316
 ] 

Walter Su commented on HDFS-8896:
-

The failed tests also failed before the 
patch([link|https://builds.apache.org/job/PreCommit-HDFS-Build/11989/testReport/]),
 so they are not related.

> DataNode object isn't GCed when shutdown, because it has GC root in 
> ShutdownHookManager
> ---
>
> Key: HDFS-8896
> URL: https://issues.apache.org/jira/browse/HDFS-8896
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Walter Su
>Assignee: Walter Su
>Priority: Minor
> Attachments: HDFS-8896.01.patch, screenshot_1.PNG, screenshot_2.PNG
>
>
> The anonymous {{Thread}} object created in {{ShutdownHookManager}} is a GC 
> root.
> screenshot_1 shows how a DN object can be traced to the GC root.
> It's not a problem in production.
> It is a problem in tests, especially when MiniDFSCluster starts and shuts 
> down many DNs, which could cause an {{OutOfMemoryError}}.
> screenshot_2 shows many DN objects that are not GCed when running the test of 
> HDFS-8838.
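
One possible shape of a fix, sketched here around Hadoop's 
{{ShutdownHookManager}} (the priority value and class structure are 
illustrative): deregister the hook on an orderly shutdown so the manager no 
longer pins the object.

{code}
import org.apache.hadoop.util.ShutdownHookManager;

public class HookCleanupSketch {
  private Runnable hook;

  void start() {
    hook = new Runnable() {
      @Override
      public void run() {
        doStop(); // JVM-initiated shutdown path
      }
    };
    ShutdownHookManager.get().addShutdownHook(hook, 10 /* priority */);
  }

  // Explicit shutdown (the MiniDFSCluster test case): removing the hook drops
  // the reference held by ShutdownHookManager, i.e. the GC root, so this
  // object can be collected after stop().
  void stop() {
    if (hook != null) {
      ShutdownHookManager.get().removeShutdownHook(hook);
      hook = null;
    }
    doStop();
  }

  private void doStop() {
    // release resources here
  }
}
{code}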



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8898) Create API and command-line argument to get quota without need to get file and directory counts

2015-08-14 Thread Joep Rottinghuis (JIRA)
Joep Rottinghuis created HDFS-8898:
--

 Summary: Create API and command-line argument to get quota without 
need to get file and directory counts
 Key: HDFS-8898
 URL: https://issues.apache.org/jira/browse/HDFS-8898
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: fs
Reporter: Joep Rottinghuis


On large directory structures it takes significant time to iterate through the 
file and directory counts recursively to get a complete ContentSummary.
When you want to just check for the quota on a higher level directory it would 
be good to have an option to skip the file and directory counts.

Moreover, currently one can only check the quota if you have access to all the 
directories underneath. For example, if I have a large home directory under 
/user/joep and I host some files for another user in a sub-directory, the 
moment they create an unreadable sub-directory under my home I can no longer 
check what my quota is. Understood that I cannot check the current file counts 
unless I can iterate through all the usage, but for administrative purposes it 
is nice to be able to get the current quota setting on a directory without the 
need to iterate through and run into permission issues on sub-directories.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small

2015-08-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697240#comment-14697240
 ] 

Hadoop QA commented on HDFS-8838:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 56s | Findbugs (version ) appears to 
be broken on HDFS-7285. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 5 new or modified test files. |
| {color:green}+1{color} | javac |   7m 35s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 43s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 15s | The applied patch generated 
1 release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 20s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  3s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   5m 20s | The patch appears to introduce 5 
new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |  21m 55s | Tests failed in 
hadoop-common. |
| {color:red}-1{color} | hdfs tests | 204m 28s | Tests failed in hadoop-hdfs. |
| | | 268m 52s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| Failed unit tests | hadoop.net.TestNetUtils |
|   | hadoop.ha.TestZKFailoverController |
|   | hadoop.hdfs.server.namenode.TestFileTruncate |
|   | hadoop.hdfs.TestCrcCorruption |
|   | hadoop.hdfs.TestWriteStripedFileWithFailure |
| Timed out tests | org.apache.hadoop.cli.TestHDFSCLI |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12750492/HDFS-8838-HDFS-7285-20150809-test.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | HDFS-7285 / 1d37a88 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11996/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11996/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11996/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11996/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11996/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11996/console |


This message was automatically generated.

> Tolerate datanode failures in DFSStripedOutputStream when the data length is 
> small
> --
>
> Key: HDFS-8838
> URL: https://issues.apache.org/jira/browse/HDFS-8838
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: HDFS-8838-HDFS-7285-000.patch, 
> HDFS-8838-HDFS-7285-20150809-test.patch, HDFS-8838-HDFS-7285-20150809.patch, 
> h8838_20150729.patch, h8838_20150731-HDFS-7285.patch, h8838_20150731.log, 
> h8838_20150731.patch, h8838_20150804-HDFS-7285.patch, h8838_20150809.patch
>
>
> Currently, DFSStripedOutputStream cannot tolerate datanode failures when the 
> data length is small.  We fix the bugs here and add more tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8896) DataNode object isn't GCed when shutdown, because it has GC root in ShutdownHookManager

2015-08-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697154#comment-14697154
 ] 

Hadoop QA commented on HDFS-8896:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 53s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 40s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 38s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m 12s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 24s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 23s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |  22m 25s | Tests failed in 
hadoop-common. |
| {color:red}-1{color} | hdfs tests | 174m 49s | Tests failed in hadoop-hdfs. |
| | | 242m 23s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.ha.TestZKFailoverController |
|   | hadoop.net.TestNetUtils |
| Timed out tests | org.apache.hadoop.cli.TestHDFSCLI |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12750491/HDFS-8896.01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 84bf712 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11995/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11995/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11995/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11995/console |


This message was automatically generated.

> DataNode object isn't GCed when shutdown, because it has GC root in 
> ShutdownHookManager
> ---
>
> Key: HDFS-8896
> URL: https://issues.apache.org/jira/browse/HDFS-8896
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Walter Su
>Assignee: Walter Su
>Priority: Minor
> Attachments: HDFS-8896.01.patch, screenshot_1.PNG, screenshot_2.PNG
>
>
> The anonymous {{Thread}} object created in {{ShutdownHookManager}} is a GC 
> root.
> screenshot_1 shows how the DN object can be traced to the GC root.
> It's not a problem in production.
> It is a problem in tests, especially when MiniDFSCluster starts and shuts 
> down many DNs, which can cause an {{OutOfMemoryError}}.
> screenshot_2 shows many DN objects that are not GCed when running the test 
> for HDFS-8838.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8897) Loadbalancer always exits with : java.io.IOException: Another Balancer is running.. Exiting ...

2015-08-14 Thread LINTE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LINTE updated HDFS-8897:

Summary: Loadbalancer always exits with : java.io.IOException: Another 
Balancer is running..  Exiting ...  (was: Loadbalancer )

> Loadbalancer always exits with : java.io.IOException: Another Balancer is 
> running..  Exiting ...
> 
>
> Key: HDFS-8897
> URL: https://issues.apache.org/jira/browse/HDFS-8897
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.7.1
> Environment: Centos 6.6
>Reporter: LINTE
>
> When the balancer is launched, it should test whether there is already a 
> /system/balancer.id file in HDFS.
> Even when the file doesn't exist, the balancer refuses to run: 
> 15/08/14 16:35:12 INFO balancer.Balancer: namenodes  = [hdfs://sandbox/, 
> hdfs://sandbox]
> 15/08/14 16:35:12 INFO balancer.Balancer: parameters = 
> Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, max idle iteration 
> = 5, number of nodes to be excluded = 0, number of nodes to be included = 0]
> Time Stamp   Iteration#  Bytes Already Moved  Bytes Left To Move  
> Bytes Being Moved
> 15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from 
> NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
> 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
> 15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 
> 30mins, 0sec
> 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
> 15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from 
> NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
> 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
> 15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 
> 30mins, 0sec
> java.io.IOException: Another Balancer is running..  Exiting ...
> Aug 14, 2015 4:35:14 PM  Balancing took 2.408 seconds
> Looking at the audit log while trying to run the balancer, the balancer 
> creates /system/balancer.id and then deletes it on exiting ... 
> 2015-08-14 16:37:45,844 INFO FSNamesystem.audit: allowed=true   
> ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=getfileinfo 
> src=/system/balancer.id dst=nullperm=null   proto=rpc
> 2015-08-14 16:37:45,900 INFO FSNamesystem.audit: allowed=true   
> ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=create  
> src=/system/balancer.id dst=nullperm=hdfs:hadoop:rw-r-  
> proto=rpc
> 2015-08-14 16:37:45,919 INFO FSNamesystem.audit: allowed=true   
> ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=getfileinfo 
> src=/system/balancer.id dst=nullperm=null   proto=rpc
> 2015-08-14 16:37:46,090 INFO FSNamesystem.audit: allowed=true   
> ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=getfileinfo 
> src=/system/balancer.id dst=nullperm=null   proto=rpc
> 2015-08-14 16:37:46,112 INFO FSNamesystem.audit: allowed=true   
> ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=getfileinfo 
> src=/system/balancer.id dst=nullperm=null   proto=rpc
> 2015-08-14 16:37:46,117 INFO FSNamesystem.audit: allowed=true   
> ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=delete  
> src=/system/balancer.id dst=nullperm=null   proto=rpc
> The error seems to be located in 
> org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java. 
> The function checkAndMarkRunning returns null even if /system/balancer.id 
> doesn't exist before entering this function; if it exists, then it is deleted 
> and the balancer exits with the same error.
> {code}
> private OutputStream checkAndMarkRunning() throws IOException {
>   try {
>     if (fs.exists(idPath)) {
>       // try appending to it so that it will fail fast if another balancer is
>       // running.
>       IOUtils.closeStream(fs.append(idPath));
>       fs.delete(idPath, true);
>     }
>     final FSDataOutputStream fsout = fs.create(idPath, false);
>     // mark balancer idPath to be deleted during filesystem closure
>     fs.deleteOnExit(idPath);
>     if (write2IdFile) {
>       fsout.writeBytes(InetAddress.getLocalHost().getHostName());
>       fsout.hflush();
>     }
>     return fsout;
>   } catch (RemoteException e) {
>     if (AlreadyBeingCreatedException.class.getName().equals(e.getClassName())) {
>       return null;
>     } else {
>       throw e;
>     }
>   }
> }
> {code}
> Regards



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8897) Loadbalancer

2015-08-14 Thread LINTE (JIRA)
LINTE created HDFS-8897:
---

 Summary: Loadbalancer 
 Key: HDFS-8897
 URL: https://issues.apache.org/jira/browse/HDFS-8897
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover
Affects Versions: 2.7.1
 Environment: Centos 6.6
Reporter: LINTE


When the balancer is launched, it should test whether there is already a 
/system/balancer.id file in HDFS.

Even when the file doesn't exist, the balancer refuses to run: 

15/08/14 16:35:12 INFO balancer.Balancer: namenodes  = [hdfs://sandbox/, 
hdfs://sandbox]
15/08/14 16:35:12 INFO balancer.Balancer: parameters = 
Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, max idle iteration = 
5, number of nodes to be excluded = 0, number of nodes to be included = 0]
Time Stamp   Iteration#  Bytes Already Moved  Bytes Left To Move  
Bytes Being Moved
15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from 
NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 
30mins, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from 
NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 
30mins, 0sec
java.io.IOException: Another Balancer is running..  Exiting ...
Aug 14, 2015 4:35:14 PM  Balancing took 2.408 seconds


Looking at the audit log while trying to run the balancer, the balancer 
creates /system/balancer.id and then deletes it on exiting ... 

2015-08-14 16:37:45,844 INFO FSNamesystem.audit: allowed=true   
ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=getfileinfo 
src=/system/balancer.id dst=nullperm=null   proto=rpc
2015-08-14 16:37:45,900 INFO FSNamesystem.audit: allowed=true   
ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=create  
src=/system/balancer.id dst=nullperm=hdfs:hadoop:rw-r-  
proto=rpc
2015-08-14 16:37:45,919 INFO FSNamesystem.audit: allowed=true   
ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=getfileinfo 
src=/system/balancer.id dst=nullperm=null   proto=rpc
2015-08-14 16:37:46,090 INFO FSNamesystem.audit: allowed=true   
ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=getfileinfo 
src=/system/balancer.id dst=nullperm=null   proto=rpc
2015-08-14 16:37:46,112 INFO FSNamesystem.audit: allowed=true   
ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=getfileinfo 
src=/system/balancer.id dst=nullperm=null   proto=rpc
2015-08-14 16:37:46,117 INFO FSNamesystem.audit: allowed=true   
ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=delete  
src=/system/balancer.id dst=nullperm=null   proto=rpc

The error seems to be located in 
org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java. 

The function checkAndMarkRunning returns null even if /system/balancer.id 
doesn't exist before entering this function; if it exists, then it is deleted 
and the balancer exits with the same error.




{code}
private OutputStream checkAndMarkRunning() throws IOException {
  try {
    if (fs.exists(idPath)) {
      // try appending to it so that it will fail fast if another balancer is
      // running.
      IOUtils.closeStream(fs.append(idPath));
      fs.delete(idPath, true);
    }
    final FSDataOutputStream fsout = fs.create(idPath, false);
    // mark balancer idPath to be deleted during filesystem closure
    fs.deleteOnExit(idPath);
    if (write2IdFile) {
      fsout.writeBytes(InetAddress.getLocalHost().getHostName());
      fsout.hflush();
    }
    return fsout;
  } catch (RemoteException e) {
    if (AlreadyBeingCreatedException.class.getName().equals(e.getClassName())) {
      return null;
    } else {
      throw e;
    }
  }
}
{code}



Regards




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697144#comment-14697144
 ] 

Hudson commented on HDFS-7649:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/])
HDFS-7649. Multihoming docs should emphasize using hostnames in configurations. 
(Contributed by Brahma Reddy Battula) (arp: rev 
ae57d60d8239916312bca7149e2285b2ed3b123a)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsMultihoming.md


> Multihoming docs should emphasize using hostnames in configurations
> ---
>
> Key: HDFS-7649
> URL: https://issues.apache.org/jira/browse/HDFS-7649
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Arpit Agarwal
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: HDFS-7649.patch
>
>
> The docs should emphasize that master and slave configurations should use 
> hostnames wherever possible.
> Link to current docs: 
> https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7263) Snapshot read can reveal future bytes for appended files.

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697139#comment-14697139
 ] 

Hudson commented on HDFS-7263:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/])
HDFS-7263. Snapshot read can reveal future bytes for appended files. 
Contributed by Tao Luo. (vinayakumarb: rev 
fa2641143c0d74c4fef122d79f27791e15d3b43f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Snapshot read can reveal future bytes for appended files.
> -
>
> Key: HDFS-7263
> URL: https://issues.apache.org/jira/browse/HDFS-7263
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Tao Luo
> Fix For: 2.7.0, 2.6.1
>
> Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, 
> TestSnapshotRead.java
>
>
> The following sequence of steps will produce extra bytes that should not be 
> visible, because they are not in the snapshot:
> * Create a file of size L, where {{L % blockSize != 0}}.
> * Create a snapshot.
> * Append bytes to the file.
> * Read the file in the snapshot (not the current file).
> * You will see that bytes are read beyond the original file size L.
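A hedged reproduction sketch of the sequence above; paths and sizes are 
illustrative, and snapshots are assumed to be allowed on the parent directory:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SnapshotReadRepro {
  public static void main(String[] args) throws Exception {
    DistributedFileSystem dfs =
        (DistributedFileSystem) FileSystem.get(new Configuration());
    Path dir = new Path("/dir");
    Path file = new Path(dir, "f");
    try (FSDataOutputStream out = dfs.create(file)) {
      out.write(new byte[100]); // L = 100, so L % blockSize != 0
    }
    dfs.allowSnapshot(dir);
    dfs.createSnapshot(dir, "s1");
    try (FSDataOutputStream out = dfs.append(file)) {
      out.write(new byte[50]); // appended after the snapshot
    }
    // Reading the snapshot copy must yield exactly 100 bytes; the bug
    // made the appended bytes visible too.
    int total = 0;
    byte[] buf = new byte[4096];
    try (FSDataInputStream in = dfs.open(new Path("/dir/.snapshot/s1/f"))) {
      for (int n; (n = in.read(buf)) > 0; total += n) {
      }
    }
    System.out.println("bytes visible in snapshot: " + total);
  }
}
{code}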



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697135#comment-14697135
 ] 

Hudson commented on HDFS-7235:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/])
HDFS-7235. DataNode#transferBlock should report blocks that don't exist using 
reportBadBlock (yzhang via cmccabe) (vinayakumarb: rev 
f2b4bc9b6a1bd3f9dbfc4e85c1b9bde238da3627)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> DataNode#transferBlock should report blocks that don't exist using 
> reportBadBlock
> -
>
> Key: HDFS-7235
> URL: https://issues.apache.org/jira/browse/HDFS-7235
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Fix For: 2.7.0, 2.6.1
>
> Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, 
> HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, 
> HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch
>
>
> When decommissioning a DN, the process hangs.
> What happens is, when the NN chooses a replica as a source to replicate data 
> from the to-be-decommissioned DN to other DNs, it favors choosing the 
> to-be-decommissioned DN itself as the source of the transfer (see 
> BlockManager.java). However, because of the bad disk, the DN detects the 
> source block to be transferred as an invalid block, with the following logic 
> in FsDatasetImpl.java:
> {code}
> /** Does the block exist and have the given state? */
> private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
>   final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(),
>       b.getLocalBlock());
>   return replicaInfo != null
>       && replicaInfo.getState() == state
>       && replicaInfo.getBlockFile().exists();
> }
> {code}
> This method returns false (detecting an invalid block) because the block 
> file doesn't exist, due to the bad disk in this case.
> The key issue we found is that after the DN detects an invalid block for the 
> above reason, it doesn't report the invalid block back to the NN. The NN 
> therefore doesn't know the block is corrupted and keeps sending the data 
> transfer request to the same to-be-decommissioned DN, again and again. This 
> causes an infinite loop, so the decommission process hangs.
> Thanks [~qwertymaniac] for reporting the issue and the initial analysis.
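A sketch of the fix direction with simplified signatures (the committed change 
is in the attached patches): when the replica is found invalid, report it to 
the NN instead of failing silently.
{code}
// Inside DataNode#transferBlock (simplified and illustrative, not the
// committed patch):
if (!data.isValidBlock(block)) {
  // Tell the NN the replica is bad so it stops picking this DN as the
  // replication source and schedules the work elsewhere.
  reportBadBlocks(block);
  LOG.warn("Can't replicate block " + block
      + " because the block file doesn't exist or is not valid");
  return;
}
{code}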



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7213) processIncrementalBlockReport performance degradation

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697136#comment-14697136
 ] 

Hudson commented on HDFS-7213:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/])
HDFS-7213. processIncrementalBlockReport performance degradation. Contributed 
by Eric Payne. (vinayakumarb: rev d25cb8fe12d00faf3e8f3bfd23fd1b01981a340f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> processIncrementalBlockReport performance degradation
> -
>
> Key: HDFS-7213
> URL: https://issues.apache.org/jira/browse/HDFS-7213
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Daryn Sharp
>Assignee: Eric Payne
>Priority: Critical
> Fix For: 2.7.0, 2.6.1
>
> Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt
>
>
> {{BlockManager#processIncrementalBlockReport}} has a debug line that is 
> missing an {{isDebugEnabled}} check, and the write lock is held while it 
> executes. Coupled with the increase in incremental block reports from 
> receiving blocks, under heavy load this log line noticeably degrades 
> performance.
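For reference, a sketch of the guard pattern the fix adds (names are 
illustrative): the message string is only built when debug logging is enabled, 
so the write lock no longer pays for string concatenation when it is off.
{code}
// Before: the message is concatenated on every call while the write lock
// is held, even when debug logging is disabled.
// blockLog.debug("BLOCK* block " + block + " is received from " + node);

// After (sketch): concatenation is skipped unless debug is enabled.
if (blockLog.isDebugEnabled()) {
  blockLog.debug("BLOCK* block " + block + " is received from " + node);
}
{code}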



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7225) Remove stale block invalidation work when DN re-registers with different UUID

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697138#comment-14697138
 ] 

Hudson commented on HDFS-7225:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/])
HDFS-7225. Remove stale block invalidation work when DN re-registers with 
different UUID. (Zhe Zhang and Andrew Wang) (vinayakumarb: rev 
08bd4edf4092901273da0d73a5cc760fdc11052b)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Remove stale block invalidation work when DN re-registers with different UUID
> -
>
> Key: HDFS-7225
> URL: https://issues.apache.org/jira/browse/HDFS-7225
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Fix For: 2.7.0, 2.6.1
>
> Attachments: HDFS-7225-v1.patch, HDFS-7225-v2.patch, 
> HDFS-7225-v3.patch, HDFS-7225.004.patch, HDFS-7225.005.patch
>
>
> {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by 
> {{datanodeUuid}} and passes the resulting {{DatanodeDescriptor}} to 
> {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated 
> {{datanodeUuid}} is used, a null pointer is passed to {{invalidateWork}}, 
> which uses it as a lookup key in a {{TreeMap}}. Since the key type is 
> {{DatanodeDescriptor}}, key comparison is based on the IP address. A null 
> key will crash the NameNode with an NPE.
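The crash mechanics in isolation: {{TreeMap}} compares keys on lookup, so a 
null key throws immediately. A self-contained demonstration:
{code}
import java.util.TreeMap;

public class NullKeyDemo {
  public static void main(String[] args) {
    TreeMap<String, Integer> nodes = new TreeMap<>();
    nodes.put("dn-1", 1);
    // TreeMap.get(null) throws NullPointerException because the null key
    // cannot be compared against existing keys, mirroring the NN crash.
    nodes.get(null);
  }
}
{code}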



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8270) create() always retried with hardcoded timeout when file already exists with open lease

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697133#comment-14697133
 ] 

Hudson commented on HDFS-8270:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/])
HDFS-8270. create() always retried with hardcoded timeout when file already 
exists with open lease (Contributed by J.Andreina) (vinayakumarb: rev 
84bf71295a5e52b2a7bb69440a885a25bc75f544)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> create() always retried with hardcoded timeout when file already exists with 
> open lease
> ---
>
> Key: HDFS-8270
> URL: https://issues.apache.org/jira/browse/HDFS-8270
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.6.0
>Reporter: Andrey Stepachev
>Assignee: J.Andreina
> Fix For: 2.6.1, 2.7.1
>
> Attachments: HDFS-8270-branch-2.6-v3.patch, 
> HDFS-8270-branch-2.7-03.patch, HDFS-8270.1.patch, HDFS-8270.2.patch, 
> HDFS-8270.3.patch
>
>
> In HBase we stumbled on unexpected behaviour which could break things.
> HDFS-6478 fixed a wrong exception translation, but that apparently led to 
> unexpected behaviour: clients trying to create a file without override=true 
> are forced to retry for a hardcoded amount of time (60 seconds).
> That could break or slow down systems that use the filesystem for locks 
> (like hbase fsck did, and we got it broken: HBASE-13574).
> We should make this behaviour configurable: does the client really need to 
> wait out the lease timeout to be sure that the file doesn't exist, or should 
> it be enough to fail fast?
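A sketch of the client-side pattern that gets hurt, assuming an illustrative 
lock path: a fail-fast existence probe turns into a roughly 60-second stall 
while the client retries internally.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LockProbe {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path lock = new Path("/locks/app.lock"); // illustrative lock file
    long start = System.currentTimeMillis();
    try {
      // overwrite=false: intended as "fail if it already exists", but when
      // the file exists with an open lease the client retries for a
      // hardcoded window instead of failing fast.
      fs.create(lock, false).close();
      System.out.println("lock acquired");
    } catch (IOException e) {
      long secs = (System.currentTimeMillis() - start) / 1000;
      System.out.println("failed after " + secs + "s: " + e.getMessage());
    }
  }
}
{code}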



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697113#comment-14697113
 ] 

Hudson commented on HDFS-7235:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/])
HDFS-7235. DataNode#transferBlock should report blocks that don't exist using 
reportBadBlock (yzhang via cmccabe) (vinayakumarb: rev 
f2b4bc9b6a1bd3f9dbfc4e85c1b9bde238da3627)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> DataNode#transferBlock should report blocks that don't exist using 
> reportBadBlock
> -
>
> Key: HDFS-7235
> URL: https://issues.apache.org/jira/browse/HDFS-7235
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Fix For: 2.7.0, 2.6.1
>
> Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, 
> HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, 
> HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch
>
>
> When decommissioning a DN, the process hangs.
> What happens is, when the NN chooses a replica as a source to replicate data 
> from the to-be-decommissioned DN to other DNs, it favors choosing the 
> to-be-decommissioned DN itself as the source of the transfer (see 
> BlockManager.java). However, because of the bad disk, the DN detects the 
> source block to be transferred as an invalid block, with the following logic 
> in FsDatasetImpl.java:
> {code}
> /** Does the block exist and have the given state? */
> private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
>   final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(),
>       b.getLocalBlock());
>   return replicaInfo != null
>       && replicaInfo.getState() == state
>       && replicaInfo.getBlockFile().exists();
> }
> {code}
> This method returns false (detecting an invalid block) because the block 
> file doesn't exist, due to the bad disk in this case.
> The key issue we found is that after the DN detects an invalid block for the 
> above reason, it doesn't report the invalid block back to the NN. The NN 
> therefore doesn't know the block is corrupted and keeps sending the data 
> transfer request to the same to-be-decommissioned DN, again and again. This 
> causes an infinite loop, so the decommission process hangs.
> Thanks [~qwertymaniac] for reporting the issue and the initial analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7225) Remove stale block invalidation work when DN re-registers with different UUID

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697116#comment-14697116
 ] 

Hudson commented on HDFS-7225:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/])
HDFS-7225. Remove stale block invalidation work when DN re-registers with 
different UUID. (Zhe Zhang and Andrew Wang) (vinayakumarb: rev 
08bd4edf4092901273da0d73a5cc760fdc11052b)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Remove stale block invalidation work when DN re-registers with different UUID
> -
>
> Key: HDFS-7225
> URL: https://issues.apache.org/jira/browse/HDFS-7225
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Fix For: 2.7.0, 2.6.1
>
> Attachments: HDFS-7225-v1.patch, HDFS-7225-v2.patch, 
> HDFS-7225-v3.patch, HDFS-7225.004.patch, HDFS-7225.005.patch
>
>
> {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by 
> {{datanodeUuid}} and passes the resulting {{DatanodeDescriptor}} to 
> {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated 
> {{datanodeUuid}} is used, a null pointer is passed to {{invalidateWork}}, 
> which uses it as a lookup key in a {{TreeMap}}. Since the key type is 
> {{DatanodeDescriptor}}, key comparison is based on the IP address. A null 
> key will crash the NameNode with an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697122#comment-14697122
 ] 

Hudson commented on HDFS-7649:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/])
HDFS-7649. Multihoming docs should emphasize using hostnames in configurations. 
(Contributed by Brahma Reddy Battula) (arp: rev 
ae57d60d8239916312bca7149e2285b2ed3b123a)
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsMultihoming.md
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Multihoming docs should emphasize using hostnames in configurations
> ---
>
> Key: HDFS-7649
> URL: https://issues.apache.org/jira/browse/HDFS-7649
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Arpit Agarwal
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: HDFS-7649.patch
>
>
> The docs should emphasize that master and slave configurations should use 
> hostnames wherever possible.
> Link to current docs: 
> https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8270) create() always retried with hardcoded timeout when file already exists with open lease

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697111#comment-14697111
 ] 

Hudson commented on HDFS-8270:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/])
HDFS-8270. create() always retried with hardcoded timeout when file already 
exists with open lease (Contributed by J.Andreina) (vinayakumarb: rev 
84bf71295a5e52b2a7bb69440a885a25bc75f544)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> create() always retried with hardcoded timeout when file already exists with 
> open lease
> ---
>
> Key: HDFS-8270
> URL: https://issues.apache.org/jira/browse/HDFS-8270
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.6.0
>Reporter: Andrey Stepachev
>Assignee: J.Andreina
> Fix For: 2.6.1, 2.7.1
>
> Attachments: HDFS-8270-branch-2.6-v3.patch, 
> HDFS-8270-branch-2.7-03.patch, HDFS-8270.1.patch, HDFS-8270.2.patch, 
> HDFS-8270.3.patch
>
>
> In HBase we stumbled on unexpected behaviour which could break things.
> HDFS-6478 fixed a wrong exception translation, but that apparently led to 
> unexpected behaviour: clients trying to create a file without override=true 
> are forced to retry for a hardcoded amount of time (60 seconds).
> That could break or slow down systems that use the filesystem for locks 
> (like hbase fsck did, and we got it broken: HBASE-13574).
> We should make this behaviour configurable: does the client really need to 
> wait out the lease timeout to be sure that the file doesn't exist, or should 
> it be enough to fail fast?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7213) processIncrementalBlockReport performance degradation

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697114#comment-14697114
 ] 

Hudson commented on HDFS-7213:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/])
HDFS-7213. processIncrementalBlockReport performance degradation. Contributed 
by Eric Payne. (vinayakumarb: rev d25cb8fe12d00faf3e8f3bfd23fd1b01981a340f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> processIncrementalBlockReport performance degradation
> -
>
> Key: HDFS-7213
> URL: https://issues.apache.org/jira/browse/HDFS-7213
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Daryn Sharp
>Assignee: Eric Payne
>Priority: Critical
> Fix For: 2.7.0, 2.6.1
>
> Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt
>
>
> {{BlockManager#processIncrementalBlockReport}} has a debug line that is 
> missing an {{isDebugEnabled}} check, and the write lock is held while it 
> executes. Coupled with the increase in incremental block reports from 
> receiving blocks, under heavy load this log line noticeably degrades 
> performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7263) Snapshot read can reveal future bytes for appended files.

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697117#comment-14697117
 ] 

Hudson commented on HDFS-7263:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/])
HDFS-7263. Snapshot read can reveal future bytes for appended files. 
Contributed by Tao Luo. (vinayakumarb: rev 
fa2641143c0d74c4fef122d79f27791e15d3b43f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Snapshot read can reveal future bytes for appended files.
> -
>
> Key: HDFS-7263
> URL: https://issues.apache.org/jira/browse/HDFS-7263
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Tao Luo
> Fix For: 2.7.0, 2.6.1
>
> Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, 
> TestSnapshotRead.java
>
>
> The following sequence of steps will produce extra bytes that should not be 
> visible, because they are not in the snapshot:
> * Create a file of size L, where {{L % blockSize != 0}}.
> * Create a snapshot.
> * Append bytes to the file.
> * Read the file in the snapshot (not the current file).
> * You will see that bytes are read beyond the original file size L.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7225) Remove stale block invalidation work when DN re-registers with different UUID

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697059#comment-14697059
 ] 

Hudson commented on HDFS-7225:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/])
HDFS-7225. Remove stale block invalidation work when DN re-registers with 
different UUID. (Zhe Zhang and Andrew Wang) (vinayakumarb: rev 
08bd4edf4092901273da0d73a5cc760fdc11052b)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Remove stale block invalidation work when DN re-registers with different UUID
> -
>
> Key: HDFS-7225
> URL: https://issues.apache.org/jira/browse/HDFS-7225
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Fix For: 2.7.0, 2.6.1
>
> Attachments: HDFS-7225-v1.patch, HDFS-7225-v2.patch, 
> HDFS-7225-v3.patch, HDFS-7225.004.patch, HDFS-7225.005.patch
>
>
> {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by 
> {{datanodeUuid}} and passes the resulting {{DatanodeDescriptor}} to 
> {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated 
> {{datanodeUuid}} is used, a null pointer is passed to {{invalidateWork}}, 
> which uses it as a lookup key in a {{TreeMap}}. Since the key type is 
> {{DatanodeDescriptor}}, key comparison is based on the IP address. A null 
> key will crash the NameNode with an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697065#comment-14697065
 ] 

Hudson commented on HDFS-7649:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/])
HDFS-7649. Multihoming docs should emphasize using hostnames in configurations. 
(Contributed by Brahma Reddy Battula) (arp: rev 
ae57d60d8239916312bca7149e2285b2ed3b123a)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsMultihoming.md


> Multihoming docs should emphasize using hostnames in configurations
> ---
>
> Key: HDFS-7649
> URL: https://issues.apache.org/jira/browse/HDFS-7649
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Arpit Agarwal
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: HDFS-7649.patch
>
>
> The docs should emphasize that master and slave configurations should use 
> hostnames wherever possible.
> Link to current docs: 
> https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7213) processIncrementalBlockReport performance degradation

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697057#comment-14697057
 ] 

Hudson commented on HDFS-7213:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/])
HDFS-7213. processIncrementalBlockReport performance degradation. Contributed 
by Eric Payne. (vinayakumarb: rev d25cb8fe12d00faf3e8f3bfd23fd1b01981a340f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> processIncrementalBlockReport performance degradation
> -
>
> Key: HDFS-7213
> URL: https://issues.apache.org/jira/browse/HDFS-7213
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Daryn Sharp
>Assignee: Eric Payne
>Priority: Critical
> Fix For: 2.7.0, 2.6.1
>
> Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt
>
>
> {{BlockManager#processIncrementalBlockReport}} has a debug line that is 
> missing an {{isDebugEnabled}} check, and the write lock is held while it 
> executes. Coupled with the increase in incremental block reports from 
> receiving blocks, under heavy load this log line noticeably degrades 
> performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7263) Snapshot read can reveal future bytes for appended files.

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697060#comment-14697060
 ] 

Hudson commented on HDFS-7263:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/])
HDFS-7263. Snapshot read can reveal future bytes for appended files. 
Contributed by Tao Luo. (vinayakumarb: rev 
fa2641143c0d74c4fef122d79f27791e15d3b43f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Snapshot read can reveal future bytes for appended files.
> -
>
> Key: HDFS-7263
> URL: https://issues.apache.org/jira/browse/HDFS-7263
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Tao Luo
> Fix For: 2.7.0, 2.6.1
>
> Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, 
> TestSnapshotRead.java
>
>
> The following sequence of steps will produce extra bytes that should not be 
> visible, because they are not in the snapshot:
> * Create a file of size L, where {{L % blockSize != 0}}.
> * Create a snapshot.
> * Append bytes to the file.
> * Read the file in the snapshot (not the current file).
> * You will see that bytes are read beyond the original file size L.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697056#comment-14697056
 ] 

Hudson commented on HDFS-7235:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/])
HDFS-7235. DataNode#transferBlock should report blocks that don't exist using 
reportBadBlock (yzhang via cmccabe) (vinayakumarb: rev 
f2b4bc9b6a1bd3f9dbfc4e85c1b9bde238da3627)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> DataNode#transferBlock should report blocks that don't exist using 
> reportBadBlock
> -
>
> Key: HDFS-7235
> URL: https://issues.apache.org/jira/browse/HDFS-7235
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Fix For: 2.7.0, 2.6.1
>
> Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, 
> HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, 
> HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch
>
>
> When decommissioning a DN, the process hangs.
> What happens is, when the NN chooses a replica as a source to replicate data 
> from the to-be-decommissioned DN to other DNs, it favors choosing the 
> to-be-decommissioned DN itself as the source of the transfer (see 
> BlockManager.java). However, because of the bad disk, the DN detects the 
> source block to be transferred as an invalid block, with the following logic 
> in FsDatasetImpl.java:
> {code}
> /** Does the block exist and have the given state? */
> private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
>   final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(),
>       b.getLocalBlock());
>   return replicaInfo != null
>       && replicaInfo.getState() == state
>       && replicaInfo.getBlockFile().exists();
> }
> {code}
> This method returns false (detecting an invalid block) because the block 
> file doesn't exist, due to the bad disk in this case.
> The key issue we found is that after the DN detects an invalid block for the 
> above reason, it doesn't report the invalid block back to the NN. The NN 
> therefore doesn't know the block is corrupted and keeps sending the data 
> transfer request to the same to-be-decommissioned DN, again and again. This 
> causes an infinite loop, so the decommission process hangs.
> Thanks [~qwertymaniac] for reporting the issue and the initial analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8270) create() always retried with hardcoded timeout when file already exists with open lease

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697054#comment-14697054
 ] 

Hudson commented on HDFS-8270:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/])
HDFS-8270. create() always retried with hardcoded timeout when file already 
exists with open lease (Contributed by J.Andreina) (vinayakumarb: rev 
84bf71295a5e52b2a7bb69440a885a25bc75f544)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> create() always retried with hardcoded timeout when file already exists with 
> open lease
> ---
>
> Key: HDFS-8270
> URL: https://issues.apache.org/jira/browse/HDFS-8270
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.6.0
>Reporter: Andrey Stepachev
>Assignee: J.Andreina
> Fix For: 2.6.1, 2.7.1
>
> Attachments: HDFS-8270-branch-2.6-v3.patch, 
> HDFS-8270-branch-2.7-03.patch, HDFS-8270.1.patch, HDFS-8270.2.patch, 
> HDFS-8270.3.patch
>
>
> In HBase we stumbled on unexpected behaviour which could break things.
> HDFS-6478 fixed a wrong exception translation, but that apparently led to 
> unexpected behaviour: clients trying to create a file without override=true 
> are forced to retry for a hardcoded amount of time (60 seconds).
> That could break or slow down systems that use the filesystem for locks 
> (like hbase fsck did, and we got it broken: HBASE-13574).
> We should make this behaviour configurable: does the client really need to 
> wait out the lease timeout to be sure that the file doesn't exist, or should 
> it be enough to fail fast?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697042#comment-14697042
 ] 

Hudson commented on HDFS-7649:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/])
HDFS-7649. Multihoming docs should emphasize using hostnames in configurations. 
(Contributed by Brahma Reddy Battula) (arp: rev 
ae57d60d8239916312bca7149e2285b2ed3b123a)
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsMultihoming.md
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Multihoming docs should emphasize using hostnames in configurations
> ---
>
> Key: HDFS-7649
> URL: https://issues.apache.org/jira/browse/HDFS-7649
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Arpit Agarwal
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: HDFS-7649.patch
>
>
> The docs should emphasize that master and slave configurations should use 
> hostnames wherever possible.
> Link to current docs: 
> https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7213) processIncrementalBlockReport performance degradation

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697034#comment-14697034
 ] 

Hudson commented on HDFS-7213:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/])
HDFS-7213. processIncrementalBlockReport performance degradation. Contributed 
by Eric Payne. (vinayakumarb: rev d25cb8fe12d00faf3e8f3bfd23fd1b01981a340f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> processIncrementalBlockReport performance degradation
> -
>
> Key: HDFS-7213
> URL: https://issues.apache.org/jira/browse/HDFS-7213
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Daryn Sharp
>Assignee: Eric Payne
>Priority: Critical
> Fix For: 2.7.0, 2.6.1
>
> Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt
>
>
> {{BlockManager#processIncrementalBlockReport}} has a debug line that is 
> missing an {{isDebugEnabled}} check, and the write lock is held while it 
> executes. Coupled with the increase in incremental block reports from 
> receiving blocks, under heavy load this log line noticeably degrades 
> performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7225) Remove stale block invalidation work when DN re-registers with different UUID

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697036#comment-14697036
 ] 

Hudson commented on HDFS-7225:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/])
HDFS-7225. Remove stale block invalidation work when DN re-registers with 
different UUID. (Zhe Zhang and Andrew Wang) (vinayakumarb: rev 
08bd4edf4092901273da0d73a5cc760fdc11052b)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Remove stale block invalidation work when DN re-registers with different UUID
> -
>
> Key: HDFS-7225
> URL: https://issues.apache.org/jira/browse/HDFS-7225
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Fix For: 2.7.0, 2.6.1
>
> Attachments: HDFS-7225-v1.patch, HDFS-7225-v2.patch, 
> HDFS-7225-v3.patch, HDFS-7225.004.patch, HDFS-7225.005.patch
>
>
> {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by 
> {{datanodeUuid}} and passes the resulting {{DatanodeDescriptor}} to 
> {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated 
> {{datanodeUuid}} is used, a null pointer is passed to {{invalidateWork}}, 
> which uses it as a lookup key in a {{TreeMap}}. Since the key type is 
> {{DatanodeDescriptor}}, key comparison is based on the IP address. A null 
> key will crash the NameNode with an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697033#comment-14697033
 ] 

Hudson commented on HDFS-7235:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/])
HDFS-7235. DataNode#transferBlock should report blocks that don't exist using 
reportBadBlock (yzhang via cmccabe) (vinayakumarb: rev 
f2b4bc9b6a1bd3f9dbfc4e85c1b9bde238da3627)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> DataNode#transferBlock should report blocks that don't exist using 
> reportBadBlock
> -
>
> Key: HDFS-7235
> URL: https://issues.apache.org/jira/browse/HDFS-7235
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Fix For: 2.7.0, 2.6.1
>
> Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, 
> HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, 
> HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch
>
>
> When decommissioning a DN, the process hangs.
> What happens is, when the NN chooses a replica as a source to replicate data 
> from the to-be-decommissioned DN to other DNs, it favors choosing the 
> to-be-decommissioned DN itself as the source of the transfer (see 
> BlockManager.java). However, because of the bad disk, the DN detects the 
> source block to be transferred as an invalid block, with the following logic 
> in FsDatasetImpl.java:
> {code}
> /** Does the block exist and have the given state? */
> private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
>   final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(),
>       b.getLocalBlock());
>   return replicaInfo != null
>       && replicaInfo.getState() == state
>       && replicaInfo.getBlockFile().exists();
> }
> {code}
> This method returns false (detecting an invalid block) because the block 
> file doesn't exist, due to the bad disk in this case.
> The key issue we found is that after the DN detects an invalid block for the 
> above reason, it doesn't report the invalid block back to the NN. The NN 
> therefore doesn't know the block is corrupted and keeps sending the data 
> transfer request to the same to-be-decommissioned DN, again and again. This 
> causes an infinite loop, so the decommission process hangs.
> Thanks [~qwertymaniac] for reporting the issue and the initial analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7263) Snapshot read can reveal future bytes for appended files.

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697037#comment-14697037
 ] 

Hudson commented on HDFS-7263:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/])
HDFS-7263. Snapshot read can reveal future bytes for appended files. 
Contributed by Tao Luo. (vinayakumarb: rev 
fa2641143c0d74c4fef122d79f27791e15d3b43f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Snapshot read can reveal future bytes for appended files.
> -
>
> Key: HDFS-7263
> URL: https://issues.apache.org/jira/browse/HDFS-7263
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Tao Luo
> Fix For: 2.7.0, 2.6.1
>
> Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, 
> TestSnapshotRead.java
>
>
> The following sequence of steps will produce extra bytes that should not be 
> visible, because they are not in the snapshot:
> * Create a file of size L, where {{L % blockSize != 0}}.
> * Create a snapshot.
> * Append bytes to the file.
> * Read the file in the snapshot (not the current file).
> * You will see that bytes are read beyond the original file size L.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8270) create() always retried with hardcoded timeout when file already exists with open lease

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697031#comment-14697031
 ] 

Hudson commented on HDFS-8270:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/])
HDFS-8270. create() always retried with hardcoded timeout when file already 
exists with open lease (Contributed by J.Andreina) (vinayakumarb: rev 
84bf71295a5e52b2a7bb69440a885a25bc75f544)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> create() always retried with hardcoded timeout when file already exists with 
> open lease
> ---
>
> Key: HDFS-8270
> URL: https://issues.apache.org/jira/browse/HDFS-8270
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.6.0
>Reporter: Andrey Stepachev
>Assignee: J.Andreina
> Fix For: 2.6.1, 2.7.1
>
> Attachments: HDFS-8270-branch-2.6-v3.patch, 
> HDFS-8270-branch-2.7-03.patch, HDFS-8270.1.patch, HDFS-8270.2.patch, 
> HDFS-8270.3.patch
>
>
> In HBase we stumbled on unexpected behaviour which could 
> break things. 
> HDFS-6478 fixed wrong exception
> translation, but that apparently led to unexpected behaviour:
> clients trying to create a file without overwrite=true will be forced
> to retry for a hardcoded amount of time (60 seconds).
> That could break or slow down systems that use the filesystem
> for locks (like hbase fsck did, and we got it broken: HBASE-13574).
> We should make this behaviour configurable: does the client really need
> to wait for the lease timeout to be sure that the file doesn't exist, or 
> should it be enough to fail fast?
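
A hedged sketch of the lock-file pattern this report alludes to; the path is illustrative, and the timing printout just makes the hardcoded retry window visible:
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LockFileExample {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    Path lock = new Path("/locks/my-app.lock");  // illustrative path
    long start = System.currentTimeMillis();
    try {
      // overwrite=false: the caller expects an immediate failure if the
      // file already exists.
      fs.create(lock, false).close();
      System.out.println("lock acquired");
    } catch (IOException e) {
      // When another client holds an open lease on the file, the behaviour
      // described above means we only get here after ~60 seconds of
      // internal retries instead of failing fast.
      System.out.println("lock busy after "
          + (System.currentTimeMillis() - start) + " ms: " + e);
    }
  }
}
{code}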



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6955) DN should reserve disk space for a full block when creating tmp files

2015-08-14 Thread kanaka kumar avvaru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kanaka kumar avvaru updated HDFS-6955:
--
Status: Patch Available  (was: Open)

Attached the initial patch for reserving space for the block being 
re-replicated. [~arpitagarwal], please review.

> DN should reserve disk space for a full block when creating tmp files
> -
>
> Key: HDFS-6955
> URL: https://issues.apache.org/jira/browse/HDFS-6955
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Arpit Agarwal
>Assignee: kanaka kumar avvaru
> Attachments: HDFS-6955-01.patch
>
>
> HDFS-6898 is introducing disk space reservation for RBW files to avoid 
> running out of disk space midway through block creation.
> This Jira is to introduce similar reservation for tmp files.
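
As background, a self-contained sketch of the reservation idea, not the attached patch: each volume tracks bytes reserved for in-flight writes, so a full block's worth of space is claimed up front and the reservation is released when the replica is finalized.
{code}
import java.util.concurrent.atomic.AtomicLong;

// Illustrative per-volume accounting, not the actual patch: a writer claims
// a full block's worth of space before writing and releases the reservation
// when the replica is finalized or aborted.
final class VolumeSpaceTracker {
  private final long capacity;
  private final AtomicLong used = new AtomicLong();      // finalized bytes
  private final AtomicLong reserved = new AtomicLong();  // in-flight bytes

  VolumeSpaceTracker(long capacity) {
    this.capacity = capacity;
  }

  /** Reserve space up front so a write cannot run out of disk midway. */
  boolean reserve(long bytes) {
    while (true) {
      long cur = reserved.get();
      if (used.get() + cur + bytes > capacity) {
        return false;  // not enough room; the caller picks another volume
      }
      if (reserved.compareAndSet(cur, cur + bytes)) {
        return true;
      }
    }
  }

  /** Convert a reservation into real usage when the replica is finalized. */
  void finalizeReplica(long reservedBytes, long actualBytes) {
    reserved.addAndGet(-reservedBytes);
    used.addAndGet(actualBytes);
  }

  long available() {
    return capacity - used.get() - reserved.get();
  }
}
{code}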



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8892) ShortCircuitCache.CacheCleaner can add Slot.isInvalid() check too

2015-08-14 Thread Ravikumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696971#comment-14696971
 ] 

Ravikumar commented on HDFS-8892:
-

A small patch we tried; it works as expected during block deletes:
{code}
…..
if (LOG.isDebugEnabled()) {
  LOG.debug(this + ": cache cleaner running at " + curMs);
}
// New pass: purge replicas whose shared-memory slot has been invalidated.
// numPurged is declared here, before the new loop, instead of in the
// existing code below, so the increments compile and are not reset.
int numPurged = 0;
// Copy the evictable map, as purge() removes elements from it
Map<Long, ShortCircuitReplica> copy = new TreeMap<>(evictable);
for (ShortCircuitReplica replica : copy.values()) {
  Slot currSlot = replica.getSlot();
  if (currSlot != null && !currSlot.isValid()) {
    if (LOG.isTraceEnabled()) {
      LOG.trace("CacheCleaner: purging " + replica + " because its slot is "
          + "invalid: " + StringUtils.getStackTrace(Thread.currentThread()));
    }
    purge(replica);
    numPurged++;
  }
}
// Existing code (minus the original "int numPurged = 0;" declaration)
int numDemoted = demoteOldEvictableMmaped(curMs);
….
{code}

> ShortCircuitCache.CacheCleaner can add Slot.isInvalid() check too
> -
>
> Key: HDFS-8892
> URL: https://issues.apache.org/jira/browse/HDFS-8892
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.7.1
>Reporter: Ravikumar
>Assignee: kanaka kumar avvaru
>Priority: Minor
>
> Currently the CacheCleaner thread checks only for cache-expiry times. It would 
> be nice if it handled an invalid slot too, in an extra pass over the evictable 
> map:
> {code}
> for (ShortCircuitReplica replica : evictable.values()) {
>   Slot slot = replica.getSlot();
>   if (slot != null && !slot.isValid()) {
>     purge(replica);
>   }
> }
> // Existing code...
> int numDemoted = demoteOldEvictableMmaped(curMs);
> int numPurged = 0;
> Long evictionTimeNs = Long.valueOf(0);
> ….
> {code}
> Apps like HBase can tweak the expiry/staleness/cache-size params in the 
> DFS client, so that a ShortCircuitReplica will never be closed except when 
> its Slot is declared invalid. 
> I assume slot invalidation will happen during block invalidation/deletes 
> (primarily triggered by compaction, shard takeover, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6955) DN should reserve disk space for a full block when creating tmp files

2015-08-14 Thread kanaka kumar avvaru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kanaka kumar avvaru updated HDFS-6955:
--
Attachment: HDFS-6955-01.patch

> DN should reserve disk space for a full block when creating tmp files
> -
>
> Key: HDFS-6955
> URL: https://issues.apache.org/jira/browse/HDFS-6955
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Arpit Agarwal
>Assignee: kanaka kumar avvaru
> Attachments: HDFS-6955-01.patch
>
>
> HDFS-6898 is introducing disk space reservation for RBW files to avoid 
> running out of disk space midway through block creation.
> This Jira is to introduce similar reservation for tmp files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS

2015-08-14 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696880#comment-14696880
 ] 

Vinayakumar B commented on HDFS-7285:
-

I have tried to rebase the current {{HDFS-7285}} branch against the current 
trunk using {{git rebase trunk}}. It was not as smooth as expected. Since I did 
not want to push the rebase directly onto {{HDFS-7285}}, I created one more 
branch, {{HDFS-7285-REBASE}}. This branch is just for reference purposes.

The advantage of this is that it retained all the commits along with their 
message, date and author details, even after resolving conflicts. I purposefully 
skipped one commit, HDFS-8787, to stay in sync with trunk; it was just a rename 
of files. Other than this, no commits got squashed.

There were 192 commits to be rebased against trunk, including the intermediate 
merge-conflict-resolution commits. Since I couldn't edit each and every commit 
to resolve compilation errors commit by commit, I resolved the remaining 
compilation errors at the end, with one more commit.

If anyone wants to verify, please check out the branch HDFS-7285-REBASE and 
compare it against the consolidated patch.

Since this was only to check the feasibility of a rebase, I am not saying it 
should be considered the final branch.
If everyone feels this approach is good, I can do a more detailed rebase next 
week (maybe verifying the compilation after each commit? I'm not sure whether 
it's possible to stop at each commit during the rebase).

-Thanks

> Erasure Coding Support inside HDFS
> --
>
> Key: HDFS-7285
> URL: https://issues.apache.org/jira/browse/HDFS-7285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Weihua Jiang
>Assignee: Zhe Zhang
> Attachments: Consolidated-20150707.patch, 
> Consolidated-20150806.patch, Consolidated-20150810.patch, ECAnalyzer.py, 
> ECParser.py, HDFS-7285-initial-PoC.patch, 
> HDFS-7285-merge-consolidated-01.patch, 
> HDFS-7285-merge-consolidated-trunk-01.patch, 
> HDFS-7285-merge-consolidated.trunk.03.patch, 
> HDFS-7285-merge-consolidated.trunk.04.patch, 
> HDFS-EC-Merge-PoC-20150624.patch, HDFS-EC-merge-consolidated-01.patch, 
> HDFS-bistriped.patch, HDFSErasureCodingDesign-20141028.pdf, 
> HDFSErasureCodingDesign-20141217.pdf, HDFSErasureCodingDesign-20150204.pdf, 
> HDFSErasureCodingDesign-20150206.pdf, HDFSErasureCodingPhaseITestPlan.pdf, 
> fsimage-analysis-20150105.pdf
>
>
> Erasure Coding (EC) can greatly reduce the storage overhead without 
> sacrificing data reliability, compared to the existing HDFS 3-replica 
> approach. For example, if we use a 10+4 Reed-Solomon coding, we can tolerate 
> the loss of any 4 blocks, with a storage overhead of only 40%. This makes EC 
> quite an attractive alternative for big data storage, particularly for cold 
> data. 
> Facebook had a related open source project called HDFS-RAID. It used to be 
> one of the contributed packages in HDFS but was removed in Hadoop 2.0 
> for maintenance reasons. Its drawbacks are: 1) it sits on top of HDFS and 
> depends on MapReduce to do encoding and decoding tasks; 2) it can only be used 
> for cold files that are not intended to be appended anymore; 3) the pure Java 
> EC coding implementation is extremely slow in practical use. For these 
> reasons, it might not be a good idea to just bring HDFS-RAID back.
> We (Intel and Cloudera) are working on a design to build EC into HDFS that 
> gets rid of any external dependencies, making it self-contained and 
> independently maintained. This design layers the EC feature on top of the 
> storage type support and aims to be compatible with existing HDFS features 
> like caching, snapshots, encryption and high availability. It will also 
> support different EC coding schemes, implementations and policies for 
> different deployment scenarios. By utilizing advanced libraries (e.g. the 
> Intel ISA-L library), an implementation can greatly improve the performance of 
> EC encoding/decoding and make the EC solution even more attractive. We will 
> post the design document soon. 
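
A quick sanity check of the arithmetic in the 10+4 example above (plain Java, illustrative only):
{code}
// Arithmetic behind the 10+4 example: k data blocks plus m parity blocks
// tolerate the loss of any m blocks, at a storage overhead of m / k.
public class EcOverhead {
  public static void main(String[] args) {
    int k = 10, m = 4;
    double ecOverhead = (double) m / k;   // 4 / 10 = 0.40 -> "only 40%"
    double rep3Overhead = 3.0 - 1.0;      // 3 replicas -> 200% extra storage
    System.out.println("EC: lose any " + m + " of " + (k + m) + " blocks, "
        + (int) (ecOverhead * 100) + "% overhead; 3-replica: "
        + (int) (rep3Overhead * 100) + "% overhead");
  }
}
{code}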



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small

2015-08-14 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-8838:

Attachment: HDFS-8838-HDFS-7285-20150809-test.patch

Attached a test patch, HDFS-8838-HDFS-7285-20150809-test.patch, to trigger 
Jenkins and verify whether HDFS-8896 works.

> Tolerate datanode failures in DFSStripedOutputStream when the data length is 
> small
> --
>
> Key: HDFS-8838
> URL: https://issues.apache.org/jira/browse/HDFS-8838
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: HDFS-8838-HDFS-7285-000.patch, 
> HDFS-8838-HDFS-7285-20150809-test.patch, HDFS-8838-HDFS-7285-20150809.patch, 
> h8838_20150729.patch, h8838_20150731-HDFS-7285.patch, h8838_20150731.log, 
> h8838_20150731.patch, h8838_20150804-HDFS-7285.patch, h8838_20150809.patch
>
>
> Currently, DFSStripedOutputStream cannot tolerate datanode failures when the 
> data length is small.  We fix the bugs here and add more tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8270) create() always retried with hardcoded timeout when file already exists with open lease

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696860#comment-14696860
 ] 

Hudson commented on HDFS-8270:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8305 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8305/])
HDFS-8270. create() always retried with hardcoded timeout when file already 
exists with open lease (Contributed by J.Andreina) (vinayakumarb: rev 
84bf71295a5e52b2a7bb69440a885a25bc75f544)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> create() always retried with hardcoded timeout when file already exists with 
> open lease
> ---
>
> Key: HDFS-8270
> URL: https://issues.apache.org/jira/browse/HDFS-8270
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.6.0
>Reporter: Andrey Stepachev
>Assignee: J.Andreina
> Fix For: 2.6.1, 2.7.1
>
> Attachments: HDFS-8270-branch-2.6-v3.patch, 
> HDFS-8270-branch-2.7-03.patch, HDFS-8270.1.patch, HDFS-8270.2.patch, 
> HDFS-8270.3.patch
>
>
> In HBase we stumbled on unexpected behaviour which could 
> break things. 
> HDFS-6478 fixed wrong exception
> translation, but that apparently led to unexpected behaviour:
> clients trying to create a file without overwrite=true will be forced
> to retry for a hardcoded amount of time (60 seconds).
> That could break or slow down systems that use the filesystem
> for locks (like hbase fsck did, and we got it broken: HBASE-13574).
> We should make this behaviour configurable: does the client really need
> to wait for the lease timeout to be sure that the file doesn't exist, or 
> should it be enough to fail fast?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8896) DataNode object isn't GCed when shutdown, because it has GC root in ShutdownHookManager

2015-08-14 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-8896:

Status: Patch Available  (was: Open)

> DataNode object isn't GCed when shutdown, because it has GC root in 
> ShutdownHookManager
> ---
>
> Key: HDFS-8896
> URL: https://issues.apache.org/jira/browse/HDFS-8896
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Walter Su
>Assignee: Walter Su
>Priority: Minor
> Attachments: HDFS-8896.01.patch, screenshot_1.PNG, screenshot_2.PNG
>
>
> The anonymous {{Thread}} object created in {{ShutdownHookManager}} is a GC 
> root.
> screenshot_1 shows how a DN object can be traced to the GC root.
> It's not a problem in production.
> It's a problem in tests, especially when MiniDFSCluster starts and shuts down 
> many DNs, which could cause an {{OutOfMemoryError}}.
> screenshot_2 shows many DN objects that are not GCed when running the 
> HDFS-8838 test.
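
A self-contained, JDK-only illustration of the mechanism, with made-up names: a registered shutdown hook is a GC root, so whatever it captures stays reachable until the hook is removed.
{code}
// Plain-JDK demo of the GC-root effect described above; names are made up.
public class ShutdownHookLeakDemo {
  static class FakeDataNode {
    final byte[] payload = new byte[64 * 1024 * 1024];  // makes leaks visible
  }

  public static void main(String[] args) {
    final FakeDataNode dn = new FakeDataNode();
    // The anonymous Thread captures 'dn'. Once registered as a shutdown
    // hook it is itself a GC root, so 'dn' and its 64 MB payload remain
    // reachable for the life of the JVM unless the hook is removed.
    Thread hook = new Thread() {
      @Override
      public void run() {
        System.out.println("shutting down, payload " + dn.payload.length);
      }
    };
    Runtime.getRuntime().addShutdownHook(hook);

    // ... a test that starts/stops many such objects accumulates them ...

    // The remedy pattern: deregister the hook when the component shuts
    // down, making the captured object graph collectible again.
    Runtime.getRuntime().removeShutdownHook(hook);
  }
}
{code}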



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8896) DataNode object isn't GCed when shutdown, because it has GC root in ShutdownHookManager

2015-08-14 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-8896:

Attachment: HDFS-8896.01.patch

> DataNode object isn't GCed when shutdown, because it has GC root in 
> ShutdownHookManager
> ---
>
> Key: HDFS-8896
> URL: https://issues.apache.org/jira/browse/HDFS-8896
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Walter Su
>Assignee: Walter Su
>Priority: Minor
> Attachments: HDFS-8896.01.patch, screenshot_1.PNG, screenshot_2.PNG
>
>
> The anonymous {{Thread}} object created in {{ShutdownHookManager}} is a GC 
> root.
> screenshot_1 shows how a DN object can be traced to the GC root.
> It's not a problem in production.
> It's a problem in tests, especially when MiniDFSCluster starts and shuts down 
> many DNs, which could cause an {{OutOfMemoryError}}.
> screenshot_2 shows many DN objects that are not GCed when running the 
> HDFS-8838 test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8270) create() always retried with hardcoded timeout when file already exists with open lease

2015-08-14 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-8270:

Fix Version/s: 2.6.1

> create() always retried with hardcoded timeout when file already exists with 
> open lease
> ---
>
> Key: HDFS-8270
> URL: https://issues.apache.org/jira/browse/HDFS-8270
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.6.0
>Reporter: Andrey Stepachev
>Assignee: J.Andreina
> Fix For: 2.6.1, 2.7.1
>
> Attachments: HDFS-8270-branch-2.6-v3.patch, 
> HDFS-8270-branch-2.7-03.patch, HDFS-8270.1.patch, HDFS-8270.2.patch, 
> HDFS-8270.3.patch
>
>
> In HBase we stumbled on unexpected behaviour which could 
> break things. 
> HDFS-6478 fixed wrong exception
> translation, but that apparently led to unexpected behaviour:
> clients trying to create a file without overwrite=true will be forced
> to retry for a hardcoded amount of time (60 seconds).
> That could break or slow down systems that use the filesystem
> for locks (like hbase fsck did, and we got it broken: HBASE-13574).
> We should make this behaviour configurable: does the client really need
> to wait for the lease timeout to be sure that the file doesn't exist, or 
> should it be enough to fail fast?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8270) create() always retried with hardcoded timeout when file already exists with open lease

2015-08-14 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-8270:

Labels:   (was: 2.6.1-candidate)

> create() always retried with hardcoded timeout when file already exists with 
> open lease
> ---
>
> Key: HDFS-8270
> URL: https://issues.apache.org/jira/browse/HDFS-8270
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.6.0
>Reporter: Andrey Stepachev
>Assignee: J.Andreina
> Fix For: 2.6.1, 2.7.1
>
> Attachments: HDFS-8270-branch-2.6-v3.patch, 
> HDFS-8270-branch-2.7-03.patch, HDFS-8270.1.patch, HDFS-8270.2.patch, 
> HDFS-8270.3.patch
>
>
> In HBase we stumbled on unexpected behaviour which could 
> break things. 
> HDFS-6478 fixed wrong exception
> translation, but that apparently led to unexpected behaviour:
> clients trying to create a file without overwrite=true will be forced
> to retry for a hardcoded amount of time (60 seconds).
> That could break or slow down systems that use the filesystem
> for locks (like hbase fsck did, and we got it broken: HBASE-13574).
> We should make this behaviour configurable: does the client really need
> to wait for the lease timeout to be sure that the file doesn't exist, or 
> should it be enough to fail fast?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8270) create() always retried with hardcoded timeout when file already exists with open lease

2015-08-14 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696854#comment-14696854
 ] 

Vinayakumar B commented on HDFS-8270:
-

Cherry-picked to 2.6.1

> create() always retried with hardcoded timeout when file already exists with 
> open lease
> ---
>
> Key: HDFS-8270
> URL: https://issues.apache.org/jira/browse/HDFS-8270
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.6.0
>Reporter: Andrey Stepachev
>Assignee: J.Andreina
> Fix For: 2.6.1, 2.7.1
>
> Attachments: HDFS-8270-branch-2.6-v3.patch, 
> HDFS-8270-branch-2.7-03.patch, HDFS-8270.1.patch, HDFS-8270.2.patch, 
> HDFS-8270.3.patch
>
>
> In HBase we stumbled on unexpected behaviour which could 
> break things. 
> HDFS-6478 fixed wrong exception
> translation, but that apparently led to unexpected behaviour:
> clients trying to create a file without overwrite=true will be forced
> to retry for a hardcoded amount of time (60 seconds).
> That could break or slow down systems that use the filesystem
> for locks (like hbase fsck did, and we got it broken: HBASE-13574).
> We should make this behaviour configurable: does the client really need
> to wait for the lease timeout to be sure that the file doesn't exist, or 
> should it be enough to fail fast?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7263) Snapshot read can reveal future bytes for appended files.

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696836#comment-14696836
 ] 

Hudson commented on HDFS-7263:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1017 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1017/])
HDFS-7263. Snapshot read can reveal future bytes for appended files. 
Contributed by Tao Luo. (vinayakumarb: rev 
fa2641143c0d74c4fef122d79f27791e15d3b43f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Snapshot read can reveal future bytes for appended files.
> -
>
> Key: HDFS-7263
> URL: https://issues.apache.org/jira/browse/HDFS-7263
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Tao Luo
> Fix For: 2.7.0, 2.6.1
>
> Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, 
> TestSnapshotRead.java
>
>
> The following sequence of steps will produce extra bytes that should not be 
> visible, because they are not in the snapshot.
> * Create a file of size L, where {{L % blockSize != 0}}.
> * Create a snapshot
> * Append bytes to the file
> * Read the file in the snapshot (not the current file)
> * You will see that bytes are read beyond the original file size L



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7213) processIncrementalBlockReport performance degradation

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696833#comment-14696833
 ] 

Hudson commented on HDFS-7213:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1017 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1017/])
HDFS-7213. processIncrementalBlockReport performance degradation. Contributed 
by Eric Payne. (vinayakumarb: rev d25cb8fe12d00faf3e8f3bfd23fd1b01981a340f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> processIncrementalBlockReport performance degradation
> -
>
> Key: HDFS-7213
> URL: https://issues.apache.org/jira/browse/HDFS-7213
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Daryn Sharp
>Assignee: Eric Payne
>Priority: Critical
> Fix For: 2.7.0, 2.6.1
>
> Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt
>
>
> {{BlockManager#processIncrementalBlockReport}} has a debug line that is 
> missing an {{isDebugEnabled}} check, and the write lock is held while it 
> runs. Coupled with the increase in incremental block reports from receiving 
> blocks, under heavy load this log line noticeably degrades performance.
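
For context, the standard guard pattern (generic illustration; the class name and message below are made up): without the guard, the message string is built on every call while the lock is held, even when DEBUG logging is off.
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Generic illustration; the class name and message are made up.
public class GuardedLoggingExample {
  private static final Log LOG = LogFactory.getLog(GuardedLoggingExample.class);

  void processReport(String nodeId, int blocks) {
    // Unguarded: the message string is concatenated on every call, even
    // when DEBUG is disabled -- costly inside a hot, lock-holding path.
    // LOG.debug("processing report from " + nodeId + " (" + blocks + " blocks)");

    // Guarded: concatenation only happens when DEBUG is actually enabled.
    if (LOG.isDebugEnabled()) {
      LOG.debug("processing report from " + nodeId + " (" + blocks
          + " blocks)");
    }
  }
}
{code}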



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696841#comment-14696841
 ] 

Hudson commented on HDFS-7649:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1017 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1017/])
HDFS-7649. Multihoming docs should emphasize using hostnames in configurations. 
(Contributed by Brahma Reddy Battula) (arp: rev 
ae57d60d8239916312bca7149e2285b2ed3b123a)
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsMultihoming.md
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Multihoming docs should emphasize using hostnames in configurations
> ---
>
> Key: HDFS-7649
> URL: https://issues.apache.org/jira/browse/HDFS-7649
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Arpit Agarwal
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: HDFS-7649.patch
>
>
> The docs should emphasize that master and slave configurations should use 
> hostnames wherever possible.
> Link to current docs: 
> https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

