[jira] [Commented] (HDFS-7299) Hadoop Namenode failing because of negative value in fsimage

2014-10-29 Thread Vishnu Ganth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189717#comment-14189717
 ] 

Vishnu Ganth commented on HDFS-7299:


I was able to bring the namenode up by commenting out the following lines in 
org.apache.hadoop.hdfs.protocol.Block:
{code}
if (numBytes < 0) {
  throw new IOException("Unexpected block size: " + numBytes);
}
{code}

But I am not sure how {{numBytes}} got a negative value in the fsimage.

[~huLiu]
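
For reference, a gentler workaround than deleting the check outright (a hedged 
sketch only, not the shipped Block.java code; {{LOG}} and {{blockId}} are 
stand-ins) is to log and clamp the bad value, so the offending record can 
later be located in the fsimage, e.g. with the offline image viewer 
({{hdfs oiv}}):

{code}
// Hedged sketch -- not the actual Block.java. Clamp instead of throwing,
// and log enough context to find the corrupt record later.
if (numBytes < 0) {
  LOG.warn("Unexpected block size " + numBytes + " for block " + blockId
      + "; clamping to 0 so the NameNode can start");
  numBytes = 0;
}
{code}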

> Hadoop Namenode failing because of negative value in fsimage
> 
>
> Key: HDFS-7299
> URL: https://issues.apache.org/jira/browse/HDFS-7299
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Vishnu Ganth
>
> The Hadoop NameNode fails to start because of an unexpected block size value 
> in the fsimage.
> Stack trace:
> {code}
> 2014-10-27 16:22:12,107 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> STARTUP_MSG: 
> /
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG:   host = /
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 2.0.0-cdh4.4.0
> STARTUP_MSG:   classpath = 
> /var/run/cloudera-scm-agent/process/12726-hdfs-NAMENODE:... [long CDH 4.4.0 
> jar classpath omitted; the original message is truncated at this point]

[jira] [Commented] (HDFS-6385) Show when block deletion will start after NameNode startup in WebUI

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189702#comment-14189702
 ] 

Hadoop QA commented on HDFS-6385:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678111/HDFS-6385.2.patch
  against trunk revision 0126cf1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8596//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8596//console

This message is automatically generated.

> Show when block deletion will start after NameNode startup in WebUI
> ---
>
> Key: HDFS-6385
> URL: https://issues.apache.org/jira/browse/HDFS-6385
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Chris Nauroth
> Attachments: HDFS-6385.1.patch, HDFS-6385.2.patch, HDFS-6385.png
>
>
> HDFS-6186 provides functionality to delay block deletion for a period of time 
> after NameNode startup. Currently we only show the number of pending block 
> deletions in the WebUI. We should also show when block deletion will start in 
> the WebUI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7035) Make adding volume an atomic operation.

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189703#comment-14189703
 ] 

Hadoop QA commented on HDFS-7035:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678104/HDFS-7035.014.patch
  against trunk revision 2a6be65.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1266 javac 
compiler warnings (more than the trunk's current 1265 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.TestAllowFormat
  
org.apache.hadoop.hdfs.server.namenode.TestCheckPointForSecurityTokens
  org.apache.hadoop.hdfs.server.datanode.TestRefreshNamenodes
  org.apache.hadoop.hdfs.TestEncryptedTransfer
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotMetrics
  org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream
  org.apache.hadoop.hdfs.TestSnapshotCommands
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshottableDirListing
  org.apache.hadoop.hdfs.TestRead
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestUpdatePipelineWithSnapshots
  org.apache.hadoop.hdfs.TestBlocksScheduledCounter
  
org.apache.hadoop.hdfs.server.namenode.TestDiskspaceQuotaUpdate
  org.apache.hadoop.hdfs.TestDFSPermission
  org.apache.hadoop.hdfs.server.namenode.TestCheckpoint
  org.apache.hadoop.hdfs.server.namenode.TestStartup
  
org.apache.hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics
  org.apache.hadoop.hdfs.server.namenode.TestFSImageWithXAttr
  org.apache.hadoop.hdfs.TestDFSClientFailover
  
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestRbwSpaceReservation
  
org.apache.hadoop.hdfs.server.blockmanagement.TestOverReplicatedBlocks
  
org.apache.hadoop.hdfs.server.namenode.web.resources.TestWebHdfsDataLocality
  org.apache.hadoop.hdfs.TestLeaseRecovery2
  org.apache.hadoop.hdfs.server.namenode.TestFSImageWithAcl
  org.apache.hadoop.hdfs.TestWriteConfigurationToDFS
  org.apache.hadoop.hdfs.server.namenode.TestFSEditLogLoader
  
org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCacheRevocation
  org.apache.hadoop.hdfs.web.TestHttpsFileSystem
  org.apache.hadoop.hdfs.server.namenode.TestFSDirectory
  
org.apache.hadoop.hdfs.server.datanode.TestIncrementalBlockReports
  
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestInterDatanodeProtocol
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots
  
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles
  org.apache.hadoop.hdfs.TestDFSOutputStream
  org.apache.hadoop.hdfs.TestSetTimes
  
org.apache.hadoop.hdfs.server.blockmanagement.TestHeartbeatHandling
  
org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock
  
org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
  org.apache.hadoop.hdfs.TestDatanodeDeath
  org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode
  org.apache.hadoop.hdfs.TestDFSRollback
  org.apache.hadoop.hdfs.TestClientBlockVerification
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap
  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration
  org.apache.hadoop.hdfs.server.namenode.TestNamenodeRetryCache
  org.apache.hadoop.hdfs.server.datanode.TestDataNodeExit
  org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache
  
org.apache.hadoop.hdfs.server.datanode.TestBlockHasMultipleReplicasOnSameDN
  org.apache.hadoop.hdfs.TestFileCreationE

[jira] [Commented] (HDFS-6385) Show when block deletion will start after NameNode startup in WebUI

2014-10-29 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189650#comment-14189650
 ] 

Jing Zhao commented on HDFS-6385:
-

+1

> Show when block deletion will start after NameNode startup in WebUI
> ---
>
> Key: HDFS-6385
> URL: https://issues.apache.org/jira/browse/HDFS-6385
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Chris Nauroth
> Attachments: HDFS-6385.1.patch, HDFS-6385.2.patch, HDFS-6385.png
>
>
> HDFS-6186 provides functionality to delay block deletion for a period of time 
> after NameNode startup. Currently we only show the number of pending block 
> deletions in the WebUI. We should also show when block deletion will start in 
> the WebUI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6385) Show when block deletion will start after NameNode startup in WebUI

2014-10-29 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189630#comment-14189630
 ] 

Haohui Mai commented on HDFS-6385:
--

+1

> Show when block deletion will start after NameNode startup in WebUI
> ---
>
> Key: HDFS-6385
> URL: https://issues.apache.org/jira/browse/HDFS-6385
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Chris Nauroth
> Attachments: HDFS-6385.1.patch, HDFS-6385.2.patch, HDFS-6385.png
>
>
> HDFS-6186 provides functionality to delay block deletion for a period of time 
> after NameNode startup. Currently we only show the number of pending block 
> deletions in the WebUI. We should also show when block deletion will start in 
> the WebUI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-7263) Snapshot read can reveal future bytes for appended files.

2014-10-29 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189574#comment-14189574
 ] 

Konstantin Shvachko edited comment on HDFS-7263 at 10/30/14 3:40 AM:
-

I just committed this. Thank you Tao.


was (Author: shv):
I jsut committed this. Thank you Tao.

> Snapshot read can reveal future bytes for appended files.
> -
>
> Key: HDFS-7263
> URL: https://issues.apache.org/jira/browse/HDFS-7263
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Tao Luo
> Fix For: 2.7.0
>
> Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, 
> TestSnapshotRead.java
>
>
> The following sequence of steps will produce extra bytes that should not be 
> visible, because they are not in the snapshot.
> * Create a file of size L, where {{L % blockSize != 0}}.
> * Create a snapshot
> * Append bytes to the file
> * Read the file in the snapshot (not the current file)
> * You will see that bytes are read beyond the original file size L
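
For readers following along, the steps above can be scripted as a minimal 
reproduction sketch (assuming a running cluster where the target directory has 
already been made snapshottable via {{hdfs dfsadmin -allowSnapshot}}; paths 
and the snapshot name are illustrative):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SnapshotReadRepro {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path dir = new Path("/snaptest");      // must already be snapshottable
    Path file = new Path(dir, "f");

    FSDataOutputStream out = fs.create(file);
    out.write(new byte[1000]);             // L = 1000, L % blockSize != 0
    out.close();

    fs.createSnapshot(dir, "s1");

    out = fs.append(file);                 // grow the file past L
    out.write(new byte[1000]);
    out.close();

    Path snapFile = new Path(dir, ".snapshot/s1/f");
    long snapLen = fs.getFileStatus(snapFile).getLen();  // reports 1000

    FSDataInputStream in = fs.open(snapFile);
    byte[] buf = new byte[4096];
    long read = 0;
    for (int n; (n = in.read(buf)) != -1; ) {
      read += n;
    }
    in.close();

    // Before this fix, 'read' could exceed 'snapLen': bytes appended
    // after the snapshot leaked into the snapshot read.
    System.out.println("snapshot length=" + snapLen + ", bytes read=" + read);
  }
}
{code}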



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7263) Snapshot read can reveal future bytes for appended files.

2014-10-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189577#comment-14189577
 ] 

Hudson commented on HDFS-7263:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6391 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6391/])
HDFS-7263. Snapshot read can reveal future bytes for appended files. 
Contributed by Tao Luo. (shv: rev 0126cf16b73843da2e504b6a03fee8bd93a404d5)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshotFileLength.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Snapshot read can reveal future bytes for appended files.
> -
>
> Key: HDFS-7263
> URL: https://issues.apache.org/jira/browse/HDFS-7263
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Tao Luo
> Fix For: 2.7.0
>
> Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, 
> TestSnapshotRead.java
>
>
> The following sequence of steps will produce extra bytes that should not be 
> visible, because they are not in the snapshot.
> * Create a file of size L, where {{L % blockSize != 0}}.
> * Create a snapshot
> * Append bytes to the file
> * Read the file in the snapshot (not the current file)
> * You will see that bytes are read beyond the original file size L



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7263) Snapshot read can reveal future bytes for appended files.

2014-10-29 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-7263:
--
   Resolution: Fixed
Fix Version/s: 2.7.0
   Status: Resolved  (was: Patch Available)

I just committed this. Thank you Tao.

> Snapshot read can reveal future bytes for appended files.
> -
>
> Key: HDFS-7263
> URL: https://issues.apache.org/jira/browse/HDFS-7263
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Tao Luo
> Fix For: 2.7.0
>
> Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, 
> TestSnapshotRead.java
>
>
> The following sequence of steps will produce extra bytes that should not be 
> visible, because they are not in the snapshot.
> * Create a file of size L, where {{L % blockSize != 0}}.
> * Create a snapshot
> * Append bytes to the file
> * Read the file in the snapshot (not the current file)
> * You will see that bytes are read beyond the original file size L



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6385) Show when block deletion will start after NameNode startup in WebUI

2014-10-29 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-6385:

Attachment: HDFS-6385.2.patch

Here is patch v2.  This fixes the test by parsing the return value of 
{{FSNamesystem#getNNStarted}} to determine start time.

[~jingzhao], are you still +1 for this version of the patch?

bq. I wonder, why the information needs to be exported on both metrics and JMX?

Thanks for reviewing, Haohui.  I was aiming for consistency with 
PendingDeletionBlocks, but it's not really necessary.  I removed the {{Metric}} 
annotation in this version of the patch.
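
For the curious, the parsing approach boils down to something like this sketch 
(assuming {{getNNStarted}} returns {{Date#toString()}} output and using the 
HDFS-6186 delay key; both are assumptions, not lifted from the patch):

{code}
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class DeletionStart {
  // Sketch: NN start time string + configured startup delay
  // ("dfs.namenode.startup.delay.block.deletion.sec", in seconds)
  // => epoch millis at which pending block deletion begins.
  static long blockDeletionStartMs(String nnStarted, long delaySec)
      throws ParseException {
    SimpleDateFormat df =
        new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy", Locale.US);
    return df.parse(nnStarted).getTime() + delaySec * 1000L;
  }
}
{code}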

> Show when block deletion will start after NameNode startup in WebUI
> ---
>
> Key: HDFS-6385
> URL: https://issues.apache.org/jira/browse/HDFS-6385
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Chris Nauroth
> Attachments: HDFS-6385.1.patch, HDFS-6385.2.patch, HDFS-6385.png
>
>
> HDFS-6186 provides functionality to delay block deletion for a period of time 
> after NameNode startup. Currently we only show the number of pending block 
> deletions in the WebUI. We should also show when block deletion will start in 
> the WebUI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7263) Snapshot read can reveal future bytes for appended files.

2014-10-29 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-7263:
--
Summary: Snapshot read can reveal future bytes for appended files.  (was: 
Snapshot read of an appended file returns more bytes than the file length.)

+1

> Snapshot read can reveal future bytes for appended files.
> -
>
> Key: HDFS-7263
> URL: https://issues.apache.org/jira/browse/HDFS-7263
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Tao Luo
> Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, 
> TestSnapshotRead.java
>
>
> The following sequence of steps will produce extra bytes that should not be 
> visible, because they are not in the snapshot.
> * Create a file of size L, where {{L % blockSize != 0}}.
> * Create a snapshot
> * Append bytes to the file
> * Read the file in the snapshot (not the current file)
> * You will see that bytes are read beyond the original file size L



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7276) Limit the number of byte arrays used by DFSOutputStream

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189554#comment-14189554
 ] 

Hadoop QA commented on HDFS-7276:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678065/h7276_20141029b.patch
  against trunk revision 6f5f604.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestDFSZKFailoverController
  org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8593//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8593//console

This message is automatically generated.

> Limit the number of byte arrays used by DFSOutputStream
> ---
>
> Key: HDFS-7276
> URL: https://issues.apache.org/jira/browse/HDFS-7276
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h7276_20141021.patch, h7276_20141022.patch, 
> h7276_20141023.patch, h7276_20141024.patch, h7276_20141027.patch, 
> h7276_20141027b.patch, h7276_20141028.patch, h7276_20141029.patch, 
> h7276_20141029b.patch
>
>
> When there are a lot of DFSOutputStreams writing concurrently, the number of 
> outstanding packets can be large.  The byte arrays created by those packets 
> can occupy a lot of memory.
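
The general technique is to cap how many buffers can be outstanding at once. A 
toy sketch of the idea (not the ByteArrayManager in the attached patches):

{code}
import java.util.concurrent.Semaphore;

// Toy sketch: a fixed-size permit pool so concurrent writers cannot
// allocate an unbounded number of packet buffers.
class BoundedByteArrayPool {
  private final Semaphore permits;
  private final int arrayLength;

  BoundedByteArrayPool(int maxArrays, int arrayLength) {
    this.permits = new Semaphore(maxArrays);
    this.arrayLength = arrayLength;
  }

  byte[] allocate() throws InterruptedException {
    permits.acquire();  // blocks once maxArrays arrays are outstanding
    return new byte[arrayLength];
  }

  void release() {
    permits.release();  // callers must release once per allocate()
  }
}
{code}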



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7035) Make adding volume an atomic operation.

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189553#comment-14189553
 ] 

Hadoop QA commented on HDFS-7035:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678067/HDFS-7035.013.patch
  against trunk revision 6f5f604.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1272 javac 
compiler warnings (more than the trunk's current 1267 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8594//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8594//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8594//console

This message is automatically generated.

> Make adding volume an atomic operation.
> ---
>
> Key: HDFS-7035
> URL: https://issues.apache.org/jira/browse/HDFS-7035
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7035.000.combo.patch, HDFS-7035.000.patch, 
> HDFS-7035.001.combo.patch, HDFS-7035.001.patch, HDFS-7035.002.patch, 
> HDFS-7035.003.patch, HDFS-7035.003.patch, HDFS-7035.004.patch, 
> HDFS-7035.005.patch, HDFS-7035.007.patch, HDFS-7035.008.patch, 
> HDFS-7035.009.patch, HDFS-7035.010.patch, HDFS-7035.010.patch, 
> HDFS-7035.011.patch, HDFS-7035.012.patch, HDFS-7035.013.patch, 
> HDFS-7035.014.patch
>
>
> It refactors {{DataStorage}} and {{BlockPoolSliceStorage}} to reduce 
> duplicate code and to support adding volumes atomically. It also 
> parallelizes volume loading: each thread loads one volume.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7017) Implement OutputStream for libhdfs3

2014-10-29 Thread Zhanwei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189546#comment-14189546
 ] 

Zhanwei Wang commented on HDFS-7017:


Hi [~wheat9] and [~cmccabe]

Any comments on this patch?

> Implement OutputStream for libhdfs3
> ---
>
> Key: HDFS-7017
> URL: https://issues.apache.org/jira/browse/HDFS-7017
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Zhanwei Wang
> Attachments: HDFS-7017-pnative.002.patch, HDFS-7017.patch
>
>
> Implement pipeline and OutputStream C++ interface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7035) Make adding volume an atomic operation.

2014-10-29 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-7035:

Attachment: HDFS-7035.014.patch

[~cmccabe] I made changes based on your comments. Thanks!

bq. this isn't needed because VolumeBuilder is in the same Java package as 
Storage.

This function is {{BlockPoolSliceStorage#addStorageDir}}, which is called by 
{{DataStorage#VolumeBuilder}}. I could not use {{protected}} or package-private 
visibility here. 

> Make adding volume an atomic operation.
> ---
>
> Key: HDFS-7035
> URL: https://issues.apache.org/jira/browse/HDFS-7035
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7035.000.combo.patch, HDFS-7035.000.patch, 
> HDFS-7035.001.combo.patch, HDFS-7035.001.patch, HDFS-7035.002.patch, 
> HDFS-7035.003.patch, HDFS-7035.003.patch, HDFS-7035.004.patch, 
> HDFS-7035.005.patch, HDFS-7035.007.patch, HDFS-7035.008.patch, 
> HDFS-7035.009.patch, HDFS-7035.010.patch, HDFS-7035.010.patch, 
> HDFS-7035.011.patch, HDFS-7035.012.patch, HDFS-7035.013.patch, 
> HDFS-7035.014.patch
>
>
> It refactors {{DataStorage}} and {{BlockPoolSliceStorage}} to reduce 
> duplicate code and to support adding volumes atomically. It also 
> parallelizes volume loading: each thread loads one volume.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7173) Only keep successfully loaded volumes in the configuration.

2014-10-29 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-7173:

Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

The changes have been merged into HDFS-7035 and reviewed there.

> Only keep successfully loaded volumes in the configuration.
> ---
>
> Key: HDFS-7173
> URL: https://issues.apache.org/jira/browse/HDFS-7173
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 3.0.0, 2.6.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7173.000.combo.patch, HDFS-7173.000.patch, 
> HDFS-7173.001.combo.patch, HDFS-7173.001.patch, HDFS-7173.002.combo.patch, 
> HDFS-7173.002.patch, HDFS-7173.003.combo.patch, HDFS-7173.003.patch
>
>
> Hot swapping data volumes might fail. The user should be able to fix the 
> failed volumes and disks, then ask the {{DataNode}} to retry the previously 
> failed volumes. 
> To reload a failed volume on the same directory, that directory must not be 
> present in the {{Configuration}} object that the {{DataNode}} has. Therefore, 
> only successfully loaded volumes should be put into the {{Configuration}} 
> object.
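
A minimal sketch of that last step (hedged; the real code would use 
{{DFSConfigKeys.DFS_DATANODE_DATA_DIR_KEY}} rather than the literal key):

{code}
import java.util.List;
import org.apache.hadoop.conf.Configuration;

class VolumeConfUpdate {
  // Sketch: after (re)loading volumes, keep only the directories that
  // actually loaded, so failed ones can be fixed and re-added later.
  static void keepLoadedVolumes(Configuration conf, List<String> loadedDirs) {
    conf.set("dfs.datanode.data.dir", String.join(",", loadedDirs));
  }
}
{code}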



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7263) Snapshot read of an appended file returns more bytes than the file length.

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189493#comment-14189493
 ] 

Hadoop QA commented on HDFS-7263:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678051/HDFS-7263.patch
  against trunk revision 3ae84e1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8592//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8592//console

This message is automatically generated.

> Snapshot read of an appended file returns more bytes than the file length.
> --
>
> Key: HDFS-7263
> URL: https://issues.apache.org/jira/browse/HDFS-7263
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Tao Luo
> Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, 
> TestSnapshotRead.java
>
>
> The following sequence of steps will produce extra bytes that should not be 
> visible, because they are not in the snapshot.
> * Create a file of size L, where {{L % blockSize != 0}}.
> * Create a snapshot
> * Append bytes to the file
> * Read the file in the snapshot (not the current file)
> * You will see that bytes are read beyond the original file size L



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7035) Make adding volume an atomic operation.

2014-10-29 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189469#comment-14189469
 ] 

Colin Patrick McCabe commented on HDFS-7035:


{{Storage.java}}: remove unnecessary whitespace change

{code}
// Expose visibility for VolumeBuilder#commit().
public void addStorageDir(StorageDirectory sd) {
  super.addStorageDir(sd);
}
{code}

This isn't needed because VolumeBuilder is in the same Java package as Storage, 
so it can just call the parent method directly.

{code}
/** The unchanged locations that exist in the old configuration. */
{code}
Should be "existed in the old configuration"

{code}
  builder.addBpStorageDirtectories(
{code}
Should be "directories" not "dirtectories"

{code}
// 2. Do transitions
// Each storage directory is treated individually.
// During startup some of them can upgrade or roll back
// while others could be up-to-date for the regular startup.
doTransition(datanode, sd, nsInfo, startOpt);
assert getCTime() == nsInfo.getCTime()
    : "Data-node and name-node CTimes must be the same.";
{code}
This should be throwing an IOE, not an assert.  Otherwise we're bringing down 
the DataNode because someone tried to add a storage directory that wasn't 
valid... not good.
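
Concretely, the suggestion amounts to something like this (a paraphrase of the 
quoted hunk, not the final patch):

{code}
// Reject the mismatched storage directory with an IOException instead
// of tripping an assert that takes down the whole DataNode.
doTransition(datanode, sd, nsInfo, startOpt);
if (getCTime() != nsInfo.getCTime()) {
  throw new IOException("Data-node and name-node CTimes must be the same: "
      + getCTime() + " != " + nsInfo.getCTime());
}
{code}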

{code}
// bpStorage does not add the loaded volume immediately. The volume will be
// added when calling builder.build() later. However, several members
// (e.g., Storage#layoutVersion, Storage#cTime) will be updated in
// BlockPoolSliceStorage#format() and
// BlockPoolSliceStorage#loadStorageDirectory. But since these values are
// considered constant during the DataNode execution, we do not revert the
// changes on such members.
{code}
I think this comment belongs in the JavaDoc for the function.  I also feel like 
the current form of the comment is somewhat confusing.  I would say something 
like "prepareVolume creates a builder which can be used to add to the volume.  
If the volume cannot be added, it is OK to discard the builder later."

removeVolumes: can you document in the JavaDoc for this function that even when 
the IOE is thrown, the volumes are still removed?

+1 once these are addressed.

> Make adding volume an atomic operation.
> ---
>
> Key: HDFS-7035
> URL: https://issues.apache.org/jira/browse/HDFS-7035
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7035.000.combo.patch, HDFS-7035.000.patch, 
> HDFS-7035.001.combo.patch, HDFS-7035.001.patch, HDFS-7035.002.patch, 
> HDFS-7035.003.patch, HDFS-7035.003.patch, HDFS-7035.004.patch, 
> HDFS-7035.005.patch, HDFS-7035.007.patch, HDFS-7035.008.patch, 
> HDFS-7035.009.patch, HDFS-7035.010.patch, HDFS-7035.010.patch, 
> HDFS-7035.011.patch, HDFS-7035.012.patch, HDFS-7035.013.patch
>
>
> It refactors {{DataStorage}} and {{BlockPoolSliceStorage}} to reduce 
> duplicate code and to support adding volumes atomically. It also 
> parallelizes volume loading: each thread loads one volume.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7276) Limit the number of byte arrays used by DFSOutputStream

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189431#comment-14189431
 ] 

Hadoop QA commented on HDFS-7276:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678023/h7276_20141029.patch
  against trunk revision d33e07d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.util.TestByteArrayManager

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8591//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8591//console

This message is automatically generated.

> Limit the number of byte arrays used by DFSOutputStream
> ---
>
> Key: HDFS-7276
> URL: https://issues.apache.org/jira/browse/HDFS-7276
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h7276_20141021.patch, h7276_20141022.patch, 
> h7276_20141023.patch, h7276_20141024.patch, h7276_20141027.patch, 
> h7276_20141027b.patch, h7276_20141028.patch, h7276_20141029.patch, 
> h7276_20141029b.patch
>
>
> When there are a lot of DFSOutputStreams writing concurrently, the number of 
> outstanding packets can be large.  The byte arrays created by those packets 
> can occupy a lot of memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7295) Support arbitrary max expiration times for delegation token

2014-10-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189430#comment-14189430
 ] 

Vinod Kumar Vavilapalli commented on HDFS-7295:
---

bq. Vinod, We're probably not on the same wavelength. I agree with all that you 
said about keytabs being the solution for services. But I'm trying to find a 
solution for apps that are started by regular users. There are no keytabs here.
We are. I am saying that the services we are bringing to YARN are the same 
services that existed today outside of YARN. And they have keytabs.

I am not sure how Spark Streaming works today in a secure cluster outside of 
YARN without any access to keytabs.

> Support arbitrary max expiration times for delegation token
> ---
>
> Key: HDFS-7295
> URL: https://issues.apache.org/jira/browse/HDFS-7295
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>
> Currently the max lifetime of HDFS delegation tokens is hardcoded to 7 days. 
> This is a problem for different users of HDFS such as long-running YARN apps. 
> Users should be allowed to optionally specify a max lifetime for their tokens.
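
For context (hedged, from DFSConfigKeys rather than this issue): the only knob 
today is the cluster-wide ceiling, and raising it affects every token, which 
is exactly what a per-token override would avoid:

{code}
Configuration conf = new Configuration();
// Cluster-wide ceiling (milliseconds); the default corresponds to 7 days.
// Raising it applies to all tokens, not just a long-running app's.
conf.setLong("dfs.namenode.delegation.token.max-lifetime",
    30L * 24 * 60 * 60 * 1000);  // e.g. 30 days
{code}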



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7035) Make adding volume an atomic operation.

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189421#comment-14189421
 ] 

Hadoop QA commented on HDFS-7035:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678017/HDFS-7035.012.patch
  against trunk revision d33e07d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1268 javac 
compiler warnings (more than the trunk's current 1267 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8590//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8590//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8590//console

This message is automatically generated.

> Make adding volume an atomic operation.
> ---
>
> Key: HDFS-7035
> URL: https://issues.apache.org/jira/browse/HDFS-7035
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7035.000.combo.patch, HDFS-7035.000.patch, 
> HDFS-7035.001.combo.patch, HDFS-7035.001.patch, HDFS-7035.002.patch, 
> HDFS-7035.003.patch, HDFS-7035.003.patch, HDFS-7035.004.patch, 
> HDFS-7035.005.patch, HDFS-7035.007.patch, HDFS-7035.008.patch, 
> HDFS-7035.009.patch, HDFS-7035.010.patch, HDFS-7035.010.patch, 
> HDFS-7035.011.patch, HDFS-7035.012.patch, HDFS-7035.013.patch
>
>
> It refactors {{DataStorage}} and {{BlockPoolSliceStorage}} to reduce 
> duplicate code and to support adding volumes atomically. It also 
> parallelizes volume loading: each thread loads one volume.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4882) Namenode LeaseManager checkLeases() runs into infinite loop

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189396#comment-14189396
 ] 

Hadoop QA commented on HDFS-4882:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12586700/4882.patch
  against trunk revision d33e07d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.namenode.TestDeleteRace

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8589//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8589//console

This message is automatically generated.

> Namenode LeaseManager checkLeases() runs into infinite loop
> ---
>
> Key: HDFS-4882
> URL: https://issues.apache.org/jira/browse/HDFS-4882
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Zesheng Wu
> Attachments: 4882.1.patch, 4882.patch, 4882.patch
>
>
> Scenario:
> 1. cluster with 4 DNs
> 2. the size of the file to be written is a little more than one block
> 3. write the first block to 3 DNs, DN1->DN2->DN3
> 4. all the data packets of the first block are successfully acked and the 
> client sets the pipeline stage to PIPELINE_CLOSE, but the last packet isn't 
> sent out
> 5. DN2 and DN3 go down
> 6. the client recovers the pipeline, but no new DN is added to the pipeline 
> because the current pipeline stage is PIPELINE_CLOSE
> 7. the client continues writing the last block, and tries to close the file 
> after writing all the data
> 8. the NN finds that the penultimate block doesn't have enough replicas (our 
> dfs.namenode.replication.min=2), the client's close runs into an infinite 
> loop (HDFS-2936), and at the same time the NN sets the last block's state to 
> COMPLETE
> 9. shut down the client
> 10. the file's lease exceeds the hard limit
> 11. the LeaseManager realizes that and begins lease recovery by calling 
> fsnamesystem.internalReleaseLease()
> 12. but the last block's state is COMPLETE, and this triggers the lease 
> manager's infinite loop and prints massive logs like this:
> {noformat}
> 2013-06-05,17:42:25,695 INFO 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1] has expired hard
>  limit
> 2013-06-05,17:42:25,695 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. 
>  Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1], src=
> /user/h_wuzesheng/test.dat
> 2013-06-05,17:42:25,695 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.internalReleaseLease: File = /user/h_wuzesheng/test.dat, block 
> blk_-7028017402720175688_1202597,
> lastBLockState=COMPLETE
> 2013-06-05,17:42:25,695 INFO 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Started block recovery 
> for file /user/h_wuzesheng/test.dat lease [Lease.  Holder: DFSClient_NONM
> APREDUCE_-1252656407_1, pendingcreates: 1]
> {noformat}
> (the 3rd line log is a debug log added by us)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7225) Failed DataNode lookup can crash NameNode with NullPointerException

2014-10-29 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189369#comment-14189369
 ] 

Zhe Zhang commented on HDFS-7225:
-

Thanks [~jingzhao] for the advice. If that's the case we should indeed remove 
the block invalidation tasks once a new storage UUID has been discovered. I'll 
submit an updated patch.

> Failed DataNode lookup can crash NameNode with NullPointerException
> ---
>
> Key: HDFS-7225
> URL: https://issues.apache.org/jira/browse/HDFS-7225
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-7225-v1.patch, HDFS-7225-v2.patch
>
>
> {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by the 
> {{datanodeUuid}} and passes the resultant {{DatanodeDescriptor}} to 
> {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated 
> {{datanodeUuid}} is used, a null pointer will be passed to {{invalidateWork}}, 
> which uses it as a lookup key in a {{TreeMap}}. Since the key type is 
> {{DatanodeDescriptor}}, key comparison is based on the IP address. A null key 
> will crash the NameNode with an NPE.
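
The failure mode is easy to see in isolation (a minimal sketch; {{TreeMap}} 
compares keys on lookup, so a null key throws immediately once the map is 
non-empty):

{code}
import java.util.TreeMap;

public class NullKeyNpe {
  public static void main(String[] args) {
    TreeMap<String, String> byNode = new TreeMap<>();
    byNode.put("datanode-uuid-1", "pending invalidations");
    byNode.get(null);  // throws NullPointerException inside the comparison
  }
}
{code}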



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS

2014-10-29 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189366#comment-14189366
 ] 

Zhe Zhang commented on HDFS-7285:
-

A meeting has been scheduled:
* When: Friday Oct. 31st 10am~12pm
* Where: Cloudera headquarters, 1001 Page Mill Road, Palo Alto. Both the lobby 
(for guest check-in) and the meeting room (Hadoop) are in building #2
* URL: 
https://cloudera.webex.com/cloudera/j.php?MTID=me26394d0a3559c7a9498f18ad7de8962
* Call-in: 1-650-479-3208 (US/Canada) with access code: 290 472 605. 

Please drop me a note (zhezh...@cloudera.com) if you prefer a different time.

Thanks [~drankye] for the suggestion. The interface of the erasure coding 
feature potentially has a close relationship with HSM (HDFS-2832) and archival 
storage (HDFS-6584). We'll make sure to cover this topic in the meeting and 
share the summary here.

> Erasure Coding Support inside HDFS
> --
>
> Key: HDFS-7285
> URL: https://issues.apache.org/jira/browse/HDFS-7285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Weihua Jiang
>Assignee: Zhe Zhang
> Attachments: HDFSErasureCodingDesign-20141028.pdf
>
>
> Erasure Coding (EC) can greatly reduce storage overhead without sacrificing 
> data reliability, compared to the existing HDFS 3-replica approach. For 
> example, if we use 10+4 Reed-Solomon coding, we can tolerate the loss of 4 
> blocks, with a storage overhead of only 40%. This makes EC a quite 
> attractive alternative for big data storage, particularly for cold data. 
> Facebook had a related open source project called HDFS-RAID. It used to be 
> one of the contrib packages in HDFS but was removed in Hadoop 2.0 for 
> maintenance reasons. Its drawbacks are: 1) it sits on top of HDFS and 
> depends on MapReduce to do encoding and decoding tasks; 2) it can only be 
> used for cold files that are not intended to be appended anymore; 3) the 
> pure Java EC coding implementation is extremely slow in practical use. Due 
> to these issues, it might not be a good idea to just bring HDFS-RAID back.
> We (Intel and Cloudera) are working on a design to build EC into HDFS that 
> gets rid of any external dependencies, making it self-contained and 
> independently maintained. This design lays the EC feature on top of the 
> storage type support and is designed to be compatible with existing HDFS 
> features like caching, snapshots, encryption, and high availability. It 
> will also support different EC coding schemes, implementations, and 
> policies for different deployment scenarios. By utilizing advanced 
> libraries (e.g. the Intel ISA-L library), an implementation can greatly 
> improve the performance of EC encoding/decoding and make the EC solution 
> even more attractive. We will post the design document soon. 
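
For readers checking the 40% figure, the overhead arithmetic works out as 
follows (both schemes compared per 10 units of user data):

{noformat}
3-way replication: 20 extra copies / 10 data units   = 200% storage overhead,
                   tolerates loss of 2 of the 3 copies of any block
RS(10,4):           4 parity blocks / 10 data blocks =  40% storage overhead,
                   tolerates loss of any 4 of the 14 blocks in a group
{noformat}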



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7281) Missing block is marked as corrupted block

2014-10-29 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-7281:

Labels: supportability  (was: )

> Missing block is marked as corrupted block
> --
>
> Key: HDFS-7281
> URL: https://issues.apache.org/jira/browse/HDFS-7281
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
>  Labels: supportability
> Attachments: HDFS-7281-2.patch, HDFS-7281.patch
>
>
> In the situation where a block has lost all its replicas, fsck shows the 
> block as missing as well as corrupted. Perhaps it is better not to mark the 
> block corrupted in this case. The reason it is marked as corrupted is that 
> numCorruptNodes == numNodes == 0 in the following code.
> {noformat}
> BlockManager
> final boolean isCorrupt = numCorruptNodes == numNodes;
> {noformat}
> We would like to clarify whether it is intended to mark a missing block as 
> corrupted, or whether it is just a bug.
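
If it does turn out to be a bug, the smallest conceivable fix would be to 
exclude the zero-replica case (a hedged one-liner, not a reviewed patch):

{code}
// A block with no replicas at all is "missing", not "corrupt".
final boolean isCorrupt = numNodes > 0 && numCorruptNodes == numNodes;
{code}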



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7281) Missing block is marked as corrupted block

2014-10-29 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189343#comment-14189343
 ] 

Yongjun Zhang commented on HDFS-7281:
-

Thanks [~mingma]. 

Hi [~atm], the latest patch looks good to me. I wonder if you would have time 
to do a review here? thanks.


> Missing block is marked as corrupted block
> --
>
> Key: HDFS-7281
> URL: https://issues.apache.org/jira/browse/HDFS-7281
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-7281-2.patch, HDFS-7281.patch
>
>
> In the situation where a block has lost all its replicas, fsck shows the 
> block as missing as well as corrupted. Perhaps it is better not to mark the 
> block corrupted in this case. The reason it is marked as corrupted is that 
> numCorruptNodes == numNodes == 0 in the following code.
> {noformat}
> BlockManager
> final boolean isCorrupt = numCorruptNodes == numNodes;
> {noformat}
> We would like to clarify whether it is intended to mark a missing block as 
> corrupted, or whether it is just a bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189339#comment-14189339
 ] 

Hadoop QA commented on HDFS-7199:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677996/HDFS-7199.patch
  against trunk revision d33e07d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8588//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8588//console

This message is automatically generated.

> DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O 
> exception
> ---
>
> Key: HDFS-7199
> URL: https://issues.apache.org/jira/browse/HDFS-7199
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Rushabh S Shah
>Priority: Critical
> Attachments: HDFS-7199-WIP.patch, HDFS-7199.patch
>
>
> If the DataStreamer thread encounters a non-I/O exception, it closes the 
> output stream but does not set lastException.  When the client later calls 
> close on the output stream, it sees the stream is already closed with 
> lastException == null, mistakenly thinks this is a redundant close call, and 
> fails to report any error to the client.
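
A sketch of the shape of a fix (hedged; {{runDataStreamerLoop}}, 
{{closeInternal}}, and the {{lastException}} field are stand-ins for the real 
DataStreamer internals):

{code}
try {
  runDataStreamerLoop();   // stand-in for the streamer's main loop
} catch (Throwable t) {
  // Record *every* fatal failure, not just IOExceptions, so a later
  // close() on the stream reports an error instead of silently
  // treating it as a redundant close.
  lastException = (t instanceof IOException)
      ? (IOException) t
      : new IOException("DataStreamer failed with non-I/O exception", t);
  closeInternal();         // stand-in for the existing shutdown path
}
{code}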



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7035) Make adding volume an atomic operation.

2014-10-29 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-7035:

Attachment: HDFS-7035.013.patch

Hi, [~cmccabe] Thanks for your quick reviews.

I have updated the patch based on your comments. {{conf}} is now only updated 
at the end of {{refreshVolumes}}.

bq.  I am also curious what happens when we fail midway through. I can see that 
DataStorage#prepareVolume adds the volume to DataStorage#bpStorageMap, is there 
anywhere where we remove it if the addition fails?

I have also added comments here. Basically, {{refreshVolume}} does not change 
the state of {{BlockPoolSliceStorage}} except for updating a few members with 
constant values. 

Would you give another look? Thanks!

> Make adding volume an atomic operation.
> ---
>
> Key: HDFS-7035
> URL: https://issues.apache.org/jira/browse/HDFS-7035
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7035.000.combo.patch, HDFS-7035.000.patch, 
> HDFS-7035.001.combo.patch, HDFS-7035.001.patch, HDFS-7035.002.patch, 
> HDFS-7035.003.patch, HDFS-7035.003.patch, HDFS-7035.004.patch, 
> HDFS-7035.005.patch, HDFS-7035.007.patch, HDFS-7035.008.patch, 
> HDFS-7035.009.patch, HDFS-7035.010.patch, HDFS-7035.010.patch, 
> HDFS-7035.011.patch, HDFS-7035.012.patch, HDFS-7035.013.patch
>
>
> It refactors {{DataStorage}} and {{BlockPoolSliceStorage}} to reduce 
> duplicate code and to support atomic add-volume operations. It also 
> parallelizes volume loading: each thread loads one volume.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7276) Limit the number of byte arrays used by DFSOutputStream

2014-10-29 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-7276:
--
Attachment: h7276_20141029b.patch

h7276_20141029b.patch: reverts the PacketHeader change and slightly adjusts 
some javadoc and error messages.

> Limit the number of byte arrays used by DFSOutputStream
> ---
>
> Key: HDFS-7276
> URL: https://issues.apache.org/jira/browse/HDFS-7276
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h7276_20141021.patch, h7276_20141022.patch, 
> h7276_20141023.patch, h7276_20141024.patch, h7276_20141027.patch, 
> h7276_20141027b.patch, h7276_20141028.patch, h7276_20141029.patch, 
> h7276_20141029b.patch
>
>
> When there are a lot of DFSOutputStreams writing concurrently, the number of 
> outstanding packets could be large.  The byte arrays created by those packets 
> could occupy a lot of memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7225) Failed DataNode lookup can crash NameNode with NullPointerException

2014-10-29 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189314#comment-14189314
 ] 

Jing Zhao commented on HDFS-7225:
-

Currently, if a reported block belongs to no file, the block will eventually be 
marked as invalid (though not during the first block report) and will 
eventually be deleted.
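
A minimal sketch of that flow, with made-up names ({{processReportedBlock}}, 
{{blocksMap}}, {{invalidateBlocks}}), purely to illustrate the behavior 
described:
{code}
// Illustrative sketch only -- not the actual BlockManager code.
void processReportedBlock(Block reported, boolean initialBlockReport) {
  BlockInfo stored = blocksMap.get(reported); // look up the owning file
  if (stored == null && !initialBlockReport) {
    // The block belongs to no file: mark it invalid so the DataNode
    // is eventually told to delete it.
    invalidateBlocks.add(reported);
  }
}
{code}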

> Failed DataNode lookup can crash NameNode with NullPointerException
> ---
>
> Key: HDFS-7225
> URL: https://issues.apache.org/jira/browse/HDFS-7225
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-7225-v1.patch, HDFS-7225-v2.patch
>
>
> {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by the 
> {{datanodeUuid}} and passes the resultant {{DatanodeDescriptor}} to 
> {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated 
> {{datanodeUuid}} is used, a null pointer will be passed to {{invalidateWork}} 
> which will use it as a lookup key in a {{TreeMap}}. Since the key type is 
> {{DatanodeDescriptor}}, key comparison is based on the IP address. A null key 
> will crash the NameNode with an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7305) NPE seen in wbhdfs FS while running SLive

2014-10-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189286#comment-14189286
 ] 

Hudson commented on HDFS-7305:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6388 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6388/])
HDFS-7305. NPE seen in wbhdfs FS while running SLive. Contributed by Jing Zhao. 
(jing9: rev 6f5f604a798b545faf6fadc9b66c8a8995b354db)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> NPE seen in wbhdfs FS while running SLive
> -
>
> Key: HDFS-7305
> URL: https://issues.apache.org/jira/browse/HDFS-7305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.6.0
>Reporter: Arpit Gupta
>Assignee: Jing Zhao
>Priority: Minor
> Fix For: 2.6.0
>
> Attachments: HDFS-7305.000.patch
>
>
> {code}
> 2014-10-23 10:22:49,066 WARN [main] org.apache.hadoop.mapred.Task: Task 
> status: "Failed at running due to java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:529)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:605)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:458)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:487)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:483)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.append(WebHdfsFileSystem.java:1154)
>   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1163)
>   at org.apache.hadoop.fs.slive.AppendOp.run(AppendOp.java:80)
>   at org.apache.hadoop.fs.slive.ObserveableOp.run(ObserveableOp.java:63)
>   at 
> org.apache.hadoop.fs.slive.SliveMapper.runOperation(SliveMapper.java:122)
>   at org.apache.hadoop.fs.slive.SliveMapper.map(SliveMapper.java:168)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> " truncated to max limit (512 characters)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7305) NPE seen in wbhdfs FS while running SLive

2014-10-29 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7305:

Component/s: webhdfs

> NPE seen in wbhdfs FS while running SLive
> -
>
> Key: HDFS-7305
> URL: https://issues.apache.org/jira/browse/HDFS-7305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.6.0
>Reporter: Arpit Gupta
>Assignee: Jing Zhao
>Priority: Minor
> Fix For: 2.6.0
>
> Attachments: HDFS-7305.000.patch
>
>
> {code}
> 2014-10-23 10:22:49,066 WARN [main] org.apache.hadoop.mapred.Task: Task 
> status: "Failed at running due to java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:529)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:605)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:458)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:487)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:483)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.append(WebHdfsFileSystem.java:1154)
>   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1163)
>   at org.apache.hadoop.fs.slive.AppendOp.run(AppendOp.java:80)
>   at org.apache.hadoop.fs.slive.ObserveableOp.run(ObserveableOp.java:63)
>   at 
> org.apache.hadoop.fs.slive.SliveMapper.runOperation(SliveMapper.java:122)
>   at org.apache.hadoop.fs.slive.SliveMapper.map(SliveMapper.java:168)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> " truncated to max limit (512 characters)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7305) NPE seen in wbhdfs FS while running SLive

2014-10-29 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7305:

   Resolution: Fixed
Fix Version/s: 2.6.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks Arpit for the report and Haohui for the review. I've committed this to 
trunk, branch-2 and branch-2.6.

> NPE seen in wbhdfs FS while running SLive
> -
>
> Key: HDFS-7305
> URL: https://issues.apache.org/jira/browse/HDFS-7305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.6.0
>Reporter: Arpit Gupta
>Assignee: Jing Zhao
>Priority: Minor
> Fix For: 2.6.0
>
> Attachments: HDFS-7305.000.patch
>
>
> {code}
> 2014-10-23 10:22:49,066 WARN [main] org.apache.hadoop.mapred.Task: Task 
> status: "Failed at running due to java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:529)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:605)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:458)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:487)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:483)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.append(WebHdfsFileSystem.java:1154)
>   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1163)
>   at org.apache.hadoop.fs.slive.AppendOp.run(AppendOp.java:80)
>   at org.apache.hadoop.fs.slive.ObserveableOp.run(ObserveableOp.java:63)
>   at 
> org.apache.hadoop.fs.slive.SliveMapper.runOperation(SliveMapper.java:122)
>   at org.apache.hadoop.fs.slive.SliveMapper.map(SliveMapper.java:168)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> " truncated to max limit (512 characters)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7225) Failed DataNode lookup can crash NameNode with NullPointerException

2014-10-29 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189281#comment-14189281
 ] 

Zhe Zhang commented on HDFS-7225:
-

AFAICT, the NN won't try to delete orphan blocks. I verified this with the 
following test:

{code}
  public void testOrphanBlocks() throws IOException {
    DataNode dn = cluster.getDataNodes().get(0);
    DatanodeRegistration dnReg = dn.getDNRegistrationForBP(bpid);
    StorageBlockReport[] reports =
        new StorageBlockReport[cluster.getStoragesPerDatanode()];

    // Blocks that belong to no file on the NameNode side (orphan blocks).
    ArrayList<Block> blocks = new ArrayList<Block>();
    for (int i = 0; i < 10; i++) {
      blocks.add(new Block());
    }
    // Report the orphan blocks from every storage of the DataNode.
    for (int i = 0; i < cluster.getStoragesPerDatanode(); ++i) {
      BlockListAsLongs bll = new BlockListAsLongs(blocks, null);
      FsVolumeSpi v = dn.getFSDataset().getVolumes().get(i);
      DatanodeStorage dns = new DatanodeStorage(v.getStorageID());
      reports[i] = new StorageBlockReport(dns, bll.getBlockListAsLongs());
    }
    cluster.getNameNodeRpc().blockReport(dnReg, bpid, reports);
    LOG.debug("Scheduling to delete " +
        cluster.getNameNode().getNamesystem().getBlockManager().
            getPendingDeletionBlocksCount() + " blocks");
  }
{code}

I wonder whether it's the intended behavior for the NN to keep orphan blocks, 
or whether we should add logic to delete them. [~andrew.wang] Do you have a clue?

> Failed DataNode lookup can crash NameNode with NullPointerException
> ---
>
> Key: HDFS-7225
> URL: https://issues.apache.org/jira/browse/HDFS-7225
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-7225-v1.patch, HDFS-7225-v2.patch
>
>
> {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by the 
> {{datanodeUuid}} and passes the resultant {{DatanodeDescriptor}} to 
> {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated 
> {{datanodeUuid}} is used, a null pointer will be passed to {{invalidateWork}} 
> which will use it as a lookup key in a {{TreeMap}}. Since the key type is 
> {{DatanodeDescriptor}}, key comparison is based on the IP address. A null key 
> will crash the NameNode with an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7281) Missing block is marked as corrupted block

2014-10-29 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189241#comment-14189241
 ] 

Ming Ma commented on HDFS-7281:
---

Thanks, Yongjun. HADOOP-11045 is useful. Both TestEncryptionZonesWithHA and 
TestLeaseRecovery2 pass in a local run.

> Missing block is marked as corrupted block
> --
>
> Key: HDFS-7281
> URL: https://issues.apache.org/jira/browse/HDFS-7281
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-7281-2.patch, HDFS-7281.patch
>
>
> In the situation where the block lost all its replicas, fsck shows the block 
> is missing as well as corrupted. Perhaps it is better not to mark the block 
> corrupted in this case. The reason it is marked as corrupted is 
> numCorruptNodes == numNodes == 0 in the following code.
> {noformat}
> BlockManager
> final boolean isCorrupt = numCorruptNodes == numNodes;
> {noformat}
> Would like to clarify if it is the intent to mark missing block as corrupted 
> or it is just a bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7305) NPE seen in wbhdfs FS while running SLive

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189243#comment-14189243
 ] 

Hadoop QA commented on HDFS-7305:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677978/HDFS-7305.000.patch
  against trunk revision d33e07d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8587//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8587//console

This message is automatically generated.

> NPE seen in wbhdfs FS while running SLive
> -
>
> Key: HDFS-7305
> URL: https://issues.apache.org/jira/browse/HDFS-7305
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Arpit Gupta
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-7305.000.patch
>
>
> {code}
> 2014-10-23 10:22:49,066 WARN [main] org.apache.hadoop.mapred.Task: Task 
> status: "Failed at running due to java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:529)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:605)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:458)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:487)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:483)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.append(WebHdfsFileSystem.java:1154)
>   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1163)
>   at org.apache.hadoop.fs.slive.AppendOp.run(AppendOp.java:80)
>   at org.apache.hadoop.fs.slive.ObserveableOp.run(ObserveableOp.java:63)
>   at 
> org.apache.hadoop.fs.slive.SliveMapper.runOperation(SliveMapper.java:122)
>   at org.apache.hadoop.fs.slive.SliveMapper.map(SliveMapper.java:168)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> " truncated to max limit (512 characters)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7263) Snapshot read of an appended file returns more bytes than the file length.

2014-10-29 Thread Tao Luo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Luo updated HDFS-7263:
--
Status: Patch Available  (was: In Progress)

> Snapshot read of an appended file returns more bytes than the file length.
> --
>
> Key: HDFS-7263
> URL: https://issues.apache.org/jira/browse/HDFS-7263
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Tao Luo
> Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, 
> TestSnapshotRead.java
>
>
> The following sequence of steps will produce extra bytes that should not be 
> visible because they are not in the snapshot.
> * Create a file of size L, where {{L % blockSize != 0}}.
> * Create a snapshot
> * Append bytes to the file
> * Read file in the snapshot (not the current file)
> * You will see that bytes are read beyond the original file size L



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7263) Snapshot read of an appended file returns more bytes than the file length.

2014-10-29 Thread Tao Luo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Luo updated HDFS-7263:
--
Attachment: HDFS-7263.patch

Simplified the test per Konstantin's review.

> Snapshot read of an appended file returns more bytes than the file length.
> --
>
> Key: HDFS-7263
> URL: https://issues.apache.org/jira/browse/HDFS-7263
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Tao Luo
> Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, 
> TestSnapshotRead.java
>
>
> The following sequence of steps will produce extra bytes that should not be 
> visible because they are not in the snapshot.
> * Create a file of size L, where {{L % blockSize != 0}}.
> * Create a snapshot
> * Append bytes to the file
> * Read file in the snapshot (not the current file)
> * You will see that bytes are read beyond the original file size L



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6385) Show when block deletion will start after NameNode startup in WebUI

2014-10-29 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189217#comment-14189217
 ] 

Haohui Mai commented on HDFS-6385:
--

I wonder why the information needs to be exported via both metrics and JMX?

> Show when block deletion will start after NameNode startup in WebUI
> ---
>
> Key: HDFS-6385
> URL: https://issues.apache.org/jira/browse/HDFS-6385
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Chris Nauroth
> Attachments: HDFS-6385.1.patch, HDFS-6385.png
>
>
> HDFS-6186 provides functionality to delay block deletion for a period of time 
> after NameNode startup. Currently we only show the number of pending block 
> deletions in WebUI. We should also show when the block deletion will start in 
> WebUI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-29 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189216#comment-14189216
 ] 

Konstantin Shvachko commented on HDFS-3107:
---

Yes, good point, Nicholas.
During an upgrade, old blocks are hard-linked and the hard links are stored in 
a separate directory, which allows the blocks themselves to be deleted while 
the file system is updated. So when we roll back, the deleted blocks are still 
available via those hard links.
With truncate, the same applies to the blocks that were deleted. For the 
blocks that are truncated to a smaller length we will need to do 
copy-on-truncate recovery, the same as we do for snapshots. That is, if an 
upgrade is in progress the NN will schedule copy-on-truncate whether snapshots 
are present or not.
The bottom line is that implementing copy-on-truncate is needed both for 
snapshots and upgrades. I'll make a note to update the design.
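
In pseudocode, the scheduling rule described above (all names are illustrative, 
not from the actual patch):
{code}
// Illustrative pseudocode only.
if (upgradeInProgress() || isInSnapshot(lastBlock)) {
  // Copy the last block, then truncate the copy; the original replica
  // stays available for rollback and snapshot reads.
  scheduleCopyOnTruncate(lastBlock, newLength);
} else {
  // Safe to shrink the replica in place.
  scheduleInPlaceTruncate(lastBlock, newLength);
}
{code}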

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard POSIX operation), which is the reverse of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7263) Snapshot read of an appended file returns more bytes than the file length.

2014-10-29 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189193#comment-14189193
 ] 

Konstantin Shvachko commented on HDFS-7263:
---

I couldn't reproduce the test failures or the patch-related java warnings 
locally, with or without the patch.
I have one nit: in the test you added for append you create a new snapshot2, 
which is not necessary for this particular test case.
Could you also add a comment that you are testing snapshot read of a file 
opened for append?

> Snapshot read of an appended file returns more bytes than the file length.
> --
>
> Key: HDFS-7263
> URL: https://issues.apache.org/jira/browse/HDFS-7263
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Tao Luo
> Attachments: HDFS-7263.patch, HDFS-7263.patch, TestSnapshotRead.java
>
>
> The following sequence of steps will produce extra bytes that should not be 
> visible because they are not in the snapshot.
> * Create a file of size L, where {{L % blockSize != 0}}.
> * Create a snapshot
> * Append bytes to the file
> * Read file in the snapshot (not the current file)
> * You will see that bytes are read beyond the original file size L



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7276) Limit the number of byte arrays used by DFSOutputStream

2014-10-29 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189186#comment-14189186
 ] 

Jing Zhao commented on HDFS-7276:
-

Thanks for updating the patch, Nicholas! The new patch looks pretty good to me. 
Maybe we can remove the change in PacketHeader? Other than that +1.

> Limit the number of byte arrays used by DFSOutputStream
> ---
>
> Key: HDFS-7276
> URL: https://issues.apache.org/jira/browse/HDFS-7276
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h7276_20141021.patch, h7276_20141022.patch, 
> h7276_20141023.patch, h7276_20141024.patch, h7276_20141027.patch, 
> h7276_20141027b.patch, h7276_20141028.patch, h7276_20141029.patch
>
>
> When there are a lot of DFSOutputStreams writing concurrently, the number of 
> outstanding packets could be large.  The byte arrays created by those packets 
> could occupy a lot of memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7281) Missing block is marked as corrupted block

2014-10-29 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189175#comment-14189175
 ] 

Yongjun Zhang commented on HDFS-7281:
-

Hi [~mingma],

Thanks for addressing my comments, the change looks good to me.

About the test failure, I used the tool from HADOOP-11045 and found the 
following:
{code}
Recently FAILED builds in url: 
https://builds.apache.org//job/PreCommit-Hdfs-Build
THERE ARE 95 builds (out of 100) that have failed tests in the past 7 days, 
as listed below:
..
Among 100 runs examined, all failed tests <#failedRuns: testName>:
6: org.apache.hadoop.hdfs.TestLeaseRecovery2.testLeaseRecoverByAnotherUser
6: org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecovery
6: 
org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart
5: org.apache.hadoop.hdfs.server.balancer.TestBalancer.testUnknownDatanode
5: org.apache.hadoop.hdfs.TestLeaseRecovery2.testThreadName
3: org.apache.hadoop.hdfs.TestDFSClientRetries.testFailuresArePerOperation
...
{code}
So the TestLeaseRecovery2 failures are not relevant to your change, as we expected.

For completeness, I suggest that you run both this test and the timed-out 
TestEncryptionZonesWithHA locally and see whether they pass with your patch.

Thanks.


> Missing block is marked as corrupted block
> --
>
> Key: HDFS-7281
> URL: https://issues.apache.org/jira/browse/HDFS-7281
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-7281-2.patch, HDFS-7281.patch
>
>
> In the situation where the block lost all its replicas, fsck shows the block 
> is missing as well as corrupted. Perhaps it is better not to mark the block 
> corrupted in this case. The reason it is marked as corrupted is 
> numCorruptNodes == numNodes == 0 in the following code.
> {noformat}
> BlockManager
> final boolean isCorrupt = numCorruptNodes == numNodes;
> {noformat}
> Would like to clarify if it is the intent to mark missing block as corrupted 
> or it is just a bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7035) Make adding volume an atomic operation.

2014-10-29 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189165#comment-14189165
 ] 

Colin Patrick McCabe commented on HDFS-7035:


P.S.  I like the VolumeBuilder concept.  Can you add some JavaDoc about the 
usage?  I am also curious what happens when we fail midway through.  I can see 
that {{DataStorage#prepareVolume}} adds the volume to 
{{DataStorage#bpStorageMap}}; is there anywhere we remove it if the 
addition fails?  Perhaps we need an {{abort}} function in the Builder?

> Make adding volume an atomic operation.
> ---
>
> Key: HDFS-7035
> URL: https://issues.apache.org/jira/browse/HDFS-7035
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7035.000.combo.patch, HDFS-7035.000.patch, 
> HDFS-7035.001.combo.patch, HDFS-7035.001.patch, HDFS-7035.002.patch, 
> HDFS-7035.003.patch, HDFS-7035.003.patch, HDFS-7035.004.patch, 
> HDFS-7035.005.patch, HDFS-7035.007.patch, HDFS-7035.008.patch, 
> HDFS-7035.009.patch, HDFS-7035.010.patch, HDFS-7035.010.patch, 
> HDFS-7035.011.patch, HDFS-7035.012.patch
>
>
> It refactors {{DataStorage}} and {{BlockPoolSliceStorage}} to reduce 
> duplicate code and to support atomic add-volume operations. It also 
> parallelizes volume loading: each thread loads one volume.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception

2014-10-29 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189160#comment-14189160
 ] 

Rushabh S Shah commented on HDFS-7199:
--

TestDFSStorageStateRecovery ran without any errors on my local cluster.

> DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O 
> exception
> ---
>
> Key: HDFS-7199
> URL: https://issues.apache.org/jira/browse/HDFS-7199
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Rushabh S Shah
>Priority: Critical
> Attachments: HDFS-7199-WIP.patch, HDFS-7199.patch
>
>
> If the DataStreamer thread encounters a non-I/O exception then it closes the 
> output stream but does not set lastException.  When the client later calls 
> close on the output stream then it will see the stream is already closed with 
> lastException == null, mistakenly think this is a redundant close call, and 
> fail to report any error to the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7300) The getMaxNodesPerRack() method in BlockPlacementPolicyDefault is flawed

2014-10-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189157#comment-14189157
 ] 

Hudson commented on HDFS-7300:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6387 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6387/])
HDFS-7300. The getMaxNodesPerRack() method in (kihwal: rev 
3ae84e1ba8928879b3eda90e79667ba5a45d60f8)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppendRestart.java


> The getMaxNodesPerRack() method in BlockPlacementPolicyDefault is flawed
> 
>
> Key: HDFS-7300
> URL: https://issues.apache.org/jira/browse/HDFS-7300
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 2.6.0
>
> Attachments: HDFS-7300.patch, HDFS-7300.v2.patch
>
>
> The {{getMaxNodesPerRack()}} can produce an undesirable result in some cases.
> - Three replicas on two racks. The max is 3, so everything can go to one rack.
> - Two replicas on two or more racks. The max is 2, both replicas can end up 
> in the same rack.
> {{BlockManager#isNeededReplication()}} fixes this after block/file is closed 
> because {{blockHasEnoughRacks()}} will return false.  This is not only extra 
> work, but also can break the favored nodes feature.
> When there are two racks and two favored nodes are specified in the same 
> rack, NN may allocate the third replica on a node in the same rack, because 
> {{maxNodesPerRack}} is 3. When closing the file, NN moves a block to the 
> other rack. There is a 66% chance that a favored node is moved.  If 
> {{maxNodesPerRack}} was 2, this would not happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7035) Make adding volume an atomic operation.

2014-10-29 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189148#comment-14189148
 ] 

Colin Patrick McCabe commented on HDFS-7035:


This looks good.  Thanks, Eddy.  Comments below.

{code}
  private synchronized void refreshVolumes(String newVolumes)
      throws IOException {
    conf.set(DFS_DATANODE_DATA_DIR_KEY, newVolumes);
{code}
What's the purpose of setting this at the beginning of the function?  At the 
end of the function we set it to the actual volumes that got added (plus the 
existing ones).  It seems like it only needs to be set once?

Also, we should add some JavaDoc stating that even if an IOException is thrown 
from this function, some new volumes may have been successfully added.
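
Something along these lines, perhaps (wording illustrative only):
{code}
/**
 * Refreshes the set of data volumes from the new configuration value.
 * Note that even if an IOException is thrown, some of the new volumes
 * may already have been added successfully.
 */
private synchronized void refreshVolumes(String newVolumes) throws IOException
{code}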

{code}
LOG.info("Analyzed volume - " + dir + ", StorageType: " + storageType);
{code}
Should say "added volume"?  Since this is at the end of addVolume.

{{updateReplicaUnderRecovery}}: can we avoid changing the whitespace here?  
It's distracting.

{{SimulatedFSDataset#addVolume}}: this needs an {{@Override}} annotation.  
Findbugs or something will probably complain.

{code}
  public void write(StorageDirectory sd) throws IOException {
    this.layoutVersion = getServiceLayoutVersion();
    writeProperties(sd);
  }
{code}

I realize that you modelled this on {{Storage#writeAll}}.  But I find this to 
be a weird (and weirdly named) API.  It's a function named "write" that 
updates the layoutVersion?  And then writes just the properties file?

I think we should have an API named setServiceLayoutVersion, and then just do 
setServiceLayoutVersion(getServiceLayoutVersion()).  Better yet, rename 
getServiceLayoutVersion to getLatestServiceLayoutVersion, since that's really 
what it's doing.

Then we could just call:
{code}
storage.setServiceLayoutVersion(getLatestServiceLayoutVersion());
storage.writeProperties(sd);
{code}

and it would be obvious what was going on.
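
A rough sketch of the suggested shape (the method bodies and the 
{{ServiceLayoutVersion.CURRENT}} constant are made up for illustration):
{code}
/** Returns the latest layout version supported by this service. */
public static int getLatestServiceLayoutVersion() {
  return ServiceLayoutVersion.CURRENT; // hypothetical constant
}

/** Records the layout version to be written out with the properties. */
public void setServiceLayoutVersion(int version) {
  this.layoutVersion = version;
}
{code}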

> Make adding volume an atomic operation.
> ---
>
> Key: HDFS-7035
> URL: https://issues.apache.org/jira/browse/HDFS-7035
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7035.000.combo.patch, HDFS-7035.000.patch, 
> HDFS-7035.001.combo.patch, HDFS-7035.001.patch, HDFS-7035.002.patch, 
> HDFS-7035.003.patch, HDFS-7035.003.patch, HDFS-7035.004.patch, 
> HDFS-7035.005.patch, HDFS-7035.007.patch, HDFS-7035.008.patch, 
> HDFS-7035.009.patch, HDFS-7035.010.patch, HDFS-7035.010.patch, 
> HDFS-7035.011.patch, HDFS-7035.012.patch
>
>
> It refactors {{DataStorage}} and {{BlockPoolSliceStorage}} to reduce 
> duplicate code and to support atomic add-volume operations. It also 
> parallelizes volume loading: each thread loads one volume.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7300) The getMaxNodesPerRack() method in BlockPlacementPolicyDefault is flawed

2014-10-29 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-7300:
-
  Resolution: Fixed
   Fix Version/s: 2.6.0
Target Version/s: 2.6.0  (was: 2.7.0)
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks for the review, Daryn. Committed to trunk, branch-2 and branch-2.6.

> The getMaxNodesPerRack() method in BlockPlacementPolicyDefault is flawed
> 
>
> Key: HDFS-7300
> URL: https://issues.apache.org/jira/browse/HDFS-7300
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 2.6.0
>
> Attachments: HDFS-7300.patch, HDFS-7300.v2.patch
>
>
> The {{getMaxNodesPerRack()}} can produce an undesirable result in some cases.
> - Three replicas on two racks. The max is 3, so everything can go to one rack.
> - Two replicas on two or more racks. The max is 2, both replicas can end up 
> in the same rack.
> {{BlockManager#isNeededReplication()}} fixes this after block/file is closed 
> because {{blockHasEnoughRacks()}} will return false.  This is not only extra 
> work, but also can break the favored nodes feature.
> When there are two racks and two favored nodes are specified in the same 
> rack, NN may allocate the third replica on a node in the same rack, because 
> {{maxNodesPerRack}} is 3. When closing the file, NN moves a block to the 
> other rack. There is a 66% chance that a favored node is moved.  If 
> {{maxNodesPerRack}} was 2, this would not happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6385) Show when block deletion will start after NameNode startup in WebUI

2014-10-29 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189127#comment-14189127
 ] 

Chris Nauroth commented on HDFS-6385:
-

{quote}
-1 core tests. The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock
{quote}

I think the problem is that my test was comparing a {{Time#now}} value (a 
wrapper over {{System#currentTimeMillis}}) to a {{Time#monotonicNow}} value (a 
wrapper over {{System#nanoTime}}).  Values returned from {{System#nanoTime}} 
can be negative, though.  It passed locally on my system, but that was just 
coincidental.

I'll need to change my test code.
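
The mismatch in miniature (illustrative only, not the actual test code; 
{{Time}} is {{org.apache.hadoop.util.Time}}):
{code}
import org.apache.hadoop.util.Time;

public class ClockMixupDemo {
  public static void main(String[] args) {
    long wallClock = Time.now();          // ms since the epoch (currentTimeMillis)
    long monotonic = Time.monotonicNow(); // based on nanoTime; arbitrary origin
    // Comparing the two is meaningless: the monotonic origin is unspecified
    // and the underlying nanoTime value may even be negative, so this
    // comparison passes or fails only by coincidence.
    System.out.println(monotonic < wallClock);
  }
}
{code}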

> Show when block deletion will start after NameNode startup in WebUI
> ---
>
> Key: HDFS-6385
> URL: https://issues.apache.org/jira/browse/HDFS-6385
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Chris Nauroth
> Attachments: HDFS-6385.1.patch, HDFS-6385.png
>
>
> HDFS-6186 provides functionality to delay block deletion for a period of time 
> after NameNode startup. Currently we only show the number of pending block 
> deletions in WebUI. We should also show when the block deletion will start in 
> WebUI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7300) The getMaxNodesPerRack() method in BlockPlacementPolicyDefault is flawed

2014-10-29 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189122#comment-14189122
 ] 

Daryn Sharp commented on HDFS-7300:
---

+1 Looks good.

> The getMaxNodesPerRack() method in BlockPlacementPolicyDefault is flawed
> 
>
> Key: HDFS-7300
> URL: https://issues.apache.org/jira/browse/HDFS-7300
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-7300.patch, HDFS-7300.v2.patch
>
>
> The {{getMaxNodesPerRack()}} can produce an undesirable result in some cases.
> - Three replicas on two racks. The max is 3, so everything can go to one rack.
> - Two replicas on two or more racks. The max is 2, both replicas can end up 
> in the same rack.
> {{BlockManager#isNeededReplication()}} fixes this after block/file is closed 
> because {{blockHasEnoughRacks()}} will return false.  This is not only extra 
> work, but also can break the favored nodes feature.
> When there are two racks and two favored nodes are specified in the same 
> rack, NN may allocate the third replica on a node in the same rack, because 
> {{maxNodesPerRack}} is 3. When closing the file, NN moves a block to the 
> other rack. There is a 66% chance that a favored node is moved.  If 
> {{maxNodesPerRack}} was 2, this would not happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189114#comment-14189114
 ] 

Hadoop QA commented on HDFS-7199:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12675017/HDFS-7199-WIP.patch
  against trunk revision 5c900b5.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
13 warning messages.
See 
https://builds.apache.org/job/PreCommit-HDFS-Build/8585//artifact/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestDFSStorageStateRecovery

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8585//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8585//console

This message is automatically generated.

> DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O 
> exception
> ---
>
> Key: HDFS-7199
> URL: https://issues.apache.org/jira/browse/HDFS-7199
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Rushabh S Shah
>Priority: Critical
> Attachments: HDFS-7199-WIP.patch, HDFS-7199.patch
>
>
> If the DataStreamer thread encounters a non-I/O exception then it closes the 
> output stream but does not set lastException.  When the client later calls 
> close on the output stream then it will see the stream is already closed with 
> lastException == null, mistakenly think this is a redundant close call, and 
> fail to report any error to the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6385) Show when block deletion will start after NameNode startup in WebUI

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189113#comment-14189113
 ] 

Hadoop QA commented on HDFS-6385:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677969/HDFS-6385.1.patch
  against trunk revision c2575fb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestDFSStorageStateRecovery

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8586//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8586//console

This message is automatically generated.

> Show when block deletion will start after NameNode startup in WebUI
> ---
>
> Key: HDFS-6385
> URL: https://issues.apache.org/jira/browse/HDFS-6385
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Chris Nauroth
> Attachments: HDFS-6385.1.patch, HDFS-6385.png
>
>
> HDFS-6186 provides functionality to delay block deletion for a period of time 
> after NameNode startup. Currently we only show the number of pending block 
> deletions in WebUI. We should also show when the block deletion will start in 
> WebUI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7276) Limit the number of byte arrays used by DFSOutputStream

2014-10-29 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-7276:
--
Attachment: h7276_20141029.patch

h7276_20141029.patch: uses a power-of-two array size and addresses Jing's 
comments except #1.

> Limit the number of byte arrays used by DFSOutputStream
> ---
>
> Key: HDFS-7276
> URL: https://issues.apache.org/jira/browse/HDFS-7276
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h7276_20141021.patch, h7276_20141022.patch, 
> h7276_20141023.patch, h7276_20141024.patch, h7276_20141027.patch, 
> h7276_20141027b.patch, h7276_20141028.patch, h7276_20141029.patch
>
>
> When there are a lot of DFSOutputStreams writing concurrently, the number of 
> outstanding packets could be large.  The byte arrays created by those packets 
> could occupy a lot of memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7276) Limit the number of byte arrays used by DFSOutputStream

2014-10-29 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189109#comment-14189109
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7276:
---

Filed HDFS-7308 for fixing the computation.

> Limit the number of byte arrays used by DFSOutputStream
> ---
>
> Key: HDFS-7276
> URL: https://issues.apache.org/jira/browse/HDFS-7276
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h7276_20141021.patch, h7276_20141022.patch, 
> h7276_20141023.patch, h7276_20141024.patch, h7276_20141027.patch, 
> h7276_20141027b.patch, h7276_20141028.patch
>
>
> When there are a lot of DFSOutputStreams writing concurrently, the number of 
> outstanding packets could be large.  The byte arrays created by those packets 
> could occupy a lot of memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7287) The OfflineImageViewer (OIV) can output invalid XML depending on the filename

2014-10-29 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189103#comment-14189103
 ] 

Ravi Prakash commented on HDFS-7287:


Thanks, Colin! I've filed HDFS-7309 for clarification on the mangling. Please 
feel free to close it as invalid if you think it's unreasonable.

> The OfflineImageViewer (OIV) can output invalid XML depending on the filename
> -
>
> Key: HDFS-7287
> URL: https://issues.apache.org/jira/browse/HDFS-7287
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Fix For: 2.6.0
>
> Attachments: HDFS-7287.1.patch, HDFS-7287.2.patch, HDFS-7287.patch, 
> testXMLOutput
>
>
> If the filename contains a character which is invalid in XML, 
> TextWriterImageVisitor.write() or PBImageXmlWriter.o() prints out the string 
> unescaped. For us this was the character 0x0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7035) Make adding volume an atomic operation.

2014-10-29 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-7035:

Attachment: HDFS-7035.012.patch

[~cmccabe] Thanks for your detailed comments! 

In this updated patch, I have made the following major changes:

* Merged the changes from HDFS-7173 so that this patch can be more 
self-contained.
* Removed the {{StagedAddVolume}} interface and renamed 
{{DataStorage#DataStorageAddedVolume}} to {{DataStorage#VolumeBuilder}}.
* Removed {{FsDatasetImpl#StagedAddVolume}}.
* Moved the logic of calling {{DataStorage#addVolume}} from 
{{DataNode#refreshVolume}} to {{FsDatasetImpl#addVolume}}.
* Changed {{DataNode#refreshVolume}} to a {{synchronized}} method so that there 
will be no concurrent starting-up / shutting-down DN activities.
* Moved the {{DataNode#conf}} and {{DataNode#dataDir}} recovery into the 
{{finally}} block.



> Make adding volume an atomic operation.
> ---
>
> Key: HDFS-7035
> URL: https://issues.apache.org/jira/browse/HDFS-7035
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7035.000.combo.patch, HDFS-7035.000.patch, 
> HDFS-7035.001.combo.patch, HDFS-7035.001.patch, HDFS-7035.002.patch, 
> HDFS-7035.003.patch, HDFS-7035.003.patch, HDFS-7035.004.patch, 
> HDFS-7035.005.patch, HDFS-7035.007.patch, HDFS-7035.008.patch, 
> HDFS-7035.009.patch, HDFS-7035.010.patch, HDFS-7035.010.patch, 
> HDFS-7035.011.patch, HDFS-7035.012.patch
>
>
> It refactors {{DataStorage}} and {{BlockPoolSliceStorage}} to reduce 
> duplicate code and to support atomic add-volume operations. It also 
> parallelizes volume loading: each thread loads one volume.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7309) XMLUtils.mangleXmlString doesn't seem to handle less than sign

2014-10-29 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-7309:
---
Attachment: HDFS-7309.patch

Here's a unit test to illustrate the problem.

> XMLUtils.mangleXmlString doesn't seem to handle less than sign
> --
>
> Key: HDFS-7309
> URL: https://issues.apache.org/jira/browse/HDFS-7309
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Ravi Prakash
>Priority: Minor
> Attachments: HDFS-7309.patch
>
>
> My expectation was that "" + XMLUtils.mangleXmlString(
>   "Containing" would be a string 
> acceptable to a SAX parser. However this was not true. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7308) DFSClient write packet size may > 64kB

2014-10-29 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189098#comment-14189098
 ] 

Yongjun Zhang commented on HDFS-7308:
-

Good catch, Nicholas!
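
For reference, the arithmetic from the description below as a quick 
self-contained check (values are the usual defaults quoted in the table):
{code}
public class PacketSizeCheck {
  public static void main(String[] args) {
    int psize = 64 * 1024;  // writePacketSize = 65536
    int csize = 512;        // bytesPerChecksum
    int checksumSize = 32;  // per-chunk checksum size from the table
    int chunkSize = csize + checksumSize;                 // 544, not a power of two
    int chunksPerPacket = Math.max(psize / chunkSize, 1); // 120 (integer division)
    int packetSize = chunkSize * chunksPerPacket;         // 65280, header excluded
    int pktMaxHeaderLen = 33;  // PacketHeader.PKT_MAX_HEADER_LEN
    // Prints 65313: under 64k, but the computation itself does not guarantee it.
    System.out.println(packetSize + pktMaxHeaderLen);
  }
}
{code}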


> DFSClient write packet size may > 64kB
> --
>
> Key: HDFS-7308
> URL: https://issues.apache.org/jira/browse/HDFS-7308
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
>
> In DFSOutputStream.computePacketChunkSize(..),
> {code}
>   private void computePacketChunkSize(int psize, int csize) {
> final int chunkSize = csize + getChecksumSize();
> chunksPerPacket = Math.max(psize/chunkSize, 1);
> packetSize = chunkSize*chunksPerPacket;
> if (DFSClient.LOG.isDebugEnabled()) {
>   ...
> }
>   }
> {code}
> We have the following
> || variables || usual values ||
> | psize | dfsClient.getConf().writePacketSize = 64kB |
> | csize | bytesPerChecksum = 512B |
> | getChecksumSize(), i.e. CRC size | 32B |
> | chunkSize = csize + getChecksumSize() | 544B (not a power of two) |
> | psize/chunkSize | 120.47 |
> | chunksPerPacket = max(psize/chunkSize, 1) | 120 |
> | packetSize = chunkSize*chunksPerPacket (not including header) | 65280B |
> | PacketHeader.PKT_MAX_HEADER_LEN | 33B |
> | actual packet size | 65280 + 33 = *65313* < 65536 = 64k |
> It is fortunate that the usual packet size = 65313 < 64k although the 
> calculation above does not guarantee it always happens (e.g. if 
> PKT_MAX_HEADER_LEN=257, then actual packet size=65537 > 64k.)  We should fix 
> the computation in order to guarantee actual packet size < 64k.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7309) XMLUtils.mangleXmlString doesn't seem to handle less than sign

2014-10-29 Thread Ravi Prakash (JIRA)
Ravi Prakash created HDFS-7309:
--

 Summary: XMLUtils.mangleXmlString doesn't seem to handle less than 
sign
 Key: HDFS-7309
 URL: https://issues.apache.org/jira/browse/HDFS-7309
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Ravi Prakash
Priority: Minor


My expectation was that "<tag>" + XMLUtils.mangleXmlString("Containing<") + 
"</tag>" would be a string acceptable to a SAX parser. However this was not 
true. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-29 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189063#comment-14189063
 ] 

Konstantin Boudnik commented on HDFS-3107:
--

bq. I posted it as a demonstration. I think to make it more robust we would want
And it needs to be atomic, i.e. not involve 5 RPC calls; otherwise recovery 
would be a nightmare. 

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7308) DFSClient write packet size may > 64kB

2014-10-29 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-7308:
-

 Summary: DFSClient write packet size may > 64kB
 Key: HDFS-7308
 URL: https://issues.apache.org/jira/browse/HDFS-7308
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor


In DFSOutputStream.computePacketChunkSize(..),
{code}
  private void computePacketChunkSize(int psize, int csize) {
final int chunkSize = csize + getChecksumSize();
chunksPerPacket = Math.max(psize/chunkSize, 1);
packetSize = chunkSize*chunksPerPacket;
if (DFSClient.LOG.isDebugEnabled()) {
  ...
}
  }
{code}
We have the following
|| variables || usual values ||
| psize | dfsClient.getConf().writePacketSize = 64kB |
| csize | bytesPerChecksum = 512B |
| getChecksumSize(), i.e. CRC size | 32B |
| chunkSize = csize + getChecksumSize() | 544B (not a power of two) |
| psize/chunkSize | 120.47 |
| chunksPerPacket = max(psize/chunkSize, 1) | 120 |
| packetSize = chunkSize*chunksPerPacket (not including header) | 65280B |
| PacketHeader.PKT_MAX_HEADER_LEN | 33B |
| actual packet size | 65280 + 33 = *65313* < 65536 = 64k |
It is fortunate that the usual packet size = 65313 < 64k, although the 
calculation above does not guarantee this always holds (e.g. if 
PKT_MAX_HEADER_LEN=257, then the actual packet size=65537 > 64k).  We should fix 
the computation in order to guarantee actual packet size < 64k.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7308) DFSClient write packet size may > 64kB

2014-10-29 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-7308:
--
Component/s: hdfs-client

> DFSClient write packet size may > 64kB
> --
>
> Key: HDFS-7308
> URL: https://issues.apache.org/jira/browse/HDFS-7308
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
>
> In DFSOutputStream.computePacketChunkSize(..),
> {code}
>   private void computePacketChunkSize(int psize, int csize) {
> final int chunkSize = csize + getChecksumSize();
> chunksPerPacket = Math.max(psize/chunkSize, 1);
> packetSize = chunkSize*chunksPerPacket;
> if (DFSClient.LOG.isDebugEnabled()) {
>   ...
> }
>   }
> {code}
> We have the following
> || variables || usual values ||
> | psize | dfsClient.getConf().writePacketSize = 64kB |
> | csize | bytesPerChecksum = 512B |
> | getChecksumSize(), i.e. CRC size | 32B |
> | chunkSize = csize + getChecksumSize() | 544B (not a power of two) |
> | psize/chunkSize | 120.47 |
> | chunksPerPacket = max(psize/chunkSize, 1) | 120 |
> | packetSize = chunkSize*chunksPerPacket (not including header) | 65280B |
> | PacketHeader.PKT_MAX_HEADER_LEN | 33B |
> | actual packet size | 65280 + 33 = *65313* < 65536 = 64k |
> It is fortunate that the usual packet size = 65313 < 64k, although the 
> calculation above does not guarantee this always holds (e.g. if 
> PKT_MAX_HEADER_LEN=257, then the actual packet size=65537 > 64k).  We should fix 
> the computation in order to guarantee actual packet size < 64k.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7307) Need 'force close'

2014-10-29 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created HDFS-7307:
--

 Summary: Need 'force close'
 Key: HDFS-7307
 URL: https://issues.apache.org/jira/browse/HDFS-7307
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Allen Wittenauer


Until HDFS-4882 and HDFS-7306 get real fixes, operations teams need a way to 
force close files.  DNs are essentially held hostage by broken clients that 
never close.  This situation will get worse as long-running and permanently 
running jobs become more common.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7276) Limit the number of byte arrays used by DFSOutputStream

2014-10-29 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189037#comment-14189037
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7276:
---

> ... However, it is unfortunate that our full packet size is 64k + header 
> length, which will round up to 128k.

I was wrong about the full packet size.  In 
DFSOutputStream.computePacketChunkSize(..),
{code}
  private void computePacketChunkSize(int psize, int csize) {
final int chunkSize = csize + getChecksumSize();
chunksPerPacket = Math.max(psize/chunkSize, 1);
packetSize = chunkSize*chunksPerPacket;
if (DFSClient.LOG.isDebugEnabled()) {
  ...
}
  }
{code}
So we have the following
|| variables || usual values ||
| psize | dfsClient.getConf().writePacketSize = 64kB |
| csize | bytesPerChecksum = 512B |
| getChecksumSize(), i.e. CRC size | 32B |
| chunkSize = csize + getChecksumSize() | 544B (not a power of two) |
| psize/chunkSize | 120.47 |
| chunksPerPacket = max(psize/chunkSize, 1) | 120 |
| packetSize = chunkSize*chunksPerPacket (not including header) | 65280 |
| PacketHeader.PKT_MAX_HEADER_LEN | 33B |
| actual packet size | 65280 + 33 = *65313* < 65536 = 64k |
It is fortunate that the usual packetSize = 65313 < 64k, although the 
calculation above does not guarantee this always holds (e.g. if 
PKT_MAX_HEADER_LEN=257, then the actual packet size=65537 > 64k).  I will fix 
the computation in order to guarantee actual packet size < 64k.
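For illustration, one way to get that guarantee is to budget the maximum header length before dividing into chunks; this is a sketch of the idea, not necessarily the committed change:
{code}
  private void computePacketChunkSize(int psize, int csize) {
    // Reserve room for the largest possible header so that
    // packetSize + header never exceeds psize (barring the degenerate
    // case where a single chunk already exceeds the budget).
    final int bodySize = psize - PacketHeader.PKT_MAX_HEADER_LEN;
    final int chunkSize = csize + getChecksumSize();
    chunksPerPacket = Math.max(bodySize/chunkSize, 1);
    packetSize = chunkSize*chunksPerPacket;
  }
{code}
With the usual values this gives bodySize = 65536 - 33 = 65503 and chunksPerPacket = 120, so the actual packet size stays 65280 + 33 = 65313 and is now bounded by psize for any header length up to PKT_MAX_HEADER_LEN.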

> Limit the number of byte arrays used by DFSOutputStream
> ---
>
> Key: HDFS-7276
> URL: https://issues.apache.org/jira/browse/HDFS-7276
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h7276_20141021.patch, h7276_20141022.patch, 
> h7276_20141023.patch, h7276_20141024.patch, h7276_20141027.patch, 
> h7276_20141027b.patch, h7276_20141028.patch
>
>
> When there are a lot of DFSOutputStream's writing concurrently, the number of 
> outstanding packets could be large.  The byte arrays created by those packets 
> could occupy a lot of memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7306) can't decommission w/under construction blocks

2014-10-29 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created HDFS-7306:
--

 Summary: can't decommission w/under construction blocks
 Key: HDFS-7306
 URL: https://issues.apache.org/jira/browse/HDFS-7306
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Allen Wittenauer


We need a way to decommission a node with open blocks.  Now that HDFS supports 
append, this should be doable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception

2014-10-29 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated HDFS-7199:
-
Status: Patch Available  (was: Open)

Changed the patch to address Colin's comment.
Did the same manual testing as before.

> DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O 
> exception
> ---
>
> Key: HDFS-7199
> URL: https://issues.apache.org/jira/browse/HDFS-7199
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Rushabh S Shah
>Priority: Critical
> Attachments: HDFS-7199-WIP.patch, HDFS-7199.patch
>
>
> If the DataStreamer thread encounters a non-I/O exception then it closes the 
> output stream but does not set lastException.  When the client later calls 
> close on the output stream then it will see the stream is already closed with 
> lastException == null, mistakenly think this is a redundant close call, and 
> fail to report any error to the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception

2014-10-29 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated HDFS-7199:
-
Attachment: HDFS-7199.patch

> DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O 
> exception
> ---
>
> Key: HDFS-7199
> URL: https://issues.apache.org/jira/browse/HDFS-7199
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Rushabh S Shah
>Priority: Critical
> Attachments: HDFS-7199-WIP.patch, HDFS-7199.patch
>
>
> If the DataStreamer thread encounters a non-I/O exception then it closes the 
> output stream but does not set lastException.  When the client later calls 
> close on the output stream then it will see the stream is already closed with 
> lastException == null, mistakenly think this is a redundant close call, and 
> fail to report any error to the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception

2014-10-29 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated HDFS-7199:
-
Status: Open  (was: Patch Available)

> DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O 
> exception
> ---
>
> Key: HDFS-7199
> URL: https://issues.apache.org/jira/browse/HDFS-7199
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Rushabh S Shah
>Priority: Critical
> Attachments: HDFS-7199-WIP.patch, HDFS-7199.patch
>
>
> If the DataStreamer thread encounters a non-I/O exception then it closes the 
> output stream but does not set lastException.  When the client later calls 
> close on the output stream then it will see the stream is already closed with 
> lastException == null, mistakenly think this is a redundant close call, and 
> fail to report any error to the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7295) Support arbitrary max expiration times for delegation token

2014-10-29 Thread bc Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188962#comment-14188962
 ] 

bc Wong commented on HDFS-7295:
---

Thanks, [~aw]. Services vs Apps is an argument that I can appreciate. There are 
sites running Spark Streaming as apps. So I'll have to check with them first.

> Support arbitrary max expiration times for delegation token
> ---
>
> Key: HDFS-7295
> URL: https://issues.apache.org/jira/browse/HDFS-7295
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>
> Currently the max lifetime of HDFS delegation tokens is hardcoded to 7 days. 
> This is a problem for different users of HDFS such as long running YARN apps. 
> Users should be allowed to optionally specify max lifetime for their tokens.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-10-29 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188946#comment-14188946
 ] 

Kihwal Lee commented on HDFS-7097:
--

The test case failed because of {{java.net.BindException: Port in use: 
localhost:40123}}.  We are getting this sort of failure more often nowadays 
from precommit.
Both test cases pass when run on my machine.

> Allow block reports to be processed during checkpointing on standby name node
> -
>
> Key: HDFS-7097
> URL: https://issues.apache.org/jira/browse/HDFS-7097
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, 
> HDFS-7097.patch
>
>
> On a reasonably busy HDFS cluster, there is a steady stream of creates, causing 
> data nodes to generate incremental block reports.  When a standby name node is 
> checkpointing, RPC handler threads trying to process a full or incremental 
> block report are blocked on the name system's {{fsLock}}, because the 
> checkpointer acquires the read lock on it.  This can create a serious problem 
> if the name space is big and checkpointing takes a long time.
> All available RPC handlers can be tied up very quickly. If you have 100 
> handlers, it only takes 34 file creates.  If a separate service RPC port is 
> not used, HA transition will have to wait in the call queue for minutes. Even 
> if a separate service RPC port is configured, heartbeats from datanodes will 
> be blocked. A standby NN with a big name space can lose all data nodes after 
> checkpointing.  The RPC calls will also be retransmitted by data nodes many 
> times, filling up the call queue and potentially causing listen queue 
> overflow.
> Since block reports are not modifying any state that is being saved to 
> fsimage, I propose letting them through during checkpointing. 
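A toy illustration of the convoy described above, assuming a fair {{ReentrantReadWriteLock}} like the one backing the name system's {{fsLock}}; the class name and setup below are made up for the demo:
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FsLockConvoyDemo {
  public static void main(String[] args) throws Exception {
    final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);

    fsLock.readLock().lock();            // "checkpointer" holds the read lock
    Thread blockReport = new Thread(new Runnable() {
      public void run() {
        fsLock.writeLock().lock();       // "block report" handler blocks here
        fsLock.writeLock().unlock();
      }
    });
    blockReport.start();
    Thread.sleep(100);                   // give the writer time to queue up
    System.out.println("queued threads: " + fsLock.getQueueLength());  // 1
    fsLock.readLock().unlock();          // checkpoint done; writer proceeds
    blockReport.join();
  }
}
{code}
With a fair lock, readers arriving behind the queued writer wait too, which is how all the handlers get tied up while the checkpointer holds the read lock.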



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7305) NPE seen in wbhdfs FS while running SLive

2014-10-29 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188909#comment-14188909
 ] 

Haohui Mai commented on HDFS-7305:
--

+1 pending jenkins

> NPE seen in wbhdfs FS while running SLive
> -
>
> Key: HDFS-7305
> URL: https://issues.apache.org/jira/browse/HDFS-7305
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Arpit Gupta
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-7305.000.patch
>
>
> {code}
> 2014-10-23 10:22:49,066 WARN [main] org.apache.hadoop.mapred.Task: Task 
> status: "Failed at running due to java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:529)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:605)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:458)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:487)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:483)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.append(WebHdfsFileSystem.java:1154)
>   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1163)
>   at org.apache.hadoop.fs.slive.AppendOp.run(AppendOp.java:80)
>   at org.apache.hadoop.fs.slive.ObserveableOp.run(ObserveableOp.java:63)
>   at 
> org.apache.hadoop.fs.slive.SliveMapper.runOperation(SliveMapper.java:122)
>   at org.apache.hadoop.fs.slive.SliveMapper.map(SliveMapper.java:168)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> " truncated to max limit (512 characters)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7305) NPE seen in wbhdfs FS while running SLive

2014-10-29 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7305:

Priority: Minor  (was: Critical)

> NPE seen in wbhdfs FS while running SLive
> -
>
> Key: HDFS-7305
> URL: https://issues.apache.org/jira/browse/HDFS-7305
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Arpit Gupta
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-7305.000.patch
>
>
> {code}
> 2014-10-23 10:22:49,066 WARN [main] org.apache.hadoop.mapred.Task: Task 
> status: "Failed at running due to java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:529)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:605)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:458)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:487)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:483)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.append(WebHdfsFileSystem.java:1154)
>   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1163)
>   at org.apache.hadoop.fs.slive.AppendOp.run(AppendOp.java:80)
>   at org.apache.hadoop.fs.slive.ObserveableOp.run(ObserveableOp.java:63)
>   at 
> org.apache.hadoop.fs.slive.SliveMapper.runOperation(SliveMapper.java:122)
>   at org.apache.hadoop.fs.slive.SliveMapper.map(SliveMapper.java:168)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> " truncated to max limit (512 characters)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7305) NPE seen in wbhdfs FS while running SLive

2014-10-29 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7305:

Status: Patch Available  (was: Open)

> NPE seen in wbhdfs FS while running SLive
> -
>
> Key: HDFS-7305
> URL: https://issues.apache.org/jira/browse/HDFS-7305
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Arpit Gupta
>Assignee: Jing Zhao
>Priority: Critical
> Attachments: HDFS-7305.000.patch
>
>
> {code}
> 2014-10-23 10:22:49,066 WARN [main] org.apache.hadoop.mapred.Task: Task 
> status: "Failed at running due to java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:529)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:605)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:458)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:487)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:483)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.append(WebHdfsFileSystem.java:1154)
>   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1163)
>   at org.apache.hadoop.fs.slive.AppendOp.run(AppendOp.java:80)
>   at org.apache.hadoop.fs.slive.ObserveableOp.run(ObserveableOp.java:63)
>   at 
> org.apache.hadoop.fs.slive.SliveMapper.runOperation(SliveMapper.java:122)
>   at org.apache.hadoop.fs.slive.SliveMapper.map(SliveMapper.java:168)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> " truncated to max limit (512 characters)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-7305) NPE seen in wbhdfs FS while running SLive

2014-10-29 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao reassigned HDFS-7305:
---

Assignee: Jing Zhao

> NPE seen in wbhdfs FS while running SLive
> -
>
> Key: HDFS-7305
> URL: https://issues.apache.org/jira/browse/HDFS-7305
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Arpit Gupta
>Assignee: Jing Zhao
>Priority: Critical
> Attachments: HDFS-7305.000.patch
>
>
> {code}
> 2014-10-23 10:22:49,066 WARN [main] org.apache.hadoop.mapred.Task: Task 
> status: "Failed at running due to java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:529)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:605)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:458)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:487)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:483)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.append(WebHdfsFileSystem.java:1154)
>   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1163)
>   at org.apache.hadoop.fs.slive.AppendOp.run(AppendOp.java:80)
>   at org.apache.hadoop.fs.slive.ObserveableOp.run(ObserveableOp.java:63)
>   at 
> org.apache.hadoop.fs.slive.SliveMapper.runOperation(SliveMapper.java:122)
>   at org.apache.hadoop.fs.slive.SliveMapper.map(SliveMapper.java:168)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> " truncated to max limit (512 characters)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7305) NPE seen in wbhdfs FS while running SLive

2014-10-29 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7305:

Attachment: HDFS-7305.000.patch

Simple patch to fix.

> NPE seen in wbhdfs FS while running SLive
> -
>
> Key: HDFS-7305
> URL: https://issues.apache.org/jira/browse/HDFS-7305
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Arpit Gupta
>Priority: Critical
> Attachments: HDFS-7305.000.patch
>
>
> {code}
> 2014-10-23 10:22:49,066 WARN [main] org.apache.hadoop.mapred.Task: Task 
> status: "Failed at running due to java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:529)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:605)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:458)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:487)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:483)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.append(WebHdfsFileSystem.java:1154)
>   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1163)
>   at org.apache.hadoop.fs.slive.AppendOp.run(AppendOp.java:80)
>   at org.apache.hadoop.fs.slive.ObserveableOp.run(ObserveableOp.java:63)
>   at 
> org.apache.hadoop.fs.slive.SliveMapper.runOperation(SliveMapper.java:122)
>   at org.apache.hadoop.fs.slive.SliveMapper.map(SliveMapper.java:168)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> " truncated to max limit (512 characters)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7305) NPE seen in wbhdfs FS while running SLive

2014-10-29 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188899#comment-14188899
 ] 

Jing Zhao commented on HDFS-7305:
-

It looks more like the NPE is caused by a possibly null message contained in the 
RemoteException:
{code}
if (re.getMessage().startsWith(
  SecurityUtil.FAILED_TO_GET_UGI_MSG_HEADER)) {
{code}

{code}
public static RemoteException toRemoteException(final Map json) {
final Map m = (Map)json.get(RemoteException.class.getSimpleName());
final String message = (String)m.get("message");
final String javaClassName = (String)m.get("javaClassName");
return new RemoteException(javaClassName, message);
  }
{code}
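A minimal sketch of the kind of null-safe guard that would avoid the NPE (the actual change is whatever HDFS-7305.000.patch does, which may differ):
{code}
final String msg = re.getMessage();
if (msg != null
    && msg.startsWith(SecurityUtil.FAILED_TO_GET_UGI_MSG_HEADER)) {
  ...
}
{code}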

> NPE seen in wbhdfs FS while running SLive
> -
>
> Key: HDFS-7305
> URL: https://issues.apache.org/jira/browse/HDFS-7305
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Arpit Gupta
>Priority: Critical
>
> {code}
> 2014-10-23 10:22:49,066 WARN [main] org.apache.hadoop.mapred.Task: Task 
> status: "Failed at running due to java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:529)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:605)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:458)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:487)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:483)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.append(WebHdfsFileSystem.java:1154)
>   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1163)
>   at org.apache.hadoop.fs.slive.AppendOp.run(AppendOp.java:80)
>   at org.apache.hadoop.fs.slive.ObserveableOp.run(ObserveableOp.java:63)
>   at 
> org.apache.hadoop.fs.slive.SliveMapper.runOperation(SliveMapper.java:122)
>   at org.apache.hadoop.fs.slive.SliveMapper.map(SliveMapper.java:168)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> " truncated to max limit (512 characters)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6385) Show when block deletion will start after NameNode startup in WebUI

2014-10-29 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188890#comment-14188890
 ] 

Jing Zhao commented on HDFS-6385:
-

Thanks Chris! The patch looks pretty good to me. +1 pending Jenkins.

> Show when block deletion will start after NameNode startup in WebUI
> ---
>
> Key: HDFS-6385
> URL: https://issues.apache.org/jira/browse/HDFS-6385
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Chris Nauroth
> Attachments: HDFS-6385.1.patch, HDFS-6385.png
>
>
> HDFS-6186 provides functionality to delay block deletion for a period of time 
> after NameNode startup. Currently we only show the number of pending block 
> deletions in WebUI. We should also show when the block deletion will start in 
> WebUI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7305) NPE seen in wbhdfs FS while running SLive

2014-10-29 Thread Arpit Gupta (JIRA)
Arpit Gupta created HDFS-7305:
-

 Summary: NPE seen in wbhdfs FS while running SLive
 Key: HDFS-7305
 URL: https://issues.apache.org/jira/browse/HDFS-7305
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Arpit Gupta
Priority: Critical


{code}
2014-10-23 10:22:49,066 WARN [main] org.apache.hadoop.mapred.Task: Task status: 
"Failed at running due to java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:529)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:605)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:458)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:487)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:483)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.append(WebHdfsFileSystem.java:1154)
at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1163)
at org.apache.hadoop.fs.slive.AppendOp.run(AppendOp.java:80)
at org.apache.hadoop.fs.slive.ObserveableOp.run(ObserveableOp.java:63)
at 
org.apache.hadoop.fs.slive.SliveMapper.runOperation(SliveMapper.java:122)
at org.apache.hadoop.fs.slive.SliveMapper.map(SliveMapper.java:168)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
" truncated to max limit (512 characters)

{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2014-10-29 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1416#comment-1416
 ] 

Colin Patrick McCabe commented on HDFS-3107:


Good point, Nicholas.

Note that the patch I posted does handle rollback correctly, since it never 
modifies any existing block files.

I posted it as a demonstration.  I think to make it more robust we would want 
to avoid having the client write out the last block and concat it, and instead 
have some other mechanism for duplicating + shortening the final block of the 
file, possibly a new DN command similar to COPY_BLOCK, but taking a length 
argument.
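Hypothetically, such a command could be a length-bounded variant of the existing block copy; the name and signature below are made up purely for illustration:
{code}
/**
 * Copy the first {@code length} bytes of {@code src} into a new block
 * {@code dst}, e.g. to produce truncate's shortened last block without
 * touching the existing block file.
 */
void copyBlockRange(ExtendedBlock src, ExtendedBlock dst, long length)
    throws IOException;
{code}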

> HDFS truncate
> -
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Lei Chang
>Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7287) The OfflineImageViewer (OIV) can output invalid XML depending on the filename

2014-10-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188869#comment-14188869
 ] 

Hudson commented on HDFS-7287:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6386 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6386/])
HDFS-7287. The OfflineImageViewer (OIV) can output invalid XML depending on the 
filename (Ravi Prakash via Colin P. McCabe) (cmccabe: rev 
d33e07dc49e00db138921fb3aa52c4ef00510161)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/PBImageXmlWriter.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/XmlImageVisitor.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TestOfflineImageViewer.java


> The OfflineImageViewer (OIV) can output invalid XML depending on the filename
> -
>
> Key: HDFS-7287
> URL: https://issues.apache.org/jira/browse/HDFS-7287
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Fix For: 2.6.0
>
> Attachments: HDFS-7287.1.patch, HDFS-7287.2.patch, HDFS-7287.patch, 
> testXMLOutput
>
>
> If the filename contains a character which is invalid in XML, 
> TextWriterImageVisitor.write() or PBImageXmlWriter.o() prints out the string 
> unescaped. For us this was the character 0x0
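For reference, XML 1.0 restricts which characters may appear at all; a NUL (0x0) is never legal, even as an entity reference. A sketch of the check such a writer needs (hypothetical helper, not the committed code):
{code}
// XML 1.0 "Char" production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
static boolean isLegalXmlChar(int codePoint) {
  return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
      || (codePoint >= 0x20 && codePoint <= 0xD7FF)
      || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
      || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
}
{code}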



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7287) The OfflineImageViewer (OIV) can output invalid XML depending on the filename

2014-10-29 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7287:
---
   Resolution: Fixed
Fix Version/s: 2.6.0
   Status: Resolved  (was: Patch Available)

> The OfflineImageViewer (OIV) can output invalid XML depending on the filename
> -
>
> Key: HDFS-7287
> URL: https://issues.apache.org/jira/browse/HDFS-7287
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Fix For: 2.6.0
>
> Attachments: HDFS-7287.1.patch, HDFS-7287.2.patch, HDFS-7287.patch, 
> testXMLOutput
>
>
> If the filename contains a character which is invalid in XML, 
> TextWriterImageVisitor.write() or PBImageXmlWriter.o() prints out the string 
> unescaped. For us this was the character 0x0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception

2014-10-29 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188861#comment-14188861
 ] 

Colin Patrick McCabe commented on HDFS-7199:


I'm having trouble understanding this patch.  Won't the exception you are 
setting with {{setLastException(new IOException("DataStreamer Exception: 
",e))}} overwrite the exception set on these previous lines:

{code}
  if (e instanceof IOException) {
setLastException((IOException)e);
  }
 {code}

Wouldn't it make more sense to simply add an else statement here where we wrap 
the non-IOE in an IOE?
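For illustration, that else branch might look like the following (a sketch, not the committed patch):
{code}
  if (e instanceof IOException) {
    setLastException((IOException)e);
  } else {
    // Wrap the non-I/O exception so a later close() sees a non-null
    // lastException instead of treating it as a redundant close.
    setLastException(new IOException("DataStreamer Exception: ", e));
  }
{code}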

bq. work in progress patch. I will work on creating the test case. It is a 
little bit hard.

It looks like this will end up being a 1 or 2 line patch.  So we could 
potentially commit this JIRA and file a follow-up JIRA for the test case.  I 
think it should be possible to write a good test case using Mockito or perhaps 
one of the fault injectors.

> DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O 
> exception
> ---
>
> Key: HDFS-7199
> URL: https://issues.apache.org/jira/browse/HDFS-7199
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Rushabh S Shah
>Priority: Critical
> Attachments: HDFS-7199-WIP.patch
>
>
> If the DataStreamer thread encounters a non-I/O exception then it closes the 
> output stream but does not set lastException.  When the client later calls 
> close on the output stream then it will see the stream is already closed with 
> lastException == null, mistakenly think this is a redundant close call, and 
> fail to report any error to the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client

2014-10-29 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188855#comment-14188855
 ] 

Chris Nauroth commented on HDFS-6994:
-

If it's helpful, take a look at HDFS-573 for example usage of CMake on Windows.

> libhdfs3 - A native C/C++ HDFS client
> -
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ 
> HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol.
> libhdfs3 provides the libhdfs-style C interface and a C++ interface. It 
> supports both Hadoop RPC versions 8 and 9, as well as Namenode HA and Kerberos 
> authentication.
> libhdfs3 is currently used by Pivotal's HAWQ.
> I'd like to integrate libhdfs3 into HDFS source code to benefit others.
> You can find libhdfs3 code from github
> https://github.com/PivotalRD/libhdfs3
> http://pivotalrd.github.io/libhdfs3/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7295) Support arbitrary max expiration times for delegation token

2014-10-29 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188847#comment-14188847
 ] 

Allen Wittenauer commented on HDFS-7295:


bq. Especially since not everyone here understands the use case yet — There are 
no keytabs here. These are not services.

I think we do, actually. The argument back is (still): why shouldn't these be 
services?  

There are lots of examples of things that run for a long time (e.g., 0xdata, 
samza, storm) that people have been using for quite a while now.  Spark isn't 
magically different because it is this year's new hotness.  In every single one 
of these cases that I'm familiar with, it is almost always in the best interest 
of the user to run these under a dedicated service account and treat them as a 
service rather than as the user.  They are almost always managed by a team.  
They almost always feed multiple inputs and outputs to and from various 
sources, usually belonging to other teams.  Plus there is the bus factor: if that 
user gets hit by a bus, who takes it over when that user account gets removed?

The only case that I know of where running as the user makes sense is during 
the experimentation phase.  In that case, to my mind, they can live with their 
service dying after 7 days.

With ACLs now in HDFS, it makes even less sense to run these as the user.

> Support arbitrary max expiration times for delegation token
> ---
>
> Key: HDFS-7295
> URL: https://issues.apache.org/jira/browse/HDFS-7295
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>
> Currently the max lifetime of HDFS delegation tokens is hardcoded to 7 days. 
> This is a problem for different users of HDFS such as long running YARN apps. 
> Users should be allowed to optionally specify max lifetime for their tokens.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7287) The OfflineImageViewer (OIV) can output invalid XML depending on the filename

2014-10-29 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188841#comment-14188841
 ] 

Colin Patrick McCabe commented on HDFS-7287:


bq. Here's the patch without the test barfing.

Great!

bq. The build doesn't have any test failures

It has a test timeout on {{TestPread}}.  However, this is clearly not related 
to your patch.

bq. I'm guessing this is because of the recent changes to test-patch.sh

test-patch.sh wasn't changed recently.  smart-patch-apply.sh was, but that 
shouldn't have anything to do with this test timeout, I think.

bq. Besides this test seems unrelated. Could someone please review and merge?

+1, will commit shortly.

> The OfflineImageViewer (OIV) can output invalid XML depending on the filename
> -
>
> Key: HDFS-7287
> URL: https://issues.apache.org/jira/browse/HDFS-7287
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Attachments: HDFS-7287.1.patch, HDFS-7287.2.patch, HDFS-7287.patch, 
> testXMLOutput
>
>
> If the filename contains a character which is invalid in XML, 
> TextWriterImageVisitor.write() or PBImageXmlWriter.o() prints out the string 
> unescaped. For us this was the character 0x0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6385) Show when block deletion will start after NameNode startup in WebUI

2014-10-29 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-6385:

Attachment: HDFS-6385.1.patch

I'm attaching a patch that adds a "Block Deletion Start Time" field to the web 
UI.  I've also attached a screenshot showing the new field.

> Show when block deletion will start after NameNode startup in WebUI
> ---
>
> Key: HDFS-6385
> URL: https://issues.apache.org/jira/browse/HDFS-6385
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Chris Nauroth
> Attachments: HDFS-6385.1.patch, HDFS-6385.png
>
>
> HDFS-6186 provides functionality to delay block deletion for a period of time 
> after NameNode startup. Currently we only show the number of pending block 
> deletions in WebUI. We should also show when the block deletion will start in 
> WebUI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6385) Show when block deletion will start after NameNode startup in WebUI

2014-10-29 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-6385:

Status: Patch Available  (was: Open)

> Show when block deletion will start after NameNode startup in WebUI
> ---
>
> Key: HDFS-6385
> URL: https://issues.apache.org/jira/browse/HDFS-6385
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Chris Nauroth
> Attachments: HDFS-6385.1.patch, HDFS-6385.png
>
>
> HDFS-6186 provides functionality to delay block deletion for a period of time 
> after NameNode startup. Currently we only show the number of pending block 
> deletions in WebUI. We should also show when the block deletion will start in 
> WebUI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6385) Show when block deletion will start after NameNode startup in WebUI

2014-10-29 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-6385:

Target Version/s: 2.6.0

> Show when block deletion will start after NameNode startup in WebUI
> ---
>
> Key: HDFS-6385
> URL: https://issues.apache.org/jira/browse/HDFS-6385
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Chris Nauroth
> Attachments: HDFS-6385.1.patch, HDFS-6385.png
>
>
> HDFS-6186 provides functionality to delay block deletion for a period of time 
> after NameNode startup. Currently we only show the number of pending block 
> deletions in WebUI. We should also show when the block deletion will start in 
> WebUI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6385) Show when block deletion will start after NameNode startup in WebUI

2014-10-29 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-6385:

Attachment: HDFS-6385.png

> Show when block deletion will start after NameNode startup in WebUI
> ---
>
> Key: HDFS-6385
> URL: https://issues.apache.org/jira/browse/HDFS-6385
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Chris Nauroth
> Attachments: HDFS-6385.png
>
>
> HDFS-6186 provides functionality to delay block deletion for a period of time 
> after NameNode startup. Currently we only show the number of pending block 
> deletions in WebUI. We should also show when the block deletion will start in 
> WebUI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client

2014-10-29 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188828#comment-14188828
 ] 

Colin Patrick McCabe commented on HDFS-6994:


bq. Hi, I've been porting libhdfs3 to Windows Visual Studio 2013 and would like 
to contribute my effort back to the community.

Welcome!

bq. Should this be under HDFS-7188?

How big are the changes?  We might want to break it up if it gets too big.  If 
it can fit in a few kb, then maybe one JIRA is enough.

One more thing: if you can, please try to use CMake to build on Windows.  The 
ability to have one build system for all platforms was a big reason to switch 
to CMake in the first place.

> libhdfs3 - A native C/C++ HDFS client
> -
>
> Key: HDFS-6994
> URL: https://issues.apache.org/jira/browse/HDFS-6994
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Zhanwei Wang
> Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
>
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ 
> HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol.
> libhdfs3 provides the libhdfs-style C interface and a C++ interface. It 
> supports both Hadoop RPC versions 8 and 9, as well as Namenode HA and Kerberos 
> authentication.
> libhdfs3 is currently used by Pivotal's HAWQ.
> I'd like to integrate libhdfs3 into HDFS source code to benefit others.
> You can find libhdfs3 code from github
> https://github.com/PivotalRD/libhdfs3
> http://pivotalrd.github.io/libhdfs3/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception

2014-10-29 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-7199:
-
Status: Patch Available  (was: Open)

> DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O 
> exception
> ---
>
> Key: HDFS-7199
> URL: https://issues.apache.org/jira/browse/HDFS-7199
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Rushabh S Shah
>Priority: Critical
> Attachments: HDFS-7199-WIP.patch
>
>
> If the DataStreamer thread encounters a non-I/O exception then it closes the 
> output stream but does not set lastException.  When the client later calls 
> close on the output stream then it will see the stream is already closed with 
> lastException == null, mistakenly think this is a redundant close call, and 
> fail to report any error to the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7287) The OfflineImageViewer (OIV) can output invalid XML depending on the filename

2014-10-29 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188784#comment-14188784
 ] 

Ravi Prakash commented on HDFS-7287:


The build doesn't have any test failures: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8581/ . The console says 
(note it's missing the "Tests run" string):
{code}Running org.apache.hadoop.hdfs.TestPread
 0, Errors: 0, Skipped: 0, Time elapsed: 4.719 sec - in 
org.apache.hadoop.hdfs.TestSmallBlock{code}
I'm guessing this is because of the recent changes to test-patch.sh.
Besides, this test seems unrelated. Could someone please review and merge?

> The OfflineImageViewer (OIV) can output invalid XML depending on the filename
> -
>
> Key: HDFS-7287
> URL: https://issues.apache.org/jira/browse/HDFS-7287
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Attachments: HDFS-7287.1.patch, HDFS-7287.2.patch, HDFS-7287.patch, 
> testXMLOutput
>
>
> If the filename contains a character which is invalid in XML, 
> TextWriterImageVisitor.write() or PBImageXmlWriter.o() prints out the string 
> unescaped. For us this was the character 0x0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188759#comment-14188759
 ] 

Hadoop QA commented on HDFS-7279:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677949/HDFS-7279.003.patch
  against trunk revision 5c900b5.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8584//console

This message is automatically generated.

> Use netty to implement DatanodeWebHdfsMethods
> -
>
> Key: HDFS-7279
> URL: https://issues.apache.org/jira/browse/HDFS-7279
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, webhdfs
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-7279.000.patch, HDFS-7279.001.patch, 
> HDFS-7279.002.patch, HDFS-7279.003.patch
>
>
> Currently the DN implements all webhdfs-related functionality using jetty. As 
> the jetty version the DN uses (jetty 6) lacks fine-grained buffer and 
> connection management, the DN often suffers from long latency and OOMs when 
> its webhdfs component is under sustained heavy load.
> This jira proposes to implement the webhdfs component in the DN using netty, 
> which can be more efficient and allows finer-grained control over webhdfs.
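
For orientation, a generic netty HTTP server sketch (netty 4.x API; the port 
and handler below are hypothetical and not taken from the attached patches). It 
shows the kind of explicit buffer and connection control the description is 
after:
{code}
import io.netty.bootstrap.ServerBootstrap;
import io.netty.buffer.Unpooled;
import io.netty.channel.*;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.handler.codec.http.*;
import io.netty.util.CharsetUtil;

public class NettyHttpSketch {
  public static void main(String[] args) throws Exception {
    EventLoopGroup boss = new NioEventLoopGroup(1);   // accepts connections
    EventLoopGroup worker = new NioEventLoopGroup();  // bounded I/O threads
    try {
      ServerBootstrap b = new ServerBootstrap()
          .group(boss, worker)
          .channel(NioServerSocketChannel.class)
          .childHandler(new ChannelInitializer<SocketChannel>() {
            @Override protected void initChannel(SocketChannel ch) {
              ch.pipeline().addLast(
                  new HttpServerCodec(),
                  new HttpObjectAggregator(64 * 1024),  // cap per-request buffering
                  new SimpleChannelInboundHandler<FullHttpRequest>() {
                    @Override protected void channelRead0(
                        ChannelHandlerContext ctx, FullHttpRequest req) {
                      FullHttpResponse resp = new DefaultFullHttpResponse(
                          HttpVersion.HTTP_1_1, HttpResponseStatus.OK,
                          Unpooled.copiedBuffer("ok\n", CharsetUtil.UTF_8));
                      resp.headers().set(HttpHeaderNames.CONTENT_LENGTH,
                          resp.content().readableBytes());
                      ctx.writeAndFlush(resp)
                          .addListener(ChannelFutureListener.CLOSE);
                    }
                  });
            }
          });
      // 50075 is the stock DN HTTP port; purely illustrative here.
      b.bind(50075).sync().channel().closeFuture().sync();
    } finally {
      boss.shutdownGracefully();
      worker.shutdownGracefully();
    }
  }
}
{code}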



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7295) Support arbitrary max expiration times for delegation token

2014-10-29 Thread bc Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188754#comment-14188754
 ] 

bc Wong commented on HDFS-7295:
---

Vinod, we're probably not on the same wavelength. I agree with all that you 
said about keytabs being the solution for services. But I'm trying to find a 
solution for apps that are started by regular users. There are no keytabs here.

--

Steve, the let-user-push-new-token solution is possible, although the user 
experience is very bad, as it requires periodic manual intervention; i.e., the 
user can't go on a two-week vacation.

bq. I guess you are disappointed by the negative feedback here: you had a 
simple solution to the problem of HDFS token expiry without having to 
distribute keytabs.

No, I don't feel emotional about this. I believe that we're all reasonably 
trying to find the right solution for the users, especially since not everyone 
here understands the use case yet: there are no keytabs here. These are not 
services.

> Support arbitrary max expiration times for delegation token
> ---
>
> Key: HDFS-7295
> URL: https://issues.apache.org/jira/browse/HDFS-7295
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>
> Currently the max lifetime of HDFS delegation tokens is hardcoded to 7 days. 
> This is a problem for different users of HDFS, such as long-running YARN apps. 
> Users should be allowed to optionally specify a max lifetime for their tokens.
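
For context, a short sketch of where today's 7-day cap lives: it is the 
namenode-side setting dfs.namenode.delegation.token.max-lifetime (default 
604800000 ms), a single cluster-wide value rather than something callers can 
raise per token:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class TokenLifetimeSketch {
  public static void main(String[] args) {
    Configuration conf = new HdfsConfiguration();
    // Reads the existing cluster-wide cap; the default mirrors the
    // hardcoded 7 days the description mentions.
    long maxLifetimeMs = conf.getLong(
        "dfs.namenode.delegation.token.max-lifetime",
        7L * 24 * 60 * 60 * 1000);
    System.out.println("delegation token max lifetime (ms): " + maxLifetimeMs);
  }
}
{code}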



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods

2014-10-29 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-7279:
-
Attachment: HDFS-7279.003.patch

> Use netty to implement DatanodeWebHdfsMethods
> -
>
> Key: HDFS-7279
> URL: https://issues.apache.org/jira/browse/HDFS-7279
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, webhdfs
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-7279.000.patch, HDFS-7279.001.patch, 
> HDFS-7279.002.patch, HDFS-7279.003.patch
>
>
> Currently the DN implements all webhdfs-related functionality using jetty. As 
> the jetty version the DN uses (jetty 6) lacks fine-grained buffer and 
> connection management, the DN often suffers from long latency and OOMs when 
> its webhdfs component is under sustained heavy load.
> This jira proposes to implement the webhdfs component in the DN using netty, 
> which can be more efficient and allows finer-grained control over webhdfs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188700#comment-14188700
 ] 

Hadoop QA commented on HDFS-7097:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677893/HDFS-7097.patch
  against trunk revision ec63a3f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.qjournal.client.TestQuorumJournalManager

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestPread

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8580//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8580//console

This message is automatically generated.

> Allow block reports to be processed during checkpointing on standby name node
> -
>
> Key: HDFS-7097
> URL: https://issues.apache.org/jira/browse/HDFS-7097
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, 
> HDFS-7097.patch
>
>
> On a reasonably busy HDFS cluster, there is a steady stream of creates, causing 
> data nodes to generate incremental block reports.  When a standby name node is 
> checkpointing, RPC handler threads trying to process a full or incremental 
> block report are blocked on the name system's {{fsLock}}, because the 
> checkpointer acquires the read lock on it.  This can create a serious problem 
> if the name space is big and checkpointing takes a long time.
> All available RPC handlers can be tied up very quickly. If you have 100 
> handlers, it only takes 34 file creates (with the default replication factor 
> of 3, each create triggers incremental block reports from three datanodes).  
> If a separate service RPC port is not used, HA transition will have to wait in 
> the call queue for minutes. Even if a separate service RPC port is configured, 
> heartbeats from datanodes will be blocked. A standby NN with a big name space 
> can lose all data nodes after checkpointing.  The rpc calls will also be 
> retransmitted by data nodes many times, filling up the call queue and 
> potentially causing listen queue overflow.
> Since block reports are not modifying any state that is being saved to 
> fsimage, I propose letting them through during checkpointing. 
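
A self-contained sketch of the contention pattern described above (plain 
java.util.concurrent, not FSNamesystem itself): one long-running reader stands 
in for the checkpointer, and each blocked writer pins one RPC handler thread.
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CheckpointLockSketch {
  static final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);

  public static void main(String[] args) {
    // "Checkpointer": holds the read lock for the whole (long) save.
    new Thread(() -> {
      fsLock.readLock().lock();
      try {
        Thread.sleep(60_000);              // stand-in for a long fsimage save
      } catch (InterruptedException ignored) {
      } finally {
        fsLock.readLock().unlock();
      }
    }).start();

    // "RPC handlers": each block report needs the write lock, so every
    // one of these threads parks until the reader above finishes.
    for (int i = 0; i < 100; i++) {
      new Thread(() -> {
        fsLock.writeLock().lock();
        try {
          // process the (incremental) block report
        } finally {
          fsLock.writeLock().unlock();
        }
      }).start();
    }
  }
}
{code}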



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7287) The OfflineImageViewer (OIV) can output invalid XML depending on the filename

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188699#comment-14188699
 ] 

Hadoop QA commented on HDFS-7287:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677907/HDFS-7287.2.patch
  against trunk revision ec63a3f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestPread

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8581//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8581//console

This message is automatically generated.

> The OfflineImageViewer (OIV) can output invalid XML depending on the filename
> -
>
> Key: HDFS-7287
> URL: https://issues.apache.org/jira/browse/HDFS-7287
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Attachments: HDFS-7287.1.patch, HDFS-7287.2.patch, HDFS-7287.patch, 
> testXMLOutput
>
>
> If the filename contains a character which is invalid in XML, 
> TextWriterImageVisitor.write() or PBImageXmlWriter.o() prints out the string 
> unescaped. For us, this was the character 0x0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188673#comment-14188673
 ] 

Hadoop QA commented on HDFS-7279:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677765/HDFS-7279.002.patch
  against trunk revision b056048.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8582//console

This message is automatically generated.

> Use netty to implement DatanodeWebHdfsMethods
> -
>
> Key: HDFS-7279
> URL: https://issues.apache.org/jira/browse/HDFS-7279
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, webhdfs
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-7279.000.patch, HDFS-7279.001.patch, 
> HDFS-7279.002.patch
>
>
> Currently the DN implements all webhdfs-related functionality using jetty. As 
> the jetty version the DN uses (jetty 6) lacks fine-grained buffer and 
> connection management, the DN often suffers from long latency and OOMs when 
> its webhdfs component is under sustained heavy load.
> This jira proposes to implement the webhdfs component in the DN using netty, 
> which can be more efficient and allows finer-grained control over webhdfs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7165) Separate block metrics for files with replication count 1

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188655#comment-14188655
 ] 

Hadoop QA commented on HDFS-7165:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12677935/HDFS-7165-branch-2.patch
  against trunk revision b056048.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8583//console

This message is automatically generated.

> Separate block metrics for files with replication count 1
> -
>
> Key: HDFS-7165
> URL: https://issues.apache.org/jira/browse/HDFS-7165
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Andrew Wang
>Assignee: Zhe Zhang
> Fix For: 3.0.0
>
> Attachments: HDFS-7165-20141003-v1.patch, 
> HDFS-7165-20141009-v1.patch, HDFS-7165-20141010-v1.patch, 
> HDFS-7165-20141015-v1.patch, HDFS-7165-20141021-v1.patch, 
> HDFS-7165-20141021-v2.patch, HDFS-7165-branch-2.patch
>
>
> We see a lot of escalations because someone has written teragen output with a 
> replication factor of 1, a DN goes down, and a bunch of missing blocks show 
> up. These are normally false positives, since teragen output is disposable, 
> and, generally speaking, users should understand this is true for all repl=1 
> files.
> It'd be nice to be able to separate out these repl=1 missing blocks from 
> missing blocks with higher replication factors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7165) Separate block metrics for files with replication count 1

2014-10-29 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-7165:

Attachment: HDFS-7165-branch-2.patch

> Separate block metrics for files with replication count 1
> -
>
> Key: HDFS-7165
> URL: https://issues.apache.org/jira/browse/HDFS-7165
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Andrew Wang
>Assignee: Zhe Zhang
> Fix For: 3.0.0
>
> Attachments: HDFS-7165-20141003-v1.patch, 
> HDFS-7165-20141009-v1.patch, HDFS-7165-20141010-v1.patch, 
> HDFS-7165-20141015-v1.patch, HDFS-7165-20141021-v1.patch, 
> HDFS-7165-20141021-v2.patch, HDFS-7165-branch-2.patch
>
>
> We see a lot of escalations because someone has written teragen output with a 
> replication factor of 1, a DN goes down, and a bunch of missing blocks show 
> up. These are normally false positives, since teragen output is disposable, 
> and, generally speaking, users should understand this is true for all repl=1 
> files.
> It'd be nice to be able to separate out these repl=1 missing blocks from 
> missing blocks with higher replication factors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7165) Separate block metrics for files with replication count 1

2014-10-29 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188642#comment-14188642
 ] 

Zhe Zhang commented on HDFS-7165:
-

There are two blockers for the branch-2 merge:
* HDFS-6252 wasn't completely ported to branch-2. HDFS-7301 has been created 
and resolved to fix this.
* HDFS-4366 was never ported to branch-2. In {{UnderReplicatedBlocks}} it 
removed {{priorityToReplIdx}}, which happens to be on the same line as our 
added variable {{corruptReplOneBlocks}}. I created a branch-2 patch containing 
this simple fix.

> Separate block metrics for files with replication count 1
> -
>
> Key: HDFS-7165
> URL: https://issues.apache.org/jira/browse/HDFS-7165
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Andrew Wang
>Assignee: Zhe Zhang
> Fix For: 3.0.0
>
> Attachments: HDFS-7165-20141003-v1.patch, 
> HDFS-7165-20141009-v1.patch, HDFS-7165-20141010-v1.patch, 
> HDFS-7165-20141015-v1.patch, HDFS-7165-20141021-v1.patch, 
> HDFS-7165-20141021-v2.patch
>
>
> We see a lot of escalations because someone has written teragen output with a 
> replication factor of 1, a DN goes down, and a bunch of missing blocks show 
> up. These are normally false positives, since teragen output is disposable, 
> and, generally speaking, users should understand this is true for all repl=1 
> files.
> It'd be nice to be able to separate out these repl=1 missing blocks from 
> missing blocks with higher replication factors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

