[jira] [Commented] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )

2015-12-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066196#comment-15066196
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9494:
---

[~rakeshr], thanks for catching the bug.

For the latest patch, we should re-throw the ExecutionException as an 
IOException instead of printing a warning.  handleStreamerFailure may throw an 
exception when the failure is intolerable, i.e. when the number of remaining 
streamers is less than the number of data blocks.
{code}
+      } catch (ExecutionException ee) {
+        LOG.warn(
+            "Caught ExecutionException while waiting all streamer flush, ", ee);
+      }
{code}
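
For reference, a minimal sketch of the parallel pattern under discussion (names assumed, not the attached patch): trigger each streamer's flushInternal( ) on an executor, wait on all the Futures, and translate a failed flush's ExecutionException into an IOException for the caller.
{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Sketch of a parallel flushAllInternals (names assumed): trigger every
// streamer's flush concurrently, then wait for all of them to finish.
public class ParallelFlushSketch {
  interface Streamer {
    void flushInternal() throws IOException;
  }

  static void flushAllInternals(List<Streamer> streamers,
      ExecutorService executor) throws IOException {
    List<Future<Void>> futures = new ArrayList<>();
    for (final Streamer s : streamers) {
      Callable<Void> flushTask = () -> {
        s.flushInternal();
        return null;
      };
      futures.add(executor.submit(flushTask));
    }
    for (Future<Void> f : futures) {
      try {
        f.get(); // blocks until this streamer's flush completes
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        throw new IOException("Interrupted while waiting for streamer flush", ie);
      } catch (ExecutionException ee) {
        // Per the review comment above: re-throw rather than only warn.
        throw new IOException("Streamer flush failed", ee);
      }
    }
  }
}
{code}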

> Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
> 
>
> Key: HDFS-9494
> URL: https://issues.apache.org/jira/browse/HDFS-9494
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: GAO Rui
>Assignee: GAO Rui
>Priority: Minor
> Attachments: HDFS-9494-origin-trunk.00.patch, 
> HDFS-9494-origin-trunk.01.patch, HDFS-9494-origin-trunk.02.patch, 
> HDFS-9494-origin-trunk.03.patch
>
>
> Currently, in DFSStripedOutputStream#flushAllInternals( ), we trigger and 
> wait for flushInternal( ) in sequence. So the runtime flow is like:
> {code}
> Streamer0#flushInternal( )
> Streamer0#waitForAckedSeqno( )
> Streamer1#flushInternal( )
> Streamer1#waitForAckedSeqno( )
> …
> Streamer8#flushInternal( )
> Streamer8#waitForAckedSeqno( )
> {code}
> It could be better to trigger all the streamers to flushInternal( ) and
> wait for all of them to return from waitForAckedSeqno( ),  and then 
> flushAllInternals( ) returns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )

2015-12-21 Thread GAO Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066214#comment-15066214
 ] 

GAO Rui commented on HDFS-9494:
---

[~szetszwo], great catch!  Sorry, I overlooked that handleStreamerFailure itself 
might throw an IOException.  I will upload a new patch to address this soon.

> Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
> 
>
> Key: HDFS-9494
> URL: https://issues.apache.org/jira/browse/HDFS-9494
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: GAO Rui
>Assignee: GAO Rui
>Priority: Minor
> Attachments: HDFS-9494-origin-trunk.00.patch, 
> HDFS-9494-origin-trunk.01.patch, HDFS-9494-origin-trunk.02.patch, 
> HDFS-9494-origin-trunk.03.patch
>
>
> Currently, in DFSStripedOutputStream#flushAllInternals( ), we trigger and 
> wait for flushInternal( ) in sequence. So the runtime flow is like:
> {code}
> Streamer0#flushInternal( )
> Streamer0#waitForAckedSeqno( )
> Streamer1#flushInternal( )
> Streamer1#waitForAckedSeqno( )
> …
> Streamer8#flushInternal( )
> Streamer8#waitForAckedSeqno( )
> {code}
> It could be better to trigger all the streamers to flushInternal( ) and
> wait for all of them to return from waitForAckedSeqno( ),  and then 
> flushAllInternals( ) returns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )

2015-12-21 Thread GAO Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066224#comment-15066224
 ] 

GAO Rui commented on HDFS-9494:
---

Additionally, I think we might encounter other thread-related 
ExecutionExceptions as well, right?  So, I suggest keeping the warning 
while also re-throwing the ExecutionException as an IOException.

{code}
} catch (ExecutionException ee) {
  LOG.warn(
      "Caught ExecutionException while waiting all streamer flush, ", ee);
  throw new IOException(ee);
}
{code}

Do you think the above code is OK?

> Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
> 
>
> Key: HDFS-9494
> URL: https://issues.apache.org/jira/browse/HDFS-9494
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: GAO Rui
>Assignee: GAO Rui
>Priority: Minor
> Attachments: HDFS-9494-origin-trunk.00.patch, 
> HDFS-9494-origin-trunk.01.patch, HDFS-9494-origin-trunk.02.patch, 
> HDFS-9494-origin-trunk.03.patch
>
>
> Currently, in DFSStripedOutputStream#flushAllInternals( ), we trigger and 
> wait for flushInternal( ) in sequence. So the runtime flow is like:
> {code}
> Streamer0#flushInternal( )
> Streamer0#waitForAckedSeqno( )
> Streamer1#flushInternal( )
> Streamer1#waitForAckedSeqno( )
> …
> Streamer8#flushInternal( )
> Streamer8#waitForAckedSeqno( )
> {code}
> It could be better to trigger all the streamers to flushInternal( ) and
> wait for all of them to return from waitForAckedSeqno( ),  and then 
> flushAllInternals( ) returns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9034) "StorageTypeStats" Metric should not count failed storage.

2015-12-21 Thread Surendra Singh Lilhore (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066254#comment-15066254
 ] 

Surendra Singh Lilhore commented on HDFS-9034:
--


bq. stats.subtract(node) will be called, but it will not remove the FAILED 
storage.
Sorry if I misunderstood, but I don't think that will happen. If we look at the code:
{code}
  synchronized void updateHeartbeat(final DatanodeDescriptor node,
      StorageReport[] reports, long cacheCapacity, long cacheUsed,
      int xceiverCount, int failedVolumes,
      VolumeFailureSummary volumeFailureSummary) {
    stats.subtract(node);
    node.updateHeartbeat(reports, cacheCapacity, cacheUsed,
        xceiverCount, failedVolumes, volumeFailureSummary);
    stats.add(node);
  }
{code}
It will subtract before updating the FAILED state of the ARCHIVE storage in the 
{{node}} object. {{node}} still contains the old state (NORMAL); the new FAILED 
state is in {{reports}}. After the subtract, {{node.updateHeartbeat()}} will 
update the new state in the node object.


I exercised the same scenario in the test case; see the test code from the patch:

{code}
+    DataNodeTestUtils.injectDataDirFailure(dn1ArcVol1);
+    DataNodeTestUtils.injectDataDirFailure(dn2ArcVol1);
+    DataNodeTestUtils.injectDataDirFailure(dn3ArcVol1);
...
+    // wait for heartbeat
+    Thread.sleep(6000);
+    storageTypeStatsMap = cluster.getNamesystem().getBlockManager()
+        .getStorageTypeStats();
+    assertFalse("StorageTypeStatsMap should not contain DISK Storage type",
+        storageTypeStatsMap.containsKey(StorageType.DISK));
{code}

After injecting DISK failures on all the DNs, the DISK storage type is no longer 
in {{storageTypeStatsMap}}, which means the FAILED storage is subtracted.




> "StorageTypeStats" Metric should not count failed storage.
> --
>
> Key: HDFS-9034
> URL: https://issues.apache.org/jira/browse/HDFS-9034
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Archana T
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-9034.01.patch, HDFS-9034.02.patch, 
> HDFS-9034.03.patch, HDFS-9034.04.patch, dfsStorage_NN_UI2.png
>
>
> When we remove one storage type from all the DNs, the NN UI still shows an 
> entry for that storage type --
> Ex: for ARCHIVE
> Steps--
> 1. ARCHIVE Storage type was added for all DNs
> 2. Stop DNs
> 3. Removed ARCHIVE Storages from all DNs
> 4. Restarted DNs
> NN UI shows below --
> DFS Storage Types
> Storage Type | Configured Capacity | Capacity Used | Capacity Remaining | ...
> ARCHIVE | 57.18 GB | 64 KB (0%) | 39.82 GB (69.64%) | 64 KB | 1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9584) NPE in distcp when ssl configuration file does not exist in class path.

2015-12-21 Thread Surendra Singh Lilhore (JIRA)
Surendra Singh Lilhore created HDFS-9584:


 Summary: NPE in distcp when ssl configuration file does not exist 
in class path.
 Key: HDFS-9584
 URL: https://issues.apache.org/jira/browse/HDFS-9584
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: distcp
Affects Versions: 2.7.1
Reporter: Surendra Singh Lilhore
Assignee: Surendra Singh Lilhore


{noformat}./hadoop distcp -mapredSslConf ssl-distcp.xml 
hftp://x.x.x.x:25003/history hdfs://x.x.x.X:25008/history{noformat}

If the {{ssl-distcp.xml}} file does not exist in the class path, distcp will 
throw a NullPointerException.

{code}
java.lang.NullPointerException
at org.apache.hadoop.tools.DistCp.setupSSLConfig(DistCp.java:266)
at org.apache.hadoop.tools.DistCp.createJob(DistCp.java:250)
at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:175)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:127)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:431)
{code}
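
For illustration, a minimal sketch of a fail-fast guard that would avoid this kind of NPE (class and method names are assumptions, not the attached patch): resolve the classpath resource up front and throw a descriptive IOException instead of dereferencing a null URL later.
{code}
import java.io.IOException;
import java.net.URL;

// Sketch: fail fast when an SSL configuration file is missing from the
// class path, instead of letting the null URL cause a NullPointerException.
public class SslConfigCheck {
  static URL requireOnClasspath(String resource) throws IOException {
    URL url = Thread.currentThread().getContextClassLoader().getResource(resource);
    if (url == null) {
      throw new IOException(
          "SSL configuration file not found in class path: " + resource);
    }
    return url;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(requireOnClasspath("ssl-distcp.xml"));
  }
}
{code}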




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8562) HDFS Performance is impacted by FileInputStream Finalizer

2015-12-21 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HDFS-8562:

Attachment: HDFS-8562.003a.patch

Uploaded a version based on Colin's previous patch to address the issues 
discussed above:
* Introduced {{FileChannelWrapper}}, which is equivalent to {{FileChannel}} but 
in the fallback case holds the parent {{FileInputStream}}, as suggested by Colin;
* Used FileChannelWrapper in every place that would otherwise use FileChannel.
Note that the life cycle of FileChannelWrapper should be the same as 
FileChannel's. It can be regarded as a temporary workaround for now, since we 
don't have a public API to open a FileChannel using a {{FileDescriptor}}. When 
such an API becomes available, this wrapper class can easily be swapped out.
Some related tests can run on both JDK7 and JDK8, but may fail on other JDKs or 
JDK versions. In that case reflection could be used to support them, but I 
wonder if that's really necessary since we have and need the fallback anyway.
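
For illustration, a minimal sketch of the wrapper idea (shape assumed, not the attached patch): hold the channel plus, in the fallback case, the parent stream, so the stream cannot be closed or finalized while the channel is still in use.
{code}
import java.io.Closeable;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

// Sketch of a FileChannel holder (names assumed): in the fallback case it
// pins the originating FileInputStream for the lifetime of the channel.
public class FileChannelWrapper implements Closeable {
  private final FileChannel channel;
  private final FileInputStream parent; // null when opened without a stream

  public FileChannelWrapper(FileChannel channel, FileInputStream parent) {
    this.channel = channel;
    this.parent = parent;
  }

  public FileChannel getChannel() {
    return channel;
  }

  @Override
  public void close() throws IOException {
    channel.close();
    if (parent != null) {
      parent.close(); // also releases the stream's file descriptor
    }
  }
}
{code}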

> HDFS Performance is impacted by FileInputStream Finalizer
> -
>
> Key: HDFS-8562
> URL: https://issues.apache.org/jira/browse/HDFS-8562
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.5.0
> Environment: Impact any application that uses HDFS
>Reporter: Yanping Wang
> Attachments: HDFS-8562.002b.patch, HDFS-8562.003a.patch, 
> HDFS-8562.01.patch
>
>
> While running HBase using HDFS as datanodes, we noticed excessive high GC 
> pause spikes. For example with jdk8 update 40 and G1 collector, we saw 
> datanode GC pauses spiked toward 160 milliseconds while they should be around 
> 20 milliseconds. 
> We tracked down to GC logs and found those long GC pauses were devoted to 
> process high number of final references. 
> For example, this Young GC:
> 2715.501: [GC pause (G1 Evacuation Pause) (young) 0.1529017 secs]
> 2715.572: [SoftReference, 0 refs, 0.0001034 secs]
> 2715.572: [WeakReference, 0 refs, 0.123 secs]
> 2715.572: [FinalReference, 8292 refs, 0.0748194 secs]
> 2715.647: [PhantomReference, 0 refs, 160 refs, 0.0001333 secs]
> 2715.647: [JNI Weak Reference, 0.140 secs]
> [Ref Proc: 122.3 ms]
> [Eden: 910.0M(910.0M)->0.0B(911.0M) Survivors: 11.0M->10.0M Heap: 
> 951.1M(1536.0M)->40.2M(1536.0M)]
> [Times: user=0.47 sys=0.01, real=0.15 secs]
> This young GC took 152.9 milliseconds STW pause, while spent 122.3 
> milliseconds in Ref Proc, which processed 8292 FinalReference in 74.8 
> milliseconds plus some overhead.
> We used JFR and JMAP with Memory Analyzer to track down and found those 
> FinalReference were all from FileInputStream.  We checked HDFS code and saw 
> the use of the FileInputStream in datanode:
> https://apache.googlesource.com/hadoop-common/+/refs/heads/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java
> {code}
> public static MappableBlock load(long length,
>     FileInputStream blockIn, FileInputStream metaIn,
>     String blockFileName) throws IOException {
>   MappableBlock mappableBlock = null;
>   MappedByteBuffer mmap = null;
>   FileChannel blockChannel = null;
>   try {
>     blockChannel = blockIn.getChannel();
>     if (blockChannel == null) {
>       throw new IOException("Block InputStream has no FileChannel.");
>     }
>     mmap = blockChannel.map(MapMode.READ_ONLY, 0, length);
>     NativeIO.POSIX.getCacheManipulator().mlock(blockFileName, mmap, length);
>     verifyChecksum(length, metaIn, blockChannel, blockFileName);
>     mappableBlock = new MappableBlock(mmap, length);
>   } finally {
>     IOUtils.closeQuietly(blockChannel);
>     if (mappableBlock == null) {
>       if (mmap != null) {
>         NativeIO.POSIX.munmap(mmap); // unmapping also unlocks
>       }
>     }
>   }
>   return mappableBlock;
> }
> {code}
> We looked up 
> https://docs.oracle.com/javase/7/docs/api/java/io/FileInputStream.html  and
> http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/23bdcede4e39/src/share/classes/java/io/FileInputStream.java
>  and noticed FileInputStream relies on the Finalizer to release its resource. 
> When a class instance that has a finalizer is created, an entry for that 
> instance is put on a queue in the JVM so the JVM knows it has a finalizer 
> that needs to be executed.
> The current issue is: even when programmers do call close() after using a 
> FileInputStream, its finalize() method will still be called. In other words, 
> we still get the side effect of the FinalReference being registered at 
> FileInputStream allocation time, and also the reference processing to reclaim 
> the FinalReference during GC (any GC solution has to deal with this). 
> We can imagine that when running HDFS in industry deployments, millions of 
> files could be opened and closed, resulting in a very large number of 
> finalizers being registered and subsequently executed. That could cause very 
> long GC pause times.

[jira] [Commented] (HDFS-8562) HDFS Performance is impacted by FileInputStream Finalizer

2015-12-21 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066341#comment-15066341
 ] 

Kai Zheng commented on HDFS-8562:
-

Put it on Jenkins to see whether all the related tests pass.

> HDFS Performance is impacted by FileInputStream Finalizer
> -
>
> Key: HDFS-8562
> URL: https://issues.apache.org/jira/browse/HDFS-8562
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.5.0
> Environment: Impact any application that uses HDFS
>Reporter: Yanping Wang
> Attachments: HDFS-8562.002b.patch, HDFS-8562.003a.patch, 
> HDFS-8562.01.patch
>
>
> While running HBase using HDFS as datanodes, we noticed excessive high GC 
> pause spikes. For example with jdk8 update 40 and G1 collector, we saw 
> datanode GC pauses spiked toward 160 milliseconds while they should be around 
> 20 milliseconds. 
> We tracked down to GC logs and found those long GC pauses were devoted to 
> process high number of final references. 
> For example, this Young GC:
> 2715.501: [GC pause (G1 Evacuation Pause) (young) 0.1529017 secs]
> 2715.572: [SoftReference, 0 refs, 0.0001034 secs]
> 2715.572: [WeakReference, 0 refs, 0.123 secs]
> 2715.572: [FinalReference, 8292 refs, 0.0748194 secs]
> 2715.647: [PhantomReference, 0 refs, 160 refs, 0.0001333 secs]
> 2715.647: [JNI Weak Reference, 0.140 secs]
> [Ref Proc: 122.3 ms]
> [Eden: 910.0M(910.0M)->0.0B(911.0M) Survivors: 11.0M->10.0M Heap: 
> 951.1M(1536.0M)->40.2M(1536.0M)]
> [Times: user=0.47 sys=0.01, real=0.15 secs]
> This young GC took 152.9 milliseconds STW pause, while spent 122.3 
> milliseconds in Ref Proc, which processed 8292 FinalReference in 74.8 
> milliseconds plus some overhead.
> We used JFR and JMAP with Memory Analyzer to track down and found those 
> FinalReference were all from FileInputStream.  We checked HDFS code and saw 
> the use of the FileInputStream in datanode:
> https://apache.googlesource.com/hadoop-common/+/refs/heads/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java
> {code}
> public static MappableBlock load(long length,
>     FileInputStream blockIn, FileInputStream metaIn,
>     String blockFileName) throws IOException {
>   MappableBlock mappableBlock = null;
>   MappedByteBuffer mmap = null;
>   FileChannel blockChannel = null;
>   try {
>     blockChannel = blockIn.getChannel();
>     if (blockChannel == null) {
>       throw new IOException("Block InputStream has no FileChannel.");
>     }
>     mmap = blockChannel.map(MapMode.READ_ONLY, 0, length);
>     NativeIO.POSIX.getCacheManipulator().mlock(blockFileName, mmap, length);
>     verifyChecksum(length, metaIn, blockChannel, blockFileName);
>     mappableBlock = new MappableBlock(mmap, length);
>   } finally {
>     IOUtils.closeQuietly(blockChannel);
>     if (mappableBlock == null) {
>       if (mmap != null) {
>         NativeIO.POSIX.munmap(mmap); // unmapping also unlocks
>       }
>     }
>   }
>   return mappableBlock;
> }
> {code}
> We looked up 
> https://docs.oracle.com/javase/7/docs/api/java/io/FileInputStream.html  and
> http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/23bdcede4e39/src/share/classes/java/io/FileInputStream.java
>  and noticed FileInputStream relies on the Finalizer to release its resource. 
> When a class instance that has a finalizer is created, an entry for that 
> instance is put on a queue in the JVM so the JVM knows it has a finalizer 
> that needs to be executed.
> The current issue is: even when programmers do call close() after using a 
> FileInputStream, its finalize() method will still be called. In other words, 
> we still get the side effect of the FinalReference being registered at 
> FileInputStream allocation time, and also the reference processing to reclaim 
> the FinalReference during GC (any GC solution has to deal with this). 
> We can imagine that when running HDFS in industry deployments, millions of 
> files could be opened and closed, resulting in a very large number of 
> finalizers being registered and subsequently executed. That could cause very 
> long GC pause times.
> We tried to use Files.newInputStream() to replace FileInputStream, but it was 
> clear we could not replace FileInputStream in 
> hdfs/server/datanode/fsdataset/impl/MappableBlock.java 
> We notified the Oracle JVM team of this performance issue, which impacts all 
> Big Data applications using HDFS, and recommended a proper fix in Java SE 
> FileInputStream, because (1) there is really nothing wrong with using 
> FileInputStream in the above datanode code, and (2) as the object with a 
> finalizer is registered with the finalizer list within the JV

[jira] [Updated] (HDFS-9505) HDFS Architecture documentation needs to be refreshed.

2015-12-21 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-9505:

Status: Patch Available  (was: Open)

> HDFS Architecture documentation needs to be refreshed.
> --
>
> Key: HDFS-9505
> URL: https://issues.apache.org/jira/browse/HDFS-9505
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Chris Nauroth
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: HDFS-9505.001.patch, HDFS-9505.002.patch
>
>
> The HDFS Architecture document is out of date with respect to the current 
> design of the system.
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
> There are multiple false statements and omissions of recent features.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9584) NPE in distcp when ssl configuration file does not exist in class path.

2015-12-21 Thread Surendra Singh Lilhore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-9584:
-
Attachment: HDFS-9584.patch

> NPE in distcp when ssl configuration file does not exist in class path.
> ---
>
> Key: HDFS-9584
> URL: https://issues.apache.org/jira/browse/HDFS-9584
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-9584.patch
>
>
> {noformat}./hadoop distcp -mapredSslConf ssl-distcp.xml 
> hftp://x.x.x.x:25003/history hdfs://x.x.x.X:25008/history{noformat}
> If the {{ssl-distcp.xml}} file does not exist in the class path, distcp will 
> throw a NullPointerException.
> {code}
> java.lang.NullPointerException
> at org.apache.hadoop.tools.DistCp.setupSSLConfig(DistCp.java:266)
> at org.apache.hadoop.tools.DistCp.createJob(DistCp.java:250)
> at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:175)
> at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:127)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:431)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9584) NPE in distcp when ssl configuration file does not exist in class path.

2015-12-21 Thread Surendra Singh Lilhore (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066355#comment-15066355
 ] 

Surendra Singh Lilhore commented on HDFS-9584:
--

Attached patch. Please review.

> NPE in distcp when ssl configuration file does not exist in class path.
> ---
>
> Key: HDFS-9584
> URL: https://issues.apache.org/jira/browse/HDFS-9584
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-9584.patch
>
>
> {noformat}./hadoop distcp -mapredSslConf ssl-distcp.xml 
> hftp://x.x.x.x:25003/history hdfs://x.x.x.X:25008/history{noformat}
> If the {{ssl-distcp.xml}} file does not exist in the class path, distcp will 
> throw a NullPointerException.
> {code}
> java.lang.NullPointerException
> at org.apache.hadoop.tools.DistCp.setupSSLConfig(DistCp.java:266)
> at org.apache.hadoop.tools.DistCp.createJob(DistCp.java:250)
> at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:175)
> at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:127)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:431)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9505) HDFS Architecture documentation needs to be refreshed.

2015-12-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066362#comment-15066362
 ] 

Hadoop QA commented on HDFS-9505:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 55s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 57s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 141 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 4m 25s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12778742/HDFS-9505.002.patch |
| JIRA Issue | HDFS-9505 |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux a5757710af6e 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 52ad912 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13952/artifact/patchprocess/whitespace-eol.txt
 |
| modules | C: hadoop-common-project/hadoop-common 
hadoop-hdfs-project/hadoop-hdfs U: . |
| Max memory used | 32MB |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13952/console |


This message was automatically generated.



> HDFS Architecture documentation needs to be refreshed.
> --
>
> Key: HDFS-9505
> URL: https://issues.apache.org/jira/browse/HDFS-9505
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Chris Nauroth
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: HDFS-9505.001.patch, HDFS-9505.002.patch
>
>
> The HDFS Architecture document is out of date with respect to the current 
> design of the system.
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
> There are multiple false statements and omissions of recent features.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9505) HDFS Architecture documentation needs to be refreshed.

2015-12-21 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066383#comment-15066383
 ] 

Akira AJISAKA commented on HDFS-9505:
-

+1, thanks Masatake.

> HDFS Architecture documentation needs to be refreshed.
> --
>
> Key: HDFS-9505
> URL: https://issues.apache.org/jira/browse/HDFS-9505
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Chris Nauroth
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: HDFS-9505.001.patch, HDFS-9505.002.patch
>
>
> The HDFS Architecture document is out of date with respect to the current 
> design of the system.
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
> There are multiple false statements and omissions of recent features.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9505) HDFS Architecture documentation needs to be refreshed.

2015-12-21 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-9505:

   Resolution: Fixed
Fix Version/s: 2.7.3
   2.8.0
   Status: Resolved  (was: Patch Available)

Committed v2 patch to trunk, branch-2, branch-2.8, and branch-2.7. Thanks 
[~iwasakims] for the contribution!

> HDFS Architecture documentation needs to be refreshed.
> --
>
> Key: HDFS-9505
> URL: https://issues.apache.org/jira/browse/HDFS-9505
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Chris Nauroth
>Assignee: Masatake Iwasaki
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-9505.001.patch, HDFS-9505.002.patch
>
>
> The HDFS Architecture document is out of date with respect to the current 
> design of the system.
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
> There are multiple false statements and omissions of recent features.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9505) HDFS Architecture documentation needs to be refreshed.

2015-12-21 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-9505:

Priority: Major  (was: Minor)
Hadoop Flags: Reviewed
  Issue Type: Bug  (was: Improvement)

> HDFS Architecture documentation needs to be refreshed.
> --
>
> Key: HDFS-9505
> URL: https://issues.apache.org/jira/browse/HDFS-9505
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Chris Nauroth
>Assignee: Masatake Iwasaki
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-9505.001.patch, HDFS-9505.002.patch
>
>
> The HDFS Architecture document is out of date with respect to the current 
> design of the system.
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
> There are multiple false statements and omissions of recent features.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8914) Document HA support in the HDFS HdfsDesign.md

2015-12-21 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066403#comment-15066403
 ] 

Akira AJISAKA commented on HDFS-8914:
-

bq. Is the fix has been committed successfully?
Yes. The fix has been committed. The FAILURE does not mean the failure of the 
commit but the failure of some unit tests.

> Document HA support in the HDFS HdfsDesign.md
> -
>
> Key: HDFS-8914
> URL: https://issues.apache.org/jira/browse/HDFS-8914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.1
> Environment: Documentation page in live
>Reporter: Ravindra Babu
>Assignee: Lars Francke
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: HDFS-8914.1.patch, HDFS-8914.2.patch
>
>
> Please refer to these two links and correct one of them.
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
> The NameNode machine is a single point of failure for an HDFS cluster. If the 
> NameNode machine fails, manual intervention is necessary. Currently, 
> automatic restart and failover of the NameNode software to another machine is 
> not supported.
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
> The HDFS High Availability feature addresses the above problems by providing 
> the option of running two redundant NameNodes in the same cluster in an 
> Active/Passive configuration with a hot standby. This allows a fast failover 
> to a new NameNode in the case that a machine crashes, or a graceful 
> administrator-initiated failover for the purpose of planned maintenance.
> Please update the HdfsDesign article with the same facts to avoid confusion 
> in the reader's mind.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9505) HDFS Architecture documentation needs to be refreshed.

2015-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066411#comment-15066411
 ] 

Hudson commented on HDFS-9505:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9007 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9007/])
HDFS-9505. HDFS Architecture documentation needs to be refreshed. (aajisaka: 
rev fa544020f6f71ee993f047c9b986c047a25ed84c)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-common-project/hadoop-common/src/site/markdown/FileSystemShell.md
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md


> HDFS Architecture documentation needs to be refreshed.
> --
>
> Key: HDFS-9505
> URL: https://issues.apache.org/jira/browse/HDFS-9505
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Chris Nauroth
>Assignee: Masatake Iwasaki
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-9505.001.patch, HDFS-9505.002.patch
>
>
> The HDFS Architecture document is out of date with respect to the current 
> design of the system.
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
> There are multiple false statements and omissions of recent features.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9585) Erasure Coding: Wrong limit setting of target ByteBuffer

2015-12-21 Thread Kai Sasaki (JIRA)
Kai Sasaki created HDFS-9585:


 Summary: Erasure Coding: Wrong limit setting of target ByteBuffer
 Key: HDFS-9585
 URL: https://issues.apache.org/jira/browse/HDFS-9585
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Kai Sasaki
Assignee: Kai Sasaki


In ErasureCodingWorker, each target ByteBuffer should be limited by the desired 
recovered size. But due to wrong indexing, an output ByteBuffer is not limited 
correctly.
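
For illustration, a minimal sketch of the limiting pattern involved (names assumed, not the ErasureCodingWorker code): each target buffer must be limited by its own recovered size, so the sizes array must be indexed with the buffer's own index.
{code}
import java.nio.ByteBuffer;

// Demo (assumed names): limit each target buffer by its own recovered size.
public class TargetLimitDemo {
  public static void main(String[] args) {
    ByteBuffer[] targets = { ByteBuffer.allocate(64), ByteBuffer.allocate(64) };
    int[] toRecoverLen = { 48, 16 };
    for (int i = 0; i < targets.length; i++) {
      targets[i].limit(toRecoverLen[i]); // index i, not a fixed/wrong index
    }
    System.out.println(targets[0].limit() + ", " + targets[1].limit()); // 48, 16
  }
}
{code}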



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9585) Erasure Coding: Wrong limit setting of target ByteBuffer

2015-12-21 Thread Kai Sasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Sasaki updated HDFS-9585:
-
Status: Patch Available  (was: Open)

> Erasure Coding: Wrong limit setting of target ByteBuffer
> 
>
> Key: HDFS-9585
> URL: https://issues.apache.org/jira/browse/HDFS-9585
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>  Labels: EC
> Attachments: HDFS-9585.01.patch
>
>
> In ErasureCodingWorker, each target ByteBuffer should be limited by the 
> desired recovered size. But due to wrong indexing, an output ByteBuffer is 
> not limited correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8914) Document HA support in the HDFS HdfsDesign.md

2015-12-21 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-8914:

 Due Date: (was: 20/Aug/15)
 Priority: Major  (was: Trivial)
Fix Version/s: 2.7.3

I think the fix is very important because users probably read this document 
first and may mistakenly conclude that the NameNode is a SPOF. Cherry-picked 
this to branch-2.7.

> Document HA support in the HDFS HdfsDesign.md
> -
>
> Key: HDFS-8914
> URL: https://issues.apache.org/jira/browse/HDFS-8914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.1
> Environment: Documentation page in live
>Reporter: Ravindra Babu
>Assignee: Lars Francke
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-8914.1.patch, HDFS-8914.2.patch
>
>
> Please refer to these two links and correct one of them.
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
> The NameNode machine is a single point of failure for an HDFS cluster. If the 
> NameNode machine fails, manual intervention is necessary. Currently, 
> automatic restart and failover of the NameNode software to another machine is 
> not supported.
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
> The HDFS High Availability feature addresses the above problems by providing 
> the option of running two redundant NameNodes in the same cluster in an 
> Active/Passive configuration with a hot standby. This allows a fast failover 
> to a new NameNode in the case that a machine crashes, or a graceful 
> administrator-initiated failover for the purpose of planned maintenance.
> Please update the HdfsDesign article with the same facts to avoid confusion 
> in the reader's mind.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9585) Erasure Coding: Wrong limit setting of target ByteBuffer

2015-12-21 Thread Kai Sasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Sasaki updated HDFS-9585:
-
Attachment: HDFS-9585.01.patch

> Erasure Coding: Wrong limit setting of target ByteBuffer
> 
>
> Key: HDFS-9585
> URL: https://issues.apache.org/jira/browse/HDFS-9585
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>  Labels: EC
> Attachments: HDFS-9585.01.patch
>
>
> In ErasureCodingWorker, each target ByteBuffer should be limited by the 
> desired recovered size. But due to wrong indexing, an output ByteBuffer is 
> not limited correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8914) Document HA support in the HDFS HdfsDesign.md

2015-12-21 Thread Ravindra Babu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066423#comment-15066423
 ] 

Ravindra Babu commented on HDFS-8914:
-

Yes.  How can I find out which unit tests failed and how should I address 
them? Is there a way to get the patch deployed without addressing these failures?

> Document HA support in the HDFS HdfsDesign.md
> -
>
> Key: HDFS-8914
> URL: https://issues.apache.org/jira/browse/HDFS-8914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.1
> Environment: Documentation page in live
>Reporter: Ravindra Babu
>Assignee: Lars Francke
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-8914.1.patch, HDFS-8914.2.patch
>
>
> Please refer to these two links and correct one of them.
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
> The NameNode machine is a single point of failure for an HDFS cluster. If the 
> NameNode machine fails, manual intervention is necessary. Currently, 
> automatic restart and failover of the NameNode software to another machine is 
> not supported.
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
> The HDFS High Availability feature addresses the above problems by providing 
> the option of running two redundant NameNodes in the same cluster in an 
> Active/Passive configuration with a hot standby. This allows a fast failover 
> to a new NameNode in the case that a machine crashes, or a graceful 
> administrator-initiated failover for the purpose of planned maintenance.
> Please update the HdfsDesign article with the same facts to avoid confusion 
> in the reader's mind.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8914) Document HA support in the HDFS HdfsDesign.md

2015-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066429#comment-15066429
 ] 

Hudson commented on HDFS-8914:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9008 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9008/])
Move HDFS-8914 from 2.8.0 to 2.7.3 in CHANGES.txt. (aajisaka: rev 
2cb5afffc4c546e3d0d0f68a921905967047b0e0)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Document HA support in the HDFS HdfsDesign.md
> -
>
> Key: HDFS-8914
> URL: https://issues.apache.org/jira/browse/HDFS-8914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.1
> Environment: Documentation page in live
>Reporter: Ravindra Babu
>Assignee: Lars Francke
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-8914.1.patch, HDFS-8914.2.patch
>
>
> Please refer to these two links and correct one of them.
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
> The NameNode machine is a single point of failure for an HDFS cluster. If the 
> NameNode machine fails, manual intervention is necessary. Currently, 
> automatic restart and failover of the NameNode software to another machine is 
> not supported.
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
> The HDFS High Availability feature addresses the above problems by providing 
> the option of running two redundant NameNodes in the same cluster in an 
> Active/Passive configuration with a hot standby. This allows a fast failover 
> to a new NameNode in the case that a machine crashes, or a graceful 
> administrator-initiated failover for the purpose of planned maintenance.
> Please update the HdfsDesign article with the same facts to avoid confusion 
> in the reader's mind.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8914) Document HA support in the HDFS HdfsDesign.md

2015-12-21 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066431#comment-15066431
 ] 

Akira AJISAKA commented on HDFS-8914:
-

bq. How can I find out which unit tests failed
Failed tests are listed at the following link, but the link will expire in 
a few weeks. (This link has expired.)
bq. FAILURE: Integrated in Hadoop-Hdfs-trunk #2567 (See 
https://builds.apache.org/job/Hadoop-Hdfs-trunk/2567/)

bq. and how should I address them?
Usually the test failures have already been reported in jira. You can search 
for the issue and fix the test. If you can't find the issue, you can create an 
issue to fix the failing test.

bq. Is there a way to get the patch deployed without addressing these failures?
The tests are executed after the patch is committed, so the patch is applied to 
trunk/branch-2/... whether the tests fail or not.

> Document HA support in the HDFS HdfsDesign.md
> -
>
> Key: HDFS-8914
> URL: https://issues.apache.org/jira/browse/HDFS-8914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.1
> Environment: Documentation page in live
>Reporter: Ravindra Babu
>Assignee: Lars Francke
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-8914.1.patch, HDFS-8914.2.patch
>
>
> Please refer to these two links and correct one of them.
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
> The NameNode machine is a single point of failure for an HDFS cluster. If the 
> NameNode machine fails, manual intervention is necessary. Currently, 
> automatic restart and failover of the NameNode software to another machine is 
> not supported.
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
> The HDFS High Availability feature addresses the above problems by providing 
> the option of running two redundant NameNodes in the same cluster in an 
> Active/Passive configuration with a hot standby. This allows a fast failover 
> to a new NameNode in the case that a machine crashes, or a graceful 
> administrator-initiated failover for the purpose of planned maintenance.
> Please update the HdfsDesign article with the same facts to avoid confusion 
> in the reader's mind.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9586) listCorruptFileBlocks should not output files that all replications are decommissioning

2015-12-21 Thread Phil Yang (JIRA)
Phil Yang created HDFS-9586:
---

 Summary: listCorruptFileBlocks should not output files that all 
replications are decommissioning
 Key: HDFS-9586
 URL: https://issues.apache.org/jira/browse/HDFS-9586
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Phil Yang
Assignee: Phil Yang


As HDFS-7933 said, we should count decommissioning and decommissioned nodes 
separately and regard decommissioning nodes as special live nodes whose file 
is not corrupt or missing.

So in listCorruptFileBlocks, which is used by fsck and the HDFS NameNode web UI, 
we should collect a corrupt file only if liveReplicas and decommissioning are 
both 0.
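
A minimal sketch of the proposed condition (names assumed, not the attached patch): a block is reported as corrupt only when it has neither live replicas nor replicas on decommissioning nodes.
{code}
// Sketch (assumed names): report a block as corrupt only when there are no
// live replicas AND no replicas on decommissioning nodes.
public class CorruptCheckSketch {
  static boolean shouldReportCorrupt(int liveReplicas, int decommissioning) {
    return liveReplicas == 0 && decommissioning == 0;
  }

  public static void main(String[] args) {
    System.out.println(shouldReportCorrupt(0, 1)); // false: still readable
    System.out.println(shouldReportCorrupt(0, 0)); // true: genuinely missing
  }
}
{code}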



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9586) listCorruptFileBlocks should not output files that all replications are decommissioning

2015-12-21 Thread Phil Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Yang updated HDFS-9586:

Attachment: patch

> listCorruptFileBlocks should not output files that all replications are 
> decommissioning
> ---
>
> Key: HDFS-9586
> URL: https://issues.apache.org/jira/browse/HDFS-9586
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
>
> As HDFS-7933 said, we should count decommissioning and decommissioned nodes 
> separately and regard decommissioning nodes as special live nodes whose 
> file is not corrupt or missing.
> So in listCorruptFileBlocks, which is used by fsck and the HDFS NameNode web 
> UI, we should collect a corrupt file only if liveReplicas and decommissioning 
> are both 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9586) listCorruptFileBlocks should not output files that all replications are decommissioning

2015-12-21 Thread Phil Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Yang updated HDFS-9586:

Attachment: (was: patch)

> listCorruptFileBlocks should not output files that all replications are 
> decommissioning
> ---
>
> Key: HDFS-9586
> URL: https://issues.apache.org/jira/browse/HDFS-9586
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
>
> As HDFS-7933 said, we should count decommissioning and decommissioned nodes 
> separately and regard decommissioning nodes as special live nodes whose 
> file is not corrupt or missing.
> So in listCorruptFileBlocks, which is used by fsck and the HDFS NameNode web 
> UI, we should collect a corrupt file only if liveReplicas and decommissioning 
> are both 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9586) listCorruptFileBlocks should not output files that all replications are decommissioning

2015-12-21 Thread Phil Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Yang updated HDFS-9586:

Attachment: 9586-v1.patch

> listCorruptFileBlocks should not output files that all replications are 
> decommissioning
> ---
>
> Key: HDFS-9586
> URL: https://issues.apache.org/jira/browse/HDFS-9586
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 9586-v1.patch
>
>
> As HDFS-7933 said, we should count decommissioning and decommissioned nodes 
> separately and regard decommissioning nodes as special live nodes whose 
> file is not corrupt or missing.
> So in listCorruptFileBlocks, which is used by fsck and the HDFS NameNode web 
> UI, we should collect a corrupt file only if liveReplicas and decommissioning 
> are both 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9586) listCorruptFileBlocks should not output files that all replications are decommissioning

2015-12-21 Thread Phil Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Yang updated HDFS-9586:

Status: Patch Available  (was: Open)

> listCorruptFileBlocks should not output files that all replications are 
> decommissioning
> ---
>
> Key: HDFS-9586
> URL: https://issues.apache.org/jira/browse/HDFS-9586
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 9586-v1.patch
>
>
> As HDFS-7933 said, we should count decommissioning and decommissioned nodes 
> separately and regard decommissioning nodes as special live nodes whose 
> file is not corrupt or missing.
> So in listCorruptFileBlocks, which is used by fsck and the HDFS NameNode web 
> UI, we should collect a corrupt file only if liveReplicas and decommissioning 
> are both 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9574) Reduce client failures during datanode restart

2015-12-21 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066489#comment-15066489
 ] 

Kihwal Lee commented on HDFS-9574:
--

This precommit (#13934) and #13935 for HDFS-7163 ran on H9 at the same time, 
causing both to run for over 5 hours and fail many tests due to internal 
timeouts.
[~aw], do you think we can configure jenkins to avoid this? Since HDFS tests 
are run in parallel, concurrent hdfs builds are worse than before.

> Reduce client failures during datanode restart
> --
>
> Key: HDFS-9574
> URL: https://issues.apache.org/jira/browse/HDFS-9574
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-9574.patch, HDFS-9574.v2.patch
>
>
> Since DataXceiverServer is initialized before BP is fully up, client requests 
> will fail until the datanode registers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9105) separate replication config for client to create a file or set replication

2015-12-21 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated HDFS-9105:
---
Attachment: HDFS-9105.2.2.patch

Patch .2.2 fixes the checkstyle issues.

> separate replication config for client to create a file or set replication
> --
>
> Key: HDFS-9105
> URL: https://issues.apache.org/jira/browse/HDFS-9105
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: HDFS-9105.2.2.patch, HDFS-9105.2.patch, HDFS-9105.patch
>
>
> One scenario where we want this new config: we want to enforce a minimum 
> replication of 2. When a client tries to create a file, the request is 
> checked against the replication factor of the new config, which is set to 2, 
> so the client cannot create a file with a replication of 1. But when the 
> client tries to close the file and datanode failures in the pipeline have 
> left only 1 datanode remaining, the close is checked against the old 
> minReplication value, which is 1, so we will allow the client to close the 
> file.
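
For illustration, a minimal sketch of the two-threshold idea described above (config names and values assumed, not the attached patch): a stricter minimum at create time than the pipeline minimum applied at close time.
{code}
// Sketch (assumed names): separate minimums for file creation and file close.
public class MinReplicationSketch {
  static final int CREATE_MIN_REPLICATION = 2; // hypothetical new config
  static final int CLOSE_MIN_REPLICATION = 1;  // existing minReplication

  static void checkCreate(short requestedReplication) {
    if (requestedReplication < CREATE_MIN_REPLICATION) {
      throw new IllegalArgumentException("Requested replication "
          + requestedReplication + " < " + CREATE_MIN_REPLICATION);
    }
  }

  static boolean canClose(int remainingPipelineNodes) {
    // A pipeline shrunk to a single node can still close the file.
    return remainingPipelineNodes >= CLOSE_MIN_REPLICATION;
  }
}
{code}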



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9574) Reduce client failures during datanode restart

2015-12-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066517#comment-15066517
 ] 

Allen Wittenauer commented on HDFS-9574:


bq. do you think we can configure jenkins to avoid this? 

No.

bq.  Since HDFS tests are run in parallel, concurrent hdfs builds are worse 
than before.

Maybe this will finally be the push folks need to clean up the tests.

> Reduce client failures during datanode restart
> --
>
> Key: HDFS-9574
> URL: https://issues.apache.org/jira/browse/HDFS-9574
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-9574.patch, HDFS-9574.v2.patch
>
>
> Since DataXceiverServer is initialized before BP is fully up, client requests 
> will fail until the datanode registers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9492) RoundRobinVolumeChoosingPolicy

2015-12-21 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-9492:
--
Attachment: RoundRobinVolumeChoosingPolicy.HDFS-9492.patch

> RoundRobinVolumeChoosingPolicy
> --
>
> Key: HDFS-9492
> URL: https://issues.apache.org/jira/browse/HDFS-9492
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: BELUGA BEHR
>Priority: Trivial
> Attachments: RoundRobinVolumeChoosingPolicy.HDFS-9492.patch, 
> RoundRobinVolumeChoosingPolicy.patch
>
>
> This is some general clean-up for: RoundRobinVolumeChoosingPolicy
> I have also updated and expanded the unit tests a bit.
> There is one error message being generated that I changed.  I felt the 
> previous exception message was not that helpful, so I trimmed it down. If the 
> exception message must be enhanced, the entire list of "volumes" should be 
> included.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9569) Log the name of the fsimage being loaded for better supportability

2015-12-21 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066766#comment-15066766
 ] 

Chris Nauroth commented on HDFS-9569:
-

OK, let's go with the rev 4 approach.  However, I don't understand why this was 
added:

{code}
     List<FSImageFile> imageFiles = inspector.getLatestImages();
-
+    if (imageFiles.size() == 0) {
+      throw new IOException("Failed to find any FSImage files.");
+    }
+
{code}

I think this is unreachable code.  The implementations of {{getLatestImages}} 
are written to either return a non-empty list or throw an exception, so it 
appears to be impossible for the list to be empty by the time execution reaches 
here.

> Log the name of the fsimage being loaded for better supportability
> --
>
> Key: HDFS-9569
> URL: https://issues.apache.org/jira/browse/HDFS-9569
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Trivial
>  Labels: supportability
> Fix For: 2.7.3
>
> Attachments: HDFS-9569.001.patch, HDFS-9569.002.patch, 
> HDFS-9569.003.patch, HDFS-9569.004.patch
>
>
> When NN starts to load fsimage, it does
> {code}
>   void loadFSImageFile(FSNamesystem target, MetaRecoveryContext recovery,
>       FSImageFile imageFile, StartupOption startupOption) throws IOException {
>     LOG.debug("Planning to load image :\n" + imageFile);
>     ..
>     long txId = loader.getLoadedImageTxId();
>     LOG.info("Loaded image for txid " + txId + " from " + curFile);
> {code}
> A debug msg is issued at the beginning with the fsimage file name, then at 
> the end an info msg is issued after loading.
> If the fsimage loading failed due to corrupted fsimage (see HDFS-9406), we 
> don't see the first msg. It'd be helpful to always be able to see from NN 
> logs what fsimage file it's loading.
> Two improvements:
> 1. Change the above debug to info
> 2. If exception happens when loading fsimage, be sure to report the fsimage 
> name being loaded in the error message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9505) HDFS Architecture documentation needs to be refreshed.

2015-12-21 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066800#comment-15066800
 ] 

Masatake Iwasaki commented on HDFS-9505:


Thanks, [~ajisakaa]!

> HDFS Architecture documentation needs to be refreshed.
> --
>
> Key: HDFS-9505
> URL: https://issues.apache.org/jira/browse/HDFS-9505
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Chris Nauroth
>Assignee: Masatake Iwasaki
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-9505.001.patch, HDFS-9505.002.patch
>
>
> The HDFS Architecture document is out of date with respect to the current 
> design of the system.
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
> There are multiple false statements and omissions of recent features.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9582) TestLeaseRecoveryStriped file missing Apache License header and not well formatted

2015-12-21 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066803#comment-15066803
 ] 

Zhe Zhang commented on HDFS-9582:
-

Thanks for the catch, Uma. I committed an older version of the patch.

> TestLeaseRecoveryStriped file missing Apache License header and not well 
> formatted
> --
>
> Key: HDFS-9582
> URL: https://issues.apache.org/jira/browse/HDFS-9582
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HDFS-9582-Trunk.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9173) Erasure Coding: Lease recovery for striped file

2015-12-21 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066805#comment-15066805
 ] 

Zhe Zhang commented on HDFS-9173:
-

Thanks for the good catch, Akira. I committed the 09 patch by mistake. I verified 
that the only difference between the 09 and 10 versions is the license header.

> Erasure Coding: Lease recovery for striped file
> ---
>
> Key: HDFS-9173
> URL: https://issues.apache.org/jira/browse/HDFS-9173
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Walter Su
>Assignee: Walter Su
> Fix For: 3.0.0
>
> Attachments: HDFS-9173.00.wip.patch, HDFS-9173.01.patch, 
> HDFS-9173.02.step125.patch, HDFS-9173.03.patch, HDFS-9173.04.patch, 
> HDFS-9173.05.patch, HDFS-9173.06.patch, HDFS-9173.07.patch, 
> HDFS-9173.08.patch, HDFS-9173.09.patch, HDFS-9173.09.patch, HDFS-9173.10.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8562) HDFS Performance is impacted by FileInputStream Finalizer

2015-12-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066822#comment-15066822
 ] 

Colin Patrick McCabe commented on HDFS-8562:


Hi [~drankye],

Thanks for looking at this.  003a doesn't compile for me because I'm using 
Oracle Java 1.7.0_10-b18, which doesn't have the four-argument version of 
{{FileChannel#open}}.  I think we are going to need to use reflection for this 
to get a version that can pass Jenkins and be generally useful.

It might be better to name {{FileChannelWrapper}} something like 
{{PassedFileChannel}}, to emphasize that it was and can be passed from another 
process.

{{PassedFileChannel}} should keep around a {{FileDescriptor}} object, so that 
we don't have to modify lots and lots of functions to take two arguments 
instead of one.

We should probably also make a feature request on the Oracle Java bugtracker to 
add a {{getFileDescriptor}} method to {{FileChannel}}.  This is another example 
of a very useful  FD-related method that {{FileInputStream}} currently has and 
{{FileChannel}} doesn't, for no good reason that I can see.
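
To make the {{PassedFileChannel}} idea concrete, here is a minimal sketch; the 
class name follows the suggestion above, and everything else is illustrative 
rather than an actual patch:
{code}
import java.io.FileDescriptor;
import java.nio.channels.FileChannel;

// Hypothetical sketch only: a FileChannel passed from another process,
// bundled with its FileDescriptor since FileChannel has no getFD() method.
public class PassedFileChannel {
  private final FileChannel channel;
  private final FileDescriptor fd; // kept so callers need not pass it separately

  public PassedFileChannel(FileChannel channel, FileDescriptor fd) {
    this.channel = channel;
    this.fd = fd;
  }

  public FileChannel getChannel() {
    return channel;
  }

  public FileDescriptor getFileDescriptor() {
    return fd;
  }
}
{code}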

> HDFS Performance is impacted by FileInputStream Finalizer
> -
>
> Key: HDFS-8562
> URL: https://issues.apache.org/jira/browse/HDFS-8562
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.5.0
> Environment: Impact any application that uses HDFS
>Reporter: Yanping Wang
> Attachments: HDFS-8562.002b.patch, HDFS-8562.003a.patch, 
> HDFS-8562.01.patch
>
>
> While running HBase on HDFS, we noticed excessively high GC pause spikes on 
> the datanodes. For example, with jdk8 update 40 and the G1 collector, we saw 
> datanode GC pauses spike toward 160 milliseconds while they should be around 
> 20 milliseconds. 
> We dug into the GC logs and found those long GC pauses were devoted to 
> processing a high number of final references. 
> For example, this Young GC:
> 2715.501: [GC pause (G1 Evacuation Pause) (young) 0.1529017 secs]
> 2715.572: [SoftReference, 0 refs, 0.0001034 secs]
> 2715.572: [WeakReference, 0 refs, 0.123 secs]
> 2715.572: [FinalReference, 8292 refs, 0.0748194 secs]
> 2715.647: [PhantomReference, 0 refs, 160 refs, 0.0001333 secs]
> 2715.647: [JNI Weak Reference, 0.140 secs]
> [Ref Proc: 122.3 ms]
> [Eden: 910.0M(910.0M)->0.0B(911.0M) Survivors: 11.0M->10.0M Heap: 
> 951.1M(1536.0M)->40.2M(1536.0M)]
> [Times: user=0.47 sys=0.01, real=0.15 secs]
> This young GC took a 152.9 millisecond STW pause and spent 122.3 
> milliseconds in Ref Proc, which processed 8292 FinalReferences in 74.8 
> milliseconds plus some overhead.
> We used JFR and JMAP with Memory Analyzer and found those 
> FinalReferences were all from FileInputStream.  We checked the HDFS code and 
> saw the use of FileInputStream in the datanode:
> https://apache.googlesource.com/hadoop-common/+/refs/heads/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java
> {code}
> public static MappableBlock load(long length,
>     FileInputStream blockIn, FileInputStream metaIn,
>     String blockFileName) throws IOException {
>   MappableBlock mappableBlock = null;
>   MappedByteBuffer mmap = null;
>   FileChannel blockChannel = null;
>   try {
>     blockChannel = blockIn.getChannel();
>     if (blockChannel == null) {
>       throw new IOException("Block InputStream has no FileChannel.");
>     }
>     mmap = blockChannel.map(MapMode.READ_ONLY, 0, length);
>     NativeIO.POSIX.getCacheManipulator().mlock(blockFileName, mmap, length);
>     verifyChecksum(length, metaIn, blockChannel, blockFileName);
>     mappableBlock = new MappableBlock(mmap, length);
>   } finally {
>     IOUtils.closeQuietly(blockChannel);
>     if (mappableBlock == null) {
>       if (mmap != null) {
>         NativeIO.POSIX.munmap(mmap); // unmapping also unlocks
>       }
>     }
>   }
>   return mappableBlock;
> }
> {code}
> We looked up 
> https://docs.oracle.com/javase/7/docs/api/java/io/FileInputStream.html  and
> http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/23bdcede4e39/src/share/classes/java/io/FileInputStream.java
>  and noticed that FileInputStream relies on the Finalizer to release its resource. 
> When an instance of a class that has a finalizer is created, an entry for that 
> instance is put on a queue in the JVM so the JVM knows it has a finalizer that 
> needs to be executed.   
> The current issue is: even when programmers do call close() after using a 
> FileInputStream, its finalize() method will still be called. In other words, 
> we still get the side effect of the FinalReference being registered at 
> FileInputStream allocation time, and also referen

[jira] [Comment Edited] (HDFS-8562) HDFS Performance is impacted by FileInputStream Finalizer

2015-12-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066822#comment-15066822
 ] 

Colin Patrick McCabe edited comment on HDFS-8562 at 12/21/15 6:13 PM:
--

Hi [~drankye],

Thanks for looking at this.  003a doesn't compile for me because I'm using 
Oracle Java 1.7.0_10-b18, which doesn't have the four-argument version of 
{{FileChannelImpl#open}}.  I think we are going to need to use reflection for 
this to get a version that can pass Jenkins and be generally useful.

It might be better to name {{FileChannelWrapper}} something like 
{{PassedFileChannel}}, to emphasize that it was and can be passed from another 
process.

{{PassedFileChannel}} should keep around a {{FileDescriptor}} object, so that 
we don't have to modify lots and lots of functions to take two arguments 
instead of one.

We should probably also make a feature request on the Oracle Java bugtracker to 
add a {{getFileDescriptor}} method to {{FileChannel}}.  This is another example 
of a very useful  FD-related method that {{FileInputStream}} currently has and 
{{FileChannel}} doesn't, for no good reason that I can see.


was (Author: cmccabe):
Hi [~drankye],

Thanks for looking at this.  003a doesn't compile for me because I'm using 
Oracle Java 1.7.0_10-b18, which doesn't have the four-argument version of 
{{FileChannel#open}}.  I think we are going to need to use reflection for this 
to get a version that can pass Jenkins and be generally useful.

It might be better to name {{FileChannelWrapper}} something like 
{{PassedFileChannel}}, to emphasize that it was and can be passed from another 
process.

{{PassedFileChannel}} should keep around a {{FileDescriptor}} object, so that 
we don't have to modify lots and lots of functions to take two arguments 
instead of one.

We should probably also make a feature request on the Oracle Java bugtracker to 
add a {{getFileDescriptor}} method to {{FileChannel}}.  This is another example 
of a very useful  FD-related method that {{FileInputStream}} currently has and 
{{FileChannel}} doesn't, for no good reason that I can see.

> HDFS Performance is impacted by FileInputStream Finalizer
> -
>
> Key: HDFS-8562
> URL: https://issues.apache.org/jira/browse/HDFS-8562
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.5.0
> Environment: Impact any application that uses HDFS
>Reporter: Yanping Wang
> Attachments: HDFS-8562.002b.patch, HDFS-8562.003a.patch, 
> HDFS-8562.01.patch
>
>
> While running HBase on HDFS, we noticed excessively high GC pause spikes on 
> the datanodes. For example, with jdk8 update 40 and the G1 collector, we saw 
> datanode GC pauses spike toward 160 milliseconds while they should be around 
> 20 milliseconds. 
> We dug into the GC logs and found those long GC pauses were devoted to 
> processing a high number of final references. 
> For example, this Young GC:
> 2715.501: [GC pause (G1 Evacuation Pause) (young) 0.1529017 secs]
> 2715.572: [SoftReference, 0 refs, 0.0001034 secs]
> 2715.572: [WeakReference, 0 refs, 0.123 secs]
> 2715.572: [FinalReference, 8292 refs, 0.0748194 secs]
> 2715.647: [PhantomReference, 0 refs, 160 refs, 0.0001333 secs]
> 2715.647: [JNI Weak Reference, 0.140 secs]
> [Ref Proc: 122.3 ms]
> [Eden: 910.0M(910.0M)->0.0B(911.0M) Survivors: 11.0M->10.0M Heap: 
> 951.1M(1536.0M)->40.2M(1536.0M)]
> [Times: user=0.47 sys=0.01, real=0.15 secs]
> This young GC took a 152.9 millisecond STW pause and spent 122.3 
> milliseconds in Ref Proc, which processed 8292 FinalReferences in 74.8 
> milliseconds plus some overhead.
> We used JFR and JMAP with Memory Analyzer and found those 
> FinalReferences were all from FileInputStream.  We checked the HDFS code and 
> saw the use of FileInputStream in the datanode:
> https://apache.googlesource.com/hadoop-common/+/refs/heads/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java
> {code}
> public static MappableBlock load(long length,
>     FileInputStream blockIn, FileInputStream metaIn,
>     String blockFileName) throws IOException {
>   MappableBlock mappableBlock = null;
>   MappedByteBuffer mmap = null;
>   FileChannel blockChannel = null;
>   try {
>     blockChannel = blockIn.getChannel();
>     if (blockChannel == null) {
>       throw new IOException("Block InputStream has no FileChannel.");
>     }
>     mmap = blockChannel.map(MapMode.READ_ONLY, 0, length);
>     NativeIO.POSIX.getCacheManipulator().mlock(blockFileName, mmap, length);
>     verifyChecksum(length, metaIn, blockChannel, blockFileName);
>     mappableBlock = new MappableBlock(mmap,

[jira] [Updated] (HDFS-9584) NPE in distcp when ssl configuration file does not exist in class path.

2015-12-21 Thread Surendra Singh Lilhore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-9584:
-
Status: Patch Available  (was: Open)

> NPE in distcp when ssl configuration file does not exist in class path.
> ---
>
> Key: HDFS-9584
> URL: https://issues.apache.org/jira/browse/HDFS-9584
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-9584.patch
>
>
> {noformat}./hadoop distcp -mapredSslConf ssl-distcp.xml 
> hftp://x.x.x.x:25003/history hdfs://x.x.x.X:25008/history{noformat}
> If the {{ssl-distcp.xml}} file does not exist in the class path, distcp will 
> throw a NullPointerException.
> {code}
> java.lang.NullPointerException
> at org.apache.hadoop.tools.DistCp.setupSSLConfig(DistCp.java:266)
> at org.apache.hadoop.tools.DistCp.createJob(DistCp.java:250)
> at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:175)
> at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:127)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:431)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9569) Log the name of the fsimage being loaded for better supportability

2015-12-21 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-9569:

Attachment: HDFS-9569.005.patch

Thanks [~cnauroth]. 

I added the additional check thinking it was missing. Since {{getLatestImages}} 
ensures a non-empty list is returned, as you pointed out, I'm uploading rev 5 
with that check removed. Nice catch on the new change I made.
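
For reference, the two improvements from the description boil down to something 
like this minimal, self-contained sketch (all names here are illustrative; 
{{doLoad}} stands in for the real FSImageFormat loader):
{code}
import java.io.File;
import java.io.IOException;

// Sketch only: (1) log the image name at INFO before loading, and
// (2) include the image name in the error if loading fails.
class FsImageLoadLoggingSketch {
  void loadImage(File imageFile) throws IOException {
    System.out.println("Planning to load image: " + imageFile); // (1) was DEBUG
    try {
      doLoad(imageFile);
    } catch (IOException ioe) {
      // (2) the fsimage name now always appears in the error message
      throw new IOException("Failed to load image " + imageFile, ioe);
    }
  }

  private void doLoad(File imageFile) throws IOException {
    // real deserialization logic lives in FSImageFormat; omitted here
  }
}
{code}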



> Log the name of the fsimage being loaded for better supportability
> --
>
> Key: HDFS-9569
> URL: https://issues.apache.org/jira/browse/HDFS-9569
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Trivial
>  Labels: supportability
> Fix For: 2.7.3
>
> Attachments: HDFS-9569.001.patch, HDFS-9569.002.patch, 
> HDFS-9569.003.patch, HDFS-9569.004.patch, HDFS-9569.005.patch
>
>
> When NN starts to load fsimage, it does
> {code}
>  void loadFSImageFile(FSNamesystem target, MetaRecoveryContext recovery,
>   FSImageFile imageFile, StartupOption startupOption) throws IOException {
>   LOG.debug("Planning to load image :\n" + imageFile);
>   ..
> long txId = loader.getLoadedImageTxId();
> LOG.info("Loaded image for txid " + txId + " from " + curFile);
> {code}
> A debug msg is issued at the beginning with the fsimage file name, then at 
> the end an info msg is issued after loading.
> If the fsimage loading fails due to a corrupted fsimage (see HDFS-9406), we 
> don't see the first msg. It'd be helpful to always be able to see from the NN 
> logs which fsimage file it's loading.
> Two improvements:
> 1. Change the above debug to info
> 2. If an exception happens when loading the fsimage, be sure to report the 
> fsimage name being loaded in the error message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9569) Log the name of the fsimage being loaded for better supportability

2015-12-21 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066838#comment-15066838
 ] 

Chris Nauroth commented on HDFS-9569:
-

+1 for patch v005, pending Jenkins.  Thank you!

> Log the name of the fsimage being loaded for better supportability
> --
>
> Key: HDFS-9569
> URL: https://issues.apache.org/jira/browse/HDFS-9569
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Trivial
>  Labels: supportability
> Fix For: 2.7.3
>
> Attachments: HDFS-9569.001.patch, HDFS-9569.002.patch, 
> HDFS-9569.003.patch, HDFS-9569.004.patch, HDFS-9569.005.patch
>
>
> When NN starts to load fsimage, it does
> {code}
>  void loadFSImageFile(FSNamesystem target, MetaRecoveryContext recovery,
>   FSImageFile imageFile, StartupOption startupOption) throws IOException {
>   LOG.debug("Planning to load image :\n" + imageFile);
>   ..
> long txId = loader.getLoadedImageTxId();
> LOG.info("Loaded image for txid " + txId + " from " + curFile);
> {code}
> A debug msg is issued at the beginning with the fsimage file name, then at 
> the end an info msg is issued after loading.
> If the fsimage loading fails due to a corrupted fsimage (see HDFS-9406), we 
> don't see the first msg. It'd be helpful to always be able to see from the NN 
> logs which fsimage file it's loading.
> Two improvements:
> 1. Change the above debug to info
> 2. If an exception happens when loading the fsimage, be sure to report the 
> fsimage name being loaded in the error message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8461) Erasure coding: fix priority level of UnderReplicatedBlocks for striped block

2015-12-21 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066839#comment-15066839
 ] 

Zhe Zhang commented on HDFS-8461:
-

I agree, to match multiple EC policies we need to change that assertion.
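
To illustrate the direction, the hard-coded check from the description could be 
parameterized by the policy; a sketch under assumed names, not the committed 
patch:
{code}
// Sketch only: generalize "curReplicas == 6" so it holds for any EC policy.
class StripedPrioritySketch {
  // values mirror the constants in UnderReplicatedBlocks
  static final int QUEUE_HIGHEST_PRIORITY = 0;
  static final int QUEUE_UNDER_REPLICATED = 2;

  // dataBlkNum comes from the block's EC policy (6 for a 6+3 schema)
  int getStripedPriority(int curReplicas, int dataBlkNum) {
    if (curReplicas == dataBlkNum) {
      // losing one more internal block means data loss: highest priority
      return QUEUE_HIGHEST_PRIORITY;
    }
    return QUEUE_UNDER_REPLICATED;
  }
}
{code}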

> Erasure coding: fix priority level of UnderReplicatedBlocks for striped block
> -
>
> Key: HDFS-8461
> URL: https://issues.apache.org/jira/browse/HDFS-8461
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Walter Su
>Assignee: Walter Su
> Fix For: HDFS-7285
>
> Attachments: HDFS-8461-HDFS-7285.001.patch, 
> HDFS-8461-HDFS-7285.002.patch
>
>
> Issue 1: correctly mark corrupted blocks.
> Issue 2: distinguish highest-risk priority from normal-risk priority.
> {code:title=UnderReplicatedBlocks.java}
>   private int getPriority(int curReplicas,
>   ...
> } else if (curReplicas == 1) {
>   //only on replica -risk of loss
>   // highest priority
>   return QUEUE_HIGHEST_PRIORITY;
>   ...
> {code}
> For striped blocks, we should return QUEUE_HIGHEST_PRIORITY when curReplicas 
> == 6 (suppose a 6+3 schema).
> That's important. Because
> {code:title=BlockManager.java}
> DatanodeDescriptor[] chooseSourceDatanodes(BlockInfo block,
>   ...
>  if(priority != UnderReplicatedBlocks.QUEUE_HIGHEST_PRIORITY 
>   && !node.isDecommissionInProgress() 
>   && node.getNumberOfBlocksToBeReplicated() >= maxReplicationStreams)
>   {
> continue; // already reached replication limit
>   }
>   ...
> {code}
> It may return too few source DNs (maybe 5) and fail to recover.
> A busy node should not be skipped if a block has the highest risk/priority. The 
> issue is that the striped block doesn't get that priority.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9034) "StorageTypeStats" Metric should not count failed storage.

2015-12-21 Thread Benoy Antony (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066868#comment-15066868
 ] 

Benoy Antony commented on HDFS-9034:


Thanks for the clarification, Surendra. The patch looks good.
+1.
If there are no other comments, I'll commit this tomorrow.



> "StorageTypeStats" Metric should not count failed storage.
> --
>
> Key: HDFS-9034
> URL: https://issues.apache.org/jira/browse/HDFS-9034
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Archana T
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-9034.01.patch, HDFS-9034.02.patch, 
> HDFS-9034.03.patch, HDFS-9034.04.patch, dfsStorage_NN_UI2.png
>
>
> When we remove one storage type from all the DNs, the NN UI still shows an 
> entry for that storage type --
> Ex: for ARCHIVE
> Steps--
> 1. ARCHIVE Storage type was added for all DNs
> 2. Stop DNs
> 3. Removed ARCHIVE Storages from all DNs
> 4. Restarted DNs
> NN UI shows below --
> DFS Storage Types
> Storage Type Configured Capacity Capacity Used Capacity Remaining 
> ARCHIVE   57.18 GB64 KB (0%)  39.82 GB (69.64%)   64 KB   
> 1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9557) Reduce object allocation in PB conversion

2015-12-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066881#comment-15066881
 ] 

Colin Patrick McCabe commented on HDFS-9557:


Thanks, [~daryn].
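
For readers skimming, the fix pattern in the description below amounts to 
something like this illustrative helper (not the actual patch):
{code}
import com.google.protobuf.ByteString;

// Sketch of the allocation-avoiding pattern the description suggests:
// reuse ByteString.EMPTY for empty arrays instead of copying them.
static ByteString toByteString(byte[] data) {
  return (data == null || data.length == 0)
      ? ByteString.EMPTY            // singleton; no new object per conversion
      : ByteString.copyFrom(data);  // copy only when there is real data
}
{code}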

> Reduce object allocation in PB conversion
> -
>
> Key: HDFS-9557
> URL: https://issues.apache.org/jira/browse/HDFS-9557
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.8.0
>
> Attachments: HDFS-9557.patch, HDFS-9557.patch
>
>
> PB conversions use {{ByteString.copyFrom}} to populate the builder.  
> Unfortunately this creates unique instances for empty arrays instead of 
> returning the singleton {{ByteString.EMPTY}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9584) NPE in distcp when ssl configuration file does not exist in class path.

2015-12-21 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066913#comment-15066913
 ] 

Xiaoyu Yao commented on HDFS-9584:
--

Thanks [~surendrasingh] for working on this. Patch LGTM. 
+1 pending Jenkins.
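
For anyone following along, the fix presumably amounts to a null check with a 
clearer error; a rough guess at its shape, not the attached patch:
{code}
// Illustrative guess only -- not the contents of HDFS-9584.patch.
// sslConfFileName stands in for the -mapredSslConf argument value;
// requires java.net.URL and java.io.IOException.
URL sslConfUrl = DistCp.class.getClassLoader().getResource(sslConfFileName);
if (sslConfUrl == null) {
  throw new IOException("SSL configuration file " + sslConfFileName
      + " was not found in the class path");
}
{code}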

> NPE in distcp when ssl configuration file does not exist in class path.
> ---
>
> Key: HDFS-9584
> URL: https://issues.apache.org/jira/browse/HDFS-9584
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-9584.patch
>
>
> {noformat}./hadoop distcp -mapredSslConf ssl-distcp.xml 
> hftp://x.x.x.x:25003/history hdfs://x.x.x.X:25008/history{noformat}
> If the {{ssl-distcp.xml}} file does not exist in the class path, distcp will 
> throw a NullPointerException.
> {code}
> java.lang.NullPointerException
> at org.apache.hadoop.tools.DistCp.setupSSLConfig(DistCp.java:266)
> at org.apache.hadoop.tools.DistCp.createJob(DistCp.java:250)
> at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:175)
> at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:127)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:431)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7784) load fsimage in parallel

2015-12-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066949#comment-15066949
 ] 

Colin Patrick McCabe commented on HDFS-7784:


Thanks, [~kihwal].  Unfortunately, that's what we've seen as well... protobuf 
seems to generate a lot of garbage during startup, causing many full GCs which 
really consume a lot of time.  It used to be you could ignore temporary objects 
as long as you didn't create tenured objects, but it turns out that if there 
are too many temporaries, HotSpot pushes them into the PermGen.  At this point, 
it's not clear that parallelization is a win for fsimage loading unless we can 
mitigate that GC problem.

Have you guys looked into using the "javanano" version of protocol buffers?  
See here: https://github.com/google/protobuf/tree/master/javanano

It seems like this would generate a lot less garbage than the "official" PB 
library because it avoids builders in favor of mutable state, uses ints instead 
of enums, uses arrays instead of ArrayList, etc. etc.  I think we should 
probably adopt this on the server-side, even if we keep the client-side with 
the existing PB library.  This would help with RPC as well, of course.

> load fsimage in parallel
> 
>
> Key: HDFS-7784
> URL: https://issues.apache.org/jira/browse/HDFS-7784
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Walter Su
>Assignee: Walter Su
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7784.001.patch, test-20150213.pdf
>
>
> When a single NameNode has a huge number of files, without using federation, 
> the startup/restart speed is slow. The fsimage loading step takes most of the 
> time. fsimage loading can be separated into two parts: deserialization and 
> object construction (mostly map insertion). Deserialization takes most of the 
> CPU time, so we can do the deserialization in parallel and add to the hashmap 
> serially. This will significantly reduce the NN start time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9587) Modify tests that initialize MiniDFSNNTopology with hard-coded ports to use ServerSocketUtil#getPorts

2015-12-21 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-9587:
---

 Summary: Modify tests that initialize MiniDFSNNTopology with 
hard-coded ports to use ServerSocketUtil#getPorts
 Key: HDFS-9587
 URL: https://issues.apache.org/jira/browse/HDFS-9587
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Xiao Chen
Assignee: Xiao Chen


In tests with HA we have to {{setHttpPort}} to non-ephemeral ports when 
creating a {{MiniDFSNNTopology}}. But hard-coding ports may result in failures 
if a hard-coded port is already in use in the environment.
We should use {{ServerSocketUtil#getPorts}} to reduce the probability of such 
failures, as sketched below. This jira proposes updating all 
{{MiniDFSNNTopology#setHttpPort}} usages.  (Currently there's only 
{{ServerSocketUtil#getPort}}, but HDFS-9444 is adding the ability to get 
multiple free ports.)
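
A rough sketch of the intended usage; the {{getPorts}} signature is an 
assumption about what HDFS-9444 will add, while the topology calls are the 
existing API:
{code}
import org.apache.hadoop.hdfs.MiniDFSNNTopology;
import org.apache.hadoop.net.ServerSocketUtil;

// Assumes HDFS-9444 adds ServerSocketUtil.getPorts(n) returning n free ports.
int[] ports = ServerSocketUtil.getPorts(2);
MiniDFSNNTopology topology = new MiniDFSNNTopology()
    .addNameservice(new MiniDFSNNTopology.NSConf("ns1")
        .addNN(new MiniDFSNNTopology.NNConf("nn1").setHttpPort(ports[0]))
        .addNN(new MiniDFSNNTopology.NNConf("nn2").setHttpPort(ports[1])));
{code}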



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9580) TestComputeInvalidateWork#testDatanodeReRegistration failed due to unexpected number of invalidate blocks.

2015-12-21 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067010#comment-15067010
 ] 

Zhe Zhang commented on HDFS-9580:
-

Thanks Wei-Chiu, good analysis. +1 on the patch pending Jenkins.
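
For context, the usual way to close this kind of race in a test is to wait for 
full replication before touching the datanodes; a guess at the shape of the 
fix, not the patch itself:
{code}
// Illustrative only -- not the contents of HDFS-9580.001.patch.
// dfs and BLOCK_SIZE are assumed test fixtures already in scope.
Path testFile = new Path("/testRR");
DFSTestUtil.createFile(dfs, testFile, BLOCK_SIZE, (short) 3, 0L);
// block until all 3 replicas are reported before shutting datanodes down
DFSTestUtil.waitReplication(dfs, testFile, (short) 3);
{code}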

> TestComputeInvalidateWork#testDatanodeReRegistration failed due to unexpected 
> number of invalidate blocks.
> --
>
> Key: HDFS-9580
> URL: https://issues.apache.org/jira/browse/HDFS-9580
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode, test
>Affects Versions: 3.0.0
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-9580.001.patch
>
>
> The failure appeared in the trunk jenkins job.
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2646/
> {noformat}
> Error Message
> Expected invalidate blocks to be the number of DNs expected:<3> but was:<2>
> Stacktrace
> java.lang.AssertionError: Expected invalidate blocks to be the number of DNs 
> expected:<3> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160)
> {noformat}
> I think there could be a race condition between creating a file and shutting 
> down data nodes, which caused the test to fail.
> {noformat}
> 2015-12-19 07:11:02,765 [PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=LAST_IN_PIPELINE, downstreams=0:[]] INFO  datanode.DataNode 
> (BlockReceiver.java:run(1404)) - PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2015-12-19 07:11:02,768 [PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE] INFO  DataNode.clienttrace 
> (BlockReceiver.java:finalizeBlock(1431)) - src: /127.0.0.1:45655, dest: 
> /127.0.0.1:54890, bytes: 134217728, op: HDFS_WRITE, cliID: 
> DFSClient_NONMAPREDUCE_147911011_935, offset: 0, srvID: 
> 6a13ec05-e1c1-4086-8a4d-d5a09636afcd, blockid: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, duration: 
> 954174423
> 2015-12-19 07:11:02,768 [PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE] INFO  datanode.DataNode 
> (BlockReceiver.java:run(1404)) - PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2015-12-19 07:11:02,772 [PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE] INFO  DataNode.clienttrace 
> (BlockReceiver.java:finalizeBlock(1431)) - src: /127.0.0.1:33252, dest: 
> /127.0.0.1:54426, bytes: 134217728, op: HDFS_WRITE, cliID: 
> DFSClient_NONMAPREDUCE_147911011_935, offset: 0, srvID: 
> d81751db-02a9-48fe-b697-77623048784b, blockid: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, duration: 
> 957463510
> 2015-12-19 07:11:02,772 [PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE] INFO  datanode.DataNode 
> (BlockReceiver.java:run(1404)) - PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2015-12-19 07:11:02,782 [IPC Server handler 4 on 36404] INFO  
> blockmanagement.BlockManager 
> (BlockManager.java:checkBlocksProperlyReplicated(3871)) - BLOCK* 
> blk_1073741825_1001 is not COMPLETE (ucState = COMMITTED, replication# = 0 <  
> minimum = 1) in file /testRR
> 2015-12-19 07:11:02,783 [IPC Server handler 4 on 36404] INFO  
> namenode.EditLogFileOutputStream 
> (EditLogFileOutputStream.java:flushAndSync(200)) - Nothing to flush
> 2015-12-19 07:11:02,783 [IPC Server handler 4 on 36404] INFO  
> namenode.EditLogFileOutputStream 
> (EditLogFileOutputStream.java:flushAndSync(200)) - Nothing to flush
> 2015-12-19 07:11:03,190 [IPC Server handler 8 on 36404] INFO  
> hdfs.StateChange (FSNamesystem.java:completeFile(2557)) - DIR* completeFile: 
> /testRR is closed by DFSClient_NONMAPREDUCE_147911011_935
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9580) TestComputeInvalidateWork#testDatanodeReRegistration failed due to unexpected number of invalidate blocks.

2015-12-21 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-9580:
--
Status: Patch Available  (was: Open)

> TestComputeInvalidateWork#testDatanodeReRegistration failed due to unexpected 
> number of invalidate blocks.
> --
>
> Key: HDFS-9580
> URL: https://issues.apache.org/jira/browse/HDFS-9580
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode, test
>Affects Versions: 3.0.0
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-9580.001.patch
>
>
> The failure appeared in the trunk jenkins job.
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2646/
> {noformat}
> Error Message
> Expected invalidate blocks to be the number of DNs expected:<3> but was:<2>
> Stacktrace
> java.lang.AssertionError: Expected invalidate blocks to be the number of DNs 
> expected:<3> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160)
> {noformat}
> I think there could be a race condition between creating a file and shutting 
> down data nodes, which caused the test to fail.
> {noformat}
> 2015-12-19 07:11:02,765 [PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=LAST_IN_PIPELINE, downstreams=0:[]] INFO  datanode.DataNode 
> (BlockReceiver.java:run(1404)) - PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2015-12-19 07:11:02,768 [PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE] INFO  DataNode.clienttrace 
> (BlockReceiver.java:finalizeBlock(1431)) - src: /127.0.0.1:45655, dest: 
> /127.0.0.1:54890, bytes: 134217728, op: HDFS_WRITE, cliID: 
> DFSClient_NONMAPREDUCE_147911011_935, offset: 0, srvID: 
> 6a13ec05-e1c1-4086-8a4d-d5a09636afcd, blockid: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, duration: 
> 954174423
> 2015-12-19 07:11:02,768 [PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE] INFO  datanode.DataNode 
> (BlockReceiver.java:run(1404)) - PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2015-12-19 07:11:02,772 [PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE] INFO  DataNode.clienttrace 
> (BlockReceiver.java:finalizeBlock(1431)) - src: /127.0.0.1:33252, dest: 
> /127.0.0.1:54426, bytes: 134217728, op: HDFS_WRITE, cliID: 
> DFSClient_NONMAPREDUCE_147911011_935, offset: 0, srvID: 
> d81751db-02a9-48fe-b697-77623048784b, blockid: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, duration: 
> 957463510
> 2015-12-19 07:11:02,772 [PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE] INFO  datanode.DataNode 
> (BlockReceiver.java:run(1404)) - PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2015-12-19 07:11:02,782 [IPC Server handler 4 on 36404] INFO  
> blockmanagement.BlockManager 
> (BlockManager.java:checkBlocksProperlyReplicated(3871)) - BLOCK* 
> blk_1073741825_1001 is not COMPLETE (ucState = COMMITTED, replication# = 0 <  
> minimum = 1) in file /testRR
> 2015-12-19 07:11:02,783 [IPC Server handler 4 on 36404] INFO  
> namenode.EditLogFileOutputStream 
> (EditLogFileOutputStream.java:flushAndSync(200)) - Nothing to flush
> 2015-12-19 07:11:02,783 [IPC Server handler 4 on 36404] INFO  
> namenode.EditLogFileOutputStream 
> (EditLogFileOutputStream.java:flushAndSync(200)) - Nothing to flush
> 2015-12-19 07:11:03,190 [IPC Server handler 8 on 36404] INFO  
> hdfs.StateChange (FSNamesystem.java:completeFile(2557)) - DIR* completeFile: 
> /testRR is closed by DFSClient_NONMAPREDUCE_147911011_935
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7784) load fsimage in parallel

2015-12-21 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067073#comment-15067073
 ] 

Kihwal Lee commented on HDFS-7784:
--

bq.  protobuf seems to generate a lot of garbage during startup, causing many 
full GCs which really consume a lot of time.
One of the large NNs used to do multiple full GCs during start-up, mainly 
due to initial full block report processing. Ever since the young gen size was 
increased, it has stopped doing so.  We initially feared the minor collection 
time would increase dramatically, but that wasn't the case.  Along with the 
increased YG size, we set {{-XX:ParGCCardsPerStrideChunk=32768}}.

We will look into javanano version. Thanks for the pointer.

> load fsimage in parallel
> 
>
> Key: HDFS-7784
> URL: https://issues.apache.org/jira/browse/HDFS-7784
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Walter Su
>Assignee: Walter Su
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7784.001.patch, test-20150213.pdf
>
>
> When a single NameNode has a huge number of files, without using federation, 
> the startup/restart speed is slow. The fsimage loading step takes most of the 
> time. fsimage loading can be separated into two parts: deserialization and 
> object construction (mostly map insertion). Deserialization takes most of the 
> CPU time, so we can do the deserialization in parallel and add to the hashmap 
> serially. This will significantly reduce the NN start time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9588) DiskBalancer : Add submitDiskbalancer RPC

2015-12-21 Thread Anu Engineer (JIRA)
Anu Engineer created HDFS-9588:
--

 Summary: DiskBalancer : Add submitDiskbalancer RPC
 Key: HDFS-9588
 URL: https://issues.apache.org/jira/browse/HDFS-9588
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer & mover
Affects Versions: HDFS-1312
Reporter: Anu Engineer
Assignee: Anu Engineer
 Fix For: HDFS-1312


Add a data node RPC that allows a client to submit a diskbalancer plan to a 
data node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9458) TestBackupNode always binds to port 50070, which can cause bind failures.

2015-12-21 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067114#comment-15067114
 ] 

Chris Nauroth commented on HDFS-9458:
-

[~xiaochen], thank you for the patch.

I think we still need to keep the line that sets 
{{DFS_NAMENODE_HTTP_ADDRESS_KEY}}.  This test starts both a regular NameNode 
and a BackupNode.  We'll want to avoid binding to default ports for both of 
those.

It's also problematic that the NameNode RPC port is hard-coded to 1234 
({{FS_DEFAULT_NAME_KEY}}).  I recommend that we change that to 0 too.

> TestBackupNode always binds to port 50070, which can cause bind failures.
> -
>
> Key: HDFS-9458
> URL: https://issues.apache.org/jira/browse/HDFS-9458
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Xiao Chen
> Attachments: HDFS-9458.001.patch
>
>
> {{TestBackupNode}} does not override port settings to use a dynamically 
> selected port for the NameNode HTTP server.  It uses the default of 50070 
> defined in hdfs-default.xml.  This should be changed to select a dynamic port 
> to avoid bind errors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9586) listCorruptFileBlocks should not output files that all replications are decommissioning

2015-12-21 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067124#comment-15067124
 ] 

Rushabh S Shah commented on HDFS-9586:
--

FSNameSystem#listCorruptFileBlocks gets the list of corrupt blocks from the 
UnderReplicatedBlocks.QUEUE_WITH_CORRUPT_BLOCKS queue.
According to the code below, a block will be added to the QUEUE_WITH_CORRUPT_BLOCKS 
queue only if there are zero decommissionedReplicas (the name is a little 
confusing, since this is the sum of decommissioning and decommissioned replicas).

{noformat}
if (curReplicas == 0) {
  // If there are zero non-decommissioned replicas but there are
  // some decommissioned replicas, then assign them highest priority
  if (decommissionedReplicas > 0) {
return QUEUE_HIGHEST_PRIORITY;
  }
  if (readOnlyReplicas > 0) {
// only has read-only replicas, highest risk
// since the read-only replicas may go down all together.
return QUEUE_HIGHEST_PRIORITY;
  }
  //all we have are corrupt blocks
  return QUEUE_WITH_CORRUPT_BLOCKS;
{noformat}
So all the blocks that go into QUEUE_WITH_CORRUPT_BLOCKS already have zero 
decommissioning replicas.

Please correct me if my understanding is wrong.


> listCorruptFileBlocks should not output files that all replications are 
> decommissioning
> ---
>
> Key: HDFS-9586
> URL: https://issues.apache.org/jira/browse/HDFS-9586
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 9586-v1.patch
>
>
> As HDFS-7933 said, we should count decommissioning and decommissioned nodes 
> separately and regard decommissioning nodes as special live nodes whose 
> files are not corrupt or missing.
> So in listCorruptFileBlocks, which is used by fsck and the HDFS namenode 
> website, we should collect a corrupt file only if liveReplicas and 
> decommissioning replicas are both 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9589) Block files which have been hardlinked should be duplicated before the DataNode appends to them

2015-12-21 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-9589:
--

 Summary: Block files which have been hardlinked should be 
duplicated before the DataNode appends to them
 Key: HDFS-9589
 URL: https://issues.apache.org/jira/browse/HDFS-9589
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe


Block files which have been hardlinked should be duplicated before the DataNode 
appends to them.  The patch for HDFS-8860 accidentally removed this code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9589) Block files which have been hardlinked should be duplicated before the DataNode appends to them

2015-12-21 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-9589:
---
Status: Patch Available  (was: Open)

> Block files which have been hardlinked should be duplicated before the 
> DataNode appends to them
> ---
>
> Key: HDFS-9589
> URL: https://issues.apache.org/jira/browse/HDFS-9589
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.8.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-9589.001.patch
>
>
> Block files which have been hardlinked should be duplicated before the 
> DataNode appends to them.  The patch for HDFS-8860 accidentally removed 
> this code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9589) Block files which have been hardlinked should be duplicated before the DataNode appends to them

2015-12-21 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-9589:
---
Attachment: HDFS-9589.001.patch

> Block files which have been hardlinked should be duplicated before the 
> DataNode appends to them
> ---
>
> Key: HDFS-9589
> URL: https://issues.apache.org/jira/browse/HDFS-9589
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.8.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-9589.001.patch
>
>
> Block files which have been hardlinked should be duplicated before the 
> DataNode appends to them.  The patch for HDFS-8860 accidentally removed 
> this code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9589) Block files which have been hardlinked should be duplicated before the DataNode appends to them

2015-12-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067158#comment-15067158
 ] 

Colin Patrick McCabe commented on HDFS-9589:


Thanks to [~eddyxu] and [~twu] for pointing out this error.

> Block files which have been hardlinked should be duplicated before the 
> DataNode appends to them
> ---
>
> Key: HDFS-9589
> URL: https://issues.apache.org/jira/browse/HDFS-9589
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.8.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-9589.001.patch
>
>
> Block files which have been hardlinked should be duplicated before the 
> DataNode appends to them.  The patch for HDFS-8860 accidentally removed 
> this code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9458) TestBackupNode always binds to port 50070, which can cause bind failures.

2015-12-21 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-9458:

Attachment: HDFS-9458.002.patch

> TestBackupNode always binds to port 50070, which can cause bind failures.
> -
>
> Key: HDFS-9458
> URL: https://issues.apache.org/jira/browse/HDFS-9458
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Xiao Chen
> Attachments: HDFS-9458.001.patch, HDFS-9458.002.patch
>
>
> {{TestBackupNode}} does not override port settings to use a dynamically 
> selected port for the NameNode HTTP server.  It uses the default of 50070 
> defined in hdfs-default.xml.  This should be changed to select a dynamic port 
> to avoid bind errors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9458) TestBackupNode always binds to port 50070, which can cause bind failures.

2015-12-21 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067176#comment-15067176
 ] 

Xiao Chen commented on HDFS-9458:
-

Thanks [~cnauroth] for the comments! Attached patch 2.
bq. I think we still need to keep the line that sets 
{{DFS_NAMENODE_HTTP_ADDRESS_KEY}}. This test starts both a regular NameNode and 
a BackupNode. We'll want to avoid binding to default ports for both of those.
I think you meant to say we need {{DFS_NAMENODE_BACKUP_HTTP_ADDRESS_KEY}}? This 
was unnecessarily set twice. I kept the line that sets it at line 161 (line # 
in trunk); the patch removes the first one.

bq. It's also problematic that the NameNode RPC port is hard-coded to 1234 
(FS_DEFAULT_NAME_KEY). I recommend that we change that to 0 too.
We can't set it to 0 here because in {{BackupNode#initialize}}, where 
{{handshake}} is invoked, the RPC address is read from the configuration. Thus 
we need to set it to a real port number. I have just learned that we have a 
util to get a usable ephemeral port, so I updated patch 2 to use 
{{ServerSocketUtil#getPort}}.
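
Conceptually, the change looks something like the following (a sketch with 
illustrative values, not the patch itself):
{code}
import org.apache.hadoop.fs.CommonConfigurationKeysPublic;
import org.apache.hadoop.net.ServerSocketUtil;

// Sketch only: pick a usable port up front instead of hard-coding 1234,
// since BackupNode's handshake reads the NN RPC address from the conf.
// The preferred port and retry count are made-up values; conf is the
// test's Configuration instance (assumed in scope).
int rpcPort = ServerSocketUtil.getPort(1234, 100);
conf.set(CommonConfigurationKeysPublic.FS_DEFAULT_NAME_KEY,
    "hdfs://localhost:" + rpcPort);
{code}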

> TestBackupNode always binds to port 50070, which can cause bind failures.
> -
>
> Key: HDFS-9458
> URL: https://issues.apache.org/jira/browse/HDFS-9458
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Xiao Chen
> Attachments: HDFS-9458.001.patch, HDFS-9458.002.patch
>
>
> {{TestBackupNode}} does not override port settings to use a dynamically 
> selected port for the NameNode HTTP server.  It uses the default of 50070 
> defined in hdfs-default.xml.  This should be changed to select a dynamic port 
> to avoid bind errors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9590) NPE in Storage#unlock

2015-12-21 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-9590:
---

 Summary: NPE in Storage#unlock
 Key: HDFS-9590
 URL: https://issues.apache.org/jira/browse/HDFS-9590
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Xiao Chen
Assignee: Xiao Chen


The code looks susceptible to race conditions in multi-threaded runs.
{code}
public void unlock() throws IOException {
  if (this.lock == null)
return;
  this.lock.release();
  lock.channel().close();
  lock = null;
}
{code}
This is called in a handful of places, and I don't see any protection. Shall we 
add some synchronization mechanism? Not sure if I missed any design assumptions 
here.
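
One possible shape for such protection, as a sketch only (the committed fix may 
well differ):
{code}
// Sketch of a synchronized unlock(): makes the null check, release, and
// field reset atomic so concurrent callers cannot race on the lock field.
public synchronized void unlock() throws IOException {
  FileLock fileLock = this.lock;   // java.nio.channels.FileLock
  if (fileLock == null) {
    return;
  }
  this.lock = null;
  fileLock.release();
  fileLock.channel().close();
}
{code}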




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7553) fix the TestDFSUpgradeWithHA due to BindException

2015-12-21 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067195#comment-15067195
 ] 

Chris Nauroth commented on HDFS-7553:
-

[~xiaochen], thank you for picking this up.

Seeing the change now, I'm hesitant to commit the {{join}} call in 
{{NameNode}}.  As you said, it could change shutdown behavior and introduce a 
slowdown or a hang.  The need to {{join}} is an artificial requirement of the 
tests.  In real production usage, this would rely on process shutdown at the OS 
level.

Instead, maybe we could approach this as more of a test-only change.  We could 
add a {{NameNode#getHttpServer}} method, annotated {{VisibleForTesting}}.  
Then, {{MiniDFSCluster#shutdown}} and {{MiniDFSCluster#shutdownNameNode}} could 
be changed to call that method after the current NameNode shutdown logic, and 
then call {{NameNodeHttpServer#join}}.  This would help with shutdown for 
tests, but it would leave the main shutdown sequence unaltered.

Let me know your thoughts.
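
In code, the suggestion could look roughly like this (method and field names 
are assumptions, not a committed change):
{code}
// In NameNode (sketch):
@VisibleForTesting
NameNodeHttpServer getHttpServer() {
  return httpServer;
}

// In MiniDFSCluster#shutdown / #shutdownNameNode, after the existing stop
// logic, so only tests pay the cost of waiting for the HTTP threads:
NameNodeHttpServer http = nn.getHttpServer();
if (http != null) {
  http.join();
}
{code}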

> fix the TestDFSUpgradeWithHA due to BindException
> -
>
> Key: HDFS-7553
> URL: https://issues.apache.org/jira/browse/HDFS-7553
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Liang Xie
>Assignee: Liang Xie
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7553-001.txt, HDFS-7553.002.patch, 
> HDFS-7553.repro.patch
>
>
> see 
> https://builds.apache.org/job/PreCommit-HDFS-Build/9092//testReport/org.apache.hadoop.hdfs.server.namenode.ha/TestDFSUpgradeWithHA/testNfsUpgrade/
>  :
> Error Message
> Port in use: localhost:57896
> Stacktrace
> java.net.BindException: Port in use: localhost:57896
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:444)
>   at sun.nio.ch.Net.bind(Net.java:436)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
>   at 
> org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:868)
>   at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:809)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:142)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:591)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:763)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:747)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1443)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1815)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1796)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA.testNfsUpgrade(TestDFSUpgradeWithHA.java:285)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9590) NPE in Storage#unlock

2015-12-21 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067196#comment-15067196
 ] 

Xiao Chen commented on HDFS-9590:
-

This was observed in the following unit test failure of 
TestQJMWithFaults#testRecoverAfterDoubleFailures:
Error Message
{noformat}
Unable to shut down. Check log for details
{noformat}
Stacktrace
{noformat}
java.io.IOException: Unable to shut down. Check log for details
at 
org.apache.hadoop.hdfs.qjournal.MiniJournalCluster.shutdown(MiniJournalCluster.java:161)
at 
org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults.testRecoverAfterDoubleFailures(TestQJMWithFaults.java:181)
at 
org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults.testRecoverAfterDoubleFailures(TestQJMWithFaults.java:138)
{noformat}
The standard output is pretty long, but the log entry that the error message 
points us to is:
{noformat}
2015-12-20 18:51:46,825 WARN  qjournal.MiniJournalCluster 
(MiniJournalCluster.java:shutdown(157)) - Unable to stop journal node 
org.apache.hadoop.hdfs.qjournal.server.JournalNode@fcb345b
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.unlock(Storage.java:747)
at 
org.apache.hadoop.hdfs.server.common.Storage.unlockAll(Storage.java:1125)
at 
org.apache.hadoop.hdfs.qjournal.server.JNStorage.close(JNStorage.java:249)
at 
org.apache.hadoop.hdfs.qjournal.server.Journal.close(Journal.java:227)
at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
at 
org.apache.hadoop.hdfs.qjournal.server.JournalNode.stop(JournalNode.java:207)
at 
org.apache.hadoop.hdfs.qjournal.server.JournalNode.stopAndJoin(JournalNode.java:232)
at 
org.apache.hadoop.hdfs.qjournal.MiniJournalCluster.shutdown(MiniJournalCluster.java:154)
at 
org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults.testRecoverAfterDoubleFailures(TestQJMWithFaults.java:181)
at 
org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults.testRecoverAfterDoubleFailures(TestQJMWithFaults.java:138)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
2015-12-20 18:51:46,825 INFO  ipc.Server (Server.java:stop(2485)) - Stopping 
server on 36031
{noformat}

Storage.java:747 in this version is {{this.lock.release();}}.

> NPE in Storage#unlock
> -
>
> Key: HDFS-9590
> URL: https://issues.apache.org/jira/browse/HDFS-9590
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>
> The code looks susceptible to race conditions in multi-threaded runs.
> {code}
> public void unlock() throws IOException {
>   if (this.lock == null)
> return;
>   this.lock.release();
>   lock.channel().close();
>   lock = null;
> }
> {c

[jira] [Commented] (HDFS-9580) TestComputeInvalidateWork#testDatanodeReRegistration failed due to unexpected number of invalidate blocks.

2015-12-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067202#comment-15067202
 ] 

Hadoop QA commented on HDFS-9580:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
46s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
54s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 47s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 49s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 13s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 35s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
32s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 161m 6s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.TestRollingUpgrade |
|   | hadoop.hdfs.TestReplication |
| JDK v1.7.0_91 Failed junit tests | hadoop.hdfs.TestSetTimes |
|   | hadoop.hdfs.server.namenode.TestBackupNode |
|   | hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits |
|   | hadoop.hdfs.server.namenode.TestFileTruncate |
|   | hadoop.hdfs.server.namenode.TestCacheDirectives |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12778739/HDFS-9580.001.patch |
| JIRA Issue | HDFS-9580 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  

[jira] [Commented] (HDFS-9458) TestBackupNode always binds to port 50070, which can cause bind failures.

2015-12-21 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067203#comment-15067203
 ] 

Chris Nauroth commented on HDFS-9458:
-

[~xiaochen], thank you for clarifying.  Patch v002 looks good to me.  +1, 
pending another Jenkins run.

> TestBackupNode always binds to port 50070, which can cause bind failures.
> -
>
> Key: HDFS-9458
> URL: https://issues.apache.org/jira/browse/HDFS-9458
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Xiao Chen
> Attachments: HDFS-9458.001.patch, HDFS-9458.002.patch
>
>
> {{TestBackupNode}} does not override port settings to use a dynamically 
> selected port for the NameNode HTTP server.  It uses the default of 50070 
> defined in hdfs-default.xml.  This should be changed to select a dynamic port 
> to avoid bind errors.
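
A minimal sketch of the dynamic-port idea (a hypothetical test helper, not the 
attached patches; the config key is the real 
{{DFSConfigKeys.DFS_NAMENODE_HTTP_ADDRESS_KEY}}): binding the HTTP server to 
port 0 lets the OS pick a free port, so concurrent test runs cannot collide on 
50070.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;

class DynamicHttpPortExample {
  // Hypothetical test helper: port 0 asks the OS for any free port.
  static Configuration withDynamicHttpPort() {
    Configuration conf = new Configuration();
    conf.set(DFSConfigKeys.DFS_NAMENODE_HTTP_ADDRESS_KEY, "127.0.0.1:0");
    return conf;
  }
}
{code}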



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )

2015-12-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067213#comment-15067213
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9494:
---

Sure, we may keep the log message.

> Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
> 
>
> Key: HDFS-9494
> URL: https://issues.apache.org/jira/browse/HDFS-9494
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: GAO Rui
>Assignee: GAO Rui
>Priority: Minor
> Attachments: HDFS-9494-origin-trunk.00.patch, 
> HDFS-9494-origin-trunk.01.patch, HDFS-9494-origin-trunk.02.patch, 
> HDFS-9494-origin-trunk.03.patch
>
>
> Currently, in DFSStripedOutputStream#flushAllInternals( ), we trigger and 
> wait for flushInternal( ) in sequence. So the runtime flow is like:
> {code}
> Streamer0#flushInternal( )
> Streamer0#waitForAckedSeqno( )
> Streamer1#flushInternal( )
> Streamer1#waitForAckedSeqno( )
> …
> Streamer8#flushInternal( )
> Streamer8#waitForAckedSeqno( )
> {code}
> It could be better to trigger all the streamers to flushInternal( ) and
> wait for all of them to return from waitForAckedSeqno( ),  and then 
> flushAllInternals( ) returns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )

2015-12-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067217#comment-15067217
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9494:
---

Actually, checkStreamers is not thread safe, so the current patch would have a 
race condition.

> Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
> 
>
> Key: HDFS-9494
> URL: https://issues.apache.org/jira/browse/HDFS-9494
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: GAO Rui
>Assignee: GAO Rui
>Priority: Minor
> Attachments: HDFS-9494-origin-trunk.00.patch, 
> HDFS-9494-origin-trunk.01.patch, HDFS-9494-origin-trunk.02.patch, 
> HDFS-9494-origin-trunk.03.patch
>
>
> Currently, in DFSStripedOutputStream#flushAllInternals( ), we trigger and 
> wait for flushInternal( ) in sequence. So the runtime flow is like:
> {code}
> Streamer0#flushInternal( )
> Streamer0#waitForAckedSeqno( )
> Streamer1#flushInternal( )
> Streamer1#waitForAckedSeqno( )
> …
> Streamer8#flushInternal( )
> Streamer8#waitForAckedSeqno( )
> {code}
> It could be better to trigger all the streamers to flushInternal( ) and
> wait for all of them to return from waitForAckedSeqno( ),  and then 
> flushAllInternals( ) returns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )

2015-12-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067217#comment-15067217
 ] 

Tsz Wo Nicholas Sze edited comment on HDFS-9494 at 12/21/15 11:05 PM:
--

Actually, checkStreamers is not thread safe, so the current patch would 
have a race condition.


was (Author: szetszwo):
Actually, checkStreamers is not thread safe so that the current would have a 
race condition.

> Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
> 
>
> Key: HDFS-9494
> URL: https://issues.apache.org/jira/browse/HDFS-9494
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: GAO Rui
>Assignee: GAO Rui
>Priority: Minor
> Attachments: HDFS-9494-origin-trunk.00.patch, 
> HDFS-9494-origin-trunk.01.patch, HDFS-9494-origin-trunk.02.patch, 
> HDFS-9494-origin-trunk.03.patch
>
>
> Currently, in DFSStripedOutputStream#flushAllInternals( ), we trigger and 
> wait for flushInternal( ) in sequence. So the runtime flow is like:
> {code}
> Streamer0#flushInternal( )
> Streamer0#waitForAckedSeqno( )
> Streamer1#flushInternal( )
> Streamer1#waitForAckedSeqno( )
> …
> Streamer8#flushInternal( )
> Streamer8#waitForAckedSeqno( )
> {code}
> It could be better to trigger all the streamers to flushInternal( ) and
> wait for all of them to return from waitForAckedSeqno( ),  and then 
> flushAllInternals( ) returns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7553) fix the TestDFSUpgradeWithHA due to BindException

2015-12-21 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067230#comment-15067230
 ] 

Xiao Chen commented on HDFS-7553:
-

Hi [~cnauroth],

Thanks for reviewing! I agree that joining the httpserver is only required in 
the tests, and that changing it elsewhere may have too large an impact.
I will go with the advice of adding the {{join}} as a test-only fix. A patch is 
coming momentarily.

> fix the TestDFSUpgradeWithHA due to BindException
> -
>
> Key: HDFS-7553
> URL: https://issues.apache.org/jira/browse/HDFS-7553
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Liang Xie
>Assignee: Liang Xie
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7553-001.txt, HDFS-7553.002.patch, 
> HDFS-7553.repro.patch
>
>
> see 
> https://builds.apache.org/job/PreCommit-HDFS-Build/9092//testReport/org.apache.hadoop.hdfs.server.namenode.ha/TestDFSUpgradeWithHA/testNfsUpgrade/
>  :
> Error Message
> Port in use: localhost:57896
> Stacktrace
> java.net.BindException: Port in use: localhost:57896
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:444)
>   at sun.nio.ch.Net.bind(Net.java:436)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
>   at 
> org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:868)
>   at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:809)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:142)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:591)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:763)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:747)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1443)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1815)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1796)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA.testNfsUpgrade(TestDFSUpgradeWithHA.java:285)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9590) NPE in Storage#unlock

2015-12-21 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067239#comment-15067239
 ] 

Mingliang Liu commented on HDFS-9590:
-

# Do you mean {{Storage$StorageDirectory#unlock()}}? There is no 
{{Storage#unlock}} method.
# Would you kindly elaborate on what kind of "synchronization mechanism" you 
expect to add? I don't know all the design assumptions here either, but this 
code itself is just a wrapper around a lock.

> NPE in Storage#unlock
> -
>
> Key: HDFS-9590
> URL: https://issues.apache.org/jira/browse/HDFS-9590
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>
> The code looks susceptible to race conditions in multi-threaded runs.
> {code}
> public void unlock() throws IOException {
>   if (this.lock == null)
> return;
>   this.lock.release();
>   lock.channel().close();
>   lock = null;
> }
> {code}
> This is called in a handful of places, and I don't see any protection. Shall 
> we add some synchronization mechanism? Not sure if I missed any design 
> assumptions here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9591) FSImage.loadEdits threw NullPointerException

2015-12-21 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9591:
-

 Summary: FSImage.loadEdits threw NullPointerException
 Key: HDFS-9591
 URL: https://issues.apache.org/jira/browse/HDFS-9591
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: fs, ha, namenode
Affects Versions: 3.0.0
 Environment: Jenkins
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang


https://builds.apache.org/job/PreCommit-HDFS-Build/13963/testReport/org.apache.hadoop.hdfs.server.namenode.ha/TestFailureToReadEdits/testCheckpointStartingMidEditsFile_0_/

{noformat}
Error Message

Expected non-empty 
/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/3/dfs/name-0-3/current/fsimage_005

Stacktrace

java.lang.AssertionError: Expected non-empty 
/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/3/dfs/name-0-3/current/fsimage_005
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.hadoop.hdfs.server.namenode.FSImageTestUtil.assertNNHasCheckpoints(FSImageTestUtil.java:470)
at 
org.apache.hadoop.hdfs.server.namenode.ha.HATestUtil.waitForCheckpoint(HATestUtil.java:235)
at 
org.apache.hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits.testCheckpointStartingMidEditsFile(TestFailureToReadEdits.java:240)

{noformat}

{noformat}
Exception in thread "Edit log tailer" 
org.apache.hadoop.util.ExitUtil$ExitException: java.lang.NullPointerException
at com.google.common.base.Joiner.join(Joiner.java:226)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:818)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:812)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:257)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:371)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:324)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:341)
at 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:444)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:337)

at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:385)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:324)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:341)
at 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:444)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:337)

{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9589) Block files which have been hardlinked should be duplicated before the DataNode appends to them

2015-12-21 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067248#comment-15067248
 ] 

Lei (Eddy) Xu commented on HDFS-9589:
-

+1. The changes mostly bring back what was deleted by HDFS-8860, with 
appropriate names. 

Thanks a lot for helping with this patch. 

> Block files which have been hardlinked should be duplicated before the 
> DataNode appends to them
> ---
>
> Key: HDFS-9589
> URL: https://issues.apache.org/jira/browse/HDFS-9589
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.8.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-9589.001.patch
>
>
> Block files which have been hardlinked should be duplicated before the 
> DataNode appends to them.  The patch for HDFS-8860 accidentally removed 
> this code.
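
As a hedged illustration of the duplicate-before-append idea (a POSIX-only 
sketch with hypothetical names, not the restored HDFS-8860 code): if the block 
file has more than one link, copy it and atomically swap the copy in, so the 
append mutates a private inode rather than the one shared with the other links.

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class UnlinkBeforeAppendSketch {
  // Hypothetical: break a hardlink by copy-and-rename so an append no longer
  // writes through the shared inode. POSIX-only ("unix:nlink" is unsupported
  // on non-Unix file systems).
  static void breakHardlink(Path block) throws IOException {
    int nlink = ((Number) Files.getAttribute(block, "unix:nlink")).intValue();
    if (nlink <= 1) {
      return;  // not hardlinked; appending in place is safe
    }
    Path tmp = block.resolveSibling(block.getFileName() + ".tmp");
    Files.copy(block, tmp, StandardCopyOption.REPLACE_EXISTING,
        StandardCopyOption.COPY_ATTRIBUTES);
    Files.move(tmp, block, StandardCopyOption.REPLACE_EXISTING,
        StandardCopyOption.ATOMIC_MOVE);
  }
}
{code}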



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7553) fix the TestDFSUpgradeWithHA due to BindException

2015-12-21 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-7553:

Attachment: HDFS-7553.03.patch

> fix the TestDFSUpgradeWithHA due to BindException
> -
>
> Key: HDFS-7553
> URL: https://issues.apache.org/jira/browse/HDFS-7553
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Liang Xie
>Assignee: Liang Xie
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7553-001.txt, HDFS-7553.002.patch, 
> HDFS-7553.03.patch, HDFS-7553.repro.patch
>
>
> see 
> https://builds.apache.org/job/PreCommit-HDFS-Build/9092//testReport/org.apache.hadoop.hdfs.server.namenode.ha/TestDFSUpgradeWithHA/testNfsUpgrade/
>  :
> Error Message
> Port in use: localhost:57896
> Stacktrace
> java.net.BindException: Port in use: localhost:57896
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:444)
>   at sun.nio.ch.Net.bind(Net.java:436)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
>   at 
> org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:868)
>   at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:809)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:142)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:591)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:763)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:747)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1443)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1815)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1796)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA.testNfsUpgrade(TestDFSUpgradeWithHA.java:285)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7553) fix the TestDFSUpgradeWithHA due to BindException

2015-12-21 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067251#comment-15067251
 ] 

Xiao Chen commented on HDFS-7553:
-

Attached patch 3 to reflect the above discussion. I verified again with my ugly 
repro patch: the test fails without the join in 
{{MiniDFSCluster#shutdownNameNode}}, and passes with it.

> fix the TestDFSUpgradeWithHA due to BindException
> -
>
> Key: HDFS-7553
> URL: https://issues.apache.org/jira/browse/HDFS-7553
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Liang Xie
>Assignee: Liang Xie
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7553-001.txt, HDFS-7553.002.patch, 
> HDFS-7553.03.patch, HDFS-7553.repro.patch
>
>
> see 
> https://builds.apache.org/job/PreCommit-HDFS-Build/9092//testReport/org.apache.hadoop.hdfs.server.namenode.ha/TestDFSUpgradeWithHA/testNfsUpgrade/
>  :
> Error Message
> Port in use: localhost:57896
> Stacktrace
> java.net.BindException: Port in use: localhost:57896
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:444)
>   at sun.nio.ch.Net.bind(Net.java:436)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
>   at 
> org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:868)
>   at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:809)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:142)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:591)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:763)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:747)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1443)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1815)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1796)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA.testNfsUpgrade(TestDFSUpgradeWithHA.java:285)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9590) NPE in Storage$StorageDirectory#unlock()

2015-12-21 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-9590:

Summary: NPE in Storage$StorageDirectory#unlock()  (was: NPE in 
Storage#unlock)

> NPE in Storage$StorageDirectory#unlock()
> 
>
> Key: HDFS-9590
> URL: https://issues.apache.org/jira/browse/HDFS-9590
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>
> The code looks susceptible to race conditions in multi-threaded runs.
> {code}
> public void unlock() throws IOException {
>   if (this.lock == null)
> return;
>   this.lock.release();
>   lock.channel().close();
>   lock = null;
> }
> {code}
> This is called in a handful of places, and I don't see any protection. Shall 
> we add some synchronization mechanism? Not sure if I missed any design 
> assumptions here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9590) NPE in Storage#unlock

2015-12-21 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067266#comment-15067266
 ] 

Xiao Chen commented on HDFS-9590:
-

Thanks [~liuml07] for the comments. 
bq. Do you mean {{Storage$StorageDirectory#unlock()}}? There is no 
Storage#unlock method.
Yes, sorry I was unclear. Updated the title.
bq. Would you kindly elaborate what kind of "synchronization mechanism" do you 
expect to add? I don't know all the design assumption here either, but this 
code itself is to wrap a lock.
My concern is that if two threads call unlock() and both pass the null 
check at the beginning, then once one proceeds to set {{lock = null}}, the 
other will throw the NPE in the stack trace above. I'm thinking of adding 
synchronized to the method, or perhaps more sophisticated protection outside. 
This NPE seems fundamental, so I'm looking around to see what background 
story there is...
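
For concreteness, a minimal sketch of the synchronized-method idea 
(hypothetical, not from any attached patch):

{code}
// Hypothetical sketch: with the method synchronized, two racing callers can no
// longer both pass the null check, so the late caller returns instead of
// hitting an NPE on the already-nulled lock field.
public synchronized void unlock() throws IOException {
  if (this.lock == null) {
    return;
  }
  this.lock.release();
  lock.channel().close();
  lock = null;
}
{code}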


> NPE in Storage#unlock
> -
>
> Key: HDFS-9590
> URL: https://issues.apache.org/jira/browse/HDFS-9590
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>
> The code looks susceptible to race conditions in multi-threaded runs.
> {code}
> public void unlock() throws IOException {
>   if (this.lock == null)
> return;
>   this.lock.release();
>   lock.channel().close();
>   lock = null;
> }
> {code}
> This is called in a handful of places, and I don't see any protection. Shall 
> we add some synchronization mechanism? Not sure if I missed any design 
> assumptions here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8562) HDFS Performance is impacted by FileInputStream Finalizer

2015-12-21 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067282#comment-15067282
 ] 

Kai Zheng commented on HDFS-8562:
-

Thanks Colin for the comments and further thoughts. They sound great to me. I 
will update the patch. 
bq. We should probably also make a feature request on the Oracle Java 
bugtracker to add a getFileDescriptor method to FileChannel
[~ywang261], would you comment on this? AFAIK, there is no way to file an issue 
against Java in the Oracle Java/OpenJDK bug tracker unless the reporter is a 
member. Based on my past experience, I guess we would need to go through 
someone from Oracle.
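
A minimal illustration of the finalizer-avoidance direction discussed here: 
{{FileChannel.open}} is a real NIO API that, unlike {{new 
FileInputStream(...)}}, registers no finalizer, so channels opened this way add 
no FinalReference work for the GC. The wrapper method below is hypothetical, 
not the patch under review.

{code}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class FinalizerFreeOpen {
  // Channels from FileChannel.open carry no finalize() method, so opening and
  // closing millions of them creates no FinalReference backlog for the GC.
  static FileChannel openForRead(String path) throws IOException {
    return FileChannel.open(Paths.get(path), StandardOpenOption.READ);
  }
}
{code}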


> HDFS Performance is impacted by FileInputStream Finalizer
> -
>
> Key: HDFS-8562
> URL: https://issues.apache.org/jira/browse/HDFS-8562
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.5.0
> Environment: Impact any application that uses HDFS
>Reporter: Yanping Wang
> Attachments: HDFS-8562.002b.patch, HDFS-8562.003a.patch, 
> HDFS-8562.01.patch
>
>
> While running HBase using HDFS as datanodes, we noticed excessive high GC 
> pause spikes. For example with jdk8 update 40 and G1 collector, we saw 
> datanode GC pauses spiked toward 160 milliseconds while they should be around 
> 20 milliseconds. 
> We tracked down to GC logs and found those long GC pauses were devoted to 
> process high number of final references. 
> For example, this Young GC:
> 2715.501: [GC pause (G1 Evacuation Pause) (young) 0.1529017 secs]
> 2715.572: [SoftReference, 0 refs, 0.0001034 secs]
> 2715.572: [WeakReference, 0 refs, 0.123 secs]
> 2715.572: [FinalReference, 8292 refs, 0.0748194 secs]
> 2715.647: [PhantomReference, 0 refs, 160 refs, 0.0001333 secs]
> 2715.647: [JNI Weak Reference, 0.140 secs]
> [Ref Proc: 122.3 ms]
> [Eden: 910.0M(910.0M)->0.0B(911.0M) Survivors: 11.0M->10.0M Heap: 
> 951.1M(1536.0M)->40.2M(1536.0M)]
> [Times: user=0.47 sys=0.01, real=0.15 secs]
> This young GC took 152.9 milliseconds STW pause, while spent 122.3 
> milliseconds in Ref Proc, which processed 8292 FinalReference in 74.8 
> milliseconds plus some overhead.
> We used JFR and JMAP with Memory Analyzer to track down and found those 
> FinalReference were all from FileInputStream.  We checked HDFS code and saw 
> the use of the FileInputStream in datanode:
> https://apache.googlesource.com/hadoop-common/+/refs/heads/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java
> {code}
> 1.public static MappableBlock load(long length,
> 2.FileInputStream blockIn, FileInputStream metaIn,
> 3.String blockFileName) throws IOException {
> 4.MappableBlock mappableBlock = null;
> 5.MappedByteBuffer mmap = null;
> 6.FileChannel blockChannel = null;
> 7.try {
> 8.blockChannel = blockIn.getChannel();
> 9.if (blockChannel == null) {
> 10.   throw new IOException("Block InputStream has no FileChannel.");
> 11.   }
> 12.   mmap = blockChannel.map(MapMode.READ_ONLY, 0, length);
> 13.   NativeIO.POSIX.getCacheManipulator().mlock(blockFileName, mmap, length);
> 14.   verifyChecksum(length, metaIn, blockChannel, blockFileName);
> 15.   mappableBlock = new MappableBlock(mmap, length);
> 16.   } finally {
> 17.   IOUtils.closeQuietly(blockChannel);
> 18.   if (mappableBlock == null) {
> 19.   if (mmap != null) {
> 20.   NativeIO.POSIX.munmap(mmap); // unmapping also unlocks
> 21.   }
> 22.   }
> 23.   }
> 24.   return mappableBlock;
> 25.   }
> {code}
> We looked up 
> https://docs.oracle.com/javase/7/docs/api/java/io/FileInputStream.html  and
> http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/23bdcede4e39/src/share/classes/java/io/FileInputStream.java
>  and noticed FileInputStream relies on the Finalizer to release its resource. 
> When a class that has a finalizer created, an entry for that class instance 
> is put on a queue in the JVM so the JVM knows it has a finalizer that needs 
> to be executed.   
> The current issue is: even with programmers do call close() after using 
> FileInputStream, its finalize() method will still be called. In other words, 
> still get the side effect of the FinalReference being registered at 
> FileInputStream allocation time, and also reference processing to reclaim the 
> FinalReference during GC (any GC solution has to deal with this). 
> We can imagine When running industry deployment HDFS, millions of files could 
> be opened and closed which resulted in a very large number of finalizers 
> being registered and subsequently being executed.  That could cause very long 
> GC pause times.
> We tried to use Files.newInputStream() to replace FileInputStream, but it was 
> clear we could not replace

[jira] [Commented] (HDFS-9580) TestComputeInvalidateWork#testDatanodeReRegistration failed due to unexpected number of invalidate blocks.

2015-12-21 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067281#comment-15067281
 ] 

Wei-Chiu Chuang commented on HDFS-9580:
---

Test failures look unrelated.

> TestComputeInvalidateWork#testDatanodeReRegistration failed due to unexpected 
> number of invalidate blocks.
> --
>
> Key: HDFS-9580
> URL: https://issues.apache.org/jira/browse/HDFS-9580
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode, test
>Affects Versions: 3.0.0
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-9580.001.patch
>
>
> The failure appeared in the trunk jenkins job.
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2646/
> {noformat}
> Error Message
> Expected invalidate blocks to be the number of DNs expected:<3> but was:<2>
> Stacktrace
> java.lang.AssertionError: Expected invalidate blocks to be the number of DNs 
> expected:<3> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160)
> {noformat}
> I think there could be a race condition between creating a file and shutting 
> down data nodes, which failed the test.
> {noformat}
> 2015-12-19 07:11:02,765 [PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=LAST_IN_PIPELINE, downstreams=0:[]] INFO  datanode.DataNode 
> (BlockReceiver.java:run(1404)) - PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2015-12-19 07:11:02,768 [PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE] INFO  DataNode.clienttrace 
> (BlockReceiver.java:finalizeBlock(1431)) - src: /127.0.0.1:45655, dest: 
> /127.0.0.1:54890, bytes: 134217728, op: HDFS_WRITE, cliID: 
> DFSClient_NONMAPREDUCE_147911011_935, offset: 0, srvID: 
> 6a13ec05-e1c1-4086-8a4d-d5a09636afcd, blockid: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, duration: 
> 954174423
> 2015-12-19 07:11:02,768 [PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE] INFO  datanode.DataNode 
> (BlockReceiver.java:run(1404)) - PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2015-12-19 07:11:02,772 [PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE] INFO  DataNode.clienttrace 
> (BlockReceiver.java:finalizeBlock(1431)) - src: /127.0.0.1:33252, dest: 
> /127.0.0.1:54426, bytes: 134217728, op: HDFS_WRITE, cliID: 
> DFSClient_NONMAPREDUCE_147911011_935, offset: 0, srvID: 
> d81751db-02a9-48fe-b697-77623048784b, blockid: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, duration: 
> 957463510
> 2015-12-19 07:11:02,772 [PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE] INFO  datanode.DataNode 
> (BlockReceiver.java:run(1404)) - PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2015-12-19 07:11:02,782 [IPC Server handler 4 on 36404] INFO  
> blockmanagement.BlockManager 
> (BlockManager.java:checkBlocksProperlyReplicated(3871)) - BLOCK* 
> blk_1073741825_1001 is not COMPLETE (ucState = COMMITTED, replication# = 0 <  
> minimum = 1) in file /testRR
> 2015-12-19 07:11:02,783 [IPC Server handler 4 on 36404] INFO  
> namenode.EditLogFileOutputStream 
> (EditLogFileOutputStream.java:flushAndSync(200)) - Nothing to flush
> 2015-12-19 07:11:02,783 [IPC Server handler 4 on 36404] INFO  
> namenode.EditLogFileOutputStream 
> (EditLogFileOutputStream.java:flushAndSync(200)) - Nothing to flush
> 2015-12-19 07:11:03,190 [IPC Server handler 8 on 36404] INFO  
> hdfs.StateChange (FSNamesystem.java:completeFile(2557)) - DIR* completeFile: 
> /testRR is closed by DFSClient_NONMAPREDUCE_147911011_935
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9590) NPE in Storage$StorageDirectory#unlock()

2015-12-21 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067286#comment-15067286
 ] 

Mingliang Liu commented on HDFS-9590:
-

{quote}
2 threads are calling unlock()
{quote}
Oh, I thought it was an exclusive lock, and thus lock() happens before 
unlock(). Let me have a look at the code again. Thanks.

> NPE in Storage$StorageDirectory#unlock()
> 
>
> Key: HDFS-9590
> URL: https://issues.apache.org/jira/browse/HDFS-9590
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>
> The code looks susceptible to race conditions in multi-threaded runs.
> {code}
> public void unlock() throws IOException {
>   if (this.lock == null)
> return;
>   this.lock.release();
>   lock.channel().close();
>   lock = null;
> }
> {code}
> This is called in a handful of places, and I don't see any protection. Shall 
> we add some synchronization mechanism? Not sure if I missed any design 
> assumptions here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.

2015-12-21 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067293#comment-15067293
 ] 

Jitendra Nath Pandey commented on HDFS-8999:


 The clients do know the data pipeline and so does the NN. Therefore, at the 
instant the pipeline closes, the client and the NN know the location of the 
blocks. The blocks can move anywhere afterwards, but that is no worse than a DN 
sending an IBR and losing the block immediately afterwards.

 Every system has certain scalability limits, but if a design choice can push 
the limit a bit higher, it's not a bad idea to explore. The problems will come 
back with more writers and more DNs, but at least at a higher level of scale.

 Agreed, we need a solution to the race condition; the direction that 
[~jingzhao] mentioned in his last comment seems promising.
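
To make the counting idea concrete, a hypothetical sketch (the names are 
illustrative, not HDFS internals): at close time, the pipeline DNs recorded 
during construction could count as expected locations alongside DNs that have 
already sent block_received.

{code}
import java.util.HashSet;
import java.util.Set;

class ReplicaCountSketch {
  // Hypothetical: union of acked (block_received) replicas and the replicas
  // recorded in the final pipeline; a DN appearing in either is counted once.
  static int countKnownReplicas(Set<String> receivedFrom, Set<String> pipeline) {
    Set<String> all = new HashSet<>(receivedFrom);
    all.addAll(pipeline);
    return all.size();
  }
}
{code}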

> Namenode need not wait for {{blockReceived}} for the last block before 
> completing a file.
> -
>
> Key: HDFS-8999
> URL: https://issues.apache.org/jira/browse/HDFS-8999
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Jitendra Nath Pandey
>Assignee: Tsz Wo Nicholas Sze
>
> This comes out of a discussion in HDFS-8763. Pasting [~jingzhao]'s comment 
> from the jira:
> {quote}
> ...whether we need to let NameNode wait for all the block_received msgs to 
> announce the replica is safe. Looking into the code, now we have
># NameNode knows the DataNodes involved when initially setting up the 
> writing pipeline
># If any DataNode fails during the writing, client bumps the GS and 
> finally reports all the DataNodes included in the new pipeline to NameNode 
> through the updatePipeline RPC.
># When the client received the ack for the last packet of the block (and 
> before the client tries to close the file on NameNode), the replica has been 
> finalized in all the DataNodes.
> Then in this case, when NameNode receives the close request from the client, 
> the NameNode already knows the latest replicas for the block. Currently the 
> checkReplication call only counts in all the replicas that NN has already 
> received the block_received msg, but based on the above #2 and #3, it may be 
> safe to also count in all the replicas in the 
> BlockUnderConstructionFeature#replicas?
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9574) Reduce client failures during datanode restart

2015-12-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067299#comment-15067299
 ] 

Hadoop QA commented on HDFS-9574:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
57s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 49s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s 
{color} | {color:red} Patch generated 3 new checkstyle issues in 
hadoop-hdfs-project/hadoop-hdfs (total was 613, now 613). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 1s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 30s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 14s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
32s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 186m 29s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.hdfs.server.datanode.TestBlockScanner |
|   | hadoop.hdfs.TestDatanodeRegistration |
|   | hadoop.hdfs.server.namenode.TestBackupNode |
|   | hadoop.hdfs.shortcircuit.TestShortCircuitCache |
|   | hadoop.hdfs.server.datanode.TestBlockReplacement |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.hdfs.TestDFSStripedOutputStreamWithFailure060 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailur

[jira] [Updated] (HDFS-9094) Add command line option to ask NameNode reload configuration.

2015-12-21 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-9094:

Attachment: HDFS-9094-HDFS-9000.002.patch

> Add command line option to ask NameNode reload configuration.
> -
>
> Key: HDFS-9094
> URL: https://issues.apache.org/jira/browse/HDFS-9094
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-9094-HDFS-9000.002.patch, HDFS-9094.001.patch
>
>
> This work is going to add DFS admin command that allows reloading NameNode 
> configuration. This is sibling work related to HDFS-6808.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9094) Add command line option to ask NameNode reload configuration.

2015-12-21 Thread Xiaobing Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067309#comment-15067309
 ] 

Xiaobing Zhou commented on HDFS-9094:
-

Patch V002 is based on HDFS-9414, reusing the Reconfiguration Protocol.

> Add command line option to ask NameNode reload configuration.
> -
>
> Key: HDFS-9094
> URL: https://issues.apache.org/jira/browse/HDFS-9094
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-9094-HDFS-9000.002.patch, HDFS-9094.001.patch
>
>
> This work is going to add DFS admin command that allows reloading NameNode 
> configuration. This is sibling work related to HDFS-6808.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9590) NPE in Storage$StorageDirectory#unlock()

2015-12-21 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067310#comment-15067310
 ] 

Xiao Chen commented on HDFS-9590:
-

I take it back. The locking itself looks to be exclusive and should only be 
acquired once. I guess that's why the unlock code doesn't have any protection. 
I'm still digging into the code to see how the given NPE was thrown...

> NPE in Storage$StorageDirectory#unlock()
> 
>
> Key: HDFS-9590
> URL: https://issues.apache.org/jira/browse/HDFS-9590
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>
> The code looks susceptible to race conditions in multi-threaded runs.
> {code}
> public void unlock() throws IOException {
>   if (this.lock == null)
> return;
>   this.lock.release();
>   lock.channel().close();
>   lock = null;
> }
> {code}
> This is called in a handful of places, and I don't see any protection. Shall 
> we add some synchronization mechanism? Not sure if I missed any design 
> assumptions here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9094) Add command line option to ask NameNode reload configuration.

2015-12-21 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-9094:

Attachment: HDFS-9094-HDFS-9000.003.patch

V003 fixed some minor issues.

> Add command line option to ask NameNode reload configuration.
> -
>
> Key: HDFS-9094
> URL: https://issues.apache.org/jira/browse/HDFS-9094
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-9094-HDFS-9000.002.patch, 
> HDFS-9094-HDFS-9000.003.patch, HDFS-9094.001.patch
>
>
> This work is going to add DFS admin command that allows reloading NameNode 
> configuration. This is sibling work related to HDFS-6808.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9589) Block files which have been hardlinked should be duplicated before the DataNode appends to them

2015-12-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067346#comment-15067346
 ] 

Hadoop QA commented on HDFS-9589:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
59s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 54s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
51s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 17s 
{color} | {color:red} Patch generated 2 new checkstyle issues in 
hadoop-hdfs-project/hadoop-hdfs (total was 145, now 146). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 53s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 20s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 55m 32s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 141m 20s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
|   | hadoop.hdfs.TestHFlush |
|   | hadoop.hdfs.server.namenode.ha.TestHASafeMode |
|   | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
|   | hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits |
|   | hadoop.hdfs.server.datanode.TestBlockScanner |
|   | hadoop.hdfs.server.namenode.TestNNThroughputBenchmark |
| JDK v1.7.0_91 Failed junit tests | hadoop.hdfs.TestDFSClientRetries |
|   | hadoop.hdfs.server.namenode.TestFSImageWithXAttr |
|   | hadoop.hdfs.server.namen

[jira] [Commented] (HDFS-7553) fix the TestDFSUpgradeWithHA due to BindException

2015-12-21 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067355#comment-15067355
 ] 

Chris Nauroth commented on HDFS-7553:
-

I think patch v03 is the right idea.  Here are a few comments.
# I'd like to make one more change in {{MiniDFSCluster}}.  In the {{shutdown}} 
method, there is another spot that duplicates the calls to {{NameNode#stop}} 
and {{NameNode#join}}.  I'd like to add the call to {{NameNodeHttpServer#join}} 
there too.  This isn't directly related to the test failure that was reported 
originally, but it can help protect us from other similar problems in the 
future.  Possibly consider a helper method to refactor some of the duplication.
# We should not swallow {{InterruptedException}}.  The Hadoop code has a bad 
habit of doing this, but it's a bad practice, because it can cause harm to 
other parts of the code that expect to have visibility of the thread's 
interrupted status.  In the {{catch (InterruptedException e)}} block, add a 
call to {{Thread.currentThread().interrupt()}} to restore the interrupted 
status.
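
A generic sketch of the join-with-interrupt-restoration pattern being asked for 
(illustrative only; the real change would target {{NameNodeHttpServer#join}}):

{code}
// Hypothetical pattern, not the attached patch: wait for the server thread to
// exit, and if interrupted, restore the flag instead of swallowing it, so
// callers can still observe the thread's interrupted status.
void joinQuietly(Thread serverThread) {
  try {
    serverThread.join();
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
  }
}
{code}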

Thank you!

> fix the TestDFSUpgradeWithHA due to BindException
> -
>
> Key: HDFS-7553
> URL: https://issues.apache.org/jira/browse/HDFS-7553
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Liang Xie
>Assignee: Liang Xie
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7553-001.txt, HDFS-7553.002.patch, 
> HDFS-7553.03.patch, HDFS-7553.repro.patch
>
>
> see 
> https://builds.apache.org/job/PreCommit-HDFS-Build/9092//testReport/org.apache.hadoop.hdfs.server.namenode.ha/TestDFSUpgradeWithHA/testNfsUpgrade/
>  :
> Error Message
> Port in use: localhost:57896
> Stacktrace
> java.net.BindException: Port in use: localhost:57896
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:444)
>   at sun.nio.ch.Net.bind(Net.java:436)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
>   at 
> org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:868)
>   at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:809)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:142)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:591)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:763)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:747)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1443)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1815)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1796)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA.testNfsUpgrade(TestDFSUpgradeWithHA.java:285)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )

2015-12-21 Thread GAO Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067363#comment-15067363
 ] 

GAO Rui commented on HDFS-9494:
---

Thanks Nicholas. I have considered the {{checkStreamers()}} issue before too. 
Because in {{checkStreamers()}} we just read {{streamers}} and 
{{failedStreamers}}, with no operation that changes them, I think there may not 
be a conflict even when different threads call {{checkStreamers()}}. For the 
race condition, could you share more details?
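
For reference, a self-contained sketch of the fan-out/fan-in flush pattern 
under discussion; the {{Streamer}} interface and all names here are stand-ins, 
not the real DFSStripedOutputStream classes, and any shared state read from 
these tasks (e.g. by something like {{checkStreamers()}}) would still need a 
memory-visibility guarantee even if no task writes to it.

{code}
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

interface Streamer {
  void flushInternal() throws IOException;  // flush and wait for acks
}

class ParallelFlushSketch {
  private final ExecutorService pool = Executors.newCachedThreadPool();

  void flushAll(List<Streamer> streamers) throws IOException {
    List<Future<Void>> futures = new ArrayList<>();
    for (Streamer s : streamers) {
      // fan-out: trigger every flush up front so the streamers wait in parallel
      futures.add(pool.submit(() -> { s.flushInternal(); return null; }));
    }
    for (Future<Void> f : futures) {
      try {
        f.get();  // fan-in: do not return until every streamer has flushed
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        throw new InterruptedIOException("interrupted while flushing streamers");
      } catch (ExecutionException ee) {
        // surface the streamer failure to the caller as an IOException
        throw new IOException("streamer flush failed", ee.getCause());
      }
    }
  }
}
{code}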

> Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
> 
>
> Key: HDFS-9494
> URL: https://issues.apache.org/jira/browse/HDFS-9494
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: GAO Rui
>Assignee: GAO Rui
>Priority: Minor
> Attachments: HDFS-9494-origin-trunk.00.patch, 
> HDFS-9494-origin-trunk.01.patch, HDFS-9494-origin-trunk.02.patch, 
> HDFS-9494-origin-trunk.03.patch
>
>
> Currently, in DFSStripedOutputStream#flushAllInternals( ), we trigger and 
> wait for flushInternal( ) in sequence. So the runtime flow is like:
> {code}
> Streamer0#flushInternal( )
> Streamer0#waitForAckedSeqno( )
> Streamer1#flushInternal( )
> Streamer1#waitForAckedSeqno( )
> …
> Streamer8#flushInternal( )
> Streamer8#waitForAckedSeqno( )
> {code}
> It could be better to trigger all the streamers to flushInternal( ) and
> wait for all of them to return from waitForAckedSeqno( ),  and then 
> flushAllInternals( ) returns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9588) DiskBalancer : Add submitDiskbalancer RPC

2015-12-21 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9588:
---
Attachment: HDFS-9588-HDFS-1312.001.patch

This patch is to be submitted after HDFS-9502; posting it now for ease of code 
review.


> DiskBalancer : Add submitDiskbalancer RPC
> -
>
> Key: HDFS-9588
> URL: https://issues.apache.org/jira/browse/HDFS-9588
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Affects Versions: HDFS-1312
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: HDFS-1312
>
> Attachments: HDFS-9588-HDFS-1312.001.patch
>
>
> Add a data node RPC that allows a client to submit a diskbalancer plan to the 
> data node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9458) TestBackupNode always binds to port 50070, which can cause bind failures.

2015-12-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067374#comment-15067374
 ] 

Hadoop QA commented on HDFS-9458:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
6s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 52s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 55s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 17s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 54m 47s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
30s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 144m 36s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.hdfs.server.datanode.TestBlockScanner |
|   | hadoop.hdfs.TestMiniDFSCluster |
|   | hadoop.hdfs.TestGetFileChecksum |
|   | hadoop.hdfs.server.namenode.TestFSImageWithSnapshot |
|   | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
|   | hadoop.hdfs.server.namenode.ha.TestHAMetrics |
|   | hadoop.hdfs.server.namenode.TestNNThroughputBenchmark |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
|   | hadoop.hdfs.TestEncryptionZones |
|   | hadoop.hdfs.server.namenode.TestNNThroughputBenchmark |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/h

[jira] [Commented] (HDFS-7553) fix the TestDFSUpgradeWithHA due to BindException

2015-12-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067414#comment-15067414
 ] 

Hadoop QA commented on HDFS-7553:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
0s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
54s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 47s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 51m 43s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 51m 12s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 128m 45s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.hdfs.server.namenode.TestNNThroughputBenchmark |
|   | hadoop.hdfs.server.datanode.TestDataNodeMetrics |
|   | hadoop.hdfs.TestDFSUpgradeFromImage |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.hdfs.TestDFSStripedOutputStreamWithFailure010 |
|   | hadoop.hdfs.server.datanode.TestBlockReplacement |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12778932/HDFS-7553.03.patch |
| JIRA Issue | HDFS-7553 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | 

[jira] [Commented] (HDFS-9569) Log the name of the fsimage being loaded for better supportability

2015-12-21 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067431#comment-15067431
 ] 

Yongjun Zhang commented on HDFS-9569:
-

Thanks [~cnauroth]!


> Log the name of the fsimage being loaded for better supportability
> --
>
> Key: HDFS-9569
> URL: https://issues.apache.org/jira/browse/HDFS-9569
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Trivial
>  Labels: supportability
> Fix For: 2.7.3
>
> Attachments: HDFS-9569.001.patch, HDFS-9569.002.patch, 
> HDFS-9569.003.patch, HDFS-9569.004.patch, HDFS-9569.005.patch
>
>
> When NN starts to load fsimage, it does
> {code}
>  void loadFSImageFile(FSNamesystem target, MetaRecoveryContext recovery,
>   FSImageFile imageFile, StartupOption startupOption) throws IOException {
>   LOG.debug("Planning to load image :\n" + imageFile);
>   ..
> long txId = loader.getLoadedImageTxId();
> LOG.info("Loaded image for txid " + txId + " from " + curFile);
> {code}
> A debug msg is issued at the beginning with the fsimage file name, then at 
> the end an info msg is issued after loading.
> If the fsimage loading failed due to corrupted fsimage (see HDFS-9406), we 
> don't see the first msg. It'd be helpful to always be able to see from NN 
> logs what fsimage file it's loading.
> Two improvements:
> 1. Change the above debug to info
> 2. If exception happens when loading fsimage, be sure to report the fsimage 
> name being loaded in the error message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9588) DiskBalancer : Add submitDiskbalancer RPC

2015-12-21 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067446#comment-15067446
 ] 

Mingliang Liu commented on HDFS-9588:
-

Thanks for working on this.

{code}
message SubmitDiskBalancerPlanRequestProto {
...
  required uint64 maxDiskBandwidth = 3 [default = 10]; // default 10 MB/s
...
}
{code}

If {{maxDiskBandwidth}} has a default value, maybe we should make it optional, 
as sketched below.
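
A sketch of the suggested change (only this field is shown; the rest of the 
message is unchanged):
{code}
message SubmitDiskBalancerPlanRequestProto {
...
  optional uint64 maxDiskBandwidth = 3 [default = 10]; // default 10 MB/s
...
}
{code}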

> DiskBalancer : Add submitDiskbalancer RPC
> -
>
> Key: HDFS-9588
> URL: https://issues.apache.org/jira/browse/HDFS-9588
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Affects Versions: HDFS-1312
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: HDFS-1312
>
> Attachments: HDFS-9588-HDFS-1312.001.patch
>
>
> Add a data node RPC that allows a client to submit a diskbalancer plan to the 
> data node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7553) fix the TestDFSUpgradeWithHA due to BindException

2015-12-21 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-7553:

Attachment: HDFS-7553.04.patch

Thanks Chris. Both comments make sense to me, and patch 4 reflects them.

I also reviewed other usages of {{NameNode#stop}} and {{NameNode#join}}, and 
updated the code to provide a {{NameNode#joinHttpServer}} instead of a getter. 
This way, other tests can easily join the HTTP server as needed. Currently I 
only see one place that needs this ({{TestStartup}}); the patch includes that 
as well.
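
A rough usage sketch (the call site and names are illustrative, not 
necessarily the actual patch):
{code}
// In a test that restarts a NameNode on the same HTTP port:
nn.stop();
nn.joinHttpServer();  // wait until the HTTP server has fully shut down
// The port is now released, so restarting should not hit a BindException.
{code}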

> fix the TestDFSUpgradeWithHA due to BindException
> -
>
> Key: HDFS-7553
> URL: https://issues.apache.org/jira/browse/HDFS-7553
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Liang Xie
>Assignee: Liang Xie
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7553-001.txt, HDFS-7553.002.patch, 
> HDFS-7553.03.patch, HDFS-7553.04.patch, HDFS-7553.repro.patch
>
>
> see 
> https://builds.apache.org/job/PreCommit-HDFS-Build/9092//testReport/org.apache.hadoop.hdfs.server.namenode.ha/TestDFSUpgradeWithHA/testNfsUpgrade/
>  :
> Error Message
> Port in use: localhost:57896
> Stacktrace
> java.net.BindException: Port in use: localhost:57896
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:444)
>   at sun.nio.ch.Net.bind(Net.java:436)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
>   at 
> org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:868)
>   at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:809)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:142)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:591)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:763)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:747)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1443)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1815)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1796)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA.testNfsUpgrade(TestDFSUpgradeWithHA.java:285)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7553) fix the TestDFSUpgradeWithHA due to BindException

2015-12-21 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-7553:

Attachment: HDFS-7553.04.patch

> fix the TestDFSUpgradeWithHA due to BindException
> -
>
> Key: HDFS-7553
> URL: https://issues.apache.org/jira/browse/HDFS-7553
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Liang Xie
>Assignee: Liang Xie
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7553-001.txt, HDFS-7553.002.patch, 
> HDFS-7553.03.patch, HDFS-7553.04.patch, HDFS-7553.repro.patch
>
>
> see 
> https://builds.apache.org/job/PreCommit-HDFS-Build/9092//testReport/org.apache.hadoop.hdfs.server.namenode.ha/TestDFSUpgradeWithHA/testNfsUpgrade/
>  :
> Error Message
> Port in use: localhost:57896
> Stacktrace
> java.net.BindException: Port in use: localhost:57896
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:444)
>   at sun.nio.ch.Net.bind(Net.java:436)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
>   at 
> org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:868)
>   at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:809)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:142)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:591)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:763)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:747)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1443)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1815)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1796)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA.testNfsUpgrade(TestDFSUpgradeWithHA.java:285)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7553) fix the TestDFSUpgradeWithHA due to BindException

2015-12-21 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-7553:

Attachment: (was: HDFS-7553.04.patch)

> fix the TestDFSUpgradeWithHA due to BindException
> -
>
> Key: HDFS-7553
> URL: https://issues.apache.org/jira/browse/HDFS-7553
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Liang Xie
>Assignee: Liang Xie
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7553-001.txt, HDFS-7553.002.patch, 
> HDFS-7553.03.patch, HDFS-7553.04.patch, HDFS-7553.repro.patch
>
>
> see 
> https://builds.apache.org/job/PreCommit-HDFS-Build/9092//testReport/org.apache.hadoop.hdfs.server.namenode.ha/TestDFSUpgradeWithHA/testNfsUpgrade/
>  :
> Error Message
> Port in use: localhost:57896
> Stacktrace
> java.net.BindException: Port in use: localhost:57896
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:444)
>   at sun.nio.ch.Net.bind(Net.java:436)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
>   at 
> org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:868)
>   at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:809)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:142)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:704)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:591)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:763)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:747)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1443)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1815)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1796)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA.testNfsUpgrade(TestDFSUpgradeWithHA.java:285)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9458) TestBackupNode always binds to port 50070, which can cause bind failures.

2015-12-21 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067452#comment-15067452
 ] 

Xiao Chen commented on HDFS-9458:
-

Thank you [~cnauroth]!
The failed tests look unrelated to this patch.

> TestBackupNode always binds to port 50070, which can cause bind failures.
> -
>
> Key: HDFS-9458
> URL: https://issues.apache.org/jira/browse/HDFS-9458
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Xiao Chen
> Attachments: HDFS-9458.001.patch, HDFS-9458.002.patch
>
>
> {{TestBackupNode}} does not override port settings to use a dynamically 
> selected port for the NameNode HTTP server.  It uses the default of 50070 
> defined in hdfs-default.xml.  This should be changed to select a dynamic port 
> to avoid bind errors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9094) Add command line option to ask NameNode reload configuration.

2015-12-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067475#comment-15067475
 ] 

Hadoop QA commented on HDFS-9094:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 32s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 35s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 25s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 11s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 33s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 35s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 35s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 20s 
{color} | {color:red} Patch generated 27 new checkstyle issues in 
hadoop-hdfs-project (total was 373, now 395). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 9s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 51s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 52m 36s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 58s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 52m 39s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
28s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 144m 52s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||

[jira] [Commented] (HDFS-9094) Add command line option to ask NameNode reload configuration.

2015-12-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067485#comment-15067485
 ] 

Hadoop QA commented on HDFS-9094:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 36s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 38s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
43s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 12s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 28s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 35s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 35s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 21s 
{color} | {color:red} Patch generated 27 new checkstyle issues in 
hadoop-hdfs-project (total was 374, now 396). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 10s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 51s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 49s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 7s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 54s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
25s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 179m 50s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||

[jira] [Commented] (HDFS-9586) listCorruptFileBlocks should not output files that all replications are decommissioning

2015-12-21 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067495#comment-15067495
 ] 

Phil Yang commented on HDFS-9586:
-

{quote}
So all the blocks that go into QUEUE_WITH_CORRUPT_BLOCKS already has zero 
decommissioning replicas.
{quote}
In theory, I think that is right. However, while nodes are being decommissioned 
there can be false positives of the block-missing error; that is another bug I 
am digging into.

I'm not sure why we need 
{code}
if (inode != null && blockManager.countNodes(blk).liveReplicas() == 0) 
{code}
in FSNamesystem.listCorruptFileBlocks; in theory, the second condition should 
always be true. But because of that bug we do need this condition, and I think 
we also need a condition on decommissioning replicas, as sketched below.
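
A sketch of the additional check (this assumes {{NumberReplicas}} exposes a 
decommissioning count, per HDFS-7933; names may differ from an actual patch):
{code}
NumberReplicas num = blockManager.countNodes(blk);
// Collect the file as corrupt only when it has neither live nor
// decommissioning replicas.
if (inode != null && num.liveReplicas() == 0 && num.decommissioning() == 0) {
  ...
}
{code}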

> listCorruptFileBlocks should not output files that all replications are 
> decommissioning
> ---
>
> Key: HDFS-9586
> URL: https://issues.apache.org/jira/browse/HDFS-9586
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 9586-v1.patch
>
>
> As HDFS-7933 said, we should count decommissioning and decommissioned nodes 
> separately and regard decommissioning nodes as special live nodes, so a file 
> whose replicas are all decommissioning is not corrupt or missing.
> So in listCorruptFileBlocks, which is used by fsck and the HDFS NameNode web 
> UI, we should collect a file as corrupt only if its liveReplicas and 
> decommissioning counts are both 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9584) NPE in distcp when ssl configuration file does not exist in class path.

2015-12-21 Thread Surendra Singh Lilhore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-9584:
-
Status: Open  (was: Patch Available)

> NPE in distcp when ssl configuration file does not exist in class path.
> ---
>
> Key: HDFS-9584
> URL: https://issues.apache.org/jira/browse/HDFS-9584
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-9584.patch
>
>
> {noformat}./hadoop distcp -mapredSslConf ssl-distcp.xml 
> hftp://x.x.x.x:25003/history hdfs://x.x.x.X:25008/history{noformat}
> If the {{ssl-distcp.xml}} file does not exist in the class path, distcp will 
> throw a NullPointerException.
> {code}
> java.lang.NullPointerException
> at org.apache.hadoop.tools.DistCp.setupSSLConfig(DistCp.java:266)
> at org.apache.hadoop.tools.DistCp.createJob(DistCp.java:250)
> at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:175)
> at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:127)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:431)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

