[jira] [Commented] (HDFS-8820) Simplify enabling NameNode RPC congestion control and FairCallQueue

2015-11-30 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032571#comment-15032571
 ] 

Ming Ma commented on HDFS-8820:
---

Thanks [~arpitagarwal].

* The last change made to {{RpcEngine}} is from HDFS-7073 in 2.6. It doesn't 
seem to cause any issues so far. So maybe Slider is the only implementation 
outside HDFS/YARN/MR?  

* Alternatively, what if we define {{RpcEngineV2}} and have 
{{ProtobufRpcEngine}} implement it? {{RpcEngine}} won't be changed so that it 
won't break other implementations. Then deprecate the old interface and remove 
it in trunk.

I agree we need to treat compatibility as an important feature for Hadoop. But 
for this specific case, I wonder what its impact would be and whether we can 
somehow use the more elegant builder approach.
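
To make the second bullet concrete, here is a rough, purely illustrative sketch of the 
{{RpcEngineV2}} idea; the method shown is hypothetical and only meant to show how the 
existing {{RpcEngine}} contract could stay untouched:
{code}
import java.io.IOException;

import org.apache.hadoop.ipc.RPC;
import org.apache.hadoop.ipc.RpcEngine;

/**
 * Hypothetical sketch only. The old RpcEngine interface is left as-is, so
 * third-party implementations (e.g. Slider's) keep compiling; only
 * ProtobufRpcEngine would additionally implement the new interface.
 */
public interface RpcEngineV2 extends RpcEngine {
  /** Illustrative builder-based factory; the real signature would come from the patch. */
  RPC.Server getServer(RPC.Builder builder) throws IOException;
}
{code}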

> Simplify enabling NameNode RPC congestion control and FairCallQueue
> ---
>
> Key: HDFS-8820
> URL: https://issues.apache.org/jira/browse/HDFS-8820
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-8820.01.patch, HDFS-8820.02.patch, 
> HDFS-8820.03.patch
>
>
> Enabling RPC Congestion control and FairCallQueue settings can be simplified 
> with HDFS-specific configuration keys. Currently the configuration requires 
> knowing the exact RPC port number and also whether the service RPC port is 
> enabled or not separately. If a separate service RPC endpoint is not defined 
> then RPC congestion control must be enabled ([see 
> comment|https://issues.apache.org/jira/browse/HDFS-8820?focusedCommentId=14987848&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14987848]
>  from [~mingma] below).
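
For illustration only, here is roughly what the current port-qualified configuration 
looks like (the port number 8020 and the key names are assumptions recalled from the 
{{ipc.<port>.*}} pattern, so treat them as illustrative, not authoritative):
{code}
import org.apache.hadoop.conf.Configuration;

public class CallQueueConfigExample {
  public static void main(String[] args) {
    // The operator must embed the exact RPC port (8020 here) in the key names,
    // and repeat this for the service RPC port if one is configured.
    Configuration conf = new Configuration();
    conf.set("ipc.8020.callqueue.impl", "org.apache.hadoop.ipc.FairCallQueue");
    conf.setBoolean("ipc.8020.backoff.enable", true); // RPC congestion control (backoff)
  }
}
{code}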



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9470) Encryption zone on root not loaded from fsimage after NN restart

2015-11-30 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032477#comment-15032477
 ] 

Xiao Chen commented on HDFS-9470:
-

The test failures look unrelated, and the tests passed locally.

> Encryption zone on root not loaded from fsimage after NN restart
> 
>
> Key: HDFS-9470
> URL: https://issues.apache.org/jira/browse/HDFS-9470
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Attachments: HDFS-9470.001.patch, HDFS-9470.002.patch, 
> HDFS-9470.003.patch
>
>
> When restarting namenode, the encryption zone for {{rootDir}} is not loaded 
> correctly from fsimage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7764) DirectoryScanner shouldn't abort the scan if one directory had an error

2015-11-30 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032490#comment-15032490
 ] 

Colin Patrick McCabe commented on HDFS-7764:


Thanks, [~rakeshr].

{code}
856   if (fileNames.size() < 0) {
857 return report;
858   }
{code}
What's the purpose of this if statement?  The size of a list can't be less than 
0.

{code}
859   files = new File[fileNames.size()];
860   for (int i = 0; i < fileNames.size(); i++) {
861 files[i] = new File(dir, fileNames.get(i));
862   }
863   Arrays.sort(files);
{code}
It would be nice to avoid allocating all these new arrays.  We don't really 
need them.  We should be able to sort the list with {{List#sort}}, and we can 
turn the {{String}} objects into {{File}} objects one at a time in the for loop.
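
A minimal sketch of that suggestion (assuming {{fileNames}} is a {{List<String>}} and 
{{dir}} is the parent {{File}}; this is not the patch code):
{code}
import java.io.File;
import java.util.List;

class SortSketch {
  // Sort the name list in place and convert to File lazily, avoiding the File[] allocation.
  static void scanSorted(File dir, List<String> fileNames) {
    fileNames.sort(null);                         // natural String ordering (List#sort)
    for (int i = 0; i < fileNames.size(); i++) {
      File f = new File(dir, fileNames.get(i));   // one File at a time inside the loop
      // ... build the ScanInfo for f here ...
    }
  }
}
{code}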

> DirectoryScanner shouldn't abort the scan if one directory had an error
> ---
>
> Key: HDFS-7764
> URL: https://issues.apache.org/jira/browse/HDFS-7764
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.0
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-7764-01.patch, HDFS-7764.patch
>
>
> If there is an exception while preparing the ScanInfo for the blocks in a 
> directory, DirectoryScanner immediately throws the exception and abandons the 
> current scan cycle. The idea of this jira is to discuss & improve the 
> exception handling mechanism.
> DirectoryScanner.java
> {code}
> for (Entry<Integer, Future<ScanInfoPerBlockPool>> report :
> compilersInProgress.entrySet()) {
>   try {
> dirReports[report.getKey()] = report.getValue().get();
>   } catch (Exception ex) {
> LOG.error("Error compiling report", ex);
> // Propagate ex to DataBlockScanner to deal with
> throw new RuntimeException(ex);
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8831) Trash Support for deletion in HDFS encryption zone

2015-11-30 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032584#comment-15032584
 ] 

Arpit Agarwal commented on HDFS-8831:
-

Hi [~xyao], thanks for the detailed design note. My comments are mostly around 
potential compatibility issues with classes tagged {{@InterfaceAudience.Public}}.

# DistributedFileSystem.java:2326: We can skip the call to dfs.getEZForPath if 
isHDFSEncryptionEnabled is false, to avoid an extra RPC call when TDE is not 
enabled.
# FileSystem.java:2701: Can we define .Trash as a constant somewhere?
# Trash.java:98: Avoid extra RPC for log statement. Can we cache the 
currentTrashDir some time earlier?
# TrashPolicy.java:48: I don't think we should mark it as deprecated. While the 
TrashPolicyDefault no longer uses the home parameter other implementations may 
be passing a different value here in theory.
# TrashPolicy.java:57: Also we should have a default implementation of this 
routine else it will be a backward incompatible change (will break existing 
implementations of this public interface).
# TrashPolicy.java:83: Need default implementation. It can just throw 
UnsupportedOperationException which should be handled by the caller.
# TrashPolicy.java:92: Need default implementation. It can just throw 
UnsupportedOperationException which should be handled by the caller.
# TrashPolicy.java:108: We should leave the old method in place to keep the 
public interface backwards compatible. Perhaps to be conservative we should 
respect the 'home' parameter if one is passed in instead of using 
Filesystem#getTrashRoot?

https://github.com/arp7/hadoop/commit/7b3212d2c41cc35cce81eadc68c029e0fc67a429
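
For points 5-7, a hedged sketch of what a backward-compatible default could look like; 
the method name is hypothetical and only illustrates the pattern of a concrete body 
that throws {{UnsupportedOperationException}}:
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;

public abstract class TrashPolicy extends Configured {
  // ... existing abstract methods stay unchanged ...

  /**
   * Hypothetical new method added by the feature. Giving it a concrete default
   * body keeps existing TrashPolicy implementations source- and binary-compatible;
   * callers handle the UnsupportedOperationException.
   */
  public Path getCurrentTrashDir(Path path) throws IOException {
    throw new UnsupportedOperationException("Not supported by this TrashPolicy");
  }
}
{code}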

> Trash Support for deletion in HDFS encryption zone
> --
>
> Key: HDFS-8831
> URL: https://issues.apache.org/jira/browse/HDFS-8831
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: encryption
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-8831-10152015.pdf, HDFS-8831.00.patch, 
> HDFS-8831.01.patch, HDFS-8831.02.patch
>
>
> Currently, "Soft Delete" is only supported if the whole encryption zone is 
> deleted. If you delete files whinin the zone with trash feature enabled, you 
> will get error similar to the following 
> {code}
> rm: Failed to move to trash: hdfs://HW11217.local:9000/z1_1/startnn.sh: 
> /z1_1/startnn.sh can't be moved from an encryption zone.
> {code}
> With HDFS-8830, we can support "Soft Delete" by placing the .Trash folder for 
> the file being deleted within the same encryption zone. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-11-30 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032506#comment-15032506
 ] 

Colin Patrick McCabe commented on HDFS-8791:


Thanks, guys. +1 for this in trunk and branch-2.

Putting this in branch-2.6 would be a little unusual since it requires a layout 
version upgrade, which I thought we had agreed not to do in bugfix releases.  
But I will leave that decision up to the release manager for the 2.6 branch.

Also, I would really like to see a unit test.  If necessary we can get this in 
and then open a JIRA for that, but it should be on our radar.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Critical
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> keep the disks seeking for tens of minutes. This can happen when the 
> datanode is running du, and also when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool, and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have, but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified are quite high; this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9484) NNThroughputBenchmark$BlockReportStats should not send empty block reports

2015-11-30 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032516#comment-15032516
 ] 

Colin Patrick McCabe commented on HDFS-9484:


Good find, [~liuml07].

> NNThroughputBenchmark$BlockReportStats should not send empty block reports
> --
>
> Key: HDFS-9484
> URL: https://issues.apache.org/jira/browse/HDFS-9484
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>
> In {{NNThroughputBenchmark$BlockReportStats#formBlockReport()}}, the 
> {{blockReportList}} is always {{BlockListAsLongs.EMPTY}}. We should construct 
> the block report list by encoding the generated {{blocks}} in the test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9449) DiskBalancer : Add connectors

2015-11-30 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032543#comment-15032543
 ] 

Anu Engineer commented on HDFS-9449:


Test failures are not related to this patch.

> DiskBalancer : Add connectors
> -
>
> Key: HDFS-9449
> URL: https://issues.apache.org/jira/browse/HDFS-9449
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9449-HDFS-1312.001.patch, 
> HDFS-9449-HDFS-1312.002.patch
>
>
> Connectors allow disk balancer data models to connect to an existing cluster 
> - Namenode or to a json file which describes the cluster. This is used for 
> discovering the physical layout of the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9478) Reason for failing ipc.FairCallQueue construction should be thrown

2015-11-30 Thread Archana T (JIRA)
Archana T created HDFS-9478:
---

 Summary: Reason for failing ipc.FairCallQueue construction should 
be thrown
 Key: HDFS-9478
 URL: https://issues.apache.org/jira/browse/HDFS-9478
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Archana T
Assignee: Ajith S
Priority: Minor


When FairCallQueue construction fails, the NN fails to start, throwing a 
RuntimeException without giving any reason why it failed.

2015-11-30 17:45:26,661 INFO org.apache.hadoop.ipc.FairCallQueue: FairCallQueue 
is in use with 4 queues.
2015-11-30 17:45:26,665 DEBUG org.apache.hadoop.metrics2.util.MBeans: 
Registered Hadoop:service=ipc.65110,name=DecayRpcScheduler
2015-11-30 17:45:26,666 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: 
Failed to start namenode.
java.lang.RuntimeException: org.apache.hadoop.ipc.FairCallQueue could not be 
constructed.
at 
org.apache.hadoop.ipc.CallQueueManager.createCallQueueInstance(CallQueueManager.java:96)
at org.apache.hadoop.ipc.CallQueueManager.<init>(CallQueueManager.java:55)
at org.apache.hadoop.ipc.Server.<init>(Server.java:2241)
at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:942)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:534)
at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:509)
at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:784)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.<init>(NameNodeRpcServer.java:346)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:750)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:687)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:889)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:872)


Example: the reason for the above failure could have been:
1. the weights were not equal to the number of queues configured.
2. decay-scheduler.thresholds not in sync with the number of queues.
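
A hedged sketch of the kind of change being asked for, i.e. chaining the underlying 
cause so it shows up in the NameNode log (an illustrative helper, not the actual 
CallQueueManager code):
{code}
import java.lang.reflect.Constructor;

public final class ReflectionHelper {
  // Wrap construction failures so the root cause is not lost.
  static <T> T construct(Constructor<T> ctor, Object... args) {
    try {
      return ctor.newInstance(args);
    } catch (Exception e) {
      // Chaining 'e' means the startup log shows the real reason, e.g. a
      // weights/queue-count mismatch or bad decay-scheduler.thresholds.
      throw new RuntimeException(
          ctor.getDeclaringClass().getName() + " could not be constructed", e);
    }
  }
}
{code}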



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9482) Expose reservedForReplicas as a metric

2015-11-30 Thread Brahma Reddy Battula (JIRA)
Brahma Reddy Battula created HDFS-9482:
--

 Summary:  Expose reservedForReplicas as a metric
 Key: HDFS-9482
 URL: https://issues.apache.org/jira/browse/HDFS-9482
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9483) Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured WebHDFS.

2015-11-30 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9483:
---

 Summary: Documentation does not cover use of "swebhdfs" as URL 
scheme for SSL-secured WebHDFS.
 Key: HDFS-9483
 URL: https://issues.apache.org/jira/browse/HDFS-9483
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Chris Nauroth


If WebHDFS is secured with SSL, then you can use "swebhdfs" as the scheme in a 
URL to access it.  The current documentation does not state this anywhere.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-9471) Webhdfs not working with shell command when kerberos security+https is enabled.

2015-11-30 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-9471.
-
Resolution: Not A Problem

[~surendrasingh], that's a good point about the documentation.  I filed 
HDFS-9483 to track a documentation improvement.  If you're interested in 
providing the documentation, please feel free to pick up that one.  I'm going 
to resolve this one.

> Webhdfs not working with shell command when kerberos security+https is 
> enabled.
> ---
>
> Key: HDFS-9471
> URL: https://issues.apache.org/jira/browse/HDFS-9471
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Blocker
> Attachments: HDFS-9471.01.patch
>
>
> *Client exception*
> {code}
> secure@host85:/opt/hdfsdata/HA/install/hadoop/namenode/bin> ./hdfs dfs -ls 
> webhdfs://x.x.x.x:50070/test
> 15/11/25 18:46:55 ERROR web.WebHdfsFileSystem: Unable to get HomeDirectory 
> from original File System
> java.net.SocketException: Unexpected end of file from server
> at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:792)
> {code}
> *Exception in namenode log*
> {code}
> 2015-11-26 11:03:18,231 WARN org.mortbay.log: EXCEPTION
> javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
> at 
> sun.security.ssl.InputRecord.handleUnknownRecord(InputRecord.java:710)
> at sun.security.ssl.InputRecord.read(InputRecord.java:527)
> at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:961)
> at 
> sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1363)
> at 
> sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1391)
> at 
> sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1375)
> at 
> org.mortbay.jetty.security.SslSocketConnector$SslConnection.run(SslSocketConnector.java:708)
> at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> {code}
> This is because the URL scheme is hard coded in 
> {{WebHdfsFileSystem.getTransportScheme()}}.
> {code}
>  /**
>* return the underlying transport protocol (http / https).
>*/
>   protected String getTransportScheme() {
> return "http";
>   }
> {code}
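
For reference, the SSL-secured variant ({{SWebHdfsFileSystem}}, used for the 
{{swebhdfs://}} scheme being documented in HDFS-9483) overrides this method roughly 
along these lines:
{code}
  @Override
  protected String getTransportScheme() {
    return "https";
  }
{code}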



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9474) TestPipelinesFailover would fail if ifconfig is not available

2015-11-30 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated HDFS-9474:
-
Attachment: HDFS-9474.001.patch

> TestPipelinesFailover would fail if ifconfig is not available
> -
>
> Key: HDFS-9474
> URL: https://issues.apache.org/jira/browse/HDFS-9474
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Yongjun Zhang
>Assignee: John Zhuge
> Attachments: HDFS-9474.001.patch
>
>
> HDFS-6693 introduced some debug messages to help diagnose why 
> TestPipelinesFailover fails. 
> HDFS-9438 restricted the debug messages to Linux/Mac/Solaris.  However, the 
> test would still fail while printing the debug messages if the "ifconfig" command 
> is not available in certain environments.
> This is not quite right. The test should not fail due to the debug message 
> printing. We should catch any exception thrown from the code that prints 
> debug messages, and issue a warning message instead. 
> Suggest making this change.
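
A minimal sketch of the suggested handling; the method and logger names are 
placeholders, not the actual test code:
{code}
// Best-effort diagnostics must never fail the test itself.
try {
  printNetworkDebugInfo();   // e.g. shells out to "ifconfig" where available
} catch (Exception e) {
  LOG.warn("Could not print network debug info (is ifconfig available?)", e);
}
{code}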



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HDFS-9474) TestPipelinesFailover would fail if ifconfig is not available

2015-11-30 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-9474 started by John Zhuge.

> TestPipelinesFailover would fail if ifconfig is not available
> -
>
> Key: HDFS-9474
> URL: https://issues.apache.org/jira/browse/HDFS-9474
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Yongjun Zhang
>Assignee: John Zhuge
> Attachments: HDFS-9474.001.patch
>
>
> HDFS-6693 introduced some debug messages to help diagnose why 
> TestPipelinesFailover fails. 
> HDFS-9438 restricted the debug messages to Linux/Mac/Solaris.  However, the 
> test would still fail while printing the debug messages if the "ifconfig" command 
> is not available in certain environments.
> This is not quite right. The test should not fail due to the debug message 
> printing. We should catch any exception thrown from the code that prints 
> debug messages, and issue a warning message instead. 
> Suggest making this change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-9483) Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured WebHDFS.

2015-11-30 Thread Surendra Singh Lilhore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore reassigned HDFS-9483:


Assignee: Surendra Singh Lilhore

> Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured 
> WebHDFS.
> -
>
> Key: HDFS-9483
> URL: https://issues.apache.org/jira/browse/HDFS-9483
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Chris Nauroth
>Assignee: Surendra Singh Lilhore
>
> If WebHDFS is secured with SSL, then you can use "swebhdfs" as the scheme in 
> a URL to access it.  The current documentation does not state this anywhere.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-11-30 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032081#comment-15032081
 ] 

Kihwal Lee commented on HDFS-8791:
--

bq.  I will test it again if that is the case. 
Retesting shows {{previous}} containing the valid content. I guess I somehow 
messed up the testing the first time. 
+1 from me.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Critical
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> keep the disks seeking for tens of minutes. This can happen when the 
> datanode is running du, and also when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool, and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have, but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified are quite high; this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9480) Expose nonDfsUsed via StorageTypeStats and DatanodeStatistics

2015-11-30 Thread Brahma Reddy Battula (JIRA)
Brahma Reddy Battula created HDFS-9480:
--

 Summary:  Expose nonDfsUsed via StorageTypeStats and 
DatanodeStatistics
 Key: HDFS-9480
 URL: https://issues.apache.org/jira/browse/HDFS-9480
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9481) Expose reservedForReplicas as a metric

2015-11-30 Thread Brahma Reddy Battula (JIRA)
Brahma Reddy Battula created HDFS-9481:
--

 Summary:  Expose reservedForReplicas as a metric
 Key: HDFS-9481
 URL: https://issues.apache.org/jira/browse/HDFS-9481
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9482) Replace DatanodeInfo constructors with a builder pattern

2015-11-30 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-9482:
---
Summary: Replace DatanodeInfo constructors with a builder pattern  (was:  
Expose reservedForReplicas as a metric)

> Replace DatanodeInfo constructors with a builder pattern
> 
>
> Key: HDFS-9482
> URL: https://issues.apache.org/jira/browse/HDFS-9482
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9038) Reserved space is erroneously counted towards non-DFS used.

2015-11-30 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032110#comment-15032110
 ] 

Brahma Reddy Battula commented on HDFS-9038:


Raised separate JIRAs for the above three improvements (HDFS-9480, HDFS-9481 and 
HDFS-9482). Also uploaded a patch to address [~vinayrpet]'s and [~arpitagarwal]'s 
comments; kindly review.

> Reserved space is erroneously counted towards non-DFS used.
> ---
>
> Key: HDFS-9038
> URL: https://issues.apache.org/jira/browse/HDFS-9038
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Chris Nauroth
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9038-002.patch, HDFS-9038-003.patch, 
> HDFS-9038-004.patch, HDFS-9038-005.patch, HDFS-9038.patch
>
>
> HDFS-5215 changed the DataNode volume available space calculation to consider 
> the reserved space held by the {{dfs.datanode.du.reserved}} configuration 
> property.  As a side effect, reserved space is now counted towards non-DFS 
> used.  I don't believe it was intentional to change the definition of non-DFS 
> used.  This issue proposes restoring the prior behavior: do not count 
> reserved space towards non-DFS used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9425) Expose number of blocks per volume as a metric

2015-11-30 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032116#comment-15032116
 ] 

Brahma Reddy Battula commented on HDFS-9425:


Can somebody review this patch? Thanks.

> Expose number of blocks per volume as a metric
> --
>
> Key: HDFS-9425
> URL: https://issues.apache.org/jira/browse/HDFS-9425
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9425.patch
>
>
> It will be helpful for users to know the usage in terms of the number of blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-9479) DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network unstable

2015-11-30 Thread Bob (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob reassigned HDFS-9479:
-

Assignee: Bob

> DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network 
> unstable
> 
>
> Key: HDFS-9479
> URL: https://issues.apache.org/jira/browse/HDFS-9479
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.1
>Reporter: Bob
>Assignee: Bob
>Priority: Blocker
>
> {code}
> Java stack information for the threads listed above:
> ===
> "Thread-1":
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.addClient(LeaseRenewer.java:228)
>   - waiting to lock <0xd5c3c868> (a 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.getInstance(LeaseRenewer.java:85)
>   at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:480)
>   at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:491)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:803)
>   - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:765)
>   - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.EventWriter.close(EventWriter.java:80)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.closeWriter(JobHistoryEventHandler.java:1242)
>   - locked <0xd593aed0> (a java.lang.Object)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:406)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   - locked <0xd593ae88> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1677)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   - locked <0xd55d9678> (a java.lang.Object)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1176)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1524)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> "LeaseRenewer:hdfs@hacluster:8020":
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:720)
>   - waiting to lock <0xd5c1a860> (a 
> org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:598)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.run(LeaseRenewer.java:465)
>   - locked <0xd5c3c868> (a 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.access$700(LeaseRenewer.java:75)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer$1.run(LeaseRenewer.java:311)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9417) Clean up the RAT warnings in the HDFS-8707 branch.

2015-11-30 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9417:
-
Attachment: HDFS-9417.HDFS-8707.000.patch

Uploaded first pass at clearing up the warnings.  Couldn't run the RAT tool 
locally, so we'll rely on Jenkins to give us a wash on it.

> Clean up the RAT warnings in the HDFS-8707 branch.
> --
>
> Key: HDFS-9417
> URL: https://issues.apache.org/jira/browse/HDFS-9417
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Haohui Mai
>Assignee: Xiaobing Zhou
> Attachments: HDFS-9417.HDFS-8707.000.patch
>
>
> Recent Jenkins builds reveal that the pom.xml in the HDFS-8707 branch does 
> not currently exclude third-party files. The RAT plugin generates warnings as 
> these files do not have Apache headers.
> The warnings need to be suppressed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9479) DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network unstable

2015-11-30 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031952#comment-15031952
 ] 

Brahma Reddy Battula commented on HDFS-9479:


Thanks for reporting. Dupe of HDFS-9324?

> DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network 
> unstable
> 
>
> Key: HDFS-9479
> URL: https://issues.apache.org/jira/browse/HDFS-9479
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.1
>Reporter: Bob
>Assignee: Bob
>Priority: Blocker
>
> {code}
> Java stack information for the threads listed above:
> ===
> "Thread-1":
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.addClient(LeaseRenewer.java:228)
>   - waiting to lock <0xd5c3c868> (a 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.getInstance(LeaseRenewer.java:85)
>   at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:480)
>   at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:491)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:803)
>   - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:765)
>   - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.EventWriter.close(EventWriter.java:80)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.closeWriter(JobHistoryEventHandler.java:1242)
>   - locked <0xd593aed0> (a java.lang.Object)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:406)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   - locked <0xd593ae88> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1677)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   - locked <0xd55d9678> (a java.lang.Object)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1176)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1524)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> "LeaseRenewer:hdfs@hacluster:8020":
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:720)
>   - waiting to lock <0xd5c1a860> (a 
> org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:598)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.run(LeaseRenewer.java:465)
>   - locked <0xd5c3c868> (a 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.access$700(LeaseRenewer.java:75)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer$1.run(LeaseRenewer.java:311)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-9479) DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network unstable

2015-11-30 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee resolved HDFS-9479.
--
  Resolution: Duplicate
Target Version/s:   (was: 2.7.3)

> DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network 
> unstable
> 
>
> Key: HDFS-9479
> URL: https://issues.apache.org/jira/browse/HDFS-9479
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.1
>Reporter: Bob
>Assignee: Bob
>Priority: Blocker
>
> {code}
> Java stack information for the threads listed above:
> ===
> "Thread-1":
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.addClient(LeaseRenewer.java:228)
>   - waiting to lock <0xd5c3c868> (a 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.getInstance(LeaseRenewer.java:85)
>   at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:480)
>   at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:491)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:803)
>   - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:765)
>   - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.EventWriter.close(EventWriter.java:80)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.closeWriter(JobHistoryEventHandler.java:1242)
>   - locked <0xd593aed0> (a java.lang.Object)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:406)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   - locked <0xd593ae88> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1677)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   - locked <0xd55d9678> (a java.lang.Object)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1176)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1524)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> "LeaseRenewer:hdfs@hacluster:8020":
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:720)
>   - waiting to lock <0xd5c1a860> (a 
> org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:598)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.run(LeaseRenewer.java:465)
>   - locked <0xd5c3c868> (a 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.access$700(LeaseRenewer.java:75)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer$1.run(LeaseRenewer.java:311)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9468) DfsAdmin command set dataXceiver count for datanode

2015-11-30 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031995#comment-15031995
 ] 

Kihwal Lee commented on HDFS-9468:
--

Why don't you make it refreshable? Some configs in datanode are already 
refreshable.
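
A hedged sketch of that idea, with hypothetical names; the real change would hook 
into the datanode's existing reconfiguration support rather than define a new class:
{code}
// Hypothetical: keep the limit in a volatile field so an admin-triggered refresh
// can change it without restarting the datanode.
public class XceiverLimit {
  private volatile int maxXceiverCount;

  public XceiverLimit(int initial) {
    this.maxXceiverCount = initial;   // seeded from dfs.datanode.max.transfer.threads
  }

  // Called from the refresh/reconfigure path with the newly loaded value.
  public void refresh(int newMax) {
    this.maxXceiverCount = newMax;
  }

  public int get() {
    return maxXceiverCount;
  }
}
{code}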

> DfsAdmin command set dataXceiver count for datanode
> ---
>
> Key: HDFS-9468
> URL: https://issues.apache.org/jira/browse/HDFS-9468
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9468.001.patch
>
>
> Now, in every datanode the concurrent xceiver count is set by 
> {{DFSConfigKeys.DFS_DATANODE_MAX_RECEIVER_THREADS_DEFAULT}}. If you want 
> to set this value differently because some nodes have less memory or fewer 
> cores, you must change the config and restart the datanode. So maybe we 
> can dynamically set the dataxceiver count via a dfsadmin command, and set the 
> value for one or many specific nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9452) libhdfs++ Fix memory stomp in OpenFileForRead.

2015-11-30 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032026#comment-15032026
 ] 

James Clampffer commented on HDFS-9452:
---

Committed to HDFS-8707.

Thanks for the pointer to std::tie Bob, I'll check that out and use it where 
applicable in the future.

> libhdfs++ Fix memory stomp in OpenFileForRead.
> --
>
> Key: HDFS-9452
> URL: https://issues.apache.org/jira/browse/HDFS-9452
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
> Attachments: HDFS-9452.HDFS-8707.000.patch, 
> HDFS-9452.HDFS-8707.001.patch
>
>
> Running a simple test that opens and closes a file in many threads will fail 
> under valgrind with an invalid write of size 8.
> It looks like the stack is unwinding in the calling thread before the 
> callback invoked by asio in OpenFileForRead can set the input_stream pointer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9479) DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network unstable

2015-11-30 Thread Bob (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob updated HDFS-9479:
--
Summary: DeadLock Happened Between DFSOutputStream and LeaseRenewer when 
Network unstable  (was: DeadLock Between DFSOutputStream and LeaseRenewer when 
Network unstable)

> DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network 
> unstable
> 
>
> Key: HDFS-9479
> URL: https://issues.apache.org/jira/browse/HDFS-9479
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.1
>Reporter: Bob
>Priority: Blocker
>
> {code}
> Java stack information for the threads listed above:
> ===
> "Thread-1":
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.addClient(LeaseRenewer.java:228)
>   - waiting to lock <0xd5c3c868> (a 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.getInstance(LeaseRenewer.java:85)
>   at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:480)
>   at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:491)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:803)
>   - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:765)
>   - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.EventWriter.close(EventWriter.java:80)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.closeWriter(JobHistoryEventHandler.java:1242)
>   - locked <0xd593aed0> (a java.lang.Object)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:406)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   - locked <0xd593ae88> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1677)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   - locked <0xd55d9678> (a java.lang.Object)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1176)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1524)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> "LeaseRenewer:hdfs@hacluster:8020":
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:720)
>   - waiting to lock <0xd5c1a860> (a 
> org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:598)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.run(LeaseRenewer.java:465)
>   - locked <0xd5c3c868> (a 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.access$700(LeaseRenewer.java:75)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer$1.run(LeaseRenewer.java:311)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9479) DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network unstable

2015-11-30 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031961#comment-15031961
 ] 

Brahma Reddy Battula commented on HDFS-9479:


I mean HDFS-9294.

> DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network 
> unstable
> 
>
> Key: HDFS-9479
> URL: https://issues.apache.org/jira/browse/HDFS-9479
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.1
>Reporter: Bob
>Assignee: Bob
>Priority: Blocker
>
> {code}
> Java stack information for the threads listed above:
> ===
> "Thread-1":
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.addClient(LeaseRenewer.java:228)
>   - waiting to lock <0xd5c3c868> (a 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.getInstance(LeaseRenewer.java:85)
>   at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:480)
>   at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:491)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:803)
>   - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:765)
>   - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.EventWriter.close(EventWriter.java:80)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.closeWriter(JobHistoryEventHandler.java:1242)
>   - locked <0xd593aed0> (a java.lang.Object)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:406)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   - locked <0xd593ae88> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1677)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   - locked <0xd55d9678> (a java.lang.Object)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1176)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1524)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> "LeaseRenewer:hdfs@hacluster:8020":
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:720)
>   - waiting to lock <0xd5c1a860> (a 
> org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:598)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.run(LeaseRenewer.java:465)
>   - locked <0xd5c3c868> (a 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.access$700(LeaseRenewer.java:75)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer$1.run(LeaseRenewer.java:311)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-11-30 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031954#comment-15031954
 ] 

Kihwal Lee commented on HDFS-8791:
--

This is what I saw on the upgraded node before it got finalized.  Before 
upgrade, {{current/finalized}} contained many sub directories.
{noformat}
-bash-4.1$ ls -l /xxx/data/current/BP-x/previous/finalized
total 4
drwxr-xr-x 115 hdfs users 4096 Nov 24 23:01 subdir0
{noformat}

This is what I saw in the log.
{noformat}
2015-11-24 23:06:09,980 INFO common.Storage: Upgrading block pool storage 
directory /xxx/data/current/BP-x.
   old LV = -56; old CTime = 0.
   new LV = -57; new CTime = 0
2015-11-24 23:06:11,625 INFO common.Storage: HardLinkStats: 116 Directories, 
including 3 Empty Directories, 57282 single
 Link operations, 0 multi-Link operations, linking 0 files, total 57282 
linkable files.  Also physically copied 0 other files.
2015-11-24 23:06:11,671 INFO common.Storage: Upgrade of block pool BP-x at 
/xxx/data/current/BP-x is complete
{noformat}

I just noticed the time stamp of {{subdir0}} is old, so were empty directories 
removed? I will test it again if that is the case. But I thought {{current}} 
eventually becomes {{previous}} after creating hard links, so even the empty 
dirs should have been left intact.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Critical
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> keep the disks seeking for tens of minutes. This can happen when the 
> datanode is running du, and also when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool, and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have, but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified are quite high; this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9233) Create LICENSE.txt and NOTICES files for libhdfs++

2015-11-30 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031957#comment-15031957
 ] 

Bob Hansen commented on HDFS-9233:
--

[~owen.omalley] - I was looking for examples to follow, and don't see a 
LICENSE.txt or NOTICES file in any of the other Hadoop sub-projects.  There is 
a LICENSE.txt and NOTICE.txt at the top, but they don't appear to be 
represented anywhere else.

Since this is part of the ASF tree now, do you think that is sufficient?

> Create LICENSE.txt and NOTICES files for libhdfs++
> --
>
> Key: HDFS-9233
> URL: https://issues.apache.org/jira/browse/HDFS-9233
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
>
> We use third-party libraries that are Apache and Google licensed, and may be 
> adding an MIT-licensed third-party library.  We need to include the 
> appropriate license files for inclusion into Apache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9477) namenode starts failed:FSEditLogLoader: Encountered exception on operation TimesOp

2015-11-30 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031973#comment-15031973
 ] 

Daryn Sharp commented on HDFS-9477:
---

Notice that it's occurring because of an atime update - probably from opening 
the file.  This appears to be caused by the file descriptor-ish feature added 
a while back.  Anyhow, I believe /.reserved paths should probably never occur in 
the edits.

> namenode starts failed:FSEditLogLoader: Encountered exception on operation 
> TimesOp
> --
>
> Key: HDFS-9477
> URL: https://issues.apache.org/jira/browse/HDFS-9477
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
> Environment: Ubuntu 12.04.1 LTS, java version "1.7.0_79"
>Reporter: aplee
>Assignee: aplee
>
> backup namenode failed to start; log below:
> 2015-11-28 14:09:13,462 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
> on operation TimesOp [length=0, path=/.reserved/.inodes/2346114, mtime=-1, 
> atime=1448692924700, opCode=OP_TIMES, txid=14774180]
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:473)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:299)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:629)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:832)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:813)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
> 2015-11-28 14:09:13,572 FATAL 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error 
> encountered while tailing edits. Shutting down standby NN.
> java.io.IOException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:244)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:832)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:813)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:473)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:299)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:629)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234)
>   ... 9 more
> 2015-11-28 14:09:13,574 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 1
> 2015-11-28 14:09:13,575 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> SHUTDOWN_MSG: 
> I found this record in the edits, but I don't know how it was generated
> 
> OP_TIMES
> 
>   14774180
>   0
>   /.reserved/.inodes/2346114
>   -1
>   1448692924700
> 
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-11-30 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031939#comment-15031939
 ] 

Daryn Sharp commented on HDFS-7435:
---

[~liuml07], good catch!  Not sure how I accidentally did that, and I don't know 
why the test still passes.  I think your proposed fix is correct.  I know 
extremely little about the benchmark tool, so I'd be unable to add a test (for 
the test!) in a timely manner; if you'd like the credit for the bug, I'm fine 
with you doing the jira.

> PB encoding of block reports is very inefficient
> 
>
> Key: HDFS-7435
> URL: https://issues.apache.org/jira/browse/HDFS-7435
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
> HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
> HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
> HDFS-7435.patch, HDFS-7435.patch
>
>
> Block reports are encoded as a PB repeating long.  Repeating fields use an 
> {{ArrayList}} with default capacity of 10.  A block report containing tens or 
> hundreds of thousand of longs (3 for each replica) is extremely expensive 
> since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
> fields will box the primitive longs which must then be unboxed.
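
A minimal sketch of the cost being described above, not the actual Hadoop decode path; the replica count is an arbitrary illustrative figure:

{code}
// Illustrative only: shows why decoding a large repeated PB long field into a
// default-capacity ArrayList is expensive -- repeated reallocation plus one
// boxed Long per primitive -- compared with pre-sized primitive storage.
import java.util.ArrayList;
import java.util.List;

public class BlockReportCostSketch {
  public static void main(String[] args) {
    final int longsPerReplica = 3;            // e.g. id, length, generation stamp
    final int replicas = 200_000;             // arbitrary large report size
    final int total = replicas * longsPerReplica;

    List<Long> boxed = new ArrayList<>();     // starts at capacity 10, reallocs repeatedly
    for (long i = 0; i < total; i++) {
      boxed.add(i);                           // each add boxes a primitive long
    }

    long[] primitive = new long[total];       // sized once, no boxing
    for (int i = 0; i < total; i++) {
      primitive[i] = i;
    }
    System.out.println(boxed.size() + " boxed vs " + primitive.length + " primitive longs");
  }
}
{code}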



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9479) DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network unstable

2015-11-30 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031966#comment-15031966
 ] 

Kihwal Lee commented on HDFS-9479:
--

Yes, it looked familiar. I think it is a dupe.

> DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network 
> unstable
> 
>
> Key: HDFS-9479
> URL: https://issues.apache.org/jira/browse/HDFS-9479
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.1
>Reporter: Bob
>Assignee: Bob
>Priority: Blocker
>
> {code}
> Java stack information for the threads listed above:
> ===
> "Thread-1":
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.addClient(LeaseRenewer.java:228)
>   - waiting to lock <0xd5c3c868> (a 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.getInstance(LeaseRenewer.java:85)
>   at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:480)
>   at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:491)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:803)
>   - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:765)
>   - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.EventWriter.close(EventWriter.java:80)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.closeWriter(JobHistoryEventHandler.java:1242)
>   - locked <0xd593aed0> (a java.lang.Object)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:406)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   - locked <0xd593ae88> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1677)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   - locked <0xd55d9678> (a java.lang.Object)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1176)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1524)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> "LeaseRenewer:hdfs@hacluster:8020":
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:720)
>   - waiting to lock <0xd5c1a860> (a 
> org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:598)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.run(LeaseRenewer.java:465)
>   - locked <0xd5c3c868> (a 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.access$700(LeaseRenewer.java:75)
>   at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer$1.run(LeaseRenewer.java:311)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
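
Distilling the two stacks above into a minimal, self-contained sketch of the lock-order inversion (not HDFS code; the two monitors stand in for the DFSOutputStream and LeaseRenewer instance locks):

{code}
// Thread-1 takes the stream lock then needs the renewer lock (close ->
// endFileLease -> LeaseRenewer.addClient), while the renewer thread takes the
// renewer lock then needs the stream lock (run -> closeAllFilesBeingWritten ->
// DFSOutputStream.abort). Opposite acquisition order => deadlock.
public class LockInversionSketch {
  private final Object streamLock = new Object();   // stands in for DFSOutputStream
  private final Object renewerLock = new Object();  // stands in for LeaseRenewer

  void closeStream() {
    synchronized (streamLock) {
      synchronized (renewerLock) {
        // remove the client from the renewer
      }
    }
  }

  void renewerRun() {
    synchronized (renewerLock) {
      synchronized (streamLock) {
        // abort the stream on renewal failure
      }
    }
  }
}
{code}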



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9038) Reserved space is erroneously counted towards non-DFS used.

2015-11-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032253#comment-15032253
 ] 

Hadoop QA commented on HDFS-9038:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 8 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 33s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 18s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 0s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 53s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 53s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 44s 
{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 48s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 3m 48s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 15m 16s 
{color} | {color:red} hadoop-hdfs-project-jdk1.8.0_66 with JDK v1.8.0_66 
generated 3 new issues (was 49, now 49). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 36s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 37s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 17m 54s 
{color} | {color:red} hadoop-hdfs-project-jdk1.7.0_85 with JDK v1.7.0_85 
generated 3 new issues (was 51, now 51). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 36s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 35s 
{color} | {color:red} Patch generated 4 new checkstyle issues in 
hadoop-hdfs-project (total was 293, now 295). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
49s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 12s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 49s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 115m 25s 
{color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 48s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 121m 39s 
{color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_85. 
{color} |
| {color:green}+1{color} | 

[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-11-30 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032285#comment-15032285
 ] 

Mingliang Liu commented on HDFS-7435:
-

Thanks for your comment, [~daryn] and [~shv]. I started to work on 
{{NNThroughputBenchmark}} just recently and knew nothing about it before that. 
If the empty block report list in this patch was not intentional, I don't think 
I need more context on this patch. Sure, I'd be happy to make the change as we 
discussed above. The jira is [HDFS-9484]. Let's continue further discussion on 
the fix there.

The reason the unit tests could still pass may be that 
{{TestNNThroughputBenchmark}} is more a driver that runs the benchmark with 
default parameters than a real unit test asserting expected behavior for 
different scenarios. If we need a sophisticated unit test, perhaps we can 
address it separately.

> PB encoding of block reports is very inefficient
> 
>
> Key: HDFS-7435
> URL: https://issues.apache.org/jira/browse/HDFS-7435
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
> HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
> HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
> HDFS-7435.patch, HDFS-7435.patch
>
>
> Block reports are encoded as a PB repeating long.  Repeating fields use an 
> {{ArrayList}} with default capacity of 10.  A block report containing tens or 
> hundreds of thousand of longs (3 for each replica) is extremely expensive 
> since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
> fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9273) ACLs on root directory may be lost after NN restart

2015-11-30 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032262#comment-15032262
 ] 

Sangjin Lee commented on HDFS-9273:
---

+1 SGTM

> ACLs on root directory may be lost after NN restart
> ---
>
> Key: HDFS-9273
> URL: https://issues.apache.org/jira/browse/HDFS-9273
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: HDFS-9273.001.patch, HDFS-9273.002.patch
>
>
> After restarting namenode, the ACLs on the root directory ("/") may be lost 
> if it's rolled over to fsimage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9470) Encryption zone on root not loaded from fsimage after NN restart

2015-11-30 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032264#comment-15032264
 ] 

Sangjin Lee commented on HDFS-9470:
---

+1 SGTM

> Encryption zone on root not loaded from fsimage after NN restart
> 
>
> Key: HDFS-9470
> URL: https://issues.apache.org/jira/browse/HDFS-9470
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Attachments: HDFS-9470.001.patch, HDFS-9470.002.patch, 
> HDFS-9470.003.patch
>
>
> When restarting namenode, the encryption zone for {{rootDir}} is not loaded 
> correctly from fsimage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9484) NNThroughputBenchmark$BlockReportStats should not send empty block reports

2015-11-30 Thread Mingliang Liu (JIRA)
Mingliang Liu created HDFS-9484:
---

 Summary: NNThroughputBenchmark$BlockReportStats should not send 
empty block reports
 Key: HDFS-9484
 URL: https://issues.apache.org/jira/browse/HDFS-9484
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Mingliang Liu
Assignee: Mingliang Liu


In {{NNThroughputBenchmark$BlockReportStats#formBlockReport()}}, the 
{{blockReportList}} is always {{BlockListAsLongs.EMPTY}}. We should actually 
construct the block report list by encoding generated {{blocks}} in test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9484) NNThroughputBenchmark$BlockReportStats should not send empty block reports

2015-11-30 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9484:

Description: In 
{{NNThroughputBenchmark$BlockReportStats#formBlockReport()}}, the 
{{blockReportList}} is always {{BlockListAsLongs.EMPTY}}. We should construct 
the block report list by encoding generated {{blocks}} in test.  (was: In 
{{NNThroughputBenchmark$BlockReportStats#formBlockReport()}}, the 
{{blockReportList}} is always {{BlockListAsLongs.EMPTY}}. We should actually 
construct the block report list by encoding generated {{blocks}} in test.)

> NNThroughputBenchmark$BlockReportStats should not send empty block reports
> --
>
> Key: HDFS-9484
> URL: https://issues.apache.org/jira/browse/HDFS-9484
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>
> In {{NNThroughputBenchmark$BlockReportStats#formBlockReport()}}, the 
> {{blockReportList}} is always {{BlockListAsLongs.EMPTY}}. We should construct 
> the block report list by encoding generated {{blocks}} in test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8871) Decommissioning of a node with a failed volume may not start

2015-11-30 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-8871:
-
Target Version/s: 2.7.3, 2.6.4  (was: 2.7.3, 2.6.3)

> Decommissioning of a node with a failed volume may not start
> 
>
> Key: HDFS-8871
> URL: https://issues.apache.org/jira/browse/HDFS-8871
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Kihwal Lee
>Assignee: Daryn Sharp
>Priority: Critical
>
> Since staleness may not be properly cleared, a node with a failed volume may 
> not actually get scanned for block replication. Nothing is being replicated 
> from these nodes.
> This bug does not manifest unless the datanode has a unique storage ID per 
> volume. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8871) Decommissioning of a node with a failed volume may not start

2015-11-30 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032172#comment-15032172
 ] 

Junping Du commented on HDFS-8871:
--

Moving this to 2.6.4 as there has been no update for a while.

> Decommissioning of a node with a failed volume may not start
> 
>
> Key: HDFS-8871
> URL: https://issues.apache.org/jira/browse/HDFS-8871
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Kihwal Lee
>Assignee: Daryn Sharp
>Priority: Critical
>
> Since staleness may not be properly cleared, a node with a failed volume may 
> not actually get scanned for block replication. Nothing is being replicated 
> from these nodes.
> This bug does not manifest unless the datanode has a unique storage ID per 
> volume. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-11-30 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032178#comment-15032178
 ] 

Haohui Mai commented on HDFS-8791:
--

Marking it as a critical bug of 2.6.3.

I think it's important to cherry-pick this patch to the 2.6 line to avoid 
serious performance degradation.

[~djp] what do you think?


> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Critical
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-11-30 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-8791:
-
Target Version/s: 2.6.3

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Critical
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-11-30 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032190#comment-15032190
 ] 

Kihwal Lee commented on HDFS-8791:
--

bq. Marking it as a critical bug of 2.6.3.
If you want to pull this into 2.6.3, it might make sense to also push it for 
2.7.2. If 2.6.3 comes out earlier than 2.7.3, we will be creating a version of 
2.6 that cannot be upgraded to the latest 2.7. [~vinodkv], I think 2.6 and 2.7 
release managers should coordinate.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Critical
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9474) TestPipelinesFailover would fail if ifconfig is not available

2015-11-30 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032200#comment-15032200
 ] 

Yongjun Zhang commented on HDFS-9474:
-

Hi John,

Thanks for the patch. Two minor comments:
# Suggest switching from {{System.println}} to {{LOG.info}}, {{LOG.warn}}, etc.
# For the stack trace printing, suggest doing {{LOG.warn("Error when running " 
+ scmd, e)}} so the exception is logged as well (see the sketch below).

Thanks.
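
A hedged sketch of what the two suggestions could look like together; {{LOG}} and {{scmd}} are taken from the comment above, everything else (class name, the exact command handling) is assumed for illustration:

{code}
// Sketch only, not the actual test code: run the debug command, log its
// result via the logger, and never let a missing command fail the test.
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class DebugPrintSketch {
  private static final Log LOG = LogFactory.getLog(DebugPrintSketch.class);

  static void runDebugCommand(String scmd) {
    try {
      Process p = Runtime.getRuntime().exec(scmd);
      LOG.info("Debug command '" + scmd + "' exited with " + p.waitFor());
    } catch (Exception e) {
      // e.g. "ifconfig" not installed: warn with the stack trace, don't fail
      LOG.warn("Error when running " + scmd, e);
    }
  }
}
{code}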





> TestPipelinesFailover would fail if ifconfig is not available
> -
>
> Key: HDFS-9474
> URL: https://issues.apache.org/jira/browse/HDFS-9474
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Yongjun Zhang
>Assignee: John Zhuge
> Attachments: HDFS-9474.001.patch
>
>
> HDFS-6693 introduced some debug messages to help diagnose why 
> TestPipelinesFailover fails. 
> HDFS-9438 restricted the debug messages to Linux/Mac/Solaris.  However, the 
> test would fail while printing the debug messages if the "ifconfig" command is 
> not available in certain environments.
> This is not quite right. The test should not fail because of the debug message 
> printing. We should catch any exception thrown from the code that prints the 
> debug messages, and issue a warning message instead. 
> Suggest making this change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9470) Encryption zone on root not loaded from fsimage after NN restart

2015-11-30 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032170#comment-15032170
 ] 

Haohui Mai commented on HDFS-9470:
--

Looks like it is an important fix with relatively low risk. I think it 
is beneficial to put it into 2.6.3.

The patch looks good to me overall. Kicking off another round of Jenkins run.

> Encryption zone on root not loaded from fsimage after NN restart
> 
>
> Key: HDFS-9470
> URL: https://issues.apache.org/jira/browse/HDFS-9470
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Attachments: HDFS-9470.001.patch, HDFS-9470.002.patch, 
> HDFS-9470.003.patch
>
>
> When restarting namenode, the encryption zone for {{rootDir}} is not loaded 
> correctly from fsimage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9470) Encryption zone on root not loaded from fsimage after NN restart

2015-11-30 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032181#comment-15032181
 ] 

Xiao Chen commented on HDFS-9470:
-

Thanks [~wheat9] for the comment. I'll watch out for the Jenkins result.

> Encryption zone on root not loaded from fsimage after NN restart
> 
>
> Key: HDFS-9470
> URL: https://issues.apache.org/jira/browse/HDFS-9470
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Attachments: HDFS-9470.001.patch, HDFS-9470.002.patch, 
> HDFS-9470.003.patch
>
>
> When restarting namenode, the encryption zone for {{rootDir}} is not loaded 
> correctly from fsimage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-11-30 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032187#comment-15032187
 ] 

Junping Du commented on HDFS-8791:
--

bq. I think it's important to cherry-pick this patch to the 2.6 line to avoid 
serious performance degradation. Junping Du what do you think?
+1, as long as we don't have any compatibility issues.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Critical
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9474) TestPipelinesFailover would fail if ifconfig is not available

2015-11-30 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated HDFS-9474:
-
Attachment: HDFS-9474.002.patch

> TestPipelinesFailover would fail if ifconfig is not available
> -
>
> Key: HDFS-9474
> URL: https://issues.apache.org/jira/browse/HDFS-9474
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Yongjun Zhang
>Assignee: John Zhuge
> Attachments: HDFS-9474.001.patch, HDFS-9474.002.patch
>
>
> HDFS-6693 introduced some debug messages to help diagnose why 
> TestPipelinesFailover fails. 
> HDFS-9438 restricted the debug messages to Linux/Mac/Solaris.  However, the 
> test would fail while printing the debug messages if the "ifconfig" command is 
> not available in certain environments.
> This is not quite right. The test should not fail because of the debug message 
> printing. We should catch any exception thrown from the code that prints the 
> debug messages, and issue a warning message instead. 
> Suggest making this change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9469) DiskBalancer : Add Planner

2015-11-30 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9469:
---
Attachment: HDFS-9469-HDFS-1312.001.patch

Attaching patch for code review. I will submit the patch after HDFS-9449 is 
submitted. 

> DiskBalancer : Add Planner 
> ---
>
> Key: HDFS-9469
> URL: https://issues.apache.org/jira/browse/HDFS-9469
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9469-HDFS-1312.001.patch
>
>
> Disk Balancer reads the cluster data and then creates a plan for the data 
> moves based on the snapshot of the data read from the nodes. This plan is 
> later submitted to the datanodes for execution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6363) Improve concurrency while checking inclusion and exclusion of datanodes

2015-11-30 Thread Benoy Antony (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoy Antony updated HDFS-6363:
---
Attachment: HDFS-6363-003.patch

Attaching the patch after fixing checkstyle issues. 

> Improve concurrency while checking inclusion and exclusion of datanodes
> ---
>
> Key: HDFS-6363
> URL: https://issues.apache.org/jira/browse/HDFS-6363
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Benoy Antony
>Assignee: Benoy Antony
>  Labels: BB2015-05-TBR
> Attachments: HDFS-6363-002.patch, HDFS-6363-003.patch, HDFS-6363.patch
>
>
> HostFileManager holds two effectively immutable objects - includes and 
> excludes. These two objects can be safely published together using a volatile 
> container instead of synchronizing for all mutators and accessors.
> This improves the concurrency while using HostFileManager.
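
A minimal sketch of the pattern described above, assuming nothing about the actual HostFileManager internals beyond what the description says:

{code}
// Two effectively immutable sets published together through one volatile
// reference: readers take a single consistent snapshot without locking, and a
// refresh simply swaps in a new container.
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class HostSetsSketch {
  private static final class HostSets {
    final Set<String> includes;
    final Set<String> excludes;
    HostSets(Set<String> includes, Set<String> excludes) {
      this.includes = Collections.unmodifiableSet(new HashSet<String>(includes));
      this.excludes = Collections.unmodifiableSet(new HashSet<String>(excludes));
    }
  }

  private volatile HostSets current =
      new HostSets(new HashSet<String>(), new HashSet<String>());

  boolean isAllowed(String host) {
    HostSets snapshot = current;   // one read, one consistent pair
    return (snapshot.includes.isEmpty() || snapshot.includes.contains(host))
        && !snapshot.excludes.contains(host);
  }

  void refresh(Set<String> includes, Set<String> excludes) {
    current = new HostSets(includes, excludes);   // atomic publication
  }
}
{code}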



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9429) Tests in TestDFSAdminWithHA intermittently fail with EOFException

2015-11-30 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032355#comment-15032355
 ] 

Colin Patrick McCabe commented on HDFS-9429:


This looks good.  Just one comment, though: can we decrease the 100 ms polling 
timeout in {{MiniJournalCluster#waitActive}} to 50 ms?

> Tests in TestDFSAdminWithHA intermittently fail with EOFException
> -
>
> Key: HDFS-9429
> URL: https://issues.apache.org/jira/browse/HDFS-9429
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9429.001.patch, HDFS-9429.002.patch, 
> HDFS-9429.reproduce
>
>
> I have seen this fail a handful of times for {{testMetaSave}}, but from my 
> understanding this is from {{setUpHaCluster}}, so theoretically it could fail 
> for any case in the class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HDFS-9269) Need to update the documentation and wrapper for fuse-dfs

2015-11-30 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-9269 started by Wei-Chiu Chuang.
-
> Need to update the documentation and wrapper for fuse-dfs
> -
>
> Key: HDFS-9269
> URL: https://issues.apache.org/jira/browse/HDFS-9269
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Minor
>  Labels: supportability
> Attachments: HDFS-9269.001.patch, HDFS-9269.002.patch
>
>
> To reproduce the bug in HDFS-9268, I followed the wiki, the doc and read the 
> wrapper script of fuse-dfs, but found them super outdated (the wrapper was 
> last updated four years ago, and the hadoop project layout has dramatically 
> changed since then). I am creating this JIRA to track the status of the 
> update.
> There are quite a few external blogs/discussion threads floating around the 
> internet which talked about how to update the scripts, but no one took the 
> time to update them here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9269) Need to update the documentation and wrapper for fuse-dfs

2015-11-30 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-9269:
--
Attachment: HDFS-9269.002.patch

Rev02: Made the wrapper more self-contained. Still needs more testing (e.g. 
installing it on a pristine machine) to make sure it works out of the box.

> Need to update the documentation and wrapper for fuse-dfs
> -
>
> Key: HDFS-9269
> URL: https://issues.apache.org/jira/browse/HDFS-9269
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Minor
>  Labels: supportability
> Attachments: HDFS-9269.001.patch, HDFS-9269.002.patch
>
>
> To reproduce the bug in HDFS-9268, I followed the wiki, the doc and read the 
> wrapper script of fuse-dfs, but found them super outdated (the wrapper was 
> last updated four years ago, and the hadoop project layout has dramatically 
> changed since then). I am creating this JIRA to track the status of the 
> update.
> There are quite a few external blogs/discussion threads floating around the 
> internet which talked about how to update the scripts, but no one took the 
> time to update them here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9474) TestPipelinesFailover would fail if ifconfig is not available

2015-11-30 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032383#comment-15032383
 ] 

Yongjun Zhang commented on HDFS-9474:
-

Thanks for the new rev John. +1 on rev 002 pending jenkins.


> TestPipelinesFailover would fail if ifconfig is not available
> -
>
> Key: HDFS-9474
> URL: https://issues.apache.org/jira/browse/HDFS-9474
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Yongjun Zhang
>Assignee: John Zhuge
> Attachments: HDFS-9474.001.patch, HDFS-9474.002.patch
>
>
> HDFS-6693 introduced some debug messages to help diagnose why 
> TestPipelinesFailover fails. 
> HDFS-9438 restricted the debug messages to Linux/Mac/Solaris.  However, the 
> test would fail while printing the debug messages if the "ifconfig" command is 
> not available in certain environments.
> This is not quite right. The test should not fail because of the debug message 
> printing. We should catch any exception thrown from the code that prints the 
> debug messages, and issue a warning message instead. 
> Suggest making this change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9429) Tests in TestDFSAdminWithHA intermittently fail with EOFException

2015-11-30 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032411#comment-15032411
 ] 

Xiao Chen commented on HDFS-9429:
-

Thanks Colin for the comment! I'd love to make improvements, but could you 
explain your concern here? Is this to make {{waitActive}} finish sooner and 
reduce the overall wait time?

> Tests in TestDFSAdminWithHA intermittently fail with EOFException
> -
>
> Key: HDFS-9429
> URL: https://issues.apache.org/jira/browse/HDFS-9429
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9429.001.patch, HDFS-9429.002.patch, 
> HDFS-9429.reproduce
>
>
> I have seen this fail a handful of times for {{testMetaSave}}, but from my 
> understanding this is from {{setUpHaCluster}}, so theoretically it could fail 
> for any case in the class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9470) Encryption zone on root not loaded from fsimage after NN restart

2015-11-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032451#comment-15032451
 ] 

Hadoop QA commented on HDFS-9470:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 8s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
3s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 58s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 49s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 46s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_85. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 22s 
{color} | {color:red} Patch generated 58 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 180m 24s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes |
|   | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
|   | hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength |
|   | hadoop.hdfs.server.datanode.TestDataNodeMetrics |
|   | hadoop.hdfs.server.datanode.TestBlockScanner |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure120 |
| JDK v1.7.0_85 Failed junit tests | hadoop.hdfs.TestEncryptionZones |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure000 |
|   | hadoop.hdfs.TestDFSClientRetries |
|   | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  

[jira] [Updated] (HDFS-9371) Code cleanup for DatanodeManager

2015-11-30 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-9371:

Attachment: HDFS-9371.002.patch

> Code cleanup for DatanodeManager
> 
>
> Key: HDFS-9371
> URL: https://issues.apache.org/jira/browse/HDFS-9371
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-9371.000.patch, HDFS-9371.001.patch, 
> HDFS-9371.002.patch
>
>
> Some code cleanup for DatanodeManager. The main changes include:
> # make the synchronization of {{datanodeMap}} and 
> {{datanodesSoftwareVersions}} consistent
> # remove unnecessary lock in {{handleHeartbeat}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9267) TestDiskError should get stored replicas through FsDatasetTestUtils.

2015-11-30 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-9267:

Attachment: HDFS-9267.04.patch

It fixes the test failures for JDK 7. 
Uploading the patch to trigger a new Jenkins run.

> TestDiskError should get stored replicas through FsDatasetTestUtils.
> 
>
> Key: HDFS-9267
> URL: https://issues.apache.org/jira/browse/HDFS-9267
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.7.1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
> Attachments: HDFS-9267.00.patch, HDFS-9267.01.patch, 
> HDFS-9267.02.patch, HDFS-9267.03.patch, HDFS-9267.04.patch
>
>
> {{TestDiskError#testReplicationError}} scans local directories to verify 
> blocks and metadata files, which leaks the details of {{FsDataset}} 
> implementation. 
> This JIRA will abstract the "scanning" operation to {{FsDatasetTestUtils}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9381) When same block came for replication for Striped mode, we can move that block to PendingReplications

2015-11-30 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032977#comment-15032977
 ] 

Jing Zhao commented on HDFS-9381:
-

bq. I'm not fully following the above. Jing do you mind elaborating it a little 
bit?

Sorry for the confusion. As commented by Walter, "if DN_2 fails soon after 
DN_1, only neededReplications updated", i.e., the records in neededReplications 
will have enough time to be updated so that they can indicate the block groups 
are missing 2 internal blocks, before the reported issue happens.



> When same block came for replication for Striped mode, we can move that block 
> to PendingReplications
> 
>
> Key: HDFS-9381
> URL: https://issues.apache.org/jira/browse/HDFS-9381
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding, namenode
>Affects Versions: 3.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-9381-02.patch, HDFS-9381-03.patch, 
> HDFS-9381.00.patch, HDFS-9381.01.patch
>
>
> Currently I noticed that we are just returning null if block already exists 
> in pendingReplications in replication flow for striped blocks.
> {code}
> if (block.isStriped()) {
>   if (pendingNum > 0) {
> // Wait the previous recovery to finish.
> return null;
>   }
> {code}
>  Here if we just return null and if neededReplications contains only fewer 
> blocks(basically by default if less than numliveNodes*2), then same blocks 
> can be picked again from neededReplications from next loop as we are not 
> removing the element from neededReplications. Since this replication process needs 
> to take the fsnamesystem lock, we may spend some time unnecessarily in 
> every loop. 
> So my suggestion/improvement is:
>  Instead of just returning null, how about incrementing pendingReplications 
> for this block and remove from neededReplications? and also another point to 
> consider here is, to add into pendingReplications, generally we need target 
> and it is nothing but to which node we issued replication command. Later when 
> after replication success and DN reported it, block will be removed from 
> pendingReplications from NN addBlock. 
>  So since this is newly picked block from neededReplications, we would not 
> have selected target yet. So which target to be passed to pendingReplications 
> if we add this block? One Option I am thinking is, how about just passing 
> srcNode itself as target for this special condition? So, anyway if the block 
> is really missed, srcNode will not report it. So this block will not be 
> removed from pending replications, so that when it is timed out, it will be 
> considered for replication again and that time it will find actual target to 
> replicate while processing as part of regular replication flow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9470) Encryption zone on root not loaded from fsimage after NN restart

2015-11-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033014#comment-15033014
 ] 

Hudson commented on HDFS-9470:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #652 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/652/])
HDFS-9470. Encryption zone on root not loaded from fsimage after NN (wang: rev 
9b8e50b424d060e16c1175b1811e7abc476e2468)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatPBINode.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java


> Encryption zone on root not loaded from fsimage after NN restart
> 
>
> Key: HDFS-9470
> URL: https://issues.apache.org/jira/browse/HDFS-9470
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Fix For: 2.7.2, 2.6.3
>
> Attachments: HDFS-9470.001.patch, HDFS-9470.002.patch, 
> HDFS-9470.003.patch
>
>
> When restarting namenode, the encryption zone for {{rootDir}} is not loaded 
> correctly from fsimage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9477) namenode starts failed:FSEditLogLoader: Encountered exception on operation TimesOp

2015-11-30 Thread aplee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

aplee updated HDFS-9477:

Description: 
backup namenode failed to start; log below:
2015-11-28 14:09:13,462 ERROR 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
on operation TimesOp [length=0, path=/.reserved/.inodes/2346114, mtime=-1, 
atime=1448692924700, opCode=OP_TIMES, txid=14774180]
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:473)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:299)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:629)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:832)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:813)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
at 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
2015-11-28 14:09:13,572 FATAL 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error 
encountered while tailing edits. Shutting down standby NN.
java.io.IOException: java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:244)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:832)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:813)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
at 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:473)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:299)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:629)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234)
... 9 more
2015-11-28 14:09:13,574 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
status 1
2015-11-28 14:09:13,575 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
SHUTDOWN_MSG: 

I found the record in the edits log, but I don't know how this record was generated:


<RECORD>
  <OPCODE>OP_TIMES</OPCODE>
  <DATA>
    <TXID>14774180</TXID>
    <LENGTH>0</LENGTH>
    <PATH>/.reserved/.inodes/2346114</PATH>
    <MTIME>-1</MTIME>
    <ATIME>1448692924700</ATIME>
  </DATA>
</RECORD>

  was:
backup name start failed, log below:
2015-11-28 14:09:13,462 ERROR 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
on operation TimesOp [length=0, path=/.reserved/.inodes/2346114, mtime=-1, 
atime=1448692924700, opCode=OP_TIMES, txid=14774180]
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:473)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:299)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:629)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:832)
at 

[jira] [Commented] (HDFS-6363) Improve concurrency while checking inclusion and exclusion of datanodes

2015-11-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032603#comment-15032603
 ] 

Hadoop QA commented on HDFS-6363:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 50s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s 
{color} | {color:red} Patch generated 2 new checkstyle issues in 
hadoop-hdfs-project/hadoop-hdfs (total was 5, now 7). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 57s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 52s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 55m 25s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_85. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 20s 
{color} | {color:red} Patch generated 58 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 142m 13s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA |
|   | hadoop.hdfs.shortcircuit.TestShortCircuitCache |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure030 |
|   | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes |
|   | hadoop.hdfs.server.namenode.ha.TestHASafeMode |
| JDK v1.7.0_85 Failed junit tests | 
hadoop.hdfs.TestReadStripedFileWithDecoding |
|   | hadoop.hdfs.TestDFSUpgradeFromImage |
|   | 

[jira] [Updated] (HDFS-6363) Improve concurrency while checking inclusion and exclusion of datanodes

2015-11-30 Thread Benoy Antony (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoy Antony updated HDFS-6363:
---
Attachment: HDFS-6363-004.patch

> Improve concurrency while checking inclusion and exclusion of datanodes
> ---
>
> Key: HDFS-6363
> URL: https://issues.apache.org/jira/browse/HDFS-6363
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Benoy Antony
>Assignee: Benoy Antony
>  Labels: BB2015-05-TBR
> Attachments: HDFS-6363-002.patch, HDFS-6363-003.patch, 
> HDFS-6363-004.patch, HDFS-6363.patch
>
>
> HostFileManager holds two effectively immutable objects - includes and 
> excludes. These two objects can be safely published together using a volatile 
> container instead of synchronizing for all mutators and accessors.
> This improves the concurrency while using HostFileManager.
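
A minimal sketch of the volatile-container idea the description refers to (the 
class and field names below are illustrative only, not the actual 
{{HostFileManager}} code):

{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Sketch only: an immutable pair of host sets published through one volatile
// reference, so readers always see a consistent includes/excludes pair without
// taking a lock, and a refresh swaps both sets in a single write.
class HostFileManagerSketch {
  private static final class HostSets {
    final Set<String> includes;
    final Set<String> excludes;
    HostSets(Set<String> includes, Set<String> excludes) {
      this.includes = Collections.unmodifiableSet(new HashSet<String>(includes));
      this.excludes = Collections.unmodifiableSet(new HashSet<String>(excludes));
    }
  }

  private volatile HostSets current =
      new HostSets(new HashSet<String>(), new HashSet<String>());

  void refresh(Set<String> newIncludes, Set<String> newExcludes) {
    current = new HostSets(newIncludes, newExcludes);  // single volatile write
  }

  boolean isAllowed(String host) {
    HostSets snapshot = current;                       // single volatile read
    return (snapshot.includes.isEmpty() || snapshot.includes.contains(host))
        && !snapshot.excludes.contains(host);
  }
}
{code}

Readers never block, and a refresh replaces both sets atomically, which is the 
concurrency improvement the description is after.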



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9129) Move the safemode block count into BlockManager

2015-11-30 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032643#comment-15032643
 ] 

Jing Zhao commented on HDFS-9129:
-

The latest patch looks pretty good to me. The only minor comment is the 
following TODO:
{code}
  // TODO delete the following line?
  startSecretManagerIfNecessary();
{code}
I think we can remove this line since it is already called in 
{{BlockManagerSafeMode#leaveSafeMode}}.

Any other comments, [~daryn] and [~wheat9]?

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, 
> HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, 
> HDFS-9129.020.patch, HDFS-9129.021.patch, HDFS-9129.022.patch, 
> HDFS-9129.023.patch, HDFS-9129.024.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can moved to the 
> {{BlockManager}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7984) webhdfs:// needs to support provided delegation tokens

2015-11-30 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032689#comment-15032689
 ] 

Allen Wittenauer commented on HDFS-7984:


If I understand the code change correctly, I'm sort of surprised this doesn't 
work:

on user account1:

{code}
hdfs fetchdt /tmp/token
chmod a+r /tmp/token
{code}

on user account2:

{code}
hadoop fs -Dhadoop.token.file=/tmp/token -ls /user/account1
{code}

Both hdfs and webhdfs are failing this simple test.  
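
For reference, the programmatic equivalent of what the token file is supposed to 
provide looks roughly like this (a sketch using the generic security APIs, not 
the HDFS-7984 patch itself):

{code}
import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

// Sketch only: load a token file previously written by "hdfs fetchdt" and
// attach its tokens to the current user, which is what the token-file setting
// used above is expected to do for the client before it talks to webhdfs://.
public class TokenFileSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Credentials creds =
        Credentials.readTokenStorageFile(new File("/tmp/token"), conf);
    UserGroupInformation.getCurrentUser().addCredentials(creds);
    // FileSystem calls made as this UGI can now authenticate with the tokens.
  }
}
{code}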

> webhdfs:// needs to support provided delegation tokens
> --
>
> Key: HDFS-7984
> URL: https://issues.apache.org/jira/browse/HDFS-7984
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Assignee: HeeSoo Kim
>Priority: Blocker
> Attachments: HDFS-7984.001.patch, HDFS-7984.002.patch, 
> HDFS-7984.003.patch, HDFS-7984.004.patch, HDFS-7984.005.patch, 
> HDFS-7984.006.patch, HDFS-7984.007.patch, HDFS-7984.patch
>
>
> When using the webhdfs:// filesystem (especially from distcp), we need the 
> ability to inject a delegation token rather than webhdfs initialize its own.  
> This would allow for cross-authentication-zone file system accesses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6533) intermittent org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionalitytest failure

2015-11-30 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-6533:
--
Status: Patch Available  (was: Open)

> intermittent 
> org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionalitytest
>  failure 
> --
>
> Key: HDFS-6533
> URL: https://issues.apache.org/jira/browse/HDFS-6533
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs-client
>Affects Versions: 2.4.0
>Reporter: Yongjun Zhang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-6533.001.patch, HDFS-6533.002.patch
>
>
> Per https://builds.apache.org/job/Hadoop-Hdfs-trunk/1774/testReport, the 
> following test failed. However, local rerun is successful.
> {code}
> org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionality
> Error Message
> Wanted but not invoked:
> datanodeProtocolClientSideTranslatorPB.registerDatanode(
> 
> );
> -> at 
> org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionality(TestBPOfferService.java:175)
> Actually, there were zero interactions with this mock.
> Stacktrace
> org.mockito.exceptions.verification.WantedButNotInvoked: 
> Wanted but not invoked:
> datanodeProtocolClientSideTranslatorPB.registerDatanode(
> 
> );
> -> at 
> org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionality(TestBPOfferService.java:175)
> Actually, there were zero interactions with this mock.
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionality(TestBPOfferService.java:175)
> Standard Output
> 2014-06-14 12:42:08,723 INFO  datanode.DataNode 
> (SimulatedFSDataset.java:registerMBean(968)) - Registered FSDatasetState MBean
> 2014-06-14 12:42:08,730 INFO  datanode.DataNode 
> (BPServiceActor.java:run(805)) - Block pool  (Datanode Uuid 
> unassigned) service to 0.0.0.0/0.0.0.0:0 starting to offer service
> 2014-06-14 12:42:08,730 DEBUG datanode.DataNode 
> (BPServiceActor.java:retrieveNamespaceInfo(170)) - Block pool  
> (Datanode Uuid unassigned) service to 0.0.0.0/0.0.0.0:0 received 
> versionRequest response: lv=-57;cid=fake cluster;nsid=1;c=0;bpid=fake bpid
> 2014-06-14 12:42:08,731 INFO  datanode.DataNode 
> (BPServiceActor.java:register(765)) - Block pool fake bpid (Datanode Uuid 
> null) service to 0.0.0.0/0.0.0.0:0 beginning handshake with NN
> 2014-06-14 12:42:08,731 INFO  datanode.DataNode 
> (BPServiceActor.java:register(778)) - Block pool Block pool fake bpid 
> (Datanode Uuid null) service to 0.0.0.0/0.0.0.0:0 successfully registered 
> with NN
> 2014-06-14 12:42:08,732 INFO  datanode.DataNode 
> (BPServiceActor.java:offerService(637)) - For namenode 0.0.0.0/0.0.0.0:0 
> using DELETEREPORT_INTERVAL of 30 msec  BLOCKREPORT_INTERVAL of 
> 2160msec CACHEREPORT_INTERVAL of 1msec Initial delay: 0msec; 
> heartBeatInterval=3000
> 2014-06-14 12:42:08,732 DEBUG datanode.DataNode 
> (BPServiceActor.java:sendHeartBeat(562)) - Sending heartbeat with 1 storage 
> reports from service actor: Block pool fake bpid (Datanode Uuid null) service 
> to 0.0.0.0/0.0.0.0:0
> 2014-06-14 12:42:08,734 INFO  datanode.DataNode 
> (BPServiceActor.java:blockReport(498)) - Sent 1 blockreports 0 blocks total. 
> Took 1 msec to generate and 0 msecs for RPC and NN processing.  Got back 
> commands none
> 2014-06-14 12:42:08,738 INFO  datanode.DataNode 
> (BPServiceActor.java:run(805)) - Block pool fake bpid (Datanode Uuid null) 
> service to 0.0.0.0/0.0.0.0:1 starting to offer service
> 2014-06-14 12:42:08,739 DEBUG datanode.DataNode 
> (BPServiceActor.java:retrieveNamespaceInfo(170)) - Block pool fake bpid 
> (Datanode Uuid null) service to 0.0.0.0/0.0.0.0:1 received versionRequest 
> response: lv=-57;cid=fake cluster;nsid=1;c=0;bpid=fake bpid
> 2014-06-14 12:42:08,739 INFO  datanode.DataNode 
> (BPServiceActor.java:register(765)) - Block pool fake bpid (Datanode Uuid 
> null) service to 0.0.0.0/0.0.0.0:1 beginning handshake with NN
> 2014-06-14 12:42:08,740 INFO  datanode.DataNode 
> (BPServiceActor.java:register(778)) - Block pool Block pool fake bpid 
> (Datanode Uuid null) service to 0.0.0.0/0.0.0.0:1 successfully registered 
> with NN
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9487) libhdfs++ Enable builds with no compiler optimizations

2015-11-30 Thread James Clampffer (JIRA)
James Clampffer created HDFS-9487:
-

 Summary: libhdfs++ Enable builds with no compiler optimizations
 Key: HDFS-9487
 URL: https://issues.apache.org/jira/browse/HDFS-9487
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: James Clampffer
Assignee: James Clampffer


The default build configuration uses -O2 -g. To make debugging easier it would 
be really nice to be able to produce builds with -O0.

I haven't found an existing flag to pass to maven or cmake to accomplish this. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-11-30 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032740#comment-15032740
 ] 

Chris Trezzo commented on HDFS-8791:


I am finishing up the unit test and will post it later today.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Critical
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> keep the disks seeking for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9381) When same block came for replication for Striped mode, we can move that block to PendingReplications

2015-11-30 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032807#comment-15032807
 ] 

Zhe Zhang commented on HDFS-9381:
-

Thanks Jing for the comment.

Let's consider this case:
# Cluster has 100 nodes
# DN_1 and DN_2 failed
# They are on different racks
# They happen to share 1000 striped blocks and 1000 contiguous blocks (we can 
easily scale up the calculated numbers for n x 1000 blocks). So there are 2000 
striped internal blocks, and 2000 contiguous block replicas missing.

So in each iteration ReplicationMonitor tries to pick up 200 items. Without the 
change, that will be 100 striped and 100 contiguous on average. Assuming EC 
recovery work takes longer than 3 seconds ({{replicationRecheckInterval}}), the 
2nd iteration will pick up about 5 invalid striped items (still being 
recovered). If EC recovery work takes long enough, the 3rd round will pick up 
about 10 (2/18) invalid striped items, and the 4th round 18 invalid items. This 
way the replication work for the lost contiguous replicas will take roughly 
20 x 3 = 60 seconds to be distributed to DNs.

With the change, the 2nd round will pick up 95 striped items and 105 contiguous 
items, the 3rd round 110 contiguous items, and so on. It's tricky to get this 
very accurate, but it seems we can save a few 3-second cycles. 

[~umamaheswararao] Does the example itself make sense to you? If so, how should 
we calculate the saving in locking time?

bq. it is also possible that because of the longer processing time, there is 
higher chance for the striped blocks to be updated in the UC queue before being 
processed by the replication monitor for the first time
I'm not fully following the above. Jing do you mind elaborating it a little bit?


> When same block came for replication for Striped mode, we can move that block 
> to PendingReplications
> 
>
> Key: HDFS-9381
> URL: https://issues.apache.org/jira/browse/HDFS-9381
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding, namenode
>Affects Versions: 3.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-9381-02.patch, HDFS-9381-03.patch, 
> HDFS-9381.00.patch, HDFS-9381.01.patch
>
>
> Currently I noticed that we are just returning null if block already exists 
> in pendingReplications in replication flow for striped blocks.
> {code}
> if (block.isStriped()) {
>   if (pendingNum > 0) {
> // Wait the previous recovery to finish.
> return null;
>   }
> {code}
>  Here if we just return null and if neededReplications contains only fewer 
> blocks(basically by default if less than numliveNodes*2), then same blocks 
> can be picked again from neededReplications from next loop as we are not 
> removing the element from neededReplications. Since this replication process needs 
> to take the fsnamesystem lock, we may spend some time unnecessarily in 
> every loop. 
> So my suggestion/improvement is:
>  Instead of just returning null, how about incrementing pendingReplications 
> for this block and remove from neededReplications? and also another point to 
> consider here is, to add into pendingReplications, generally we need target 
> and it is nothing but to which node we issued replication command. Later when 
> after replication success and DN reported it, block will be removed from 
> pendingReplications from NN addBlock. 
>  So since this is newly picked block from neededReplications, we would not 
> have selected target yet. So which target to be passed to pendingReplications 
> if we add this block? One Option I am thinking is, how about just passing 
> srcNode itself as target for this special condition? So, anyway if the block 
> is really missed, srcNode will not report it. So this block will not be 
> removed from pending replications, so that when it is timed out, it will be 
> considered for replication again and that time it will find actual target to 
> replicate while processing as part of regular replication flow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8986) Add option to -du to calculate directory space usage excluding snapshots

2015-11-30 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032805#comment-15032805
 ] 

Chris Nauroth commented on HDFS-8986:
-

I don't think we can change the default behavior of the commands, at least not 
within 2.x, on grounds of backwards-compatibility.  It's possible that users 
already depend on inclusion of snapshot contents in the results of these 
commands.  Adding a new option for filtering out snapshot contents would be a 
backwards-compatible change though.

> Add option to -du to calculate directory space usage excluding snapshots
> 
>
> Key: HDFS-8986
> URL: https://issues.apache.org/jira/browse/HDFS-8986
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Gautam Gopalakrishnan
>Assignee: Jagadesh Kiran N
>
> When running {{hadoop fs -du}} on a snapshotted directory (or one of its 
> children), the report includes space consumed by blocks that are only present 
> in the snapshots. This is confusing for end users.
> {noformat}
> $  hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -createSnapshot /tmp/parent snap1
> Created snapshot /tmp/parent/.snapshot/snap1
> $ hadoop fs -rm -skipTrash /tmp/parent/sub1/*
> ...
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -deleteSnapshot /tmp/parent snap1
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 0  0  /tmp/parent
> 0  0  /tmp/parent/sub1
> {noformat}
> It would be helpful if we had a flag, say -X, to exclude any snapshot related 
> disk usage in the output



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9442) Move block replication logic from BlockManager to a new class ReplicationManager

2015-11-30 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9442:

Attachment: HDFS-9442.004.patch

Per offline discussion with [~wheat9] and [~jingzhao], the v4 patch makes 
{{ReplicationManager#removeBlockFromExcessReplicateMap()}} and 
{{ReplicationManager#isBlockExcessOnNode}} accept a {{BlockInfo}} object instead 
of a {{Block}}. This patch cherry-picks [HDFS-9485].

> Move block replication logic from BlockManager to a new class 
> ReplicationManager
> 
>
> Key: HDFS-9442
> URL: https://issues.apache.org/jira/browse/HDFS-9442
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9442.000.patch, HDFS-9442.001.patch, 
> HDFS-9442.002.patch, HDFS-9442.003.patch, HDFS-9442.004.patch
>
>
> Currently the {{BlockManager}} is managing all replication logic for over- , 
> under- and mis-replicated blocks. This jira proposes to move that code to a 
> new class named {{ReplicationManager}} for cleaner code logic, shorter source 
> files, and easier lock separating work in future.
> The {{ReplicationManager}} is a package local class, providing 
> {{BlockManager}} with methods that access its internal data structures of 
> replication queue. Meanwhile, the class maintains the lifecycle of 
> {{replicationThread}} and {{replicationQueuesInitializer}} daemon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9470) Encryption zone on root not loaded from fsimage after NN restart

2015-11-30 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032900#comment-15032900
 ] 

Akira AJISAKA commented on HDFS-9470:
-

Hi [~xiaochen], do we need to commit this patch to branch-2.8 as well?

> Encryption zone on root not loaded from fsimage after NN restart
> 
>
> Key: HDFS-9470
> URL: https://issues.apache.org/jira/browse/HDFS-9470
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Fix For: 2.7.2, 2.6.3
>
> Attachments: HDFS-9470.001.patch, HDFS-9470.002.patch, 
> HDFS-9470.003.patch
>
>
> When restarting namenode, the encryption zone for {{rootDir}} is not loaded 
> correctly from fsimage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9470) Encryption zone on root not loaded from fsimage after NN restart

2015-11-30 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032924#comment-15032924
 ] 

Andrew Wang commented on HDFS-9470:
---

Yea my bad on forgetting branch-2.8, I just committed it there too. Thanks 
[~ajisakaa] for the catch!

> Encryption zone on root not loaded from fsimage after NN restart
> 
>
> Key: HDFS-9470
> URL: https://issues.apache.org/jira/browse/HDFS-9470
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Fix For: 2.7.2, 2.6.3
>
> Attachments: HDFS-9470.001.patch, HDFS-9470.002.patch, 
> HDFS-9470.003.patch
>
>
> When restarting namenode, the encryption zone for {{rootDir}} is not loaded 
> correctly from fsimage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9381) When same block came for replication for Striped mode, we can move that block to PendingReplications

2015-11-30 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032965#comment-15032965
 ] 

Walter Su commented on HDFS-9381:
-

1. We need to lock on {{readyForReplications}}:
{code}
+  if (unscheduledPendingReplications.remove(block)) {
+    readyForReplications.add(block);
+  }
{code}

2. We could add some logging here:
{code}
      if (pendingNum > 0) {
        // Wait the previous recovery to finish.
+       pendingReplications.addToUnscheduledPendingReplication(block);
+       neededReplications.remove(block, priority);
        return null;
{code}

3. Here we need to check whether the block is already in 
{{readyForReplications}}. Otherwise the block may already appear in 
{{readyForReplications}} and get re-processed (a combined sketch of points 1 
and 3 follows the snippet below).
{code}
addToUnscheduledPendingReplication(BlockInfo block) {
{code}
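
Points 1 and 3 together could look roughly like the following (the method and 
field names mirror the patch snippets above, but the surrounding class layout is 
assumed for illustration, not taken from the patch):

{code}
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch only: guard both queues with the same lock so the move from
// "unscheduled" to "ready" cannot race with new additions (point 1), and skip
// blocks that are already queued as ready (point 3).
class PendingQueuesSketch<B> {
  private final Set<B> unscheduledPendingReplications = new HashSet<B>();
  private final Set<B> readyForReplications = new LinkedHashSet<B>();

  synchronized void addToUnscheduledPendingReplication(B block) {
    if (!readyForReplications.contains(block)) {
      unscheduledPendingReplications.add(block);
    }
  }

  synchronized void markReady(B block) {
    if (unscheduledPendingReplications.remove(block)) {
      readyForReplications.add(block);
    }
  }
}
{code}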

Speaking of the case [~zhz] mentioned above: assume DN_1 has 1M blocks in total, 
so it takes 5000 iterations to process them all, which is about 4 hours. If DN_2 
fails soon after DN_1, only {{neededReplications}} is updated. If DN_2 fails long 
after DN_1, the previous task has already finished, so we schedule a new task.

> When same block came for replication for Striped mode, we can move that block 
> to PendingReplications
> 
>
> Key: HDFS-9381
> URL: https://issues.apache.org/jira/browse/HDFS-9381
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding, namenode
>Affects Versions: 3.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-9381-02.patch, HDFS-9381-03.patch, 
> HDFS-9381.00.patch, HDFS-9381.01.patch
>
>
> Currently I noticed that we are just returning null if block already exists 
> in pendingReplications in replication flow for striped blocks.
> {code}
> if (block.isStriped()) {
>   if (pendingNum > 0) {
> // Wait the previous recovery to finish.
> return null;
>   }
> {code}
>  Here if we just return null and if neededReplications contains only fewer 
> blocks(basically by default if less than numliveNodes*2), then same blocks 
> can be picked again from neededReplications from next loop as we are not 
> removing the element from neededReplications. Since this replication process needs 
> to take the fsnamesystem lock, we may spend some time unnecessarily in 
> every loop. 
> So my suggestion/improvement is:
>  Instead of just returning null, how about incrementing pendingReplications 
> for this block and remove from neededReplications? and also another point to 
> consider here is, to add into pendingReplications, generally we need target 
> and it is nothing but to which node we issued replication command. Later when 
> after replication success and DN reported it, block will be removed from 
> pendingReplications from NN addBlock. 
>  So since this is newly picked block from neededReplications, we would not 
> have selected target yet. So which target to be passed to pendingReplications 
> if we add this block? One Option I am thinking is, how about just passing 
> srcNode itself as target for this special condition? So, anyway if the block 
> is really missed, srcNode will not report it. So this block will not be 
> removed from pending replications, so that when it is timed out, it will be 
> considered for replication again and that time it will find actual target to 
> replicate while processing as part of regular replication flow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9477) namenode starts failed:FSEditLogLoader: Encountered exception on operation TimesOp

2015-11-30 Thread aplee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033010#comment-15033010
 ] 

aplee commented on HDFS-9477:
-

Thanks for your reply.
Which file do you mean by "opening the file"? /.reserved/.inodes/2346114?
I mounted HDFS on Linux using the NFS gateway and then shared it with Samba, so 
we can open files in HDFS from Windows without downloading them.
About thirty OP_TIMES records for /.reserved/.inodes/ paths occur in the edits 
over roughly a week after that; I think that may be related.
I know little about the file-descriptor-ish feature and will read up on it.


> namenode starts failed:FSEditLogLoader: Encountered exception on operation 
> TimesOp
> --
>
> Key: HDFS-9477
> URL: https://issues.apache.org/jira/browse/HDFS-9477
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
> Environment: Ubuntu 12.04.1 LTS, java version "1.7.0_79"
>Reporter: aplee
>Assignee: aplee
>
> backup name start failed, log below:
> 2015-11-28 14:09:13,462 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
> on operation TimesOp [length=0, path=/.reserved/.inodes/2346114, mtime=-1, 
> atime=1448692924700, opCode=OP_TIMES, txid=14774180]
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:473)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:299)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:629)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:832)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:813)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
> 2015-11-28 14:09:13,572 FATAL 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error 
> encountered while tailing edits. Shutting down standby NN.
> java.io.IOException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:244)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:832)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:813)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:473)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:299)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:629)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234)
>   ... 9 more
> 2015-11-28 14:09:13,574 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 1
> 2015-11-28 14:09:13,575 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> SHUTDOWN_MSG: 
> I found record in Edits, but I don't know how this record generated
> 

[jira] [Commented] (HDFS-9336) deleteSnapshot throws NPE when snapshotname is null

2015-11-30 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033009#comment-15033009
 ] 

Akira AJISAKA commented on HDFS-9336:
-

+1. I ran all the failed tests on JDK7 and they passed locally.

> deleteSnapshot throws NPE when snapshotname is null
> ---
>
> Key: HDFS-9336
> URL: https://issues.apache.org/jira/browse/HDFS-9336
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9336-002.patch, HDFS-9336-003.patch, 
> HDFS-9336-004.patch, HDFS-9336.patch
>
>
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$DeleteSnapshotRequestProto$Builder.setSnapshotName(ClientNamenodeProtocolProtos.java:17509)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.deleteSnapshot(ClientNamenodeProtocolTranslatorPB.java:1005)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:255)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>   at com.sun.proxy.$Proxy15.deleteSnapshot(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.deleteSnapshot(DFSClient.java:2106)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$37.doCall(DistributedFileSystem.java:1660)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$37.doCall(DistributedFileSystem.java:1)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.deleteSnapshot(DistributedFileSystem.java:1677)
>   at 
> org.apache.hadoop.hdfs.web.TestWebHDFS.testWebHdfsAllowandDisallowSnapshots(TestWebHDFS.java:380)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
>   at 
> org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9038) Reserved space is erroneously counted towards non-DFS used.

2015-11-30 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032865#comment-15032865
 ] 

Chris Nauroth commented on HDFS-9038:
-

Thanks for the further reviews.  I'm catching up on patch v005 now.

# The latest {{getNonDfsUsed}} switched to using {{File#getFreeSpace}}.  
However, the {{getAvailable}} calculation uses {{File#getUsableSpace}} via 
{{DF#getAvailable}}.  The non-DFS used calculation prior to the HDFS-5215 patch 
also would have been using {{File#getUsableSpace}}.  I think we should stick 
with {{File#getUsableSpace}} here (or {{DF#getAvailable}} for symmetry with the 
pre-HDFS-5215 code).
# The latest {{getNonDfsUsed}} does not include {{reservedForReplicas}}.  I 
think it should, since the {{reservedForReplicas}} amount is effectively in use 
by HDFS.
# I think we should cap the returned value to 0 as a matter of defensive coding 
against negative values.  There could be race conditions in between pulling the 
individual data items, resulting in an unexpected negative total. A sketch 
combining these three points follows below.
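
Put together, one reading of these three points is something like the following 
(a sketch only; the {{capacity}}, {{dfsUsed}} and {{reservedForReplicas}} inputs 
are passed in purely for illustration and are not the actual patch code):

{code}
import java.io.File;

class NonDfsUsedSketch {
  // Sketch only: use usable space (matching DF#getAvailable), count the
  // reservedForReplicas amount as space in use by HDFS, and cap the result at
  // 0 to guard against races between the individual reads.
  static long getNonDfsUsed(File volumeRoot, long capacity, long dfsUsed,
                            long reservedForReplicas) {
    long usable = volumeRoot.getUsableSpace();                           // point 1
    long nonDfsUsed = capacity - dfsUsed - reservedForReplicas - usable; // point 2
    return Math.max(nonDfsUsed, 0L);                                     // point 3
  }
}
{code}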

> Reserved space is erroneously counted towards non-DFS used.
> ---
>
> Key: HDFS-9038
> URL: https://issues.apache.org/jira/browse/HDFS-9038
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Chris Nauroth
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9038-002.patch, HDFS-9038-003.patch, 
> HDFS-9038-004.patch, HDFS-9038-005.patch, HDFS-9038.patch
>
>
> HDFS-5215 changed the DataNode volume available space calculation to consider 
> the reserved space held by the {{dfs.datanode.du.reserved}} configuration 
> property.  As a side effect, reserved space is now counted towards non-DFS 
> used.  I don't believe it was intentional to change the definition of non-DFS 
> used.  This issue proposes restoring the prior behavior: do not count 
> reserved space towards non-DFS used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9365) Balaner does not work with the HDFS-6376 HA setup

2015-11-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032982#comment-15032982
 ] 

Hadoop QA commented on HDFS-9365:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 58s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 56m 13s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 52m 43s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_85. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 19s 
{color} | {color:red} Patch generated 58 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 137m 42s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.TestDFSClientRetries |
|   | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes |
|   | hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork |
|   | hadoop.hdfs.server.datanode.TestBlockScanner |
| JDK v1.7.0_85 Failed junit tests | 
hadoop.hdfs.TestDFSStripedOutputStreamWithFailure040 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12773635/h9365_20151120.patch |
| JIRA Issue | HDFS-9365 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 9e495f2e2ec9 

[jira] [Updated] (HDFS-6533) intermittent org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionalitytest failure

2015-11-30 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-6533:
--
Attachment: HDFS-6533.003.patch

Thanks [~arpitagarwal] for the comments. Yes that looks to be a better idea.

I am attaching rev03 that follows your suggestion. 

> intermittent 
> org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionalitytest
>  failure 
> --
>
> Key: HDFS-6533
> URL: https://issues.apache.org/jira/browse/HDFS-6533
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs-client
>Affects Versions: 2.4.0
>Reporter: Yongjun Zhang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-6533.001.patch, HDFS-6533.002.patch, 
> HDFS-6533.003.patch
>
>
> Per https://builds.apache.org/job/Hadoop-Hdfs-trunk/1774/testReport, the 
> following test failed. However, a local rerun was successful.
> {code}
> org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionality
> Error Message
> Wanted but not invoked:
> datanodeProtocolClientSideTranslatorPB.registerDatanode(
> 
> );
> -> at 
> org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionality(TestBPOfferService.java:175)
> Actually, there were zero interactions with this mock.
> Stacktrace
> org.mockito.exceptions.verification.WantedButNotInvoked: 
> Wanted but not invoked:
> datanodeProtocolClientSideTranslatorPB.registerDatanode(
> 
> );
> -> at 
> org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionality(TestBPOfferService.java:175)
> Actually, there were zero interactions with this mock.
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionality(TestBPOfferService.java:175)
> Standard Output
> 2014-06-14 12:42:08,723 INFO  datanode.DataNode 
> (SimulatedFSDataset.java:registerMBean(968)) - Registered FSDatasetState MBean
> 2014-06-14 12:42:08,730 INFO  datanode.DataNode 
> (BPServiceActor.java:run(805)) - Block pool  (Datanode Uuid 
> unassigned) service to 0.0.0.0/0.0.0.0:0 starting to offer service
> 2014-06-14 12:42:08,730 DEBUG datanode.DataNode 
> (BPServiceActor.java:retrieveNamespaceInfo(170)) - Block pool  
> (Datanode Uuid unassigned) service to 0.0.0.0/0.0.0.0:0 received 
> versionRequest response: lv=-57;cid=fake cluster;nsid=1;c=0;bpid=fake bpid
> 2014-06-14 12:42:08,731 INFO  datanode.DataNode 
> (BPServiceActor.java:register(765)) - Block pool fake bpid (Datanode Uuid 
> null) service to 0.0.0.0/0.0.0.0:0 beginning handshake with NN
> 2014-06-14 12:42:08,731 INFO  datanode.DataNode 
> (BPServiceActor.java:register(778)) - Block pool Block pool fake bpid 
> (Datanode Uuid null) service to 0.0.0.0/0.0.0.0:0 successfully registered 
> with NN
> 2014-06-14 12:42:08,732 INFO  datanode.DataNode 
> (BPServiceActor.java:offerService(637)) - For namenode 0.0.0.0/0.0.0.0:0 
> using DELETEREPORT_INTERVAL of 30 msec  BLOCKREPORT_INTERVAL of 
> 2160msec CACHEREPORT_INTERVAL of 1msec Initial delay: 0msec; 
> heartBeatInterval=3000
> 2014-06-14 12:42:08,732 DEBUG datanode.DataNode 
> (BPServiceActor.java:sendHeartBeat(562)) - Sending heartbeat with 1 storage 
> reports from service actor: Block pool fake bpid (Datanode Uuid null) service 
> to 0.0.0.0/0.0.0.0:0
> 2014-06-14 12:42:08,734 INFO  datanode.DataNode 
> (BPServiceActor.java:blockReport(498)) - Sent 1 blockreports 0 blocks total. 
> Took 1 msec to generate and 0 msecs for RPC and NN processing.  Got back 
> commands none
> 2014-06-14 12:42:08,738 INFO  datanode.DataNode 
> (BPServiceActor.java:run(805)) - Block pool fake bpid (Datanode Uuid null) 
> service to 0.0.0.0/0.0.0.0:1 starting to offer service
> 2014-06-14 12:42:08,739 DEBUG datanode.DataNode 
> (BPServiceActor.java:retrieveNamespaceInfo(170)) - Block pool fake bpid 
> (Datanode Uuid null) service to 0.0.0.0/0.0.0.0:1 received versionRequest 
> response: lv=-57;cid=fake cluster;nsid=1;c=0;bpid=fake bpid
> 2014-06-14 12:42:08,739 INFO  datanode.DataNode 
> (BPServiceActor.java:register(765)) - Block pool fake bpid (Datanode Uuid 
> null) service to 0.0.0.0/0.0.0.0:1 beginning handshake with NN
> 2014-06-14 12:42:08,740 INFO  datanode.DataNode 
> (BPServiceActor.java:register(778)) - Block pool Block pool fake bpid 
> (Datanode Uuid null) service to 0.0.0.0/0.0.0.0:1 successfully registered 
> with NN
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9449) DiskBalancer : Add connectors

2015-11-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032997#comment-15032997
 ] 

Hadoop QA commented on HDFS-9449:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
47s {color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s 
{color} | {color:green} HDFS-1312 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s 
{color} | {color:green} HDFS-1312 passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
26s {color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s 
{color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
25s {color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
57s {color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 30s 
{color} | {color:green} HDFS-1312 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 15s 
{color} | {color:green} HDFS-1312 passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
6s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 58s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 29s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 18s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 53s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 50s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_85. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 33s 
{color} | {color:red} Patch generated 57 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 180m 59s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure180 |
|   | hadoop.hdfs.TestEncryptionZones |
|   | hadoop.hdfs.server.datanode.TestBlockScanner |
|   | hadoop.security.TestPermission |
|   | hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits |
| JDK v1.7.0_85 Failed junit tests | 
hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
|   | hadoop.hdfs.server.balancer.TestBalancer |
|   | hadoop.hdfs.server.namenode.TestCacheDirectives |
|   | hadoop.hdfs.server.datanode.TestBlockScanner |
|   | hadoop.security.TestPermission |
\\
\\
|| Subsystem 

[jira] [Commented] (HDFS-9429) Tests in TestDFSAdminWithHA intermittently fail with EOFException

2015-11-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032998#comment-15032998
 ] 

Hadoop QA commented on HDFS-9429:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 11 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 53s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 56s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 48s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 53m 55s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_85. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 19s 
{color} | {color:red} Patch generated 58 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 140m 37s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.hdfs.server.namenode.snapshot.TestUpdatePipelineWithSnapshots |
| JDK v1.7.0_85 Failed junit tests | 
hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12774916/HDFS-9429.003.patch |
| JIRA Issue | HDFS-9429 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux ef2f02a59f7c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Updated] (HDFS-9336) deleteSnapshot throws NPE when snapshotname is null

2015-11-30 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-9336:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Committed this to trunk, branch-2, and branch-2.8. Thanks [~brahmareddy] for 
the contribution!

> deleteSnapshot throws NPE when snapshotname is null
> ---
>
> Key: HDFS-9336
> URL: https://issues.apache.org/jira/browse/HDFS-9336
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: HDFS-9336-002.patch, HDFS-9336-003.patch, 
> HDFS-9336-004.patch, HDFS-9336.patch
>
>
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$DeleteSnapshotRequestProto$Builder.setSnapshotName(ClientNamenodeProtocolProtos.java:17509)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.deleteSnapshot(ClientNamenodeProtocolTranslatorPB.java:1005)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:255)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>   at com.sun.proxy.$Proxy15.deleteSnapshot(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.deleteSnapshot(DFSClient.java:2106)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$37.doCall(DistributedFileSystem.java:1660)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$37.doCall(DistributedFileSystem.java:1)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.deleteSnapshot(DistributedFileSystem.java:1677)
>   at 
> org.apache.hadoop.hdfs.web.TestWebHDFS.testWebHdfsAllowandDisallowSnapshots(TestWebHDFS.java:380)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
>   at 
> org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9474) TestPipelinesFailover would fail if ifconfig is not available

2015-11-30 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated HDFS-9474:
-
Status: Patch Available  (was: In Progress)

> TestPipelinesFailover would fail if ifconfig is not available
> -
>
> Key: HDFS-9474
> URL: https://issues.apache.org/jira/browse/HDFS-9474
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Yongjun Zhang
>Assignee: John Zhuge
> Attachments: HDFS-9474.001.patch, HDFS-9474.002.patch
>
>
> HDFS-6693 introduced some debug messages to help diagnose why 
> TestPipelinesFailover fails. 
> HDFS-9438 restricted the debug messages to Linux/Mac/Solaris.  However, the 
> test would fail while printing the debug messages if the "ifconfig" command is 
> not available in certain environments.
> This is not quite right. The test should not fail because of debug message 
> printing. We should catch any exception thrown from the code that prints the 
> debug messages and issue a warning instead. 
> Suggest making this change.
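As a rough illustration of the suggested fix (not the actual HDFS-9474 patch), the debug helper could swallow failures and downgrade them to a warning. The class and method names below are made up for the sketch; only {{Shell.execCommand}} and commons-logging are assumed to be available.

{code}
// Illustrative sketch only: best-effort network diagnostics that log a
// warning instead of failing the caller when "ifconfig" is unavailable.
import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.util.Shell;

public class NetworkDiagnostics {
  private static final Log LOG = LogFactory.getLog(NetworkDiagnostics.class);

  public static void dump() {
    try {
      String output = Shell.execCommand("ifconfig", "-a");
      LOG.info("ifconfig output:\n" + output);
    } catch (IOException | RuntimeException e) {
      // The command may be missing in some environments; do not rethrow.
      LOG.warn("Unable to collect network diagnostics", e);
    }
  }
}
{code}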



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9485) Make BlockManager#removeFromExcessReplicateMap accept BlockInfo instead of Block

2015-11-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032950#comment-15032950
 ] 

Hadoop QA commented on HDFS-9485:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 38s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
8s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 11s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 11s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 35s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 19s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 84m 18s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 20s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_85. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 25s 
{color} | {color:red} Patch generated 58 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 197m 3s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.hdfs.TestDFSStripedOutputStreamWithFailure110 |
|   | hadoop.hdfs.TestFileCreationDelete |
|   | hadoop.hdfs.shortcircuit.TestShortCircuitCache |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes |
|   | hadoop.hdfs.qjournal.TestSecureNNWithQJM |
|   | hadoop.hdfs.server.namenode.TestFileTruncate |
|   | hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation |
|   | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
|   | 

[jira] [Commented] (HDFS-6533) intermittent org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionalitytest failure

2015-11-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032969#comment-15032969
 ] 

Hadoop QA commented on HDFS-6533:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 10s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 39s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
6s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 11s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 11s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 59s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 23s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-hdfs-project/hadoop-hdfs (total was 23, now 24). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 16s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 55s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs introduced 1 new FindBugs 
issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 29s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 87m 1s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 56s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_85. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 25s 
{color} | {color:red} Patch generated 58 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 199m 40s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  Increment of volatile field 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.registeredActors in 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.registrationSucceeded(BPServiceActor,
 DatanodeRegistration)  At BPOfferService.java:in 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.registrationSucceeded(BPServiceActor,
 DatanodeRegistration)  At BPOfferService.java:[line 371] |
| JDK v1.8.0_66 Failed junit tests | 
hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | 

[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-11-30 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.025.patch

Thanks [~jingzhao] for the review. I revisited the TODO comment and I also 
think we can remove it safely. Nice catch. The v25 patch addresses this.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, 
> HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, 
> HDFS-9129.020.patch, HDFS-9129.021.patch, HDFS-9129.022.patch, 
> HDFS-9129.023.patch, HDFS-9129.024.patch, HDFS-9129.025.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can moved to the 
> {{BlockManager}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9449) DiskBalancer : Add connectors

2015-11-30 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033001#comment-15033001
 ] 

Anu Engineer commented on HDFS-9449:


None of the test failures are related to this patch.

> DiskBalancer : Add connectors
> -
>
> Key: HDFS-9449
> URL: https://issues.apache.org/jira/browse/HDFS-9449
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9449-HDFS-1312.001.patch, 
> HDFS-9449-HDFS-1312.002.patch, HDFS-9449-HDFS-1312.003.patch
>
>
> Connectors allow disk balancer data models to connect either to an existing 
> cluster's Namenode or to a JSON file that describes the cluster. This is used 
> for discovering the physical layout of the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9449) DiskBalancer : Add connectors

2015-11-30 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032650#comment-15032650
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9449:
---

- DBNameNodeConnector.connector should be final.  Then, it will never be null, 
so we don't need to check it for null.
- DBNameNodeConnector.clusterURI is not used.  Should we remove it?
- If two DiskBalancers are running at the same time, would they somehow 
balance the same datanode?
{code}
// we don't care how many instances of disk balancers run.
// The admission is controlled at the data node, where we will
// execute only one plan at a given time.
NameNodeConnector.setWrite2IdFile(false);
{code}
- The code with
{code}
Preconditions.checkArgument(x != null);
{code}
should be replaced by
{code}
Preconditions.checkNotNull(x);
{code}
- Only the path in JsonNodeConnector.clusterURI is used.  Should clusterURI be 
replaced by something like clusterFilePath?
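To make the first and fourth points concrete, here is a minimal sketch; the field type and surrounding code are placeholders, not the actual DBNameNodeConnector. Declaring the field final and using checkNotNull removes the need for separate null checks.

{code}
import com.google.common.base.Preconditions;

// Sketch only: the field type is a stand-in for whatever DBNameNodeConnector
// actually holds; the final/checkNotNull pattern is the point here.
class DBNameNodeConnectorSketch {
  // final: assigned exactly once, so it can never be null afterwards.
  private final Object connector;

  DBNameNodeConnectorSketch(Object connector) {
    // Preferred over Preconditions.checkArgument(connector != null):
    // checkNotNull throws a NullPointerException with the given message
    // and returns the checked reference.
    this.connector = Preconditions.checkNotNull(connector, "connector is null");
  }
}
{code}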




> DiskBalancer : Add connectors
> -
>
> Key: HDFS-9449
> URL: https://issues.apache.org/jira/browse/HDFS-9449
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9449-HDFS-1312.001.patch, 
> HDFS-9449-HDFS-1312.002.patch
>
>
> Connectors allow disk balancer data models to connect either to an existing 
> cluster's Namenode or to a JSON file that describes the cluster. This is used 
> for discovering the physical layout of the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8957) Consolidate client striping input stream codes for stateful read and positional read

2015-11-30 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-8957:

Fix Version/s: (was: HDFS-7285)

> Consolidate client striping input stream codes for stateful read and 
> positional read
> 
>
> Key: HDFS-8957
> URL: https://issues.apache.org/jira/browse/HDFS-8957
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Attachments: HDFS-8957-v1.patch
>
>
> Currently we have different implementations for client striping read: 
> *StatefulStripeReader* and *PositionStripeReader*. I attempted to consolidate 
> the two implementations into one, which results in much simpler code and 
> better performance. Now in both read paths, it will:
> * Use pooled ByteBuffers, as stateful read currently does;
> * Read directly into the application's buffer, as positional read currently does;
> * Try to align and merge multiple stripes, as positional read currently does;
> * Use the *ECChunk* version of the decode API.
> The resultant *StripeReader* is now very close to the ideal state desired for 
> the next step, employing the *ErasureCoder* API instead of the 
> *RawErasureCoder* API.
> Will upload an initial patch to illustrate the rough change, even though it 
> depends on other issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9484) NNThroughputBenchmark$BlockReportStats should not send empty block reports

2015-11-30 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032704#comment-15032704
 ] 

Mingliang Liu commented on HDFS-9484:
-

Thanks [~cmccabe] for your confirmation. I updated the jira description to add 
another potential bug that makes the {{BlockReportStats}} send empty block 
reports. Would you help me check that out as well?

> NNThroughputBenchmark$BlockReportStats should not send empty block reports
> --
>
> Key: HDFS-9484
> URL: https://issues.apache.org/jira/browse/HDFS-9484
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>
> There are two potential bugs that make the 
> {{NNThroughputBenchmark$BlockReportStats}} send empty block reports.
> # In {{NNThroughputBenchmark$BlockReportStats#formBlockReport()}}, the 
> {{blockReportList}} is always {{BlockListAsLongs.EMPTY}}. We should construct 
> the block report list by encoding generated {{blocks}} in test.
> # {{TinyDatanode#blocks}} is an empty ArrayList with initial capacity. In 
> {{TinyDatanode#addBlock()}} first statement, the {{if(nrBlocks == 
> blocks.size()) {}} will always be true. We should either fill the blocks with 
> dummy report in {{TinyDatanode()}} constructor, or use initial capacity 
> instead of {{blocks.size()}} in the above _if_ statement (we should replace 
> {{ArrayList#set}} with {{ArrayList#add}} as well).
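A small self-contained illustration of the second point (plain JDK, nothing HDFS-specific is assumed): an ArrayList created with an initial capacity still has size 0, so a guard comparing a counter against {{blocks.size()}} is true immediately, and {{set()}} would fail where {{add()}} is needed.

{code}
import java.util.ArrayList;
import java.util.List;

public class CapacityVsSize {
  public static void main(String[] args) {
    // The capacity argument only pre-sizes the backing array; no elements are added.
    List<Long> blocks = new ArrayList<>(100);
    int nrBlocks = 0;

    System.out.println(blocks.size());             // prints 0, not 100
    System.out.println(nrBlocks == blocks.size()); // true on the very first check

    // blocks.set(0, 1L);  // would throw IndexOutOfBoundsException here
    blocks.add(1L);         // add() is what actually grows the list
  }
}
{code}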



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8957) Consolidate client striping input stream codes for stateful read and positional read

2015-11-30 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-8957:

Component/s: erasure-coding

> Consolidate client striping input stream codes for stateful read and 
> positional read
> 
>
> Key: HDFS-8957
> URL: https://issues.apache.org/jira/browse/HDFS-8957
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Attachments: HDFS-8957-v1.patch
>
>
> Currently we have different implementations for client striping read: 
> *StatefulStripeReader* and *PositionStripeReader*. I attempted to consolidate 
> the two implementations into one, which results in much simpler code and 
> better performance. Now in both read paths, it will:
> * Use pooled ByteBuffers, as stateful read currently does;
> * Read directly into the application's buffer, as positional read currently does;
> * Try to align and merge multiple stripes, as positional read currently does;
> * Use the *ECChunk* version of the decode API.
> The resultant *StripeReader* is now very close to the ideal state desired for 
> the next step, employing the *ErasureCoder* API instead of the 
> *RawErasureCoder* API.
> Will upload an initial patch to illustrate the rough change, even though it 
> depends on other issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9470) Encryption zone on root not loaded from fsimage after NN restart

2015-11-30 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032707#comment-15032707
 ] 

Xiao Chen commented on HDFS-9470:
-

Thanks [~andrew.wang] for committing and resolving cherry-pick conflicts. I 
reviewed the commits to 2.6 and 2.7 branches, LGTM +1.
Also thanks to everyone for the review and comments.

> Encryption zone on root not loaded from fsimage after NN restart
> 
>
> Key: HDFS-9470
> URL: https://issues.apache.org/jira/browse/HDFS-9470
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Fix For: 2.7.2, 2.6.3
>
> Attachments: HDFS-9470.001.patch, HDFS-9470.002.patch, 
> HDFS-9470.003.patch
>
>
> When restarting namenode, the encryption zone for {{rootDir}} is not loaded 
> correctly from fsimage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9381) When same block came for replication for Striped mode, we can move that block to PendingReplications

2015-11-30 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032723#comment-15032723
 ] 

Jing Zhao commented on HDFS-9381:
-

Thanks for the discussion, Uma and Zhe!

bq. To determine whether the optimization justifies the added complexity, I 
think we can create a more concrete example.

yeah, I agree maybe a more concrete example and some perf numbers will help us 
understand the optimization better.

bq. I think besides reducing locking contention, this change also speeds up the 
recovery of non-striping blocks. E.g., when a rack fails, there could be a lot 
of striped block recovery work waiting. They could block regular recovery tasks.

When we have a lot of missing blocks/replicas (e.g., caused by DataNode 
failures or even a rack failure), since in each iteration the replication monitor 
only handles a limited number of blocks, some iterations may be wasted checking 
this type of striped block. However, it is also possible that, because of the 
longer processing time, there is a higher chance for the striped blocks to 
be updated in the UC queue before being processed by the replication monitor 
for the first time. Also, since striped blocks are more likely to be spread 
across multiple racks, a single rack failure may only cause a single missing 
internal block in a striped block group. So it feels like the scenarios are more 
complicated here.

> When same block came for replication for Striped mode, we can move that block 
> to PendingReplications
> 
>
> Key: HDFS-9381
> URL: https://issues.apache.org/jira/browse/HDFS-9381
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding, namenode
>Affects Versions: 3.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-9381-02.patch, HDFS-9381-03.patch, 
> HDFS-9381.00.patch, HDFS-9381.01.patch
>
>
> Currently I noticed that we are just returning null if the block already exists 
> in pendingReplications in the replication flow for striped blocks.
> {code}
> if (block.isStriped()) {
>   if (pendingNum > 0) {
> // Wait the previous recovery to finish.
> return null;
>   }
> {code}
>  Here, if we just return null and neededReplications contains only a few 
> blocks (basically, by default, fewer than numliveNodes*2), then the same blocks 
> can be picked again from neededReplications in the next loop, since we are not 
> removing the element from neededReplications. Since this replication processing 
> needs to take the FSNamesystem lock, we may spend some time unnecessarily in 
> every loop. 
> So my suggestion/improvement is:
>  Instead of just returning null, how about incrementing pendingReplications 
> for this block and removing it from neededReplications? Another point to 
> consider here is that, to add into pendingReplications, we generally need a 
> target, which is the node to which we issued the replication command. Later, 
> after the replication succeeds and the DN reports it, the block will be removed 
> from pendingReplications in the NN's addBlock. 
>  Since this is a newly picked block from neededReplications, we would not yet 
> have selected a target. So which target should be passed to pendingReplications 
> if we add this block? One option I am thinking of is to just pass srcNode itself 
> as the target for this special condition. If the block is really missing, srcNode 
> will not report it, so this block will not be removed from pendingReplications; 
> when it times out, it will be considered for replication again, and at that time 
> it will find an actual target while being processed as part of the regular 
> replication flow.
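Purely as an illustration of the proposal above (the queue types and method names below are stand-ins, not the real UnderReplicatedBlocks/PendingReplicationBlocks internals), the idea is to move the block from the needed queue into the pending queue, charged to srcNode, instead of returning null:

{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Stand-in types only; not the actual BlockManager replication queues.
class ReplicationQueuesSketch {
  private final Set<Long> neededReplications = new HashSet<>();
  // blockId -> node the pending recovery is "charged" to (srcNode in the proposal)
  private final Map<Long, String> pendingReplications = new HashMap<>();

  /** Called when a striped block is picked but a recovery is already pending. */
  void deferStripedBlock(long blockId, String srcNode) {
    // Instead of returning null and re-examining the block every iteration,
    // park it in pending (with srcNode as the nominal target) and drop it
    // from the needed queue.
    pendingReplications.put(blockId, srcNode);
    neededReplications.remove(blockId);
    // If the block is genuinely missing, srcNode never reports it, the pending
    // entry times out, and the block is put back into neededReplications.
  }
}
{code}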



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9487) libhdfs++ Enable builds with no compiler optimizations

2015-11-30 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032738#comment-15032738
 ] 

Bob Hansen commented on HDFS-9487:
--

http://unix.stackexchange.com/questions/187455/how-to-compile-without-optimizations-o0-using-cmake
 describes a pattern to follow that might help.

> libhdfs++ Enable builds with no compiler optimizations
> --
>
> Key: HDFS-9487
> URL: https://issues.apache.org/jira/browse/HDFS-9487
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
>
> The default build configuration uses -O2 -g.  To make debugging easier it 
> would be really nice to be able to produce builds with -O0.
> I haven't found an existing flag to pass to maven or cmake to accomplish 
> this. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8647) Abstract BlockManager's rack policy into BlockPlacementPolicy

2015-11-30 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032747#comment-15032747
 ] 

Zhe Zhang commented on HDFS-8647:
-

[~brahmareddy] [~mingma] [~walter.k.su] I wonder whether we should consider 
pushing this to branch-2.7 and branch-2.6. Maybe after the currently planned 
2.7.2 and 2.6.3 releases.

Doing so will enable the inclusion of bug fixes HDFS-9313 and HDFS-9314 in 
2.6.x and 2.7.x.

I worked with [~xiaochen] offline along this direction. The main challenge is 
from HDFS-8823, which should have been done in a feature branch but was 
committed to branch-2 -- so I don't think we should push that one to 
branch-2.6/2.7. If we reach an agreement here we can create a branch-2.6/2.7 
patch for this JIRA.

Thanks.

> Abstract BlockManager's rack policy into BlockPlacementPolicy
> -
>
> Key: HDFS-8647
> URL: https://issues.apache.org/jira/browse/HDFS-8647
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: HDFS-8647-001.patch, HDFS-8647-002.patch, 
> HDFS-8647-003.patch, HDFS-8647-004.patch, HDFS-8647-004.patch, 
> HDFS-8647-005.patch, HDFS-8647-006.patch, HDFS-8647-007.patch, 
> HDFS-8647-008.patch, HDFS-8647-009.patch
>
>
> Sometimes we want to have the namenode use an alternative block placement 
> policy, such as upgrade domains in HDFS-7541.
> BlockManager has a built-in assumption about rack policy in functions such as 
> useDelHint and blockHasEnoughRacks. That means that when we have a new block 
> placement policy, we need to modify BlockManager to account for it. Ideally 
> BlockManager should ask the BlockPlacementPolicy object instead. That would 
> allow us to provide a new BlockPlacementPolicy without changing BlockManager.
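A toy sketch of the delegation idea follows; the method names and types are illustrative, not the real BlockManager/BlockPlacementPolicy API. The manager asks the policy object rather than hard-coding the rack rule itself.

{code}
import java.util.List;

// Illustrative only; these are not the actual HDFS types.
abstract class PlacementPolicySketch {
  /** Policy-specific answer to "is this block placed well enough?" */
  abstract boolean isPlacementSufficient(List<String> racksOfReplicas, int replication);
}

class RackAwarePolicySketch extends PlacementPolicySketch {
  @Override
  boolean isPlacementSufficient(List<String> racksOfReplicas, int replication) {
    // Simplified rack rule: replicas span more than one rack, or replication is 1.
    return replication <= 1 || racksOfReplicas.stream().distinct().count() > 1;
  }
}

class BlockManagerSketch {
  private final PlacementPolicySketch policy;

  BlockManagerSketch(PlacementPolicySketch policy) {
    this.policy = policy;
  }

  boolean blockHasEnoughRacks(List<String> racksOfReplicas, int replication) {
    // No built-in rack assumption here; the decision is delegated to the policy.
    return policy.isPlacementSufficient(racksOfReplicas, replication);
  }
}
{code}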



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8986) Add option to -du to calculate directory space usage excluding snapshots

2015-11-30 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032762#comment-15032762
 ] 

Chris Nauroth commented on HDFS-8986:
-

I see {{-du}} mentioned here.  I recommend that we support the same 
functionality for {{-count}} too.

> Add option to -du to calculate directory space usage excluding snapshots
> 
>
> Key: HDFS-8986
> URL: https://issues.apache.org/jira/browse/HDFS-8986
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Gautam Gopalakrishnan
>Assignee: Jagadesh Kiran N
>
> When running {{hadoop fs -du}} on a snapshotted directory (or one of its 
> children), the report includes space consumed by blocks that are only present 
> in the snapshots. This is confusing for end users.
> {noformat}
> $  hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -createSnapshot /tmp/parent snap1
> Created snapshot /tmp/parent/.snapshot/snap1
> $ hadoop fs -rm -skipTrash /tmp/parent/sub1/*
> ...
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -deleteSnapshot /tmp/parent snap1
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 0  0  /tmp/parent
> 0  0  /tmp/parent/sub1
> {noformat}
> It would be helpful if we had a flag, say -X, to exclude any snapshot related 
> disk usage in the output



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8986) Add option to -du to calculate directory space usage excluding snapshots

2015-11-30 Thread Gautam Gopalakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032786#comment-15032786
 ] 

Gautam Gopalakrishnan commented on HDFS-8986:
-

I feel I should invert the request in this jira. The behaviour of {{-du}} 
should be the same regardless of whether snapshots are present or not. We 
should add a flag to include snapshot volume, rather than exclude. Possibly the 
same change is required for {{-count}}. [~cnauroth] and [~qwertymaniac], what 
do you think?

[~jagadesh.kiran] I can take this jira if you're busy with other work.

> Add option to -du to calculate directory space usage excluding snapshots
> 
>
> Key: HDFS-8986
> URL: https://issues.apache.org/jira/browse/HDFS-8986
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Gautam Gopalakrishnan
>Assignee: Jagadesh Kiran N
>
> When running {{hadoop fs -du}} on a snapshotted directory (or one of its 
> children), the report includes space consumed by blocks that are only present 
> in the snapshots. This is confusing for end users.
> {noformat}
> $  hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -createSnapshot /tmp/parent snap1
> Created snapshot /tmp/parent/.snapshot/snap1
> $ hadoop fs -rm -skipTrash /tmp/parent/sub1/*
> ...
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -deleteSnapshot /tmp/parent snap1
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 0  0  /tmp/parent
> 0  0  /tmp/parent/sub1
> {noformat}
> It would be helpful if we had a flag, say -X, to exclude any snapshot related 
> disk usage in the output



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9228) libhdfs++ should respect NN retry configuration settings

2015-11-30 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032635#comment-15032635
 ] 

James Clampffer commented on HDFS-9228:
---

Looks good to me; I just found two small things worth fixing.  I'm planning on 
committing HDFS-9144 before this; please let me know if it would be less 
painful to hold off on HDFS-9144 until this gets in.

-RetryPolicy should probably have a virtual destructor, or maybe a comment 
saying members can only be POD types.  I'd prefer the virtual destructor 
approach.
-In rpc_connection.cc line 37 "NO_RETRY" should be "kNoRetry" to keep 
consistent with the naming conventions for constants.

> libhdfs++ should respect NN retry configuration settings
> 
>
> Key: HDFS-9228
> URL: https://issues.apache.org/jira/browse/HDFS-9228
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9228.HDFS-8707.001.patch, 
> HDFS-9228.HDFS-8707.002.patch, HDFS-9228.HDFS-8707.003.patch, 
> HDFS-9228.HDFS-8707.004.patch, HDFS-9228.HDFS-8707.005.patch, 
> HDFS-9228.HDFS-8707.006.patch
>
>
> Handle the use case of temporary network or NN hiccups and have a 
> configurable number of retries for NN operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9484) NNThroughputBenchmark$BlockReportStats should not send empty block reports

2015-11-30 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9484:

Description: 
In {{NNThroughputBenchmark$BlockReportStats#formBlockReport()}}, the 
{{blockReportList}} is always {{BlockListAsLongs.EMPTY}}. We should construct 
the block report list by encoding generated {{blocks}} in test.

Meanwhile, {{TinyDatanode#blocks}} is an empty ArrayList with initial capacity. 
In {{TinyDatanode#addBlock()}} first statement, the {{if(nrBlocks == 
blocks.size()) {}} will always be true. We should either fill the blocks with 
dummy report in {{TinyDatanode()}} constructor, or use initial capacity instead 
of {{blocks.size()}} in the above _if_ statement (we should replace 
ArrayList#set with ArrayList#add as well).

There are two potential bugs that make the 

  was:In {{NNThroughputBenchmark$BlockReportStats#formBlockReport()}}, the 
{{blockReportList}} is always {{BlockListAsLongs.EMPTY}}. We should construct 
the block report list by encoding generated {{blocks}} in test.


> NNThroughputBenchmark$BlockReportStats should not send empty block reports
> --
>
> Key: HDFS-9484
> URL: https://issues.apache.org/jira/browse/HDFS-9484
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>
> In {{NNThroughputBenchmark$BlockReportStats#formBlockReport()}}, the 
> {{blockReportList}} is always {{BlockListAsLongs.EMPTY}}. We should construct 
> the block report list by encoding generated {{blocks}} in test.
> Meanwhile, {{TinyDatanode#blocks}} is an empty ArrayList with initial 
> capacity. In {{TinyDatanode#addBlock()}} first statement, the {{if(nrBlocks 
> == blocks.size()) {}} will always be true. We should either fill the blocks 
> with dummy report in {{TinyDatanode()}} constructor, or use initial capacity 
> instead of {{blocks.size()}} in the above _if_ statement (we should replace 
> ArrayList#set with ArrayList#add as well).
> There are two potential bugs that make the 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

