[jira] [Commented] (HDFS-8820) Simplify enabling NameNode RPC congestion control and FairCallQueue
[ https://issues.apache.org/jira/browse/HDFS-8820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032571#comment-15032571 ] Ming Ma commented on HDFS-8820: --- Thanks [~arpitagarwal]. * The last change made to {{RpcEngine}} is from HDFS-7073 in 2.6. It doesn't seem to have caused any issues so far. So maybe Slider is the only implementation outside HDFS/YARN/MR? * Alternatively, what if we define {{RpcEngineV2}} and have {{ProtobufRpcEngine}} implement it? {{RpcEngine}} won't be changed, so it won't break other implementations. Then deprecate the old interface and remove it in trunk. I agree we need to treat compatibility as an important feature for Hadoop. But for this specific case, I wonder about its impact and whether we can somehow use the more elegant builder approach. > Simplify enabling NameNode RPC congestion control and FairCallQueue > --- > > Key: HDFS-8820 > URL: https://issues.apache.org/jira/browse/HDFS-8820 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Attachments: HDFS-8820.01.patch, HDFS-8820.02.patch, > HDFS-8820.03.patch > > > Enabling RPC Congestion control and FairCallQueue settings can be simplified > with HDFS-specific configuration keys. Currently the configuration requires > knowing the exact RPC port number and also whether the service RPC port is > enabled or not separately. If a separate service RPC endpoint is not defined > then RPC congestion control must be enabled ([see > comment|https://issues.apache.org/jira/browse/HDFS-8820?focusedCommentId=14987848&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14987848] > from [~mingma] below). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
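[Editor's note] The {{RpcEngineV2}} alternative can be sketched as a plain interface hierarchy. This is a hypothetical illustration of the compatibility pattern being proposed, not the real Hadoop RPC API; all method signatures and the {{ServerBuilder}} type below are invented stand-ins:

```java
// Hypothetical sketch of the RpcEngineV2 idea: leave the old interface
// untouched so third-party implementations (e.g. Slider) keep compiling,
// add a new interface carrying the builder-style method, and have the
// protobuf engine implement the new one. Names/signatures are illustrative.
public class RpcEngineSketch {
    interface RpcEngine {                      // legacy interface, unchanged
        Object getServer(String bindAddress, int port);
    }

    interface RpcEngineV2 extends RpcEngine {  // new interface; V1 deprecated later
        Object getServer(ServerBuilder builder);
    }

    static class ServerBuilder {               // the "more elegant builder approach"
        String bindAddress = "0.0.0.0";
        int port = 8020;
        ServerBuilder bind(String addr) { this.bindAddress = addr; return this; }
        ServerBuilder port(int p) { this.port = p; return this; }
    }

    static class ProtobufEngine implements RpcEngineV2 {
        public Object getServer(String bindAddress, int port) {
            return bindAddress + ":" + port;   // stand-in for building a real Server
        }
        public Object getServer(ServerBuilder b) {
            return getServer(b.bindAddress, b.port);
        }
    }

    public static void main(String[] args) {
        RpcEngineV2 engine = new ProtobufEngine();
        System.out.println(engine.getServer(new ServerBuilder().port(9000)));
    }
}
```

Existing implementors of the old interface are untouched; only callers that want the builder variant cast to (or request) the V2 interface.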
[jira] [Commented] (HDFS-9470) Encryption zone on root not loaded from fsimage after NN restart
[ https://issues.apache.org/jira/browse/HDFS-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032477#comment-15032477 ] Xiao Chen commented on HDFS-9470: - The test failures look unrelated, and the tests passed locally. > Encryption zone on root not loaded from fsimage after NN restart > > > Key: HDFS-9470 > URL: https://issues.apache.org/jira/browse/HDFS-9470 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Critical > Attachments: HDFS-9470.001.patch, HDFS-9470.002.patch, > HDFS-9470.003.patch > > > When restarting the namenode, the encryption zone for {{rootDir}} is not loaded > correctly from the fsimage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7764) DirectoryScanner shouldn't abort the scan if one directory had an error
[ https://issues.apache.org/jira/browse/HDFS-7764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032490#comment-15032490 ] Colin Patrick McCabe commented on HDFS-7764: Thanks, [~rakeshr]. {code} 856 if (fileNames.size() < 0) { 857 return report; 858 } {code} What's the purpose of this if statement? The size of a list can't be less than 0. {code} 859 files = new File[fileNames.size()]; 860 for (int i = 0; i < fileNames.size(); i++) { 861 files[i] = new File(dir, fileNames.get(i)); 862 } 863 Arrays.sort(files); {code} It would be nice to avoid allocating all these new arrays. We don't really need them. We should be able to sort the list with {{List#sort}}, and we can turn the {{String}} objects into {{File}} objects one at a time in the for loop. > DirectoryScanner shouldn't abort the scan if one directory had an error > --- > > Key: HDFS-7764 > URL: https://issues.apache.org/jira/browse/HDFS-7764 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.0 >Reporter: Rakesh R >Assignee: Rakesh R > Attachments: HDFS-7764-01.patch, HDFS-7764.patch > > > If there is an exception while preparing the ScanInfo for the blocks in the > directory, DirectoryScanner immediately throws an exception and exits > the current scan cycle. The idea of this jira is to discuss & improve the > exception handling mechanism. > DirectoryScanner.java > {code} > for (Entry report : > compilersInProgress.entrySet()) { > try { > dirReports[report.getKey()] = report.getValue().get(); > } catch (Exception ex) { > LOG.error("Error compiling report", ex); > // Propagate ex to DataBlockScanner to deal with > throw new RuntimeException(ex); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
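[Editor's note] Colin's suggestion can be sketched as follows. This is a minimal illustration of the {{List#sort}} refactor, not the actual patch; method and variable names are invented:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the suggested change: sort the name list in place with
// List#sort instead of copying it into a File[] just to call Arrays.sort,
// then build each File one at a time inside the loop.
public class ScanReportExample {
    static List<File> toSortedFiles(File dir, List<String> fileNames) {
        fileNames.sort(null);  // null comparator = natural (lexicographic) order
        List<File> files = new ArrayList<>(fileNames.size());
        for (String name : fileNames) {
            files.add(new File(dir, name));  // one File at a time, no extra array
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(List.of("blk_3", "blk_1", "blk_2"));
        for (File f : toSortedFiles(new File("/data/current"), names)) {
            System.out.println(f.getPath());
        }
    }
}
```

{{List#sort(null)}} sorts by natural ordering, matching what {{Arrays.sort}} did on the materialized array, but without the intermediate allocation.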
[jira] [Commented] (HDFS-8831) Trash Support for deletion in HDFS encryption zone
[ https://issues.apache.org/jira/browse/HDFS-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032584#comment-15032584 ] Arpit Agarwal commented on HDFS-8831: - Hi [~xyao], thanks for the detailed design note. My comments, mostly around potential compatibility of classes tagged {{@InterfaceAudience.Public}}. # DistributedFileSystem.java:2326: We can skip the call to dfs.getEZForPath if isHDFSEncryptionEnabled is false to avoid extra RPC call when TDE is not enabled. # FileSystem.java:2701: Can we define .Trash as a constant somewhere? # Trash.java:98: Avoid extra RPC for log statement. Can we cache the currentTrashDir some time earlier? # TrashPolicy.java:48: I don't think we should mark it as deprecated. While the TrashPolicyDefault no longer uses the home parameter other implementations may be passing a different value here in theory. # TrashPolicy.java:57: Also we should have a default implementation of this routine else it will be a backward incompatible change (will break existing implementations of this public interface). # TrashPolicy.java:83: Need default implementation. It can just throw UnsupportedOperationException which should be handled by the caller. # TrashPolicy.java:92: Need default implementation. It can just throw UnsupportedOperationException which should be handled by the caller. # TrashPolicy.java:108: We should leave the old method in place to keep the public interface backwards compatible. Perhaps to be conservative we should respect the 'home' parameter if one is passed in instead of using Filesystem#getTrashRoot? 
https://github.com/arp7/hadoop/commit/7b3212d2c41cc35cce81eadc68c029e0fc67a429 > Trash Support for deletion in HDFS encryption zone > -- > > Key: HDFS-8831 > URL: https://issues.apache.org/jira/browse/HDFS-8831 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: encryption >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HDFS-8831-10152015.pdf, HDFS-8831.00.patch, > HDFS-8831.01.patch, HDFS-8831.02.patch > > > Currently, "Soft Delete" is only supported if the whole encryption zone is > deleted. If you delete files within the zone with the trash feature enabled, you > will get an error similar to the following > {code} > rm: Failed to move to trash: hdfs://HW11217.local:9000/z1_1/startnn.sh: > /z1_1/startnn.sh can't be moved from an encryption zone. > {code} > With HDFS-8830, we can support "Soft Delete" by adding the .Trash folder of > the file being deleted appropriately to the same encryption zone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
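[Editor's note] The backward-compatibility pattern Arpit describes for {{TrashPolicy}} (points 5-7) can be illustrated with a default interface method. This is a hypothetical, simplified stand-in, not the real {{TrashPolicy}} API:

```java
// Hypothetical illustration: a new method added to a public interface gets a
// default body that throws UnsupportedOperationException, so implementations
// written against the old interface still compile and load. Only callers
// opting into the new method must handle the exception.
public class TrashPolicySketch {
    interface TrashPolicy {
        void initialize(String conf, String home);  // pre-existing method

        // Added later: the default keeps old implementors source- and
        // binary-compatible, as the design comments recommend.
        default String getCurrentTrashDir(String path) {
            throw new UnsupportedOperationException(
                "getCurrentTrashDir not implemented by " + getClass().getName());
        }
    }

    public static void main(String[] args) {
        // An implementation written before getCurrentTrashDir existed.
        TrashPolicy legacy = (conf, home) -> { };
        try {
            legacy.getCurrentTrashDir("/user/alice/file");
        } catch (UnsupportedOperationException e) {
            System.out.println("caller handled: " + e.getMessage());
        }
    }
}
```

Without the default body, adding the method would break every third-party implementation of the public interface at compile time.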
[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4
[ https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032506#comment-15032506 ] Colin Patrick McCabe commented on HDFS-8791: Thanks, guys. +1 for this in trunk and branch-2. Putting this in branch-2.6 would be a little unusual since it requires a layout version upgrade, which I thought we had agreed not to do in bugfix releases. But I will leave that decision up to the release manager for the 2.6 branch. Also, I would really like to see a unit test. If necessary we can get this in and then open a JIRA for that, but it should be on our radar. > block ID-based DN storage layout can be very slow for datanode on ext4 > -- > > Key: HDFS-8791 > URL: https://issues.apache.org/jira/browse/HDFS-8791 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0, 2.8.0, 2.7.1 >Reporter: Nathan Roberts >Assignee: Chris Trezzo >Priority: Critical > Attachments: 32x32DatanodeLayoutTesting-v1.pdf, > 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch > > > We are seeing cases where the new directory layout causes the datanode to > basically cause the disks to seek for 10s of minutes. This can be when the > datanode is running du, and it can also be when it is performing a > checkDirs(). Both of these operations currently scan all directories in the > block pool and that's very expensive in the new layout. > The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K > leaf directories where block files are placed. > So, what we have on disk is: > - 256 inodes for the first level directories > - 256 directory blocks for the first level directories > - 256*256 inodes for the second level directories > - 256*256 directory blocks for the second level directories > - Then the inodes and blocks to store the HDFS blocks themselves. > The main problem is the 256*256 directory blocks.
> inodes and dentries will be cached by linux and one can configure how likely > the system is to prune those entries (vfs_cache_pressure). However, ext4 > relies on the buffer cache to cache the directory blocks and I'm not aware of > any way to tell linux to favor buffer cache pages (even if it did I'm not > sure I would want it to in general). > Also, ext4 tries hard to spread directories evenly across the entire volume, > this basically means the 64K directory blocks are probably randomly spread > across the entire disk. A du type scan will look at directories one at a > time, so the ioscheduler can't optimize the corresponding seeks, meaning the > seeks will be random and far. > In a system I was using to diagnose this, I had 60K blocks. A DU when things > are hot is less than 1 second. When things are cold, about 20 minutes. > How do things get cold? > - A large set of tasks run on the node. This pushes almost all of the buffer > cache out, causing the next DU to hit this situation. We are seeing cases > where a large job can cause a seek storm across the entire cluster. > Why didn't the previous layout see this? > - It might have but it wasn't nearly as pronounced. The previous layout would > be a few hundred directory blocks. Even when completely cold, these would > only take a few hundred seeks, which would mean single-digit seconds. > - With only a few hundred directories, the odds of the directory blocks > getting modified are quite high; this keeps those blocks hot and much less > likely to be evicted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9484) NNThroughputBenchmark$BlockReportStats should not send empty block reports
[ https://issues.apache.org/jira/browse/HDFS-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032516#comment-15032516 ] Colin Patrick McCabe commented on HDFS-9484: Good find, [~liuml07]. > NNThroughputBenchmark$BlockReportStats should not send empty block reports > -- > > Key: HDFS-9484 > URL: https://issues.apache.org/jira/browse/HDFS-9484 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Mingliang Liu >Assignee: Mingliang Liu > > In {{NNThroughputBenchmark$BlockReportStats#formBlockReport()}}, the > {{blockReportList}} is always {{BlockListAsLongs.EMPTY}}. We should construct > the block report list by encoding generated {{blocks}} in test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9449) DiskBalancer : Add connectors
[ https://issues.apache.org/jira/browse/HDFS-9449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032543#comment-15032543 ] Anu Engineer commented on HDFS-9449: test failures are not related to this patch. > DiskBalancer : Add connectors > - > > Key: HDFS-9449 > URL: https://issues.apache.org/jira/browse/HDFS-9449 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-9449-HDFS-1312.001.patch, > HDFS-9449-HDFS-1312.002.patch > > > Connectors allow disk balancer data models to connect to an existing cluster > - Namenode or to a json file which describes the cluster. This is used for > discovering the physical layout of the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9478) Reason for failing ipc.FairCallQueue construction should be thrown
Archana T created HDFS-9478: --- Summary: Reason for failing ipc.FairCallQueue construction should be thrown Key: HDFS-9478 URL: https://issues.apache.org/jira/browse/HDFS-9478 Project: Hadoop HDFS Issue Type: Bug Reporter: Archana T Assignee: Ajith S Priority: Minor When FairCallQueue construction fails, the NN fails to start, throwing a RuntimeException without giving any reason why it failed. 2015-11-30 17:45:26,661 INFO org.apache.hadoop.ipc.FairCallQueue: FairCallQueue is in use with 4 queues. 2015-11-30 17:45:26,665 DEBUG org.apache.hadoop.metrics2.util.MBeans: Registered Hadoop:service=ipc.65110,name=DecayRpcScheduler 2015-11-30 17:45:26,666 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode. java.lang.RuntimeException: org.apache.hadoop.ipc.FairCallQueue could not be constructed. at org.apache.hadoop.ipc.CallQueueManager.createCallQueueInstance(CallQueueManager.java:96) at org.apache.hadoop.ipc.CallQueueManager.<init>(CallQueueManager.java:55) at org.apache.hadoop.ipc.Server.<init>(Server.java:2241) at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:942) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:534) at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:509) at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:784) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.<init>(NameNodeRpcServer.java:346) at org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:750) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:687) at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:889) at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:872) Example: the reason for the above failure could have been -- 1. the weights were not equal to the number of queues configured. 2. decay-scheduler.thresholds not in sync with the number of queues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
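[Editor's note] The fix being requested is to propagate the underlying cause of the reflective construction failure. A minimal sketch of that pattern, with invented names (not the actual {{CallQueueManager}} code):

```java
// Hypothetical sketch of the requested fix: when reflective construction
// fails, unwrap and attach the underlying cause so the log shows *why*
// (e.g. weights not matching the configured number of queues).
public class CallQueueSketch {
    static <T> T createInstance(Class<T> clazz) {
        try {
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // InvocationTargetException wraps the constructor's own failure;
            // surface it instead of discarding it.
            Throwable cause = (e.getCause() != null) ? e.getCause() : e;
            throw new RuntimeException(
                clazz.getName() + " could not be constructed: " + cause.getMessage(),
                cause);
        }
    }

    // Stand-in for FairCallQueue rejecting a bad configuration.
    static class BadQueue {
        BadQueue() {
            throw new IllegalArgumentException(
                "weights must match the number of queues");
        }
    }

    public static void main(String[] args) {
        try {
            createInstance(BadQueue.class);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());  // message now names the real reason
        }
    }
}
```

Passing the cause to the {{RuntimeException}} constructor also preserves the original stack trace in the NN log via "Caused by:".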
[jira] [Created] (HDFS-9482) Expose reservedForReplicas as a metric
Brahma Reddy Battula created HDFS-9482: -- Summary: Expose reservedForReplicas as a metric Key: HDFS-9482 URL: https://issues.apache.org/jira/browse/HDFS-9482 Project: Hadoop HDFS Issue Type: Improvement Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9483) Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured WebHDFS.
Chris Nauroth created HDFS-9483: --- Summary: Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured WebHDFS. Key: HDFS-9483 URL: https://issues.apache.org/jira/browse/HDFS-9483 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Chris Nauroth If WebHDFS is secured with SSL, then you can use "swebhdfs" as the scheme in a URL to access it. The current documentation does not state this anywhere. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-9471) Webhdfs not working with shell command when kerberos security+https is enabled.
[ https://issues.apache.org/jira/browse/HDFS-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved HDFS-9471. - Resolution: Not A Problem [~surendrasingh], that's a good point about the documentation. I filed HDFS-9483 to track a documentation improvement. If you're interested in providing the documentation, please feel free to pick up that one. I'm going to resolve this one. > Webhdfs not working with shell command when kerberos security+https is > enabled. > --- > > Key: HDFS-9471 > URL: https://issues.apache.org/jira/browse/HDFS-9471 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 2.7.1 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore >Priority: Blocker > Attachments: HDFS-9471.01.patch > > > *Client exception* > {code} > secure@host85:/opt/hdfsdata/HA/install/hadoop/namenode/bin> ./hdfs dfs -ls > webhdfs://x.x.x.x:50070/test > 15/11/25 18:46:55 ERROR web.WebHdfsFileSystem: Unable to get HomeDirectory > from original File System > java.net.SocketException: Unexpected end of file from server > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:792) > {code} > *Exception in namenode log* > {code} > 2015-11-26 11:03:18,231 WARN org.mortbay.log: EXCEPTION > javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection? 
> at > sun.security.ssl.InputRecord.handleUnknownRecord(InputRecord.java:710) > at sun.security.ssl.InputRecord.read(InputRecord.java:527) > at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:961) > at > sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1363) > at > sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1391) > at > sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1375) > at > org.mortbay.jetty.security.SslSocketConnector$SslConnection.run(SslSocketConnector.java:708) > at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) > {code} > This is because the URL scheme is hard-coded in > {{WebHdfsFileSystem.getTransportScheme()}}. > {code} > /** >* return the underlying transport protocol (http / https). >*/ > protected String getTransportScheme() { > return "http"; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
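[Editor's note] The fix direction implied by the quoted code is for the SSL variant to override the transport scheme. A minimal sketch (class names echo WebHdfsFileSystem/SWebHdfsFileSystem, but the bodies are invented for illustration):

```java
// Minimal sketch: the SSL-secured filesystem overrides getTransportScheme()
// so URL construction picks up "https" instead of the hard-coded "http".
public class SchemeSketch {
    static class WebHdfs {
        protected String getTransportScheme() {
            return "http";   // default transport for webhdfs://
        }
        String toUrl(String host, int port, String path) {
            return getTransportScheme() + "://" + host + ":" + port
                + "/webhdfs/v1" + path;
        }
    }

    static class SWebHdfs extends WebHdfs {
        @Override
        protected String getTransportScheme() {
            return "https";  // swebhdfs:// rides on SSL
        }
    }

    public static void main(String[] args) {
        System.out.println(new WebHdfs().toUrl("nn.example.com", 50070, "/test"));
        System.out.println(new SWebHdfs().toUrl("nn.example.com", 50470, "/test"));
    }
}
```

Because {{toUrl}} calls the overridable method rather than a literal, every request through the secure subclass automatically uses the right scheme.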
[jira] [Updated] (HDFS-9474) TestPipelinesFailover would fail if ifconfig is not available
[ https://issues.apache.org/jira/browse/HDFS-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Zhuge updated HDFS-9474: - Attachment: HDFS-9474.001.patch > TestPipelinesFailover would fail if ifconfig is not available > - > > Key: HDFS-9474 > URL: https://issues.apache.org/jira/browse/HDFS-9474 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Yongjun Zhang >Assignee: John Zhuge > Attachments: HDFS-9474.001.patch > > > HDFS-6693 introduced some debug messages to help debug why > TestPipelinesFailover fails. > HDFS-9438 restricted the debug messages to Linux/Mac/Solaris. However, the > test would fail when printing the debug messages if the "ifconfig" command is not > available in certain environments. > This is not quite right. The test should not fail due to debug message > printing. We should catch any exception thrown from the code that prints > debug messages, and issue a warning message instead. > Suggest making this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
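[Editor's note] The suggested change (catch everything around the diagnostic command, warn instead of failing) can be sketched like this. The helper name and behavior are illustrative, not the actual patch:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

// Hypothetical sketch: run a best-effort diagnostic command; a host without
// the binary (e.g. no "ifconfig") only produces a warning, never a test failure.
public class DebugCmdSketch {
    static String tryDebugCommand(String... cmd) {
        try {
            Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
            StringBuilder out = new StringBuilder();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = r.readLine()) != null) {
                    out.append(line).append('\n');
                }
            }
            p.waitFor();
            return out.toString();
        } catch (Exception e) {
            // Debug output is best-effort: warn and keep going.
            System.err.println("WARN: could not run debug command: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        // A missing binary must not throw out of the helper.
        System.out.println(tryDebugCommand("no-such-ifconfig-binary") == null);
    }
}
```

The broad {{catch (Exception e)}} is deliberate here: the point of the jira is that no failure of the diagnostic path should propagate into the test result.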
[jira] [Work started] (HDFS-9474) TestPipelinesFailover would fail if ifconfig is not available
[ https://issues.apache.org/jira/browse/HDFS-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-9474 started by John Zhuge. > TestPipelinesFailover would fail if ifconfig is not available > - > > Key: HDFS-9474 > URL: https://issues.apache.org/jira/browse/HDFS-9474 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Yongjun Zhang >Assignee: John Zhuge > Attachments: HDFS-9474.001.patch > > > HDFS-6693 introduced some debug messages to help debug why > TestPipelinesFailover fails. > HDFS-9438 restricted the debug messages to Linux/Mac/Solaris. However, the > test would fail when printing the debug messages if the "ifconfig" command is not > available in certain environments. > This is not quite right. The test should not fail due to debug message > printing. We should catch any exception thrown from the code that prints > debug messages, and issue a warning message instead. > Suggest making this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-9483) Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured WebHDFS.
[ https://issues.apache.org/jira/browse/HDFS-9483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surendra Singh Lilhore reassigned HDFS-9483: Assignee: Surendra Singh Lilhore > Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured > WebHDFS. > - > > Key: HDFS-9483 > URL: https://issues.apache.org/jira/browse/HDFS-9483 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: Chris Nauroth >Assignee: Surendra Singh Lilhore > > If WebHDFS is secured with SSL, then you can use "swebhdfs" as the scheme in > a URL to access it. The current documentation does not state this anywhere. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4
[ https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032081#comment-15032081 ] Kihwal Lee commented on HDFS-8791: -- bq. I will test it again if that is the case. Retesting shows {{previous}} containing the valid content. I guess I somehow messed up the testing the first time. +1 from me. > block ID-based DN storage layout can be very slow for datanode on ext4 > -- > > Key: HDFS-8791 > URL: https://issues.apache.org/jira/browse/HDFS-8791 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0, 2.8.0, 2.7.1 >Reporter: Nathan Roberts >Assignee: Chris Trezzo >Priority: Critical > Attachments: 32x32DatanodeLayoutTesting-v1.pdf, > 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch > > > We are seeing cases where the new directory layout causes the datanode to > basically cause the disks to seek for 10s of minutes. This can be when the > datanode is running du, and it can also be when it is performing a > checkDirs(). Both of these operations currently scan all directories in the > block pool and that's very expensive in the new layout. > The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K > leaf directories where block files are placed. > So, what we have on disk is: > - 256 inodes for the first level directories > - 256 directory blocks for the first level directories > - 256*256 inodes for the second level directories > - 256*256 directory blocks for the second level directories > - Then the inodes and blocks to store the HDFS blocks themselves. > The main problem is the 256*256 directory blocks. > inodes and dentries will be cached by linux and one can configure how likely > the system is to prune those entries (vfs_cache_pressure).
However, ext4 > relies on the buffer cache to cache the directory blocks and I'm not aware of > any way to tell linux to favor buffer cache pages (even if it did I'm not > sure I would want it to in general). > Also, ext4 tries hard to spread directories evenly across the entire volume, > this basically means the 64K directory blocks are probably randomly spread > across the entire disk. A du type scan will look at directories one at a > time, so the ioscheduler can't optimize the corresponding seeks, meaning the > seeks will be random and far. > In a system I was using to diagnose this, I had 60K blocks. A DU when things > are hot is less than 1 second. When things are cold, about 20 minutes. > How do things get cold? > - A large set of tasks run on the node. This pushes almost all of the buffer > cache out, causing the next DU to hit this situation. We are seeing cases > where a large job can cause a seek storm across the entire cluster. > Why didn't the previous layout see this? > - It might have but it wasn't nearly as pronounced. The previous layout would > be a few hundred directory blocks. Even when completely cold, these would > only take a few hundred seeks, which would mean single-digit seconds. > - With only a few hundred directories, the odds of the directory blocks > getting modified are quite high; this keeps those blocks hot and much less > likely to be evicted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9480) Expose nonDfsUsed via StorageTypeStats and DatanodeStatistics
Brahma Reddy Battula created HDFS-9480: -- Summary: Expose nonDfsUsed via StorageTypeStats and DatanodeStatistics Key: HDFS-9480 URL: https://issues.apache.org/jira/browse/HDFS-9480 Project: Hadoop HDFS Issue Type: Improvement Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9481) Expose reservedForReplicas as a metric
Brahma Reddy Battula created HDFS-9481: -- Summary: Expose reservedForReplicas as a metric Key: HDFS-9481 URL: https://issues.apache.org/jira/browse/HDFS-9481 Project: Hadoop HDFS Issue Type: Improvement Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9482) Replace DatanodeInfo constructors with a builder pattern
[ https://issues.apache.org/jira/browse/HDFS-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-9482: --- Summary: Replace DatanodeInfo constructors with a builder pattern (was: Expose reservedForReplicas as a metric) > Replace DatanodeInfo constructors with a builder pattern > > > Key: HDFS-9482 > URL: https://issues.apache.org/jira/browse/HDFS-9482 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9038) Reserved space is erroneously counted towards non-DFS used.
[ https://issues.apache.org/jira/browse/HDFS-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032110#comment-15032110 ] Brahma Reddy Battula commented on HDFS-9038: Raised separate jiras for the above three improvements (HDFS-9480, HDFS-9481 and HDFS-9482). Also uploaded a patch to address [~vinayrpet]'s and [~arpitagarwal]'s comments; kindly review. > Reserved space is erroneously counted towards non-DFS used. > --- > > Key: HDFS-9038 > URL: https://issues.apache.org/jira/browse/HDFS-9038 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.1 >Reporter: Chris Nauroth >Assignee: Brahma Reddy Battula > Attachments: HDFS-9038-002.patch, HDFS-9038-003.patch, > HDFS-9038-004.patch, HDFS-9038-005.patch, HDFS-9038.patch > > > HDFS-5215 changed the DataNode volume available space calculation to consider > the reserved space held by the {{dfs.datanode.du.reserved}} configuration > property. As a side effect, reserved space is now counted towards non-DFS > used. I don't believe it was intentional to change the definition of non-DFS > used. This issue proposes restoring the prior behavior: do not count > reserved space towards non-DFS used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9425) Expose number of blocks per volume as a metric
[ https://issues.apache.org/jira/browse/HDFS-9425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032116#comment-15032116 ] Brahma Reddy Battula commented on HDFS-9425: Can somebody review this patch? Thanks. > Expose number of blocks per volume as a metric > -- > > Key: HDFS-9425 > URL: https://issues.apache.org/jira/browse/HDFS-9425 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Attachments: HDFS-9425.patch > > > It will be helpful for users to know the usage in number of blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-9479) DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network unstable
[ https://issues.apache.org/jira/browse/HDFS-9479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bob reassigned HDFS-9479: - Assignee: Bob > DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network > unstable > > > Key: HDFS-9479 > URL: https://issues.apache.org/jira/browse/HDFS-9479 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.7.1 >Reporter: Bob >Assignee: Bob >Priority: Blocker > > {code} > Java stack information for the threads listed above: > === > "Thread-1": > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.addClient(LeaseRenewer.java:228) > - waiting to lock <0xd5c3c868> (a > org.apache.hadoop.hdfs.client.impl.LeaseRenewer) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.getInstance(LeaseRenewer.java:85) > at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:480) > at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:491) > at > org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:803) > - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream) > at > org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:765) > - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) > at > org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) > at > org.apache.hadoop.mapreduce.jobhistory.EventWriter.close(EventWriter.java:80) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.closeWriter(JobHistoryEventHandler.java:1242) > - locked <0xd593aed0> (a java.lang.Object) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:406) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > - locked <0xd593ae88> (a java.lang.Object) > at > 
org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1677) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > - locked <0xd55d9678> (a java.lang.Object) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1176) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1524) > at > org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54) > "LeaseRenewer:hdfs@hacluster:8020": > at > org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:720) > - waiting to lock <0xd5c1a860> (a > org.apache.hadoop.hdfs.DFSOutputStream) > at > org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:598) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.run(LeaseRenewer.java:465) > - locked <0xd5c3c868> (a > org.apache.hadoop.hdfs.client.impl.LeaseRenewer) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.access$700(LeaseRenewer.java:75) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer$1.run(LeaseRenewer.java:311) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
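[Editor's note] The stack traces show a classic inverted lock order: Thread-1 holds the {{DFSOutputStream}} monitor (in close) and waits for the {{LeaseRenewer}} monitor (in addClient), while the renewer thread holds its own monitor (in run) and waits for the stream's (in abort). A minimal model of that shape, using {{ReentrantLock#tryLock}} so the demo exposes the conflict without actually hanging (this is illustrative code, not Hadoop's):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Minimal model of the inverted lock order in the deadlock report: each
// thread holds one lock and probes the other; both probes print false.
public class LockOrderSketch {
    static final ReentrantLock streamLock = new ReentrantLock();   // DFSOutputStream
    static final ReentrantLock renewerLock = new ReentrantLock();  // LeaseRenewer

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch renewerHolds = new CountDownLatch(1);
        CountDownLatch probed = new CountDownLatch(1);

        Thread renewer = new Thread(() -> {
            renewerLock.lock();              // run(): renewer holds its monitor
            renewerHolds.countDown();
            try {
                probed.await();              // keep holding while Thread-1 probes
            } catch (InterruptedException ignored) { }
            // closeAllFilesBeingWritten() -> abort(): needs the stream lock.
            System.out.println("renewer can take stream lock: " + streamLock.tryLock());
            renewerLock.unlock();
        });

        streamLock.lock();                   // close(): Thread-1 holds stream monitor
        renewer.start();
        renewerHolds.await();
        // endFileLease() -> addClient(): needs the renewer's monitor.
        System.out.println("closer can take renewer lock: " + renewerLock.tryLock());
        probed.countDown();
        renewer.join();
        streamLock.unlock();
    }
}
```

Both probes print false: with blocking lock() calls instead of tryLock(), each thread would wait forever on the lock the other holds, which is exactly the reported hang. The usual fix is to ensure both paths acquire the two monitors in the same order, or to drop one lock before taking the other.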
[jira] [Updated] (HDFS-9417) Clean up the RAT warnings in the HDFS-8707 branch.
[ https://issues.apache.org/jira/browse/HDFS-9417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bob Hansen updated HDFS-9417: - Attachment: HDFS-9417.HDFS-8707.000.patch Uploaded first pass at clearing up the warnings. Couldn't run the RAT tool locally, so we'll rely on Jenkins to give us a wash on it. > Clean up the RAT warnings in the HDFS-8707 branch. > -- > > Key: HDFS-9417 > URL: https://issues.apache.org/jira/browse/HDFS-9417 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Haohui Mai >Assignee: Xiaobing Zhou > Attachments: HDFS-9417.HDFS-8707.000.patch > > > Recent jenkins builds reveals that the pom.xml in the HDFS-8707 branch does > not currently exclude third-party files. The RAT plugin generates warnings as > these files do not have Apache headers. > The warnings need to be suppressed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9479) DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network unstable
[ https://issues.apache.org/jira/browse/HDFS-9479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031952#comment-15031952 ] Brahma Reddy Battula commented on HDFS-9479: Thanks for reporting . Dupe of HDFS-9324..? > DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network > unstable > > > Key: HDFS-9479 > URL: https://issues.apache.org/jira/browse/HDFS-9479 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.7.1 >Reporter: Bob >Assignee: Bob >Priority: Blocker > > {code} > Java stack information for the threads listed above: > === > "Thread-1": > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.addClient(LeaseRenewer.java:228) > - waiting to lock <0xd5c3c868> (a > org.apache.hadoop.hdfs.client.impl.LeaseRenewer) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.getInstance(LeaseRenewer.java:85) > at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:480) > at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:491) > at > org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:803) > - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream) > at > org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:765) > - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) > at > org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) > at > org.apache.hadoop.mapreduce.jobhistory.EventWriter.close(EventWriter.java:80) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.closeWriter(JobHistoryEventHandler.java:1242) > - locked <0xd593aed0> (a java.lang.Object) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:406) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > - locked <0xd593ae88> (a java.lang.Object) > 
at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1677) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > - locked <0xd55d9678> (a java.lang.Object) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1176) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1524) > at > org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54) > "LeaseRenewer:hdfs@hacluster:8020": > at > org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:720) > - waiting to lock <0xd5c1a860> (a > org.apache.hadoop.hdfs.DFSOutputStream) > at > org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:598) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.run(LeaseRenewer.java:465) > - locked <0xd5c3c868> (a > org.apache.hadoop.hdfs.client.impl.LeaseRenewer) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.access$700(LeaseRenewer.java:75) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer$1.run(LeaseRenewer.java:311) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
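The two stacks above form a classic lock-ordering cycle: "Thread-1" holds the {{DFSOutputStream}} monitor (0xd5c1a860) and waits for the {{LeaseRenewer}} monitor (0xd5c3c868), while the renewer thread holds its own monitor and waits to lock the stream inside {{abort()}}. The conventional fix is a single global acquisition order on both paths. A minimal sketch of that idea — hypothetical, not the actual HDFS code; the lock names and methods here are illustrative only:

```java
// Sketch: both the close path and the renewer path take the locks in the
// same order (renewerLock first, then streamLock), so no cycle can form.
public class LockOrderSketch {
    private static final Object renewerLock = new Object();
    private static final Object streamLock = new Object();
    private static int completed = 0;

    // Stand-in for DFSOutputStream.close() -> LeaseRenewer interaction.
    static void closeStream() {
        synchronized (renewerLock) {
            synchronized (streamLock) { completed++; }
        }
    }

    // Stand-in for LeaseRenewer.run() -> DFSOutputStream.abort().
    static void renewerAbort() {
        synchronized (renewerLock) {
            synchronized (streamLock) { completed++; }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(LockOrderSketch::closeStream);
        Thread t2 = new Thread(LockOrderSketch::renewerAbort);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // With a consistent order, both threads always terminate.
        System.out.println("no deadlock: " + completed);
    }
}
```

With the inverted order on one of the two paths (stream monitor first, renewer monitor second, as in the reported trace), the same two threads can each grab their first lock and block forever on the second.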
[jira] [Resolved] (HDFS-9479) DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network unstable
[ https://issues.apache.org/jira/browse/HDFS-9479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee resolved HDFS-9479. -- Resolution: Duplicate Target Version/s: (was: 2.7.3) > DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network > unstable > > > Key: HDFS-9479 > URL: https://issues.apache.org/jira/browse/HDFS-9479 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.7.1 >Reporter: Bob >Assignee: Bob >Priority: Blocker > > {code} > Java stack information for the threads listed above: > === > "Thread-1": > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.addClient(LeaseRenewer.java:228) > - waiting to lock <0xd5c3c868> (a > org.apache.hadoop.hdfs.client.impl.LeaseRenewer) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.getInstance(LeaseRenewer.java:85) > at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:480) > at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:491) > at > org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:803) > - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream) > at > org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:765) > - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) > at > org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) > at > org.apache.hadoop.mapreduce.jobhistory.EventWriter.close(EventWriter.java:80) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.closeWriter(JobHistoryEventHandler.java:1242) > - locked <0xd593aed0> (a java.lang.Object) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:406) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > - locked <0xd593ae88> (a java.lang.Object) > at > 
org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1677) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > - locked <0xd55d9678> (a java.lang.Object) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1176) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1524) > at > org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54) > "LeaseRenewer:hdfs@hacluster:8020": > at > org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:720) > - waiting to lock <0xd5c1a860> (a > org.apache.hadoop.hdfs.DFSOutputStream) > at > org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:598) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.run(LeaseRenewer.java:465) > - locked <0xd5c3c868> (a > org.apache.hadoop.hdfs.client.impl.LeaseRenewer) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.access$700(LeaseRenewer.java:75) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer$1.run(LeaseRenewer.java:311) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9468) DfsAdmin command set dataXceiver count for datanode
[ https://issues.apache.org/jira/browse/HDFS-9468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031995#comment-15031995 ] Kihwal Lee commented on HDFS-9468: -- Why don't you make it refreshable? Some configs in datanode are already refreshable. > DfsAdmin command set dataXceiver count for datanode > --- > > Key: HDFS-9468 > URL: https://issues.apache.org/jira/browse/HDFS-9468 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: HDFS-9468.001.patch > > > Currently, the concurrent xceiver count in every datanode is set by > {{DFSConfigKeys.DFS_DATANODE_MAX_RECEIVER_THREADS_DEFAULT}}. If you want to > set different values because some nodes have less memory or fewer cores, you > must change the config and restart the datanode. So maybe we can set the > dataxceiver count dynamically via a dfsadmin command, and set the value for > one or more specific nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
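The "refreshable" alternative suggested above can be sketched as a limit held in an atomic that an admin refresh call updates at runtime, instead of a value fixed at startup. This is a hypothetical illustration — {{XceiverLimit}} and {{refresh}} are made-up names, not actual Hadoop APIs:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: the xceiver limit lives in an AtomicInteger, so an admin
// refresh can lower it on a memory-constrained node without a restart.
public class XceiverLimit {
    private final AtomicInteger maxXceivers;

    XceiverLimit(int initial) { maxXceivers = new AtomicInteger(initial); }

    int get() { return maxXceivers.get(); }

    // Called from a hypothetical dfsadmin-style refresh handler.
    void refresh(int newLimit) { maxXceivers.set(newLimit); }

    public static void main(String[] args) {
        XceiverLimit limit = new XceiverLimit(4096); // illustrative default
        limit.refresh(2048); // lower it at runtime, no restart needed
        System.out.println(limit.get());
    }
}
```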
[jira] [Commented] (HDFS-9452) libhdfs++ Fix memory stomp in OpenFileForRead.
[ https://issues.apache.org/jira/browse/HDFS-9452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032026#comment-15032026 ] James Clampffer commented on HDFS-9452: --- Committed to HDFS-8707. Thanks for the pointer to std::tie Bob, I'll check that out and use it where applicable in the future. > libhdfs++ Fix memory stomp in OpenFileForRead. > -- > > Key: HDFS-9452 > URL: https://issues.apache.org/jira/browse/HDFS-9452 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: James Clampffer > Attachments: HDFS-9452.HDFS-8707.000.patch, > HDFS-9452.HDFS-8707.001.patch > > > Running a simple test that opens and closes a file in many threads will fail > under valgrind with an invalid write of size 8. > It looks like the stack is unwinding in the calling thread before the > callback invoked by asio in OpenFileForRead can set the input_stream pointer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9479) DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network unstable
[ https://issues.apache.org/jira/browse/HDFS-9479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bob updated HDFS-9479: -- Summary: DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network unstable (was: DeadLock Between DFSOutputStream and LeaseRenewer when Network unstable) > DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network > unstable > > > Key: HDFS-9479 > URL: https://issues.apache.org/jira/browse/HDFS-9479 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.7.1 >Reporter: Bob >Priority: Blocker > > {code} > Java stack information for the threads listed above: > === > "Thread-1": > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.addClient(LeaseRenewer.java:228) > - waiting to lock <0xd5c3c868> (a > org.apache.hadoop.hdfs.client.impl.LeaseRenewer) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.getInstance(LeaseRenewer.java:85) > at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:480) > at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:491) > at > org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:803) > - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream) > at > org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:765) > - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) > at > org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) > at > org.apache.hadoop.mapreduce.jobhistory.EventWriter.close(EventWriter.java:80) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.closeWriter(JobHistoryEventHandler.java:1242) > - locked <0xd593aed0> (a java.lang.Object) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:406) > at > 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > - locked <0xd593ae88> (a java.lang.Object) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1677) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > - locked <0xd55d9678> (a java.lang.Object) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1176) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1524) > at > org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54) > "LeaseRenewer:hdfs@hacluster:8020": > at > org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:720) > - waiting to lock <0xd5c1a860> (a > org.apache.hadoop.hdfs.DFSOutputStream) > at > org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:598) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.run(LeaseRenewer.java:465) > - locked <0xd5c3c868> (a > org.apache.hadoop.hdfs.client.impl.LeaseRenewer) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.access$700(LeaseRenewer.java:75) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer$1.run(LeaseRenewer.java:311) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9479) DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network unstable
[ https://issues.apache.org/jira/browse/HDFS-9479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031961#comment-15031961 ] Brahma Reddy Battula commented on HDFS-9479: I mean HDFS-9294. > DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network > unstable > > > Key: HDFS-9479 > URL: https://issues.apache.org/jira/browse/HDFS-9479 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.7.1 >Reporter: Bob >Assignee: Bob >Priority: Blocker > > {code} > Java stack information for the threads listed above: > === > "Thread-1": > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.addClient(LeaseRenewer.java:228) > - waiting to lock <0xd5c3c868> (a > org.apache.hadoop.hdfs.client.impl.LeaseRenewer) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.getInstance(LeaseRenewer.java:85) > at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:480) > at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:491) > at > org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:803) > - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream) > at > org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:765) > - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) > at > org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) > at > org.apache.hadoop.mapreduce.jobhistory.EventWriter.close(EventWriter.java:80) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.closeWriter(JobHistoryEventHandler.java:1242) > - locked <0xd593aed0> (a java.lang.Object) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:406) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > - locked <0xd593ae88> (a java.lang.Object) > at > 
org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1677) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > - locked <0xd55d9678> (a java.lang.Object) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1176) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1524) > at > org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54) > "LeaseRenewer:hdfs@hacluster:8020": > at > org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:720) > - waiting to lock <0xd5c1a860> (a > org.apache.hadoop.hdfs.DFSOutputStream) > at > org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:598) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.run(LeaseRenewer.java:465) > - locked <0xd5c3c868> (a > org.apache.hadoop.hdfs.client.impl.LeaseRenewer) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.access$700(LeaseRenewer.java:75) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer$1.run(LeaseRenewer.java:311) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4
[ https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031954#comment-15031954 ] Kihwal Lee commented on HDFS-8791: -- This is what I saw on the upgraded node before it got finalized. Before upgrade, {{current/finalized}} contained many sub directories. {noformat} -bash-4.1$ ls -l /xxx/data/current/BP-x/previous/finalized total 4 drwxr-xr-x 115 hdfs users 4096 Nov 24 23:01 subdir0 {noformat} This is what I saw in the log. {noformat} 2015-11-24 23:06:09,980 INFO common.Storage: Upgrading block pool storage directory /xxx/data/current/BP-x. old LV = -56; old CTime = 0. new LV = -57; new CTime = 0 2015-11-24 23:06:11,625 INFO common.Storage: HardLinkStats: 116 Directories, including 3 Empty Directories, 57282 single Link operations, 0 multi-Link operations, linking 0 files, total 57282 linkable files. Also physically copied 0 other files. 2015-11-24 23:06:11,671 INFO common.Storage: Upgrade of block pool BP-x at /xxx/data/current/BP-x is complete {noformat} I just noticed the time stamp of {{subdir0}} is old, so empty directories were removed? I will test it again if that is the case. But I thought {{current}} eventually becomes {{previous}} after creating hard links, so even the empty dirs left intact. > block ID-based DN storage layout can be very slow for datanode on ext4 > -- > > Key: HDFS-8791 > URL: https://issues.apache.org/jira/browse/HDFS-8791 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0, 2.8.0, 2.7.1 >Reporter: Nathan Roberts >Assignee: Chris Trezzo >Priority: Critical > Attachments: 32x32DatanodeLayoutTesting-v1.pdf, > 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch > > > We are seeing cases where the new directory layout causes the datanode to > basically cause the disks to seek for 10s of minutes. This can be when the > datanode is running du, and it can also be when it is performing a > checkDirs(). 
Both of these operations currently scan all directories in the > block pool and that's very expensive in the new layout. > The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K > leaf directories where block files are placed. > So, what we have on disk is: > - 256 inodes for the first level directories > - 256 directory blocks for the first level directories > - 256*256 inodes for the second level directories > - 256*256 directory blocks for the second level directories > - Then the inodes and blocks to store the the HDFS blocks themselves. > The main problem is the 256*256 directory blocks. > inodes and dentries will be cached by linux and one can configure how likely > the system is to prune those entries (vfs_cache_pressure). However, ext4 > relies on the buffer cache to cache the directory blocks and I'm not aware of > any way to tell linux to favor buffer cache pages (even if it did I'm not > sure I would want it to in general). > Also, ext4 tries hard to spread directories evenly across the entire volume, > this basically means the 64K directory blocks are probably randomly spread > across the entire disk. A du type scan will look at directories one at a > time, so the ioscheduler can't optimize the corresponding seeks, meaning the > seeks will be random and far. > In a system I was using to diagnose this, I had 60K blocks. A DU when things > are hot is less than 1 second. When things are cold, about 20 minutes. > How do things get cold? > - A large set of tasks run on the node. This pushes almost all of the buffer > cache out, causing the next DU to hit this situation. We are seeing cases > where a large job can cause a seek storm across the entire cluster. > Why didn't the previous layout see this? > - It might have but it wasn't nearly as pronounced. The previous layout would > be a few hundred directory blocks. Even when completely cold, these would > only take a few a hundred seeks which would mean single digit seconds. 
> - With only a few hundred directories, the odds of the directory blocks > getting modified is quite high, this keeps those blocks hot and much less > likely to be evicted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
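The arithmetic in the description above can be checked directly. A small sketch, assuming ~10 ms per cold random seek (a rough figure for a spinning disk, not a measured value from this report):

```java
// Back-of-the-envelope numbers for the 256x256 layout described above.
public class LayoutMath {
    public static void main(String[] args) {
        int firstLevel = 256;
        int secondLevel = 256 * 256;        // 65,536 leaf directories
        int dirInodes = firstLevel + secondLevel;

        // ext4 caches directory blocks in the buffer cache; when cold,
        // a du-style scan touches each leaf directory block with a
        // random, far seek. At an assumed ~10 ms per seek:
        double coldScanSeconds = secondLevel * 0.010;

        System.out.println(dirInodes + " " + (int) coldScanSeconds);
    }
}
```

That works out to roughly 11 minutes of pure seeking for the leaf directories alone, consistent with the ~20-minute cold du observed on the test node.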
[jira] [Commented] (HDFS-9233) Create LICENSE.txt and NOTICES files for libhdfs++
[ https://issues.apache.org/jira/browse/HDFS-9233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031957#comment-15031957 ] Bob Hansen commented on HDFS-9233: -- [~owen.omalley] - I was looking for examples to follow, and don't see a LICENSE.txt or NOTICES file in any of the other Hadoop sub-projects. There is a LICENSE.txt and NOTICE.txt at the top, but it doesn't appear to be represented anywhere else. Since this is part of the ASF tree now, do you think that is sufficient? > Create LICENSE.txt and NOTICES files for libhdfs++ > -- > > Key: HDFS-9233 > URL: https://issues.apache.org/jira/browse/HDFS-9233 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Bob Hansen >Assignee: Bob Hansen > > We use third-party libraries that are Apache and Google licensed, and may be > adding an MIT-licenced third-party library. We need to include the > appropriate license files for inclusion into Apache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9477) namenode starts failed:FSEditLogLoader: Encountered exception on operation TimesOp
[ https://issues.apache.org/jira/browse/HDFS-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031973#comment-15031973 ] Daryn Sharp commented on HDFS-9477: --- Notice that it's occurring because of an atime update - probably from opening the file. This appears to be caused by the file descriptor-ish feature added awhile back. Anyhow, I believe /.reserved paths should probably never occur in the edits. > namenode starts failed:FSEditLogLoader: Encountered exception on operation > TimesOp > -- > > Key: HDFS-9477 > URL: https://issues.apache.org/jira/browse/HDFS-9477 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 > Environment: Ubuntu 12.04.1 LTS, java version "1.7.0_79" >Reporter: aplee >Assignee: aplee > > backup name start failed, log below: > 2015-11-28 14:09:13,462 ERROR > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception > on operation TimesOp [length=0, path=/.reserved/.inodes/2346114, mtime=-1, > atime=1448692924700, opCode=OP_TIMES, txid=14774180] > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:473) > at > org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:299) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:629) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:832) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:813) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331) > at > 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297) > 2015-11-28 14:09:13,572 FATAL > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error > encountered while tailing edits. Shutting down standby NN. > java.io.IOException: java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:244) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:832) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:813) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:473) > at > org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:299) > at > 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:629) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234) > ... 9 more > 2015-11-28 14:09:13,574 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status 1 > 2015-11-28 14:09:13,575 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: > SHUTDOWN_MSG: > I found record in Edits, but I don't know how this record generated > > OP_TIMES > > 14774180 > 0 > /.reserved/.inodes/2346114 > -1 > 1448692924700 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
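The failure mode above — an {{OP_TIMES}} entry naming a file by inode id via {{/.reserved/.inodes/2346114}}, with replay dying on an NPE — can be illustrated with a toy lookup table. This is a hypothetical sketch, not the actual {{FSDirAttrOp}} code: if resolving the inode id yields null (e.g. the inode no longer exists on the replaying namenode), an unguarded dereference produces exactly this kind of NullPointerException, whereas a guard lets replay continue:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of inode-id resolution during edit-log replay.
public class InodeLookupSketch {
    static final Map<Long, String> inodesById = new HashMap<>();

    // Stand-in for applying a TimesOp addressed by inode id.
    static String setTimes(long inodeId) {
        String path = inodesById.get(inodeId);
        if (path == null) {
            // Guarded: report and skip rather than crash the standby NN.
            return "skipped missing inode " + inodeId;
        }
        return "updated " + path;
    }

    public static void main(String[] args) {
        inodesById.put(1L, "/user/a");
        System.out.println(setTimes(1L));
        System.out.println(setTimes(2346114L)); // id from the failing op
    }
}
```

Whether skipping is the right recovery policy is a separate question; the sketch only shows why the null case must be handled at all.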
[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient
[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031939#comment-15031939 ] Daryn Sharp commented on HDFS-7435: --- [~liuml07], good catch! Not sure how I accidentally did that and I don't know why the test still passes. I think your proposed fix is correct. I know extremely little of the benchmark tool so I'd be unable to add a test (for the test!) in a timely manner, so if you'd like the credit for the bug then I'm fine with you doing the jira. > PB encoding of block reports is very inefficient > > > Key: HDFS-7435 > URL: https://issues.apache.org/jira/browse/HDFS-7435 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode >Affects Versions: 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Critical > Fix For: 2.7.0 > > Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, > HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, > HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, > HDFS-7435.patch, HDFS-7435.patch > > > Block reports are encoded as a PB repeating long. Repeating fields use an > {{ArrayList}} with default capacity of 10. A block report containing tens or > hundreds of thousand of longs (3 for each replica) is extremely expensive > since the {{ArrayList}} must realloc many times. Also, decoding repeating > fields will box the primitive longs which must then be unboxed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
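The cost pattern described above can be sketched concretely: a repeated PB long field decodes into an {{ArrayList<Long>}} that starts at capacity 10 and grows by roughly 1.5x per reallocation, boxing every value, while a primitive array sized up front needs one allocation and no boxing. This is an illustration of the problem, not the actual HDFS-7435 fix:

```java
// Cost sketch for decoding a block report of 100,000 replicas,
// 3 longs each (illustrative counts, not from a real report).
public class BlockReportSketch {
    public static void main(String[] args) {
        int replicas = 100_000;
        int longsPerReplica = 3;            // e.g. id, genstamp, length
        int total = replicas * longsPerReplica;

        // Count grow operations for a default-capacity ArrayList,
        // using the ~1.5x growth rule (cap += cap >> 1).
        int capacity = 10, grows = 0;
        while (capacity < total) { capacity += capacity >> 1; grows++; }
        System.out.println("grows=" + grows);

        // Preallocated primitive storage: one allocation, no boxing,
        // no copies on growth.
        long[] flat = new long[total];
        for (int i = 0; i < total; i++) flat[i] = i;
        System.out.println("len=" + flat.length);
    }
}
```

Each of those grow operations copies the whole backing array, and on top of that every decoded long is boxed to a {{Long}} object — which is why a flat primitive encoding is dramatically cheaper at this scale.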
[jira] [Commented] (HDFS-9479) DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network unstable
[ https://issues.apache.org/jira/browse/HDFS-9479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031966#comment-15031966 ] Kihwal Lee commented on HDFS-9479: -- Yes It looked familiar. I think it is a dupe. > DeadLock Happened Between DFSOutputStream and LeaseRenewer when Network > unstable > > > Key: HDFS-9479 > URL: https://issues.apache.org/jira/browse/HDFS-9479 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.7.1 >Reporter: Bob >Assignee: Bob >Priority: Blocker > > {code} > Java stack information for the threads listed above: > === > "Thread-1": > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.addClient(LeaseRenewer.java:228) > - waiting to lock <0xd5c3c868> (a > org.apache.hadoop.hdfs.client.impl.LeaseRenewer) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.getInstance(LeaseRenewer.java:85) > at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:480) > at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:491) > at > org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:803) > - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream) > at > org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:765) > - locked <0xd5c1a860> (a org.apache.hadoop.hdfs.DFSOutputStream) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) > at > org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) > at > org.apache.hadoop.mapreduce.jobhistory.EventWriter.close(EventWriter.java:80) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.closeWriter(JobHistoryEventHandler.java:1242) > - locked <0xd593aed0> (a java.lang.Object) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:406) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > - locked <0xd593ae88> (a java.lang.Object) > at > 
org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1677) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > - locked <0xd55d9678> (a java.lang.Object) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1176) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1524) > at > org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54) > "LeaseRenewer:hdfs@hacluster:8020": > at > org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:720) > - waiting to lock <0xd5c1a860> (a > org.apache.hadoop.hdfs.DFSOutputStream) > at > org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:598) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.run(LeaseRenewer.java:465) > - locked <0xd5c3c868> (a > org.apache.hadoop.hdfs.client.impl.LeaseRenewer) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer.access$700(LeaseRenewer.java:75) > at > org.apache.hadoop.hdfs.client.impl.LeaseRenewer$1.run(LeaseRenewer.java:311) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9038) Reserved space is erroneously counted towards non-DFS used.
[ https://issues.apache.org/jira/browse/HDFS-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032253#comment-15032253 ] Hadoop QA commented on HDFS-9038: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 8 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 23s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 20s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 33s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 0s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 53s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 53s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | 
{color:red} mvninstall {color} | {color:red} 0m 44s {color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 48s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 3m 48s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 15m 16s {color} | {color:red} hadoop-hdfs-project-jdk1.8.0_66 with JDK v1.8.0_66 generated 3 new issues (was 49, now 49). {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 36s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 37s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 17m 54s {color} | {color:red} hadoop-hdfs-project-jdk1.7.0_85 with JDK v1.7.0_85 generated 3 new issues (was 51, now 51). {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 36s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 35s {color} | {color:red} Patch generated 4 new checkstyle issues in hadoop-hdfs-project (total was 293, now 295). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 49s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 12s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 49s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 115m 25s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 48s {color} | {color:green} hadoop-hdfs-client in the patch passed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 121m 39s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_85. {color} | | {color:green}+1{color} |
[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient
[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032285#comment-15032285 ] Mingliang Liu commented on HDFS-7435: - Thanks for your comment, [~daryn] and [~shv]. I started to work on {{NNThroughputBenchmark}} just recently and knew nothing about it before that. If the empty block report list in this patch was not intentional, I don't think I need more context on this patch. Sure, I'd like to make the change as we discussed above. The jira is [HDFS-9484]. Let's continue further discussion on the fix there. The reason the unit tests could pass may be that {{TestNNThroughputBenchmark}} is more a driver that runs the benchmark with default parameters than a real unit test that asserts expected behavior for different scenarios. If we need a sophisticated unit test, perhaps we can address it separately. > PB encoding of block reports is very inefficient > > > Key: HDFS-7435 > URL: https://issues.apache.org/jira/browse/HDFS-7435 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode >Affects Versions: 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Critical > Fix For: 2.7.0 > > Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, > HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, > HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, > HDFS-7435.patch, HDFS-7435.patch > > > Block reports are encoded as a PB repeating long. Repeating fields use an > {{ArrayList}} with default capacity of 10. A block report containing tens or > hundreds of thousands of longs (3 for each replica) is extremely expensive > since the {{ArrayList}} must realloc many times. Also, decoding repeating > fields will box the primitive longs which must then be unboxed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
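The realloc cost in the HDFS-7435 description can be sketched concretely. {{ArrayList}} starts at capacity 10 and grows by roughly 1.5x on each reallocation (newCapacity = old + old/2 in OpenJDK), so a large repeated field pays dozens of array copies plus one boxed {{Long}} per element. The numbers below are illustrative of the growth policy, not measurements from HDFS:

```java
import java.util.ArrayList;
import java.util.List;

public class RepeatedFieldCost {
  // Count how many array reallocations an ArrayList-backed decode performs
  // for n elements, given the default capacity of 10 and 1.5x growth.
  static int reallocsFor(int n) {
    int cap = 10, reallocs = 0;
    while (cap < n) {
      cap += cap >> 1;   // ArrayList.grow(): newCapacity = old + old/2
      reallocs++;
    }
    return reallocs;
  }

  public static void main(String[] args) {
    // A 200k-replica report carries 3 longs per replica = 600k longs.
    System.out.println("reallocations: " + reallocsFor(200_000 * 3));

    // The boxing cost the description mentions: each add() allocates a Long
    // object (outside the small autobox cache), where a primitive long[]
    // stores 8 raw bytes per element in a single allocation.
    List<Long> boxed = new ArrayList<>();
    for (long i = 0; i < 1_000; i++) {
      boxed.add(i);                 // ~1000 Long allocations plus reallocs
    }
    long[] flat = new long[1_000];  // one allocation, no boxing
    for (int i = 0; i < 1_000; i++) {
      flat[i] = i;
    }
  }
}
```

This is why the fix moved block reports toward primitive-backed encodings rather than a {{List<Long>}} decode.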
[jira] [Commented] (HDFS-9273) ACLs on root directory may be lost after NN restart
[ https://issues.apache.org/jira/browse/HDFS-9273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032262#comment-15032262 ] Sangjin Lee commented on HDFS-9273: --- +1 SGTM > ACLs on root directory may be lost after NN restart > --- > > Key: HDFS-9273 > URL: https://issues.apache.org/jira/browse/HDFS-9273 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.1 >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Critical > Fix For: 2.8.0 > > Attachments: HDFS-9273.001.patch, HDFS-9273.002.patch > > > After restarting namenode, the ACLs on the root directory ("/") may be lost > if it's rolled over to fsimage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9470) Encryption zone on root not loaded from fsimage after NN restart
[ https://issues.apache.org/jira/browse/HDFS-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032264#comment-15032264 ] Sangjin Lee commented on HDFS-9470: --- +1 SGTM > Encryption zone on root not loaded from fsimage after NN restart > > > Key: HDFS-9470 > URL: https://issues.apache.org/jira/browse/HDFS-9470 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Critical > Attachments: HDFS-9470.001.patch, HDFS-9470.002.patch, > HDFS-9470.003.patch > > > When restarting namenode, the encryption zone for {{rootDir}} is not loaded > correctly from fsimage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9484) NNThroughputBenchmark$BlockReportStats should not send empty block reports
Mingliang Liu created HDFS-9484: --- Summary: NNThroughputBenchmark$BlockReportStats should not send empty block reports Key: HDFS-9484 URL: https://issues.apache.org/jira/browse/HDFS-9484 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Mingliang Liu Assignee: Mingliang Liu In {{NNThroughputBenchmark$BlockReportStats#formBlockReport()}}, the {{blockReportList}} is always {{BlockListAsLongs.EMPTY}}. We should actually construct the block report list by encoding generated {{blocks}} in test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9484) NNThroughputBenchmark$BlockReportStats should not send empty block reports
[ https://issues.apache.org/jira/browse/HDFS-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9484: Description: In {{NNThroughputBenchmark$BlockReportStats#formBlockReport()}}, the {{blockReportList}} is always {{BlockListAsLongs.EMPTY}}. We should construct the block report list by encoding generated {{blocks}} in test. (was: In {{NNThroughputBenchmark$BlockReportStats#formBlockReport()}}, the {{blockReportList}} is always {{BlockListAsLongs.EMPTY}}. We should actually construct the block report list by encoding generated {{blocks}} in test.) > NNThroughputBenchmark$BlockReportStats should not send empty block reports > -- > > Key: HDFS-9484 > URL: https://issues.apache.org/jira/browse/HDFS-9484 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Mingliang Liu >Assignee: Mingliang Liu > > In {{NNThroughputBenchmark$BlockReportStats#formBlockReport()}}, the > {{blockReportList}} is always {{BlockListAsLongs.EMPTY}}. We should construct > the block report list by encoding generated {{blocks}} in test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8871) Decommissioning of a node with a failed volume may not start
[ https://issues.apache.org/jira/browse/HDFS-8871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated HDFS-8871: - Target Version/s: 2.7.3, 2.6.4 (was: 2.7.3, 2.6.3) > Decommissioning of a node with a failed volume may not start > > > Key: HDFS-8871 > URL: https://issues.apache.org/jira/browse/HDFS-8871 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Kihwal Lee >Assignee: Daryn Sharp >Priority: Critical > > Since staleness may not be properly cleared, a node with a failed volume may > not actually get scanned for block replication. Nothing is being replicated > from these nodes. > This bug does not manifest unless the datanode has a unique storage ID per > volume. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8871) Decommissioning of a node with a failed volume may not start
[ https://issues.apache.org/jira/browse/HDFS-8871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032172#comment-15032172 ] Junping Du commented on HDFS-8871: -- Moving this to 2.6.4 as there has been no update for a while. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4
[ https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032178#comment-15032178 ] Haohui Mai commented on HDFS-8791: -- Marking it as a critical bug of 2.6.3. I think it's important to cherry-pick this patch to the 2.6 line to avoid serious performance degradation. [~djp] what do you think? > block ID-based DN storage layout can be very slow for datanode on ext4 > -- > > Key: HDFS-8791 > URL: https://issues.apache.org/jira/browse/HDFS-8791 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0, 2.8.0, 2.7.1 >Reporter: Nathan Roberts >Assignee: Chris Trezzo >Priority: Critical > Attachments: 32x32DatanodeLayoutTesting-v1.pdf, > 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch > > > We are seeing cases where the new directory layout causes the datanode to > basically cause the disks to seek for 10s of minutes. This can be when the > datanode is running du, and it can also be when it is performing a > checkDirs(). Both of these operations currently scan all directories in the > block pool and that's very expensive in the new layout. > The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K > leaf directories where block files are placed. > So, what we have on disk is: > - 256 inodes for the first level directories > - 256 directory blocks for the first level directories > - 256*256 inodes for the second level directories > - 256*256 directory blocks for the second level directories > - Then the inodes and blocks to store the HDFS blocks themselves. > The main problem is the 256*256 directory blocks. > inodes and dentries will be cached by linux and one can configure how likely > the system is to prune those entries (vfs_cache_pressure). 
However, ext4 > relies on the buffer cache to cache the directory blocks and I'm not aware of > any way to tell linux to favor buffer cache pages (even if it did I'm not > sure I would want it to in general). > Also, ext4 tries hard to spread directories evenly across the entire volume, > this basically means the 64K directory blocks are probably randomly spread > across the entire disk. A du type scan will look at directories one at a > time, so the ioscheduler can't optimize the corresponding seeks, meaning the > seeks will be random and far. > In a system I was using to diagnose this, I had 60K blocks. A DU when things > are hot is less than 1 second. When things are cold, about 20 minutes. > How do things get cold? > - A large set of tasks run on the node. This pushes almost all of the buffer > cache out, causing the next DU to hit this situation. We are seeing cases > where a large job can cause a seek storm across the entire cluster. > Why didn't the previous layout see this? > - It might have but it wasn't nearly as pronounced. The previous layout would > be a few hundred directory blocks. Even when completely cold, these would > only take a few hundred seeks which would mean single digit seconds. > - With only a few hundred directories, the odds of the directory blocks > getting modified is quite high, this keeps those blocks hot and much less > likely to be evicted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
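The 256x256 accounting in the description above can be sketched as code. The mapping below follows the block-id-based placement the issue describes (two 8-bit slices of the block ID select the two subdir levels); the exact bit positions and the {{subdirN}} naming are illustrative of the layout, not a copy of the datanode's actual path code:

```java
public class LayoutSketch {
  // Pick the two directory levels from the block ID: 256 choices each,
  // giving the 64K leaf directories the description counts.
  static String blockDir(long blockId) {
    int d1 = (int) ((blockId >> 16) & 0xFF);  // first-level subdir
    int d2 = (int) ((blockId >> 8) & 0xFF);   // second-level subdir
    return "subdir" + d1 + "/subdir" + d2;
  }

  public static void main(String[] args) {
    int leafDirs = 256 * 256;                 // 65536 leaf directories
    System.out.println("leaf dirs: " + leafDirs);
    // With ~60K blocks spread over ~64K leaves, a cold du touches roughly
    // one distinct, far-apart directory block per file -- hence the seeks.
    System.out.println(blockDir(0x123456L));
  }
}
```

Because consecutive block IDs land in different subdirectories, a cold scan gets no locality, which is exactly the seek-storm behavior reported here.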
[jira] [Updated] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4
[ https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8791: - Target Version/s: 2.6.3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4
[ https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032190#comment-15032190 ] Kihwal Lee commented on HDFS-8791: -- bq. Marking it as a critical bug of 2.6.3. If you want to pull this in 2.6.3, it might make sense to push it for 2.7.2. If 2.6.3 comes out earlier than 2.7.3, we will be creating a version of 2.6 that cannot be upgraded to the latest 2.7. [~vinodkv], I think 2.6 and 2.7 release managers should coordinate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9474) TestPipelinesFailover would fail if ifconfig is not available
[ https://issues.apache.org/jira/browse/HDFS-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032200#comment-15032200 ] Yongjun Zhang commented on HDFS-9474: - Hi John, Thanks for the patch. Two minor comments: # Suggest switching from {{System.println}} to {{LOG.info}}, {{LOG.warn}}, etc. # For the stack trace printing, suggest {{LOG.warn("Error when running " + scmd, e)}} Thanks. > TestPipelinesFailover would fail if ifconfig is not available > - > > Key: HDFS-9474 > URL: https://issues.apache.org/jira/browse/HDFS-9474 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Yongjun Zhang >Assignee: John Zhuge > Attachments: HDFS-9474.001.patch > > > HDFS-6693 introduced some debug messages to help debug why > TestPipelinesFailover fails. > HDFS-9438 restricted the debug messages to Linux/Mac/Solaris. However, the > test would fail when printing the debug messages if the "ifconfig" command is not > available in certain environments. > This is not quite right. The test should not fail due to debug message > printing. We should catch any exception thrown from the code that prints > debug messages, and issue a warning message instead. > Suggest making this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
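The pattern suggested in the review comment above can be sketched as follows: run the diagnostic command, but never let a missing binary fail the test; log the failure with the exception as the second argument so the stack trace is preserved. Everything here ({{LOG}}, {{scmd}}, the process handling) is a stand-in for the test's own names, using only the JDK:

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class DebugPrinter {
  private static final Logger LOG = Logger.getLogger("TestPipelinesFailover");

  // Runs a best-effort diagnostic command; a missing binary only warns.
  static void printNetworkDebug(String... scmd) {
    try {
      Process p = new ProcessBuilder(scmd).redirectErrorStream(true).start();
      String out = new String(p.getInputStream().readAllBytes());
      LOG.info("Output of " + String.join(" ", scmd) + ":\n" + out);
    } catch (IOException e) {
      // The shape the comment suggests: message plus throwable, so the
      // stack trace is logged without failing the caller.
      LOG.log(Level.WARNING, "Error when running " + String.join(" ", scmd), e);
    }
  }

  public static void main(String[] args) {
    printNetworkDebug("definitely-not-a-real-command");  // warns, does not throw
  }
}
```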
[jira] [Commented] (HDFS-9470) Encryption zone on root not loaded from fsimage after NN restart
[ https://issues.apache.org/jira/browse/HDFS-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032170#comment-15032170 ] Haohui Mai commented on HDFS-9470: -- It looks like an important fix with relatively low risk. I think it is beneficial to put it into 2.6.3. The patch looks good to me overall. Kicking off another round of Jenkins run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9470) Encryption zone on root not loaded from fsimage after NN restart
[ https://issues.apache.org/jira/browse/HDFS-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032181#comment-15032181 ] Xiao Chen commented on HDFS-9470: - Thanks [~wheat9] for the comment. I'll watch out for the Jenkins result. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4
[ https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032187#comment-15032187 ] Junping Du commented on HDFS-8791: -- bq. I think it's important to cherry-pick this patch to the 2.6 line to avoid serious performance degradation. Junping Du what do you think? +1, as long as we don't have any compatibility issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9474) TestPipelinesFailover would fail if ifconfig is not available
[ https://issues.apache.org/jira/browse/HDFS-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Zhuge updated HDFS-9474: - Attachment: HDFS-9474.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9469) DiskBalancer : Add Planner
[ https://issues.apache.org/jira/browse/HDFS-9469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-9469: --- Attachment: HDFS-9469-HDFS-1312.001.patch Attaching patch for code review. I will submit the patch after HDFS-9449 is submitted. > DiskBalancer : Add Planner > --- > > Key: HDFS-9469 > URL: https://issues.apache.org/jira/browse/HDFS-9469 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-9469-HDFS-1312.001.patch > > > Disk Balancer reads the cluster data and then creates a plan for the data > moves based on the snap-shot of the data read from the nodes. This plan is > later submitted to data nodes for execution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6363) Improve concurrency while checking inclusion and exclusion of datanodes
[ https://issues.apache.org/jira/browse/HDFS-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated HDFS-6363: --- Attachment: HDFS-6363-003.patch Attaching the patch after fixing checkstyle issues. > Improve concurrency while checking inclusion and exclusion of datanodes > --- > > Key: HDFS-6363 > URL: https://issues.apache.org/jira/browse/HDFS-6363 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Benoy Antony >Assignee: Benoy Antony > Labels: BB2015-05-TBR > Attachments: HDFS-6363-002.patch, HDFS-6363-003.patch, HDFS-6363.patch > > > HostFileManager holds two effectively immutable objects - includes and > excludes. These two objects can be safely published together using a volatile > container instead of synchronizing for all mutators and accessors. > This improves the concurrency while using HostFileManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
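The publication pattern the HDFS-6363 description proposes can be sketched as follows: build an immutable snapshot holding both sets, then swap it in through a single volatile write, so readers never lock and always see a mutually consistent includes/excludes pair. The class and field names here are illustrative, not the actual {{HostFileManager}} members (and the empty-includes-admits-all rule is an assumption for the sketch):

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class HostSnapshots {
  // Immutable pair: final fields of a properly constructed object are
  // safe to publish through a volatile write (Java Memory Model).
  static final class Snapshot {
    final Set<String> includes;
    final Set<String> excludes;
    Snapshot(Set<String> inc, Set<String> exc) {
      this.includes = Collections.unmodifiableSet(new HashSet<>(inc));
      this.excludes = Collections.unmodifiableSet(new HashSet<>(exc));
    }
  }

  private volatile Snapshot current =
      new Snapshot(new HashSet<>(), new HashSet<>());

  // Mutator: build the new snapshot off to the side, publish with one write.
  void refresh(Set<String> inc, Set<String> exc) {
    current = new Snapshot(inc, exc);
  }

  // Accessors: read the volatile field once so both sets come from the
  // same snapshot; no synchronized blocks anywhere.
  boolean isIncluded(String host) {
    Snapshot s = current;
    return s.includes.isEmpty() || s.includes.contains(host);
  }

  boolean isExcluded(String host) {
    return current.excludes.contains(host);
  }
}
```

Readers pay one volatile read instead of monitor acquisition, which is the concurrency improvement the issue describes.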
[jira] [Commented] (HDFS-9429) Tests in TestDFSAdminWithHA intermittently fail with EOFException
[ https://issues.apache.org/jira/browse/HDFS-9429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032355#comment-15032355 ] Colin Patrick McCabe commented on HDFS-9429: This looks good. Just one comment, though: can we decrease the 100 ms polling timeout in {{MiniJournalCluster#waitActive}} to 50 ms? > Tests in TestDFSAdminWithHA intermittently fail with EOFException > - > > Key: HDFS-9429 > URL: https://issues.apache.org/jira/browse/HDFS-9429 > Project: Hadoop HDFS > Issue Type: Test > Components: test >Reporter: Xiao Chen >Assignee: Xiao Chen > Attachments: HDFS-9429.001.patch, HDFS-9429.002.patch, > HDFS-9429.reproduce > > > I have seen this fail a handful of times for {{testMetaSave}}, but from my > understanding this is from {{setUpHaCluster}} so theoretically it could fail > for any cases in the class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
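The kind of polling wait being tuned here can be sketched generically as follows — a simplified stand-in, not the actual {{MiniJournalCluster#waitActive}} code. A smaller poll interval bounds the extra idle time after the condition becomes true:

```java
public class PollingWait {
    // Poll `condition` every `pollMillis` until it holds or `timeoutMillis`
    // elapses. Returns true if the condition was observed in time.
    public static boolean waitFor(java.util.function.BooleanSupplier condition,
                                  long pollMillis, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline) {
                return false; // timed out
            }
            try {
                // Up to pollMillis of idle wait can pass after the condition
                // becomes true, so halving the interval (100 ms -> 50 ms)
                // shaves up to 50 ms off each wait at the cost of more polls.
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```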
[jira] [Work started] (HDFS-9269) Need to update the documentation and wrapper for fuse-dfs
[ https://issues.apache.org/jira/browse/HDFS-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-9269 started by Wei-Chiu Chuang. - > Need to update the documentation and wrapper for fuse-dfs > - > > Key: HDFS-9269 > URL: https://issues.apache.org/jira/browse/HDFS-9269 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Minor > Labels: supportability > Attachments: HDFS-9269.001.patch, HDFS-9269.002.patch > > > To reproduce the bug in HDFS-9268, I followed the wiki, the doc and read the > wrapper script of fuse-dfs, but found them super outdated. (the wrapper was > last updated four years ago, and the hadoop project layout has dramatically > changed since then). I am creating this JIRA to track the status of the > update. > There are quite a few external blogs/discussion threads floating around the > internet which talked about how to update the scripts, but no one took the > time to update them here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9269) Need to update the documentation and wrapper for fuse-dfs
[ https://issues.apache.org/jira/browse/HDFS-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-9269: -- Attachment: HDFS-9269.002.patch Rev02: Made the wrapper more self-contained. Still needs more testing (e.g. installing it on a pristine machine) to make sure it works out of the box. > Need to update the documentation and wrapper for fuse-dfs > - > > Key: HDFS-9269 > URL: https://issues.apache.org/jira/browse/HDFS-9269 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Minor > Labels: supportability > Attachments: HDFS-9269.001.patch, HDFS-9269.002.patch > > > To reproduce the bug in HDFS-9268, I followed the wiki, the doc and read the > wrapper script of fuse-dfs, but found them very outdated (the wrapper was > last updated four years ago, and the hadoop project layout has dramatically > changed since then). I am creating this JIRA to track the status of the > update. > There are quite a few external blogs/discussion threads floating around the > internet which talk about how to update the scripts, but no one has taken the > time to update them here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9474) TestPipelinesFailover would fail if ifconfig is not available
[ https://issues.apache.org/jira/browse/HDFS-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032383#comment-15032383 ] Yongjun Zhang commented on HDFS-9474: - Thanks for the new rev John. +1 on rev 002 pending jenkins. > TestPipelinesFailover would fail if ifconfig is not available > - > > Key: HDFS-9474 > URL: https://issues.apache.org/jira/browse/HDFS-9474 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Yongjun Zhang >Assignee: John Zhuge > Attachments: HDFS-9474.001.patch, HDFS-9474.002.patch > > > HDFS-6693 introduced some debug message to debug why when > TestPipelinesFailover fails. > HDFS-9438 restricted the debug message to Linux/Mac/Solaris. However, the > test would fail when printing debug message if "ifconfig" command is not > available in certain environment. > This is not quite right. The test should not fail due to the debug message > printing. We should catch any exception thrown from the code that prints > debug message, and issue a warning message. > Suggest to make this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
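The proposed fix — treating the debug output as best-effort — can be sketched like this. The helper name and logging are illustrative, not the actual patch:

```java
import java.io.IOException;

public class DebugHelper {
    // Runs an external diagnostic command (e.g. "ifconfig") purely for
    // debug output. Returns true if the command ran, false if it was
    // skipped because the command is unavailable or was interrupted.
    public static boolean tryPrintDebugInfo(String command) {
        try {
            Process p = new ProcessBuilder(command).start();
            p.waitFor();
            return true;
        } catch (IOException e) {
            // The command does not exist in this environment: warn and
            // continue rather than letting the test fail on debug output.
            System.err.println("WARN: unable to run " + command + ": " + e);
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

This mirrors the suggestion in the description: the test's assertions stay untouched, and only the diagnostic printing is wrapped.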
[jira] [Commented] (HDFS-9429) Tests in TestDFSAdminWithHA intermittently fail with EOFException
[ https://issues.apache.org/jira/browse/HDFS-9429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032411#comment-15032411 ] Xiao Chen commented on HDFS-9429: - Thanks Colin for the comment! I'd love to make improvements but could you explain your concern here? Is this to make {{waitActive}} finish sooner and reduce the overall wait time? > Tests in TestDFSAdminWithHA intermittently fail with EOFException > - > > Key: HDFS-9429 > URL: https://issues.apache.org/jira/browse/HDFS-9429 > Project: Hadoop HDFS > Issue Type: Test > Components: test >Reporter: Xiao Chen >Assignee: Xiao Chen > Attachments: HDFS-9429.001.patch, HDFS-9429.002.patch, > HDFS-9429.reproduce > > > I have seen this fail a handful of times for {{testMetaSave}}, but from my > understanding this is from {{setUpHaCluster}} so theoretically it could fail > for any cases in the class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9470) Encryption zone on root not loaded from fsimage after NN restart
[ https://issues.apache.org/jira/browse/HDFS-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032451#comment-15032451 ] Hadoop QA commented on HDFS-9470: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 36s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 8s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 1m 3s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 15s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 49s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 46s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_85. 
{color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 22s {color} | {color:red} Patch generated 58 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 180m 24s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes | | | hadoop.hdfs.server.namenode.TestDecommissioningStatus | | | hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength | | | hadoop.hdfs.server.datanode.TestDataNodeMetrics | | | hadoop.hdfs.server.datanode.TestBlockScanner | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure120 | | JDK v1.7.0_85 Failed junit tests | hadoop.hdfs.TestEncryptionZones | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure000 | | | hadoop.hdfs.TestDFSClientRetries | | | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes | \\ \\ || Subsystem || Report/Notes || | Docker |
[jira] [Updated] (HDFS-9371) Code cleanup for DatanodeManager
[ https://issues.apache.org/jira/browse/HDFS-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-9371: Attachment: HDFS-9371.002.patch > Code cleanup for DatanodeManager > > > Key: HDFS-9371 > URL: https://issues.apache.org/jira/browse/HDFS-9371 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-9371.000.patch, HDFS-9371.001.patch, > HDFS-9371.002.patch > > > Some code cleanup for DatanodeManager. The main changes include: > # make the synchronization of {{datanodeMap}} and > {{datanodesSoftwareVersions}} consistent > # remove unnecessary lock in {{handleHeartbeat}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9267) TestDiskError should get stored replicas through FsDatasetTestUtils.
[ https://issues.apache.org/jira/browse/HDFS-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-9267: Attachment: HDFS-9267.04.patch This fixes the test failures for JDK 7. Uploading the patch to trigger a new Jenkins run. > TestDiskError should get stored replicas through FsDatasetTestUtils. > > > Key: HDFS-9267 > URL: https://issues.apache.org/jira/browse/HDFS-9267 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 2.7.1 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu >Priority: Minor > Attachments: HDFS-9267.00.patch, HDFS-9267.01.patch, > HDFS-9267.02.patch, HDFS-9267.03.patch, HDFS-9267.04.patch > > > {{TestDiskError#testReplicationError}} scans local directories to verify > blocks and metadata files, which leaks details of the {{FsDataset}} > implementation. > This JIRA will abstract the "scanning" operation to {{FsDatasetTestUtils}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9381) When same block came for replication for Striped mode, we can move that block to PendingReplications
[ https://issues.apache.org/jira/browse/HDFS-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032977#comment-15032977 ] Jing Zhao commented on HDFS-9381: - bq. I'm not fully following the above. Jing do you mind elaborating it a little bit? Sorry for the confusion. As commented by Walter, "if DN_2 fails soon after DN_1, only neededReplications updated", i.e., the records in neededReplications will have enough time to be updated so that they can indicate the block groups are missing 2 internal blocks, before the reported issue happens. > When same block came for replication for Striped mode, we can move that block > to PendingReplications > > > Key: HDFS-9381 > URL: https://issues.apache.org/jira/browse/HDFS-9381 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, namenode >Affects Versions: 3.0.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Attachments: HDFS-9381-02.patch, HDFS-9381-03.patch, > HDFS-9381.00.patch, HDFS-9381.01.patch > > > Currently I noticed that we are just returning null if block already exists > in pendingReplications in replication flow for striped blocks. > {code} > if (block.isStriped()) { > if (pendingNum > 0) { > // Wait the previous recovery to finish. > return null; > } > {code} > Here if we just return null and if neededReplications contains only fewer > blocks(basically by default if less than numliveNodes*2), then same blocks > can be picked again from neededReplications from next loop as we are not > removing element from neededReplications. Since this replication process need > to take fsnamesystmem lock and do, we may spend some time unnecessarily in > every loop. > So my suggestion/improvement is: > Instead of just returning null, how about incrementing pendingReplications > for this block and remove from neededReplications? 
and another point to > consider here: to add a block into pendingReplications we generally need a target, > i.e. the node to which we issued the replication command. Later, after the > replication succeeds and the DN reports it, the block is removed from > pendingReplications in NN addBlock. > Since this is a newly picked block from neededReplications, we would not > have selected a target yet. So which target should be passed to pendingReplications > if we add this block? One option I am thinking of is to just pass > srcNode itself as the target for this special condition. If the block > is really missing, srcNode will not report it, so this block will not be > removed from pendingReplications; when it times out, it will be > considered for replication again, and at that time it will find an actual target to > replicate to while being processed as part of the regular replication flow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
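The suggested flow can be modeled with a toy sketch — hypothetical names, and the real BlockManager structures are far richer than string sets: move the striped block from neededReplications into pendingReplications with srcNode as a placeholder target, and rely on the pending-timeout path to retry with a real target:

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class StripedReplicationModel {
    // Simplified stand-ins for the NN's replication queues.
    public final Set<String> neededReplications = new LinkedHashSet<>();
    public final Map<String, String> pendingReplications = new HashMap<>(); // block -> target

    // Suggested handling for a striped block: instead of returning null and
    // leaving the block in neededReplications (where it would be picked up
    // again on the next loop, reacquiring the FSNamesystem lock for nothing),
    // move it to pendingReplications with srcNode as a placeholder target.
    public void scheduleStriped(String block, String srcNode) {
        if (pendingReplications.containsKey(block)) {
            return; // previous recovery still in flight
        }
        neededReplications.remove(block);
        pendingReplications.put(block, srcNode); // placeholder target
    }

    // srcNode never reports a block it does not have, so a truly missing
    // block times out of pending and reenters neededReplications, where the
    // regular replication flow picks an actual target.
    public void onPendingTimeout(String block) {
        pendingReplications.remove(block);
        neededReplications.add(block);
    }
}
```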
[jira] [Commented] (HDFS-9470) Encryption zone on root not loaded from fsimage after NN restart
[ https://issues.apache.org/jira/browse/HDFS-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033014#comment-15033014 ] Hudson commented on HDFS-9470: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #652 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/652/]) HDFS-9470. Encryption zone on root not loaded from fsimage after NN (wang: rev 9b8e50b424d060e16c1175b1811e7abc476e2468) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatPBINode.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java > Encryption zone on root not loaded from fsimage after NN restart > > > Key: HDFS-9470 > URL: https://issues.apache.org/jira/browse/HDFS-9470 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Critical > Fix For: 2.7.2, 2.6.3 > > Attachments: HDFS-9470.001.patch, HDFS-9470.002.patch, > HDFS-9470.003.patch > > > When restarting namenode, the encryption zone for {{rootDir}} is not loaded > correctly from fsimage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9477) namenode starts failed:FSEditLogLoader: Encountered exception on operation TimesOp
[ https://issues.apache.org/jira/browse/HDFS-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] aplee updated HDFS-9477: Description: backup namenode start failed, log below: 2015-11-28 14:09:13,462 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation TimesOp [length=0, path=/.reserved/.inodes/2346114, mtime=-1, atime=1448692924700, opCode=OP_TIMES, txid=14774180] java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:473) at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:299) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:629) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:832) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:813) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301) at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297) 2015-11-28 14:09:13,572 FATAL org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error encountered while tailing edits. Shutting down standby NN. 
java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:244) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:832) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:813) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301) at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297) Caused by: java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:473) at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:299) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:629) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234) ... 
9 more 2015-11-28 14:09:13,574 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1 2015-11-28 14:09:13,575 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: I found record in Edits, but I don't know how this record generated OP_TIMES 14774180 0 /.reserved/.inodes/2346114 -1 1448692924700 was: backup name start failed, log below: 2015-11-28 14:09:13,462 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation TimesOp [length=0, path=/.reserved/.inodes/2346114, mtime=-1, atime=1448692924700, opCode=OP_TIMES, txid=14774180] java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:473) at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:299) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:629) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:832) at
[jira] [Commented] (HDFS-6363) Improve concurrency while checking inclusion and exclusion of datanodes
[ https://issues.apache.org/jira/browse/HDFS-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032603#comment-15032603 ] Hadoop QA commented on HDFS-6363: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 12s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc 
{color} | {color:green} 1m 50s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s {color} | {color:red} Patch generated 2 new checkstyle issues in hadoop-hdfs-project/hadoop-hdfs (total was 5, now 7). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 57s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 52s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 55m 25s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_85. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 20s {color} | {color:red} Patch generated 58 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 142m 13s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA | | | hadoop.hdfs.shortcircuit.TestShortCircuitCache | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure030 | | | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes | | | hadoop.hdfs.server.namenode.ha.TestHASafeMode | | JDK v1.7.0_85 Failed junit tests | hadoop.hdfs.TestReadStripedFileWithDecoding | | | hadoop.hdfs.TestDFSUpgradeFromImage | | |
[jira] [Updated] (HDFS-6363) Improve concurrency while checking inclusion and exclusion of datanodes
[ https://issues.apache.org/jira/browse/HDFS-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated HDFS-6363: --- Attachment: HDFS-6363-004.patch > Improve concurrency while checking inclusion and exclusion of datanodes > --- > > Key: HDFS-6363 > URL: https://issues.apache.org/jira/browse/HDFS-6363 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Benoy Antony >Assignee: Benoy Antony > Labels: BB2015-05-TBR > Attachments: HDFS-6363-002.patch, HDFS-6363-003.patch, > HDFS-6363-004.patch, HDFS-6363.patch > > > HostFileManager holds two effectively immutable objects - includes and > excludes. These two objects can be safely published together using a volatile > container instead of synchronizing for all mutators and accessors. > This improves the concurrency while using HostFileManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032643#comment-15032643 ] Jing Zhao commented on HDFS-9129: - The latest patch looks pretty good to me. The only minor comment is the following TODO: {code} // TODO delete the following line? startSecretManagerIfNecessary(); {code} I think we can remove this line since it is already called in {{BlockManagerSafeMode#leaveSafeMode}}. Any other comments, [~daryn] and [~wheat9]? > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, > HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, > HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, > HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, > HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, > HDFS-9129.020.patch, HDFS-9129.021.patch, HDFS-9129.022.patch, > HDFS-9129.023.patch, HDFS-9129.024.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7984) webhdfs:// needs to support provided delegation tokens
[ https://issues.apache.org/jira/browse/HDFS-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032689#comment-15032689 ] Allen Wittenauer commented on HDFS-7984: If I understand the code change correctly, I'm sort of surprised this doesn't work: on user account1: {code} hdfs fetchdt /tmp/token chmod a+r /tmp/token {code} on user account2: {code} hadoop fs -Dhadoop.token.file=/tmp/token -ls /user/account1 {code} Both hdfs and webhdfs are failing this simple test. > webhdfs:// needs to support provided delegation tokens > -- > > Key: HDFS-7984 > URL: https://issues.apache.org/jira/browse/HDFS-7984 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Assignee: HeeSoo Kim >Priority: Blocker > Attachments: HDFS-7984.001.patch, HDFS-7984.002.patch, > HDFS-7984.003.patch, HDFS-7984.004.patch, HDFS-7984.005.patch, > HDFS-7984.006.patch, HDFS-7984.007.patch, HDFS-7984.patch > > > When using the webhdfs:// filesystem (especially from distcp), we need the > ability to inject a delegation token rather than webhdfs initialize its own. > This would allow for cross-authentication-zone file system accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6533) intermittent org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionalitytest failure
[ https://issues.apache.org/jira/browse/HDFS-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-6533: -- Status: Patch Available (was: Open) > intermittent > org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionalitytest > failure > -- > > Key: HDFS-6533 > URL: https://issues.apache.org/jira/browse/HDFS-6533 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs-client >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-6533.001.patch, HDFS-6533.002.patch > > > Per https://builds.apache.org/job/Hadoop-Hdfs-trunk/1774/testReport, the > following test failed. However, local rerun is successful. > {code} > org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionality > Error Message > Wanted but not invoked: > datanodeProtocolClientSideTranslatorPB.registerDatanode( > > ); > -> at > org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionality(TestBPOfferService.java:175) > Actually, there were zero interactions with this mock. > Stacktrace > org.mockito.exceptions.verification.WantedButNotInvoked: > Wanted but not invoked: > datanodeProtocolClientSideTranslatorPB.registerDatanode( > > ); > -> at > org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionality(TestBPOfferService.java:175) > Actually, there were zero interactions with this mock. 
> at > org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionality(TestBPOfferService.java:175) > Standard Output > 2014-06-14 12:42:08,723 INFO datanode.DataNode > (SimulatedFSDataset.java:registerMBean(968)) - Registered FSDatasetState MBean > 2014-06-14 12:42:08,730 INFO datanode.DataNode > (BPServiceActor.java:run(805)) - Block pool (Datanode Uuid > unassigned) service to 0.0.0.0/0.0.0.0:0 starting to offer service > 2014-06-14 12:42:08,730 DEBUG datanode.DataNode > (BPServiceActor.java:retrieveNamespaceInfo(170)) - Block pool > (Datanode Uuid unassigned) service to 0.0.0.0/0.0.0.0:0 received > versionRequest response: lv=-57;cid=fake cluster;nsid=1;c=0;bpid=fake bpid > 2014-06-14 12:42:08,731 INFO datanode.DataNode > (BPServiceActor.java:register(765)) - Block pool fake bpid (Datanode Uuid > null) service to 0.0.0.0/0.0.0.0:0 beginning handshake with NN > 2014-06-14 12:42:08,731 INFO datanode.DataNode > (BPServiceActor.java:register(778)) - Block pool Block pool fake bpid > (Datanode Uuid null) service to 0.0.0.0/0.0.0.0:0 successfully registered > with NN > 2014-06-14 12:42:08,732 INFO datanode.DataNode > (BPServiceActor.java:offerService(637)) - For namenode 0.0.0.0/0.0.0.0:0 > using DELETEREPORT_INTERVAL of 30 msec BLOCKREPORT_INTERVAL of > 2160msec CACHEREPORT_INTERVAL of 1msec Initial delay: 0msec; > heartBeatInterval=3000 > 2014-06-14 12:42:08,732 DEBUG datanode.DataNode > (BPServiceActor.java:sendHeartBeat(562)) - Sending heartbeat with 1 storage > reports from service actor: Block pool fake bpid (Datanode Uuid null) service > to 0.0.0.0/0.0.0.0:0 > 2014-06-14 12:42:08,734 INFO datanode.DataNode > (BPServiceActor.java:blockReport(498)) - Sent 1 blockreports 0 blocks total. > Took 1 msec to generate and 0 msecs for RPC and NN processing. 
Got back > commands none > 2014-06-14 12:42:08,738 INFO datanode.DataNode > (BPServiceActor.java:run(805)) - Block pool fake bpid (Datanode Uuid null) > service to 0.0.0.0/0.0.0.0:1 starting to offer service > 2014-06-14 12:42:08,739 DEBUG datanode.DataNode > (BPServiceActor.java:retrieveNamespaceInfo(170)) - Block pool fake bpid > (Datanode Uuid null) service to 0.0.0.0/0.0.0.0:1 received versionRequest > response: lv=-57;cid=fake cluster;nsid=1;c=0;bpid=fake bpid > 2014-06-14 12:42:08,739 INFO datanode.DataNode > (BPServiceActor.java:register(765)) - Block pool fake bpid (Datanode Uuid > null) service to 0.0.0.0/0.0.0.0:1 beginning handshake with NN > 2014-06-14 12:42:08,740 INFO datanode.DataNode > (BPServiceActor.java:register(778)) - Block pool Block pool fake bpid > (Datanode Uuid null) service to 0.0.0.0/0.0.0.0:1 successfully registered > with NN > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9487) libhdfs++ Enable builds with no compiler optimizations
James Clampffer created HDFS-9487: - Summary: libhdfs++ Enable builds with no compiler optimizations Key: HDFS-9487 URL: https://issues.apache.org/jira/browse/HDFS-9487 Project: Hadoop HDFS Issue Type: Sub-task Reporter: James Clampffer Assignee: James Clampffer The default build configuration uses -O2 -g. To make debugging easier it would be really nice to be able to produce builds with -O0. I haven't found an existing flag to pass to maven or cmake to accomplish this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
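A possible workaround worth trying (an assumption on my part, not verified against the libhdfs++ build scripts): stock CMake exposes {{CMAKE_BUILD_TYPE}}, and the {{Debug}} type conventionally maps to {{-O0 -g}} with GCC/Clang, so invoking cmake directly on the native client source tree may produce the unoptimized build. The path below is illustrative.

```shell
# Hypothetical direct-CMake invocation; the source path is illustrative.
# CMAKE_BUILD_TYPE=Debug typically expands to -O0 -g on GCC/Clang toolchains.
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Debug /path/to/hadoop-hdfs-native-client/src
make
```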
[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4
[ https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032740#comment-15032740 ] Chris Trezzo commented on HDFS-8791: I am finishing up the unit test and will post it later today. > block ID-based DN storage layout can be very slow for datanode on ext4 > -- > > Key: HDFS-8791 > URL: https://issues.apache.org/jira/browse/HDFS-8791 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0, 2.8.0, 2.7.1 >Reporter: Nathan Roberts >Assignee: Chris Trezzo >Priority: Critical > Attachments: 32x32DatanodeLayoutTesting-v1.pdf, > 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch > > > We are seeing cases where the new directory layout causes the datanode to > basically cause the disks to seek for 10s of minutes. This can be when the > datanode is running du, and it can also be when it is performing a > checkDirs(). Both of these operations currently scan all directories in the > block pool and that's very expensive in the new layout. > The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K > leaf directories where block files are placed. > So, what we have on disk is: > - 256 inodes for the first level directories > - 256 directory blocks for the first level directories > - 256*256 inodes for the second level directories > - 256*256 directory blocks for the second level directories > - Then the inodes and blocks to store the HDFS blocks themselves. > The main problem is the 256*256 directory blocks. > inodes and dentries will be cached by linux and one can configure how likely > the system is to prune those entries (vfs_cache_pressure). However, ext4 > relies on the buffer cache to cache the directory blocks and I'm not aware of > any way to tell linux to favor buffer cache pages (even if it did I'm not > sure I would want it to in general). 
> Also, ext4 tries hard to spread directories evenly across the entire volume; > this basically means the 64K directory blocks are probably randomly spread > across the entire disk. A du type scan will look at directories one at a > time, so the ioscheduler can't optimize the corresponding seeks, meaning the > seeks will be random and far. > In a system I was using to diagnose this, I had 60K blocks. A DU when things > are hot is less than 1 second. When things are cold, about 20 minutes. > How do things get cold? > - A large set of tasks run on the node. This pushes almost all of the buffer > cache out, causing the next DU to hit this situation. We are seeing cases > where a large job can cause a seek storm across the entire cluster. > Why didn't the previous layout see this? > - It might have but it wasn't nearly as pronounced. The previous layout would > be a few hundred directory blocks. Even when completely cold, these would > only take a few hundred seeks which would mean single digit seconds. > - With only a few hundred directories, the odds of the directory blocks > getting modified are quite high, which keeps those blocks hot and much less > likely to be evicted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
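The directory-block arithmetic above translates into a simple back-of-envelope estimate. The sketch below is my own illustration (the ~10 ms cold-seek cost is an assumed figure for spinning disks, not data from this report):

```java
// Back-of-envelope model of a cold "du" scan: assume each uncached
// directory block costs one random seek (~10 ms assumed, illustrative).
public class LayoutSeekModel {
    // Seconds spent seeking if every directory block needs one cold seek.
    static double coldScanSeconds(long directoryBlocks, double seekMillis) {
        return directoryBlocks * seekMillis / 1000.0;
    }

    public static void main(String[] args) {
        long newLayout = 256L * 256;  // 64K leaf directories in the 256x256 layout
        long oldLayout = 300;         // "a few hundred" directory blocks previously
        System.out.printf("new layout: ~%.0f s of seeking%n", coldScanSeconds(newLayout, 10));
        System.out.printf("old layout: ~%.0f s of seeking%n", coldScanSeconds(oldLayout, 10));
    }
}
```

With those assumptions a cold scan of the 256*256 directory blocks costs on the order of ten minutes of pure seeking while a few hundred blocks cost single-digit seconds, which is in the same ballpark as the hot/cold du timings reported above.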
[jira] [Commented] (HDFS-9381) When same block came for replication for Striped mode, we can move that block to PendingReplications
[ https://issues.apache.org/jira/browse/HDFS-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032807#comment-15032807 ] Zhe Zhang commented on HDFS-9381: - Thanks Jing for the comment. Let's consider this case: # Cluster has 100 nodes # DN_1 and DN_2 failed # They are on different racks # They happen to share 1000 striped blocks and 1000 contiguous blocks (we can easily scale up the calculated numbers for n x 1000 blocks). So there are 2000 striped internal blocks, and 2000 contiguous block replicas missing. So in each iteration ReplicationMonitor tries to pick up 200 items. Without the change, it will be 100 striped and 100 contiguous on average. Assuming EC recovery work takes longer than 3 seconds ({{replicationRecheckInterval}}), the 2nd iteration will pick up about 5 invalid striped items (being recovered). If EC recovery work takes long enough, the 3rd round will pick up about 10 (2/18) invalid striped items, and the 4th round 18 invalid items. This way the replication work for the lost contiguous replicas will take 20 x 3 = 60 seconds to be distributed to DNs. With the change, the 2nd round will pick up 95 striped items and 105 contiguous items, and the 3rd round 110 contiguous items. It's tricky to get very accurate, but it seems we can save a few 3-second cycles. [~umamaheswararao] Does the example itself make sense to you? If so, how should we calculate the saving in locking time? bq. it is also possible that because of the longer processing time, there is higher chance for the striped blocks to be updated in the UC queue before being processed by the replication monitor for the first time I'm not fully following the above. Jing do you mind elaborating it a little bit? 
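The effect being estimated above can be illustrated with a toy simulation (my own sketch, not ReplicationMonitor's real scheduling logic; it ignores replication priorities, per-DN limits, and pending-replication timeouts, and assumes striped recovery always outlasts one iteration):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Toy model: 2000 striped + 2000 contiguous items, 200 picked per
// 3-second iteration. Without the change, a picked striped item stays
// in the queue and can be re-picked (a wasted slot); with the change it
// is moved aside (to pending) on first pick.
public class ReplQueueSim {
    static int roundsToDrain(boolean moveStripedToPending, long seed) {
        Random rnd = new Random(seed);
        List<Boolean> queue = new ArrayList<>();          // true = striped item
        for (int i = 0; i < 2000; i++) queue.add(true);
        for (int i = 0; i < 2000; i++) queue.add(false);
        int contiguousLeft = 2000, rounds = 0;
        while (contiguousLeft > 0) {
            rounds++;
            Collections.shuffle(queue, rnd);              // random pick order
            List<Boolean> picked =
                new ArrayList<>(queue.subList(0, Math.min(200, queue.size())));
            for (Boolean striped : picked) {
                if (striped) {
                    // With the change the striped item leaves the queue;
                    // without it, the slot is simply wasted this round.
                    if (moveStripedToPending) queue.remove(Boolean.TRUE);
                } else {
                    contiguousLeft--;
                    queue.remove(Boolean.FALSE);
                }
            }
        }
        return rounds;
    }

    public static void main(String[] args) {
        System.out.println("without change: " + roundsToDrain(false, 42) + " rounds");
        System.out.println("with change:    " + roundsToDrain(true, 42) + " rounds");
    }
}
```

Under these assumptions the contiguous work drains in at most 20 rounds with the change, versus considerably more without it, because re-picked striped items crowd out contiguous replication; the absolute numbers depend entirely on the modeled recovery time.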
> When same block came for replication for Striped mode, we can move that block > to PendingReplications > > > Key: HDFS-9381 > URL: https://issues.apache.org/jira/browse/HDFS-9381 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, namenode >Affects Versions: 3.0.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Attachments: HDFS-9381-02.patch, HDFS-9381-03.patch, > HDFS-9381.00.patch, HDFS-9381.01.patch > > > Currently I noticed that we are just returning null if block already exists > in pendingReplications in replication flow for striped blocks. > {code} > if (block.isStriped()) { > if (pendingNum > 0) { > // Wait the previous recovery to finish. > return null; > } > {code} > Here if we just return null and if neededReplications contains only fewer > blocks (basically by default if less than numliveNodes*2), then same blocks > can be picked again from neededReplications from next loop as we are not > removing element from neededReplications. Since this replication process needs > to take the FSNamesystem lock, we may spend some time unnecessarily in > every loop. > So my suggestion/improvement is: > Instead of just returning null, how about incrementing pendingReplications > for this block and remove from neededReplications? and also another point to > consider here is, to add into pendingReplications, generally we need target > and it is nothing but to which node we issued replication command. Later when > after replication success and DN reported it, block will be removed from > pendingReplications from NN addBlock. > So since this is newly picked block from neededReplications, we would not > have selected target yet. So which target to be passed to pendingReplications > if we add this block? One Option I am thinking is, how about just passing > srcNode itself as target for this special condition? So, anyway if the block > is really missed, srcNode will not report it. 
So this block will not be > removed from pending replications, so that when it is timed out, it will be > considered for replication again and that time it will find actual target to > replicate while processing as part of regular replication flow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8986) Add option to -du to calculate directory space usage excluding snapshots
[ https://issues.apache.org/jira/browse/HDFS-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032805#comment-15032805 ] Chris Nauroth commented on HDFS-8986: - I don't think we can change the default behavior of the commands, at least not within 2.x, on grounds of backwards-compatibility. It's possible that users already depend on inclusion of snapshot contents in the results of these commands. Adding a new option for filtering out snapshot contents would be a backwards-compatible change though. > Add option to -du to calculate directory space usage excluding snapshots > > > Key: HDFS-8986 > URL: https://issues.apache.org/jira/browse/HDFS-8986 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Reporter: Gautam Gopalakrishnan >Assignee: Jagadesh Kiran N > > When running {{hadoop fs -du}} on a snapshotted directory (or one of its > children), the report includes space consumed by blocks that are only present > in the snapshots. This is confusing for end users. > {noformat} > $ hadoop fs -du -h -s /tmp/parent /tmp/parent/* > 799.7 M 2.3 G /tmp/parent > 799.7 M 2.3 G /tmp/parent/sub1 > $ hdfs dfs -createSnapshot /tmp/parent snap1 > Created snapshot /tmp/parent/.snapshot/snap1 > $ hadoop fs -rm -skipTrash /tmp/parent/sub1/* > ... > $ hadoop fs -du -h -s /tmp/parent /tmp/parent/* > 799.7 M 2.3 G /tmp/parent > 799.7 M 2.3 G /tmp/parent/sub1 > $ hdfs dfs -deleteSnapshot /tmp/parent snap1 > $ hadoop fs -du -h -s /tmp/parent /tmp/parent/* > 0 0 /tmp/parent > 0 0 /tmp/parent/sub1 > {noformat} > It would be helpful if we had a flag, say -X, to exclude any snapshot related > disk usage in the output -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9442) Move block replication logic from BlockManager to a new class ReplicationManager
[ https://issues.apache.org/jira/browse/HDFS-9442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9442: Attachment: HDFS-9442.004.patch Per offline discussion with [~wheat9] and [~jingzhao], the v4 patch makes the {{ReplicationManager#removeBlockFromExcessReplicateMap()}} and {{ReplicationManager#isBlockExcessOnNode}} accept {{BlockInfo}} object instead of {{Block}}. This patch cherry picked [HDFS-9485]. > Move block replication logic from BlockManager to a new class > ReplicationManager > > > Key: HDFS-9442 > URL: https://issues.apache.org/jira/browse/HDFS-9442 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9442.000.patch, HDFS-9442.001.patch, > HDFS-9442.002.patch, HDFS-9442.003.patch, HDFS-9442.004.patch > > > Currently the {{BlockManager}} is managing all replication logic for over- , > under- and mis-replicated blocks. This jira proposes to move that code to a > new class named {{ReplicationManager}} for cleaner code logic, shorter source > files, and easier lock separating work in future. > The {{ReplicationManager}} is a package local class, providing > {{BlockManager}} with methods that accesses its internal data structures of > replication queue. Meanwhile, the class maintains the lifecycle of > {{replicationThread}} and {{replicationQueuesInitializer}} daemon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9470) Encryption zone on root not loaded from fsimage after NN restart
[ https://issues.apache.org/jira/browse/HDFS-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032900#comment-15032900 ] Akira AJISAKA commented on HDFS-9470: - Hi [~xiaochen], do we need to commit this patch to branch-2.8 as well? > Encryption zone on root not loaded from fsimage after NN restart > > > Key: HDFS-9470 > URL: https://issues.apache.org/jira/browse/HDFS-9470 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Critical > Fix For: 2.7.2, 2.6.3 > > Attachments: HDFS-9470.001.patch, HDFS-9470.002.patch, > HDFS-9470.003.patch > > > When restarting namenode, the encryption zone for {{rootDir}} is not loaded > correctly from fsimage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9470) Encryption zone on root not loaded from fsimage after NN restart
[ https://issues.apache.org/jira/browse/HDFS-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032924#comment-15032924 ] Andrew Wang commented on HDFS-9470: --- Yea my bad on forgetting branch-2.8, I just committed it there too. Thanks [~ajisakaa] for the catch! > Encryption zone on root not loaded from fsimage after NN restart > > > Key: HDFS-9470 > URL: https://issues.apache.org/jira/browse/HDFS-9470 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Critical > Fix For: 2.7.2, 2.6.3 > > Attachments: HDFS-9470.001.patch, HDFS-9470.002.patch, > HDFS-9470.003.patch > > > When restarting namenode, the encryption zone for {{rootDir}} is not loaded > correctly from fsimage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9381) When same block came for replication for Striped mode, we can move that block to PendingReplications
[ https://issues.apache.org/jira/browse/HDFS-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032965#comment-15032965 ] Walter Su commented on HDFS-9381: - 1. need to lock on {{readyForReplications}}. {code} + if (unscheduledPendingReplications.remove(block)) { +readyForReplications.add(block); + } {code} 2. Here We can have some log. {code} if (pendingNum > 0) { // Wait the previous recovery to finish. +pendingReplications.addToUnscheduledPendingReplication(block); +neededReplications.remove(block, priority); return null; {code} 3. Here need to check if block is already in {{readyForReplications}}. Otherwise it's possible the block appears in {{readyForReplications}}, and re-processed. {code} addToUnscheduledPendingReplication(BlockInfo block) { {code} Speaking of the case [~zhz] mentioned above. Assume DN_1 has 1m blocks totally. So it takes 5000 iter to process all, which means about 4 hrs. If DN_2 fails soon after DN_1, only {{neededReplications}} updated. If DN_2 fails long after DN_1, the previous task already finished so we schedule a new task. > When same block came for replication for Striped mode, we can move that block > to PendingReplications > > > Key: HDFS-9381 > URL: https://issues.apache.org/jira/browse/HDFS-9381 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, namenode >Affects Versions: 3.0.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Attachments: HDFS-9381-02.patch, HDFS-9381-03.patch, > HDFS-9381.00.patch, HDFS-9381.01.patch > > > Currently I noticed that we are just returning null if block already exists > in pendingReplications in replication flow for striped blocks. > {code} > if (block.isStriped()) { > if (pendingNum > 0) { > // Wait the previous recovery to finish. 
> return null; > } > {code} > Here if we just return null and if neededReplications contains only fewer > blocks (basically by default if less than numliveNodes*2), then same blocks > can be picked again from neededReplications from next loop as we are not > removing element from neededReplications. Since this replication process needs > to take the FSNamesystem lock, we may spend some time unnecessarily in > every loop. > So my suggestion/improvement is: > Instead of just returning null, how about incrementing pendingReplications > for this block and remove from neededReplications? and also another point to > consider here is, to add into pendingReplications, generally we need target > and it is nothing but to which node we issued replication command. Later when > after replication success and DN reported it, block will be removed from > pendingReplications from NN addBlock. > So since this is newly picked block from neededReplications, we would not > have selected target yet. So which target to be passed to pendingReplications > if we add this block? One Option I am thinking is, how about just passing > srcNode itself as target for this special condition? So, anyway if the block > is really missed, srcNode will not report it. So this block will not be > removed from pending replications, so that when it is timed out, it will be > considered for replication again and that time it will find actual target to > replicate while processing as part of regular replication flow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9477) namenode starts failed:FSEditLogLoader: Encountered exception on operation TimesOp
[ https://issues.apache.org/jira/browse/HDFS-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033010#comment-15033010 ] aplee commented on HDFS-9477: - Thanks for your reply. Which file do you mean by opening the file? /.reserved/.inodes/2346114? I mounted HDFS on Linux using the NFS gateway and then shared it with Samba, so we can open files in HDFS from Windows without downloading them. About thirty OP_TIMES records for /.reserved/.inodes/ occur in the edit log in the week after that. I think that may be related. I know little about the file descriptor-ish feature, and will learn something about it. > namenode starts failed:FSEditLogLoader: Encountered exception on operation > TimesOp > -- > > Key: HDFS-9477 > URL: https://issues.apache.org/jira/browse/HDFS-9477 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 > Environment: Ubuntu 12.04.1 LTS, java version "1.7.0_79" >Reporter: aplee >Assignee: aplee > > backup namenode start failed, log below: > 2015-11-28 14:09:13,462 ERROR > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception > on operation TimesOp [length=0, path=/.reserved/.inodes/2346114, mtime=-1, > atime=1448692924700, opCode=OP_TIMES, txid=14774180] > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:473) > at > org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:299) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:629) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:832) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:813) > at > 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297) > 2015-11-28 14:09:13,572 FATAL > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error > encountered while tailing edits. Shutting down standby NN. > java.io.IOException: java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:244) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:832) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:813) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297) > Caused by: java.lang.NullPointerException > at > 
org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:473) > at > org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:299) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:629) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234) > ... 9 more > 2015-11-28 14:09:13,574 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status 1 > 2015-11-28 14:09:13,575 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: > SHUTDOWN_MSG: > I found record in Edits, but I don't know how this record generated >
[jira] [Commented] (HDFS-9336) deleteSnapshot throws NPE when snapshotname is null
[ https://issues.apache.org/jira/browse/HDFS-9336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033009#comment-15033009 ] Akira AJISAKA commented on HDFS-9336: - +1. I ran all the failed test on JDK7 and they passed locally. > deleteSnapshot throws NPE when snapshotname is null > --- > > Key: HDFS-9336 > URL: https://issues.apache.org/jira/browse/HDFS-9336 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Attachments: HDFS-9336-002.patch, HDFS-9336-003.patch, > HDFS-9336-004.patch, HDFS-9336.patch > > > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$DeleteSnapshotRequestProto$Builder.setSnapshotName(ClientNamenodeProtocolProtos.java:17509) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.deleteSnapshot(ClientNamenodeProtocolTranslatorPB.java:1005) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:255) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) > at com.sun.proxy.$Proxy15.deleteSnapshot(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.deleteSnapshot(DFSClient.java:2106) > at > org.apache.hadoop.hdfs.DistributedFileSystem$37.doCall(DistributedFileSystem.java:1660) > at > org.apache.hadoop.hdfs.DistributedFileSystem$37.doCall(DistributedFileSystem.java:1) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.deleteSnapshot(DistributedFileSystem.java:1677) > at > 
org.apache.hadoop.hdfs.web.TestWebHDFS.testWebHdfsAllowandDisallowSnapshots(TestWebHDFS.java:380) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50) > at > org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) > at > org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) > at > org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683) > at > org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390) > at > org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197) > {noformat} -- This message was sent by Atlassian JIRA 
(v6.3.4#6332)
[jira] [Commented] (HDFS-9038) Reserved space is erroneously counted towards non-DFS used.
[ https://issues.apache.org/jira/browse/HDFS-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032865#comment-15032865 ] Chris Nauroth commented on HDFS-9038: - Thanks for the further reviews. I'm catching up on patch v005 now. # The latest {{getNonDfsUsed}} switched to using {{File#getFreeSpace}}. However, the {{getAvailable}} calculation uses {{File#getUsableSpace}} via {{DF#getAvailable}}. The non-DFS used calculation prior to the HDFS-5215 patch also would have been using {{File#getUsableSpace}}. I think we should stick with {{File#getUsableSpace}} here (or {{DF#getAvailable}} for symmetry with the pre-HDFS-5215 code). # The latest {{getNonDfsUsed}} does not include {{reservedForReplicas}}. I think it should, since the {{reservedForReplicas}} amount is effectively in use by HDFS. # I think we should cap the returned value to 0 as a matter of defensive coding against negative values. There could be a possibility of race conditions in between pulling the individual data items, resulting in an unexpected negative total. > Reserved space is erroneously counted towards non-DFS used. > --- > > Key: HDFS-9038 > URL: https://issues.apache.org/jira/browse/HDFS-9038 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.1 >Reporter: Chris Nauroth >Assignee: Brahma Reddy Battula > Attachments: HDFS-9038-002.patch, HDFS-9038-003.patch, > HDFS-9038-004.patch, HDFS-9038-005.patch, HDFS-9038.patch > > > HDFS-5215 changed the DataNode volume available space calculation to consider > the reserved space held by the {{dfs.datanode.du.reserved}} configuration > property. As a side effect, reserved space is now counted towards non-DFS > used. I don't believe it was intentional to change the definition of non-DFS > used. This issue proposes restoring the prior behavior: do not count > reserved space towards non-DFS used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
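The three review points above could combine as in the following sketch (illustrative only; the method and parameter names are my own, not the actual {{FsVolumeImpl}} code): use usable space for symmetry with {{getAvailable}}, subtract {{reservedForReplicas}} as space effectively in use by HDFS, and cap the result at 0 defensively.

```java
// Illustrative non-DFS-used calculation; names are hypothetical, not
// the real FsVolumeImpl fields. Values would come from racy reads of
// File#getUsableSpace and volume counters, hence the cap at zero.
public class NonDfsUsedSketch {
    static long nonDfsUsed(long capacity, long reserved, long dfsUsed,
                           long reservedForReplicas, long usableSpace) {
        long nonDfs = capacity - reserved - dfsUsed - reservedForReplicas - usableSpace;
        return Math.max(nonDfs, 0L); // defensive: racy reads can go negative
    }

    public static void main(String[] args) {
        // 100 units capacity, 10 reserved, 40 DFS used, 5 reserved for
        // replicas, 30 usable -> 15 attributed to non-DFS consumers.
        System.out.println(nonDfsUsed(100, 10, 40, 5, 30));  // 15
        // An inconsistent snapshot of the counters is capped at zero.
        System.out.println(nonDfsUsed(100, 10, 60, 5, 30));  // 0 (raw total is -5)
    }
}
```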
[jira] [Commented] (HDFS-9365) Balaner does not work with the HDFS-6376 HA setup
[ https://issues.apache.org/jira/browse/HDFS-9365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032982#comment-15032982 ] Hadoop QA commented on HDFS-9365: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 7 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 12s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 0m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 58s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 56m 13s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 52m 43s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_85. 
{color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 19s {color} | {color:red} Patch generated 58 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 137m 42s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.TestDFSClientRetries | | | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes | | | hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork | | | hadoop.hdfs.server.datanode.TestBlockScanner | | JDK v1.7.0_85 Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure040 | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12773635/h9365_20151120.patch | | JIRA Issue | HDFS-9365 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 9e495f2e2ec9
[jira] [Updated] (HDFS-6533) intermittent org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionalitytest failure
[ https://issues.apache.org/jira/browse/HDFS-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-6533: -- Attachment: HDFS-6533.003.patch Thanks [~arpitagarwal] for the comments. Yes that looks to be a better idea. I am attaching rev03 that follows your suggestion. > intermittent > org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionalitytest > failure > -- > > Key: HDFS-6533 > URL: https://issues.apache.org/jira/browse/HDFS-6533 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs-client >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-6533.001.patch, HDFS-6533.002.patch, > HDFS-6533.003.patch > > > Per https://builds.apache.org/job/Hadoop-Hdfs-trunk/1774/testReport, the > following test failed. However, local rerun is successful. > {code} > org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionality > Error Message > Wanted but not invoked: > datanodeProtocolClientSideTranslatorPB.registerDatanode( > > ); > -> at > org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionality(TestBPOfferService.java:175) > Actually, there were zero interactions with this mock. > Stacktrace > org.mockito.exceptions.verification.WantedButNotInvoked: > Wanted but not invoked: > datanodeProtocolClientSideTranslatorPB.registerDatanode( > > ); > -> at > org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionality(TestBPOfferService.java:175) > Actually, there were zero interactions with this mock. 
> at > org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionality(TestBPOfferService.java:175) > Standard Output > 2014-06-14 12:42:08,723 INFO datanode.DataNode > (SimulatedFSDataset.java:registerMBean(968)) - Registered FSDatasetState MBean > 2014-06-14 12:42:08,730 INFO datanode.DataNode > (BPServiceActor.java:run(805)) - Block pool (Datanode Uuid > unassigned) service to 0.0.0.0/0.0.0.0:0 starting to offer service > 2014-06-14 12:42:08,730 DEBUG datanode.DataNode > (BPServiceActor.java:retrieveNamespaceInfo(170)) - Block pool > (Datanode Uuid unassigned) service to 0.0.0.0/0.0.0.0:0 received > versionRequest response: lv=-57;cid=fake cluster;nsid=1;c=0;bpid=fake bpid > 2014-06-14 12:42:08,731 INFO datanode.DataNode > (BPServiceActor.java:register(765)) - Block pool fake bpid (Datanode Uuid > null) service to 0.0.0.0/0.0.0.0:0 beginning handshake with NN > 2014-06-14 12:42:08,731 INFO datanode.DataNode > (BPServiceActor.java:register(778)) - Block pool Block pool fake bpid > (Datanode Uuid null) service to 0.0.0.0/0.0.0.0:0 successfully registered > with NN > 2014-06-14 12:42:08,732 INFO datanode.DataNode > (BPServiceActor.java:offerService(637)) - For namenode 0.0.0.0/0.0.0.0:0 > using DELETEREPORT_INTERVAL of 30 msec BLOCKREPORT_INTERVAL of > 2160msec CACHEREPORT_INTERVAL of 1msec Initial delay: 0msec; > heartBeatInterval=3000 > 2014-06-14 12:42:08,732 DEBUG datanode.DataNode > (BPServiceActor.java:sendHeartBeat(562)) - Sending heartbeat with 1 storage > reports from service actor: Block pool fake bpid (Datanode Uuid null) service > to 0.0.0.0/0.0.0.0:0 > 2014-06-14 12:42:08,734 INFO datanode.DataNode > (BPServiceActor.java:blockReport(498)) - Sent 1 blockreports 0 blocks total. > Took 1 msec to generate and 0 msecs for RPC and NN processing. 
Got back > commands none > 2014-06-14 12:42:08,738 INFO datanode.DataNode > (BPServiceActor.java:run(805)) - Block pool fake bpid (Datanode Uuid null) > service to 0.0.0.0/0.0.0.0:1 starting to offer service > 2014-06-14 12:42:08,739 DEBUG datanode.DataNode > (BPServiceActor.java:retrieveNamespaceInfo(170)) - Block pool fake bpid > (Datanode Uuid null) service to 0.0.0.0/0.0.0.0:1 received versionRequest > response: lv=-57;cid=fake cluster;nsid=1;c=0;bpid=fake bpid > 2014-06-14 12:42:08,739 INFO datanode.DataNode > (BPServiceActor.java:register(765)) - Block pool fake bpid (Datanode Uuid > null) service to 0.0.0.0/0.0.0.0:1 beginning handshake with NN > 2014-06-14 12:42:08,740 INFO datanode.DataNode > (BPServiceActor.java:register(778)) - Block pool Block pool fake bpid > (Datanode Uuid null) service to 0.0.0.0/0.0.0.0:1 successfully registered > with NN > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9449) DiskBalancer : Add connectors
[ https://issues.apache.org/jira/browse/HDFS-9449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032997#comment-15032997 ] Hadoop QA commented on HDFS-9449: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 47s {color} | {color:green} HDFS-1312 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s {color} | {color:green} HDFS-1312 passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s {color} | {color:green} HDFS-1312 passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s {color} | {color:green} HDFS-1312 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s {color} | {color:green} HDFS-1312 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 25s {color} | {color:green} HDFS-1312 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 57s {color} | {color:green} HDFS-1312 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 30s {color} | {color:green} HDFS-1312 passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 15s {color} | {color:green} HDFS-1312 passed with JDK v1.7.0_85 
{color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 6s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 2s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 29s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 18s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 53s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 50s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_85. 
{color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 33s {color} | {color:red} Patch generated 57 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 180m 59s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure180 | | | hadoop.hdfs.TestEncryptionZones | | | hadoop.hdfs.server.datanode.TestBlockScanner | | | hadoop.security.TestPermission | | | hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits | | JDK v1.7.0_85 Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes | | | hadoop.hdfs.server.balancer.TestBalancer | | | hadoop.hdfs.server.namenode.TestCacheDirectives | | | hadoop.hdfs.server.datanode.TestBlockScanner | | | hadoop.security.TestPermission | \\ \\ || Subsystem
[jira] [Commented] (HDFS-9429) Tests in TestDFSAdminWithHA intermittently fail with EOFException
[ https://issues.apache.org/jira/browse/HDFS-9429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032998#comment-15032998 ] Hadoop QA commented on HDFS-9429: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 11 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 53s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 0m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 56s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 48s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 53m 55s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_85. 
{color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 19s {color} | {color:red} Patch generated 58 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 140m 37s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestUpdatePipelineWithSnapshots | | JDK v1.7.0_85 Failed junit tests | hadoop.hdfs.server.namenode.ha.TestEditLogTailer | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12774916/HDFS-9429.003.patch | | JIRA Issue | HDFS-9429 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux ef2f02a59f7c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality |
[jira] [Updated] (HDFS-9336) deleteSnapshot throws NPE when snapshotname is null
[ https://issues.apache.org/jira/browse/HDFS-9336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-9336: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Committed this to trunk, branch-2, and branch-2.8. Thanks [~brahmareddy] for the contribution! > deleteSnapshot throws NPE when snapshotname is null > --- > > Key: HDFS-9336 > URL: https://issues.apache.org/jira/browse/HDFS-9336 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Fix For: 2.8.0 > > Attachments: HDFS-9336-002.patch, HDFS-9336-003.patch, > HDFS-9336-004.patch, HDFS-9336.patch > > > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$DeleteSnapshotRequestProto$Builder.setSnapshotName(ClientNamenodeProtocolProtos.java:17509) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.deleteSnapshot(ClientNamenodeProtocolTranslatorPB.java:1005) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:255) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) > at com.sun.proxy.$Proxy15.deleteSnapshot(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.deleteSnapshot(DFSClient.java:2106) > at > org.apache.hadoop.hdfs.DistributedFileSystem$37.doCall(DistributedFileSystem.java:1660) > at > org.apache.hadoop.hdfs.DistributedFileSystem$37.doCall(DistributedFileSystem.java:1) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > 
org.apache.hadoop.hdfs.DistributedFileSystem.deleteSnapshot(DistributedFileSystem.java:1677) > at > org.apache.hadoop.hdfs.web.TestWebHDFS.testWebHdfsAllowandDisallowSnapshots(TestWebHDFS.java:380) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50) > at > org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) > at > org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) > at > org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683) > at > org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390) > at > 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
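The stack trace above shows the NPE surfacing inside the generated protobuf builder ({{setSnapshotName}}) rather than at the API boundary. A minimal sketch of the kind of early argument check that would fail fast with a clear message instead — using a simplified stand-in class, not the actual {{ClientNamenodeProtocolTranslatorPB}} code:

```java
// Sketch only: an early null check would fail fast with a descriptive
// message instead of an NPE deep inside the generated protobuf builder.
// SnapshotClient is a simplified stand-in, not the actual Hadoop class.
final class SnapshotClient {
    void deleteSnapshot(String snapshotRoot, String snapshotName) {
        // Protobuf builders reject null strings with a bare NPE, so
        // validate the arguments before constructing the request.
        if (snapshotRoot == null) {
            throw new IllegalArgumentException("Snapshot root cannot be null");
        }
        if (snapshotName == null) {
            throw new IllegalArgumentException("Snapshot name cannot be null");
        }
        // ... build DeleteSnapshotRequestProto and issue the RPC ...
    }
}
```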
[jira] [Updated] (HDFS-9474) TestPipelinesFailover would fail if ifconfig is not available
[ https://issues.apache.org/jira/browse/HDFS-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Zhuge updated HDFS-9474: - Status: Patch Available (was: In Progress) > TestPipelinesFailover would fail if ifconfig is not available > - > > Key: HDFS-9474 > URL: https://issues.apache.org/jira/browse/HDFS-9474 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Yongjun Zhang >Assignee: John Zhuge > Attachments: HDFS-9474.001.patch, HDFS-9474.002.patch > > > HDFS-6693 introduced some debug messages to help diagnose why > TestPipelinesFailover fails. > HDFS-9438 restricted the debug messages to Linux/Mac/Solaris. However, the > test would still fail while printing the debug messages if the "ifconfig" command is not > available in the environment. > This is not quite right. The test should not fail because of debug message > printing. We should catch any exception thrown from the code that prints > debug messages, and issue a warning instead. > Suggest making this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
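The change proposed above — keeping a failure of the diagnostic code from failing the test itself — can be sketched as follows. This is only an illustration with hypothetical names ({{NetworkDiagnostics}}, {{printDebugInfo}}), not the actual patch:

```java
// Sketch of the proposed guard: a failure in best-effort diagnostics
// (e.g. "ifconfig" missing from the environment) should downgrade to a
// warning rather than propagate and fail the test.
// Class and method names here are illustrative, not Hadoop's.
final class NetworkDiagnostics {
    static void printDebugInfo(Runnable debugAction) {
        try {
            debugAction.run();  // e.g. shell out to "ifconfig" and log the output
        } catch (Exception e) {
            // Diagnostics are optional; warn and continue instead of rethrowing.
            System.err.println("WARN: could not collect debug info: " + e);
        }
    }
}
```

With this wrapper, a missing command surfaces as a warning in the test log while the test continues.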
[jira] [Commented] (HDFS-9485) Make BlockManager#removeFromExcessReplicateMap accept BlockInfo instead of Block
[ https://issues.apache.org/jira/browse/HDFS-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032950#comment-15032950 ] Hadoop QA commented on HDFS-9485: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 38s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc 
{color} | {color:green} 2m 30s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 8s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 11s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 57s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 35s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 19s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 84m 18s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 20s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_85. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 25s {color} | {color:red} Patch generated 58 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 197m 3s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure110 | | | hadoop.hdfs.TestFileCreationDelete | | | hadoop.hdfs.shortcircuit.TestShortCircuitCache | | | hadoop.hdfs.server.datanode.TestDirectoryScanner | | | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes | | | hadoop.hdfs.qjournal.TestSecureNNWithQJM | | | hadoop.hdfs.server.namenode.TestFileTruncate | | | hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation | | | hadoop.hdfs.security.TestDelegationTokenForProxyUser | | |
[jira] [Commented] (HDFS-6533) intermittent org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testBasicFunctionalitytest failure
[ https://issues.apache.org/jira/browse/HDFS-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032969#comment-15032969 ] Hadoop QA commented on HDFS-6533: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 24s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 10s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 39s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 26s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 1m 6s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 11s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 59s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 23s {color} | {color:red} Patch generated 1 new checkstyle issues in hadoop-hdfs-project/hadoop-hdfs (total was 23, now 24). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 55s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs introduced 1 new FindBugs issues. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 29s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 87m 1s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 56s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_85. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 25s {color} | {color:red} Patch generated 58 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 199m 40s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs-project/hadoop-hdfs | | | Increment of volatile field org.apache.hadoop.hdfs.server.datanode.BPOfferService.registeredActors in org.apache.hadoop.hdfs.server.datanode.BPOfferService.registrationSucceeded(BPServiceActor, DatanodeRegistration) At BPOfferService.java:in org.apache.hadoop.hdfs.server.datanode.BPOfferService.registrationSucceeded(BPServiceActor, DatanodeRegistration) At BPOfferService.java:[line 371] | | JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner | | |
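The FindBugs warning above points at a real hazard: {{++}} on a volatile field is a read-modify-write sequence, not an atomic operation, so concurrent callers can lose increments even though volatile guarantees visibility. A generic sketch of the usual fix (not the actual {{BPOfferService}} code) replaces the field with an {{AtomicInteger}}:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Generic illustration of the FindBugs "increment of volatile field" finding:
// volatile gives visibility, but count++ still compiles to read, add, write.
// AtomicInteger makes the increment itself atomic.
final class RegistrationCounter {
    private final AtomicInteger registeredActors = new AtomicInteger();

    void registrationSucceeded() {
        registeredActors.incrementAndGet();  // atomic, unlike volatile++
    }

    int count() {
        return registeredActors.get();
    }
}
```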
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Attachment: HDFS-9129.025.patch Thanks [~jingzhao] for the review. I revisited the TODO comment and I also think we can remove it safely. Nice catch. The v25 patch is to address this. > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, > HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, > HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, > HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, > HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, > HDFS-9129.020.patch, HDFS-9129.021.patch, HDFS-9129.022.patch, > HDFS-9129.023.patch, HDFS-9129.024.patch, HDFS-9129.025.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9449) DiskBalancer : Add connectors
[ https://issues.apache.org/jira/browse/HDFS-9449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033001#comment-15033001 ] Anu Engineer commented on HDFS-9449: None of the test failures are related to this patch. > DiskBalancer : Add connectors > - > > Key: HDFS-9449 > URL: https://issues.apache.org/jira/browse/HDFS-9449 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-9449-HDFS-1312.001.patch, > HDFS-9449-HDFS-1312.002.patch, HDFS-9449-HDFS-1312.003.patch > > > Connectors allow disk balancer data models to connect to an existing cluster > - Namenode or to a json file which describes the cluster. This is used for > discovering the physical layout of the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9449) DiskBalancer : Add connectors
[ https://issues.apache.org/jira/browse/HDFS-9449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032650#comment-15032650 ] Tsz Wo Nicholas Sze commented on HDFS-9449: --- - DBNameNodeConnector.connector should be final. Then, it will never be null, so we don't need to check it for null. - DBNameNodeConnector.clusterURI is not used. Should we remove it? - If two DiskBalancers are running at the same time, would they somehow balance the same datanode? {code} // we don't care how many instances of disk balancers run. // The admission is controlled at the data node, where we will // execute only one plan at a given time. NameNodeConnector.setWrite2IdFile(false); {code} - The code with {code} Preconditions.checkArgument(x != null); {code} should be replaced by {code} Preconditions.checkNotNull(x); {code} - Only the path in JsonNodeConnector.clusterURI is used. Should clusterURI be replaced by something like clusterFilePath? > DiskBalancer : Add connectors > - > > Key: HDFS-9449 > URL: https://issues.apache.org/jira/browse/HDFS-9449 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-9449-HDFS-1312.001.patch, > HDFS-9449-HDFS-1312.002.patch > > > Connectors allow disk balancer data models to connect to an existing cluster > - Namenode or to a json file which describes the cluster. This is used for > discovering the physical layout of the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
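On the Preconditions point above: the practical difference is the exception type and the return value. Guava's {{checkArgument(x != null)}} throws {{IllegalArgumentException}}, while {{checkNotNull(x)}} throws the conventional {{NullPointerException}} and returns the reference, so it can validate and assign in one expression. A stand-alone sketch (the two helpers below are local stand-ins mirroring the Guava contracts, so this runs without Guava on the classpath):

```java
public class PreconditionsDemo {
    // Local stand-ins mirroring com.google.common.base.Preconditions.
    static void checkArgument(boolean expr) {
        if (!expr) throw new IllegalArgumentException();
    }
    static <T> T checkNotNull(T ref) {
        if (ref == null) throw new NullPointerException();
        return ref; // returns the reference, handy for field assignment
    }

    private final String name;

    PreconditionsDemo(String name) {
        // checkNotNull can validate and assign in one expression:
        this.name = checkNotNull(name);
    }

    public static void main(String[] args) {
        new PreconditionsDemo("ok"); // passes
        try {
            checkArgument(args == null); // wrong tool for null checks
        } catch (IllegalArgumentException e) {
            System.out.println("checkArgument -> IllegalArgumentException");
        }
        try {
            checkNotNull(null);
        } catch (NullPointerException e) {
            System.out.println("checkNotNull -> NullPointerException");
        }
    }
}
```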
[jira] [Updated] (HDFS-8957) Consolidate client striping input stream codes for stateful read and positional read
[ https://issues.apache.org/jira/browse/HDFS-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8957: Fix Version/s: (was: HDFS-7285) > Consolidate client striping input stream codes for stateful read and > positional read > > > Key: HDFS-8957 > URL: https://issues.apache.org/jira/browse/HDFS-8957 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Reporter: Kai Zheng >Assignee: Kai Zheng > Attachments: HDFS-8957-v1.patch > > > Currently we have different implementations for client striping read, having > both *StatefulStripeReader* and *PositionStripeReader*. I attempted to > consolidate the two implementations into one, and it results in much simpler > codes, and also better performance. Now in both read paths, it will: > * Use pooled ByteBuffers, as currently stateful read does; > * Read directly into application's buffer, as currently positional read does; > * Try to align and merge multiple stripes, as currently positional read does; > * Use *ECChunk* version decode API. > The resultant *StripeReader* is approaching very near now to the ideal state > desired by next step, employing *ErasureCoder* API instead of > *RawErasureCoder* API. > Will upload an initial patch to illustrate the rough change, even though it > depends on other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9484) NNThroughputBenchmark$BlockReportStats should not send empty block reports
[ https://issues.apache.org/jira/browse/HDFS-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032704#comment-15032704 ] Mingliang Liu commented on HDFS-9484: - Thanks [~cmccabe] for your confirmation. I updated the jira description to add another potential bug that makes the {{BlockReportStats}} send empty block reports. Would you help me check that out as well? > NNThroughputBenchmark$BlockReportStats should not send empty block reports > -- > > Key: HDFS-9484 > URL: https://issues.apache.org/jira/browse/HDFS-9484 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Mingliang Liu >Assignee: Mingliang Liu > > There are two potential bugs that make the > {{NNThroughputBenchmark$BlockReportStats}} send empty block reports. > # In {{NNThroughputBenchmark$BlockReportStats#formBlockReport()}}, the > {{blockReportList}} is always {{BlockListAsLongs.EMPTY}}. We should construct > the block report list by encoding generated {{blocks}} in test. > # {{TinyDatanode#blocks}} is an empty ArrayList with initial capacity. In > {{TinyDatanode#addBlock()}} first statement, the {{if(nrBlocks == > blocks.size()) {}} will always be true. We should either fill the blocks with > dummy report in {{TinyDatanode()}} constructor, or use initial capacity > instead of {{blocks.size()}} in the above _if_ statement (we should replace > {{ArrayList#set}} with {{ArrayList#add}} as well). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
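The second bug described above hinges on a common ArrayList pitfall: the capacity passed to the constructor only pre-sizes the backing array and does not affect {{size()}}, which stays 0 until elements are added. So a guard like {{nrBlocks == blocks.size()}} against a freshly constructed list is immediately true. A minimal illustration (TinyDatanode itself is not reproduced; the variable names are illustrative):

```java
import java.util.ArrayList;

public class CapacityVsSize {
    public static void main(String[] args) {
        int capacity = 100;
        // Initial capacity only pre-sizes the backing array; the list is empty.
        ArrayList<String> blocks = new ArrayList<>(capacity);
        System.out.println(blocks.size()); // 0, not 100

        int nrBlocks = 0;
        // The buggy guard: with an empty list this is true on the first call,
        // so the "list is full" branch is taken before anything was added.
        boolean full = (nrBlocks == blocks.size());
        System.out.println(full); // true

        // One fix along the lines suggested above: compare against the
        // intended capacity instead, and use add() rather than set() to
        // grow the list.
        boolean fullFixed = (nrBlocks == capacity);
        System.out.println(fullFixed); // false
    }
}
```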
[jira] [Updated] (HDFS-8957) Consolidate client striping input stream codes for stateful read and positional read
[ https://issues.apache.org/jira/browse/HDFS-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8957: Component/s: erasure-coding > Consolidate client striping input stream codes for stateful read and > positional read > > > Key: HDFS-8957 > URL: https://issues.apache.org/jira/browse/HDFS-8957 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Reporter: Kai Zheng >Assignee: Kai Zheng > Attachments: HDFS-8957-v1.patch > > > Currently we have different implementations for client striping read, having > both *StatefulStripeReader* and *PositionStripeReader*. I attempted to > consolidate the two implementations into one, and it results in much simpler > codes, and also better performance. Now in both read paths, it will: > * Use pooled ByteBuffers, as currently stateful read does; > * Read directly into application's buffer, as currently positional read does; > * Try to align and merge multiple stripes, as currently positional read does; > * Use *ECChunk* version decode API. > The resultant *StripeReader* is approaching very near now to the ideal state > desired by next step, employing *ErasureCoder* API instead of > *RawErasureCoder* API. > Will upload an initial patch to illustrate the rough change, even though it > depends on other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9470) Encryption zone on root not loaded from fsimage after NN restart
[ https://issues.apache.org/jira/browse/HDFS-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032707#comment-15032707 ] Xiao Chen commented on HDFS-9470: - Thanks [~andrew.wang] for committing and resolving cherry-pick conflicts. I reviewed the commits to 2.6 and 2.7 branches, LGTM +1. Also thanks to everyone for the review and comments. > Encryption zone on root not loaded from fsimage after NN restart > > > Key: HDFS-9470 > URL: https://issues.apache.org/jira/browse/HDFS-9470 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Critical > Fix For: 2.7.2, 2.6.3 > > Attachments: HDFS-9470.001.patch, HDFS-9470.002.patch, > HDFS-9470.003.patch > > > When restarting namenode, the encryption zone for {{rootDir}} is not loaded > correctly from fsimage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9381) When same block came for replication for Striped mode, we can move that block to PendingReplications
[ https://issues.apache.org/jira/browse/HDFS-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032723#comment-15032723 ] Jing Zhao commented on HDFS-9381: - Thanks for the discussion, Uma and Zhe! bq. To determine whether the optimization justifies the added complexity, I think we can create a more concrete example. Yeah, I agree that a more concrete example and some perf numbers will help us understand the optimization better. bq. I think besides reducing locking contention, this change also speeds up the recovery of non-striping blocks. E.g., when a rack fails, there could be a lot of striped block recovery work waiting. They could block regular recovery tasks. When we have a lot of missing blocks/replicas (e.g., caused by DataNode failures or even a rack failure), since in each iteration the replication monitor only handles a limited number of blocks, some iterations may be wasted checking this type of striped block. However, it is also possible that because of the longer processing time, there is a higher chance for the striped blocks to be updated in the UC queue before being processed by the replication monitor for the first time. Also, since striped blocks are more likely to be spread across multiple racks, a single rack failure may cause only a single internal block of a striped block group to be missing. So it feels like the scenarios here are more complicated. 
> When same block came for replication for Striped mode, we can move that block > to PendingReplications > > > Key: HDFS-9381 > URL: https://issues.apache.org/jira/browse/HDFS-9381 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, namenode >Affects Versions: 3.0.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Attachments: HDFS-9381-02.patch, HDFS-9381-03.patch, > HDFS-9381.00.patch, HDFS-9381.01.patch > > > Currently I noticed that we are just returning null if block already exists > in pendingReplications in replication flow for striped blocks. > {code} > if (block.isStriped()) { > if (pendingNum > 0) { > // Wait the previous recovery to finish. > return null; > } > {code} > Here if we just return null and if neededReplications contains only fewer > blocks(basically by default if less than numliveNodes*2), then same blocks > can be picked again from neededReplications from next loop as we are not > removing element from neededReplications. Since this replication process need > to take fsnamesystmem lock and do, we may spend some time unnecessarily in > every loop. > So my suggestion/improvement is: > Instead of just returning null, how about incrementing pendingReplications > for this block and remove from neededReplications? and also another point to > consider here is, to add into pendingReplications, generally we need target > and it is nothing but to which node we issued replication command. Later when > after replication success and DN reported it, block will be removed from > pendingReplications from NN addBlock. > So since this is newly picked block from neededReplications, we would not > have selected target yet. So which target to be passed to pendingReplications > if we add this block? One Option I am thinking is, how about just passing > srcNode itself as target for this special condition? So, anyway if the block > is really missed, srcNode will not report it. 
So this block will not be > removed from pending replications, so that when it is timed out, it will be > considered for replication again and that time it will find actual target to > replicate while processing as part of regular replication flow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
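The flow proposed in the description above can be sketched as a toy simulation with plain collections (this is not HDFS code; the queue and method names merely mirror the discussion, and the real BlockManager structures are far richer): a picked block moves from neededReplications into pendingReplications keyed by a placeholder target (the srcNode), and only a timeout moves it back for regular processing.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy model of the proposal in this thread: instead of leaving an
// already-pending striped block in neededReplications (where each
// monitor iteration re-picks it under the namesystem lock), move it
// into pendingReplications with srcNode as a stand-in target, and
// re-queue it only if the pending entry times out.
public class PendingReplicationSketch {
    final Queue<String> neededReplications = new ArrayDeque<>();
    final Map<String, String> pendingReplications = new HashMap<>(); // block -> target

    void processNeeded(String srcNode) {
        String block = neededReplications.poll();
        if (block == null) return;
        // Proposed change: record the block as pending so the monitor
        // stops re-scanning it on every iteration.
        pendingReplications.put(block, srcNode);
    }

    // Called when the pending entry times out (replication never reported).
    void timeout(String block) {
        if (pendingReplications.remove(block) != null) {
            neededReplications.add(block); // considered again; real target chosen later
        }
    }

    public static void main(String[] args) {
        PendingReplicationSketch s = new PendingReplicationSketch();
        s.neededReplications.add("blk_1");
        s.processNeeded("dn-src");
        System.out.println(s.neededReplications.isEmpty());         // no re-pick while pending
        System.out.println(s.pendingReplications.get("blk_1"));     // placeholder target
        s.timeout("blk_1");
        System.out.println(s.neededReplications.contains("blk_1")); // back for regular flow
    }
}
```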
[jira] [Commented] (HDFS-9487) libhdfs++ Enable builds with no compiler optimizations
[ https://issues.apache.org/jira/browse/HDFS-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032738#comment-15032738 ] Bob Hansen commented on HDFS-9487: -- http://unix.stackexchange.com/questions/187455/how-to-compile-without-optimizations-o0-using-cmake is a pattern to follow that might help. > libhdfs++ Enable builds with no compiler optimizations > -- > > Key: HDFS-9487 > URL: https://issues.apache.org/jira/browse/HDFS-9487 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: James Clampffer > > The default build configuration uses -02 -g . To make > debugging easier it would be really nice to be able to produce builds with > -O0. > I haven't found an existing flag to pass to maven or cmake to accomplish > this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
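One pattern along the lines of that link (a sketch only; not a tested change to the Hadoop build, which drives CMake through maven): select a build type whose flags omit optimization, or override the per-type flags explicitly.

```cmake
# Hypothetical CMake overrides; not taken from the Hadoop build files.
# Equivalent command-line forms:
#   cmake -DCMAKE_BUILD_TYPE=Debug ..
#   cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_FLAGS_DEBUG="-O0 -g" ..
set(CMAKE_BUILD_TYPE Debug)          # Debug typically means -g without -O2
set(CMAKE_CXX_FLAGS_DEBUG "-O0 -g")  # force no optimization explicitly
```

The remaining work for the JIRA would be plumbing such a switch through the maven invocation so developers don't have to edit CMakeLists.txt by hand.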
[jira] [Commented] (HDFS-8647) Abstract BlockManager's rack policy into BlockPlacementPolicy
[ https://issues.apache.org/jira/browse/HDFS-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032747#comment-15032747 ] Zhe Zhang commented on HDFS-8647: - [~brahmareddy] [~mingma] [~walter.k.su] I wonder whether we should consider pushing this to branch-2.7 and branch-2.6. Maybe after the currently planned 2.7.2 and 2.6.3 releases. Doing so will enable the inclusion of bug fixes HDFS-9313 and HDFS-9314 in 2.6.x and 2.7.x. I worked with [~xiaochen] offline along this direction. The main challenge is from HDFS-8823, which should have been done in a feature branch but was committed to branch-2 -- so I don't think we should push that one to branch-2.6/2.7. If we reach an agreement here we can create a branch-2.6/2.7 patch for this JIRA. Thanks. > Abstract BlockManager's rack policy into BlockPlacementPolicy > - > > Key: HDFS-8647 > URL: https://issues.apache.org/jira/browse/HDFS-8647 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Brahma Reddy Battula > Fix For: 2.8.0 > > Attachments: HDFS-8647-001.patch, HDFS-8647-002.patch, > HDFS-8647-003.patch, HDFS-8647-004.patch, HDFS-8647-004.patch, > HDFS-8647-005.patch, HDFS-8647-006.patch, HDFS-8647-007.patch, > HDFS-8647-008.patch, HDFS-8647-009.patch > > > Sometimes we want to have namenode use alternative block placement policy > such as upgrade domains in HDFS-7541. > BlockManager has built-in assumption about rack policy in functions such as > useDelHint, blockHasEnoughRacks. That means when we have new block placement > policy, we need to modify BlockManager to account for the new policy. Ideally > BlockManager should ask BlockPlacementPolicy object instead. That will allow > us to provide new BlockPlacementPolicy without changing BlockManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8986) Add option to -du to calculate directory space usage excluding snapshots
[ https://issues.apache.org/jira/browse/HDFS-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032762#comment-15032762 ] Chris Nauroth commented on HDFS-8986: - I see {{-du}} mentioned here. I recommend that we support the same functionality for {{-count}} too. > Add option to -du to calculate directory space usage excluding snapshots > > > Key: HDFS-8986 > URL: https://issues.apache.org/jira/browse/HDFS-8986 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Reporter: Gautam Gopalakrishnan >Assignee: Jagadesh Kiran N > > When running {{hadoop fs -du}} on a snapshotted directory (or one of its > children), the report includes space consumed by blocks that are only present > in the snapshots. This is confusing for end users. > {noformat} > $ hadoop fs -du -h -s /tmp/parent /tmp/parent/* > 799.7 M 2.3 G /tmp/parent > 799.7 M 2.3 G /tmp/parent/sub1 > $ hdfs dfs -createSnapshot /tmp/parent snap1 > Created snapshot /tmp/parent/.snapshot/snap1 > $ hadoop fs -rm -skipTrash /tmp/parent/sub1/* > ... > $ hadoop fs -du -h -s /tmp/parent /tmp/parent/* > 799.7 M 2.3 G /tmp/parent > 799.7 M 2.3 G /tmp/parent/sub1 > $ hdfs dfs -deleteSnapshot /tmp/parent snap1 > $ hadoop fs -du -h -s /tmp/parent /tmp/parent/* > 0 0 /tmp/parent > 0 0 /tmp/parent/sub1 > {noformat} > It would be helpful if we had a flag, say -X, to exclude any snapshot related > disk usage in the output -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8986) Add option to -du to calculate directory space usage excluding snapshots
[ https://issues.apache.org/jira/browse/HDFS-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032786#comment-15032786 ] Gautam Gopalakrishnan commented on HDFS-8986: - I feel I should invert the request in this jira. The behaviour of {{-du}} should be the same regardless of whether snapshots are present or not. We should add a flag to include snapshot volume, rather than exclude. Possibly the same change is required for {{-count}}. [~cnauroth] and [~qwertymaniac], what do you think? [~jagadesh.kiran] I can take this jira if you're busy with other work. > Add option to -du to calculate directory space usage excluding snapshots > > > Key: HDFS-8986 > URL: https://issues.apache.org/jira/browse/HDFS-8986 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Reporter: Gautam Gopalakrishnan >Assignee: Jagadesh Kiran N > > When running {{hadoop fs -du}} on a snapshotted directory (or one of its > children), the report includes space consumed by blocks that are only present > in the snapshots. This is confusing for end users. > {noformat} > $ hadoop fs -du -h -s /tmp/parent /tmp/parent/* > 799.7 M 2.3 G /tmp/parent > 799.7 M 2.3 G /tmp/parent/sub1 > $ hdfs dfs -createSnapshot /tmp/parent snap1 > Created snapshot /tmp/parent/.snapshot/snap1 > $ hadoop fs -rm -skipTrash /tmp/parent/sub1/* > ... > $ hadoop fs -du -h -s /tmp/parent /tmp/parent/* > 799.7 M 2.3 G /tmp/parent > 799.7 M 2.3 G /tmp/parent/sub1 > $ hdfs dfs -deleteSnapshot /tmp/parent snap1 > $ hadoop fs -du -h -s /tmp/parent /tmp/parent/* > 0 0 /tmp/parent > 0 0 /tmp/parent/sub1 > {noformat} > It would be helpful if we had a flag, say -X, to exclude any snapshot related > disk usage in the output -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9228) libhdfs++ should respect NN retry configuration settings
[ https://issues.apache.org/jira/browse/HDFS-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032635#comment-15032635 ] James Clampffer commented on HDFS-9228: --- Looks good to me, just found 2 small things worth fixing. I'm planning on committing HDFS-9144 before this, please let me know if it would be less painful to hold off on HDFS-9144 until this gets in. -RetryPolicy should probably have a virtual destructor, or maybe a comment saying members can only be POD types. I'd prefer the virtual destructor approach. -In rpc_connection.cc line 37 "NO_RETRY" should be "kNoRetry" to keep consistent with the naming conventions for constants. > libhdfs++ should respect NN retry configuration settings > > > Key: HDFS-9228 > URL: https://issues.apache.org/jira/browse/HDFS-9228 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9228.HDFS-8707.001.patch, > HDFS-9228.HDFS-8707.002.patch, HDFS-9228.HDFS-8707.003.patch, > HDFS-9228.HDFS-8707.004.patch, HDFS-9228.HDFS-8707.005.patch, > HDFS-9228.HDFS-8707.006.patch > > > Handle the use case of temporary network or NN hiccups and have a > configurable number of retries for NN operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9484) NNThroughputBenchmark$BlockReportStats should not send empty block reports
[ https://issues.apache.org/jira/browse/HDFS-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9484: Description: In {{NNThroughputBenchmark$BlockReportStats#formBlockReport()}}, the {{blockReportList}} is always {{BlockListAsLongs.EMPTY}}. We should construct the block report list by encoding generated {{blocks}} in test. Meanwhile, {{TinyDatanode#blocks}} is an empty ArrayList with initial capacity. In {{TinyDatanode#addBlock()}} first statement, the {{if(nrBlocks == blocks.size()) {}} will always be true. We should either fill the blocks with dummy report in {{TinyDatanode()}} constructor, or use initial capacity instead of {{blocks.size()}} in the above _if_ statement (we should replace ArrayList#set with ArrayList#add as well). There are two potential bugs that make the {{NNThroughputBenchmark$BlockReportStats}} send empty block reports. was:In {{NNThroughputBenchmark$BlockReportStats#formBlockReport()}}, the {{blockReportList}} is always {{BlockListAsLongs.EMPTY}}. We should construct the block report list by encoding generated {{blocks}} in test. > NNThroughputBenchmark$BlockReportStats should not send empty block reports > -- > > Key: HDFS-9484 > URL: https://issues.apache.org/jira/browse/HDFS-9484 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Mingliang Liu >Assignee: Mingliang Liu > > In {{NNThroughputBenchmark$BlockReportStats#formBlockReport()}}, the > {{blockReportList}} is always {{BlockListAsLongs.EMPTY}}. We should construct > the block report list by encoding generated {{blocks}} in test. > Meanwhile, {{TinyDatanode#blocks}} is an empty ArrayList with initial > capacity. In {{TinyDatanode#addBlock()}} first statement, the {{if(nrBlocks > == blocks.size()) {}} will always be true. We should either fill the blocks > with dummy report in {{TinyDatanode()}} constructor, or use initial capacity > instead of {{blocks.size()}} in the above _if_ statement (we should replace > ArrayList#set with ArrayList#add as well). 
> There are two potential bugs that make the {{NNThroughputBenchmark$BlockReportStats}} send empty block reports. -- This message was sent by Atlassian JIRA (v6.3.4#6332)