[jira] [Commented] (HDFS-5170) BlockPlacementPolicyDefault uses the wrong classname when alerting to enable debug logging
[ https://issues.apache.org/jira/browse/HDFS-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760912#comment-13760912 ] Hadoop QA commented on HDFS-5170: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601941/HDFS-5170-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4939//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4939//console This message is automatically generated. 
> BlockPlacementPolicyDefault uses the wrong classname when alerting to enable > debug logging > -- > > Key: HDFS-5170 > URL: https://issues.apache.org/jira/browse/HDFS-5170 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Trivial > Attachments: HDFS-5170-1.patch > > > {code} > private static final String enableDebugLogging = > "For more information, please enable DEBUG log level on " > + LOG.getClass().getName(); > {code} > This inserts the LOG's class rather than BlockPlacementPolicy's class. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5168) BlockPlacementPolicy does not work for cross rack/node group dependencies
[ https://issues.apache.org/jira/browse/HDFS-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760872#comment-13760872 ] Junping Du commented on HDFS-5168: -- That's a good idea, Nikola! I think you are describing the case of VMs running on shared storage, e.g. a SAN. Is that right? Previously, DAS (HDD) was considered the default and only backing storage type for Hadoop, even in the virtualization case. Now, we are addressing different storage tiers (including SSD, remote storage, etc.) under HDFS-2832. I made similar comments there that storage failure groups should be addressed when we enable remote storage. I would prefer the first approach, especially since it could be easier after enabling storage type awareness. The second approach would break one of Hadoop's basic assumptions - hierarchical network topology - which seems unnecessary to me. Thoughts? > BlockPlacementPolicy does not work for cross rack/node group dependencies > - > > Key: HDFS-5168 > URL: https://issues.apache.org/jira/browse/HDFS-5168 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Nikola Vujic >Priority: Critical > > Block placement policies do not work for cross rack/node group dependencies. > In reality this is needed when compute servers and storage fall in two > independent fault domains; in that case, neither BlockPlacementPolicyDefault > nor BlockPlacementPolicyWithNodeGroup is able to provide proper block > placement. > Let's suppose that we have a Hadoop cluster with one rack with two servers, and > we run 2 VMs per server. The node group topology for this cluster would be: > server1-vm1 -> /d1/r1/n1 > server1-vm2 -> /d1/r1/n1 > server2-vm1 -> /d1/r1/n2 > server2-vm2 -> /d1/r1/n2 > This works fine as long as server and storage fall into the same fault > domain, but if storage is in a different fault domain from the server, we will > not be able to handle that. 
For example, if the storage of server1-vm1 is in the > same fault domain as the storage of server2-vm1, then we must not place two > replicas on these two nodes, although they are in different node groups. > Two possible approaches: > - One approach would be to define cross rack/node group dependencies and to > use them when excluding nodes from the search space. This looks like the > cleanest way to fix this, as it requires only minor changes in the > BlockPlacementPolicy classes. > - The other approach would be to allow nodes to fall in more than one node group. > When we choose a node to hold a replica, we have to exclude from the search > space all nodes from the node groups to which the chosen node belongs. This > approach may require major changes in the NetworkTopology. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
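The first approach described above could be sketched roughly as follows. This is a hedged illustration only: DependencyResolver and its method names are hypothetical and do not appear in the actual BlockPlacementPolicy classes; it just shows how a cross rack/node-group dependency map could prune the search space after a replica is placed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a symmetric map of cross rack/node-group
// dependencies, consulted when excluding nodes from the search space.
public class DependencyResolver {
    // node -> set of nodes whose storage shares a fault domain with it
    private final Map<String, Set<String>> dependencies = new HashMap<>();

    public void addDependency(String a, String b) {
        dependencies.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        dependencies.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Once a replica lands on 'chosen', exclude both the chosen node
    // and every node that shares a fault domain with it.
    public Set<String> excludedAfterChoosing(String chosen) {
        Set<String> excluded = new HashSet<>();
        excluded.add(chosen);
        excluded.addAll(dependencies.getOrDefault(chosen, Collections.emptySet()));
        return excluded;
    }

    public static void main(String[] args) {
        DependencyResolver r = new DependencyResolver();
        // storage of server1-vm1 shares a fault domain with server2-vm1
        r.addDependency("server1-vm1", "server2-vm1");
        System.out.println(r.excludedAfterChoosing("server1-vm1"));
    }
}
```

Under this sketch, choosing server1-vm1 excludes server2-vm1 as well, even though the two VMs sit in different node groups.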
[jira] [Updated] (HDFS-5170) BlockPlacementPolicyDefault uses the wrong classname when alerting to enable debug logging
[ https://issues.apache.org/jira/browse/HDFS-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5170: -- Affects Version/s: 2.1.0-beta Status: Patch Available (was: Open) > BlockPlacementPolicyDefault uses the wrong classname when alerting to enable > debug logging > -- > > Key: HDFS-5170 > URL: https://issues.apache.org/jira/browse/HDFS-5170 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Trivial > Attachments: HDFS-5170-1.patch > > > {code} > private static final String enableDebugLogging = > "For more information, please enable DEBUG log level on " > + LOG.getClass().getName(); > {code} > This inserts the LOG's class rather than BlockPlacementPolicy's class. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-5169) hdfs.c: translateZCRException: null pointer deref when translating some exceptions
Colin Patrick McCabe created HDFS-5169: -- Summary: hdfs.c: translateZCRException: null pointer deref when translating some exceptions Key: HDFS-5169 URL: https://issues.apache.org/jira/browse/HDFS-5169 Project: Hadoop HDFS Issue Type: Bug Components: libhdfs Affects Versions: HDFS-4949 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor hdfs.c: translateZCRException: there is a null pointer deref when translating some exceptions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5170) BlockPlacementPolicyDefault uses the wrong classname when alerting to enable debug logging
[ https://issues.apache.org/jira/browse/HDFS-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5170: -- Attachment: HDFS-5170-1.patch Trivial patch attached, compile tested. > BlockPlacementPolicyDefault uses the wrong classname when alerting to enable > debug logging > -- > > Key: HDFS-5170 > URL: https://issues.apache.org/jira/browse/HDFS-5170 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Trivial > Attachments: HDFS-5170-1.patch > > > {code} > private static final String enableDebugLogging = > "For more information, please enable DEBUG log level on " > + LOG.getClass().getName(); > {code} > This inserts the LOG's class rather than BlockPlacementPolicy's class. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5170) BlockPlacementPolicyDefault uses the wrong classname when alerting to enable debug logging
[ https://issues.apache.org/jira/browse/HDFS-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760822#comment-13760822 ] Colin Patrick McCabe commented on HDFS-5170: +1 pending jenkins > BlockPlacementPolicyDefault uses the wrong classname when alerting to enable > debug logging > -- > > Key: HDFS-5170 > URL: https://issues.apache.org/jira/browse/HDFS-5170 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Trivial > Attachments: HDFS-5170-1.patch > > > {code} > private static final String enableDebugLogging = > "For more information, please enable DEBUG log level on " > + LOG.getClass().getName(); > {code} > This inserts the LOG's class rather than BlockPlacementPolicy's class. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5120) add command-line support for manipulating cache pools
[ https://issues.apache.org/jira/browse/HDFS-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5120: --- Attachment: HDFS-5120-caching.002.patch > add command-line support for manipulating cache pools > - > > Key: HDFS-5120 > URL: https://issues.apache.org/jira/browse/HDFS-5120 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: HDFS-4949 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5120-caching.001.patch, HDFS-5120-caching.002.patch > > > We should add command-line support for creating, removing, and listing cache > directives and manipulating cache pools. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5120) add command-line support for manipulating cache pools
[ https://issues.apache.org/jira/browse/HDFS-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5120: --- Attachment: (was: HDFS-5163-caching.004.patch) > add command-line support for manipulating cache pools > - > > Key: HDFS-5120 > URL: https://issues.apache.org/jira/browse/HDFS-5120 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: HDFS-4949 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5120-caching.001.patch, HDFS-5120-caching.002.patch > > > We should add command-line support for creating, removing, and listing cache > directives and manipulating cache pools. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-5170) BlockPlacementPolicyDefault uses the wrong classname when alerting to enable debug logging
Andrew Wang created HDFS-5170: - Summary: BlockPlacementPolicyDefault uses the wrong classname when alerting to enable debug logging Key: HDFS-5170 URL: https://issues.apache.org/jira/browse/HDFS-5170 Project: Hadoop HDFS Issue Type: Bug Reporter: Andrew Wang Assignee: Andrew Wang Priority: Trivial {code} private static final String enableDebugLogging = "For more information, please enable DEBUG log level on " + LOG.getClass().getName(); {code} This inserts the LOG's class rather than BlockPlacementPolicy's class. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
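The bug and the likely shape of the fix can be sketched in a self-contained way: calling getClass() on the logger field yields the logger implementation's runtime class, whereas a class literal names the class the message actually means. FakeLogger and BlockPlacementSketch below are hypothetical stand-ins, not the real BlockPlacementPolicy code or its commons-logging Log field.

```java
// Stand-in for the logger implementation class (e.g. the commons-logging
// Log object held in the LOG field).
class FakeLogger {}

public class BlockPlacementSketch {
    private static final FakeLogger LOG = new FakeLogger();

    // Buggy: LOG.getClass().getName() names the logger implementation,
    // not the class the user should enable DEBUG logging for.
    public static String buggyMessage() {
        return "For more information, please enable DEBUG log level on "
                + LOG.getClass().getName();
    }

    // Fixed: use a class literal so the message names the owning class.
    public static String fixedMessage() {
        return "For more information, please enable DEBUG log level on "
                + BlockPlacementSketch.class.getName();
    }

    public static void main(String[] args) {
        System.out.println(buggyMessage()); // ends with "FakeLogger"
        System.out.println(fixedMessage()); // ends with "BlockPlacementSketch"
    }
}
```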
[jira] [Updated] (HDFS-5120) add command-line support for manipulating cache pools
[ https://issues.apache.org/jira/browse/HDFS-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5120: --- Attachment: HDFS-5163-caching.004.patch I rebased this on the current branch, and added some help text about mode and weight. The description of mode is more detailed now. I realize the description of weight is inadequate, but we plan on adding many more resource management tunables, so let's leave it in for now as a placeholder. > add command-line support for manipulating cache pools > - > > Key: HDFS-5120 > URL: https://issues.apache.org/jira/browse/HDFS-5120 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: HDFS-4949 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5120-caching.001.patch, HDFS-5163-caching.004.patch > > > We should add command-line support for creating, removing, and listing cache > directives and manipulating cache pools. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4953) enable HDFS local reads via mmap
[ https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760804#comment-13760804 ] Colin Patrick McCabe commented on HDFS-4953: I've been thinking about this, and I think it might be possible to improve on the current API. Maybe all we need is something like this: {code} in DFSInputStream: ZeroBuffer readZero(ByteBuffer fallback, int maxLength); ZeroBuffer: implements Closeable (for close) implements eof() (returns true if there are no more bytes to read) implements all ByteBuffer methods by forwarding them to the enclosed ByteBuffer {code} This API would be implemented for every filesystem, not just HDFS. The constraints here would be: * maxLength >= 0 * you can't reuse a fallback buffer until you close the associated ZeroBuffer (we can enforce this by throwing an exception in this case) * ZeroBuffers are immutable once created-- until you call close on them. This gets rid of a few of the awkward issues with the current API, which I think are: * the current API requires users to special-case HDFS (since other FSes throw ZeroCopyUnavailableException) * the current API shares the file position between the cursors and the stream, which is unintuitive. * the current API puts the read call inside the cursor object, which is different than the other read methods. > enable HDFS local reads via mmap > > > Key: HDFS-4953 > URL: https://issues.apache.org/jira/browse/HDFS-4953 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 2.3.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: HDFS-4949 > > Attachments: benchmark.png, HDFS-4953.001.patch, HDFS-4953.002.patch, > HDFS-4953.003.patch, HDFS-4953.004.patch, HDFS-4953.005.patch, > HDFS-4953.006.patch, HDFS-4953.007.patch, HDFS-4953.008.patch > > > Currently, the short-circuit local read pathway allows HDFS clients to access > files directly without going through the DataNode. 
However, all of these > reads involve a copy at the operating system level, since they rely on the > read() / pread() / etc family of kernel interfaces. > We would like to enable HDFS to read local files via mmap. This would enable > truly zero-copy reads. > In the initial implementation, zero-copy reads will only be performed when > checksums were disabled. Later, we can use the DataNode's cache awareness to > only perform zero-copy reads when we know that checksum has already been > verified. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
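The readZero proposal in the comment above could be exercised with a generic, copy-based fallback path. The sketch below is hedged: ZeroBuffer and readZero are modeled on the commenter's proposal, not a shipped HDFS API; FallbackZeroBuffer and the static readZero over an in-memory source are illustrative adaptations (the proposal puts readZero on DFSInputStream).

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.ByteBuffer;

// Modeled on the proposed API: a Closeable wrapper over a ByteBuffer.
interface ZeroBuffer extends Closeable {
    boolean eof();             // true if there are no more bytes to read
    ByteBuffer asByteBuffer(); // forwards to the enclosed ByteBuffer
}

// Illustrative copy-based implementation any filesystem could use
// when no mmap is available.
class FallbackZeroBuffer implements ZeroBuffer {
    private final ByteBuffer data;
    private boolean closed = false;

    FallbackZeroBuffer(ByteBuffer data) { this.data = data; }

    public boolean eof() { return !data.hasRemaining(); }

    public ByteBuffer asByteBuffer() {
        if (closed) throw new IllegalStateException("buffer already closed");
        return data;
    }

    // Closing releases the fallback buffer for reuse by the caller,
    // matching the "don't reuse until close" constraint above.
    public void close() { closed = true; }
}

public class ZeroCopyDemo {
    // Generic fallback path: copy up to maxLength bytes from 'source'
    // into the caller-supplied fallback buffer.
    public static ZeroBuffer readZero(ByteBuffer source, ByteBuffer fallback,
                                      int maxLength) {
        fallback.clear();
        int n = Math.min(maxLength,
                Math.min(source.remaining(), fallback.remaining()));
        for (int i = 0; i < n; i++) fallback.put(source.get());
        fallback.flip();
        return new FallbackZeroBuffer(fallback);
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer file = ByteBuffer.wrap("hello world".getBytes());
        ByteBuffer fallback = ByteBuffer.allocate(5);
        try (ZeroBuffer zb = readZero(file, fallback, 5)) {
            byte[] out = new byte[zb.asByteBuffer().remaining()];
            zb.asByteBuffer().get(out);
            System.out.println(new String(out)); // prints "hello"
        }
    }
}
```

The point of the sketch is that the fallback path is filesystem-independent; an mmap-backed implementation would return the same ZeroBuffer type without the copy.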
[jira] [Commented] (HDFS-4953) enable HDFS local reads via mmap
[ https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760761#comment-13760761 ] Colin Patrick McCabe commented on HDFS-4953: Your proposed API doesn't address one of the big asks we had when designing ZCR, which is to provide a mechanism for notifying the user that he cannot get an mmap. As I mentioned earlier, for performance reasons, many users who might like to have access to a 128 MB mmap segment do not want to copy into a 128MB backing buffer. Doing such a large copy would blow the L2 cache (and possibly the page cache), and rather than improving performance, might degrade it. Similarly, users don't want to get multiple byte buffers back-- the big advantage of mmap is getting a single buffer back (in the cases where that's possible). What if the user wants to use a direct byte buffer as his fallback buffer? With the current code, that is easy-- I just call setFallbackBuffer(ByteBuffer.allocateDirect(...)). With your proposed API, there's no way to do this. Creating a new ByteBuffer for each read is going to be slower than reusing the same ByteBuffer-- especially for direct ByteBuffers. Sure, we could have some kind of ByteBuffer cache inside the FSDataInputStream, but that's going to be very complicated. What if someone needs a ByteBuffer of size 100 but we only have ones of size 10 and 900 in the cache? Do we use the big one for the small read or leave it around? How long do we cache them? Do we prefer the direct ones? And so on. Really, the only design that makes sense is having the user pass in the fallback buffer. We do not want to be re-inventing malloc inside FSDataInputStream. The design principles of the current API are: * some users want a fallback path, and some don't. We have to satisfy both. * we don't want to manage buffers inside FSDataInputStream. It's a messy and hard problem with no optimal solutions that fit all cases. 
* nobody wants to receive more than one buffer in response to a read. * most programmers don't correctly handle short reads, so there should be an option to disable them. One thing that we could and should do is provide a generic fallback path that is independent of filesystem. > enable HDFS local reads via mmap > > > Key: HDFS-4953 > URL: https://issues.apache.org/jira/browse/HDFS-4953 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 2.3.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: HDFS-4949 > > Attachments: benchmark.png, HDFS-4953.001.patch, HDFS-4953.002.patch, > HDFS-4953.003.patch, HDFS-4953.004.patch, HDFS-4953.005.patch, > HDFS-4953.006.patch, HDFS-4953.007.patch, HDFS-4953.008.patch > > > Currently, the short-circuit local read pathway allows HDFS clients to access > files directly without going through the DataNode. However, all of these > reads involve a copy at the operating system level, since they rely on the > read() / pread() / etc family of kernel interfaces. > We would like to enable HDFS to read local files via mmap. This would enable > truly zero-copy reads. > In the initial implementation, zero-copy reads will only be performed when > checksums were disabled. Later, we can use the DataNode's cache awareness to > only perform zero-copy reads when we know that checksum has already been > verified. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5041) Add the time of last heartbeat to dead server Web UI
[ https://issues.apache.org/jira/browse/HDFS-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760748#comment-13760748 ] Hadoop QA commented on HDFS-5041: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601909/HDFS-5041.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4938//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4938//console This message is automatically generated. > Add the time of last heartbeat to dead server Web UI > > > Key: HDFS-5041 > URL: https://issues.apache.org/jira/browse/HDFS-5041 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Ted Yu >Priority: Minor > Attachments: HDFS-5041.patch, NameNode-dfsnodelist-dead.png > > > In Live Server page, there is a column 'Last Contact'. 
> On the dead server page, similar column can be added which shows when the > last heartbeat came from the respective dead node. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5086) Support RPCSEC_GSS authentication in NFSv3 gateway
[ https://issues.apache.org/jira/browse/HDFS-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5086: - Assignee: Jing Zhao > Support RPCSEC_GSS authentication in NFSv3 gateway > -- > > Key: HDFS-5086 > URL: https://issues.apache.org/jira/browse/HDFS-5086 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: nfs >Affects Versions: 3.0.0 >Reporter: Brandon Li >Assignee: Jing Zhao > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5067) Support symlink operations
[ https://issues.apache.org/jira/browse/HDFS-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5067: - Assignee: Brandon Li > Support symlink operations > -- > > Key: HDFS-5067 > URL: https://issues.apache.org/jira/browse/HDFS-5067 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: nfs >Affects Versions: 3.0.0 >Reporter: Brandon Li >Assignee: Brandon Li > > Given that the symlink issues (e.g., HDFS-4765) are getting fixed, NFS can > support the symlink-related requests, which include the NFSv3 calls SYMLINK > and READLINK. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5085) Support Kerberos authentication in NFSv3 gateway
[ https://issues.apache.org/jira/browse/HDFS-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5085: - Assignee: Jing Zhao > Support Kerberos authentication in NFSv3 gateway > > > Key: HDFS-5085 > URL: https://issues.apache.org/jira/browse/HDFS-5085 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: nfs >Affects Versions: 3.0.0 >Reporter: Brandon Li >Assignee: Jing Zhao > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4953) enable HDFS local reads via mmap
[ https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760682#comment-13760682 ] Owen O'Malley commented on HDFS-4953: - Colin, please read my suggestion and my analysis of the difference before commenting. The simplified API absolutely provides a means of releasing the ByteBuffer, and yet it is 2 lines long instead of 20. Furthermore, I didn't even realize that I was supposed to close the zero copy cursor, since it just came in from Closeable. My complaint stands. The API currently in this branch is very error-prone and difficult to explain. Using it is difficult and requires complex handling, including exception handlers, to support arbitrary file systems. > enable HDFS local reads via mmap > > > Key: HDFS-4953 > URL: https://issues.apache.org/jira/browse/HDFS-4953 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 2.3.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: HDFS-4949 > > Attachments: benchmark.png, HDFS-4953.001.patch, HDFS-4953.002.patch, > HDFS-4953.003.patch, HDFS-4953.004.patch, HDFS-4953.005.patch, > HDFS-4953.006.patch, HDFS-4953.007.patch, HDFS-4953.008.patch > > > Currently, the short-circuit local read pathway allows HDFS clients to access > files directly without going through the DataNode. However, all of these > reads involve a copy at the operating system level, since they rely on the > read() / pread() / etc family of kernel interfaces. > We would like to enable HDFS to read local files via mmap. This would enable > truly zero-copy reads. > In the initial implementation, zero-copy reads will only be performed when > checksums were disabled. Later, we can use the DataNode's cache awareness to > only perform zero-copy reads when we know that checksum has already been > verified. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5041) Add the time of last heartbeat to dead server Web UI
[ https://issues.apache.org/jira/browse/HDFS-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760627#comment-13760627 ] Ted Yu commented on HDFS-5041: -- Looks good. > Add the time of last heartbeat to dead server Web UI > > > Key: HDFS-5041 > URL: https://issues.apache.org/jira/browse/HDFS-5041 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ted Yu >Priority: Minor > Attachments: NameNode-dfsnodelist-dead.png > > > In Live Server page, there is a column 'Last Contact'. > On the dead server page, similar column can be added which shows when the > last heartbeat came from the respective dead node. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5041) Add the time of last heartbeat to dead server Web UI
[ https://issues.apache.org/jira/browse/HDFS-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shinichi Yamashita updated HDFS-5041: - Target Version/s: 3.0.0 Affects Version/s: 3.0.0 Status: Patch Available (was: Open) > Add the time of last heartbeat to dead server Web UI > > > Key: HDFS-5041 > URL: https://issues.apache.org/jira/browse/HDFS-5041 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Ted Yu >Priority: Minor > Attachments: HDFS-5041.patch, NameNode-dfsnodelist-dead.png > > > In Live Server page, there is a column 'Last Contact'. > On the dead server page, similar column can be added which shows when the > last heartbeat came from the respective dead node. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5041) Add the time of last heartbeat to dead server Web UI
[ https://issues.apache.org/jira/browse/HDFS-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shinichi Yamashita updated HDFS-5041: - Attachment: HDFS-5041.patch I attach the patch implementing what I showed in the image. > Add the time of last heartbeat to dead server Web UI > > > Key: HDFS-5041 > URL: https://issues.apache.org/jira/browse/HDFS-5041 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ted Yu >Priority: Minor > Attachments: HDFS-5041.patch, NameNode-dfsnodelist-dead.png > > > In Live Server page, there is a column 'Last Contact'. > On the dead server page, similar column can be added which shows when the > last heartbeat came from the respective dead node. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5041) Add the time of last heartbeat to dead server Web UI
[ https://issues.apache.org/jira/browse/HDFS-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shinichi Yamashita updated HDFS-5041: - Attachment: NameNode-dfsnodelist-dead.png I attach a prototype image of the dead datanodes list. > Add the time of last heartbeat to dead server Web UI > > > Key: HDFS-5041 > URL: https://issues.apache.org/jira/browse/HDFS-5041 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ted Yu >Priority: Minor > Attachments: NameNode-dfsnodelist-dead.png > > > In Live Server page, there is a column 'Last Contact'. > On the dead server page, similar column can be added which shows when the > last heartbeat came from the respective dead node. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760509#comment-13760509 ] Arpit Agarwal commented on HDFS-2832: - Thanks for the feedback Eric. Both of these would be good to have and came up during design discussions, but we have not addressed either. For #2, in addition to your points, there are other locations where storage directories are assumed to be File-addressable. I am not sure of the amount of work involved here. Multiple replicas per Datanode looks easier and can be done on top of the Heterogeneous Storage work. We would need phase 1 of the feature to support multiple storages. > Enable support for heterogeneous storages in HDFS > - > > Key: HDFS-2832 > URL: https://issues.apache.org/jira/browse/HDFS-2832 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 0.24.0 >Reporter: Suresh Srinivas >Assignee: Suresh Srinivas > Attachments: 20130813-HeterogeneousStorage.pdf > > > HDFS currently supports a configuration where storages are a list of > directories. Typically each of these directories corresponds to a volume with > its own file system. All these directories are homogeneous and therefore > identified as a single storage at the namenode. I propose changing the > current model, where a Datanode *is a* storage, to one where a Datanode *is a > collection of* storages. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-5163) miscellaneous cache pool RPC fixes
[ https://issues.apache.org/jira/browse/HDFS-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe resolved HDFS-5163. Resolution: Fixed committed to branch, thanks > miscellaneous cache pool RPC fixes > -- > > Key: HDFS-5163 > URL: https://issues.apache.org/jira/browse/HDFS-5163 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-5163-caching.001.patch, HDFS-5163-caching.002.patch > > > some minor fixes-- see below. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-5169) hdfs.c: translateZCRException: null pointer deref when translating some exceptions
[ https://issues.apache.org/jira/browse/HDFS-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe resolved HDFS-5169. Resolution: Fixed Fix Version/s: HDFS-4949 > hdfs.c: translateZCRException: null pointer deref when translating some > exceptions > -- > > Key: HDFS-5169 > URL: https://issues.apache.org/jira/browse/HDFS-5169 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Affects Versions: HDFS-4949 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Fix For: HDFS-4949 > > Attachments: HDFS-5169-caching.001.patch > > > hdfs.c: translateZCRException: there is a null pointer deref when translating > some exceptions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5169) hdfs.c: translateZCRException: null pointer deref when translating some exceptions
[ https://issues.apache.org/jira/browse/HDFS-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760545#comment-13760545 ] Andrew Wang commented on HDFS-5169: --- +1, thanks colin. > hdfs.c: translateZCRException: null pointer deref when translating some > exceptions > -- > > Key: HDFS-5169 > URL: https://issues.apache.org/jira/browse/HDFS-5169 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Affects Versions: HDFS-4949 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-5169-caching.001.patch > > > hdfs.c: translateZCRException: there is a null pointer deref when translating > some exceptions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4879) Add "blocked ArrayList" collection to avoid CMS full GCs
[ https://issues.apache.org/jira/browse/HDFS-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-4879: --- Resolution: Fixed Fix Version/s: 2.3.0 Target Version/s: 2.3.0 (was: 3.0.0) Status: Resolved (was: Patch Available) committed to branch 2.3 > Add "blocked ArrayList" collection to avoid CMS full GCs > > > Key: HDFS-4879 > URL: https://issues.apache.org/jira/browse/HDFS-4879 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0, 2.0.4-alpha >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: 2.3.0 > > Attachments: hdfs-4879.txt, hdfs-4879.txt, hdfs-4879.txt, > hdfs-4879.txt > > > We recently saw an issue where a large deletion was issued which caused 25M > blocks to be collected during {{deleteInternal}}. Currently, the list of > collected blocks is an ArrayList, meaning that we had to allocate a > contiguous 25M-entry array (~400MB). After a NN has been running for a long > amount of time, the old generation may become fragmented such that it's hard > to find a 400MB contiguous chunk of heap. > In general, we should try to design the NN such that the only large objects > are long-lived and created at startup time. We can improve this particular > case (and perhaps some others) by introducing a new List implementation which > is made of a linked list of arrays, each of which is size-limited (eg to 1MB). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
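The list design described above, a linked list of size-limited array chunks, can be sketched as follows. This is an illustrative simplification, not the actual org.apache.hadoop.hdfs.util.ChunkedArrayList committed in this JIRA; the class name and the chunk size of 1024 entries are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

/**
 * Illustrative sketch of a "blocked ArrayList": a linked list of
 * size-limited array chunks, so no single contiguous allocation ever
 * exceeds the chunk size. This avoids the huge contiguous-array
 * allocations that fragment the CMS old generation.
 */
public class ChunkedListSketch<T> {
    private static final int MAX_CHUNK_SIZE = 1024; // entries per chunk (assumed)
    private final LinkedList<List<T>> chunks = new LinkedList<>();
    private int size = 0;

    public void add(T element) {
        // Start a new chunk when the last one is full (or none exists yet).
        if (chunks.isEmpty() || chunks.getLast().size() >= MAX_CHUNK_SIZE) {
            chunks.addLast(new ArrayList<>(MAX_CHUNK_SIZE));
        }
        chunks.getLast().add(element);
        size++;
    }

    public int size() {
        return size;
    }

    public int chunkCount() {
        return chunks.size();
    }

    public static void main(String[] args) {
        ChunkedListSketch<Integer> list = new ChunkedListSketch<>();
        for (int i = 0; i < 3000; i++) {
            list.add(i);
        }
        // 3000 entries at 1024 per chunk land in 3 chunks; the largest
        // single allocation is one 1024-entry chunk, never one big array.
        System.out.println(list.size() + " entries in " + list.chunkCount() + " chunks");
    }
}
```

A 25M-entry deletion under this scheme would allocate thousands of small chunks instead of one ~400MB array, which a fragmented old generation can satisfy.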
[jira] [Updated] (HDFS-5118) Provide testing support for DFSClient to drop RPC responses
[ https://issues.apache.org/jira/browse/HDFS-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5118: Fix Version/s: (was: 2.3.0) 2.1.1-beta Release Note: Used for testing when NameNode HA is enabled. Users can use a new configuration property "dfs.client.test.drop.namenode.response.number" to specify the number of responses that DFSClient will drop in each RPC call. This feature can help testing functionalities such as NameNode retry cache. > Provide testing support for DFSClient to drop RPC responses > --- > > Key: HDFS-5118 > URL: https://issues.apache.org/jira/browse/HDFS-5118 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Fix For: 2.1.1-beta > > Attachments: HDFS-5118.000.patch, HDFS-5118.001.patch, > HDFS-5118.002.patch, HDFS-5118.003.patch, HDFS-5118.004.patch > > > We plan to add capability to DFSClient so that the client is able to > intentionally drop responses of NameNode RPC calls according to settings in > configuration. In this way we can do better system test for NameNode retry > cache, especially when NN failover happens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
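Per the release note above, the drop behavior is driven by a client-side configuration property. A hypothetical client configuration enabling it might look like the following (the value 2 is an arbitrary example, not a recommended setting):

```xml
<!-- Client-side configuration (e.g. in the client's hdfs-site.xml). -->
<property>
  <name>dfs.client.test.drop.namenode.response.number</name>
  <!-- Drop the first 2 NameNode responses of each RPC call, forcing the
       DFSClient to retry and thereby exercising the NameNode retry cache. -->
  <value>2</value>
</property>
```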
[jira] [Commented] (HDFS-4879) Add "blocked ArrayList" collection to avoid CMS full GCs
[ https://issues.apache.org/jira/browse/HDFS-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760529#comment-13760529 ] Hudson commented on HDFS-4879: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4380 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4380/]) HDFS-4879. Add BlockedArrayList collection to avoid CMS full GCs (Contributed by Todd Lipcon) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1520667) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/ChunkedArrayList.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/util/TestChunkedArrayList.java > Add "blocked ArrayList" collection to avoid CMS full GCs > > > Key: HDFS-4879 > URL: https://issues.apache.org/jira/browse/HDFS-4879 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0, 2.0.4-alpha >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: 2.3.0 > > Attachments: hdfs-4879.txt, hdfs-4879.txt, hdfs-4879.txt, > hdfs-4879.txt > > > We recently saw an issue where a large deletion was issued which caused 25M > blocks to be collected during {{deleteInternal}}. Currently, the list of > collected blocks is an ArrayList, meaning that we had to allocate a > contiguous 25M-entry array (~400MB). 
After a NN has been running for a long > amount of time, the old generation may become fragmented such that it's hard > to find a 400MB contiguous chunk of heap. > In general, we should try to design the NN such that the only large objects > are long-lived and created at startup time. We can improve this particular > case (and perhaps some others) by introducing a new List implementation which > is made of a linked list of arrays, each of which is size-limited (eg to 1MB). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5118) Provide testing support for DFSClient to drop RPC responses
[ https://issues.apache.org/jira/browse/HDFS-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760494#comment-13760494 ] Jing Zhao commented on HDFS-5118: - Also committed to branch-2.1-beta. > Provide testing support for DFSClient to drop RPC responses > --- > > Key: HDFS-5118 > URL: https://issues.apache.org/jira/browse/HDFS-5118 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Fix For: 2.1.1-beta > > Attachments: HDFS-5118.000.patch, HDFS-5118.001.patch, > HDFS-5118.002.patch, HDFS-5118.003.patch, HDFS-5118.004.patch > > > We plan to add capability to DFSClient so that the client is able to > intentionally drop responses of NameNode RPC calls according to settings in > configuration. In this way we can do better system test for NameNode retry > cache, especially when NN failover happens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5029) Token operations should not block read operations
[ https://issues.apache.org/jira/browse/HDFS-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760491#comment-13760491 ] Kihwal Lee commented on HDFS-5029: -- Since they all acquire only the read lock, if two token operations are in flight against the same token, the order of the operations and the order of edit logging can differ. > Token operations should not block read operations > - > > Key: HDFS-5029 > URL: https://issues.apache.org/jira/browse/HDFS-5029 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-5029.2.patch, HDFS-5029.branch-23.patch, > HDFS-5029.patch, HDFS-5029.patch > > > Token operations unnecessarily obtain the write lock on the namespace. Edits > for token operations are independent of edits for other namespace write > operations, and the edits have no ordering requirement with respect to > namespace changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
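The concern raised above, that operations holding only the read lock impose no ordering on each other, can be demonstrated with a generic sketch. This is plain java.util.concurrent code, not the NameNode's actual FSNamesystem locking; the class and method names are invented for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Two operations that take only the read lock can be inside their
// critical sections simultaneously, so the lock fixes no order between
// them: the order they complete (and would log edits) is unconstrained.
public class ReadLockOrdering {
    static boolean overlapped() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch bothInside = new CountDownLatch(2);
        Runnable op = () -> {
            lock.readLock().lock();
            try {
                bothInside.countDown();
                // Unblocks only once the other operation is also holding
                // the read lock: proof that the sections overlap.
                bothInside.await(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        };
        Thread a = new Thread(op);
        Thread b = new Thread(op);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Count reached zero only if both threads held the lock at once.
        return bothInside.getCount() == 0;
    }

    public static void main(String[] args) {
        System.out.println("read-locked sections overlapped: " + overlapped());
    }
}
```

With a write lock instead, the two sections would be mutually exclusive and their completion order would match their lock-acquisition order.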
[jira] [Commented] (HDFS-5118) Provide testing support for DFSClient to drop RPC responses
[ https://issues.apache.org/jira/browse/HDFS-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760484#comment-13760484 ] Hudson commented on HDFS-5118: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4378 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4378/]) Move HDFS-5118 to 2.1.1-beta section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1520650) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Provide testing support for DFSClient to drop RPC responses > --- > > Key: HDFS-5118 > URL: https://issues.apache.org/jira/browse/HDFS-5118 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Fix For: 2.3.0 > > Attachments: HDFS-5118.000.patch, HDFS-5118.001.patch, > HDFS-5118.002.patch, HDFS-5118.003.patch, HDFS-5118.004.patch > > > We plan to add capability to DFSClient so that the client is able to > intentionally drop responses of NameNode RPC calls according to settings in > configuration. In this way we can do better system test for NameNode retry > cache, especially when NN failover happens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (HDFS-5169) hdfs.c: translateZCRException: null pointer deref when translating some exceptions
[ https://issues.apache.org/jira/browse/HDFS-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-5169 started by Colin Patrick McCabe. > hdfs.c: translateZCRException: null pointer deref when translating some > exceptions > -- > > Key: HDFS-5169 > URL: https://issues.apache.org/jira/browse/HDFS-5169 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Affects Versions: HDFS-4949 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-5169-caching.001.patch > > > hdfs.c: translateZCRException: there is a null pointer deref when translating > some exceptions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5169) hdfs.c: translateZCRException: null pointer deref when translating some exceptions
[ https://issues.apache.org/jira/browse/HDFS-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5169: --- Attachment: HDFS-5169-caching.001.patch > hdfs.c: translateZCRException: null pointer deref when translating some > exceptions > -- > > Key: HDFS-5169 > URL: https://issues.apache.org/jira/browse/HDFS-5169 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Affects Versions: HDFS-4949 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-5169-caching.001.patch > > > hdfs.c: translateZCRException: there is a null pointer deref when translating > some exceptions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5167) Add metrics about the NameNode retry cache
[ https://issues.apache.org/jira/browse/HDFS-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760354#comment-13760354 ] Jing Zhao commented on HDFS-5167: - [~ozawa], you can move this jira to hadoop-common if necessary. No need to create a new jira. > Add metrics about the NameNode retry cache > -- > > Key: HDFS-5167 > URL: https://issues.apache.org/jira/browse/HDFS-5167 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, namenode >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Priority: Minor > Attachments: HDFS-5167.1.patch > > > It will be helpful to have metrics in NameNode about the retry cache, such as > the retry count etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5167) Add metrics about the NameNode retry cache
[ https://issues.apache.org/jira/browse/HDFS-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760367#comment-13760367 ] Tsuyoshi OZAWA commented on HDFS-5167: -- [~jingzhao], OK, thanks. > Add metrics about the NameNode retry cache > -- > > Key: HDFS-5167 > URL: https://issues.apache.org/jira/browse/HDFS-5167 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, namenode >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Priority: Minor > Attachments: HDFS-5167.1.patch > > > It will be helpful to have metrics in NameNode about the retry cache, such as > the retry count etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5167) Add metrics about the NameNode retry cache
[ https://issues.apache.org/jira/browse/HDFS-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760353#comment-13760353 ] Tsuyoshi OZAWA commented on HDFS-5167: -- [~sureshms], I see. NameNode has RPCMetrics, so your idea looks good to me. Should we create new jira on hadoop-common? > Add metrics about the NameNode retry cache > -- > > Key: HDFS-5167 > URL: https://issues.apache.org/jira/browse/HDFS-5167 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, namenode >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Priority: Minor > Attachments: HDFS-5167.1.patch > > > It will be helpful to have metrics in NameNode about the retry cache, such as > the retry count etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5118) Provide testing support for DFSClient to drop RPC responses
[ https://issues.apache.org/jira/browse/HDFS-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5118: Resolution: Fixed Fix Version/s: 2.3.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this to trunk and branch-2. > Provide testing support for DFSClient to drop RPC responses > --- > > Key: HDFS-5118 > URL: https://issues.apache.org/jira/browse/HDFS-5118 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Fix For: 2.3.0 > > Attachments: HDFS-5118.000.patch, HDFS-5118.001.patch, > HDFS-5118.002.patch, HDFS-5118.003.patch, HDFS-5118.004.patch > > > We plan to add capability to DFSClient so that the client is able to > intentionally drop responses of NameNode RPC calls according to settings in > configuration. In this way we can do better system test for NameNode retry > cache, especially when NN failover happens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5168) BlockPlacementPolicy does not work for cross rack/node group dependencies
[ https://issues.apache.org/jira/browse/HDFS-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760132#comment-13760132 ] Steve Loughran commented on HDFS-5168: -- Moved to HDFS issues > BlockPlacementPolicy does not work for cross rack/node group dependencies > - > > Key: HDFS-5168 > URL: https://issues.apache.org/jira/browse/HDFS-5168 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Nikola Vujic >Priority: Critical > > Block placement policies do not work for cross rack/node group dependencies. > In reality this is needed when compute servers and storage fall in two > independent fault domains; in that case neither BlockPlacementPolicyDefault nor > BlockPlacementPolicyWithNodeGroup is able to provide proper block > placement. > Let's suppose that we have a Hadoop cluster with one rack and two servers, and > we run 2 VMs per server. The node group topology for this cluster would be: > server1-vm1 -> /d1/r1/n1 > server1-vm2 -> /d1/r1/n1 > server2-vm1 -> /d1/r1/n2 > server2-vm2 -> /d1/r1/n2 > This works fine as long as a server and its storage fall into the same fault > domain, but if the storage is in a different fault domain from the server, we will > not be able to handle that. For example, if the storage of server1-vm1 is in the > same fault domain as the storage of server2-vm1, then we must not place two > replicas on these two nodes although they are in different node groups. > Two possible approaches: > - One approach would be to define cross rack/node group dependencies and to > use them when excluding nodes from the search space. This looks like the > cleanest way to fix this, as it requires only minor changes in the > BlockPlacementPolicy classes. > - The other approach would be to allow nodes to fall in more than one node group. > When we choose a node to hold a replica we have to exclude from the search > space all nodes from the node groups to which the chosen node belongs. This > approach may require major changes in the NetworkTopology. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (HDFS-5168) BlockPlacementPolicy does not work for cross rack/node group dependencies
[ https://issues.apache.org/jira/browse/HDFS-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran moved HADOOP-9936 to HDFS-5168: -- Key: HDFS-5168 (was: HADOOP-9936) Project: Hadoop HDFS (was: Hadoop Common) > BlockPlacementPolicy does not work for cross rack/node group dependencies > - > > Key: HDFS-5168 > URL: https://issues.apache.org/jira/browse/HDFS-5168 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Nikola Vujic >Priority: Critical > > Block placement policies do not work for cross rack/node group dependencies. > In reality this is needed when compute servers and storage fall in two > independent fault domains; in that case neither BlockPlacementPolicyDefault nor > BlockPlacementPolicyWithNodeGroup is able to provide proper block > placement. > Let's suppose that we have a Hadoop cluster with one rack and two servers, and > we run 2 VMs per server. The node group topology for this cluster would be: > server1-vm1 -> /d1/r1/n1 > server1-vm2 -> /d1/r1/n1 > server2-vm1 -> /d1/r1/n2 > server2-vm2 -> /d1/r1/n2 > This works fine as long as a server and its storage fall into the same fault > domain, but if the storage is in a different fault domain from the server, we will > not be able to handle that. For example, if the storage of server1-vm1 is in the > same fault domain as the storage of server2-vm1, then we must not place two > replicas on these two nodes although they are in different node groups. > Two possible approaches: > - One approach would be to define cross rack/node group dependencies and to > use them when excluding nodes from the search space. This looks like the > cleanest way to fix this, as it requires only minor changes in the > BlockPlacementPolicy classes. > - The other approach would be to allow nodes to fall in more than one node group. > When we choose a node to hold a replica we have to exclude from the search > space all nodes from the node groups to which the chosen node belongs. 
This > approach may require major changes in the NetworkTopology. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
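The first approach above, keeping an explicit map of cross-node-group storage dependencies and folding it into the exclusion set, can be sketched with the example topology from the description. The class, method, and map names are invented for illustration; this is not the actual BlockPlacementPolicy API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of dependency-aware exclusion: alongside the node-group topology,
// record which nodes' storage shares a fault domain across node groups, and
// exclude both sets once a node is chosen to hold a replica.
public class PlacementSketch {
    // node -> node group, from the example topology in the description
    static final Map<String, String> NODE_GROUP = Map.of(
        "server1-vm1", "/d1/r1/n1",
        "server1-vm2", "/d1/r1/n1",
        "server2-vm1", "/d1/r1/n2",
        "server2-vm2", "/d1/r1/n2");

    // storage fault-domain dependencies that cross node groups
    // (server1-vm1 and server2-vm1 share a storage fault domain)
    static final Map<String, Set<String>> DEPENDENCIES = Map.of(
        "server1-vm1", Set.of("server2-vm1"),
        "server2-vm1", Set.of("server1-vm1"));

    /** Nodes that must not receive another replica once 'chosen' holds one. */
    static Set<String> excludedAfter(String chosen) {
        Set<String> excluded = new HashSet<>();
        String group = NODE_GROUP.get(chosen);
        for (Map.Entry<String, String> e : NODE_GROUP.entrySet()) {
            if (e.getValue().equals(group)) {
                excluded.add(e.getKey()); // same node group as usual
            }
        }
        // The new step: also exclude cross-group storage dependencies.
        excluded.addAll(DEPENDENCIES.getOrDefault(chosen, Set.of()));
        return excluded;
    }

    public static void main(String[] args) {
        // After placing a replica on server1-vm1, its whole node group AND
        // its cross-group dependency server2-vm1 are off-limits; only
        // server2-vm2 remains a valid target.
        System.out.println(excludedAfter("server1-vm1"));
    }
}
```

Without the dependency map, server2-vm1 would remain a candidate because it sits in a different node group, which is exactly the misplacement the description warns about.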
[jira] [Commented] (HDFS-5164) deleteSnapshot should check if OperationCategory.WRITE is possible before taking write lock
[ https://issues.apache.org/jira/browse/HDFS-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760122#comment-13760122 ] Hudson commented on HDFS-5164: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #324 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/324/]) HDFS-5164. DeleteSnapshot should check if OperationCategory.WRITE is possible before taking write lock (contributed by Colin Patrick McCabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1520492) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java > deleteSnapshot should check if OperationCategory.WRITE is possible before > taking write lock > --- > > Key: HDFS-5164 > URL: https://issues.apache.org/jira/browse/HDFS-5164 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.3.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Fix For: 2.3.0 > > Attachments: HDFS-5164.001.patch > > > deleteSnapshot should check if OperationCategory.WRITE is possible before > taking the write lock, to help avoid lock contention -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5159) Secondary NameNode fails to checkpoint if error occurs downloading edits on first checkpoint
[ https://issues.apache.org/jira/browse/HDFS-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760120#comment-13760120 ] Hudson commented on HDFS-5159: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #324 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/324/]) HDFS-5159. Secondary NameNode fails to checkpoint if error occurs downloading edits on first checkpoint. Contributed by Aaron T. Myers. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1520363) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java > Secondary NameNode fails to checkpoint if error occurs downloading edits on > first checkpoint > > > Key: HDFS-5159 > URL: https://issues.apache.org/jira/browse/HDFS-5159 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.1.0-beta >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Fix For: 2.1.1-beta > > Attachments: HDFS-5159.patch, HDFS-5159.patch > > > The 2NN will avoid downloading/loading a new fsimage if its local copy of > fsimage is the same as the version on the NN. However, the decision to *load* > the fsimage from disk into memory is based only on the on-disk fsimage > version. If an error occurs between downloading and loading the fsimage on > the first checkpoint attempt, the 2NN will never load the fsimage, and then > on subsequent checkpoint attempts it will not load the on-disk fsimage and > thus will never checkpoint successfully. > Example error message in the first comment of this ticket. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
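The failure mode described above can be reduced to a small sketch. This is illustrative only, not the actual SecondaryNameNode code; the field names, txid values, and the shape of the fix are assumptions made for the example.

```java
// The buggy decision consults only image versions, so after a first
// checkpoint attempt that downloaded the image but failed before loading
// it into memory, every later attempt skips the load and the 2NN never
// checkpoints successfully.
public class CheckpointSketch {
    long onDiskImageTxId = 100;       // version of the downloaded image on disk
    long nnImageTxId = 100;           // latest image version on the NameNode
    boolean loadedIntoMemory = false; // the first load never happened

    boolean buggyShouldLoad() {
        // Compares versions only: false here, so the image is never loaded.
        return onDiskImageTxId < nnImageTxId;
    }

    boolean fixedShouldLoad() {
        // Also load whenever no image has been loaded into memory yet.
        return !loadedIntoMemory || onDiskImageTxId < nnImageTxId;
    }

    public static void main(String[] args) {
        CheckpointSketch s = new CheckpointSketch();
        System.out.println("buggy loads: " + s.buggyShouldLoad()
            + ", fixed loads: " + s.fixedShouldLoad());
    }
}
```

The essence of the fix is tracking whether an image is actually in memory rather than inferring it from the on-disk version alone.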
[jira] [Commented] (HDFS-4491) Parallel testing HDFS
[ https://issues.apache.org/jira/browse/HDFS-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760123#comment-13760123 ] Hudson commented on HDFS-4491: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #324 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/324/]) HDFS-4491. Add/delete files missed in prior commit. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1520482) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/URLConnectionFactory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/URLUtils.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/test/PathUtils.java HDFS-4491. Parallel testing HDFS. Contributed by Andrey Klochkov. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1520479) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/pom.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HftpFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HsftpFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/aop/org/apache/hadoop/fs/TestFiRename.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestResolveHdfsSymlink.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestUrlStreamHandler.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/loadGenerator/TestLoadGenerator.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestClientReportBadBlock.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSRollback.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommission.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFSInputChecker.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppendRestart.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileCorruption.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestHDFSServerPorts.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestHftpURLTimeouts.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestMiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPersistBlocks.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/TestNNWithQJM.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/server/TestJournalNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestOverReplicatedBlocks.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyWithNodeGroup.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAllowFormat.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAuditLogs.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestClusterId.java * /hadoop/c