[jira] [Commented] (HDFS-3370) HDFS hardlink
[ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396311#comment-13396311 ] Jesse Yates commented on HDFS-3370:
---
bq. Hardlinks are of similar nature. They are hard to support if the namespace is distributed.

FWIW, Ceph also punts on distributed hardlinks and just puts them on a single node, "because they are not commonly used and not likely to be hot or large" (paraphrasing). Conceptually, you could do it with 2PC across nodes, which should be fine as long as the namespace isn't sharded too highly - i.e. not 1000s of nodes hosting hardlink information (again, assuming not too many hardlinks).

From an HBase perspective, the hardlink count _could_ become large (roughly equal to the number of HFiles), but that isn't going to be near the number of files currently in HDFS overall. Maybe punt on the issue until it becomes a problem, keeping it flexible behind an interface?

> HDFS hardlink
> -
>
> Key: HDFS-3370
> URL: https://issues.apache.org/jira/browse/HDFS-3370
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Hairong Kuang
> Assignee: Liyin Tang
> Attachments: HDFS-HardLink.pdf
>
> We'd like to add a new hardlink feature to HDFS that allows hardlinked files to share data without copying. Currently we will support hardlinking only closed files, but it could be extended to unclosed files as well.
> Among the many potential use cases of the feature, the following two are primarily used at Facebook:
> 1. This provides a lightweight way for applications like HBase to create a snapshot;
> 2. This also allows an application like Hive to move a table to a different directory without breaking currently running Hive queries.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3370) HDFS hardlink
[ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396961#comment-13396961 ] Jesse Yates commented on HDFS-3370:
---
Maybe I'm missing something here...

bq. Backup itself only becomes safe if HDFS (not HBase) promises to never modify a file once it is closed. Otherwise, a process that accidentally writes into the hard-linked file will corrupt "both" copies

At least for the HBase case, if we set the file permissions to 744, only an HBase process could mess up the file (which it won't do once we close the file), and an errant process can only slow down other reader processes. That would make it sufficient at least for HBase backups, though clearly not for general HDFS backups.
[jira] [Commented] (HDFS-3370) HDFS hardlink
[ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402620#comment-13402620 ] Jesse Yates commented on HDFS-3370:
---
I'd like to propose an alternative to 'real' hardlinks: "reference counted soft-links", or all the hardness you really need in a distributed FS.

In this implementation of "hard" links, wherever the file is created is considered the "owner" of that file. Initially, when created, the file has a reference count of (1) in the local namespace. If you want another hardlink to the file in the same namespace, you talk to the NN and request another handle to that file, which implicitly updates the reference count. The reference to that file could be stored in memory (and journaled) or written as part of the file metadata (more on that later, but let's ignore it for the moment).

Suppose instead that you are in a separate namespace and want a hardlink to the file in the original namespace. Then you would make a request to your NN (NNa) for a hardlink. Since NNa doesn't own the file you want to reference, it makes a hardlink request to the NN which originally created the file, the file 'owner' (or NNb). NNb then says "Cool, I've got your request and incremented the ref-count for the file." NNa can then grant your request and give you a link to that file. The failure cases here are: 1) NNb goes down, in which case you can just keep the reference requests around and batch them when NNb comes back up; 2) NNa goes down mid-request - if NNa doesn't receive an ACK back for the granted request, it can disregard that request and re-decrement the count for that hardlink.

Deleting a hardlink then follows a similar process. You issue a request to the owner NN, either directly from the client if you are deleting a link in the current namespace, or through a proxy NN to the original namenode. It then decrements the reference count on the file and allows the deletion of the link. If the reference count ever hits 0, the NN also deletes the file, since there are no valid references to it. This has the implication, though, that the file will not be visible in the namespace that created it if all the hardlinks to it are removed - it essentially becomes a 'hidden' inode. We could, in the future, also work out a mechanism to transfer the hidden inode to a NN that has valid references to it (maybe via a gossip-style protocol), but that would be out of the current scope.

There are some implications to this model. If the owner NN manages the ref-count in memory and that NN goes down, its whole namespace becomes inaccessible, including _creating new hardlinks_ to any of the files (inodes) that it owns. However, the owner NN going down doesn't preclude the other NNs from serving the file from their own 'soft' inodes. Alternatively, the NN could hold a lock on a hardlinked file, with the ref-counts and ownership info in the file metadata. This might introduce some overhead when creating new hardlinks (you need to reopen and modify the block, or periodically write a new block with the new information - the latter actually opens a route to do ref-count management via appends to a file-ref file), but it has the added advantage that if the owner NN crashed, an alternative NN could come and claim ownership of that file. This is similar to doing Paxos-style leader election for a given hardlinked file, combined with leader leases. However, this is very unlikely to see lots of fluctuation, as the leader can just reclaim the leader token via appends to the file-owner file, with periodic rewrites to minimize file size.

The on-disk representation of the extreme version I'm proposing is then this: the full file is actually composed of three pieces: (1) the actual data, plus two metadata files, "extents" (to add a new word/definition); (2) an external-reference extent: each time a reference is made to the file, a new count is appended, and it can periodically be recompacted to a single value; (3) an owner extent with the current NN owner and the lease time on the file, dictating who controls overall deletion of the file (since ref counts are done via the external-ref extent). This means (2) and (3) are hidden inodes, only accessible to the namenode. We can minimize overhead on these file extents by ensuring a single writer via messaging to the owner NN (as specified by the owner extent), though this is not strictly necessary. Further, (1) could become a hidden inode if all the local namespace references are removed, but it could eventually be transferred over to another NN shard (namespace) to keep overhead at a minimum, though (again) this is not a strict necessity. The design retains the NN view of files as directory entries, just entries with a little bit of metadata.
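The bookkeeping the owner NN does in this scheme can be sketched in a few lines. This is purely illustrative - the class and method names are invented for this sketch and do not exist in HDFS; it only captures the create/link/unlink/reclaim state machine described above:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "reference counted soft-link" bookkeeping
// an owner NameNode would keep (in memory and journaled, or in a file
// extent). Illustrative names only; not HDFS code.
class HardlinkOwner {
    private final Map<String, Integer> refCounts = new HashMap<>();

    /** A file starts life with one reference in its owner's namespace. */
    void createFile(String inode) {
        refCounts.put(inode, 1);
    }

    /** Called for a local link request, or proxied from a remote NN (NNa). */
    void addLink(String inode) {
        refCounts.merge(inode, 1, Integer::sum);
    }

    /**
     * Decrement on link deletion; when the count hits zero there are no
     * valid references left anywhere, so the owner reclaims the (hidden)
     * inode. Returns true if the underlying data should be deleted.
     */
    boolean removeLink(String inode) {
        int remaining = refCounts.merge(inode, -1, Integer::sum);
        if (remaining == 0) {
            refCounts.remove(inode);
            return true; // no references anywhere: delete the data
        }
        return false;
    }

    int refCount(String inode) {
        return refCounts.getOrDefault(inode, 0);
    }
}
```

The failure handling in the comment layers on top of this: a remote NN's increment is only durable once the owner ACKs it, and an unACKed grant is re-decremented.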
[jira] [Commented] (HDFS-3370) HDFS hardlink
[ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405654#comment-13405654 ] Jesse Yates commented on HDFS-3370:
---
Sorry for the slow reply, been a bit busy of late...

@Daryn
bq. Retaining ref-counted paths after deletion in the origin namespace requires an "inode id". A new api to reference paths based on the id is required. We aren't so soft anymore...

That's why I'd argue for doing it in file metadata with periodic rewrites, so we can just do appends. We will still need to maintain references if we do hardlinks, so this is just a single method call to do the update - arguably a pretty simple code path that doesn't need to be highly optimized for multiple writers, since we can argue that hardlinks are "rare".

bq. The inode id needs to be secured since it bypasses all parent dir permissions,

Yeah, that's a bit of a pain... Maybe a bit more metadata to store with the file...?

@Konstantin
bq. Do I understand correctly that your hidden inodes can be regular HDFS files, and that then the whole implementation can be done on top of existing HDFS, as a stand alone library supporting calls

Yeah, I guess that's a possibility. But you would probably need some sort of "namespace managers" to deal with handling hardlinks across different namespaces, which fits comfortably with the distributed namenode design.

bq. ref-counted links, creating hidden "only accessible to the namenode" inodes, leases on arbitrated NN ownership, retention of deleted files with non-zero ref count, etc. Those aren't client-side operations.

Since you keep the data along with the file (including the current file owner), you could do it all from a library. However, since the lease needs to be periodically regained, you will see temporary unavailability of the hardlinked files in the managed namespace. If you couple the hardlink management with the namenode managing the namespace, you can then do a forced reassignment of the hardlinks to the backup namenode and still see the same availability as for other files in that namespace, in terms of creating new hardlinks (reads would still work, since all the important data can be replicated across the different namespaces).

@Andy: I don't know if I've seen a compelling reason that we _need_ to have cross-namespace hardlinks, particularly since they are _hard_, to say the least.
[jira] [Commented] (HDFS-3370) HDFS hardlink
[ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405655#comment-13405655 ] Jesse Yates commented on HDFS-3370:
---
Another, simpler way to do hardlinks with cross-server coordination (which in reality needs something like Paxos, or to suffer some more unavailability to ensure consistency) would be to leverage ZooKeeper. Yes, -1 for another piece of infrastructure, but it does provide all the cross-namespace transactionality we need, and it makes reference counting and security management significantly easier. Not quite client-library easy, but pretty darn close :)
[jira] [Commented] (HDFS-3370) HDFS hardlink
[ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453667#comment-13453667 ] Jesse Yates commented on HDFS-3370:
---
@Jagane - with HBASE-6055 (currently in review) you get a flush (more or less coordinated between regionservers - see the jira for more info) of the memstore to HFiles, which we would then _love_ to hardlink into the snapshot directory. HFiles live under the region directory - which lives under the column family and table directories - where the HFile is being served. When a compaction occurs, the file is moved to the .archive directory. Currently, we are getting around the hardlink issue by referencing the HFiles by name and then using a FileLink (also in review) to deal with the file getting archived out from under us when we restore the table.

The current implementation of snapshots in HBase is pretty close to what you are proposing (and almost identical for 'globally consistent' - cross-server consistent - snapshots, but those quiesce for far too long to ensure consistency), but spends minimal time blocking. In short, hardlinks make snapshotting easier, but we still need both parts to get 'clean' restores. Otherwise, we need to do a WAL replay from the COW version of the WAL to get back the in-memory state.

Does that make sense/answer your question?
[jira] [Commented] (HDFS-3370) HDFS hardlink
[ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454170#comment-13454170 ] Jesse Yates commented on HDFS-3370:
---
@Jagane - in short, yes. With the PIT split, any writes up to that point will go into the snapshot. Obviously, we can't ensure that future writes beyond the taking of the snapshot end up in the snapshot. Some writes can get dropped between snapshots, though, if you don't have your TTLs set correctly, since a compaction can age off the writes before the snapshot can be taken. This is part of an overall backup solution, and not really the concern of the mechanism for taking snapshots - that's up to you :) Feel free to DM me if you want to chat more.
[jira] [Commented] (HDFS-9787) SNNs stop uploading FSImage to ANN once isPrimaryCheckPointer changed to false.
[ https://issues.apache.org/jira/browse/HDFS-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140148#comment-15140148 ] Jesse Yates commented on HDFS-9787:
---
I think I'll have some time tomorrow - adding it to my list

> SNNs stop uploading FSImage to ANN once isPrimaryCheckPointer changed to false.
> ---
>
> Key: HDFS-9787
> URL: https://issues.apache.org/jira/browse/HDFS-9787
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha
> Affects Versions: 3.0.0
> Reporter: Guocui Mi
> Assignee: Guocui Mi
>
> SNNs stop uploading the FSImage to the ANN once isPrimaryCheckPointer becomes false. Here is the logic that checks whether to upload the FSImage, in StandbyCheckpointer.java:
>
> boolean sendRequest = isPrimaryCheckPointer || secsSinceLast >= checkpointConf.getQuietPeriod();
> doCheckpoint(sendRequest);
>
> sendRequest is always false if isPrimaryCheckPointer is false, since secsSinceLast (~checkpointPeriod) >= checkpointConf.getQuietPeriod() (checkpointPeriod * this.quietMultiplier, default value 1.5) always evaluates to false.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
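The arithmetic behind the bug report can be made concrete. The sketch below mirrors the sendRequest check quoted in the description (names follow the JIRA text; the wrapper class is illustrative, not HDFS code): a non-primary SNN resets its timer every checkpointPeriod, so secsSinceLast hovers around checkpointPeriod and never reaches the quiet period of checkpointPeriod * 1.5.

```java
// Minimal model of the sendRequest check from StandbyCheckpointer, to show
// why a non-primary SNN never uploads: secsSinceLast ~ checkpointPeriod is
// always below the quiet period (checkpointPeriod * quietMultiplier).
class SendRequestCheck {
    static boolean sendRequest(boolean isPrimaryCheckPointer,
                               double secsSinceLast,
                               double checkpointPeriod,
                               double quietMultiplier) {
        double quietPeriod = checkpointPeriod * quietMultiplier;
        return isPrimaryCheckPointer || secsSinceLast >= quietPeriod;
    }
}
```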
[jira] [Commented] (HDFS-9787) SNNs stop uploading FSImage to ANN once isPrimaryCheckPointer changed to false.
[ https://issues.apache.org/jira/browse/HDFS-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140175#comment-15140175 ] Jesse Yates commented on HDFS-9787:
---
Taking a quick look, this would imply that the non-primary SNN never sends a checkpoint after the first time? A good test to ensure this is the case is to start the NNs, wait until the primary SNN is selected, and then remove it from the cluster. Are any more checkpoints sent to the ANN? My inclination is that you are correct - no (unless it takes a long time to build the checkpoint) - but I'd like to hear if that's actually the case.

I think the fix is to just set lastCheckpointTime in doCheckpoint() rather than after each loop iteration.
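The suggested fix can be sketched as follows. This is an illustrative model, not the HDFS patch: the timer is only reset when an upload is actually sent (i.e. inside doCheckpoint), so secsSinceLast keeps growing across skipped iterations and eventually crosses the quiet period.

```java
// Sketch of the fix suggested above: reset lastCheckpointTime only when a
// checkpoint upload actually happens, not on every loop iteration, so a
// non-primary SNN eventually exceeds the quiet period and uploads.
// Illustrative class; not the actual StandbyCheckpointer code.
class CheckpointerLoop {
    long lastCheckpointTimeSecs; // reset only on upload
    final long checkpointPeriodSecs;
    final double quietMultiplier;
    final boolean isPrimaryCheckPointer;

    CheckpointerLoop(long periodSecs, double quiet, boolean primary) {
        this.checkpointPeriodSecs = periodSecs;
        this.quietMultiplier = quiet;
        this.isPrimaryCheckPointer = primary;
    }

    /** One loop iteration; returns true if an image upload was sent. */
    boolean tick(long nowSecs) {
        long secsSinceLast = nowSecs - lastCheckpointTimeSecs;
        boolean sendRequest = isPrimaryCheckPointer
            || secsSinceLast >= checkpointPeriodSecs * quietMultiplier;
        if (sendRequest) {
            lastCheckpointTimeSecs = nowSecs; // the "moved into doCheckpoint()" part
        }
        return sendRequest;
    }
}
```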
[jira] [Commented] (HDFS-9787) SNNs stop uploading FSImage to ANN once isPrimaryCheckPointer changed to false.
[ https://issues.apache.org/jira/browse/HDFS-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140176#comment-15140176 ] Jesse Yates commented on HDFS-9787:
---
I don't think you need a new variable - lastCheckpointTime is never used outside of the class and is only used for this check
[jira] [Commented] (HDFS-9787) SNNs stop uploading FSImage to ANN once isPrimaryCheckPointer changed to false.
[ https://issues.apache.org/jira/browse/HDFS-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140384#comment-15140384 ] Jesse Yates commented on HDFS-9787:
---
The original solution was an attempt to catch the case where we would otherwise flood the NN with checkpoint requests. Instead, maybe the better solution would be to do a small RPC to see when the latest image was uploaded. If the last upload was more than the quietMultiplier beyond the checkpoint period, then we attempt to upload the checkpoint. It's a bit more work, but I think this more clearly lays out the intentions in the code, while obtaining the same effect - without the overhead of actually sending the checkpoint along each time we want to find out if the ANN is behind.
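The alternative proposed here might look like the following sketch. Note that getMostRecentImageUploadMillis() is an invented RPC for illustration - no such call exists in HDFS; the point is that the decision is driven by the ANN's view of the last upload rather than purely local state.

```java
// Hedged sketch: ask the ANN (via a small, hypothetical RPC) when it last
// received an image, and only upload when that is older than the quiet
// period. Illustrative only; the RPC is an assumption, not real HDFS API.
class UploadDecision {
    interface ActiveNameNodeRpc {
        long getMostRecentImageUploadMillis(); // hypothetical RPC
    }

    static boolean shouldUpload(ActiveNameNodeRpc ann, long nowMillis,
                                long checkpointPeriodMillis, double quietMultiplier) {
        long quietPeriodMillis = (long) (checkpointPeriodMillis * quietMultiplier);
        return nowMillis - ann.getMostRecentImageUploadMillis() > quietPeriodMillis;
    }
}
```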
[jira] [Commented] (HDFS-9787) SNNs stop uploading FSImage to ANN once isPrimaryCheckPointer changed to false.
[ https://issues.apache.org/jira/browse/HDFS-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149389#comment-15149389 ] Jesse Yates commented on HDFS-9787:
---
Yeah, that's fine. As long as the committer is happy, I'm happy.
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592925#comment-15592925 ] Jesse Yates commented on HDFS-6440:
---
Upgrades/downgrades between major versions aren't supported, AFAIK. Those seem like the 2 major places for upgrade issues.

> Support more than 2 NameNodes
> -
>
> Key: HDFS-6440
> URL: https://issues.apache.org/jira/browse/HDFS-6440
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: auto-failover, ha, namenode
> Affects Versions: 2.4.0
> Reporter: Jesse Yates
> Assignee: Jesse Yates
> Fix For: 3.0.0-alpha1
>
> Attachments: Multiple-Standby-NameNodes_V1.pdf, hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch
>
> Most of the work is already done to support more than 2 NameNodes (one active, one standby). This would be the last bit to support running multiple _standby_ NameNodes; one of the standbys should be available for fail-over. Mostly, this is a matter of updating how we parse configurations, some complexity around managing the checkpointing, and updating a whole lot of tests.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531238#comment-14531238 ] Jesse Yates commented on HDFS-6440:
---
[~atm] thanks for the feedback. I'm working on rebasing on trunk and addressing your comments (hopefully a patch by tomorrow), but a couple of comments/questions first:

bq. Rolling upgrades/downgrades/rollbacks.

I'm not sure how we would test this when we need to change the structure of the FS to support more than 2 NNs. Would you recommend (1) recognizing the old layout and then (2) transferring it into the new layout? The reason this seems silly (to me) is that the layout is only enforced by the way the minicluster is used/set up, rather than the way things would actually be run. By moving things into the appropriate directories per-NN, but keeping everything else below that the same, I think we keep the same upgrade properties but don't need to do the above contrived/synthetic "upgrade".

bq. What's a "fresh cluster" vs. a "running cluster" in this sense?

Maybe some Salesforce terminology leaking through here. "Fresh" would be one where you just formatted the primary NN and are bootstrapping the other NNs from that layout. "Running" would be when bringing up an SNN after some sort of failure, where it has an unformatted fs - then it can pull from any node in the cluster. As an SNN it would then be able to catch up by tailing the ANN. I'll update the comment.

bq. is changing the value of FAILOVER_SEED going to do anything, given that it's only ever read at the static initialization of the failoverRandom?

Yes, it's for when there is an error and you want to run the exact same sequence of failovers again in the test. A minor helper, but it can be useful when trying to track down ordering-dependency issues (which there shouldn't be, but sometimes these things can creep in).

Otherwise, everything else seems completely reasonable. Thanks!
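The FAILOVER_SEED idea above boils down to driving the test's failover order from a seeded Random, so a failing run can be replayed exactly by pinning the seed. A minimal sketch under that assumption (the class and names are illustrative, not the actual test code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch of seed-driven failover ordering for a multi-NN test: the same seed
// always produces the same failover sequence, so ordering-dependency bugs
// can be reproduced deterministically. Illustrative; not the HDFS test code.
class FailoverOrder {
    static List<Integer> order(int numNameNodes, long seed) {
        List<Integer> nns = new ArrayList<>();
        for (int i = 0; i < numNameNodes; i++) {
            nns.add(i);
        }
        // Same seed => same shuffle => same failover sequence on every run.
        Collections.shuffle(nns, new Random(seed));
        return nns;
    }
}
```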
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531297#comment-14531297 ] Jesse Yates commented on HDFS-6440: --- More comments, as I actually get back into the code: {quote} In StandbyCheckpointer#doCheckpoint, unless I'm missing something, I don't think the variable "ie" can ever be non-null, and yet we check for whether or not it's null later in the method to determine if we should shut down. {quote} It can either be an InterruptedException or an IOException when transfering the checkpoint. Interrupted ("ie") thrown if we are interrupted while waiting the any checkpoint to complete. IOE if there is an execution exception when doing the checkpoint. After we get out of waiting for the uploads, if we got an "ioe" or an "ie" then we force the rest of the threads that we started for the image transfer to quit by shutting down the threadpool (and then forcibly shutting it down shortly after that). We do checks again for each exception to ensure we throw the right one back up. We could wrap the exceptions into a parent exception and then just throw that back up to the caller (resulting in less checks), but I didn't want to change the method signature b/c the interrupted means something very different from ioe. Can do whatever you want there though, don't really matter to me. We need to make sure either exception is rethrown > Support more than 2 NameNodes > - > > Key: HDFS-6440 > URL: https://issues.apache.org/jira/browse/HDFS-6440 > Project: Hadoop HDFS > Issue Type: New Feature > Components: auto-failover, ha, namenode >Affects Versions: 2.4.0 >Reporter: Jesse Yates >Assignee: Jesse Yates > Attachments: Multiple-Standby-NameNodes_V1.pdf, > hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, > hdfs-6440-trunk-v1.patch, hdfs-multiple-snn-trunk-v0.patch > > > Most of the work is already done to support more than 2 NameNodes (one > active, one standby). 
This would be the last bit to support running multiple > _standby_ NameNodes; one of the standbys should be available for fail-over. > Mostly, this is a matter of updating how we parse configurations, some > complexity around managing the checkpointing, and updating a whole lot of > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
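The exception-propagation pattern described in the comment above can be sketched as follows. This is a minimal illustration, not the actual StandbyCheckpointer code; the class and method names (CheckpointUploadSketch, awaitUploads) are hypothetical, but it shows the idea: remember whichever exception type occurred, shut down the transfer threadpool, and rethrow the specific exception so the caller can tell an interrupt apart from a transfer failure.

```java
import java.io.IOException;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;

// Hypothetical sketch of the pattern described above: wait on
// checkpoint-upload futures, record whichever exception type we saw,
// shut down the pool on failure, and rethrow the *right* exception.
public class CheckpointUploadSketch {
  static void awaitUploads(ExecutorService pool,
                           CompletionService<Void> uploads,
                           int count) throws IOException, InterruptedException {
    InterruptedException ie = null;
    IOException ioe = null;
    try {
      for (int i = 0; i < count; i++) {
        try {
          uploads.take().get();  // wait for any upload to complete
        } catch (ExecutionException e) {
          ioe = new IOException("upload failed", e.getCause());
          break;
        } catch (InterruptedException e) {
          ie = e;
          break;
        }
      }
    } finally {
      if (ie != null || ioe != null) {
        pool.shutdown();     // stop accepting work and let threads wind down
        pool.shutdownNow();  // then force the remaining transfers to quit
      }
    }
    // Check each exception again so the caller sees the specific type,
    // rather than a wrapped parent exception.
    if (ie != null) throw ie;
    if (ioe != null) throw ioe;
  }
}
```

Keeping the two exception variables separate preserves the method signature's distinction between an interrupt and an I/O failure, at the cost of a couple of extra null checks.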
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531745#comment-14531745 ] Jesse Yates commented on HDFS-6440: --- And finally, after working through the comments... {quote} The changes to BlockTokenSecretManager - they look fine to me in general, but I'd love to see some extra tests of this functionality with several NNs in play. Unless I missed something, I don't think there are any tests that would exercise more than 2 {{BlockTokenSecretManager}}s {quote} There is {{TestFailoverWithBlockTokensEnabled}}, which does ensure that multiple {{BlockTokenSecretManager}}s don't have overlapping ranges, among other standard blocktoken things - it's modified to run with 3 NNs. Looking at the other references to the {{BlockTokenSecretManager}} in tests, there doesn't seem to be anywhere else where we care about testing with multiple NNs, just that the basic range functionality works (which is the main thing being modified). Happy to add more, just not sure what exactly you want there. > Support more than 2 NameNodes > - > > Key: HDFS-6440 > URL: https://issues.apache.org/jira/browse/HDFS-6440 > Project: Hadoop HDFS > Issue Type: New Feature > Components: auto-failover, ha, namenode >Affects Versions: 2.4.0 >Reporter: Jesse Yates >Assignee: Jesse Yates > Attachments: Multiple-Standby-NameNodes_V1.pdf, > hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, > hdfs-6440-trunk-v1.patch, hdfs-multiple-snn-trunk-v0.patch > > > Most of the work is already done to support more than 2 NameNodes (one > active, one standby). This would be the last bit to support running multiple > _standby_ NameNodes; one of the standbys should be available for fail-over. > Mostly, this is a matter of updating how we parse configurations, some > complexity around managing the checkpointing, and updating a whole lot of > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
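The non-overlapping-range property tested above can be illustrated with a toy sketch. This is a purely hypothetical partitioning scheme, not the actual BlockTokenSecretManager implementation: the idea is simply that each NN index owns a disjoint slice of the key-serial space, so keys minted by different NNs can never collide.

```java
// Hypothetical illustration of disjoint per-NN serial ranges: the top
// bits of a serial number identify the NN that minted it, so ranges for
// different NN indices cannot overlap. Not the real HDFS key layout.
public class SerialRangeSketch {
  // Assumption: at most 4 NNs, so 2 bits of the int identify the NN.
  static final int RANGE_BITS = Integer.SIZE - 2;

  // First serial number in the range owned by this NN index.
  static int rangeStart(int nnIndex) {
    return nnIndex << RANGE_BITS;
  }

  // A serial belongs to exactly one NN's range (unsigned shift recovers
  // the NN index even when the top bit makes the int negative).
  static boolean inRange(int serial, int nnIndex) {
    return (serial >>> RANGE_BITS) == nnIndex;
  }
}
```

A test like {{TestFailoverWithBlockTokensEnabled}} would assert exactly this kind of disjointness across all the managers in play.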
[jira] [Updated] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HDFS-6440: -- Attachment: hdfs-6440-trunk-v3.patch Attaching patch updated on trunk + [~atm]'s comments (minus the ones that didn't seem to apply). Haven't run local tests since the changes seemed innocuous... hoping that the HadoopQA bot can handle this on its own. > Support more than 2 NameNodes > - > > Key: HDFS-6440 > URL: https://issues.apache.org/jira/browse/HDFS-6440 > Project: Hadoop HDFS > Issue Type: New Feature > Components: auto-failover, ha, namenode >Affects Versions: 2.4.0 >Reporter: Jesse Yates >Assignee: Jesse Yates > Fix For: 3.0.0 > > Attachments: Multiple-Standby-NameNodes_V1.pdf, > hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, > hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, > hdfs-multiple-snn-trunk-v0.patch > > > Most of the work is already done to support more than 2 NameNodes (one > active, one standby). This would be the last bit to support running multiple > _standby_ NameNodes; one of the standbys should be available for fail-over. > Mostly, this is a matter of updating how we parse configurations, some > complexity around managing the checkpointing, and updating a whole lot of > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HDFS-6440: -- Fix Version/s: 3.0.0 Status: Patch Available (was: Open) > Support more than 2 NameNodes > - > > Key: HDFS-6440 > URL: https://issues.apache.org/jira/browse/HDFS-6440 > Project: Hadoop HDFS > Issue Type: New Feature > Components: auto-failover, ha, namenode >Affects Versions: 2.4.0 >Reporter: Jesse Yates >Assignee: Jesse Yates > Fix For: 3.0.0 > > Attachments: Multiple-Standby-NameNodes_V1.pdf, > hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, > hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, > hdfs-multiple-snn-trunk-v0.patch > > > Most of the work is already done to support more than 2 NameNodes (one > active, one standby). This would be the last bit to support running multiple > _standby_ NameNodes; one of the standbys should be available for fail-over. > Mostly, this is a matter of updating how we parse configurations, some > complexity around managing the checkpointing, and updating a whole lot of > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538749#comment-14538749 ] Jesse Yates commented on HDFS-6440: --- {quote} Right, I get that, but what I was pointing out was just that in the previous version of the patch the variable "ie" was never being assigned to anything but "null". {quote} Oh, yeah. That was a problem. Sorry for the misunderstanding! bq. I'm specifically thinking about just expanding TestRollingUpgrade with some tests that exercise the > 2 NN scenario, e.g. Yeah, I'll look into that - look for it in the next patch. Shouldn't be too hard (and might be cleaner codewise!) {quote} I get the point of using the random seed in the first place, but I'm specifically talking about the fact that in doWriteOverFailoverTest we change the value of that variable, log the value, and then never read it again. {quote} Well, we use it again through the random variable, which will determine the ID of the NN to become the ANN. {code} int nextActive = failoverRandom.nextInt(NN_COUNT); {code} By setting the seed, you get the same sequence of NN failures. So one seed would do 1->2->1->3, while another might do 1->3->2->1. Then, with the seed, you could reproduce the series of failovers in the same order, which seems like a laudable goal for the test, especially when trying to debug weird error cases. Unless I'm missing something? 
> Support more than 2 NameNodes > - > > Key: HDFS-6440 > URL: https://issues.apache.org/jira/browse/HDFS-6440 > Project: Hadoop HDFS > Issue Type: New Feature > Components: auto-failover, ha, namenode >Affects Versions: 2.4.0 >Reporter: Jesse Yates >Assignee: Jesse Yates > Fix For: 3.0.0 > > Attachments: Multiple-Standby-NameNodes_V1.pdf, > hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, > hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, > hdfs-multiple-snn-trunk-v0.patch > > > Most of the work is already done to support more than 2 NameNodes (one > active, one standby). This would be the last bit to support running multiple > _standby_ NameNodes; one of the standbys should be available for fail-over. > Mostly, this is a matter of updating how we parse configurations, some > complexity around managing the checkpointing, and updating a whole lot of > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
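The reproducibility argument above can be demonstrated with a small standalone sketch. The class and method names here are hypothetical (only {{failoverRandom.nextInt(NN_COUNT)}} comes from the quoted test code): seeding the Random makes the whole sequence of "next active NN" picks deterministic, so a failing failover order can be replayed exactly.

```java
import java.util.Random;

// Illustration (not the actual TestPipelinesFailover code) of why a
// seeded Random matters: the same seed reproduces the same sequence of
// NameNode indices chosen to become active, so a failing failover order
// can be replayed when debugging.
public class SeededFailoverOrder {
  static final int NN_COUNT = 3;  // assumption: a 3-NN test cluster

  static int[] failoverSequence(long seed, int steps) {
    Random failoverRandom = new Random(seed);
    int[] order = new int[steps];
    for (int i = 0; i < steps; i++) {
      // Same call the comment quotes: pick the next NN to become active.
      order[i] = failoverRandom.nextInt(NN_COUNT);
    }
    return order;
  }
}
```

Logging the seed at startup is what makes this useful: given the logged seed, the exact failover series (e.g. 1->2->1->3) can be regenerated in a later run.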
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544623#comment-14544623 ] Jesse Yates commented on HDFS-6440: --- Ah, Ok. Yes, that second seed set will clearly not be used and is definitely misleading. Sorry for being dense :-/ I was just looking at the usage of the Random, not the seed! I'm thinking to just pull the better log message up to the static initialization and remove those two lines (4-5). I _think_ the original idea was to make it easier to reproduce an individual test failure, since each cluster in the methods is managed independently... but I don't know if it really matters at this point; it just sucks to have to rerun all the tests to debug a single test. Thoughts? > Support more than 2 NameNodes > - > > Key: HDFS-6440 > URL: https://issues.apache.org/jira/browse/HDFS-6440 > Project: Hadoop HDFS > Issue Type: New Feature > Components: auto-failover, ha, namenode >Affects Versions: 2.4.0 >Reporter: Jesse Yates >Assignee: Jesse Yates > Fix For: 3.0.0 > > Attachments: Multiple-Standby-NameNodes_V1.pdf, > hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, > hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, > hdfs-multiple-snn-trunk-v0.patch > > > Most of the work is already done to support more than 2 NameNodes (one > active, one standby). This would be the last bit to support running multiple > _standby_ NameNodes; one of the standbys should be available for fail-over. > Mostly, this is a matter of updating how we parse configurations, some > complexity around managing the checkpointing, and updating a whole lot of > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HDFS-6440: -- Attachment: hdfs-6440-trunk-v4.patch Attaching updated patch. Working through some local test failures - they seem like they might just be due to rebase changes? Looking into it. Changes of note: * Fixing concurrent checkpoint management - it was breaking TestRollingUpgrade - to not keep around completed checkpoints * Adding tests to TestRollingUpgrade * Removing random seed setting in TestPipelinesFailover * Fixing startup option setting in MiniDFSCluster#restartNode * Fixing the block manager to use the correct NN ID lookup FYI, on vacation through Memorial Day, so not going to be doing much for the next few days. Back on Tuesday. > Support more than 2 NameNodes > - > > Key: HDFS-6440 > URL: https://issues.apache.org/jira/browse/HDFS-6440 > Project: Hadoop HDFS > Issue Type: New Feature > Components: auto-failover, ha, namenode >Affects Versions: 2.4.0 >Reporter: Jesse Yates >Assignee: Jesse Yates > Fix For: 3.0.0 > > Attachments: Multiple-Standby-NameNodes_V1.pdf, > hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, > hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, > hdfs-multiple-snn-trunk-v0.patch > > > Most of the work is already done to support more than 2 NameNodes (one > active, one standby). This would be the last bit to support running multiple > _standby_ NameNodes; one of the standbys should be available for fail-over. > Mostly, this is a matter of updating how we parse configurations, some > complexity around managing the checkpointing, and updating a whole lot of > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HDFS-6440: -- Attachment: hdfs-6440-trunk-v5.patch Attaching updated patch, rebased on latest trunk. My usual covering suite of mNN tests* passed locally a few times. Notable changes: - Moving the checkpoint lock inside the check for actually needing to take the checkpoint (not a functional change, just a locking improvement) - Cleaning up how we determine when to send checkpoints, so we only calculate whether we should send one when we know the checkpoint will actually be created. {code} *mvn clean test -Dtest=TestPipelinesFailover,TestRollingUpgrade,TestZKFailoverController,TestBookKeeperHACheckpoints,TestBlockToken,TestBackupNode,TestCheckpoint,TestDFSUpgradeFromImage,TestBootstrapStandby,TestBootstrapStandbyWithQJM,TestEditLogTailer,TestFailoverWithBlockTokens,TestHAConfiguration,TestRemoteNameNodeInfo,TestSeveralNameNodes,TestStandbyCheckpoints,TestDNFencingWithReplication {code} > Support more than 2 NameNodes > - > > Key: HDFS-6440 > URL: https://issues.apache.org/jira/browse/HDFS-6440 > Project: Hadoop HDFS > Issue Type: New Feature > Components: auto-failover, ha, namenode >Affects Versions: 2.4.0 >Reporter: Jesse Yates >Assignee: Jesse Yates > Fix For: 3.0.0 > > Attachments: Multiple-Standby-NameNodes_V1.pdf, > hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, > hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, > hdfs-6440-trunk-v5.patch, hdfs-multiple-snn-trunk-v0.patch > > > Most of the work is already done to support more than 2 NameNodes (one > active, one standby). This would be the last bit to support running multiple > _standby_ NameNodes; one of the standbys should be available for fail-over. > Mostly, this is a matter of updating how we parse configurations, some > complexity around managing the checkpointing, and updating a whole lot of > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HDFS-6440: -- Attachment: hdfs-6440-trunk-v6.patch New version, hopefully fixing the findbugs/checkstyle issues and increasing the TestPipelinesFailover timeout to get it to pass. > Support more than 2 NameNodes > - > > Key: HDFS-6440 > URL: https://issues.apache.org/jira/browse/HDFS-6440 > Project: Hadoop HDFS > Issue Type: New Feature > Components: auto-failover, ha, namenode >Affects Versions: 2.4.0 >Reporter: Jesse Yates >Assignee: Jesse Yates > Fix For: 3.0.0 > > Attachments: Multiple-Standby-NameNodes_V1.pdf, > hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, > hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, > hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, > hdfs-multiple-snn-trunk-v0.patch > > > Most of the work is already done to support more than 2 NameNodes (one > active, one standby). This would be the last bit to support running multiple > _standby_ NameNodes; one of the standbys should be available for fail-over. > Mostly, this is a matter of updating how we parse configurations, some > complexity around managing the checkpointing, and updating a whole lot of > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HDFS-6440: -- Attachment: hdfs-6440-trunk-v7.patch Ok, looks like I didn't fix the whitespace like I thought :-/ However, I manually fixed up the checkstyle/whitespace issues. Also, a slight improvement in TestPipelinesFailover to abstract cluster creation, b/c the rebase failed to update all relevant tests to run 3 NNs, causing periodic test failures. Now passing every time locally. Hopefully, this should get the green light from QA :) > Support more than 2 NameNodes > - > > Key: HDFS-6440 > URL: https://issues.apache.org/jira/browse/HDFS-6440 > Project: Hadoop HDFS > Issue Type: New Feature > Components: auto-failover, ha, namenode >Affects Versions: 2.4.0 >Reporter: Jesse Yates >Assignee: Jesse Yates > Fix For: 3.0.0 > > Attachments: Multiple-Standby-NameNodes_V1.pdf, > hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, > hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, > hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, > hdfs-multiple-snn-trunk-v0.patch > > > Most of the work is already done to support more than 2 NameNodes (one > active, one standby). This would be the last bit to support running multiple > _standby_ NameNodes; one of the standbys should be available for fail-over. > Mostly, this is a matter of updating how we parse configurations, some > complexity around managing the checkpointing, and updating a whole lot of > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565137#comment-14565137 ] Jesse Yates commented on HDFS-6440: --- Failed tests pass locally. Missed a whitespace in TestPipelinesFailover :( I could fix that on commit, unless there are other comments on the latest version, in which case I'll wrap it into a new revision. Otherwise, I'd say this is good to go, [~atm]? > Support more than 2 NameNodes > - > > Key: HDFS-6440 > URL: https://issues.apache.org/jira/browse/HDFS-6440 > Project: Hadoop HDFS > Issue Type: New Feature > Components: auto-failover, ha, namenode >Affects Versions: 2.4.0 >Reporter: Jesse Yates >Assignee: Jesse Yates > Fix For: 3.0.0 > > Attachments: Multiple-Standby-NameNodes_V1.pdf, > hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, > hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, > hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, > hdfs-multiple-snn-trunk-v0.patch > > > Most of the work is already done to support more than 2 NameNodes (one > active, one standby). This would be the last bit to support running multiple > _standby_ NameNodes; one of the standbys should be available for fail-over. > Mostly, this is a matter of updating how we parse configurations, some > complexity around managing the checkpointing, and updating a whole lot of > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592812#comment-14592812 ] Jesse Yates commented on HDFS-6440: --- I ran the test (independently) a couple of times locally after rebasing on latest trunk (as of 3hrs ago - YARN-3802) and didn't see any failures. However, when running a bigger battery of tests, my "multi-nn suite", I got the following failure: {quote} testUpgradeFromRel1BBWImage(org.apache.hadoop.hdfs.TestDFSUpgradeFromImage) Time elapsed: 11.115 sec <<< ERROR! java.io.IOException: Cannot obtain block length for LocatedBlock{BP-362680364-127.0.0.1-1434673340215:blk_7162739548153522810_1020; getBlockSize()=1024; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[127.0.0.1:59215,DS-8d6d81c3-5027-4fbf-a7c8-a8be86cb7e00,DISK]]} at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:394) at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:336) at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:272) at org.apache.hadoop.hdfs.DFSInputStream.(DFSInputStream.java:263) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1184) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1168) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1154) at org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.dfsOpenFileWithRetries(TestDFSUpgradeFromImage.java:174) at org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.verifyDir(TestDFSUpgradeFromImage.java:210) at org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.verifyFileSystem(TestDFSUpgradeFromImage.java:225) at org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.upgradeAndVerify(TestDFSUpgradeFromImage.java:597) at org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.testUpgradeFromRel1BBWImage(TestDFSUpgradeFromImage.java:619) {quote} ...but only sometimes. Is this at all what you guys are seeing too? btw, I'm running OSX - maybe it's a Linux issue? 
I'm gonna re-submit (+ fix for whitespace) and see how jenkins likes it. > Support more than 2 NameNodes > - > > Key: HDFS-6440 > URL: https://issues.apache.org/jira/browse/HDFS-6440 > Project: Hadoop HDFS > Issue Type: New Feature > Components: auto-failover, ha, namenode >Affects Versions: 2.4.0 >Reporter: Jesse Yates >Assignee: Jesse Yates > Fix For: 3.0.0 > > Attachments: Multiple-Standby-NameNodes_V1.pdf, > hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, > hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, > hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, > hdfs-multiple-snn-trunk-v0.patch > > > Most of the work is already done to support more than 2 NameNodes (one > active, one standby). This would be the last bit to support running multiple > _standby_ NameNodes; one of the standbys should be available for fail-over. > Mostly, this is a matter of updating how we parse configurations, some > complexity around managing the checkpointing, and updating a whole lot of > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HDFS-6440: -- Status: Open (was: Patch Available) > Support more than 2 NameNodes > - > > Key: HDFS-6440 > URL: https://issues.apache.org/jira/browse/HDFS-6440 > Project: Hadoop HDFS > Issue Type: New Feature > Components: auto-failover, ha, namenode >Affects Versions: 2.4.0 >Reporter: Jesse Yates >Assignee: Jesse Yates > Fix For: 3.0.0 > > Attachments: Multiple-Standby-NameNodes_V1.pdf, > hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, > hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, > hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, > hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch > > > Most of the work is already done to support more than 2 NameNodes (one > active, one standby). This would be the last bit to support running multiple > _standby_ NameNodes; one of the standbys should be available for fail-over. > Mostly, this is a matter of updating how we parse configurations, some > complexity around managing the checkpointing, and updating a whole lot of > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HDFS-6440: -- Status: Patch Available (was: Open) > Support more than 2 NameNodes > - > > Key: HDFS-6440 > URL: https://issues.apache.org/jira/browse/HDFS-6440 > Project: Hadoop HDFS > Issue Type: New Feature > Components: auto-failover, ha, namenode >Affects Versions: 2.4.0 >Reporter: Jesse Yates >Assignee: Jesse Yates > Fix For: 3.0.0 > > Attachments: Multiple-Standby-NameNodes_V1.pdf, > hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, > hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, > hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, > hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch > > > Most of the work is already done to support more than 2 NameNodes (one > active, one standby). This would be the last bit to support running multiple > _standby_ NameNodes; one of the standbys should be available for fail-over. > Mostly, this is a matter of updating how we parse configurations, some > complexity around managing the checkpointing, and updating a whole lot of > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HDFS-6440: -- Attachment: hdfs-6440-trunk-v8.patch Attaching updated patch w/ whitespace fix. Lets see what QA thinks of the upgrade test. > Support more than 2 NameNodes > - > > Key: HDFS-6440 > URL: https://issues.apache.org/jira/browse/HDFS-6440 > Project: Hadoop HDFS > Issue Type: New Feature > Components: auto-failover, ha, namenode >Affects Versions: 2.4.0 >Reporter: Jesse Yates >Assignee: Jesse Yates > Fix For: 3.0.0 > > Attachments: Multiple-Standby-NameNodes_V1.pdf, > hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, > hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, > hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, > hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch > > > Most of the work is already done to support more than 2 NameNodes (one > active, one standby). This would be the last bit to support running multiple > _standby_ NameNodes; one of the standbys should be available for fail-over. > Mostly, this is a matter of updating how we parse configurations, some > complexity around managing the checkpointing, and updating a whole lot of > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592823#comment-14592823 ] Jesse Yates commented on HDFS-6440: --- Looks like maybe the binary changes from the tarball image aren't getting applied? That's all that I can think, since you fellas aren't seeing the cluster even start up. > Support more than 2 NameNodes > - > > Key: HDFS-6440 > URL: https://issues.apache.org/jira/browse/HDFS-6440 > Project: Hadoop HDFS > Issue Type: New Feature > Components: auto-failover, ha, namenode >Affects Versions: 2.4.0 >Reporter: Jesse Yates >Assignee: Jesse Yates > Fix For: 3.0.0 > > Attachments: Multiple-Standby-NameNodes_V1.pdf, > hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, > hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, > hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, > hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch > > > Most of the work is already done to support more than 2 NameNodes (one > active, one standby). This would be the last bit to support running multiple > _standby_ NameNodes; one of the standbys should be available for fail-over. > Mostly, this is a matter of updating how we parse configurations, some > complexity around managing the checkpointing, and updating a whole lot of > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592826#comment-14592826 ] Jesse Yates commented on HDFS-6440: --- Just went back to trunk and applied the patch directly (rather than using my branch) and test passed again w/o issue ($ mvn install -DskipTests; mvn clean test -Dtest=TestDFSUpgradeFromImage) > Support more than 2 NameNodes > - > > Key: HDFS-6440 > URL: https://issues.apache.org/jira/browse/HDFS-6440 > Project: Hadoop HDFS > Issue Type: New Feature > Components: auto-failover, ha, namenode >Affects Versions: 2.4.0 >Reporter: Jesse Yates >Assignee: Jesse Yates > Fix For: 3.0.0 > > Attachments: Multiple-Standby-NameNodes_V1.pdf, > hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, > hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, > hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, > hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch > > > Most of the work is already done to support more than 2 NameNodes (one > active, one standby). This would be the last bit to support running multiple > _standby_ NameNodes; one of the standbys should be available for fail-over. > Mostly, this is a matter of updating how we parse configurations, some > complexity around managing the checkpointing, and updating a whole lot of > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597016#comment-14597016 ] Jesse Yates commented on HDFS-6440: --- Rebased on trunk, tests pass locally for me. > Support more than 2 NameNodes > - > > Key: HDFS-6440 > URL: https://issues.apache.org/jira/browse/HDFS-6440 > Project: Hadoop HDFS > Issue Type: New Feature > Components: auto-failover, ha, namenode >Affects Versions: 2.4.0 >Reporter: Jesse Yates >Assignee: Jesse Yates > Fix For: 3.0.0 > > Attachments: Multiple-Standby-NameNodes_V1.pdf, > hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, > hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, > hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, > hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch > > > Most of the work is already done to support more than 2 NameNodes (one > active, one standby). This would be the last bit to support running multiple > _standby_ NameNodes; one of the standbys should be available for fail-over. > Mostly, this is a matter of updating how we parse configurations, some > complexity around managing the checkpointing, and updating a whole lot of > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598698#comment-14598698 ] Jesse Yates commented on HDFS-6440: --- Great, thanks [~atm]! Just filed HDFS-8657 > Support more than 2 NameNodes > - > > Key: HDFS-6440 > URL: https://issues.apache.org/jira/browse/HDFS-6440 > Project: Hadoop HDFS > Issue Type: New Feature > Components: auto-failover, ha, namenode >Affects Versions: 2.4.0 >Reporter: Jesse Yates >Assignee: Jesse Yates > Fix For: 3.0.0 > > Attachments: Multiple-Standby-NameNodes_V1.pdf, > hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, > hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, > hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, > hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch > > > Most of the work is already done to support more than 2 NameNodes (one > active, one standby). This would be the last bit to support running multiple > _standby_ NameNodes; one of the standbys should be available for fail-over. > Mostly, this is a matter of updating how we parse configurations, some > complexity around managing the checkpointing, and updating a whole lot of > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8657) Update docs for mSNN
Jesse Yates created HDFS-8657: - Summary: Update docs for mSNN Key: HDFS-8657 URL: https://issues.apache.org/jira/browse/HDFS-8657 Project: Hadoop HDFS Issue Type: Bug Reporter: Jesse Yates Assignee: Jesse Yates Priority: Minor Fix For: 3.0.0 After the commit of HDFS-6440, some docs need to be updated to reflect the new support for more than 2 NNs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8657) Update docs for mSNN
[ https://issues.apache.org/jira/browse/HDFS-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HDFS-8657: -- Attachment: hdfs-8657-v0.patch Patch for updating HDFSHighAvailabilityWithQJM.md. No major changes except updating the example configs to use 3 NNs rather than two, plus some nits to indicate you can use 2+ NNs in HA. > Update docs for mSNN > > > Key: HDFS-8657 > URL: https://issues.apache.org/jira/browse/HDFS-8657 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Jesse Yates >Assignee: Jesse Yates >Priority: Minor > Fix For: 3.0.0 > > Attachments: hdfs-8657-v0.patch > > > After the commit of HDFS-6440, some docs need to be updated to reflect the > new support for more than 2 NNs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598736#comment-14598736 ] Jesse Yates commented on HDFS-6440: --- Yeah, that failure looks wildly unrelated. Someone messing about with the poms?
[jira] [Updated] (HDFS-8657) Update docs for mSNN
[ https://issues.apache.org/jira/browse/HDFS-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HDFS-8657: -- Attachment: hdfs-8657-v1.patch Updated patch covering HDFSHighAvailabilityWithNFS as well. Thanks for the catch [~atm].
[jira] [Updated] (HDFS-8657) Update docs for mSNN
[ https://issues.apache.org/jira/browse/HDFS-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HDFS-8657: -- Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-8657) Update docs for mSNN
[ https://issues.apache.org/jira/browse/HDFS-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HDFS-8657: -- Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HDFS-6440: -- Attachment: Multiple-Standby-NameNodes_V1.pdf hdfs-multiple-snn-trunk-v0.patch Attaching a patch on top of trunk (at least as of a couple weeks ago). Also attaching a design doc as a guide for anyone who wants to take on reviewing this one :) FWIW, we are running this patch in production at Salesforce(1), added additional unit tests that pass alongside the original unit tests, and did extensive load testing under adverse conditions via m/r (see design doc). (1) well, on top of the latest CDH release :)
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188485#comment-14188485 ] Jesse Yates commented on HDFS-6440: --- In the introduction of the design doc, the second paragraph says: {quote} the expectation is that any two nodes can fail, except for the NameNode; this availability expectation is true across many deployments - you run at least 3 ZooKeepers, 3 HMasters, and 3 copies of each block on DataNodes. {quote} This should read: {quote} the expectation is that any two nodes can fail, except for the NameNode; this availability expectation is true across many deployments - you run at least *5 ZooKeepers*, *5 Quorum Journal Managers*, 3 HMasters, and 3 copies of each block on DataNodes. {quote} This corrects the oversight - with 5 ZKs and 5 QJMs, two can go down and you will still have a quorum of nodes.
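The correction above follows from simple majority-quorum arithmetic; a quick sketch (not part of the patch):

```java
// Majority-quorum arithmetic behind the correction above: an n-node quorum
// system stays available through f simultaneous failures only if n >= 2f + 1.
public class QuorumMath {
    // Largest number of simultaneous failures an n-node majority quorum tolerates.
    static int tolerableFailures(int n) {
        return (n - 1) / 2;
    }

    public static void main(String[] args) {
        // 3 ZooKeepers tolerate only 1 failure; to survive "any two nodes"
        // failing you need 5 ZooKeepers and 5 JournalNodes.
        System.out.println(tolerableFailures(3)); // 1
        System.out.println(tolerableFailures(5)); // 2
    }
}
```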
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238299#comment-14238299 ] Jesse Yates commented on HDFS-6440: --- So, what can I do to help push this along? I'm happy to come talk with folks in person (feel free to PM me) or do short PPTs. I also want to point out that this has been running, in production, at Salesforce for some time now.
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240452#comment-14240452 ] Jesse Yates commented on HDFS-6440: --- bq. What is the procedure for adding or replacing NNs? Not explicitly any easier than what is currently supported. The problem is that all the nodes currently have the NNs hard-coded in config. What you could do is roll the NNs with the new NN config, then roll the rest of the clients with the new config as well once the new NN is up to date. I don't know that you would even do anything differently than today. bq. Could it support dynamically adding NNs without downtime? Not really. You would have to push the downtime question up a level and rely on something like ZK to maintain the list of NNs (in the simple approach). It reduces to a group membership problem. bq. Would it be possible to avoid multiple SNNs to upload fsimages with trivial deltas in a short time Sure. This was the idea behind adding the 'primary checkpointer' logic - if you are not the primary, you back off for 2x the usual wait period, because you assume the primary is up and doing edits, but check again every so often to make sure it hasn't gotten too far behind. Obviously there is a possibility for the 'primary checkpointer' role to ping-pong back and forth between SNNs, but generally one gets the lead and keeps it. bq. Would it be possible that this behavior makes other SNNs miss the edit logs? It's possible, but that's a somewhat rare occurrence, as you can generally bring the NN back up fairly quickly. If it's really far behind, you can bootstrap up to the current NN's state and run it from there. In practice, we haven't seen any problems with this. bq. Does this work support rolling upgrade? I'm not aware that it would change it. bq. Would it makes client failover more complicated? Now instead of two servers, it can fail over between N. I believe the client code currently supports this as-is. bq. What would be the impact on the DN side? Basically, just block reports to more than 2 NNs. This can start to cause some bandwidth congestion at some point, but I don't think it would be a problem with up to at least 5 or 7 nodes. bq. What are the changes on the test resources files (hadoop-*-reserved.tgz) ? The mini-cluster is designed to support only two NNs, down to the files it writes to maintain the directory layout. Unfortunately, it doesn't manage the directories in any easily updated way, so I had to rip out the existing directory structure it uses and replace it with something a little more flexible. The changes to the zip files are just to support this updated structure for the mini-cluster.
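The "primary checkpointer" backoff described above can be sketched roughly as follows; the 2x multiplier comes from the comment, but all names and the structure are illustrative, not the actual StandbyCheckpointer code:

```java
// Rough sketch of the backoff above: a non-primary standby waits twice the
// normal checkpoint period before shipping an fsimage, so the primary usually
// wins. Field and method names are illustrative, not from the real patch.
public class CheckpointBackoff {
    static final long PERIOD_SECS = 3600; // hypothetical checkpoint period

    // Should this standby try to upload its checkpoint now?
    static boolean shouldUpload(boolean isPrimaryCheckpointer, long secsSinceLastUpload) {
        long wait = isPrimaryCheckpointer ? PERIOD_SECS : 2 * PERIOD_SECS;
        return secsSinceLastUpload >= wait;
    }

    public static void main(String[] args) {
        System.out.println(shouldUpload(true, 3600));  // primary: due now
        System.out.println(shouldUpload(false, 3600)); // non-primary: still backing off
        System.out.println(shouldUpload(false, 7200)); // non-primary: checks in eventually
    }
}
```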
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243633#comment-14243633 ] Jesse Yates commented on HDFS-6440: --- bq. Does this mean that there might be multiple SNNs marking themselves as 'primary checkpointer' during the same time period, since it is determined by SNN itself Yes, that is a possibility, which I was getting at with my comment about the primary checkpointer "ping-ponging". The images would have small deltas, but the ANN would be kept up to date. As the updates slow down, one of the checkpointers would eventually win. However, we either (a) haven't seen this show up on any of our clusters or (b) have never noticed any service issues because of it. bq. Would it be reasonable to also let ANN to reject fsimage upload request? Sure, it's possible. My concern was around ensuring that the ANN had the most up-to-date checkpoint and letting the SNNs sort themselves out. It seems a bit more intrusive in the code, since you also need to differentiate the source - you don't want to reject an update from the primary checkpointer just because of the time elapsed. I'd say it's worth looking into in a follow-up jira, though - this is already a pretty large change.
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248674#comment-14248674 ] Jesse Yates commented on HDFS-6440: --- Would you prefer doing this over a pull request/RB? Might be easier to point out specific elements. If not, happy to respond here.
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249207#comment-14249207 ] Jesse Yates commented on HDFS-6440: --- I'll post the updated patch somewhere, if you like. However, for the meantime, responses! I think some stuff got a little messed up with the trunk port... these are all great catches! bq. I guess the default value of isPrimaryCheckPointer might be a typo, which should be false. Yup. And bq. is there a case that SNN switches from primary check pointer to non-primary check pointer Not that I can find either :) It should be that we track success in the transfer result from the upload and then update the primary checkpointer status based on the success therein (so if no upload is valid, it is no longer the primary). bq. 2. Is the following condition correct? I think only sendRequest is needed. Kinda. I think it should actually be: {code} if (needCheckpoint) { doCheckpoint(sendRequest); } {code} and then make and save the checkpoint, but only send it if we need to (sendRequest == true). bq. If it is the case, are these duplicated conditions? The quiet period should be larger than the usual checking period (the multiplier is 1.5), so it's the separation of sending the request vs. taking the checkpoint that comes into conflict here. I think this logic makes more sense with the above change separating the use of needCheckpoint and sendCheckpoint. bq. might be easier to let ANN calculate the above conditions... It could be a nice optimization later. Definitely! I was trying to keep the change footprint down. bq. When it uploads fsimage, are SC_CONFLICT and SC_EXPECTATION_FAILED not handled in the SNN in the current patch They somewhat are - they don't throw an exception back out, but are marked as 'failures'. Either way, in the new version of the patch (coming), in keeping with the changes for setting isPrimaryCheckpointer described above, the primaryCheckpointStatus is set to the correct value: either it got a NOT_ACTIVE_NAMENODE_FAILURE on the other SNN, or it tried to upload an old transaction to the ANN (OLD_TRANSACTION_ID_FAILURE). If it's the first, the other NN could succeed (making this the pSNN); if it's an older transaction, it shouldn't be the pSNN. With the caveat you mentioned in your last comment about both SNNs thinking they are the pSNN. bq. Could you set EditLogTailer#maxRetries to private final? That wasn't part of my change set - the code was already there. It looks like it's used to set the edit log in testing. bq. Do we need to enforce an acceptable value range for maxRetries An interesting idea! I didn't want to spin forever there, and instead surface the issue to the user by bringing down the NN. My question back is, is there another process that will bring down the NN if it cannot reach the other NNs? Otherwise, it can get hopelessly out of date and look like a valid standby when it really isn't. bq. NN when nextNN = nns.size() - 1 and maxRetries = 1 Oh, yeah - that's a problem, regardless of the above. The pending patch should fix that. The coming patch should also fix the remainder of the formatting issues.
[jira] [Updated] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HDFS-6440: -- Attachment: hdfs-6440-trunk-v1.patch Updated version of the patch per the excellent review comments (thanks [~eddyxu]!). It will probably need another rebase before it goes in, but for the moment I wanted to minimize the deltas until everyone is happy.
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275579#comment-14275579 ] Jesse Yates commented on HDFS-6440: --- Thanks for the comments. I'll work on a new version, but in the meantime, some responses: bq. StandbyCheckpointer#activeNNAddresses The standby checkpointer doesn't necessarily run just on the SNN - it could be in multiple places. Further, I think you are presupposing that there is only one SNN and one ANN; since there will commonly be at least 3 NNs, either of the two other NNs could be the active NN. I could see it being renamed potentialActiveNNAddresses, but I don't think that gains much clarity for the increased verbosity. bq. I saw you removed {final} I was trying to keep in the spirit of the original mini-cluster code. The final safety concern really only matters when you are changing the number of configured NNs and then accessing them from different threads; I have no idea when that would even make sense. Even then you wouldn't have been thread-safe in the original code, as there is no locking on the array of NNs. I removed the finals to keep the same style as the original with respect to changing the topology. bq. Are the changes in 'log4j.properties' necessary? Not strictly, but it's just the test log4j properties (so no effect on the production version) and it just adds more debugging information - in this case, which thread is actually making the log message. I'll update the others.
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281537#comment-14281537 ] Jesse Yates commented on HDFS-6440: --- Some follow-up after actually looking at the code: bq. Is it possible that doWork throws IOException other than RemoteException? Yup. In fact, the implementation of doWork at EditLogTailer#ln291 can throw an IOException if the call to the proxy for rollEditLog throws an IOException. Sure, this is a bit brittle - a RemoteException could be thrown by that call (or any other) as an IOException, but that really can't be helped because we have no other way of differentiating right now. bq. 6. needCheckpoint == true implies sendRequests == true thus when call doCheckpiont(), sendRequest is always true. Yup, that was a slight logic bug. I think setting sendRequest should look like: {code:title=StandbyCheckpointer.java} // on all nodes, we build the checkpoint. However, we only ship the checkpoint if we have a // rollback request, are the primary checkpointer, or are outside the quiet period. boolean sendRequest = needCheckpoint && (isPrimaryCheckPointer || secsSinceLast >= checkpointConf.getQuietPeriod()); {code} to actually not send the request every time - it wasn't going to break anything before, but now it should actually conserve bandwidth :) bq. 7. Could you break this line My IDE has that at 99 chars long - isn't 100 chars the standard line width? However, I moved the IOE from the rest of the signature up to the second half of the method declaration. bq. 11. Finally, could you reduce the changes in `MiniDFSCluster.java`, as many of them are not changed, e.g. `MiniDFSCluster.java:911-986`. I think I'm at the minimal number of changes there. Git thinks there are line adds and removes frequently when things move around a bit, as this patch necessitates. Fortunately, they should be easy to ignore... but let me know if I'm missing what you are getting at.
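The nextNN wraparound issue acknowledged above ("that's a problem") amounts to iterating over N NN addresses modulo the list size. A hypothetical sketch of the fixed loop - names are illustrative, not the actual EditLogTailer code:

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of bounded retries across N NameNodes: the index must
// wrap with modulo arithmetic so nextNN == nns.size() - 1 rolls over to 0
// instead of running off the end of the list. Not the real implementation.
public class NnFailover {
    // Returns the index of the first NN that accepts the call,
    // or -1 once maxRetries attempts are exhausted.
    static int callWithRetries(List<String> nns, int startNn, int maxRetries,
                               Predicate<String> call) {
        int idx = startNn;
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (call.test(nns.get(idx))) {
                return idx;
            }
            idx = (idx + 1) % nns.size(); // wrap instead of overflowing
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> nns = List.of("nn1", "nn2", "nn3");
        // Starting at the last index wraps around to find the responsive NN.
        System.out.println(callWithRetries(nns, 2, 3, nn -> nn.equals("nn2"))); // 1
        System.out.println(callWithRetries(nns, 0, 2, nn -> false));            // -1
    }
}
```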
[jira] [Updated] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HDFS-6440: -- Attachment: hdfs-6440-trunk-v1.patch Attaching patch addressing round 2 of comments. Thanks for the feedback - it's getting better every round!
[jira] [Created] (HDFS-6440) Support more than 2 NameNodes
Jesse Yates created HDFS-6440: - Summary: Support more than 2 NameNodes Key: HDFS-6440 URL: https://issues.apache.org/jira/browse/HDFS-6440 Project: Hadoop HDFS Issue Type: New Feature Components: auto-failover, ha, namenode Reporter: Jesse Yates Most of the work is already done to support more than 2 NameNodes (one active, one standby). This would be the last bit to support running multiple _standby_ NameNodes; one of the standbys should be available for fail-over. Mostly, this is a matter of updating how we parse configurations, some complexity around managing the checkpointing, and updating a whole lot of tests. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HDFS-6440: -- Attachment: hdfs-6440-cdh-4.5-full.patch Attaching patch for CDH 4.5.0, since this is what we run on at Salesforce. I'll update to the proper open source branches once I've got some consensus that this is the 'right' way to go about doing these changes. For what it's worth, all the unit tests have passed (at one point... they are a bit flaky :)) and we've been doing some m/r based load tests with a chaos monkey(1) and have been successful (2). As mentioned in the issue description, the majority of the complexity is in the checkpointing. For this, I went with a 'first writer wins' approach. From the standpoint of the standby node, if your checkpoint isn't accepted (the other NN got one there first), then you back off for 2x the usual wait time before trying to send it again. I had to add another response code to the GetImageServlet to support the 'someone else won' logic - it's not the cleanest solution, as other HTTP response codes fit better, but they are already being used to indicate other failure cases. Other notable changes: - EditLogTailer checks all NNs when rolling logs - BootstrapStandby uses all NameNodes when attempting bootstrap - updated block token creation to segment the integer key space by NN id - updated NN dir creation to include the ns index (3) - updated a lot of the tests to support testing across all the NNs, including HAStressTestHarness, and a circular linked list writing test - moved to using a multi-map of NNs in MiniDFSCluster, as they are no longer limited to two NNs. (1) each mapper writes a linked list of files, then ensures it can read it back (2) required a bit of tuning to ride over reconnections once we started killing NNs more often than every 60 seconds (3) Not sure of the best way to update the tests for this. Right now I made some changes to TestDFSUpgradeFromImage, but that might need a little rework.
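The "segment integer space by NN id" change listed above suggests giving each NN a disjoint key-id band so block token serial numbers from different NNs cannot collide. A hedged sketch - the names and the exact partitioning scheme are assumptions, not the patch's code:

```java
// Assumed scheme: split the non-negative int key-id space into one disjoint
// band per NN, so ids minted by different NNs cannot collide. Method and
// parameter names are illustrative, not from BlockTokenSecretManager.
public class KeyIdRanges {
    // First key id in the band owned by NN `nnIndex` of `numNns` NameNodes.
    static int bandStart(int nnIndex, int numNns) {
        return nnIndex * (Integer.MAX_VALUE / numNns);
    }

    public static void main(String[] args) {
        // With 3 NNs, each NN owns roughly a third of the space.
        System.out.println(bandStart(0, 3));
        System.out.println(bandStart(1, 3));
        System.out.println(bandStart(2, 3));
    }
}
```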
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481709#comment-14481709 ] Jesse Yates commented on HDFS-6440: --- Us too. We are waiting on a committer to have time to look at it. Heard from Lei that he is happy with the state and had passed it on to [~atm] for review and commit, but that's the last I heard about any progress (that was mid-February). [~patrickwhite] maybe you can get one of the FB committers to help get it committed? I'm just hesitant to do _another_ rebase of this patch only to not have it committed. Honestly, I'm surprised that the various companies that have a stake in HDFS being successful in production haven't been more supportive of getting this patch committed.