[jira] [Updated] (HDFS-3077) Quorum-based protocol for reading and writing edit logs

Todd Lipcon (JIRA) Tue, 17 Jul 2012 17:35:38 -0700

     [ 
https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Todd Lipcon updated HDFS-3077:
------------------------------

    Attachment: hdfs-3077.txt

bq. If the following is supposed to be a list of host:port pairs, I suggest we 
call it something other than "*.edits.dir". Also, if the default is just a 
path, is it really supposed to be a list of host:port pairs? Or is this comment 
supposed to be referring to DFS_JOURNALNODE_RPC_ADDRESS_KEY?

Yea, the comment was misplaced. Fixed up the comments here.

bq. Could use a class comment and method comments in AsyncLogger.
bq. Missing an @param comment for AsyncLoggerSet#createNewUniqueEpoch.
Fixed.

bq. I think this won't substitute in the correct hostname in a multi-node setup 
with host-based principal names:...

Didn't fix this yet, but added a TODO. I'd like to test and fix any 
security-related bugs in a second pass / follow-up JIRA, since I imagine this 
won't be the only one. 

bq. In IPCLoggerChannel, I wonder if you also shouldn't ensure that httpPort is 
not yet set here:

I ended up renaming {{getEpochInfo}} to {{getJournalStatus}} and moving the 
{{httpPort}} assignment to happen there, which I think makes more sense. I 
don't think it has to assert that it's never been set before, since we will 
probably eventually use this to "reconnect" to a journal after the node has 
restarted, in which case the http port may have changed if it's ephemeral.

bq. Is there no need for IPCLoggerChannel to have a way of closing its 
associated proxy?

Added a {{close()}} function. findbugs also caught the fact that I wasn't 
actually assigning the {{proxy}} member! Oops. Fixed that, too, and wired it 
into the {{QuorumJournalManager.close()}} function.

bq. Seems a little odd that JNStorage relies on a few static functions of 
NNStorage. Is there some better place those functions could live?

I would have liked to share the code in a better way, but I wasn't able to find 
a good way. The issue is that JNStorage can't inherit from NNStorage, since the 
NNStorage class itself has a lot of code related to image storage, not just 
edits. Making a deeper inheritance hierarchy seemed too messy to handle. So 
without a much bigger refactor to split NNStorage in two, I figured the 
duplication of these simple functions was the better route. Does that seem 
reasonable?

bq. I don't understand why JNStorage#analyzeStorage locks the storage directory 
after formatting it. What, if anything, relies on that behavior? Where is it 
unlocked? Might want to add a comment explaining it.

Clarified comment.

bq. Patch needs to be rebased on trunk, e.g. PersistentLong was renamed to 
PersistentLongFile.

Done

bq. This line kind of creeps me out in the constructor of the Journal class. 
Maybe make a no-args version of Storage#getStorageDir that asserts there's only 
one dir?

Done - added {{Storage.getSingularStorageDir}}

bq. In general this patch seems to be mixing in protobufs in a few places where 
non-proto classes seem more appropriate, notably in the Journal and 
JournalNodeRpcServer classes. Perhaps we should create non-proto analogs for 
these protos and add translator methods?

Yea, I would like to address that in a follow-up. As the RPCs have been 
changing, avoiding translators in most places has made it a lot easier to 
evolve stuff without having to change everything in 6 places.

bq. This seems really goofy. Just make another non-proto class and use a 
translator?

I managed to avoid the goofiness with the movement of the {{setHttpPort}} stuff 
into the {{getJournalStatus}} call at startup.

bq. I notice that there's a few TODOs left in this patch. It would be useful to 
know which of these you think need to be fixed before we commit this for real, 
versus those you'd like to leave in and do as follow-ups.

If we commit to a branch, I think we can commit with most of these TODOs in 
place. As is it works for the non-HA case, which is a reasonable proof of 
concept.

bq. Instead of putting all of these classes in the o.a.h.hdfs.qjournal 
packages, I recommend you try to separate these out into 
o.a.h.hdfs.qjoural.client, which implements the NN side of things, and 
o.a.h.hdfs.qjournal.server, which implements the JN side of things. I think 
doing so would make it easier to navigate the code.

Done. I had to make a few things public with the InterfaceAudience.Private 
annotation, but I agree it is an improvement.

bq. Could definitely use some method comments in the Journal class.

Done

bq. Recommend renaming Journal#journal to something like Journal#logEdits or 
Journal#writeEdits.

I kept it the same for consistency with {{JournalProtocol}}.

bq. In JournalNode#getOrCreateJournal, this log message could be more helpful: 
LOG.info("logDir: " + logDir);
bq. Seems like all of the timeouts in QuorumJournalManager should be 
configurable.

Fixed.

bq. I think you already have the config key to address this TODO in 
QJournalProtocolPB: // TODO: need to add a new principal for loggers

Fixed. Also see above about addressing security testing and bug fixes as a 
follow-on JIRA.

bq. s/BackupNode/JournalNode/g:

Fixed.

bq. Use an HTML comment in journalstatus.jsp, instead of Java comments within a 
code block.
bq. Could use some more content for the journalstatus.jsp page. 

Would like to address these as follow-on (this code is from the currently 
committed HDFS-3092 branch)

bq. A few spots in the tests you catch expected IOEs, but don't verify that you 
received the IOE you actually expect.
Got most of these. There's one in which we expect any multitude of different 
errors, but everywhere else I now check the string.

bq. Really solid tests overall, but how about one that actually works with HA? 
You currently have a test for two entirely separate NNs, but not one that uses 
an HA mini cluster.

I'd like to address actually enabling HA with this JournalManager in a separate 
JIRA. There's also a bunch of other tests I'd like to add.

----

I also made a few other changes I wanted to mention since the last patch ATM 
reviewed:

- Added the beginnings of plumbing an md5hash through the synchronization 
protocol, to make sure we don't accidentally end up copying around corrupt data 
if an HTTP transfer fails. The md5s aren't yet calculated, but I added to the 
RPC.
- Removed some attempt at fancy synchronization for synchronizing logs. I was 
previously trying to move the actual downloading of the log from the other host 
outside of the lock, but I'd rather add it back when if it turns out to be 
necessary.
- Small addition to {{ExitUtil}} in the trunk code, so that test cases can 
reset the tracked exception. Needed this in order to have the tests pass 
properly.
- Addressed the findbugs and test failures from the previous QA bot run. The 
test failures were mostly just a couple places I'd forgotten to make trivial 
updates due to my other changes.


My next task is to work on improving the tests. I'm hoping to write a 
randomized test that will trigger all of the existing "TODO" assertions. If a 
randomized test can hit these corner cases, then it's likely to find other 
corner cases we didn't think through in the design as well. (Whereas a targeted 
test would only cover the corner cases we already identified).

I'm also going to continue to look into merging the journal protocols as 
mentioned to Suresh above.
                
> Quorum-based protocol for reading and writing edit logs
> -------------------------------------------------------
>
>                 Key: HDFS-3077
>                 URL: https://issues.apache.org/jira/browse/HDFS-3077
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: ha, name-node
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-3077-partial.txt, hdfs-3077.txt, hdfs-3077.txt, 
> hdfs-3077.txt, hdfs-3077.txt, qjournal-design.pdf, qjournal-design.pdf
>
>
> Currently, one of the weak points of the HA design is that it relies on 
> shared storage such as an NFS filer for the shared edit log. One alternative 
> that has been proposed is to depend on BookKeeper, a ZooKeeper subproject 
> which provides a highly available replicated edit log on commodity hardware. 
> This JIRA is to implement another alternative, based on a quorum commit 
> protocol, integrated more tightly in HDFS and with the requirements driven 
> only by HDFS's needs rather than more generic use cases. More details to 
> follow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3077) Quorum-based protocol for reading and writing edit logs

Reply via email to