[ https://issues.apache.org/jira/browse/HDFS-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068177#comment-13068177 ]
Todd Lipcon commented on HDFS-2149: ----------------------------------- Looking over the patch, it seems like this is really two distinct parts: 1) cleans up the writing of edits: -- rather than writing logEdit(opcode, Writable...) in FSEditLog, the API is logEdit(op record) -- this API change filters down into EditLogOutputStream as well -- writeRaw is introduced to be used from the BN (like in 1073 branch) -- EditLogBackupOutputStream, which used to buffer Writables in JournalRecord, now directly buffers the op objects 2) Changes the API around *reading* ops to be an iterator-style interface rather than a byte-style interface, by pushing down the FSEditLogOp.Reader into EditLogInputStream. I'm convinced that (1) above is a good idea, and I think the patch is mostly ready for commit modulo a couple things below. I'm not yet convinced that (2) above is quite right yet. We might need another iteration or two to discuss the APIs. Would you mind using this JIRA to get part (1) committed, and then work out the kinks in (2) next? Or is there some reason that part (1) depends on part (2) that I'm missing? Issues: *EditLogBackupOutputStream bug*: Buffering the Op objects in EditLogBackupOutputStream is likely to cause a bug since the objects are reused. If a single thread logs twice before syncing (unlikely but possible for some operations) the edits will get modified in place and the BN will diverge. This might be a bug in the current BN as well, but the Writables weren't always reused, whereas these op objects are. To fix this, I think we should make EditLogBackupOutputStream and EditLogFileOutputStream act the same - ie serialize the ops immediately into a buffer. Perhaps they could share this code in the abstract base class? Or, introduce a new classs like JournalDoubleBuffer? *getInstance for AddCloseOp*: Rather than have two separate ThreadLocals for AddCloseOp, I think it would be cleaner to add a "setOp()" function there on a single instance. My only reasoning is that then all of the classes are nicely symmetric. *javadoc on writeRaw*: I find this javadoc confusing. What do you mean by "should only be used in new code"? Also, the punctuation is incomplete which makes the comment hard to read. Perhaps just copy the javadoc from the 1073 branch s it doesn't need to be resolved at merge time? *For brevity*: Here's a thought: rather than giving each op its own ThreadLocal and getInstance method, why not make the {{opInstances}} EnumMap that we already have into a ThreadLocal? Something like: {code} // in FSEditLogOp.java: ThreadLocal<EnumMap<FSEditLogOpCodes, FSEditLogOp>> opInstances = new ThreadLocal<EnumMap<FSEditLogOpCodes, FSEditLogOp>>() { @Override protected EnumMap<FSEditLogOpCodes, FSEditLogOp> initialValue() { // ... } }; <T extends FSEditLogOp> T getThreadLocalOp( FSEditLogOpCodes opCode, Class<T> clazz) { return clazz.cast(opInstances.get().get(opCode)); } // in FSEditLog.java: public void logCloseFile(String path, INodeFile newNode) { AddCloseOp op = FSEditLogOp.getThreadLocalOp(OP_CLOSE, AddCloseOp.class) .setPath(path) ...; } {code} I think this would reduce the amount of boilerplate and shouldn't be any discernible perf difference. > Move EditLogOp serialisation and deserialation into one place > ------------------------------------------------------------- > > Key: HDFS-2149 > URL: https://issues.apache.org/jira/browse/HDFS-2149 > Project: Hadoop HDFS > Issue Type: Sub-task > Reporter: Ivan Kelly > Assignee: Ivan Kelly > Fix For: 0.23.0 > > Attachments: HDFS-2149.diff, HDFS-2149.diff > > > On trunk serialisation of editlog ops is in FSEditLog#log* and > deserialisation is in FSEditLogOp.*Op . This improvement is to move the > serialisation code into one place, i.e under FSEditLogOp.*Op. > This is part of HDFS-1580. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira