[ https://issues.apache.org/jira/browse/HDFS-8965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716004#comment-14716004 ]
Andrew Wang commented on HDFS-8965:
-----------------------------------

Hi Colin, thanks for working on this. Seems like the idea is, for length-prefixed edit logs, to check the checksum before reading the fields. Sensible and good change; overall looks good. Mostly nit comments:

* A comment mentions a 1-byte txid; I think you meant opid. Having the expected format in the comment is super helpful too, thanks. Mind replicating it for the other two readers too?
* Why move the temp array into a member? It'll probably be stack allocated and thus no GC pressure, and 4KB should fit too.
* Great opportunity to fix the LayoutVersion.EDITS_CHESKUM typo.
* LengthPrefixedReader#scanOp: shouldn't this be basically readOp but without the second pass to readFields? Not sure why it doesn't validate the max edit size or the checksum. The TODO is relevant.
* Can we get a unit test for this situation?
* Is some code sharing possible among the three readOp's for getting the FSEditLogOp, i.e. this bit?

{noformat}
byte opCodeByte;
try {
  opCodeByte = in.readByte();
} catch (EOFException eof) {
  // EOF at an opcode boundary is expected.
  return null;
}
FSEditLogOpCodes opCode = FSEditLogOpCodes.fromByte(opCodeByte);
if (opCode == OP_INVALID) {
  verifyTerminator();
  return null;
}
FSEditLogOp op = cache.get(opCode);
if (op == null) {
  throw new IOException("Read invalid opcode " + opCode);
}
{noformat}

> Harden edit log reading code against out of memory errors
> ---------------------------------------------------------
>
>                 Key: HDFS-8965
>                 URL: https://issues.apache.org/jira/browse/HDFS-8965
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.0-alpha
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-8965.001.patch, HDFS-8965.002.patch,
>                      HDFS-8965.003.patch
>
>
> We should harden the edit log reading code against out of memory errors. Now
> that each op has a length prefix and a checksum, we can validate the checksum
> before trying to load the Op data. This should avoid out of memory errors
> when trying to load garbage data as Op data.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
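As a follow-up to the code-sharing suggestion in the comment above: one option is to pull the opcode-decoding step into a single static helper that all three readOp implementations call. Below is a minimal, self-contained sketch of that idea. Note the `OpCode` enum, its byte values, and the `decodeOpCode` name are simplified stand-ins invented for illustration, not the real FSEditLogOpCodes/FSEditLogOp classes; the real helper would also call verifyTerminator() and consult the op cache as in the snippet above.

{noformat}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class OpCodeDecoder {
  // Simplified stand-in for FSEditLogOpCodes; byte values are illustrative only.
  enum OpCode {
    OP_ADD((byte) 0), OP_DELETE((byte) 2), OP_INVALID((byte) -1);

    final byte value;
    OpCode(byte value) { this.value = value; }

    static OpCode fromByte(byte b) {
      for (OpCode c : values()) {
        if (c.value == b) return c;
      }
      return null; // unrecognized opcode byte
    }
  }

  /**
   * Shared first step of the three readOp's: read and classify the opcode byte.
   * Returns null on a clean EOF or on the OP_INVALID end-of-log terminator;
   * throws IOException on an unrecognized opcode.
   */
  static OpCode decodeOpCode(DataInputStream in) throws IOException {
    byte opCodeByte;
    try {
      opCodeByte = in.readByte();
    } catch (EOFException eof) {
      // EOF at an opcode boundary is expected.
      return null;
    }
    OpCode opCode = OpCode.fromByte(opCodeByte);
    if (opCode == OpCode.OP_INVALID) {
      // The real reader would call verifyTerminator() here.
      return null;
    }
    if (opCode == null) {
      throw new IOException("Read invalid opcode " + opCodeByte);
    }
    return opCode;
  }
}
{noformat}

Each reader would then keep only its format-specific second pass (length/checksum validation, readFields) and delegate the opcode handling to the shared helper.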