omalley commented on a change in pull request #1830: HADOOP-11867: Add gather API to file system.
URL: https://github.com/apache/hadoop/pull/1830#discussion_r374940578
########## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java ##########
@@ -261,15 +270,129 @@ protected int readChunk(long pos, byte[] buf, int offset, int len,
         len = Math.min(len, bytesPerSum * (sumLenRead / CHECKSUM_SIZE));
       }
     }
-    if(pos != datas.getPos()) {
+    if (pos != datas.getPos()) {
       datas.seek(pos);
     }
     int nread = readFully(datas, buf, offset, len);
     if (eof && nread > 0) {
-      throw new ChecksumException("Checksum error: "+file+" at "+pos, pos);
+      throw new ChecksumException("Checksum error: " + file + " at " + pos, pos);
     }
     return nread;
   }
+
+  public static long findChecksumOffset(long dataOffset,
+                                        int bytesPerSum) {
+    return HEADER_LENGTH + (dataOffset/bytesPerSum) * FSInputChecker.CHECKSUM_SIZE;
+  }
+
+  /**
+   * Find the checksum ranges that correspond to the given data ranges.

Review comment:
Why is what needed? Do you mean the code that compares the checksums? The current code relies on a lot of context that no longer holds in the new API, and it is very inefficient because it works around those limitations badly. In particular, if you look at the current pread code, it reopens the crc file for each seek.

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
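The `findChecksumOffset` helper in the diff maps a data-file offset to the offset of the CRC entry that covers it in the companion `.crc` file. A minimal standalone sketch of that arithmetic is below; the `HEADER_LENGTH = 8` and `CHECKSUM_SIZE = 4` constants here are illustrative assumptions for the example, not values copied from the Hadoop source.

```java
// Illustrative sketch (not the Hadoop implementation): map a data offset
// to the offset of its checksum entry in the companion ".crc" file.
public class ChecksumOffsets {
  // Assumed for this sketch: the .crc file starts with a small fixed header.
  static final long HEADER_LENGTH = 8;
  // Assumed for this sketch: one 4-byte CRC per bytesPerSum bytes of data.
  static final int CHECKSUM_SIZE = 4;

  /** Offset in the .crc file of the checksum covering dataOffset. */
  static long findChecksumOffset(long dataOffset, int bytesPerSum) {
    // Integer division finds which bytesPerSum-sized chunk the offset
    // falls in; each chunk contributes exactly one CHECKSUM_SIZE entry.
    return HEADER_LENGTH + (dataOffset / bytesPerSum) * CHECKSUM_SIZE;
  }

  public static void main(String[] args) {
    // With 512-byte chunks, offsets 0..511 share the first checksum slot.
    System.out.println(findChecksumOffset(0, 512));   // 8
    System.out.println(findChecksumOffset(511, 512)); // 8
    System.out.println(findChecksumOffset(512, 512)); // 12
  }
}
```

Because every chunk has a fixed-size checksum entry, a contiguous data range translates to a single contiguous checksum range, which is what lets the new API fetch checksums for a whole set of ranges without reopening and reseeking the crc file per read.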