[ https://issues.apache.org/jira/browse/HDFS-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149998#comment-13149998 ]
Todd Lipcon commented on HDFS-2246: ----------------------------------- {code} + // Cache of block to file path where the block is stored + //private static Map<ExtendedBlock, BlockLocalPathInfo> cache = null; {code} remove dead code ---- in {{LocalBlockReader.newBlockReader}}, if the checksum file open fails, then the {{dataIn}} stream will be leaked. You need a try..finally to close these if the open wasn't successful ---- why does LocalBlockReader extend RemoteBlockReader2? It should be its own implementation of BlockReader - it doesn't really share much code with its superclass (looks like only {{skip}}, which is trivial anyway). In terms of member variables, it uses some, but ignores the rest. ---- The cached proxies to the local DN currently don't have the UGI as part of the key. So if a client needs to impersonate different users, the cache will end up caching client proxies associated with previous users, which would be incorrect. Same is true of different timeouts, right? ---- The implementation of {{read}} is fairly inefficient - reading and verifying checksum on one chunk at a time does a poor job of amortizing JNI/syscall overhead. Should probably read 64KB or 128KB at a time. You can address this in a followup. ---- {code} + checksum.verifyChunkedSums(dataBuff, checksumBuff, filename, off); {code} Wrong offset is passed here - you should be passing the position of dataIn (the block offset), not the offset into the target byte array ---- The implementation of {{read}} looks incorrect to me. If you read less than one checksum chunk, and you want to verify checksums then it will always read an entire checksum chunk, so the next read will have incorrect results. ---- In DFSClient, it looks very inefficient to be calling {{isLocalAddress}} without any caching.. iterating over the network interfaces is not cheap (lots of syscalls). Also, the way this is currently designed, each seek will result in the files being re-opened and the metadata re-parsed. I imagine this is bad for efficiency. ---- Why not integrate the short circuit construction into {{getBlockReader}} instead of separately at the two call-sites? ---- {code} +import java.io.*; + +import org.apache.hadoop.io.*; {code} please expand imports ---- {{BlockLocalPathInfo}} needs audience/stability annotations. So does {{BlockMetadataHeader}} now that it's public. Also, since we now have the protocol-independence layer, does this .protocol class need to be Writable? Or can it be handled by the translation layer? ---- Rather than a single user for the local path access, why not use an ACL, given we already have support for parsing/checking ACLs? I can imagine some people may have multiple applications that do local random-read access that could benefit from this. ---- {{checkKerberosAuthMethod}} purports to be a general check, but the error message specifically references {{getBlockPathInfo}}. ---- {{getBlockLocalPathInfo}} has some bad indentation. I'd also change those log messages to TRACE level. ---- In FSDataset.getBlockLocalPathInfo, no need to qualify {{FSDataset.getMetaFile}}. ---- Test wise, you should cover positional reads and differently sized reads. I think this would turn up a bug as pointed out above with reads smaller than a checksum chunk size. > Shortcut a local client reads to a Datanodes files directly > ----------------------------------------------------------- > > Key: HDFS-2246 > URL: https://issues.apache.org/jira/browse/HDFS-2246 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: Sanjay Radia > Attachments: 0001-HDFS-347.-Local-reads.patch, > HDFS-2246-branch-0.20-security-205.1.patch, > HDFS-2246-branch-0.20-security-205.2.patch, > HDFS-2246-branch-0.20-security-205.patch, > HDFS-2246-branch-0.20-security-205.patch, > HDFS-2246-branch-0.20-security-205.patch, > HDFS-2246-branch-0.20-security.3.patch, > HDFS-2246-branch-0.20-security.no-softref.patch, > HDFS-2246-branch-0.20-security.patch, HDFS-2246-branch-0.20-security.patch, > HDFS-2246-trunk.patch, HDFS-2246-trunk.patch, HDFS-2246.20s.1.patch, > HDFS-2246.20s.2.txt, HDFS-2246.20s.3.txt, HDFS-2246.20s.4.txt, > HDFS-2246.20s.patch, localReadShortcut20-security.2patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira