[ 
https://issues.apache.org/jira/browse/HDFS-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149998#comment-13149998
 ] 

Todd Lipcon commented on HDFS-2246:
-----------------------------------

{code}
+  // Cache of block to file path where the block is stored
+  //private static Map<ExtendedBlock, BlockLocalPathInfo> cache = null;
{code}
remove dead code
----

in {{LocalBlockReader.newBlockReader}}, if the checksum file open fails, then 
the {{dataIn}} stream will be leaked. You need a try..finally to close these if 
the open wasn't successful

----

why does LocalBlockReader extend RemoteBlockReader2? It should be its own 
implementation of BlockReader - it doesn't really share much code with its 
superclass (looks like only {{skip}}, which is trivial anyway). In terms of 
member variables, it uses some, but ignores the rest.

----

The cached proxies to the local DN currently don't have the UGI as part of the 
key. So if a client needs to impersonate different users, the cache will end up 
caching client proxies associated with previous users, which would be incorrect.

Same is true of different timeouts, right?

----

The implementation of {{read}} is fairly inefficient - reading and verifying 
checksum on one chunk at a time does a poor job of amortizing JNI/syscall 
overhead. Should probably read 64KB or 128KB at a time. You can address this in 
a followup.

----

{code}
+            checksum.verifyChunkedSums(dataBuff, checksumBuff, filename, off);
{code}
Wrong offset is passed here - you should be passing the position of dataIn (the 
block offset), not the offset into the target byte array

----

The implementation of {{read}} looks incorrect to me. If you read less than one 
checksum chunk, and you want to verify checksums then it will always read an 
entire checksum chunk, so the next read will have incorrect results.

----

In DFSClient, it looks very inefficient to be calling {{isLocalAddress}} 
without any caching.. iterating over the network interfaces is not cheap (lots 
of syscalls).

Also, the way this is currently designed, each seek will result in the files 
being re-opened and the metadata re-parsed. I imagine this is bad for 
efficiency.

----
Why not integrate the short circuit construction into {{getBlockReader}} 
instead of separately at the two call-sites?

----
{code}
+import java.io.*;
+
+import org.apache.hadoop.io.*;
{code}

please expand imports

----
{{BlockLocalPathInfo}} needs audience/stability annotations. So does 
{{BlockMetadataHeader}} now that it's public.

Also, since we now have the protocol-independence layer, does this .protocol 
class need to be Writable? Or can it be handled by the translation layer?
----

Rather than a single user for the local path access, why not use an ACL, given 
we already have support for parsing/checking ACLs? I can imagine some people 
may have multiple applications that do local random-read access that could 
benefit from this.

----

{{checkKerberosAuthMethod}} purports to be a general check, but the error 
message specifically references {{getBlockPathInfo}}.

----

{{getBlockLocalPathInfo}} has some bad indentation. I'd also change those log 
messages to TRACE level.

----

In FSDataset.getBlockLocalPathInfo, no need to qualify 
{{FSDataset.getMetaFile}}.

----

Test wise, you should cover positional reads and differently sized reads. I 
think this would turn up a bug as pointed out above with reads smaller than a 
checksum chunk size.
                
> Shortcut a local client reads to a Datanodes files directly
> -----------------------------------------------------------
>
>                 Key: HDFS-2246
>                 URL: https://issues.apache.org/jira/browse/HDFS-2246
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sanjay Radia
>         Attachments: 0001-HDFS-347.-Local-reads.patch, 
> HDFS-2246-branch-0.20-security-205.1.patch, 
> HDFS-2246-branch-0.20-security-205.2.patch, 
> HDFS-2246-branch-0.20-security-205.patch, 
> HDFS-2246-branch-0.20-security-205.patch, 
> HDFS-2246-branch-0.20-security-205.patch, 
> HDFS-2246-branch-0.20-security.3.patch, 
> HDFS-2246-branch-0.20-security.no-softref.patch, 
> HDFS-2246-branch-0.20-security.patch, HDFS-2246-branch-0.20-security.patch, 
> HDFS-2246-trunk.patch, HDFS-2246-trunk.patch, HDFS-2246.20s.1.patch, 
> HDFS-2246.20s.2.txt, HDFS-2246.20s.3.txt, HDFS-2246.20s.4.txt, 
> HDFS-2246.20s.patch, localReadShortcut20-security.2patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to