[ 
https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763654#comment-13763654
 ] 

Colin Patrick McCabe commented on HDFS-4953:
--------------------------------------------

bq. Actually, it is a standard best practice. Building exception objects is 
expensive and handling exceptions is error-prone.

There's no bundling going on here.  {{DFSClient}} throws the exception, not a 
remote daemon.

Can you be more clear about what the alternative is to an exception here?  It 
seems like a null return would have the same issues.  (unless I am missing 
something?)

bq. I clearly stated in the previous comment [that isZeroCopyAvailable] was 
about whether the zero copy was available AT THE CURRENT POINT IN THE STREAM.

You can't know ahead of time whether your call to mmap will succeed.  As I 
said, mmap can fail, for dozens of reasons.  And of course blocks move over 
time.  There is a fundamental "time of check, time of use" (TOCTOU) race 
condition in this kind of API.

bq. [block boundary comments]

Unless I'm missing something, I think we're both in agreement here-- we return 
a short read when near the block boundary, to avoid using the fallback buffer.

bq. [factory comments]

OK.  I misinterpreted what you said earlier.  I see now that you plan on 
reusing buffers.

Since I am not knowledgeable about ORC, I have to ask you, Owen: is it 
necessary to read 200 MB at a time to decode this file format?  As Todd wrote, 
our benchmarks show that large reads may not be the most efficient due to cache 
effects.

bq. [new API]

This proposal seems a lot better than the previous ones we've come up with.

There is already a method named {{FSDataInputStream#read(ByteBuffer buf)}} in 
{{FSDataInputStream}}.  If we create a new method named 
{{FSDataInputStream#readByteBuffer}}, I would expect there to be some confusion 
between the two.  That's why I proposed {{FSDataInputStream#readZero}} for the 
new name.  Does that make sense?

{code}
public void releaseByteBuffer(ByteBufferFactory factory, ByteBuffer buffer);
{code}

If, instead of returning a {{ByteBuffer}} from the {{readByteBuffer}} call, we 
returned a {{ZeroBuffer}} object wrapping the {{ByteBuffer}}, we could simply 
call {{ZeroBuffer#close()}}.  The {{ZeroBuffer}} itself could retain a 
reference to its factory (if it came from a factory), and the 
{{DFSInputStream}} (if it needed to be munmapped).  This seems a lot more 
object-oriented to me.  To make the client pass in this information seems 
error-prone and confusing.

I don't buy the argument that having a wrapper object imposes an "unacceptable 
performance hit."  As I mentioned before, if you *don't* have a wrapper object, 
you have to do a hash table lookup in {{DFSInputStream#releaseByteBuffer}} to 
see if the buffer being released is one that we mmap'ed.  A hash table lookup 
is surely slower than a single object dereference.

I'd like to get some other prospective zero-copy API users to comment on 
whether they like the wrapper object or the 
{{DFSInputStream#releaseByteBuffer}} approach better...
                
> enable HDFS local reads via mmap
> --------------------------------
>
>                 Key: HDFS-4953
>                 URL: https://issues.apache.org/jira/browse/HDFS-4953
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>    Affects Versions: 2.3.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>             Fix For: HDFS-4949
>
>         Attachments: benchmark.png, HDFS-4953.001.patch, HDFS-4953.002.patch, 
> HDFS-4953.003.patch, HDFS-4953.004.patch, HDFS-4953.005.patch, 
> HDFS-4953.006.patch, HDFS-4953.007.patch, HDFS-4953.008.patch
>
>
> Currently, the short-circuit local read pathway allows HDFS clients to access 
> files directly without going through the DataNode.  However, all of these 
> reads involve a copy at the operating system level, since they rely on the 
> read() / pread() / etc family of kernel interfaces.
> We would like to enable HDFS to read local files via mmap.  This would enable 
> truly zero-copy reads.
> In the initial implementation, zero-copy reads will only be performed when 
> checksums were disabled.  Later, we can use the DataNode's cache awareness to 
> only perform zero-copy reads when we know that checksum has already been 
> verified.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to