[jira] [Commented] (HDFS-14111) hdfsOpenFile on HDFS causes unnecessary IO from file offset 0

Todd Lipcon (JIRA) Wed, 28 Nov 2018 10:34:28 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16702237#comment-16702237
 ]


Todd Lipcon commented on HDFS-14111:
------------------------------------

[~ste...@apache.org] I'd be curious to hear your take on this. It seems we 
could fix this in a few different ways:

1) add a new method (perhaps private, only callable from JNI) to 
FSDataInputStream such as 'supportsByteBufferRead()', which could check if the 
underlying stream is an instance of ByteBufferReadable. This might not work 
though if we expect that some FS implementers implement ByteBufferReadable but 
throw UnsupportedOperationException when it's actually called.

2) change HDFS so that read() with a zero length short-circuits with no effect. 
This is a slight behavior change (maybe someone was relying on this to 
"pre-buffer" some data, or to check whether a stream is available/readable?).

3) assume that all streams are direct-readable, and on the first call to 
'hdfsRead()', try the direct-read path. If at that point it throws 
UnsupportedOperationException, set a flag marking the stream as not 
direct-readable and fall back. We'd probably have to futz a bit with the 
hdfsFileUsesDirectRead() API so that it checks eagerly if the lazy check 
hasn't been "filled in" yet. But I imagine that API isn't used too frequently.
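
For illustration, option 3 might look roughly like the Java sketch below. It 
is only a model of the idea: SketchStream, ArrayStream and LazyDirectReader 
are hypothetical names, not Hadoop or libhdfs APIs, and the real change would 
presumably live in the libhdfs C/JNI code around hdfsRead() and 
hdfsFileUsesDirectRead(). The sketch also assumes that a stream which throws 
UnsupportedOperationException has not consumed any data before throwing.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for an input stream; NOT the Hadoop API.
interface SketchStream {
    int read(byte[] buf, int off, int len);
    // Streams without ByteBuffer support throw UnsupportedOperationException.
    int read(ByteBuffer dst);
}

// Toy in-memory stream used only to exercise the sketch.
final class ArrayStream implements SketchStream {
    private final byte[] data;
    private final boolean supportsDirect;
    private int pos = 0;

    ArrayStream(byte[] data, boolean supportsDirect) {
        this.data = data;
        this.supportsDirect = supportsDirect;
    }

    public int read(byte[] buf, int off, int len) {
        if (len == 0) return 0;
        if (pos >= data.length) return -1;
        int n = Math.min(len, data.length - pos);
        System.arraycopy(data, pos, buf, off, n);
        pos += n;
        return n;
    }

    public int read(ByteBuffer dst) {
        if (!supportsDirect) {
            throw new UnsupportedOperationException("no ByteBuffer read");
        }
        if (!dst.hasRemaining()) return 0;
        if (pos >= data.length) return -1;
        int n = Math.min(dst.remaining(), data.length - pos);
        dst.put(data, pos, n);
        pos += n;
        return n;
    }
}

// Assumes direct reads work until the first real read proves otherwise.
final class LazyDirectReader {
    enum Support { UNKNOWN, YES, NO }

    private final SketchStream in;
    private Support direct = Support.UNKNOWN;

    LazyDirectReader(SketchStream in) { this.in = in; }

    int read(ByteBuffer dst) {
        if (direct != Support.NO) {
            try {
                int n = in.read(dst);      // try the direct-read path first
                direct = Support.YES;
                return n;
            } catch (UnsupportedOperationException e) {
                direct = Support.NO;       // remember the result and fall back
            }
        }
        // Fallback: read into a heap array, then copy into the buffer.
        byte[] tmp = new byte[dst.remaining()];
        int n = in.read(tmp, 0, tmp.length);
        if (n > 0) dst.put(tmp, 0, n);
        return n;
    }

    // Analogue of hdfsFileUsesDirectRead(): probe eagerly only if no read
    // has filled in the flag yet.
    boolean usesDirectRead() {
        if (direct == Support.UNKNOWN) {
            try {
                in.read(ByteBuffer.allocate(0));
                direct = Support.YES;
            } catch (UnsupportedOperationException e) {
                direct = Support.NO;
            }
        }
        return direct == Support.YES;
    }
}

final class LazyDirectReadDemo {
    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4, 5};
        LazyDirectReader r = new LazyDirectReader(new ArrayStream(data, false));
        ByteBuffer dst = ByteBuffer.allocate(3);
        System.out.println("bytes read: " + r.read(dst));
        System.out.println("uses direct read: " + r.usesDirectRead());
    }
}
```

With this shape, opening a file issues no read at all. The cost is that 
usesDirectRead() still needs a zero-length probe when it is called before any 
read, which is the case that would benefit from option 2 as well.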

> hdfsOpenFile on HDFS causes unnecessary IO from file offset 0
> -------------------------------------------------------------
>
>                 Key: HDFS-14111
>                 URL: https://issues.apache.org/jira/browse/HDFS-14111
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client, libhdfs
>    Affects Versions: 3.2.0
>            Reporter: Todd Lipcon
>            Priority: Major
>
> hdfsOpenFile() calls readDirect() with a 0-length argument in order to check 
> whether the underlying stream supports bytebuffer reads. With DFSInputStream, 
> the read(0) isn't short circuited, and results in the DFSClient opening a 
> block reader. In the case of a remote block, the block reader will actually 
> issue a read of the whole block, causing the datanode to perform unnecessary 
> IO and network transfers in order to fill up the client's TCP buffers. This 
> causes performance degradation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
