[ 
https://issues.apache.org/jira/browse/HDFS-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845687#comment-13845687
 ] 

Colin Patrick McCabe commented on HDFS-5634:
--------------------------------------------

bq. Do we mean to setCachingStrategy in DFSInputStream#getBlockReader? Also, I 
get that there are a zillion parameters for the BRL constructor, but builders 
are for when there are optional arguments. Here, it looks like we want to set 
all of them.

Actually, in the tests, we often don't set a lot of the arguments.  For 
example, the unit tests don't use the FISCache, may not set readahead, etc.  
Also, I think there's value in naming the arguments, since otherwise updating 
the callsites gets very, very difficult.

bq. We have both verifyChecksum and skipChecksum right now. Let's get rid of 
one, seems error-prone to be flipping booleans.

OK.  I updated to {{BlockReaderFactory#newShortCircuitBlockReader}} to use 
{{skipChecksums}} as well.

A little note on the history here: prior to the introduction of mlock, it was 
more straightforward to have a simple positive boolean "verifyChecksum" than to 
have the skip boolean.  But now that we have mlock, verifyChecksum = true might 
be a lie, since mlock might mean we don't verify.

bq. The skipChecksum || mlocked.get() idiom is used in a few places, maybe 
should be a shouldSkipChecksum() method?

OK.

bq. IIUC, fillDataBuf fills the bounce buffer, and drainBounceBuffer empties 
it. Rename fillDataBuf to fillBounceBuffer for parity?

I renamed {{drainBounceBuffer}} to {{drainDataBuf}} for symmetry.

bq. I'm wondering what happens in the bounce buffer read paths when readahead 
is turned off. It looks like they use readahead to determine how much to read, 
regardless of the bytes needed, so what if it's zero?

We always buffer at least a single chunk, even if readahead is turned off.  The 
mechanics of checksumming require this.

bq. For the slow lane, fillDataBuf doesn't actually fill the returned buf, so 
when we hit the EOF and break, it looks like we make the user read again to 
flush out the bounce buffer. Can we save this?

Yeah, the current code could result in us doing an extra {{pread}} even after 
we know we're at EOF.  Let me see if I can avoid that.

bq. fillDataBuf doesn't fill just the data buf, it also fills the checksum buf 
and verifies checksums via fillBuffer. Would be nice to javadoc this.

OK

bq. I noticed there are two readahead config options too, 
dfs.client.cache.readahead and dfs.datanode.readahead.bytes. Seems like we 
should try to emulate the same behavior as remote reads which (according to 
hdfs-default.xml) use the DN setting, and override with the client setting. 
Right now it's just using the DN readahead in BRL, so the test that sets client 
readahead to 0 isn't doing much.

Right now, the readahead is coming out of {{DFSClient#cachingStrategy}}, so it 
will be coming from {{dfs.client.cache.readahead}}, unless someone has 
overridden it for that {{DFSInputStream}} object.  The problem with defaulting 
to the DN setting, is that we don't know what that is (we're on the client, not 
the DN).

bq. I don't quite understand why we check needed > maxReadahead... for the fast 
lane. Once we're checksum aligned via draining the bounce buffer, can't we just 
stay in the fast lane? Seems like the slow vs. fast lane determination should 
be based on read alignment, not bytes left.

The issue is that we want to honor the readahead setting.  We would not be 
doing this if we did a shorter read directly into the provided buffer.

bq. It's a little weird to me that the readahead chunks is min'd with the 
buffer size (default 1MB). I get why (memory consumption), but this linkage 
should be documented somewhere.

I added a comment.

bq. DirectBufferPool, would it be better to use a Deque's stack operations 
rather than a Queue? Might give better cache locality to do LIFO rather than 
FIFO.

Interesting point.  I will try that and see what numbers I get.

bq.TestEnhancedByteBufferAccess has an import only change

OK.  I will avoid doing that to make merging easier.

bq. Kinda unrelated, but should the "dfs.client.read.shortcircuit.*" keys be in 
hdfs-default.xml? I also noticed that "dfs.client.cache.readahead" says "this 
setting causes the datanode to..." so the readahead documentation might need to 
be updated too.

I'll update it with the information about short-circuit

> allow BlockReaderLocal to switch between checksumming and not
> -------------------------------------------------------------
>
>                 Key: HDFS-5634
>                 URL: https://issues.apache.org/jira/browse/HDFS-5634
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-5634.001.patch, HDFS-5634.002.patch
>
>
> BlockReaderLocal should be able to switch between checksumming and 
> non-checksumming, so that when we get notifications that something is mlocked 
> (see HDFS-5182), we can avoid checksumming when reading from that block.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

Reply via email to