[ 
https://issues.apache.org/jira/browse/HDFS-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848340#comment-13848340
 ] 

Liang Xie commented on HDFS-5664:
---------------------------------

bq. since there would still be a big "synchronized" on all the 
DFSInputStream#read methods which use the BlockReader
This could be addressed by HDFS-1605, e.g. by using a read lock for read().
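To illustrate the read-lock idea, here is a minimal sketch built on java.util.concurrent.locks.ReentrantReadWriteLock. It uses a hypothetical in-memory stream class, RwLockedStream, which is neither DFSInputStream nor the actual HDFS-1605 patch. One caveat: a read lock admits many holders at once, so only operations that do not mutate shared stream state can safely run under it. In this sketch the positional read pread() takes the shared read lock, while the stateful read(), which advances a shared cursor, still takes the exclusive write lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stream over an in-memory byte array, used only to illustrate
// the locking scheme (not DFSInputStream, not the HDFS-1605 patch).
class RwLockedStream {
  private final byte[] data;
  private long pos = 0; // shared cursor, mutated by the stateful read()
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  RwLockedStream(byte[] data) {
    this.data = data;
  }

  // Positional read: touches no shared mutable state, so many threads
  // may hold the read lock and run it concurrently.
  int pread(long position, byte[] buf, int off, int len) {
    lock.readLock().lock();
    try {
      return copy(position, buf, off, len);
    } finally {
      lock.readLock().unlock();
    }
  }

  // Stateful read: advances the shared cursor, so it needs exclusive access.
  int read(byte[] buf, int off, int len) {
    lock.writeLock().lock();
    try {
      int n = copy(pos, buf, off, len);
      if (n > 0) {
        pos += n;
      }
      return n;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Copies up to len bytes starting at position; returns -1 at EOF.
  private int copy(long position, byte[] buf, int off, int len) {
    if (position >= data.length) {
      return -1;
    }
    int n = (int) Math.min(len, data.length - position);
    System.arraycopy(data, (int) position, buf, off, n);
    return n;
  }
}
```

Whether DFSInputStream#read() itself could run under only the read lock depends on which fields it mutates; the sketch deliberately takes the conservative route for the stateful path.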

bq. If multiple threads want to read the same file at the same time, they can 
open multiple distinct streams for it. At that point, they're not sharing the 
same BlockReader, so whether or not BRL is synchronized doesn't matter.
Yes, this is a feasible idea.
But in the current HBase codebase, we use only one stream per HFile (or two in
older versions, one with and one without HDFS checksum verification), so this
looks like a critical performance issue. We should try to figure out whether it
is possible to remove the synchronized keyword in BlockReader, or whether we
must move to a multi-threaded pattern with one stream per thread, as quoted
above. [~stack], are you familiar with this area: why has HBase historically
always used a single stream per HFile?
I'll try to understand some of the background here as well.
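A minimal sketch of the one-stream-per-thread pattern from the quote, assuming a hypothetical helper, PerThreadStreams, and a caller-supplied opener. In HBase the opener would wrap something like FileSystem#open on the HFile path; here it is a plain Supplier<InputStream> so the sketch stays self-contained. Each thread lazily gets its own stream through a ThreadLocal, so threads never contend on a shared stream's monitor (or, in HDFS, on a shared BlockReader).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

// Hypothetical helper (not an HBase/HDFS class): hands each reading thread its
// own stream so threads never contend on one stream's synchronized methods.
class PerThreadStreams implements AutoCloseable {
  private final Supplier<InputStream> opener;
  // Every stream opened so far, so close() can release all of them.
  private final List<InputStream> opened = new CopyOnWriteArrayList<>();
  private final ThreadLocal<InputStream> local;

  PerThreadStreams(Supplier<InputStream> opener) {
    this.opener = opener;
    // Lazily open one stream per thread, on that thread's first get().
    this.local = ThreadLocal.withInitial(this::openAndTrack);
  }

  // Demo factory: every thread gets an independent stream over the same bytes.
  static PerThreadStreams overBytes(byte[] data) {
    return new PerThreadStreams(() -> new ByteArrayInputStream(data));
  }

  private InputStream openAndTrack() {
    InputStream in = opener.get();
    opened.add(in);
    return in;
  }

  // Returns the calling thread's private stream.
  InputStream get() {
    return local.get();
  }

  int openedCount() {
    return opened.size();
  }

  @Override
  public void close() throws IOException {
    for (InputStream in : opened) {
      in.close();
    }
  }
}
```

The trade-off is resource usage: one open stream (and, for real files, one file descriptor and reader) per reading thread instead of one per HFile.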

> try to relieve the BlockReaderLocal read() synchronized hotspot
> ---------------------------------------------------------------
>
>                 Key: HDFS-5664
>                 URL: https://issues.apache.org/jira/browse/HDFS-5664
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>
> Currently, BlockReaderLocal's read() has a synchronized modifier:
> {code}
> public synchronized int read(byte[] buf, int off, int len) throws IOException {
> {code}
> In an HBase cluster with heavy physical reads, we observed some hotspots in the
> DFSClient path; the detailed stack trace can be found at:
> https://issues.apache.org/jira/browse/HDFS-1605?focusedCommentId=13843241&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13843241
> I haven't looked into the details yet, but here are some raw ideas first:
> 1) replace synchronized with a try-lock-with-timeout pattern, so reads can fail fast;
> 2) fall back to non-SSR (non-short-circuit read) mode if acquiring the local reader lock fails.
> There are at least two scenarios where removing this hotspot would help:
> 1) heavy local physical reads, e.g. when the HBase block cache miss ratio is high;
> 2) slow or bad disks.
> It would help achieve a lower 99th-percentile HBase read latency.
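The two raw ideas in the description (a timed try-lock, then falling back to non-SSR mode) can be combined roughly as follows. This is a sketch under stated assumptions: PositionalReader and FailFastReader are hypothetical names, the local reader stands in for a short-circuit BlockReaderLocal, the fallback stands in for a non-short-circuit reader that goes through the DataNode (assumed safe for concurrent use), and the timeout value is arbitrary. Passing an explicit pos avoids keeping two cursors in sync across the two readers.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a block reader that reads at an explicit offset.
interface PositionalReader {
  int read(long pos, byte[] buf, int off, int len) throws IOException;
}

// Sketch of ideas 1) and 2): rather than a synchronized read() that makes
// callers queue indefinitely, wait a bounded time for the local reader's
// lock and, on timeout, fail fast over to the fallback reader.
class FailFastReader implements PositionalReader {
  private final PositionalReader local;     // e.g. short-circuit local reader
  private final PositionalReader fallback;  // e.g. non-short-circuit reader
  private final long timeoutMillis;         // arbitrary, tune per workload
  private final ReentrantLock localLock = new ReentrantLock();

  FailFastReader(PositionalReader local, PositionalReader fallback, long timeoutMillis) {
    this.local = local;
    this.fallback = fallback;
    this.timeoutMillis = timeoutMillis;
  }

  @Override
  public int read(long pos, byte[] buf, int off, int len) throws IOException {
    boolean acquired;
    try {
      acquired = localLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      throw new InterruptedIOException("interrupted waiting for local reader lock");
    }
    if (!acquired) {
      // Lock still busy (e.g. holder stuck on a slow disk): don't queue.
      return fallback.read(pos, buf, off, len);
    }
    try {
      return local.read(pos, buf, off, len);
    } finally {
      localLock.unlock();
    }
  }
}
```

The design is a trade: a timed-out caller pays for a usually slower remote read instead of queueing behind the lock. That only helps tail latency when the lock holder is genuinely stuck, for example on a slow or bad disk.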



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
