[ 
https://issues.apache.org/jira/browse/HBASE-20430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581983#comment-16581983
 ] 

Mingliang Liu commented on HBASE-20430:
---------------------------------------

[~apurtell] It makes sense to me to simply introduce an option to close every 
store file after the reader has finished. To determine the performance impact, 
I think we can benefit from your previous experiment settings. I can work on 
this JIRA with your help.

Improving Hadoop's S3A by multiplexing open files seems not straightforward to 
me. Let me refresh my mind about S3A and check possible improvements.

I like the idea of "predictive cache", and I think we can turn to it if the 
simple solution does not show us good results. That can be separate effort.

> Improve store file management for non-HDFS filesystems
> ------------------------------------------------------
>
>                 Key: HBASE-20430
>                 URL: https://issues.apache.org/jira/browse/HBASE-20430
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Priority: Major
>
> HBase keeps a file open for every active store file so no additional round 
> trips to the NameNode are needed after the initial open. HDFS internally 
> multiplexes open files, but the Hadoop S3 filesystem implementations do not, 
> or, at least, not as well. As the bulk of data under management increases we 
> observe the required number of concurrently open connections will rise, and 
> expect it will eventually exhaust a limit somewhere (the client, the OS file 
> descriptor table or open file limits, or the S3 service).
> Initially we can simply introduce an option to close every store file after 
> the reader has finished, and determine the performance impact. Use cases 
> backed by non-HDFS filesystems will already have to cope with a different 
> read performance profile. Based on experiments with the S3 backed Hadoop 
> filesystems, notably S3A, even with aggressively tuned options simple reads 
> can be very slow when there are blockcache misses, 15-20 seconds observed for 
> Get of a single small row, for example. We expect extensive use of the 
> BucketCache to mitigate in this application already. Could be backed by 
> offheap storage, but more likely a large number of cache files managed by the 
> file engine on local SSD storage. If misses are already going to be super 
> expensive, then the motivation to do more than simply open store files on 
> demand is largely absent.
> Still, we could employ a predictive cache. Where frequent access to a given 
> store file (or, at least, its store) is predicted, keep a reference to the 
> store file open. Can keep statistics about read frequency, write it out to 
> HFiles during compaction, and note these stats when opening the region, 
> perhaps by reading all meta blocks of region HFiles when opening. Otherwise, 
> close the file after reading and open again on demand. Need to be careful not 
> to use ARC or equivalent as cache replacement strategy as it is encumbered. 
> The size of the cache can be determined at startup after detecting the 
> underlying filesystem. Eg. setCacheSize(VERY_LARGE_CONSTANT) if (fs 
> instanceof DistributedFileSystem), so we don't lose much when on HDFS still.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to