[ 
https://issues.apache.org/jira/browse/HBASE-26273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413608#comment-17413608
 ] 

Huaxiang Sun commented on HBASE-26273:
--------------------------------------

Thanks [~elserj] and [~taklwu] for the detailed info and context. Yeah, I have 
recently been working in this space, as TableSnapshotInputFormat is being used 
in one of our use cases. These are very nice improvements.

We did notice that when TableSnapshotInputFormat is being used, the disk IO 
increases a lot. So I think it is somehow related to HBASE-26274.

One thing I am not sure about: in our case, most snapshot reads go through 
SCR (local read), so even if the LEAF_INDEX blocks are read a lot, they are 
probably cached in the OS's buffer cache and should not cause excessive disk 
IO. I will spend some time figuring out what is going on there.

We will definitely apply these improvements to our use cases when they 
land, great work!

> TableSnapshotInputFormat/TableSnapshotInputFormatImpl should use 
> ReadType.STREAM for scanning HFiles 
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-26273
>                 URL: https://issues.apache.org/jira/browse/HBASE-26273
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 3.0.0-alpha-1, 2.4.6
>            Reporter: Tak-Lon (Stephen) Wu
>            Assignee: Josh Elser
>            Priority: Major
>
> After the change in HBASE-17917 that uses PREAD ({{ReadType.DEFAULT}}) for all 
> user scans, the behavior of TableSnapshotInputFormat changed from STREAM to 
> PREAD.
> TableSnapshotInputFormat is supposed to be used with YARN/MR or another batch 
> engine that reads the entire HFile in the container/executor. With the 
> default always being PREAD, we execute a lot more DFSInputStream#seek calls 
> simply to read through the data block section of the HFile.
> The goal of this change is to make any downstream user of 
> TableSnapshotInputFormat scan with STREAM.
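
For readers unfamiliar with the two access patterns named in the description, 
here is a minimal JDK-only sketch of the difference. It is an analogy, not 
HBase or HDFS code: the class name ReadPatternSketch, the BLOCK_SIZE value and 
both method names are hypothetical, and plain java.nio calls stand in for what 
DFSInputStream does. The stream variant opens the file once and consumes it 
front to back; the positional variant asks for every block by explicit offset, 
which is roughly the per-block repositioning that the description attributes 
to PREAD.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ReadPatternSketch {
    // Hypothetical block size, chosen only for illustration.
    static final int BLOCK_SIZE = 64 * 1024;

    /** Stream-style read: one open stream, consumed sequentially to EOF. */
    static long streamRead(Path file) throws IOException {
        long total = 0;
        byte[] buf = new byte[BLOCK_SIZE];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Positional-style read: every block is requested at an explicit offset. */
    static long positionalRead(Path file) throws IOException {
        long total = 0;
        ByteBuffer buf = ByteBuffer.allocate(BLOCK_SIZE);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            long pos = 0;
            while (pos < size) {
                buf.clear();
                int n = ch.read(buf, pos); // does not move the channel position
                if (n <= 0) {
                    break;
                }
                pos += n;
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("read-pattern", ".bin");
        try {
            Files.write(tmp, new byte[5 * BLOCK_SIZE + 123]);
            System.out.println("stream bytes:     " + streamRead(tmp));
            System.out.println("positional bytes: " + positionalRead(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Both methods return the same byte count for the same file; only the access 
pattern differs. For a job that reads an entire file anyway, the sequential 
form avoids the per-block offset bookkeeping, which is the motivation given 
above for preferring STREAM in this input format.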



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
