[ https://issues.apache.org/jira/browse/HBASE-26273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413608#comment-17413608 ]
Huaxiang Sun commented on HBASE-26273:
--------------------------------------

Thanks [~elserj] and [~taklwu] for the detailed info and context. Yes, I have recently been working in this area, since TableSnapshotInputFormat is used in one of our use cases. These are very nice improvements.

We did notice that disk IO increases a lot when TableSnapshotInputFormat is used, so I think it is somehow related to HBASE-26274. One thing I am not sure about: in our case, most snapshot reads go through SCR (local read), so even if LEAF_INDEX blocks are read a lot, they are probably cached in the OS buffer cache and should not cause excessive disk IO. I will spend some time figuring out what is going on there.

We will definitely apply these improvements to our use cases when they land. Great work!

> TableSnapshotInputFormat/TableSnapshotInputFormatImpl should use ReadType.STREAM for scanning HFiles
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-26273
>                 URL: https://issues.apache.org/jira/browse/HBASE-26273
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>   Affects Versions: 3.0.0-alpha-1, 2.4.6
>            Reporter: Tak-Lon (Stephen) Wu
>            Assignee: Josh Elser
>            Priority: Major
>
> After the change in HBASE-17917 that uses PREAD ({{ReadType.DEFAULT}}) for all
> user scans, the behavior of TableSnapshotInputFormat changed from STREAM to
> PREAD.
> TableSnapshotInputFormat is supposed to be used with YARN/MR or another batch
> engine that reads the entire HFile in the container/executor. With the
> default always set to PREAD, we execute many more DFSInputStream#seek calls
> simply to read through the data block section of the HFile.
> The goal of this change is to make any downstream user of
> TableSnapshotInputFormat scan with STREAM.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)