[
https://issues.apache.org/jira/browse/HBASE-23634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163177#comment-17163177
]
Michael Stack commented on HBASE-23634:
---------------------------------------
{quote}1. Before compaction, a large number of small hfiles affects the read and write
performance of the region.
2. An hfile needs 3 NN RPCs to bulkload during
openRegion (validate, rename, createReader).
If bulkLoadService ThreadNum is 3, the number of hfiles is 20 (because the WAL number is 20),
RS is 100, regions are 2K*100, and openRegion threads are 75,
then hbase needs 3*3*75*100 concurrent NN RPCs and 3*20*2K*100 total NN RPCs.
{quote}
From [~Bo Cui]
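For a sense of scale, the arithmetic in the quote works out as follows. This is only a sketch that takes the quoted figures at face value; the variable names are made up for illustration, and it assumes "2K*100" means 2,000 regions on each of 100 RegionServers and that each region has 20 recovered hfiles. The concurrent figure is a worst-case peak, not a measurement.

```python
# Figures come from the quoted comment; names are illustrative only,
# not HBase configuration keys.
NN_RPCS_PER_HFILE = 3        # validate, rename, createReader
BULKLOAD_THREADS = 3         # bulkLoadService ThreadNum
OPEN_REGION_THREADS = 75     # openRegion threads per RegionServer
REGION_SERVERS = 100
REGIONS_PER_SERVER = 2_000   # assumption: "2K*100" = 2K regions per RS
HFILES_PER_REGION = 20       # assumption: one recovered hfile per WAL, 20 WALs


def concurrent_nn_rpcs() -> int:
    """Worst-case NameNode RPCs in flight across the cluster (3*3*75*100)."""
    return (NN_RPCS_PER_HFILE * BULKLOAD_THREADS
            * OPEN_REGION_THREADS * REGION_SERVERS)


def total_nn_rpcs() -> int:
    """NameNode RPCs needed to open every region (3*20*2K*100)."""
    return (NN_RPCS_PER_HFILE * HFILES_PER_REGION
            * REGIONS_PER_SERVER * REGION_SERVERS)


if __name__ == "__main__":
    print(f"concurrent NN RPCs (peak): {concurrent_nn_rpcs():,}")
    print(f"total NN RPCs:             {total_nn_rpcs():,}")
```

With these inputs the peak is 67,500 concurrent RPCs and the total is 12,000,000.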
We can quibble with some of the assessment made above but it does suggest a
better accounting is needed before we enable this as the default:
* Compare recovered.edits write amplification vs that of writing small hfiles
then immediately doing a rewrite via compaction (I like the [~zghao]
interpretation of Bo Cui's list as opening the recovered.hfiles as part of the
Region w/ the compaction bringing them into Store directory from the .tmp dir)
* Replay of recovered.edits inline w/ open as opposed to just opening the file
(MTTR benefits).
* A compare of NN RPCs as noted above by Bo Cui.
* The hfile validation copied from bulkload is broken, for recovered hfiles
and for bulk load alike, when the hfiles being recovered are for the meta table
(see sub-issue); the problem is deep-seated and needs lots of work to fix. We
could remove the validation, since the 'system' wrote the files, as [~zghao]
suggests, or move the validation to file open as part of Region open (which
could end up failing the Region open more often).
One question: if we only partially write an HFile and don't complete it
(say, because of a crash while splitting the WAL), does it get sidelined or
cleaned up? Just wondering. Thanks.
Unscheduling from 2.4 for now; leaving it against hbase3.
> Enable "Split WAL to HFile" by default
> --------------------------------------
>
> Key: HBASE-23634
> URL: https://issues.apache.org/jira/browse/HBASE-23634
> Project: HBase
> Issue Type: Task
> Affects Versions: 3.0.0-alpha-1, 2.3.0
> Reporter: Guanghao Zhang
> Priority: Blocker
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>