[ https://issues.apache.org/jira/browse/HBASE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870334#action_12870334 ]

Todd Lipcon commented on HBASE-2468:
------------------------------------

For TIF, I don't see any reason to prefetch meta. Each mapper taking *input* 
from a table accesses only a small subset of regions, so scanning all of the 
table's META rows up front is total overkill.

For TOF, or jobs that simply access HBase but don't use either of our provided 
output formats, the main point of a prewarmed meta cache is to avoid DDoSing 
the META server. On a 500-node MR cluster, you might well have 4000 mappers 
start within a few seconds and overwhelm META pretty fast. This is a real use 
case for many people - imagine storing a dimension table in HBase and doing a 
hash join against it from an MR job processing many TB of logs.

So, for the MR case, I think we should provide the option to serialize META to 
disk, put it in the DistributedCache, and then prewarm the meta cache from 
there. This would reduce the number of mappers actually hitting META to nearly 0.
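
To make that concrete, here's a rough sketch of the submission side. The class 
name, method split, and paths are just for illustration, and it assumes 
HTable.getRegionsInfo() is an acceptable way to snapshot the region map:

{code:java}
import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.HServerAddress;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.io.SequenceFile;

public class RegionSnapshotShipper {
  /**
   * Snapshot the table's region->server map into a SequenceFile and ship it
   * with the job, so tasks can prewarm their client cache without touching
   * META.  Names are illustrative only.
   */
  public static void shipRegionInfo(Configuration conf, String tableName,
      Path snapshotPath) throws IOException {
    HTable table = new HTable(conf, tableName);
    Map<HRegionInfo, HServerAddress> regions = table.getRegionsInfo();

    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, snapshotPath, HRegionInfo.class, HServerAddress.class);
    try {
      for (Map.Entry<HRegionInfo, HServerAddress> e : regions.entrySet()) {
        writer.append(e.getKey(), e.getValue());  // both are Writables
      }
    } finally {
      writer.close();
    }
    // Every task gets a local copy of the snapshot instead of hitting META.
    DistributedCache.addCacheFile(snapshotPath.toUri(), conf);
  }
}
{code}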

Doing a scan of META to fetch *all* rows for this table, though, probably makes 
the problem even worse, especially for jobs that only access a subset of the 
content.

My opinion is that we should:
a) prefetch ahead a few rows on any META miss, since it will fill the cache up 
faster and catch split children. Perhaps we can do a benchmark to see whether a 
10-row scan is any harder to service than a 2-row scan - my hunch is that the 
load on the server is mostly per-request constant overhead, so we may as well 
scan ahead a bit.
b) allow the *option* to fetch a row range (which includes the full table 
range) into the cache. This could be used at startup of long-running processes 
(e.g. the thrift gateway or stargate may prefer to warm up its cache before it 
starts accepting any user requests). This should not be the default. I think of 
this as the equivalent of posix_fadvise(..., POSIX_FADV_WILLNEED) - see the 
sketch right after this list. Providing the API as a range will also allow us 
to do it for multiregion scans, etc.
c) allow the option to serialize meta rows into a SequenceFile and then load 
them back - the load-back side is sketched at the end of this comment. This 
provides the improvement I mentioned above for large MR jobs randomly 
accessing a cluster. 
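
For (b), what I have in mind is just a hint-style call on the client. Purely 
illustrative - prefetchRegionCache() does not exist today, it's the API this 
option would add:

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.HTable;

public class CacheWarmer {
  /**
   * Warm the client's region cache for a whole table before serving user
   * requests, e.g. at thrift gateway / stargate startup.
   * prefetchRegionCache() is the hypothetical API proposed in (b).
   */
  public static void warmWholeTable(Configuration conf, String tableName)
      throws IOException {
    HTable table = new HTable(conf, tableName);
    // A hint, not a requirement - the client analogue of
    // posix_fadvise(..., POSIX_FADV_WILLNEED).
    table.prefetchRegionCache(HConstants.EMPTY_START_ROW,
        HConstants.EMPTY_END_ROW);
  }
}
{code}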

I see the above 3 things as separate tasks. It sounds like the current patch 
can do the first of the three, so maybe we should separate that out, get it 
committed, and then move on to b and c?
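
To round out (c), the load-back half of the DistributedCache sketch above might 
look like this on the task side. Again illustrative only, and 
prewarmRegionCache() is the hypothetical client-side API this issue would need 
to add:

{code:java}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.HServerAddress;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.io.SequenceFile;

public class RegionSnapshotLoader {
  /**
   * Read the (HRegionInfo, HServerAddress) pairs shipped in the
   * DistributedCache and hand them to the client, so the task never has to
   * hit META for this table.  prewarmRegionCache() is hypothetical - it is
   * the client-side API this issue would add.
   */
  public static void prewarmFromCache(Configuration conf, HTable table)
      throws IOException {
    Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
    if (localFiles == null || localFiles.length == 0) {
      return;  // nothing shipped; fall back to normal META lookups
    }

    Map<HRegionInfo, HServerAddress> regions =
        new HashMap<HRegionInfo, HServerAddress>();
    SequenceFile.Reader reader = new SequenceFile.Reader(
        FileSystem.getLocal(conf), localFiles[0], conf);
    try {
      HRegionInfo key = new HRegionInfo();
      HServerAddress value = new HServerAddress();
      while (reader.next(key, value)) {
        regions.put(key, value);
        key = new HRegionInfo();       // fresh instances; the map keeps
        value = new HServerAddress();  // the ones just read
      }
    } finally {
      reader.close();
    }
    table.prewarmRegionCache(regions);  // hypothetical API
  }
}
{code}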


> Improvements to prewarm META cache on clients
> ---------------------------------------------
>
>                 Key: HBASE-2468
>                 URL: https://issues.apache.org/jira/browse/HBASE-2468
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: client
>            Reporter: Todd Lipcon
>            Assignee: Mingjie Lai
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2468-trunk.patch
>
>
> A couple different use cases cause storms of reads to META during startup. 
> For example, a large MR job will cause each map task to hit meta since it 
> starts with an empty cache.
> A couple possible improvements have been proposed:
>  - MR jobs could ship a copy of META for the table in the DistributedCache
>  - Clients could prewarm cache by doing a large scan of all the meta for the 
> table instead of random reads for each miss
>  - Each miss could fetch ahead some number of rows in META

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
