[ 
https://issues.apache.org/jira/browse/HBASE-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710074#action_12710074
 ] 

stack commented on HBASE-1008:
------------------------------

J-D, is it true that we read in all the logs before we start splitting?  It 
looks that way after going back to the patch.  If so, I missed that -- my fault 
-- and I think this is a problem.

Theoretically, we can have at most 64 logs under a regionserver, each of which 
has ~64MB of edits.  That's 64 x 64MB = ~4GB of edits that we need to pull into 
memory before we start processing.

Can we not run the writer threads every Nth file read, say, every 5 or 10 even?
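
To make the numbers concrete, here is a rough sketch of what running the writers 
every N logs would mean for buffered memory.  It is not code from the patch: 
the class and method names (BatchedSplitSketch, peakBufferedMb, planBatches) 
are made up for illustration, and it assumes every log is a full ~64MB and that 
the writers drain everything before the next batch is read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in HBase.
class BatchedSplitSketch {
    static final int MAX_LOGS = 64;      // upper bound on logs per regionserver (from the comment)
    static final long LOG_SIZE_MB = 64;  // approximate edits per log (from the comment)

    /** Peak MB held in memory if the writers run after every batchSize logs read. */
    public static long peakBufferedMb(int totalLogs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (long) Math.min(totalLogs, batchSize) * LOG_SIZE_MB;
    }

    /** Groups log indices into batches: read one batch, run the writers, then read the next. */
    public static List<List<Integer>> planBatches(int totalLogs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<Integer>> batches = new ArrayList<>();
        for (int start = 0; start < totalLogs; start += batchSize) {
            int end = Math.min(start + batchSize, totalLogs);
            List<Integer> batch = new ArrayList<>();
            for (int i = start; i < end; i++) {
                batch.add(i);
            }
            batches.add(batch);
        }
        return batches;
    }

    public static void main(String[] args) {
        // Reading everything first: 64 logs * 64MB each.
        System.out.println(peakBufferedMb(MAX_LOGS, MAX_LOGS) + " MB");
        // Running the writers every 10 logs instead.
        System.out.println(peakBufferedMb(MAX_LOGS, 10) + " MB");
        System.out.println(planBatches(MAX_LOGS, 10).size() + " batches");
    }
}
```

Under those assumptions, N = 10 caps the buffered edits at about 640MB instead 
of ~4GB, at the cost of 7 read/write rounds rather than one.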

Thanks.

> [performance] The replay of logs on server crash takes way too long
> -------------------------------------------------------------------
>
>                 Key: HBASE-1008
>                 URL: https://issues.apache.org/jira/browse/HBASE-1008
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
>            Assignee: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.20.0, 0.19.3
>
>         Attachments: 1008-v2.patch, hbase-1008-3.patch, 
> hbase-1008-v4-0.19.patch, hbase-1008-v4.patch
>
>
> Watching recovery from a crash on streamy.com where there were 1048 logs and 
> replay is running at a rate of about 20 seconds each.  Meantime these regions 
> are not online.  This is way too long to wait on recovery for a live site.  
> Marking critical.  Performance related so priority and in 0.20.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
