[ 
https://issues.apache.org/jira/browse/HBASE-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell resolved HBASE-3099.
-----------------------------------

    Resolution: Not a Problem

Probably superseded by distributed log splitting.

> optimization for log splitting (theory/suggestion)
> --------------------------------------------------
>
>                 Key: HBASE-3099
>                 URL: https://issues.apache.org/jira/browse/HBASE-3099
>             Project: HBase
>          Issue Type: Bug
>            Reporter: ryan rawson
>
> Right now log splitting is slower than we'd like.  Its slow pace is one of 
> the reasons why we have to keep a short, bounded limit on the number of 
> outstanding log files.  It would be nice to raise that limit, perhaps to 
> hundreds of logs.  That would increase efficiency because we would not be 
> force-flushing regions at non-ideal sizes.
> But more logs means more data to process during recovery.  However, not 
> everything in a regionserver's logs is actually useful, because some 
> regions were flushed before the oldest log was trimmed.  So if, during log 
> recovery, we read the most recent flushed sequenceid for each region, the 
> master could skip the already-flushed entries during log splitting and 
> avoid writing them to the per-region recovery logs.  That would cut part 
> of the IO, and if our serialization/deserialization code were clever we 
> might be able to avoid deserializing much of the skipped data.
> It's not clear how effective or worthwhile this might be.
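
The filtering idea in the description above can be sketched roughly as follows. This is an illustration only, written in Python rather than HBase's Java, and it is not HBase code: `WalEntry`, `split_log`, and the `last_flushed` map are hypothetical names, and the sketch assumes that each log entry carries a region name and a sequence id, and that the last flushed sequence id of each region is somehow known to the splitter.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class WalEntry(NamedTuple):
    """Hypothetical log entry: not an HBase type."""
    region: str
    sequence_id: int
    payload: bytes


def split_log(
    entries: Iterable[WalEntry],
    last_flushed: Dict[str, int],
) -> Tuple[Dict[str, List[WalEntry]], int]:
    """Group entries by region, dropping those already covered by a flush.

    `last_flushed` maps a region name to the highest sequence id that the
    region is assumed to have persisted. Regions missing from the map are
    treated as never flushed, so all of their entries are kept.
    """
    per_region: Dict[str, List[WalEntry]] = defaultdict(list)
    skipped = 0
    for entry in entries:
        flushed = last_flushed.get(entry.region, -1)
        if entry.sequence_id <= flushed:
            # Already persisted by a flush: no need to write it out again.
            skipped += 1
            continue
        per_region[entry.region].append(entry)
    return dict(per_region), skipped


if __name__ == "__main__":
    log = [
        WalEntry("r1", 1, b"a"),
        WalEntry("r1", 5, b"b"),
        WalEntry("r2", 2, b"c"),
    ]
    kept, skipped = split_log(log, {"r1": 3})
    print(sorted(kept), skipped)
```

Note that this sketch still reads every entry in full. The additional saving suggested in the description, avoiding deserialization of skipped entries, would depend on the on-disk log format and is not shown here.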



--
This message was sent by Atlassian JIRA
(v6.2#6252)
