[ https://issues.apache.org/jira/browse/HBASE-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell resolved HBASE-3099.
-----------------------------------

    Resolution: Not a Problem

Probably superseded by distributed log splitting

> optimization for log splitting (theory/suggestion)
> --------------------------------------------------
>
>                 Key: HBASE-3099
>                 URL: https://issues.apache.org/jira/browse/HBASE-3099
>             Project: HBase
>          Issue Type: Bug
>            Reporter: ryan rawson
>
> Right now log splitting is slower than we'd like. The slow pace of log
> splitting is one of the reasons why we have to keep a short, bounded, limit
> of the outstanding log files. It would be nice to up that limit, to allow
> perhaps hundreds of logs. It would increase efficiency because we would not
> be force-flushing regions at non-ideal sizes.
> But more data means more to process. Except that not all of the logs for a
> regionserver are actually useful. This is because some regions got flushed
> before the oldest log was trimmed. So during log recovery if we read the
> most recent sequenceid, we could skip, during log splitting (in the master),
> those entries and avoid writing them to the per-region log recovery. It
> would reduce the IO by part, and if our serialization/deser code was clever
> we might be able to avoid deserializing much.
> It's not clear how effective or worthwhile this might be.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
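
For readers following the suggestion in the quoted description, here is a minimal sketch of the skip-during-split idea. It is not HBase code. The `LogEntry` class, the `split` method, and the `lastFlushedSeqId` map are hypothetical names made up for illustration, and the sketch assumes that an entry whose sequence id is at or below its region's last flushed sequence id is already persisted and can be dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LogSplitSkipSketch {

    /** Hypothetical, simplified stand-in for one write-ahead-log entry. */
    static final class LogEntry {
        final String region;
        final long sequenceId;
        final String payload;

        LogEntry(String region, long sequenceId, String payload) {
            this.region = region;
            this.sequenceId = sequenceId;
            this.payload = payload;
        }
    }

    /**
     * Groups log entries by region for recovery, skipping entries that are
     * assumed to be already flushed (sequence id <= last flushed id for the
     * region). Regions with no recorded flush keep all of their entries.
     */
    static Map<String, List<LogEntry>> split(List<LogEntry> log,
                                             Map<String, Long> lastFlushedSeqId) {
        Map<String, List<LogEntry>> perRegion = new LinkedHashMap<>();
        for (LogEntry entry : log) {
            long flushed = lastFlushedSeqId.getOrDefault(entry.region, -1L);
            if (entry.sequenceId <= flushed) {
                continue; // already persisted by a flush: no need to rewrite it
            }
            perRegion.computeIfAbsent(entry.region, r -> new ArrayList<>()).add(entry);
        }
        return perRegion;
    }

    public static void main(String[] args) {
        List<LogEntry> log = Arrays.asList(
                new LogEntry("regionA", 1L, "put a1"),
                new LogEntry("regionB", 2L, "put b1"),
                new LogEntry("regionA", 3L, "put a2"));
        // Assumption for the example: regionA flushed everything up to sequence id 1.
        Map<String, Long> flushed = Collections.singletonMap("regionA", 1L);

        split(log, flushed).forEach((region, entries) ->
                System.out.println(region + ": " + entries.size() + " entries to replay"));
    }
}
```

The filter only saves the write side of the split. Avoiding deserialization, as the description also suggests, would need the sequence id to be readable without decoding the whole entry, which this sketch does not attempt.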