[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243610#comment-14243610
 ] 

zhangduo commented on HBASE-10201:
----------------------------------

[~stack] In your scenario, I think we will use #1 to skip edits, not #4.
As I see code in replayRecoveredEditsIfAny
{code}
    long minSeqIdForTheRegion = -1;
    for (Long maxSeqIdInStore : maxSeqIdInStores.values()) {
      if (maxSeqIdInStore < minSeqIdForTheRegion || minSeqIdForTheRegion == -1) 
{
        minSeqIdForTheRegion = maxSeqIdInStore;
      }
    }
{code}
And this
{code}
      maxSeqId = Math.abs(Long.parseLong(fileName));
      if (maxSeqId <= minSeqIdForTheRegion) {
        if (LOG.isDebugEnabled()) {
          String msg = "Maximum sequenceid for this wal is " + maxSeqId
            + " and minimum sequenceid for the region is " + 
minSeqIdForTheRegion
            + ", skipped the whole file, path=" + edits;
          LOG.debug(msg);
        }
        continue;
      }
{code}
And in replayRecoveredEdits, we skip edit cells using per store seqId
{code}
            // Now, figure if we should skip this edit.
            if (key.getLogSeqNum() <= maxSeqIdInStores.get(store.getFamily()
                .getName())) {
              skippedEdits++;
              continue;
            }
{code}

And when splitting log, we use a lastSeqId got from HMaster to skip edits. If 
master crash and loss the information, then we will not skip any edits? I'm not 
sure but I didn't find the code to get lastSeqId from any place other than 
HMaster. [~jeffreyz]

> Port 'Make flush decisions per column family' to trunk
> ------------------------------------------------------
>
>                 Key: HBASE-10201
>                 URL: https://issues.apache.org/jira/browse/HBASE-10201
>             Project: HBase
>          Issue Type: Improvement
>          Components: wal
>            Reporter: Ted Yu
>            Assignee: zhangduo
>             Fix For: 1.0.0, 2.0.0
>
>         Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
> HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
> HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
> HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
> HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, 
> HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_18.patch, 
> HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, 
> HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, 
> HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png, 
> memstore.png
>
>
> Currently the flush decision is made using the aggregate size of all column 
> families. When large and small column families co-exist, this causes many 
> small flushes of the smaller CF. We need to make per-CF flush decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to