[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243610#comment-14243610 ]
zhangduo commented on HBASE-10201:
----------------------------------

[~stack] In your scenario, I think we will use #1 to skip edits, not #4. This is the code I see in replayRecoveredEditsIfAny:
{code}
long minSeqIdForTheRegion = -1;
for (Long maxSeqIdInStore : maxSeqIdInStores.values()) {
  if (maxSeqIdInStore < minSeqIdForTheRegion || minSeqIdForTheRegion == -1) {
    minSeqIdForTheRegion = maxSeqIdInStore;
  }
}
{code}
And this:
{code}
maxSeqId = Math.abs(Long.parseLong(fileName));
if (maxSeqId <= minSeqIdForTheRegion) {
  if (LOG.isDebugEnabled()) {
    String msg = "Maximum sequenceid for this wal is " + maxSeqId
        + " and minimum sequenceid for the region is " + minSeqIdForTheRegion
        + ", skipped the whole file, path=" + edits;
    LOG.debug(msg);
  }
  continue;
}
{code}
And in replayRecoveredEdits, we skip edit cells using the per-store seqId:
{code}
// Now, figure if we should skip this edit.
if (key.getLogSeqNum() <= maxSeqIdInStores.get(store.getFamily()
    .getName())) {
  skippedEdits++;
  continue;
}
{code}
And when splitting the log, we use a lastSeqId obtained from the HMaster to skip edits. If the master crashes and loses that information, then we will not skip any edits? I'm not sure, but I didn't find code that gets lastSeqId from anywhere other than the HMaster.
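To make the two-level check concrete, here is a minimal standalone sketch (not the actual HBase code; the class name, store names, and sequence ids are made up for illustration) of the logic quoted above: a whole recovered-edits file can only be skipped when its max sequence id is at or below the minimum of the per-store max sequence ids, while an individual edit is skipped against its own store's max sequence id:
{code}
import java.util.HashMap;
import java.util.Map;

public class SeqIdSkipSketch {

  // Whole-file check, mirroring replayRecoveredEditsIfAny: compute the
  // minimum flushed seqid across all stores and skip the file only if
  // every edit in it is necessarily older than that.
  static boolean skipWholeFile(long maxSeqIdInFile, Map<String, Long> maxSeqIdInStores) {
    long minSeqIdForTheRegion = -1;
    for (long maxSeqIdInStore : maxSeqIdInStores.values()) {
      if (maxSeqIdInStore < minSeqIdForTheRegion || minSeqIdForTheRegion == -1) {
        minSeqIdForTheRegion = maxSeqIdInStore;
      }
    }
    return maxSeqIdInFile <= minSeqIdForTheRegion;
  }

  // Per-edit check, mirroring replayRecoveredEdits: compare the edit's
  // seqid against the max flushed seqid of the store it belongs to.
  static boolean skipEdit(long editSeqId, String family, Map<String, Long> maxSeqIdInStores) {
    return editSeqId <= maxSeqIdInStores.get(family);
  }

  public static void main(String[] args) {
    Map<String, Long> maxSeqIdInStores = new HashMap<>();
    maxSeqIdInStores.put("small", 100L); // small CF flushed up to seqid 100
    maxSeqIdInStores.put("large", 10L);  // large CF flushed only up to seqid 10

    // A file with max seqid 50 cannot be skipped wholesale: 50 > min(100, 10).
    System.out.println(skipWholeFile(50, maxSeqIdInStores));     // false
    // But an edit at seqid 50 for the already-flushed small CF is skipped...
    System.out.println(skipEdit(50, "small", maxSeqIdInStores)); // true
    // ...while the same seqid for the large CF must still be replayed.
    System.out.println(skipEdit(50, "large", maxSeqIdInStores)); // false
  }
}
{code}
This is why #1 alone answers the scenario: the whole-file skip is conservative (it uses the region-wide minimum), and any edit that survives it is then filtered per store.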
[~jeffreyz]

> Port 'Make flush decisions per column family' to trunk
> ------------------------------------------------------
>
>                 Key: HBASE-10201
>                 URL: https://issues.apache.org/jira/browse/HBASE-10201
>             Project: HBase
>          Issue Type: Improvement
>          Components: wal
>            Reporter: Ted Yu
>            Assignee: zhangduo
>             Fix For: 1.0.0, 2.0.0
>
>         Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch,
> HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch,
> HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch,
> HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch,
> HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch,
> HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_18.patch,
> HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch,
> HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch,
> HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png,
> memstore.png
>
>
> Currently the flush decision is made using the aggregate size of all column
> families. When large and small column families co-exist, this causes many
> small flushes of the smaller CF. We need to make per-CF flush decisions.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)