[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569599#comment-14569599 ]
stack commented on HBASE-13811:
-------------------------------

Confirmed the above thesis. When we start to flush, we remove the current oldest sequence ids and park them in another data structure (see FSHLog#startCacheFlush). We then let appends go on while the flush is happening. Appends, if they find the oldest sequence ids empty, will add the current id as the oldest sequence id (see FSHLog#updateOldestUnflushedSequenceIds, pasted above). The report to the master always reads the oldest-sequence-ids data structure; it does not check whether a flush is in progress. It uses the below from FSHLog:

{code}
public long getEarliestMemstoreSeqNum(byte[] encodedRegionName, byte[] familyName) {
  // Look up the per-family map of oldest unflushed sequence ids for this region.
  ConcurrentMap<byte[], Long> oldestUnflushedStoreSequenceIdsOfRegion =
      this.oldestUnflushedStoreSequenceIds.get(encodedRegionName);
  if (oldestUnflushedStoreSequenceIdsOfRegion != null) {
    Long result = oldestUnflushedStoreSequenceIdsOfRegion.get(familyName);
    return result != null ? result.longValue() : HConstants.NO_SEQNUM;
  } else {
    return HConstants.NO_SEQNUM;
  }
}
{code}

One fix would be to have the report to the master consider ongoing flushes (a sketch follows the quoted issue below), but let me see if I can simplify this at all...

> Splitting WALs, we are filtering out too many edits -> DATALOSS
> ---------------------------------------------------------------
>
>                 Key: HBASE-13811
>                 URL: https://issues.apache.org/jira/browse/HBASE-13811
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>            Reporter: stack
>            Priority: Critical
>
> I've been running ITBLLs against branch-1 around HBASE-13616 (move of ServerShutdownHandler to pv2). I have come across an instance of dataloss. My patch for HBASE-13616 was in place so I can only think it the cause (but cannot see how). When we split the logs, we are skipping legit edits. Digging.
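To make the fix idea above concrete, here is a minimal standalone sketch of what "consider ongoing flushes" could look like: the lookup takes the minimum across the live map and a second map holding the ids parked by startCacheFlush. The field name lowestFlushingStoreSequenceIds, the String keys (instead of byte[]), and the NO_SEQNUM stand-in are illustrative assumptions, not the actual branch-1 code:

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Standalone model of the two FSHLog-side maps described in the comment above.
public class EarliestSeqNumSketch {
  static final long NO_SEQNUM = -1L; // stand-in for HConstants.NO_SEQNUM

  // Oldest unflushed sequence id per region -> per family (the live map).
  final ConcurrentMap<String, ConcurrentMap<String, Long>> oldestUnflushedStoreSequenceIds =
      new ConcurrentHashMap<>();
  // Sequence ids parked by startCacheFlush while a flush is in flight
  // (hypothetical name for the "other data structure").
  final ConcurrentMap<String, ConcurrentMap<String, Long>> lowestFlushingStoreSequenceIds =
      new ConcurrentHashMap<>();

  // The fix idea: report the minimum across both maps so an in-flight flush
  // cannot make a store look newer (or empty) than it really is.
  long getEarliestMemstoreSeqNum(String encodedRegionName, String familyName) {
    long unflushed = lookup(oldestUnflushedStoreSequenceIds, encodedRegionName, familyName);
    long flushing = lookup(lowestFlushingStoreSequenceIds, encodedRegionName, familyName);
    if (unflushed == NO_SEQNUM) return flushing;
    if (flushing == NO_SEQNUM) return unflushed;
    return Math.min(unflushed, flushing);
  }

  private static long lookup(ConcurrentMap<String, ConcurrentMap<String, Long>> map,
      String encodedRegionName, String familyName) {
    ConcurrentMap<String, Long> ofRegion = map.get(encodedRegionName);
    if (ofRegion == null) return NO_SEQNUM;
    Long result = ofRegion.get(familyName);
    return result != null ? result.longValue() : NO_SEQNUM;
  }

  public static void main(String[] args) {
    EarliestSeqNumSketch wal = new EarliestSeqNumSketch();
    // Flush in progress: seq 10 was parked for family "f" of region "r".
    wal.lowestFlushingStoreSequenceIds
        .computeIfAbsent("r", k -> new ConcurrentHashMap<>()).put("f", 10L);
    // Meanwhile an append repopulated the live map with a newer id.
    wal.oldestUnflushedStoreSequenceIds
        .computeIfAbsent("r", k -> new ConcurrentHashMap<>()).put("f", 42L);
    // Prints 10, not 42: the master cannot conclude edits 10..41 are safe.
    System.out.println(wal.getEarliestMemstoreSeqNum("r", "f"));
  }
}
{code}

The idea is that a store whose only pending edits are mid-flush still reports the id it started flushing at, so if the flush never completes, log splitting cannot mistakenly filter those edits as already persisted.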