[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569599#comment-14569599 ]
stack commented on HBASE-13811:
-------------------------------

Confirmed the above thesis. When we start to flush, we remove the current oldest sequence ids and park them in another data structure (see FSHLog#startCacheFlush). We then let appends go on while the flush is happening. Appends, if they find the oldest sequence ids empty, will add the current id as the oldest sequence id (see FSHLog#updateOldestUnflushedSequenceIds, pasted above). The report to the master always reads the oldest-sequence-ids data structure; it does not check whether a flush is in progress. It uses the below from FSHLog:

{code}
public long getEarliestMemstoreSeqNum(byte[] encodedRegionName, byte[] familyName) {
  // Look up the per-family map of oldest unflushed sequence ids for this region.
  ConcurrentMap<byte[], Long> oldestUnflushedStoreSequenceIdsOfRegion =
      this.oldestUnflushedStoreSequenceIds.get(encodedRegionName);
  if (oldestUnflushedStoreSequenceIdsOfRegion != null) {
    Long result = oldestUnflushedStoreSequenceIdsOfRegion.get(familyName);
    return result != null ? result.longValue() : HConstants.NO_SEQNUM;
  } else {
    return HConstants.NO_SEQNUM;
  }
}
{code}

One fix would be to have the report to the master consider ongoing flushes (a sketch follows the quoted issue below), but let me see if I can simplify this at all...

> Splitting WALs, we are filtering out too many edits -> DATALOSS
> ---------------------------------------------------------------
>
>                 Key: HBASE-13811
>                 URL: https://issues.apache.org/jira/browse/HBASE-13811
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>            Reporter: stack
>            Priority: Critical
>
> I've been running ITBLLs against branch-1 around HBASE-13616 (move of ServerShutdownHandler to pv2). I have come across an instance of dataloss. My patch for HBASE-13616 was in place so I can only think it the cause (but cannot see how). When we split the logs, we are skipping legit edits. Digging.
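To make the fix idea above concrete, here is a minimal standalone sketch of what "consider ongoing flushes" could look like: the lookup takes the minimum across the live map and a second map holding the ids parked by startCacheFlush. The field name lowestFlushingStoreSequenceIds, the String keys (instead of byte[]), and the NO_SEQNUM stand-in are illustrative assumptions, not the actual branch-1 code:

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Standalone model of the two FSHLog-side maps described in the comment above.
public class EarliestSeqNumSketch {
  static final long NO_SEQNUM = -1L; // stand-in for HConstants.NO_SEQNUM

  // Oldest unflushed sequence id per region -> per family (the live map).
  final ConcurrentMap<String, ConcurrentMap<String, Long>> oldestUnflushedStoreSequenceIds =
      new ConcurrentHashMap<>();
  // Sequence ids parked by startCacheFlush while a flush is in flight
  // (hypothetical name for the "other data structure").
  final ConcurrentMap<String, ConcurrentMap<String, Long>> lowestFlushingStoreSequenceIds =
      new ConcurrentHashMap<>();

  // The fix idea: report the minimum across both maps so an in-flight flush
  // cannot make a store look newer (or empty) than it really is.
  long getEarliestMemstoreSeqNum(String encodedRegionName, String familyName) {
    long unflushed = lookup(oldestUnflushedStoreSequenceIds, encodedRegionName, familyName);
    long flushing = lookup(lowestFlushingStoreSequenceIds, encodedRegionName, familyName);
    if (unflushed == NO_SEQNUM) return flushing;
    if (flushing == NO_SEQNUM) return unflushed;
    return Math.min(unflushed, flushing);
  }

  private static long lookup(ConcurrentMap<String, ConcurrentMap<String, Long>> map,
      String encodedRegionName, String familyName) {
    ConcurrentMap<String, Long> ofRegion = map.get(encodedRegionName);
    if (ofRegion == null) return NO_SEQNUM;
    Long result = ofRegion.get(familyName);
    return result != null ? result.longValue() : NO_SEQNUM;
  }

  public static void main(String[] args) {
    EarliestSeqNumSketch wal = new EarliestSeqNumSketch();
    // Flush in progress: seq 10 was parked for family "f" of region "r".
    wal.lowestFlushingStoreSequenceIds
        .computeIfAbsent("r", k -> new ConcurrentHashMap<>()).put("f", 10L);
    // Meanwhile an append repopulated the live map with a newer id.
    wal.oldestUnflushedStoreSequenceIds
        .computeIfAbsent("r", k -> new ConcurrentHashMap<>()).put("f", 42L);
    // Prints 10, not 42: the master cannot conclude edits 10..41 are safe.
    System.out.println(wal.getEarliestMemstoreSeqNum("r", "f"));
  }
}
{code}

The idea is that a store whose only pending edits are mid-flush still reports the id it started flushing at, so if the flush never completes, log splitting cannot mistakenly filter those edits as already persisted.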