[ https://issues.apache.org/jira/browse/HBASE-26384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Kyle Purtell reopened HBASE-26384: ----------------------------------------- Recent branch-2.4 is showing dataloss in an integration test scenario with ADAPTIVE compacting memstore active. > Segment already flushed to hfile may still be remained in CompactingMemStore > ----------------------------------------------------------------------------- > > Key: HBASE-26384 > URL: https://issues.apache.org/jira/browse/HBASE-26384 > Project: HBase > Issue Type: Bug > Components: in-memory-compaction > Affects Versions: 3.0.0-alpha-1, 2.4.8 > Reporter: chenglei > Assignee: chenglei > Priority: Major > Labels: branch-2 > Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.9 > > > When {{CompactingMemStore}} prepares flushing, > {{CompactingMemStore.snapshot}} invokes following > {{CompactingMemStore.pushPipelineToSnapshot}} method to get {{Snapshot}}, > following line 570 and line 575 uses {{CompactionPipeline#version}} to track > whether the Segments in {{CompactionPipeline#pipeline}} has changed since it > gets {{VersionedSegmentsList}} in line 570 before emptying > {{CompactionPipeline#pipeline}} in line 575. > {code:java} > 565 private void pushPipelineToSnapshot() { > 566 int iterationsCnt = 0; > 567 boolean done = false; > 568 while (!done) { > 569 iterationsCnt++; > 570 VersionedSegmentsList segments = > pipeline.getVersionedList(); > 571 pushToSnapshot(segments.getStoreSegments()); > 572 // swap can return false in case the pipeline was updated > by ongoing compaction > 573 // and the version increase, the chance of it happenning is > very low > 574 // In Swap: don't close segments (they are in snapshot now) > and don't update the region size > 575 done = pipeline.swap(segments, null, false, false); > ....... > } > {code} > However, when {{CompactingMemStore#inMemoryCompaction}} executes > {{CompactionPipeline#flattenOneSegment}}, it does not change > {{CompactionPipeline#version}} , if there is an {{in memeory compaction}} > which executes {{CompactingMemStore#flattenOneSegment}} between above line > 570 and line 575, the {{CompactionPipeline#version}} not change, but the > {{Segment}} in {{CompactionPipeline}} has changed. Because > {{CompactionPipeline#version}} not change, {{pipeline.swap}} in above line > 575 could think it is safe to invoke following > {{CompactionPipeline#swapSuffix}} method to remove {{Segment}} in > {{CompactionPipeline}} , but the {{Segment}} in {{CompactionPipeline}} has > changed because of {{CompactingMemStore#flattenOneSegment}} , so the > {{Segment}} not removed in following line 295 and still remaining in > {{CompactionPipeline}}. > {code:java} > 293 private void swapSuffix(List<? extends Segment> suffix, > ImmutableSegment segment, > 294 boolean closeSegmentsInSuffix) { > 295 pipeline.removeAll(suffix); > 296 if(segment != null) pipeline.addLast(segment); > .... > {code} > However {{CompactingMemStore.snapshot}} think it is successful and continues > to flush the {{Segment}} got by {{CompactingMemStore.snapshot}} as normal, > but the {{Segment}} with the same cells still be left in > {{CompactingMemStore}}. Leaving {{Segment}} which already flushed in > {{MemStore}} is dangerous: if a Major Compaction before the left {{Segment}} > flushing, there may be data erroneous. > My Fix in the PR is as following: > # Increasing the {{CompactionPipeline#version}} in > {{CompactingMemStore#flattenOneSegment}} . > Branch-2 has this problem but master not, because the branch-2 patch for > HBASE-18375 omitting this. > # For {{CompactionPipeline#swapSuffix}} , explicitly checking that the > {{Segment}} in {{suffix}} input parameter is same as the {{Segment}} in > {{pipeline}} one by one from > the last element to the first element of {{suffix}} , I think explicitly > throwing Exception is better than hiding error and causing subtle problem. > I made separate PRs for master and branch-2 so the code for master and > brach-2 could consistent and master could also has UTs for this problem. > [PR#3777|https://github.com/apache/hbase/pull/3777] is for master and > [PR#3779|https://github.com/apache/hbase/pull/3779] is for branch-2.The > difference between them is patch for brach-2 including following code in > {{CompactionPipeline.replaceAtIndex}} which not included in branch-2 patch > for HBASE-18375: > {code:java} > // the version increment is indeed needed, because the swap uses > removeAll() method of the > // linked-list that compares the objects to find what to remove. > // The flattening changes the segment object completely (creation > pattern) and so > // swap will not proceed correctly after concurrent flattening. > version++; > {code} > -- This message was sent by Atlassian Jira (v8.20.1#820001)