[ https://issues.apache.org/jira/browse/HBASE-20789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529732#comment-16529732 ]
Zheng Hu commented on HBASE-20789:
----------------------------------

As [~Apache9] commented on RB, there is a problem in patch.v3:
{code}
if (replaceExistingCacheBlock) {
  ramCache.put(cacheKey, re);
} else if (ramCache.putIfAbsent(cacheKey, re) != null) {
  return;
}
{code}
We cannot simply replace the entry for cacheKey with a new RAMQueueEntry, because the heapSize of the bucket cache needs to be updated whenever an entry is removed from ramCache. The WriterThread first writes to the io-engine, then syncs, then removes the RAMQueueEntry from ramCache, so the entry it removes may not be the one it wrote:
{code}
t1. thread0 tries to cache block0 with key0 (BucketCache#cacheBlock);
t2. block0 is put into ramCache;
t3. the writer thread writes block0 to the io-engine;
t4. another thread, thread1, tries to cache block1 with the same key0 (BucketCache#cacheBlock);
t5. block1 replaces block0 in ramCache;
t6. the entry with key0, which is now block1, is removed from ramCache;
{code}
In the end, the removal done for thread0's block0 removes the wrong entry, block1, and the heap size is wrong as well.

So, to be safe, we keep putIfAbsent() to ensure that only one thread removes the entry from ramCache. The flaky UT has been fixed instead by waiting until the cache has been flushed to the io-engine.

> TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky
> ---------------------------------------------------------------
>
>                 Key: HBASE-20789
>                 URL: https://issues.apache.org/jira/browse/HBASE-20789
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>             Fix For: 3.0.0, 2.1.0, 1.5.0, 1.4.6, 2.0.2
>
>         Attachments: 0001-HBASE-20789-TestBucketCache-testCacheBlockNextBlockM.patch, HBASE-20789.v1.patch, HBASE-20789.v2.patch, HBASE-20789.v3.patch, bucket-33718.out
>
>
> The UT failed frequently in our internal branch-2. Will dig into the UT.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
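
For illustration only, the interleaving described in the comment above can be replayed sequentially in a small Java sketch. This is not the HBase code: the class RamCacheRaceSketch, its Entry type, the block sizes, and the way heapSize is adjusted on insert and removal are all hypothetical stand-ins, assumed only to show why an unconditional put() lets the post-flush removal hit the wrong entry while putIfAbsent() does not.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the ramCache bookkeeping; not the actual BucketCache. */
final class RamCacheRaceSketch {

  /** Minimal stand-in for a RAMQueueEntry: a name plus the heap bytes it accounts for. */
  static final class Entry {
    final String name;
    final long size;
    Entry(String name, long size) { this.name = name; this.size = size; }
  }

  final ConcurrentMap<String, Entry> ramCache = new ConcurrentHashMap<>();
  final AtomicLong heapSize = new AtomicLong(); // assumed accounting, for illustration

  /** Same shape as the patch.v3 snippet: unconditional put() versus putIfAbsent(). */
  boolean cacheBlock(String key, Entry e, boolean replaceExisting) {
    if (replaceExisting) {
      ramCache.put(key, e);
    } else if (ramCache.putIfAbsent(key, e) != null) {
      return false; // another entry is already queued under this key
    }
    heapSize.addAndGet(e.size);
    return true;
  }

  /** What the writer does after write and sync: drop whatever is stored under the key. */
  Entry removeAfterFlush(String key) {
    Entry removed = ramCache.remove(key);
    if (removed != null) {
      heapSize.addAndGet(-removed.size);
    }
    return removed;
  }

  /** Replays t1..t6 in order; returns the name of the removed entry, heap size in heapOut[0]. */
  static String replay(boolean replaceExisting, long[] heapOut) {
    RamCacheRaceSketch cache = new RamCacheRaceSketch();
    Entry block0 = new Entry("block0", 100);
    Entry block1 = new Entry("block1", 250);
    cache.cacheBlock("key0", block0, replaceExisting);   // t1, t2
    // t3: the writer thread writes block0 to the io-engine (not modelled here)
    cache.cacheBlock("key0", block1, replaceExisting);   // t4, t5
    Entry removed = cache.removeAfterFlush("key0");      // t6
    heapOut[0] = cache.heapSize.get();
    return removed.name;
  }

  public static void main(String[] args) {
    long[] heap = new long[1];
    String removed = replay(true, heap);
    System.out.println("put:         removed " + removed + ", heapSize=" + heap[0]);
    removed = replay(false, heap);
    System.out.println("putIfAbsent: removed " + removed + ", heapSize=" + heap[0]);
  }
}
```

In this sketch the put() variant removes block1 and leaves a non-zero heapSize behind an empty map, while the putIfAbsent() variant rejects block1, removes block0, and returns heapSize to zero.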