[ 
https://issues.apache.org/jira/browse/HBASE-23589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lijin Bin resolved HBASE-23589.
-------------------------------
    Fix Version/s: 2.1.9
                   2.2.3
                   2.3.0
                   3.0.0
       Resolution: Fixed

> FlushDescriptor contains non-matching family/output combinations
> ----------------------------------------------------------------
>
>                 Key: HBASE-23589
>                 URL: https://issues.apache.org/jira/browse/HBASE-23589
>             Project: HBase
>          Issue Type: Bug
>          Components: read replicas
>    Affects Versions: 2.2.2
>            Reporter: Szabolcs Bukros
>            Assignee: Szabolcs Bukros
>            Priority: Major
>             Fix For: 3.0.0, 2.3.0, 2.2.3, 2.1.9
>
>
> Flushing the active region creates the following files:
> {code:java}
> 2019-12-13 08:00:20,866 INFO org.apache.hadoop.hbase.regionserver.HStore: 
> Added 
> hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/dab4d1cc01e44773bad7bdb5d2e33b6c,
>  entries=49128, sequenceid
> =70688, filesize=41.4 M
> 2019-12-13 08:00:20,897 INFO org.apache.hadoop.hbase.regionserver.HStore: 
> Added 
> hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f3/ecc50f33085042f7bd2397253b896a3a,
>  entries=50000, sequenceid
> =70688, filesize=42.3 M
> {code}
> On the read replica region when we try to replay the flush we see the 
> following:
> {code:java}
> 2019-12-13 08:00:21,279 WARN org.apache.hadoop.hbase.regionserver.HRegion: 
> bfa9cdb0ab13d60b389df6621ab316d1 : At least one of the store files in flush: 
> action: COMMIT_FLUSH table_name: "IntegrationTestRegionReplicaReplication" 
> encoded_region_name: "20af2eb8929408f26d0b3b81e6b86d47" 
> flush_sequence_number: 70688 store_flushes { family_name: "f2" 
> store_home_dir: "f2" flush_output: "ecc50f33085042f7bd2397253b896a3a" } 
> store_flushes { family_name: "f3" store_home_dir: "f3" flush_output: 
> "dab4d1cc01e44773bad7bdb5d2e33b6c" } region_name: 
> "IntegrationTestRegionReplicaReplication,cccccccc,1576252065847.20af2eb8929408f26d0b3b81e6b86d47."
>  doesn't exist any more. Skip loading the file(s)
> java.io.FileNotFoundException: HFileLink 
> locations=[hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a,
>  
> hdfs://replica-1:8020/hbase/.tmp/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a,
>  
> hdfs://replica-1:8020/hbase/mobdir/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a,
>  
> hdfs://replica-1:8020/hbase/archive/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a]
>         at 
> org.apache.hadoop.hbase.io.FileLink.getFileStatus(FileLink.java:415)
>         at 
> org.apache.hadoop.hbase.util.ServerRegionReplicaUtil.getStoreFileInfo(ServerRegionReplicaUtil.java:135)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegionFileSystem.getStoreFileInfo(HRegionFileSystem.java:311)
>         at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.replayFlush(HStore.java:2414)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.replayFlushInStores(HRegion.java:5310)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushCommitMarker(HRegion.java:5184)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushMarker(HRegion.java:5018)
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doReplayBatchOp(RSRpcServices.java:1143)
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.replay(RSRpcServices.java:2229)
>         at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:29754)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
> {code}
> As we can see the flush_outputs got mixed up. 
>  
> The issue is caused by HRegion.internalFlushCacheAndCommit. The code assumes 
> "{color:#808080}stores.values() and storeFlushCtxs have same order{color}" 
> which no longer seems to be true.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to