[ 
https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879421#comment-15879421
 ] 

stack commented on HBASE-17069:
-------------------------------

bq. From what i understood we enter the below block only when the walEdit is 
not empty and we have some cells in the wal edit from the processed mutations.

You are right. I was wrong (misreading). It was probably set to 'false' because 
we haven't added the append to memstore yet... that happens next... which seems 
reasonable.

bq.  In SequenceIdAccounting.update() we pass false for the multirequest (lets 
say sequence id here was 1) so lowestunflusedsequenceid is not updated.

The edit is in WAL, perhaps unsynced, but not in memstore (yet); a limbo. 
Another limbo is the gap while we wait for an edit to be stamped with a 
sequenceid (HBASE-17471 and friends help here)

bq. Now for the second put that goes through doMiniBatchMutation we pass true 
correctly during append(Seq id 2). lowestUnflushedSequenceIds is set to 2 for 
the metafamily. 

Issue is we have two ways of writing (smile) and they don't agree which is 
especially fun in times of high concurrency where behind the scenes we are 
trying to keep an incrementing counter that aligns with order in which edits 
arrive.

Setting true as the patch does seems like a low-risk way of keeping stuff 
aligned in branch-1.2/1.3. We should do a follow-on for branch-1 to correct the 
fuzzyness around inmemory flag and ensure that the rare edit that is not meant 
for memstore doesn't cause an issue similar to here because it has us skip out 
on sequenceid accounting.

Thanks [~abhishek.chouhan]

> RegionServer writes invalid META entries for split daughters in some 
> circumstances
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-17069
>                 URL: https://issues.apache.org/jira/browse/HBASE-17069
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.2.4
>            Reporter: Andrew Purtell
>            Assignee: Abhishek Singh Chouhan
>            Priority: Blocker
>             Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5
>
>         Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, 
> daughter_2_08629d59564726da2497f70451aafcdb.log, 
> HBASE-17069.branch-1.3.001.patch, HBASE-17069.branch-1.3.002.patch, 
> HBASE-17069.master.001.patch, logs.tar.gz, 
> parent-393d2bfd8b1c52ce08540306659624f2.log
>
>
> I have been seeing frequent ITBLL failures testing various versions of 1.2.x. 
> Over the lifetime of 1.2.x the following issues have been fixed:
> - HBASE-15315 (Remove always set super user call as high priority)
> - HBASE-16093 (Fix splits failed before creating daughter regions leave meta 
> inconsistent)
> And this one is pending:
> - HBASE-17044 (Fix merge failed before creating merged region leaves meta 
> inconsistent)
> I can apply all of the above to branch-1.2 and still see this failure: 
> *The life of stillborn region d55ef81c2f8299abbddfce0445067830*
> *Master sees SPLITTING_NEW*
> {noformat}
> 2016-11-08 04:23:21,186 INFO  [AM.ZK.Worker-pool2-t82] master.RegionStates: 
> Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, 
> ts=1478579001186, server=node-3.cluster,16020,1478578389506}
> {noformat}
> *The RegionServer creates it*
> {noformat}
> 2016-11-08 04:23:26,035 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, 
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320, 
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,038 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for big: blockCache=LruBlockCache{blockCount=34, 
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320, 
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,442 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, 
> currentSize=17187656, freeSize=12821524664, maxSize=12838712320, 
> heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,713 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,715 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,717 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> {noformat}
> *The RegionServer onlines it*
> {noformat}
> 2016-11-08 04:23:27,015 INFO  
> [node-3.cluster,16020,1478578389506-daughterOpener=d55ef81c2f8299abbddfce0445067830]
>  regionserver.HRegion: Onlined d55ef81c2f8299abbddfce0445067830; next 
> sequenceid=19184
> 2016-11-08 04:23:27,029 INFO  
> [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099] 
> regionserver.HRegionServer: Post open deploy tasks for 
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830.
> 2016-11-08 04:23:27,047 INFO  
> [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099] 
> hbase.MetaTableAccessor: Updated row 
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830. 
> with server=node-3.cluster,16020,1478578389506
> {noformat}
> *The Master transitions state from SPLITTING_NEW to OPEN*
> {noformat}
> 2016-11-08 04:23:27,058 INFO  [AM.ZK.Worker-pool2-t84] master.RegionStates: 
> Transition {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, 
> ts=1478579007057, server=node-3.cluster,16020,1478578389506} to 
> {d55ef81c2f8299abbddfce0445067830 state=OPEN, ts=1478579007058, 
> server=node-3.cluster,16020,1478578389506}
> 2016-11-08 04:23:27,059 INFO  [AM.ZK.Worker-pool2-t84] 
> master.AssignmentManager: Handled SPLIT event; 
> parent=IntegrationTestBigLinkedList,,1478577020916.393d2bfd8b1c52ce08540306659624f2.,
>  daughter 
> a=IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830.,
>  daughter 
> b=IntegrationTestBigLinkedList,/\xFB\x14,1478579001155.08629d59564726da2497f70451aafcdb.,
>  on node-3.cluster,16020,1478578389506
> {noformat}
> *RegionServer updates META  - BUT APPARENTLY NOT CORRECTLY*
> {noformat}
> 2016-11-08 04:23:27,165 INFO  
> [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099] 
> regionserver.SplitRequest: Region split, hbase:meta updated, and report to 
> master. 
> Parent=IntegrationTestBigLinkedList,,1478577020916.393d2bfd8b1c52ce08540306659624f2.,
>  new regions: 
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830.,
>  
> IntegrationTestBigLinkedList,/\xFB\x14,1478579001155.08629d59564726da2497f70451aafcdb..
>  Split took 6sec
> {noformat}
> *RegionServer delays flush*
> (Is this important?)
> {noformat}
> 2016-11-08 04:24:14,639 WARN  [MemStoreFlusher.0] 
> regionserver.MemStoreFlusher: Region 
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830. 
> has too many store files; delaying flush up to 90000ms
> {noformat}
> *Immediate warnings about No serialized HRegionInfo*
> {noformat}
> 2016-11-08 04:24:44,691 WARN  
> [B.defaultRpcServer.handler=26,queue=2,port=16000] hbase.MetaTableAccessor: 
> No serialized HRegionInfo in 
> keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478579007029/Put/vlen=8/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478579007029/Put/vlen=20/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478579007029/Put/vlen=8/seqid=0}
> {noformat}
> *Master is not happy either*
> {noformat}
> 2016-11-08 04:24:51,148 WARN  [MASTER_TABLE_OPERATIONS-node-1:16000-0] 
> hbase.MetaTableAccessor: No serialized HRegionInfo in 
> keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478579007029/Put/vlen=8/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478579007029/Put/vlen=20/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478579007029/Put/vlen=8/seqid=0}
> {noformat}
> *TestRunner MetaScanner complains about invalid entries in META missing 
> HRegionInfo*
> {noformat}
> (standard input):9086:2016-11-08 05:04:17,230 WARN  
> [B.defaultRpcServer.handler=4,queue=1,port=16000] hbase.MetaTableAccessor: No 
> serialized HRegionInfo in 
> keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
> {noformat}
> *ITBLL MapReduce tasks fail because part of the keyspace cannot be located:*
> {noformat}
> java.io.IOException: HRegionInfo was null in IntegrationTestBigLinkedList, 
> row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
>         at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1293)
>         at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1185)
>         at 
> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:410)
>         at 
> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:359)
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:238)
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:154)
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:121)
>         at 
> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.persist(IntegrationTestBigLinkedList.java:486)
>         at 
> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:431)
>         at 
> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:375)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1719)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000008/syslog:920:java.io.IOException:
>  HRegionInfo was null in IntegrationTestBigLinkedList, 
> row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0}
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000010/syslog:920:java.io.IOException:
>  HRegionInfo was null in IntegrationTestBigLinkedList, 
> row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0}
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000011/syslog:909:java.io.IOException:
>  HRegionInfo was null in IntegrationTestBigLinkedList, 
> row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0}
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000030/syslog:48:java.io.IOException:
>  HRegionInfo was null in IntegrationTestBigLinkedList, 
> row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000048/syslog:48:java.io.IOException:
>  HRegionInfo was null in IntegrationTestBigLinkedList, 
> row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to