[ 
https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879085#comment-15879085
 ] 

stack commented on HBASE-17069:
-------------------------------

Thanks [~abhishek.chouhan] I am enjoying reading your debug journeys. A 
miscounting of sequenceid

bq. The mutations are not empty, however we still were going the route of 
appending with inMemstore as false.

Am I reading in the wrong place, because seems like mutations have to be zero 
to go the 'false' route.

Its called PONR because along w/ the writes to hbase:meta, we send a prayer 
hoping we get to the next cleanup step (The new AM should help here).

bq. Now during the HRegionServer.postOpenDeployTasks we use 
MetaTableAccessor.updateRegionLocation() which doesn't have the hregioninfo

I wonder why the hregioninfo previously written doesn't 'shine through'... is 
the update of location writing null into the hregioninfo cell or is the read 
not allowing old versions of the hregioninfo cell in hbase:meta?

bq. I think we should handle mergingnew the same way as splittingnew. 

Yes. The states are myriad. There will be holes and takes someone of your 
diligence to identify them.



> RegionServer writes invalid META entries for split daughters in some 
> circumstances
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-17069
>                 URL: https://issues.apache.org/jira/browse/HBASE-17069
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.2.4
>            Reporter: Andrew Purtell
>            Assignee: Abhishek Singh Chouhan
>            Priority: Blocker
>             Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5
>
>         Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, 
> daughter_2_08629d59564726da2497f70451aafcdb.log, 
> HBASE-17069.branch-1.3.001.patch, HBASE-17069.branch-1.3.002.patch, 
> HBASE-17069.master.001.patch, logs.tar.gz, 
> parent-393d2bfd8b1c52ce08540306659624f2.log
>
>
> I have been seeing frequent ITBLL failures testing various versions of 1.2.x. 
> Over the lifetime of 1.2.x the following issues have been fixed:
> - HBASE-15315 (Remove always set super user call as high priority)
> - HBASE-16093 (Fix splits failed before creating daughter regions leave meta 
> inconsistent)
> And this one is pending:
> - HBASE-17044 (Fix merge failed before creating merged region leaves meta 
> inconsistent)
> I can apply all of the above to branch-1.2 and still see this failure: 
> *The life of stillborn region d55ef81c2f8299abbddfce0445067830*
> *Master sees SPLITTING_NEW*
> {noformat}
> 2016-11-08 04:23:21,186 INFO  [AM.ZK.Worker-pool2-t82] master.RegionStates: 
> Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, 
> ts=1478579001186, server=node-3.cluster,16020,1478578389506}
> {noformat}
> *The RegionServer creates it*
> {noformat}
> 2016-11-08 04:23:26,035 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, 
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320, 
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,038 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for big: blockCache=LruBlockCache{blockCount=34, 
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320, 
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,442 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, 
> currentSize=17187656, freeSize=12821524664, maxSize=12838712320, 
> heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,713 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,715 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,717 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> {noformat}
> *The RegionServer onlines it*
> {noformat}
> 2016-11-08 04:23:27,015 INFO  
> [node-3.cluster,16020,1478578389506-daughterOpener=d55ef81c2f8299abbddfce0445067830]
>  regionserver.HRegion: Onlined d55ef81c2f8299abbddfce0445067830; next 
> sequenceid=19184
> 2016-11-08 04:23:27,029 INFO  
> [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099] 
> regionserver.HRegionServer: Post open deploy tasks for 
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830.
> 2016-11-08 04:23:27,047 INFO  
> [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099] 
> hbase.MetaTableAccessor: Updated row 
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830. 
> with server=node-3.cluster,16020,1478578389506
> {noformat}
> *The Master transitions state from SPLITTING_NEW to OPEN*
> {noformat}
> 2016-11-08 04:23:27,058 INFO  [AM.ZK.Worker-pool2-t84] master.RegionStates: 
> Transition {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, 
> ts=1478579007057, server=node-3.cluster,16020,1478578389506} to 
> {d55ef81c2f8299abbddfce0445067830 state=OPEN, ts=1478579007058, 
> server=node-3.cluster,16020,1478578389506}
> 2016-11-08 04:23:27,059 INFO  [AM.ZK.Worker-pool2-t84] 
> master.AssignmentManager: Handled SPLIT event; 
> parent=IntegrationTestBigLinkedList,,1478577020916.393d2bfd8b1c52ce08540306659624f2.,
>  daughter 
> a=IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830.,
>  daughter 
> b=IntegrationTestBigLinkedList,/\xFB\x14,1478579001155.08629d59564726da2497f70451aafcdb.,
>  on node-3.cluster,16020,1478578389506
> {noformat}
> *RegionServer updates META  - BUT APPARENTLY NOT CORRECTLY*
> {noformat}
> 2016-11-08 04:23:27,165 INFO  
> [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099] 
> regionserver.SplitRequest: Region split, hbase:meta updated, and report to 
> master. 
> Parent=IntegrationTestBigLinkedList,,1478577020916.393d2bfd8b1c52ce08540306659624f2.,
>  new regions: 
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830.,
>  
> IntegrationTestBigLinkedList,/\xFB\x14,1478579001155.08629d59564726da2497f70451aafcdb..
>  Split took 6sec
> {noformat}
> *RegionServer delays flush*
> (Is this important?)
> {noformat}
> 2016-11-08 04:24:14,639 WARN  [MemStoreFlusher.0] 
> regionserver.MemStoreFlusher: Region 
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830. 
> has too many store files; delaying flush up to 90000ms
> {noformat}
> *Immediate warnings about No serialized HRegionInfo*
> {noformat}
> 2016-11-08 04:24:44,691 WARN  
> [B.defaultRpcServer.handler=26,queue=2,port=16000] hbase.MetaTableAccessor: 
> No serialized HRegionInfo in 
> keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478579007029/Put/vlen=8/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478579007029/Put/vlen=20/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478579007029/Put/vlen=8/seqid=0}
> {noformat}
> *Master is not happy either*
> {noformat}
> 2016-11-08 04:24:51,148 WARN  [MASTER_TABLE_OPERATIONS-node-1:16000-0] 
> hbase.MetaTableAccessor: No serialized HRegionInfo in 
> keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478579007029/Put/vlen=8/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478579007029/Put/vlen=20/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478579007029/Put/vlen=8/seqid=0}
> {noformat}
> *TestRunner MetaScanner complains about invalid entries in META missing 
> HRegionInfo*
> {noformat}
> (standard input):9086:2016-11-08 05:04:17,230 WARN  
> [B.defaultRpcServer.handler=4,queue=1,port=16000] hbase.MetaTableAccessor: No 
> serialized HRegionInfo in 
> keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
> {noformat}
> *ITBLL MapReduce tasks fail because part of the keyspace cannot be located:*
> {noformat}
> java.io.IOException: HRegionInfo was null in IntegrationTestBigLinkedList, 
> row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
>         at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1293)
>         at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1185)
>         at 
> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:410)
>         at 
> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:359)
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:238)
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:154)
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:121)
>         at 
> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.persist(IntegrationTestBigLinkedList.java:486)
>         at 
> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:431)
>         at 
> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:375)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1719)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000008/syslog:920:java.io.IOException:
>  HRegionInfo was null in IntegrationTestBigLinkedList, 
> row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0}
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000010/syslog:920:java.io.IOException:
>  HRegionInfo was null in IntegrationTestBigLinkedList, 
> row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0}
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000011/syslog:909:java.io.IOException:
>  HRegionInfo was null in IntegrationTestBigLinkedList, 
> row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0}
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000030/syslog:48:java.io.IOException:
>  HRegionInfo was null in IntegrationTestBigLinkedList, 
> row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000048/syslog:48:java.io.IOException:
>  HRegionInfo was null in IntegrationTestBigLinkedList, 
> row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0,
>  
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to