[
https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877042#comment-15877042
]
Andrew Purtell commented on HBASE-17069:
----------------------------------------
This has not been completely resolved on branch-1.
After running the generator for a short while with the slowDeterministic policy
in effect, we see one region has failed to open:
{noformat}
Number of regions in transition: 1
IntegrationTestBigLinkedList,,1487719286872.c0bed858d8298bcbc738104b0d5b4ca9.
state=FAILED_OPEN, ts=Tue Feb 21 23:43:38 UTC 2017 (97s ago), server=null
{noformat}
and it will never open:
{noformat}
ERROR: Empty REGIONINFO_QUALIFIER found in hbase:meta
ERROR: Empty REGIONINFO_QUALIFIER found in hbase:meta
{noformat}
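For reading the excerpts in this comment, it may help to take a region name apart. The sketch below assumes the name has the form <table>,<start key>,<region id>.<encoded name>. and that the region id is the creation time in epoch milliseconds. Both are assumptions about the usual naming scheme, and the helper names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_region_name(name):
    """Split '<table>,<start key>,<region id>.<encoded name>.' into its parts."""
    head, encoded = name.rstrip(".").rsplit(".", 1)
    prefix, region_id = head.rsplit(",", 1)
    table, start_key = prefix.split(",", 1)
    return table, start_key, int(region_id), encoded


def region_id_as_utc(region_id_ms):
    """Interpret a region id as epoch milliseconds (an assumption, not a guarantee)."""
    seconds, millis = divmod(region_id_ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


# Region name taken from the FAILED_OPEN line above.
name = "IntegrationTestBigLinkedList,,1487719286872.c0bed858d8298bcbc738104b0d5b4ca9."
table, start_key, region_id, encoded = parse_region_name(name)
print(table, repr(start_key), encoded)
print(region_id_as_utc(region_id).isoformat())
```

Under that assumption the region id decodes to 2017-02-21 23:21:26 UTC, roughly 22 minutes before the FAILED_OPEN timestamp shown above.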
{noformat}
2017-02-21 23:51:02,037 WARN [ChaosMonkeyThread] hbase.MetaTableAccessor: No
serialized HRegionInfo in keyvalues=
{IntegrationTestBigLinkedList,,1487715081599.770a773dcce9ff052a1abf6e0bf5a3fe./info:seqnumDuringOpen/1487720322817/Put/vlen=8/seqid=0,
IntegrationTestBigLinkedList,,1487715081599.770a773dcce9ff052a1abf6e0bf5a3fe./info:server/1487720322817/Put/vlen=20/seqid=0,
IntegrationTestBigLinkedList,,1487715081599.770a773dcce9ff052a1abf6e0bf5a3fe./info:serverstartcode/1487720322817/Put/vlen=8/seqid=0}
2017-02-21 23:51:02,037 WARN [ChaosMonkeyThread] hbase.MetaTableAccessor: No
serialized HRegionInfo in keyvalues=
{IntegrationTestBigLinkedList,\x19\x99\x99\x99\x99\x99\x99\x99,1487717537972.5d4eb9af560399b58cedb03fb5dd2d8c./info:seqnumDuringOpen/1487720322846/Put/vlen=8/seqid=0,
IntegrationTestBigLinkedList,\x19\x99\x99\x99\x99\x99\x99\x99,1487717537972.5d4eb9af560399b58cedb03fb5dd2d8c./info:server/1487720322846/Put/vlen=20/seqid=0,
IntegrationTestBigLinkedList,\x19\x99\x99\x99\x99\x99\x99\x99,1487717537972.5d4eb9af560399b58cedb03fb5dd2d8c./info:serverstartcode/1487720322846/Put/vlen=8/seqid=0}
{noformat}
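Each warning above dumps the cells found for one meta row. A quick way to see what is missing is to list the info: qualifiers per row. The following is only a rough sketch that works on the log text as pasted here, not on hbase:meta itself; the function names are hypothetical, and it assumes the serialized HRegionInfo would show up under the qualifier regioninfo.

```python
import re
from collections import defaultdict

# One cell of the keyvalues dump: <row>/info:<qualifier>/<ts>/<type>/vlen=N/seqid=N
CELL = re.compile(
    r"(?P<row>[^\s{}]+?)/info:(?P<qualifier>\w+)/\d+/\w+/vlen=\d+/seqid=\d+"
)


def qualifiers_by_row(warning):
    """Map each meta row in the warning text to the info: qualifiers it carries."""
    rows = defaultdict(set)
    for cell in CELL.finditer(warning):
        rows[cell.group("row")].add(cell.group("qualifier"))
    return dict(rows)


def rows_missing_regioninfo(warning):
    """Rows whose dumped cells do not include info:regioninfo."""
    found = qualifiers_by_row(warning)
    return sorted(row for row, quals in found.items() if "regioninfo" not in quals)


# First warning from the excerpt above, wrapped as in the mail.
WARNING = """
2017-02-21 23:51:02,037 WARN [ChaosMonkeyThread] hbase.MetaTableAccessor: No
serialized HRegionInfo in keyvalues=
{IntegrationTestBigLinkedList,,1487715081599.770a773dcce9ff052a1abf6e0bf5a3fe./info:seqnumDuringOpen/1487720322817/Put/vlen=8/seqid=0,
IntegrationTestBigLinkedList,,1487715081599.770a773dcce9ff052a1abf6e0bf5a3fe./info:server/1487720322817/Put/vlen=20/seqid=0,
IntegrationTestBigLinkedList,,1487715081599.770a773dcce9ff052a1abf6e0bf5a3fe./info:serverstartcode/1487720322817/Put/vlen=8/seqid=0}
"""

for row in rows_missing_regioninfo(WARNING):
    print(row, sorted(qualifiers_by_row(WARNING)[row]))
```

For the row shown, only seqnumDuringOpen, server and serverstartcode are present, which matches the "No serialized HRegionInfo" complaint.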
This is a failed merge, not a failed split:
{noformat}
2017-02-21 23:49:38,026 INFO [hbasefsck-pool1-t18] util.HBaseFsck: Region {
meta => null, hdfs =>
hdfs://node-1.cluster/hbase/data/default/IntegrationTestBigLinkedList/5d4eb9af560399b58cedb03fb5dd2d8c,
deployed => , replicaId => 0 } got merge recently, its file(s) will be
cleaned by CatalogJanitor later
2017-02-21 23:49:38,026 INFO [hbasefsck-pool1-t16] util.HBaseFsck: Region {
meta => null, hdfs =>
hdfs://node-1.cluster/hbase/data/default/IntegrationTestBigLinkedList/86a164048874819c21ddd944916c0a2c,
deployed => , replicaId => 0 } got merge recently, its file(s) will be
cleaned by CatalogJanitor later
ERROR: Region { meta =>
IntegrationTestBigLinkedList,,1487719286872.c0bed858d8298bcbc738104b0d5b4ca9.,
hdfs =>
hdfs://node-1.cluster/hbase/data/default/IntegrationTestBigLinkedList/c0bed858d8298bcbc738104b0d5b4ca9,
deployed => , replicaId => 0 } not deployed on any region server.
2017-02-21 23:49:38,026 INFO [hbasefsck-pool1-t13] util.HBaseFsck: Region {
meta => null, hdfs =>
hdfs://node-1.cluster/hbase/data/default/IntegrationTestBigLinkedList/d93f65a230355b2e75cd109ad199492d,
deployed => , replicaId => 0 } got merge recently, its file(s) will be
cleaned by CatalogJanitor later
2017-02-21 23:49:38,026 INFO [hbasefsck-pool1-t19] util.HBaseFsck: Region {
meta => null, hdfs => h
dfs://node-1.cluster/hbase/data/default/IntegrationTestBigLinkedList/770a773dcce9ff052a1abf6e0bf5a3fe,
deployed => , replicaId => 0 } got merge recently, its file(s) will be cleaned
by CatalogJanitor later
2017-02-21 23:49:38,033 INFO [main] util.HBaseFsck: Handling overlap merges in
parallel. set hbasefsck.overlap.merge.parallel to false to run serially.
ERROR: (region
IntegrationTestBigLinkedList,fffffffd,1487717619706.25ffb1333783d4d01765146b42735f1b.)
First region should start with an empty key. You need to create a new region
and regioninfo in HDFS to plug the hole.
ERROR: Found inconsistency in table IntegrationTestBigLinkedList
{noformat}
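Both meta rows that lack a serialized HRegionInfo (770a773d... and 5d4eb9af...) also show up in the hbck output as regions that "got merge recently", which is what points at a merge rather than a split. The cross-check can be sketched as below. This is an illustration over the pasted log text only; the function names are hypothetical and the patterns assume the exact message wording shown in this comment.

```python
import re

ENCODED = r"[0-9a-f]{32}"  # encoded region names in these excerpts are 32 hex chars


def merged_away(hbck_output):
    """Encoded names of regions hbck describes as 'got merge recently'."""
    text = " ".join(hbck_output.split())  # collapse wrapping at whitespace
    pattern = re.compile(
        r"hdfs => \S*/(" + ENCODED + r"), deployed => , replicaId => \d+ \} "
        r"got merge recently"
    )
    return set(pattern.findall(text))


def rows_in_warnings(warnings):
    """Encoded names of the meta rows dumped in the MetaTableAccessor warnings."""
    return set(re.findall(r"\.(" + ENCODED + r")\./info:", warnings))


# Abbreviated copies of the excerpts in this comment.
HBCK = """
Region { meta => null, hdfs =>
hdfs://node-1.cluster/hbase/data/default/IntegrationTestBigLinkedList/5d4eb9af560399b58cedb03fb5dd2d8c,
deployed => , replicaId => 0 } got merge recently, its file(s) will be
cleaned by CatalogJanitor later
ERROR: Region { meta =>
IntegrationTestBigLinkedList,,1487719286872.c0bed858d8298bcbc738104b0d5b4ca9.,
hdfs =>
hdfs://node-1.cluster/hbase/data/default/IntegrationTestBigLinkedList/c0bed858d8298bcbc738104b0d5b4ca9,
deployed => , replicaId => 0 } not deployed on any region server.
Region { meta => null, hdfs =>
hdfs://node-1.cluster/hbase/data/default/IntegrationTestBigLinkedList/770a773dcce9ff052a1abf6e0bf5a3fe,
deployed => , replicaId => 0 } got merge recently, its file(s) will be cleaned
by CatalogJanitor later
"""

WARNINGS = r"""
{IntegrationTestBigLinkedList,,1487715081599.770a773dcce9ff052a1abf6e0bf5a3fe./info:server/1487720322817/Put/vlen=20/seqid=0}
{IntegrationTestBigLinkedList,\x19\x99\x99\x99\x99\x99\x99\x99,1487717537972.5d4eb9af560399b58cedb03fb5dd2d8c./info:server/1487720322846/Put/vlen=20/seqid=0}
"""

# Rows that are broken in meta and were also reported as recently merged.
suspects = merged_away(HBCK) & rows_in_warnings(WARNINGS)
print(sorted(suspects))
```

The region that is stuck in FAILED_OPEN (c0bed858...) is reported separately as "not deployed on any region server" and is deliberately not matched by the merge pattern.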
[~abhishek.chouhan]
[~vik.karma]
> RegionServer writes invalid META entries for split daughters in some
> circumstances
> ----------------------------------------------------------------------------------
>
> Key: HBASE-17069
> URL: https://issues.apache.org/jira/browse/HBASE-17069
> Project: HBase
> Issue Type: Bug
> Components: wal
> Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.2.4
> Reporter: Andrew Purtell
> Assignee: Abhishek Singh Chouhan
> Priority: Blocker
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5
>
> Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log,
> daughter_2_08629d59564726da2497f70451aafcdb.log,
> HBASE-17069.branch-1.3.001.patch, HBASE-17069.branch-1.3.002.patch,
> HBASE-17069.master.001.patch, logs.tar.gz,
> parent-393d2bfd8b1c52ce08540306659624f2.log
>
>
> I have been seeing frequent ITBLL failures testing various versions of 1.2.x.
> Over the lifetime of 1.2.x the following issues have been fixed:
> - HBASE-15315 (Remove always set super user call as high priority)
> - HBASE-16093 (Fix splits failed before creating daughter regions leave meta
> inconsistent)
> And this one is pending:
> - HBASE-17044 (Fix merge failed before creating merged region leaves meta
> inconsistent)
> I can apply all of the above to branch-1.2 and still see this failure:
> *The life of stillborn region d55ef81c2f8299abbddfce0445067830*
> *Master sees SPLITTING_NEW*
> {noformat}
> 2016-11-08 04:23:21,186 INFO [AM.ZK.Worker-pool2-t82] master.RegionStates:
> Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW,
> ts=1478579001186, server=node-3.cluster,16020,1478578389506}
> {noformat}
> *The RegionServer creates it*
> {noformat}
> 2016-11-08 04:23:26,035 INFO
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created
> cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34,
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320,
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480,
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25},
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false,
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false,
> prefetchOnOpen=false
> 2016-11-08 04:23:26,038 INFO
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created
> cacheConfig for big: blockCache=LruBlockCache{blockCount=34,
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320,
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480,
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25},
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false,
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false,
> prefetchOnOpen=false
> 2016-11-08 04:23:26,442 INFO
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created
> cacheConfig for meta: blockCache=LruBlockCache{blockCount=63,
> currentSize=17187656, freeSize=12821524664, maxSize=12838712320,
> heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480,
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25},
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false,
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false,
> prefetchOnOpen=false
> 2016-11-08 04:23:26,713 INFO
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created
> cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96,
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320,
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480,
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25},
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false,
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false,
> prefetchOnOpen=false
> 2016-11-08 04:23:26,715 INFO
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created
> cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96,
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320,
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480,
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25},
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false,
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false,
> prefetchOnOpen=false
> 2016-11-08 04:23:26,717 INFO
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created
> cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96,
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320,
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480,
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25},
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false,
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false,
> prefetchOnOpen=false
> {noformat}
> *The RegionServer onlines it*
> {noformat}
> 2016-11-08 04:23:27,015 INFO
> [node-3.cluster,16020,1478578389506-daughterOpener=d55ef81c2f8299abbddfce0445067830]
> regionserver.HRegion: Onlined d55ef81c2f8299abbddfce0445067830; next
> sequenceid=19184
> 2016-11-08 04:23:27,029 INFO
> [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099]
> regionserver.HRegionServer: Post open deploy tasks for
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830.
> 2016-11-08 04:23:27,047 INFO
> [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099]
> hbase.MetaTableAccessor: Updated row
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830.
> with server=node-3.cluster,16020,1478578389506
> {noformat}
> *The Master transitions state from SPLITTING_NEW to OPEN*
> {noformat}
> 2016-11-08 04:23:27,058 INFO [AM.ZK.Worker-pool2-t84] master.RegionStates:
> Transition {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW,
> ts=1478579007057, server=node-3.cluster,16020,1478578389506} to
> {d55ef81c2f8299abbddfce0445067830 state=OPEN, ts=1478579007058,
> server=node-3.cluster,16020,1478578389506}
> 2016-11-08 04:23:27,059 INFO [AM.ZK.Worker-pool2-t84]
> master.AssignmentManager: Handled SPLIT event;
> parent=IntegrationTestBigLinkedList,,1478577020916.393d2bfd8b1c52ce08540306659624f2.,
> daughter
> a=IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830.,
> daughter
> b=IntegrationTestBigLinkedList,/\xFB\x14,1478579001155.08629d59564726da2497f70451aafcdb.,
> on node-3.cluster,16020,1478578389506
> {noformat}
> *RegionServer updates META - BUT APPARENTLY NOT CORRECTLY*
> {noformat}
> 2016-11-08 04:23:27,165 INFO
> [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099]
> regionserver.SplitRequest: Region split, hbase:meta updated, and report to
> master.
> Parent=IntegrationTestBigLinkedList,,1478577020916.393d2bfd8b1c52ce08540306659624f2.,
> new regions:
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830.,
>
> IntegrationTestBigLinkedList,/\xFB\x14,1478579001155.08629d59564726da2497f70451aafcdb..
> Split took 6sec
> {noformat}
> *RegionServer delays flush*
> (Is this important?)
> {noformat}
> 2016-11-08 04:24:14,639 WARN [MemStoreFlusher.0]
> regionserver.MemStoreFlusher: Region
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830.
> has too many store files; delaying flush up to 90000ms
> {noformat}
> *Immediate warnings about No serialized HRegionInfo*
> {noformat}
> 2016-11-08 04:24:44,691 WARN
> [B.defaultRpcServer.handler=26,queue=2,port=16000] hbase.MetaTableAccessor:
> No serialized HRegionInfo in
> keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478579007029/Put/vlen=8/seqid=0,
>
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478579007029/Put/vlen=20/seqid=0,
>
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478579007029/Put/vlen=8/seqid=0}
> {noformat}
> *Master is not happy either*
> {noformat}
> 2016-11-08 04:24:51,148 WARN [MASTER_TABLE_OPERATIONS-node-1:16000-0]
> hbase.MetaTableAccessor: No serialized HRegionInfo in
> keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478579007029/Put/vlen=8/seqid=0,
>
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478579007029/Put/vlen=20/seqid=0,
>
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478579007029/Put/vlen=8/seqid=0}
> {noformat}
> *TestRunner MetaScanner complains about invalid entries in META missing
> HRegionInfo*
> {noformat}
> (standard input):9086:2016-11-08 05:04:17,230 WARN
> [B.defaultRpcServer.handler=4,queue=1,port=16000] hbase.MetaTableAccessor: No
> serialized HRegionInfo in
> keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0,
>
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0,
>
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
> {noformat}
> *ITBLL MapReduce tasks fail because part of the keyspace cannot be located:*
> {noformat}
> java.io.IOException: HRegionInfo was null in IntegrationTestBigLinkedList,
> row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0,
>
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0,
>
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
> at
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1293)
> at
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1185)
> at
> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:410)
> at
> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:359)
> at
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:238)
> at
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:154)
> at
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:121)
> at
> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.persist(IntegrationTestBigLinkedList.java:486)
> at
> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:431)
> at
> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:375)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1719)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000008/syslog:920:java.io.IOException:
> HRegionInfo was null in IntegrationTestBigLinkedList,
> row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0,
>
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0,
>
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0}
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000010/syslog:920:java.io.IOException:
> HRegionInfo was null in IntegrationTestBigLinkedList,
> row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0,
>
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0,
>
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0}
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000011/syslog:909:java.io.IOException:
> HRegionInfo was null in IntegrationTestBigLinkedList,
> row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0,
>
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0,
>
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0}
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000030/syslog:48:java.io.IOException:
> HRegionInfo was null in IntegrationTestBigLinkedList,
> row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0,
>
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0,
>
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
> {noformat}
> {noformat}
> ./application_1478574724776_0002/container_1478574724776_0002_01_000048/syslog:48:java.io.IOException:
> HRegionInfo was null in IntegrationTestBigLinkedList,
> row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0,
>
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0,
>
> IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
> {noformat}