[ https://issues.apache.org/jira/browse/HBASE-21165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612489#comment-16612489 ]
stack commented on HBASE-21165: ------------------------------- .001 is a simple log line that gives some sense of the amount of entries in WALs. Needs more logging for later stages of processing. In fact, let me commit this patch in a sub-issue and leave this open as an umbrella. > During ProcedureStore load, there is no listing of progress... > -------------------------------------------------------------- > > Key: HBASE-21165 > URL: https://issues.apache.org/jira/browse/HBASE-21165 > Project: HBase > Issue Type: Bug > Components: amv2, Operability > Reporter: stack > Assignee: stack > Priority: Major > Attachments: HBASE-21165.branch-2.1.001.patch > > > I have a Master that crashed on a large cluster with hundreds of outstanding > Procedure WALs and (probably --TBD) a few million Procedures to load. It is > taking a long time (two hours)... > There were STUCK procedures that were preventing clean-up of the old WALs. > I can tell we are making progress by enabling TRACE on the Procedure Store. > Better would be an emission as we made progress through the files with an > output after every so many procedures loaded. > Then, post-load, there is a long time spent sorting-out the Procedure > image... We are in finish for ages doing stuff like: > {code} > "master/vc0207:22001:becomeActiveMaster" #98 daemon prio=5 os_prio=0 > tid=0x0000000000d31800 nid=0x1efc0 runnable [0x00007f0a3c17d000] > java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader$WalProcedureMap.removeFromMap(ProcedureWALFormatReader.java:837) > at > org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader$WalProcedureMap.fetchReady(ProcedureWALFormatReader.java:614) > at > org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader.finish(ProcedureWALFormatReader.java:201) > at > org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.load(ProcedureWALFormat.java:94) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.load(WALProcedureStore.java:426) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:382) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:663) > at > org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1335) > at > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:878) > at > org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2119) > at > org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:567) > at > org.apache.hadoop.hbase.master.HMaster$$Lambda$42/1930759883.run(Unknown > Source) > at java.lang.Thread.run(Thread.java:748) > {code} > and.... > {code} > "master/vc0207:22001:becomeActiveMaster" #98 daemon prio=5 os_prio=0 > tid=0x0000000000d31800 nid=0x1efc0 runnable [0x00007f0a3c17d000] > java.lang.Thread.State: RUNNABLE > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at java.security.Provider$Service.newInstance(Provider.java:1595) > at sun.security.jca.GetInstance.getInstance(GetInstance.java:236) > at sun.security.jca.GetInstance.getInstance(GetInstance.java:164) > at java.security.Security.getImpl(Security.java:695) > at java.security.MessageDigest.getInstance(MessageDigest.java:167) > at org.apache.hadoop.hbase.util.MD5Hash.getMD5AsHex(MD5Hash.java:59) > at > org.apache.hadoop.hbase.client.RegionInfo.createRegionName(RegionInfo.java:560) > at > org.apache.hadoop.hbase.client.RegionInfo.createRegionName(RegionInfo.java:490) > at > org.apache.hadoop.hbase.client.RegionInfoBuilder$MutableRegionInfo.<init>(RegionInfoBuilder.java:243) > at > org.apache.hadoop.hbase.client.RegionInfoBuilder.build(RegionInfoBuilder.java:120) > at > org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toRegionInfo(ProtobufUtil.java:3132) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.deserializeStateData(ServerCrashProcedure.java:335) > at > org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:283) > at > org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader$Entry.convert(ProcedureWALFormatReader.java:359) > at > org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader$EntryIterator.next(ProcedureWALFormatReader.java:410) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:460) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:76) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:391) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore$2.load(WALProcedureStore.java:441) > at > org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader.finish(ProcedureWALFormatReader.java:202) > at > org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.load(ProcedureWALFormat.java:94) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.load(WALProcedureStore.java:426) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:382) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:663) > at > org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1335) > at > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:878) > at > org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2119) > at > org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:567) > at > org.apache.hadoop.hbase.master.HMaster$$Lambda$42/1930759883.run(Unknown > Source) > {code} > In this case, it took 1.25 hours before we starting showing progress in the > log. Let me add progress output (Separately am looking at how to do speedup). -- This message was sent by Atlassian JIRA (v7.6.3#76005)