On Sun, Dec 31, 2017 at 7:23 AM, Jean-Marc Spaggiari < jean-m...@spaggiari.org> wrote:
> Nothing bad that I can see. Here is a region server log:
> https://pastebin.com/0r76Y6ap
>

Good one JMS.

This log has "nothing" about why we decide to close the Region post successful open (if it was a Region w/ old hfiles or a native compression lib to which we had no access, I'd have thought we'd have failed the open before this point). I'm supposing it's the Master asking it to close. The log should make it clearer when this is what is going on (HBASE-19701). Unfortunately the Master log is from a later period, so I cannot correlate the RS-side opens/closes (do you have the Master log from around 2017-12-31 09:54:21,058?).

Looking at the Master log, link posted below, it has trouble opening log #30. It is finding incomplete edits. E.g.:

2017-12-31 10:11:38,130 ERROR [node2:60000.masterManager] procedure2.ProcedureExecutor: Corrupt pid=243, ppid=242, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure table=email, region=c07e50c3a15e8ab20cbd9514b333b67d, server=node4.com,16020,1514693339685

We've seen this before (HBASE-18152). This is probably the root of the strangeness we see here. I'd be interested in earlier logs, JMS, if you have them sir.

If the Master is failing to read this last log, it is going to be working w/ an incomplete state. In particular, the regions at the point of issue were in OPENING state, so when the Master comes up, it is waiting on the RS to report in a successful OPEN (or FAIL), but at this stage in the game that is never going to happen, it seems, so we see....

2017-12-31 10:12:52,611 WARN [ProcExecTimeout] assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, location=node8.com,16020,1514693333206, table=work_proposed, region=5b4b9a7b4e58da39a2072fdcb512df2f

...in the Master log.

Do you have older logs that I can look at? A particular sequence of events put us in this state. In the past, we've been able to determine where the hole is and we've been able to plug it.
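For anyone wanting to correlate the Master-side warnings before sending logs over, here is a rough Python sketch that tallies the "stuck in transition" WARN lines by state and table. It is only a sketch: the regular expression is modeled on the lines quoted in this thread and may not match other HBase versions, it expects each log entry on a single unwrapped line, and the names `STUCK_RE` and `summarize_stuck_regions` are made up for illustration.

```python
import re
from collections import Counter

# Pattern modeled on the WARN lines quoted in this thread. This is an
# assumption: other HBase versions may format the message differently.
STUCK_RE = re.compile(
    r"stuck in transition:\s*rit=(?P<state>\w+),\s*"
    r"location=(?P<location>\S+),\s*"
    r"table=(?P<table>[^,\s]+),\s*"
    r"region=(?P<region>[0-9a-f]{32})"
)


def summarize_stuck_regions(log_text):
    """Return {(state, table): number of distinct regions} for stuck-region lines.

    The Master repeats the same warning on every timeout check, so each
    region is counted once per (state, table) pair.
    """
    seen = set()
    counts = Counter()
    for match in STUCK_RE.finditer(log_text):
        key = (match["state"], match["table"], match["region"])
        if key in seen:
            continue
        seen.add(key)
        counts[(match["state"], match["table"])] += 1
    return dict(counts)


if __name__ == "__main__":
    # One of the lines quoted above, used as sample input.
    sample = (
        "2017-12-31 10:12:52,611 WARN [ProcExecTimeout] "
        "assignment.AssignmentManager: TODO Handle stuck in transition: "
        "rit=OPENING, location=node8.com,16020,1514693333206, "
        "table=work_proposed, region=5b4b9a7b4e58da39a2072fdcb512df2f"
    )
    for (state, table), n in sorted(summarize_stuck_regions(sample).items()):
        print(f"{state:10s} {table:20s} {n}")
```

Run against the full Master log, a summary like this would show at a glance whether every stuck region is in OPENING or whether other states are involved too.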
Maybe we could rerun your loading from the beginning but w/ DEBUG enabled, in case INFO-level does not reveal enough info?

Thanks JMS,
S

> Disabling the table makes the regions leave the transition mode. I'm trying
> to disable all tables one by one (because it gets stuck after each disable)
> and will see if re-enabling them helps...
>
> On the master side, I now have errors all over:
> 2017-12-31 10:06:26,511 WARN [ProcExecWrkr-89] assignment.RegionTransitionProcedure: Retryable error trying to transition: pid=511, ppid=398, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure table=work_proposed, region=d0a58b76ad9376b12b3e763660049d3d, server=node3.com,16020,1514693337210; rit=OPENING, location=node3.com,16020,1514693337210
> org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected [SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but current state=OPENING
>   at org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:155)
>   at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1530)
>   at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
>   at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
>   at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
>   at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1456)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1225)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1735)
>
> Non-stop showing in the logs. Probably because I disabled the table.
>
> Restarting HBase to see if it clears that a bit...
>
> After restart there isn't any
> org.apache.hadoop.hbase.exceptions.UnexpectedStateException in the logs.
> Only INFO level. And nothing bad. But still, regions are stuck in
> transition even for the disabled tables.
>
> Master logs are here. I removed some sections because it always says the same
> thing, for each and every single region: https://pastebin.com/K6SQ7DXP
>
> JMS
>
> 2017-12-31 9:58 GMT-05:00 stack <saint....@gmail.com>:
>
> > There is nothing further up in the master log from regionservers or on
> > regionservers side on open?
> >
> > Thanks,
> > S
> >
> > On Dec 31, 2017 8:37 AM, "stack" <saint....@gmail.com> wrote:
> >
> > > Good questions. If you disable snappy does it work? If you start over
> > > fresh does it work? It should be picking up native libs. Make an issue
> > > please jms. Thanks for giving it a go.
> > >
> > > S
> > >
> > > On Dec 30, 2017 11:49 PM, "Jean-Marc Spaggiari" <jean-m...@spaggiari.org> wrote:
> > >
> > >> Hi Stack,
> > >>
> > >> I just tried to give it a try...
> > >> Wiped out all HDFS content and code, all HBase content and code, and
> > >> all ZK. Rebuilt a brand new cluster with 7 physical worker nodes. I'm
> > >> able to get HBase to start; however, I'm not able to get my regions
> > >> online.
> > >>
> > >> 2017-12-31 00:42:03,187 WARN [ProcExecTimeout] assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, location=node8.16020,1514693333206, table=pageMini, region=a778eb67898dfd378e426f2e7700faea
> > >> 2017-12-31 00:42:03,187 WARN [ProcExecTimeout] assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, location=node6.16020,1514693336563, table=work_proposed, region=4a1d86197ace3f4c8b1c8de28dbe1d34
> > >> 2017-12-31 00:42:03,187 WARN [ProcExecTimeout] assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, location=node1.16020,1514693336898, table=page_crc, region=86b3912a09a5676b6851636ed22c2abc
> > >> 2017-12-31 00:42:03,187 WARN [ProcExecTimeout] assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, location=node7.16020,1514693337406, table=pageAvro, region=391784c43c87bdea6df05f96accad0ff
> > >> 2017-12-31 00:42:03,187 WARN [ProcExecTimeout] assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, location=node8.16020,1514693333206, table=page, region=5850d782a3beea18872769bf8fd70fc7
> > >> 2017-12-31 00:42:03,187 WARN [ProcExecTimeout] assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, location=node5.16020,1514693330961, table=work_proposed, region=1d892c9b54b66f802b82c2f9fe847f1f
> > >> 2017-12-31 00:42:03,187 WARN [ProcExecTimeout] assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, location=node5.16020,1514693330961, table=pageAvro, region=e9de2c68cc01883e959d7953a4251687
> > >> 2017-12-31 00:42:03,187 WARN [ProcExecTimeout] assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, location=node3.16020,1514693337210, table=page, region=e2e5fc1c262273893f10e92f24817d1b
> > >> 2017-12-31 00:42:03,187 WARN [ProcExecTimeout] assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, location=node3.16020,1514693337210, table=page, region=89c443c09f10bd1584b1bb86a637e1a8
> > >> 2017-12-31 00:42:03,188 WARN [ProcExecTimeout] assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, location=node5.16020,1514693330961, table=page, region=8ca93e9285233ca7b31992f194056bc1
> > >> 2017-12-31 00:42:03,188 WARN [ProcExecTimeout] assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, location=node4.16020,1514693339685, table=work_proposed, region=9afcf06c4d0d21d7e04b0223edcfc40a
> > >> 2017-12-31 00:42:03,188 WARN [ProcExecTimeout] assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, location=node6.16020,1514693336563, table=page, region=3457b3237c576eecd550eccee3f584cd
> > >> 2017-12-31 00:42:03,188 WARN [ProcExecTimeout] assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, location=node1.16020,1514693336898, table=page, region=dd5fb1dbd41945a9ccbc110b8d4a51b5
> > >> 2017-12-31 00:42:03,188 WARN [ProcExecTimeout] assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, location=node7.16020,1514693337406, table=work_proposed, region=480bb37af54d9fa57c727da9e8a33578
> > >> 2017-12-31 00:42:03,188 WARN [ProcExecTimeout] assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, location=node8.16020,1514693333206, table=page_crc, region=56b18d470a569c5474ea084f0d995726
> > >> 2017-12-31 00:42:03,188 WARN [ProcExecTimeout] assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, location=node6.16020,1514693336563, table=page_duplicate, region=e744a9af161de965c70c7d1a08b07660
> > >> 2017-12-31 00:42:03,188 WARN [ProcExecTimeout] assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, location=node1.16020,1514693336898, table=page_proposed, region=1c75e53308acac6313db4be63c2b48fe
> > >> 2017-12-31 00:42:03,188 WARN [ProcExecTimeout] assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, location=node8.16020,1514693333206, table=work_proposed, region=45a25ba85f6341a177db7b15554259f9
> > >> 2017-12-31 00:42:03,188 WARN [ProcExecTimeout] assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, location=node3.16020,1514693337210, table=work_proposed, region=d0a58b76ad9376b12b3e763660049d3d
> > >> 2017-12-31 00:42:03,188 WARN [ProcExecTimeout] assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, location=node3.16020,1514693337210, table=page, region=599a4b7b21b1d93fa232ebbbef37a31b
> > >> 2017-12-31 00:42:03,188 WARN [ProcExecTimeout] assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, location=node1.16020,1514693336898, table=page_proposed, region=55c07269cc907b8e8875c2a1c4ec27d5
> > >> 2017-12-31 00:42:03,188 WARN [ProcExecTimeout] assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING, location=node5.,16020,1514693330961, table=page_crc, region=fa3a3d7ebc64ce2a5494cae01477d8d8
> > >>
> > >> I'm 99% confident this is because of SNAPPY. I'm fighting to get it
> > >> working but it's such a pain!
> > >> My concern here is I don't see any exception anywhere in any logs.
> > >> Nothing on the RS side, nothing on the master side (except the
> > >> extract above).
> > >>
> > >> I suspect it's snappy because of this:
> > >>
> > >> hbase@node2:~/hbase-2.0.0-beta-1$ bin/hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://node2/tmp/snappy snappy
> > >> 2017-12-31 00:45:31,006 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > >> 2017-12-31 00:45:33,283 INFO [main] metrics.MetricRegistries: Loaded MetricRegistries class org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
> > >> 2017-12-31 00:45:33,366 INFO [main] hfile.CacheConfig: Created cacheConfig: CacheConfig:disabled
> > >> Exception in thread "main" java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
> > >>   at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
> > >>   at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:134)
> > >>   at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)
> > >>   at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:168)
> > >>   at org.apache.hadoop.hbase.io.compress.Compression$Algorithm.getCompressor(Compression.java:355)
> > >>   at org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.<init>(HFileBlockDefaultEncodingContext.java:90)
> > >>   at org.apache.hadoop.hbase.io.hfile.NoOpDataBlockEncoder.newDataBlockEncodingContext(NoOpDataBlockEncoder.java:85)
> > >>   at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.<init>(HFileBlock.java:923)
> > >>   at org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishInit(HFileWriterImpl.java:296)
> > >>   at org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.<init>(HFileWriterImpl.java:186)
> > >>   at org.apache.hadoop.hbase.io.hfile.HFile$WriterFactory.create(HFile.java:339)
> > >>   at org.apache.hadoop.hbase.util.CompressionTest.doSmokeTest(CompressionTest.java:129)
> > >>   at org.apache.hadoop.hbase.util.CompressionTest.main(CompressionTest.java:167)
> > >>
> > >> But I think my installation is fine:
> > >> hbase@node2:~/hbase-2.0.0-beta-1$ ll native-build/
> > >> total 308
> > >> lrwxrwxrwx 1 hbase hbase     24 déc 31 00:29 libhadoopsnappy.so -> libhadoopsnappy.so.0.0.1
> > >> lrwxrwxrwx 1 hbase hbase     24 déc 31 00:29 libhadoopsnappy.so.0 -> libhadoopsnappy.so.0.0.1
> > >> -rwxr-xr-x 1 hbase hbase 120144 déc 31 00:29 libhadoopsnappy.so.0.0.1
> > >> lrwxrwxrwx 1 hbase hbase     18 déc  1  2012 libsnappy.so -> libsnappy.so.1.1.3
> > >> lrwxrwxrwx 1 hbase hbase     18 déc  1  2012 libsnappy.so.1 -> libsnappy.so.1.1.3
> > >> -rwxr-xr-x 1 hbase hbase 178210 déc  1  2012 libsnappy.so.1.1.3
> > >> drwxr-xr-x 3 hbase hbase   4096 déc 30 15:44 python2.6
> > >> drwxr-xr-x 4 hbase hbase   4096 déc 30 23:35 python2.7
> > >> drwxr-xr-x 3 hbase hbase   4096 déc 30 23:29 python3.5
> > >>
> > >> and in hbase-env.sh:
> > >> export JAVA_HOME=/usr/local/jdk1.8.0_151
> > >> export HBASE_LIBRARY_PATH=/home/hbase/hbase-2.0.0-beta-1/native-build
> > >>
> > >>
> > >> So there are 2 things here.
> > >> 1) Why are the region servers not reporting any error when they are
> > >> not able to open a region because of the compression codec not being
> > >> loaded?
> > >> 2) Why is HBase not picking up the Snappy codec?
> > >>
> > >> Thanks,
> > >>
> > >> JMS
> > >>
> > >>
> > >> 2017-12-29 13:15 GMT-05:00 Stack <st...@duboce.net>:
> > >>
> > >> > The first release candidate for HBase 2.0.0-beta-1 is up at:
> > >> >
> > >> > https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.0-beta-1-RC0/
> > >> >
> > >> > Maven artifacts are available from a staging directory here:
> > >> >
> > >> > https://repository.apache.org/content/repositories/orgapachehbase-1188
> > >> >
> > >> > All was signed with my key at 8ACC93D2 [1]
> > >> >
> > >> > I tagged the RC as 2.0.0-beta-1-RC0
> > >> > (0907563eb72697b394b8b960fe54887d6ff304fd)
> > >> >
> > >> > hbase-2.0.0-beta-1 is our first beta release. It includes all that was
> > >> > in previous alphas (new assignment manager, offheap read/write path,
> > >> > in-memory compactions, etc.). The APIs and feature-set are sealed.
> > >> >
> > >> > hbase-2.0.0-beta-1 is a not-for-production preview of hbase-2.0.0. It
> > >> > is meant for devs and downstreamers to test drive and flag us if we
> > >> > messed up on anything ahead of our rolling GAs. We are particularly
> > >> > interested in hearing from Coprocessor developers.
> > >> >
> > >> > The list of features addressed in 2.0.0 so far can be found here [3].
> > >> > There are thousands. The list of ~2k+ fixes in 2.0.0 exclusively can be
> > >> > found here [4] (My JIRA JQL foo is a bit dodgy -- forgive me if
> > >> > mistakes).
> > >> >
> > >> > I've updated our overview doc. on the state of 2.0.0 [6]. We'll do one
> > >> > more beta, 2.0.0-beta-2, before we put up our first 2.0.0 Release
> > >> > Candidate by the end of January. Its focus will be making it so users
> > >> > can do a rolling upgrade on to hbase-2.x from hbase-1.x (and any bug
> > >> > fixes found running beta-1). Here is the list of what we have targeted
> > >> > so far for beta-2 [5]. Check it out.
> > >> >
> > >> > One known issue is that the User API has not been properly filtered, so
> > >> > it shows more than just InterfaceAudience Public content (HBASE-19663,
> > >> > to be fixed by beta-2).
> > >> >
> > >> > Please take this beta for a spin. Please vote on whether it is ok to
> > >> > put out this RC as our first beta (Note CHANGES has not yet been
> > >> > updated). Let the VOTE be open for 72 hours (Monday).
> > >> >
> > >> > Thanks,
> > >> > Your 2.0.0 Release Manager
> > >> >
> > >> > 1. http://pgp.mit.edu/pks/lookup?op=get&search=0x9816C7FC8ACC93D2
> > >> > 3. https://goo.gl/scYjJr
> > >> > 4. https://goo.gl/dFFT8b
> > >> > 5. https://issues.apache.org/jira/projects/HBASE/versions/12340862
> > >> > 6. https://docs.google.com/document/d/1WCsVlnHjJeKUcl7wHwqb4z9iEu_ktczrlKHK8N4SZzs/
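
As a footnote to the UnexpectedStateException quoted in this thread: the message says a region may only move to CLOSING from SPLITTING, SPLIT, MERGING, OPEN or CLOSING, which is why an unassign issued against a region still in OPENING keeps failing. Below is a toy Python model of that guard. It is not HBase code; the allowed-state list is copied from the exception text, and every name (`RegionState`, `UnexpectedStateError`, `transition_to_closing`) is hypothetical.

```python
from enum import Enum


class RegionState(Enum):
    OPENING = "OPENING"
    OPEN = "OPEN"
    CLOSING = "CLOSING"
    SPLITTING = "SPLITTING"
    SPLIT = "SPLIT"
    MERGING = "MERGING"


# States from which a move to CLOSING is accepted, copied from the
# exception message in the thread. Everything else here is illustrative.
ALLOWED_BEFORE_CLOSING = frozenset({
    RegionState.SPLITTING,
    RegionState.SPLIT,
    RegionState.MERGING,
    RegionState.OPEN,
    RegionState.CLOSING,
})


class UnexpectedStateError(Exception):
    """Raised when a transition is requested from a state that does not allow it."""


def transition_to_closing(current):
    """Return CLOSING if the move is allowed from `current`, else raise."""
    if current not in ALLOWED_BEFORE_CLOSING:
        allowed = ", ".join(sorted(s.name for s in ALLOWED_BEFORE_CLOSING))
        raise UnexpectedStateError(
            f"Expected [{allowed}] so could move to CLOSING "
            f"but current state={current.name}"
        )
    return RegionState.CLOSING


if __name__ == "__main__":
    print(transition_to_closing(RegionState.OPEN).name)
    try:
        transition_to_closing(RegionState.OPENING)
    except UnexpectedStateError as err:
        print("rejected:", err)
```

In this model, as in the logs above, a region parked in OPENING can never satisfy the guard, so a retry loop around the unassign would spin until something else moves the region to OPEN or fails the open.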