On Sun, Dec 31, 2017 at 7:23 AM, Jean-Marc Spaggiari <
jean-m...@spaggiari.org> wrote:

> Nothing bad that I can see. Here is a region server log:
> https://pastebin.com/0r76Y6ap
>
>
Good one JMS. This log has "nothing" about why we decide to close the
Region post successful open (If it was a Region w/ old hfiles or a native
compression lib to which we had no access, I'd have thought we'd have
failed the open before this point). I'm supposing it's the Master asking
it to close. The log should make this clearer if this is what is going on
(HBASE-19701). Unfortunately the Master log is from a later period so I
cannot correlate the RS-side opens/closes (Do you have the Master log from
around 2017-12-31 09:54:21,058?)

Looking at the Master log (link posted below), it has trouble opening log
#30. It is finding incomplete edits. E.g.:

2017-12-31 10:11:38,130 ERROR [node2:60000.masterManager]
procedure2.ProcedureExecutor: Corrupt pid=243, ppid=242,
state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure table=email,
region=c07e50c3a15e8ab20cbd9514b333b67d, server=node4.com,16020,
1514693339685

We've seen this before (HBASE-18152). This is probably the root of the
strangeness we see here. I'd be interested in earlier logs JMS if you have
them sir. If Master is failing reading this last log it is going to be
working w/ an incomplete state. In particular, the regions at the point of
issue were in OPENING state so when Master comes up, it is waiting on the
RS to report in a successful OPEN (or FAIL) but at this stage in the game,
it is never going to happen it seems so we see....


2017-12-31 10:12:52,611 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node8.com,16020,1514693333206, table=work_proposed,
region=5b4b9a7b4e58da39a2072fdcb512df2f

...in Master log.
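
If it helps to see which tables and servers the stuck regions cluster on, here
is a rough Python sketch for tallying these warnings from a saved Master log.
The regex is an assumption based only on the WARN lines quoted in this thread
(it is not an official log format), and the names stuck_regions and count_by
are made up for illustration. It collapses whitespace first so entries that
wrap across lines still match.

```python
import re
from collections import Counter

# Pattern assumed from the WARN lines quoted in this thread;
# it is not taken from any HBase log format specification.
STUCK_RIT = re.compile(
    r"TODO Handle stuck in transition:\s*rit=(?P<state>\w+),\s*"
    r"location=(?P<location>\S+),\s*table=(?P<table>[\w.:-]+),\s*"
    r"region=(?P<region>[0-9a-f]{32})"
)


def stuck_regions(log_text):
    """Return one dict per 'stuck in transition' warning found in log_text."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    return [m.groupdict() for m in STUCK_RIT.finditer(flat)]


def count_by(entries, key):
    """Count entries grouped by one field, e.g. 'table' or 'location'."""
    return Counter(entry[key] for entry in entries)
```

Feeding it the Master log text and calling count_by(entries, "location") would
show whether the stuck regions sit on one RS or are spread across all of them.
It will not match lines that carry mail quote prefixes ("> ") in the middle of
an entry.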

Do you have older logs that I can look at? A particular sequence of events
put us in this state. In the past, we've been able to determine where the
hole is and we've been able to plug it.

Maybe we could rerun your loading from the beginning but w/ DEBUG enabled
in case INFO-level does not reveal enough info?

Thanks JMS,
S




> Disabling the table makes the regions leave the transition mode. I'm trying
> to disable all tables one by one (because it gets stuck after each disable)
> and will see if re-enabling them helps...
>
> On the master side, I now have errors all over:
> 2017-12-31 10:06:26,511 WARN  [ProcExecWrkr-89]
> assignment.RegionTransitionProcedure: Retryable error trying to transition:
> pid=511, ppid=398, state=RUNNABLE:REGION_TRANSITION_DISPATCH;
> UnassignProcedure table=work_proposed,
> region=d0a58b76ad9376b12b3e763660049d3d,
> server=node3.com,16020,1514693337210;
> rit=OPENING, location=node3.com,16020,1514693337210
> org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected
> [SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but
> current state=OPENING
>     at org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:155)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1530)
>     at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
>     at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
>     at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
>     at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1456)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1225)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1735)
>
> This shows non-stop in the logs. Probably because I disabled the table.
>




> Restarting HBase to see if it clears that a bit...
>
> After restart there isn't any
> org.apache.hadoop.hbase.exceptions.UnexpectedStateException in the logs.
> Only INFO level. And nothing bad. But still, regions are stuck in
> transition even for the disabled tables.
>
> Master logs are here. I removed some sections because it always says the same
> thing, for each and every single region: https://pastebin.com/K6SQ7DXP
>
> JMS
>
> 2017-12-31 9:58 GMT-05:00 stack <saint....@gmail.com>:
>
> > There is nothing further up in the master log from regionservers or on
> > regionservers side on open?
> >
> > Thanks,
> > S
> >
> > On Dec 31, 2017 8:37 AM, "stack" <saint....@gmail.com> wrote:
> >
> > > > Good questions.  If you disable snappy does it work?  If you start over
> > > > fresh does it work?  It should be picking up native libs.  Make an issue
> > > > please jms.  Thanks for giving it a go.
> > >
> > > S
> > >
> > > > On Dec 30, 2017 11:49 PM, "Jean-Marc Spaggiari" <jean-m...@spaggiari.org>
> > > > wrote:
> > >
> > >> Hi Stack,
> > >>
> > >> I just gave it a try... Wiped out all HDFS content and code, all
> > >> HBase content and code, and all ZK. Re-built a brand new cluster with 7
> > >> physical worker nodes. I'm able to get HBase to start, however I'm not
> > >> able to get my regions online.
> > >>
> > >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> > >> assignment.AssignmentManager: TODO Handle stuck in transition:
> > >> rit=OPENING,
> > >> location=node8.16020,1514693333206, table=pageMini,
> > >> region=a778eb67898dfd378e426f2e7700faea
> > >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> > >> assignment.AssignmentManager: TODO Handle stuck in transition:
> > >> rit=OPENING,
> > >> location=node6.16020,1514693336563, table=work_proposed,
> > >> region=4a1d86197ace3f4c8b1c8de28dbe1d34
> > >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> > >> assignment.AssignmentManager: TODO Handle stuck in transition:
> > >> rit=OPENING,
> > >> location=node1.16020,1514693336898, table=page_crc,
> > >> region=86b3912a09a5676b6851636ed22c2abc
> > >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> > >> assignment.AssignmentManager: TODO Handle stuck in transition:
> > >> rit=OPENING,
> > >> location=node7.16020,1514693337406, table=pageAvro,
> > >> region=391784c43c87bdea6df05f96accad0ff
> > >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> > >> assignment.AssignmentManager: TODO Handle stuck in transition:
> > >> rit=OPENING,
> > >> location=node8.16020,1514693333206, table=page,
> > >> region=5850d782a3beea18872769bf8fd70fc7
> > >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> > >> assignment.AssignmentManager: TODO Handle stuck in transition:
> > >> rit=OPENING,
> > >> location=node5.16020,1514693330961, table=work_proposed,
> > >> region=1d892c9b54b66f802b82c2f9fe847f1f
> > >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> > >> assignment.AssignmentManager: TODO Handle stuck in transition:
> > >> rit=OPENING,
> > >> location=node5.16020,1514693330961, table=pageAvro,
> > >> region=e9de2c68cc01883e959d7953a4251687
> > >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> > >> assignment.AssignmentManager: TODO Handle stuck in transition:
> > >> rit=OPENING,
> > >> location=node3.16020,1514693337210, table=page,
> > >> region=e2e5fc1c262273893f10e92f24817d1b
> > >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> > >> assignment.AssignmentManager: TODO Handle stuck in transition:
> > >> rit=OPENING,
> > >> location=node3.16020,1514693337210, table=page,
> > >> region=89c443c09f10bd1584b1bb86a637e1a8
> > >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> > >> assignment.AssignmentManager: TODO Handle stuck in transition:
> > >> rit=OPENING,
> > >> location=node5.16020,1514693330961, table=page,
> > >> region=8ca93e9285233ca7b31992f194056bc1
> > >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> > >> assignment.AssignmentManager: TODO Handle stuck in transition:
> > >> rit=OPENING,
> > >> location=node4.16020,1514693339685, table=work_proposed,
> > >> region=9afcf06c4d0d21d7e04b0223edcfc40a
> > >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> > >> assignment.AssignmentManager: TODO Handle stuck in transition:
> > >> rit=OPENING,
> > >> location=node6.16020,1514693336563, table=page,
> > >> region=3457b3237c576eecd550eccee3f584cd
> > >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> > >> assignment.AssignmentManager: TODO Handle stuck in transition:
> > >> rit=OPENING,
> > >> location=node1.16020,1514693336898, table=page,
> > >> region=dd5fb1dbd41945a9ccbc110b8d4a51b5
> > >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> > >> assignment.AssignmentManager: TODO Handle stuck in transition:
> > >> rit=OPENING,
> > >> location=node7.16020,1514693337406, table=work_proposed,
> > >> region=480bb37af54d9fa57c727da9e8a33578
> > >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> > >> assignment.AssignmentManager: TODO Handle stuck in transition:
> > >> rit=OPENING,
> > >> location=node8.16020,1514693333206, table=page_crc,
> > >> region=56b18d470a569c5474ea084f0d995726
> > >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> > >> assignment.AssignmentManager: TODO Handle stuck in transition:
> > >> rit=OPENING,
> > >> location=node6.16020,1514693336563, table=page_duplicate,
> > >> region=e744a9af161de965c70c7d1a08b07660
> > >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> > >> assignment.AssignmentManager: TODO Handle stuck in transition:
> > >> rit=OPENING,
> > >> location=node1.16020,1514693336898, table=page_proposed,
> > >> region=1c75e53308acac6313db4be63c2b48fe
> > >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> > >> assignment.AssignmentManager: TODO Handle stuck in transition:
> > >> rit=OPENING,
> > >> location=node8.16020,1514693333206, table=work_proposed,
> > >> region=45a25ba85f6341a177db7b15554259f9
> > >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> > >> assignment.AssignmentManager: TODO Handle stuck in transition:
> > >> rit=OPENING,
> > >> location=node3.16020,1514693337210, table=work_proposed,
> > >> region=d0a58b76ad9376b12b3e763660049d3d
> > >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> > >> assignment.AssignmentManager: TODO Handle stuck in transition:
> > >> rit=OPENING,
> > >> location=node3.16020,1514693337210, table=page,
> > >> region=599a4b7b21b1d93fa232ebbbef37a31b
> > >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> > >> assignment.AssignmentManager: TODO Handle stuck in transition:
> > >> rit=OPENING,
> > >> location=node1.16020,1514693336898, table=page_proposed,
> > >> region=55c07269cc907b8e8875c2a1c4ec27d5
> > >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> > >> assignment.AssignmentManager: TODO Handle stuck in transition:
> > >> rit=OPENING,
> > >> location=node5.,16020,1514693330961, table=page_crc,
> > >> region=fa3a3d7ebc64ce2a5494cae01477d8d8
> > >>
> > >> I'm 99% confident this is because of SNAPPY. I'm fighting to get it
> > >> working but it's such a pain! My concern here is I don't see any
> > >> exception anywhere in any logs. Nothing on the RS side, nothing on the
> > >> master side (except the extract above).
> > >>
> > >> I suspect it's snappy because of this:
> > >>
> > >> hbase@node2:~/hbase-2.0.0-beta-1$ bin/hbase
> > >> org.apache.hadoop.hbase.util.CompressionTest hdfs://node2/tmp/snappy
> > >> snappy
> > >> 2017-12-31 00:45:31,006 WARN  [main] util.NativeCodeLoader: Unable to
> > >> load native-hadoop library for your platform... using builtin-java
> > >> classes where applicable
> > >> 2017-12-31 00:45:33,283 INFO  [main] metrics.MetricRegistries: Loaded
> > >> MetricRegistries class
> > >> org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
> > >> 2017-12-31 00:45:33,366 INFO  [main] hfile.CacheConfig: Created
> > >> cacheConfig: CacheConfig:disabled
> > >> Exception in thread "main" java.lang.RuntimeException: native snappy
> > >> library not available: this version of libhadoop was built without
> > >> snappy support.
> > >>         at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
> > >>         at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:134)
> > >>         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)
> > >>         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:168)
> > >>         at org.apache.hadoop.hbase.io.compress.Compression$Algorithm.getCompressor(Compression.java:355)
> > >>         at org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.<init>(HFileBlockDefaultEncodingContext.java:90)
> > >>         at org.apache.hadoop.hbase.io.hfile.NoOpDataBlockEncoder.newDataBlockEncodingContext(NoOpDataBlockEncoder.java:85)
> > >>         at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.<init>(HFileBlock.java:923)
> > >>         at org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishInit(HFileWriterImpl.java:296)
> > >>         at org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.<init>(HFileWriterImpl.java:186)
> > >>         at org.apache.hadoop.hbase.io.hfile.HFile$WriterFactory.create(HFile.java:339)
> > >>         at org.apache.hadoop.hbase.util.CompressionTest.doSmokeTest(CompressionTest.java:129)
> > >>         at org.apache.hadoop.hbase.util.CompressionTest.main(CompressionTest.java:167)
> > >>
> > >> But I think my installation is fine:
> > >> hbase@node2:~/hbase-2.0.0-beta-1$ ll native-build/
> > >> total 308
> > >> lrwxrwxrwx 1 hbase hbase     24 déc 31 00:29 libhadoopsnappy.so ->
> > >> libhadoopsnappy.so.0.0.1
> > >> lrwxrwxrwx 1 hbase hbase     24 déc 31 00:29 libhadoopsnappy.so.0 ->
> > >> libhadoopsnappy.so.0.0.1
> > >> -rwxr-xr-x 1 hbase hbase 120144 déc 31 00:29 libhadoopsnappy.so.0.0.1
> > >> lrwxrwxrwx 1 hbase hbase     18 déc  1  2012 libsnappy.so ->
> > >> libsnappy.so.1.1.3
> > >> lrwxrwxrwx 1 hbase hbase     18 déc  1  2012 libsnappy.so.1 ->
> > >> libsnappy.so.1.1.3
> > >> -rwxr-xr-x 1 hbase hbase 178210 déc  1  2012 libsnappy.so.1.1.3
> > >> drwxr-xr-x 3 hbase hbase   4096 déc 30 15:44 python2.6
> > >> drwxr-xr-x 4 hbase hbase   4096 déc 30 23:35 python2.7
> > >> drwxr-xr-x 3 hbase hbase   4096 déc 30 23:29 python3.5
> > >>
> > >> and in hbase-env.sh:
> > >> export JAVA_HOME=/usr/local/jdk1.8.0_151
> > >> export HBASE_LIBRARY_PATH=/home/hbase/hbase-2.0.0-beta-1/native-build
> > >>
> > >>
> > >> So there are 2 things here.
> > >> 1) Why are the region servers not reporting any error when they are not
> > >> able to open a region because of the compression codec not being loaded?
> > >> 2) Why is HBase not picking up the Snappy codec?
> > >>
> > >> Thanks,
> > >>
> > >> JMS
> > >>
> > >>
> > >> 2017-12-29 13:15 GMT-05:00 Stack <st...@duboce.net>:
> > >>
> > >> > The first release candidate for HBase 2.0.0-beta-1 is up at:
> > >> >
> > >> >  https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.0-beta-1-RC0/
> > >> >
> > >> > Maven artifacts are available from a staging directory here:
> > >> >
> > >> >  https://repository.apache.org/content/repositories/orgapachehbase-1188
> > >> >
> > >> > All was signed with my key at 8ACC93D2 [1]
> > >> >
> > >> > I tagged the RC as 2.0.0-beta-1-RC0
> > >> > (0907563eb72697b394b8b960fe54887d6ff304fd)
> > >> >
> > >> > hbase-2.0.0-beta-1 is our first beta release. It includes all that was
> > >> > in previous alphas (new assignment manager, offheap read/write path,
> > >> > in-memory compactions, etc.). The APIs and feature-set are sealed.
> > >> >
> > >> > hbase-2.0.0-beta-1 is a not-for-production preview of hbase-2.0.0. It
> > >> > is meant for devs and downstreamers to test drive and flag us if we
> > >> > messed up on anything ahead of our rolling GAs. We are particularly
> > >> > interested in hearing from Coprocessor developers.
> > >> >
> > >> > The list of features addressed in 2.0.0 so far can be found here [3].
> > >> > There are thousands. The list of ~2k+ fixes in 2.0.0 exclusively can be
> > >> > found here [4] (My JIRA JQL foo is a bit dodgy -- forgive me if
> > >> > mistakes).
> > >> >
> > >> > I've updated our overview doc. on the state of 2.0.0 [6]. We'll do one
> > >> > more beta before we put up our first 2.0.0 Release Candidate by the end
> > >> > of January, 2.0.0-beta-2. Its focus will be making it so users can do a
> > >> > rolling upgrade on to hbase-2.x from hbase-1.x (and any bug fixes found
> > >> > running beta-1). Here is the list of what we have targeted so far for
> > >> > beta-2 [5]. Check it out.
> > >> >
> > >> > One known issue is that the User API has not been properly filtered so
> > >> > it shows more than just InterfaceAudience Public content (HBASE-19663,
> > >> > to be fixed by beta-2).
> > >> >
> > >> > Please take this beta for a spin. Please vote on whether it is ok to
> > >> > put out this RC as our first beta (Note CHANGES has not yet been
> > >> > updated). Let the VOTE be open for 72 hours (Monday)
> > >> >
> > >> > Thanks,
> > >> > Your 2.0.0 Release Manager
> > >> >
> > >> > 1. http://pgp.mit.edu/pks/lookup?op=get&search=0x9816C7FC8ACC93D2
> > >> > 3. https://goo.gl/scYjJr
> > >> > 4. https://goo.gl/dFFT8b
> > >> > 5. https://issues.apache.org/jira/projects/HBASE/versions/12340862
> > >> > 6. https://docs.google.com/document/d/1WCsVlnHjJeKUcl7wHwqb4z9iEu_ktczrlKHK8N4SZzs/
> > >> >
> > >>
> > >
> >
>
