This is great stuff jms.  Thank you.  Away from computer at mo but will dig
in.

Is it possible there are old files left over, written by an old HBase with an
old HFile version? Can you check on the source? They should have been updated
by a compaction if it was idle a long time, I agree.

Yeah. If a region assign fails and the region goes into an unassignable
state, we need intervention. We've been shutting down all the ways in which
this could happen, but you seem to have stumbled on a new one. I will take a
look at your logs.
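A minimal sketch of how a region can get wedged, assuming a simplified model. The master log quoted further down in this thread rejects the move to CLOSING with "Expected [SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but current state=OPENING". The Java below is hypothetical: RegionStateGuard, its reduced State enum, and tryMarkClosing are illustrative names, not HBase's RegionStates API, and the sketch returns false where the real code throws UnexpectedStateException. It only shows that a region stuck in OPENING cannot be moved to CLOSING by an unassign on its own.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified model of a region-state guard. Only the state names
// and the allowed set for CLOSING come from the master log in this thread.
public class RegionStateGuard {

    // Reduced, illustrative subset of region states.
    enum State { OFFLINE, OPENING, OPEN, CLOSING, CLOSED, SPLITTING, SPLIT, MERGING }

    // States from which a move to CLOSING is accepted, copied from the log line
    // "Expected [SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING".
    static final Set<State> CAN_MOVE_TO_CLOSING = EnumSet.of(
        State.SPLITTING, State.SPLIT, State.MERGING, State.OPEN, State.CLOSING);

    private State current;

    RegionStateGuard(State initial) {
        this.current = initial;
    }

    State current() {
        return current;
    }

    // Moves to CLOSING and returns true if the current state allows it; otherwise
    // leaves the state unchanged and returns false (the real code throws instead).
    boolean tryMarkClosing() {
        if (!CAN_MOVE_TO_CLOSING.contains(current)) {
            return false;
        }
        current = State.CLOSING;
        return true;
    }

    public static void main(String[] args) {
        // A region stuck in OPENING: every unassign attempt is rejected.
        RegionStateGuard stuck = new RegionStateGuard(State.OPENING);
        System.out.println(stuck.tryMarkClosing() + " " + stuck.current()); // false OPENING

        // A region that did open can be closed normally.
        RegionStateGuard open = new RegionStateGuard(State.OPEN);
        System.out.println(open.tryMarkClosing() + " " + open.current()); // true CLOSING
    }
}
```

In this model nothing in the guard itself ever moves the stuck region out of OPENING, which matches the need for outside intervention described above.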

What are you going to vote? Does it basically work?

Thanks again for the try out.
S

On Dec 31, 2017 12:43 PM, "Jean-Marc Spaggiari" <jean-m...@spaggiari.org>
wrote:

Sorry to spam the list :(

Another interesting thing.

Now most of my tables are online. For a few I'm getting this:
Caused by: java.lang.IllegalArgumentException: Invalid HFile version: major=2, minor=1: expected at least major=2 and minor=3
        at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.checkFileVersion(HFileReaderImpl.java:332)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.<init>(HFileReaderImpl.java:199)
        at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.java:538)
        ... 13 more

What is interesting is that I haven't done anything on the source cluster for
weeks/months, so all tables have been major compacted the same way. I will
major compact them all to the HFile v3 format and retry.
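For reference, here is a minimal sketch of a lexicographic "at least major=2 and minor=3" check that rejects the v2.1 files above. It is a hypothetical illustration of how the error message reads (HFileVersionCheck, isSupported and check are made-up names). It is not HFileReaderImpl.checkFileVersion itself, whose exact rules may differ.

```java
// Hypothetical sketch of an "at least major=2 and minor=3" HFile version check.
// Illustrative only; not HBase's HFileReaderImpl.checkFileVersion.
public class HFileVersionCheck {

    // Minimums taken from the error message quoted above.
    static final int MIN_MAJOR = 2;
    static final int MIN_MINOR = 3;

    // Lexicographic comparison: a newer major version always passes; at the
    // minimum major version, the minor version must be at least MIN_MINOR.
    static boolean isSupported(int major, int minor) {
        return major > MIN_MAJOR || (major == MIN_MAJOR && minor >= MIN_MINOR);
    }

    // Returns "ok", or an error text shaped like the one in the region server log.
    static String check(int major, int minor) {
        if (isSupported(major, minor)) {
            return "ok";
        }
        return "Invalid HFile version: major=" + major + ", minor=" + minor
            + ": expected at least major=" + MIN_MAJOR + " and minor=" + MIN_MINOR;
    }

    public static void main(String[] args) {
        System.out.println(check(2, 1)); // rejected, like the files in the failing tables
        System.out.println(check(2, 3)); // ok
        System.out.println(check(3, 0)); // ok: an HFile v3 file
    }
}
```

Under this reading, rewriting the old files as v3 (major=3) would clear the check regardless of their minor version.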

2017-12-31 13:33 GMT-05:00 Jean-Marc Spaggiari <jean-m...@spaggiari.org>:

> Ok. With a brand new DistCp from the source cluster, regions are getting
> assigned correctly. So it sounds like if they get stuck initially for any
> reason, they can never get assigned again, even once the reason is fixed.
> Will keep playing.
>
> I kept the previous /hbase just in case we need something from it.
>
> Thanks,
>
> JMS
>
> 2017-12-31 10:23 GMT-05:00 Jean-Marc Spaggiari <jean-m...@spaggiari.org>:
>
>> Nothing bad that I can see. Here is a region server log:
>> https://pastebin.com/0r76Y6ap
>>
>> Disabling the table makes the regions leave the transition state. I'm
>> trying to disable all tables one by one (because it gets stuck after each
>> disable) and will see if re-enabling them helps...
>>
>> On the master side, I now have errors all over:
>> 2017-12-31 10:06:26,511 WARN  [ProcExecWrkr-89]
>> assignment.RegionTransitionProcedure: Retryable error trying to
>> transition: pid=511, ppid=398, state=RUNNABLE:REGION_TRANSITION_DISPATCH;
>> UnassignProcedure table=work_proposed, region=d0a58b76ad9376b12b3e763660049d3d,
>> server=node3.com,16020,1514693337210; rit=OPENING, location=node3.com,16020,1514693337210
>> org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected
>> [SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but
>> current state=OPENING
>> at org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:155)
>> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1530)
>> at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
>> at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
>> at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
>> at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
>> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1456)
>> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1225)
>> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
>> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1735)
>>
>> This shows up non-stop in the logs, probably because I disabled the table.
>> Restarting HBase to see if it clears that up a bit...
>>
>> After the restart there isn't any
>> org.apache.hadoop.hbase.exceptions.UnexpectedStateException in the logs.
>> Only INFO level, and nothing bad. But still, regions are stuck in
>> transition, even for the disabled tables.
>>
>> Master logs are here. I removed some sections because they always say the
>> same thing for every single region: https://pastebin.com/K6SQ7DXP
>>
>> JMS
>>
>> 2017-12-31 9:58 GMT-05:00 stack <saint....@gmail.com>:
>>
>>> Is there nothing further up in the master log from the regionservers, or
>>> on the regionserver side on open?
>>>
>>> Thanks,
>>> S
>>>
>>> On Dec 31, 2017 8:37 AM, "stack" <saint....@gmail.com> wrote:
>>>
>>> > Good questions. If you disable snappy, does it work? If you start over
>>> > fresh, does it work? It should be picking up native libs. Please make an
>>> > issue, jms. Thanks for giving it a go.
>>> >
>>> > S
>>> >
>>> > On Dec 30, 2017 11:49 PM, "Jean-Marc Spaggiari" <
>>> jean-m...@spaggiari.org>
>>> > wrote:
>>> >
>>> >> Hi Stack,
>>> >>
>>> >> I just tried to give it a try... Wipe out all HDFS content and code,
>>> all
>>> >> HBase content and code, and all ZK. Re-build a brand new cluster with
>>> 7
>>> >> physical worker nodes. I'm able to get HBase start, how-ever I'm not
>>> able
>>> >> to get my regions online.
>>> >>
>>> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=node8.16020,1514693333206, table=pageMini,
>>> >> region=a778eb67898dfd378e426f2e7700faea
>>> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=node6.16020,1514693336563, table=work_proposed,
>>> >> region=4a1d86197ace3f4c8b1c8de28dbe1d34
>>> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=node1.16020,1514693336898, table=page_crc,
>>> >> region=86b3912a09a5676b6851636ed22c2abc
>>> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=node7.16020,1514693337406, table=pageAvro,
>>> >> region=391784c43c87bdea6df05f96accad0ff
>>> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=node8.16020,1514693333206, table=page,
>>> >> region=5850d782a3beea18872769bf8fd70fc7
>>> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=node5.16020,1514693330961, table=work_proposed,
>>> >> region=1d892c9b54b66f802b82c2f9fe847f1f
>>> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=node5.16020,1514693330961, table=pageAvro,
>>> >> region=e9de2c68cc01883e959d7953a4251687
>>> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=node3.16020,1514693337210, table=page,
>>> >> region=e2e5fc1c262273893f10e92f24817d1b
>>> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=node3.16020,1514693337210, table=page,
>>> >> region=89c443c09f10bd1584b1bb86a637e1a8
>>> >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=node5.16020,1514693330961, table=page,
>>> >> region=8ca93e9285233ca7b31992f194056bc1
>>> >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=node4.16020,1514693339685, table=work_proposed,
>>> >> region=9afcf06c4d0d21d7e04b0223edcfc40a
>>> >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=node6.16020,1514693336563, table=page,
>>> >> region=3457b3237c576eecd550eccee3f584cd
>>> >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=node1.16020,1514693336898, table=page,
>>> >> region=dd5fb1dbd41945a9ccbc110b8d4a51b5
>>> >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=node7.16020,1514693337406, table=work_proposed,
>>> >> region=480bb37af54d9fa57c727da9e8a33578
>>> >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=node8.16020,1514693333206, table=page_crc,
>>> >> region=56b18d470a569c5474ea084f0d995726
>>> >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=node6.16020,1514693336563, table=page_duplicate,
>>> >> region=e744a9af161de965c70c7d1a08b07660
>>> >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=node1.16020,1514693336898, table=page_proposed,
>>> >> region=1c75e53308acac6313db4be63c2b48fe
>>> >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=node8.16020,1514693333206, table=work_proposed,
>>> >> region=45a25ba85f6341a177db7b15554259f9
>>> >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=node3.16020,1514693337210, table=work_proposed,
>>> >> region=d0a58b76ad9376b12b3e763660049d3d
>>> >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=node3.16020,1514693337210, table=page,
>>> >> region=599a4b7b21b1d93fa232ebbbef37a31b
>>> >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=node1.16020,1514693336898, table=page_proposed,
>>> >> region=55c07269cc907b8e8875c2a1c4ec27d5
>>> >> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
>>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>>> >> rit=OPENING,
>>> >> location=node5.,16020,1514693330961, table=page_crc,
>>> >> region=fa3a3d7ebc64ce2a5494cae01477d8d8
>>> >>
>>> >> I'm 99% confident this is because of SNAPPY. I'm fighting to get it
>>> >> working, but it's such a pain! My concern here is that I don't see any
>>> >> exception anywhere in any logs. Nothing on the RS side, nothing on the
>>> >> master side (except the extract above).
>>> >>
>>> >> I suspect it's snappy because of this:
>>> >>
>>> >> hbase@node2:~/hbase-2.0.0-beta-1$ bin/hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://node2/tmp/snappy snappy
>>> >> 2017-12-31 00:45:31,006 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> >> 2017-12-31 00:45:33,283 INFO  [main] metrics.MetricRegistries: Loaded MetricRegistries class org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
>>> >> 2017-12-31 00:45:33,366 INFO  [main] hfile.CacheConfig: Created cacheConfig: CacheConfig:disabled
>>> >> Exception in thread "main" java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
>>> >>         at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
>>> >>         at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:134)
>>> >>         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)
>>> >>         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:168)
>>> >>         at org.apache.hadoop.hbase.io.compress.Compression$Algorithm.getCompressor(Compression.java:355)
>>> >>         at org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.<init>(HFileBlockDefaultEncodingContext.java:90)
>>> >>         at org.apache.hadoop.hbase.io.hfile.NoOpDataBlockEncoder.newDataBlockEncodingContext(NoOpDataBlockEncoder.java:85)
>>> >>         at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.<init>(HFileBlock.java:923)
>>> >>         at org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishInit(HFileWriterImpl.java:296)
>>> >>         at org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.<init>(HFileWriterImpl.java:186)
>>> >>         at org.apache.hadoop.hbase.io.hfile.HFile$WriterFactory.create(HFile.java:339)
>>> >>         at org.apache.hadoop.hbase.util.CompressionTest.doSmokeTest(CompressionTest.java:129)
>>> >>         at org.apache.hadoop.hbase.util.CompressionTest.main(CompressionTest.java:167)
>>> >>
>>> >> But I think my installation is fine:
>>> >> hbase@node2:~/hbase-2.0.0-beta-1$ ll native-build/
>>> >> total 308
>>> >> lrwxrwxrwx 1 hbase hbase     24 déc 31 00:29 libhadoopsnappy.so -> libhadoopsnappy.so.0.0.1
>>> >> lrwxrwxrwx 1 hbase hbase     24 déc 31 00:29 libhadoopsnappy.so.0 -> libhadoopsnappy.so.0.0.1
>>> >> -rwxr-xr-x 1 hbase hbase 120144 déc 31 00:29 libhadoopsnappy.so.0.0.1
>>> >> lrwxrwxrwx 1 hbase hbase     18 déc  1  2012 libsnappy.so -> libsnappy.so.1.1.3
>>> >> lrwxrwxrwx 1 hbase hbase     18 déc  1  2012 libsnappy.so.1 -> libsnappy.so.1.1.3
>>> >> -rwxr-xr-x 1 hbase hbase 178210 déc  1  2012 libsnappy.so.1.1.3
>>> >> drwxr-xr-x 3 hbase hbase   4096 déc 30 15:44 python2.6
>>> >> drwxr-xr-x 4 hbase hbase   4096 déc 30 23:35 python2.7
>>> >> drwxr-xr-x 3 hbase hbase   4096 déc 30 23:29 python3.5
>>> >>
>>> >> and in hbase-env.sh:
>>> >> export JAVA_HOME=/usr/local/jdk1.8.0_151
>>> >> export HBASE_LIBRARY_PATH=/home/hbase/hbase-2.0.0-beta-1/native-build
>>> >>
>>> >>
>>> >> So there are 2 things here:
>>> >> 1) Why are the region servers not reporting any error when they are not
>>> >> able to open a region because the compression codec is not loaded?
>>> >> 2) Why is HBase not picking up the Snappy codec?
>>> >>
>>> >> Thanks,
>>> >>
>>> >> JMS
>>> >>
>>> >>
>>> >> 2017-12-29 13:15 GMT-05:00 Stack <st...@duboce.net>:
>>> >>
>>> >> > The first release candidate for HBase 2.0.0-beta-1 is up at:
>>> >> >
>>> >> >  https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.0-beta-1-RC0/
>>> >> >
>>> >> > Maven artifacts are available from a staging directory here:
>>> >> >
>>> >> >  https://repository.apache.org/content/repositories/orgapachehbase-1188
>>> >> >
>>> >> > All was signed with my key at 8ACC93D2 [1]
>>> >> >
>>> >> > I tagged the RC as 2.0.0-beta-1-RC0
>>> >> > (0907563eb72697b394b8b960fe54887d6ff304fd)
>>> >> >
>>> >> > hbase-2.0.0-beta-1 is our first beta release. It includes all that was
>>> >> > in previous alphas (new assignment manager, offheap read/write path,
>>> >> > in-memory compactions, etc.). The APIs and feature-set are sealed.
>>> >> >
>>> >> > hbase-2.0.0-beta-1 is a not-for-production preview of hbase-2.0.0. It
>>> >> > is meant for devs and downstreamers to test drive and flag us if we
>>> >> > messed up on anything ahead of our rolling GAs. We are particularly
>>> >> > interested in hearing from Coprocessor developers.
>>> >> >
>>> >> > The list of features addressed in 2.0.0 so far can be found here [3].
>>> >> > There are thousands. The list of ~2k+ fixes in 2.0.0 exclusively can be
>>> >> > found here [4] (my JIRA JQL-fu is a bit dodgy, so forgive me if there
>>> >> > are mistakes).
>>> >> >
>>> >> > I've updated our overview doc on the state of 2.0.0 [6]. We'll do one
>>> >> > more beta, 2.0.0-beta-2, before we put up our first 2.0.0 Release
>>> >> > Candidate by the end of January. Its focus will be making it so users
>>> >> > can do a rolling upgrade on to hbase-2.x from hbase-1.x (and any bug
>>> >> > fixes found running beta-1). Here is the list of what we have targeted
>>> >> > so far for beta-2 [5]. Check it out.
>>> >> >
>>> >> > One known issue is that the User API has not been properly filtered, so
>>> >> > it shows more than just InterfaceAudience Public content (HBASE-19663,
>>> >> > to be fixed by beta-2).
>>> >> >
>>> >> > Please take this beta for a spin. Please vote on whether it is ok to put
>>> >> > out this RC as our first beta (note that CHANGES has not yet been
>>> >> > updated). Let the VOTE be open for 72 hours (Monday).
>>> >> >
>>> >> > Thanks,
>>> >> > Your 2.0.0 Release Manager
>>> >> >
>>> >> > 1. http://pgp.mit.edu/pks/lookup?op=get&search=0x9816C7FC8ACC93D2
>>> >> > 3. https://goo.gl/scYjJr
>>> >> > 4. https://goo.gl/dFFT8b
>>> >> > 5. https://issues.apache.org/jira/projects/HBASE/versions/12340862
>>> >> > 6. https://docs.google.com/document/d/1WCsVlnHjJeKUcl7wHwqb4z9iEu_ktczrlKHK8N4SZzs/
>>> >> >
>>> >>
>>> >
>>>
>>
>>
>
