It's a standard linking issue: you get one class from one version and
another class from another. They are mostly compatible in terms of
signatures (hence no exceptions) but are subtly incompatible in
other ways. In the stack trace you posted, the handlers were
blocked in:
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:382)
and the thread:
"regionserver60020.cacheFlusher" daemon prio=10 tid=0x00002aaabc21e000 nid=0x7717 waiting for monitor entry [0x0000000000000000]
   java.lang.Thread.State: BLOCKED (on object monitor)
was idle.
The cache flusher thread should be flushing, and yet it's doing
nothing. This also happens to be one of the classes that were
changed.
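
If you want to check an install by hand for this kind of mix-up, something
like the following might help. It is only a rough sketch, not anything that
ships with HBase: the function names are made up for illustration, it assumes
the layout discussed in this thread (an hbase-<version>.jar in the install
directory and possibly another one under lib/), and it skips -tests jars on
the assumption that they are not the ones causing trouble.

```shell
# Hypothetical helper: print the distinct versions of hbase-<version>.jar
# found in the given install directory and in its lib/ subdirectory.
hbase_jar_versions() {
    for jar in "$1"/hbase-*.jar "$1"/lib/hbase-*.jar; do
        [ -e "$jar" ] || continue        # the glob matched nothing
        name=$(basename "$jar" .jar)     # e.g. hbase-0.90.1
        case "$name" in
            *-tests) continue ;;         # assumption: ignore tests jars
        esac
        echo "${name#hbase-}"            # strip the "hbase-" prefix
    done | sort -u
}

# Hypothetical helper: return non-zero when more than one version is present.
check_hbase_jars() {
    versions=$(hbase_jar_versions "$1")
    count=$(printf '%s\n' "$versions" | grep -c .)
    if [ "$count" -gt 1 ]; then
        echo "multiple hbase jar versions found:" $versions
        return 1
    fi
    return 0
}
```

You would run it as check_hbase_jars "$HBASE_HOME" after an upgrade; a
non-zero exit status means an old jar is still lying around.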
On Thu, Feb 10, 2011 at 4:34 PM, Ted Yu <[email protected]> wrote:
> Can someone comment on my second question ?
> Thanks
>
> On Thu, Feb 10, 2011 at 4:25 PM, Ryan Rawson <[email protected]> wrote:
>
>> As I suspected.
>>
>> It's a byproduct of our maven assembly process. The process could be
>> fixed. I wouldn't mind. I don't support runtime checking of jars;
>> there is such a thing as too many tests, and this is an example of it.
>> The check would then need a test, etc, etc.
>>
>> At SU we use new directories for each upgrade, copying the config
>> over. With the lack of -default.xml this is easier than ever (just
>> copy everything in conf/). With symlink switchover, rolling forward
>> or back is as simple as switching the symlink. I have to recommend
>> this to everyone who doesn't have a management scheme.
>>
>> On Thu, Feb 10, 2011 at 4:20 PM, Ted Yu <[email protected]> wrote:
>> > hbase/hbase-0.90.1.jar leads lib/hbase-0.90.0.jar in the classpath.
>> > I wonder
>> > 1. why hbase jar is placed in two directories - 0.20.6 didn't use such
>> > structure
>> > 2. what from lib/hbase-0.90.0.jar could be picked up and why there wasn't
>> > exception in server log
>> >
>> > I think a JIRA should be filed for item 2 above - bail out when the two
>> > hbase jars from $HBASE_HOME and $HBASE_HOME/lib are of different versions.
>> >
>> > Cheers
>> >
>> > On Thu, Feb 10, 2011 at 3:40 PM, Ryan Rawson <[email protected]> wrote:
>> >
>> >> What do you get when you:
>> >>
>> >> ls lib/hbase*
>> >>
>> >> I'm going to guess there is hbase-0.90.0.jar there
>> >>
>> >>
>> >>
>> >> On Thu, Feb 10, 2011 at 3:25 PM, Ted Yu <[email protected]> wrote:
>> >> > hbase-0.90.0-tests.jar and hbase-0.90.1.jar co-exist
>> >> > Would this be a problem ?
>> >> >
>> >> > On Thu, Feb 10, 2011 at 3:16 PM, Ryan Rawson <[email protected]> wrote:
>> >> >
>> >> >> You don't have both the old and the new hbase jars in there do you?
>> >> >>
>> >> >> -ryan
>> >> >>
>> >> >> On Thu, Feb 10, 2011 at 3:12 PM, Ted Yu <[email protected]> wrote:
>> >> >> > .META. went offline during second flow attempt.
>> >> >> >
>> >> >> > The timeout I mentioned happened for the 1st and 3rd attempts. HBase
>> >> >> > was restarted before the 1st and 3rd attempts.
>> >> >> >
>> >> >> > Here is jstack:
>> >> >> > http://pastebin.com/EHMSvsRt
>> >> >> >
>> >> >> > On Thu, Feb 10, 2011 at 3:04 PM, Stack <[email protected]> wrote:
>> >> >> >
>> >> >> >> So, .META. is not online? What happens if you use shell at this time?
>> >> >> >>
>> >> >> >> Your attachment did not come across Ted. Mind pastebin'ing it?
>> >> >> >>
>> >> >> >> St.Ack
>> >> >> >>
>> >> >> >> On Thu, Feb 10, 2011 at 2:41 PM, Ted Yu <[email protected]> wrote:
>> >> >> >> > I replaced hbase jar with hbase-0.90.1.jar
>> >> >> >> > I also upgraded client side jar to hbase-0.90.1.jar
>> >> >> >> >
>> >> >> >> > Our map tasks were running faster than before for about 50 minutes.
>> >> >> >> > However, map tasks then timed out calling flushCommits(). This happened
>> >> >> >> > even after a fresh restart of hbase.
>> >> >> >> >
>> >> >> >> > I don't see any exception in region server logs.
>> >> >> >> >
>> >> >> >> > In master log, I found:
>> >> >> >> >
>> >> >> >> > 2011-02-10 18:24:15,286 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region -ROOT-,,0.70236052 on sjc1-hadoop6.X.com,60020,1297362251595
>> >> >> >> > 2011-02-10 18:24:15,349 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Failed verification of .META.,,1 at address=null; org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region is not online: .META.,,1
>> >> >> >> > 2011-02-10 18:24:15,350 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x12e10d0e31e0000 Creating (or updating) unassigned node for 1028785192 with OFFLINE state
>> >> >> >> >
>> >> >> >> > I am attaching region server (which didn't respond to stop-hbase.sh)
>> >> >> >> > jstack.
>> >> >> >> >
>> >> >> >> > FYI
>> >> >> >> >
>> >> >> >> > On Thu, Feb 10, 2011 at 10:10 AM, Stack <[email protected]> wrote:
>> >> >> >> >>
>> >> >> >> >> That's probably enough Ted. The 0.90.1 hbase-default.xml has an extra
>> >> >> >> >> config to enable the experimental HBASE-3455 feature, but you can copy
>> >> >> >> >> that over if you want to try playing with it (it defaults to off, so
>> >> >> >> >> you'd copy over the config if you wanted to set it to true).
>> >> >> >> >>
>> >> >> >> >> St.Ack
>> >> >> >> >
>> >> >> >> >
>> >> >> >>
>> >> >> >
>> >> >>
>> >> >
>> >>
>> >
>>
>