2010/4/26 Michał Podsiadłowski <podsiadlow...@gmail.com>

> Hi Todd,
>
> Thanks for your input. Your words are making me sad, though. I'm using
> 0.20.4 taken from trunk around the beginning of April; I can tell you the
> exact version tomorrow.
> With respect to 1), we are only shutting down, not even killing, the region
> servers, and the datanodes are still working. This is not the first time we
> have managed to break the whole cluster just by shutting down region
> servers.
>
Hi Michal,

I agree that this use case should not cause the cluster to fail.

By "just shutting down" do you mean you are running hbase-daemon.sh stop
regionserver on 3 of the nodes? Are you doing all three at once or in quick
succession? I'd like to try to reproduce your problem so we can get it fixed
for 0.20.5.

Thanks
-Todd

>
> 2010/4/26 Todd Lipcon <t...@cloudera.com>:
> > Hi Michal,
> >
> > What version of HBase are you running?
> >
> > All currently released versions of HBase have known bugs with recovery
> > under crash scenarios, many of which have to do with the lack of a sync()
> > feature in released versions of HDFS.
> >
> > The goal for HBase 0.20.5, due out in the next couple of months, is to
> > fix all of these issues to achieve cluster stability under failure.
> >
> > I'm working full time on this branch, and I'm happy to report that as of
> > yesterday I have a 40-threaded client inserting records into a cluster
> > where I am killing a region server once every 1-2 minutes, and it is
> > recovering completely and correctly through every failure. The test has
> > been running for about 24 hours, and no regions have been lost, etc.
> >
> > My next step is to start testing under 2-node failure scenarios, master
> > failure scenarios, etc.
> >
> > Regarding your specific questions:
> >
> > 1) When you have a simultaneous failure of 3 nodes, blocks will become
> > unavailable in the underlying HDFS. HBase then has no way to continue
> > operating correctly, since its data won't be accessible and any edit logs
> > being written to that set of 3 nodes will fail to append. So I don't
> > think we can reasonably expect to recover from this situation. We should
> > shut down the cluster in such a way that, after HDFS has been restored,
> > we can restart HBase without missing regions, etc. There are probably
> > bugs here currently, but this is lower on the priority list compared to
> > more common scenarios.
> >
> > 2) When a region is being reassigned, it does take some time to recover.
> > In my experience, the loss of a region server hosting META takes about 2
> > minutes to fully reassign; the loss of a region server not holding META
> > takes about 1 minute. This is with a 1-minute ZK session timeout. With
> > shorter timeouts you will detect failure faster, but you are more likely
> > to get false failure detections due to GC pauses, etc. We're working on
> > improving this for 0.21.
> >
> > Regarding the suitability of this for a real-time workload, there are
> > some ideas floating around for future work that would make the regions
> > available very quickly in a read-only/stale-data mode while the logs are
> > split and recovered. This is probably not going to happen in the short
> > term, as it will be tricky to do correctly and there are more pressing
> > issues.
> >
> > Thanks
> > -Todd
> >
> >
> > 2010/4/26 Michał Podsiadłowski <podsiadlow...@gmail.com>
> >
> >> Hi Edward,
> >>
> >> This is not good news for us. If you get 30 seconds under low load,
> >> our 3 minutes are quite normal, especially because your records are
> >> quite big and there are lots of removals and inserts. I just wonder
> >> whether our use-case scenarios are outside the sweet spot of HBase, or
> >> whether HBase availability is simply low. Do you have any knowledge
> >> about changes to the architecture in 0.21? As far as I can see, part of
> >> the problem is in dividing the logs from the dead data node into the
> >> per-table log files.
> >> Is there any way we could speed up recovery? And can someone explain
> >> what happened when we shut down 3 of our 6 region servers? Why did the
> >> cluster get into an inconsistent state with so many missing regions? Is
> >> this such an unusual situation that HBase can't handle it?
> >>
> >> Thanks,
> >> Michal
> >>
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>

--
Todd Lipcon
Software Engineer, Cloudera
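
For anyone trying to reproduce the scenario Todd asks about above, the two
shutdown patterns look roughly like this. This is only a sketch: the hostnames
rs1-rs3 and the /opt/hbase install path are placeholders, and only the
"hbase-daemon.sh stop regionserver" command itself comes from the thread; the
datanodes are left running throughout.

    # All three region servers at (roughly) the same time:
    for host in rs1 rs2 rs3; do
      ssh "$host" "/opt/hbase/bin/hbase-daemon.sh stop regionserver" &
    done
    wait

    # ...versus one at a time, giving the master time to reassign regions
    # (about 1-2 minutes per region server, per Todd's figures above):
    for host in rs1 rs2 rs3; do
      ssh "$host" "/opt/hbase/bin/hbase-daemon.sh stop regionserver"
      sleep 120
    done

The roughly one-minute detection delay Todd mentions corresponds to the
ZooKeeper session timeout, which HBase reads from the zookeeper.session.timeout
property in hbase-site.xml; lowering it detects failures faster at the cost of
more false positives during long GC pauses.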