Re: Zookeeper session lost

Patrick Hunt Wed, 31 Mar 2010 08:24:16 -0700

Ok, well swapping, esp if combined with GC, can def. account for very long delays.

Not sure if anyone provided this before, but take a look at the swapping section on the ZK troubleshooting page. That section, or perhaps one of the other sections on that page, might give you addl insight.

http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting
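
For what it's worth, a rough way to see such stalls from inside a JVM is to sleep for a fixed interval and measure how much longer the sleep really took, similar in spirit to the "We slept Xms, ..." message quoted further down in this thread. The sketch below is only an illustration in Java: the class name PauseDetector, its method names, and the 1000 ms / 40000 ms values are assumptions made up for the example, not HBase or ZooKeeper code or defaults.

```java
public class PauseDetector {

    /** How much longer than requested a sleep took, in ms (never negative). */
    public static long overrunMs(long requestedMs, long elapsedMs) {
        return Math.max(0L, elapsedMs - requestedMs);
    }

    /** True when a single stall is at least as long as the session timeout. */
    public static boolean exceedsSessionTimeout(long overrunMs, long sessionTimeoutMs) {
        return overrunMs >= sessionTimeoutMs;
    }

    /** Sleeps once and returns the measured overrun in ms. */
    public static long measureOnce(long requestedMs) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(requestedMs);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        return overrunMs(requestedMs, elapsedMs);
    }

    public static void main(String[] args) throws InterruptedException {
        long sleepMs = 1000L;            // assumed check interval
        long sessionTimeoutMs = 40000L;  // assumed value; use your configured timeout
        for (int i = 0; i < 10; i++) {
            long overrun = measureOnce(sleepMs);
            // Report only stalls longer than the interval itself (GC pause, swapping, ...).
            if (overrun > sleepMs) {
                System.out.println("Slept " + (sleepMs + overrun) + "ms instead of " + sleepMs
                        + "ms; longer than session timeout: "
                        + exceedsSessionTimeout(overrun, sessionTimeoutMs));
            }
        }
    }
}
```

A stall that shows up here while the machine is swapping or in a long GC pause would also keep the process from sending heartbeats, so comparing the overrun to the configured session timeout gives a first hint of whether an expiry is plausible.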


Good Luck,

Patrick

Peter Falk wrote:

We had a 4GB heap for the region server, on a machine with 8GB that was also running a data node and a ZooKeeper. We have tried the incremental garbage collector before, but had problems with a runaway heap size, resulting in swapping. We are running with the parallel GC now. When the session expire problem occurred, we noticed swapping on the node just before. Therefore, we are a bit afraid to increase the heap size more, or to try the incremental GC again. We are not running in any virtualized environment.
Thanks for the various responses, and the recommendations. I think it would be nice to have an option to automatically restart the region server in situations like this.
TIA,
Peter
On Tue, Mar 30, 2010 at 18:25, Patrick Hunt <ph...@apache.org> wrote:
    Are you running in a virtualized environment by chance? (ec2,
    vmware, etc...) vms, esp oversubscribed/overloaded vms, can result
    in significant io/memory related performance problems.

    Patrick


    Peter Falk wrote:

        Thanks Jean-Daniel. I was not clear about what we have already
        tried: we have tried everything you recommend in the updated wiki
        page, including upping the ZooKeeper session timeout. The node was
        heavily loaded at the time, and it seems the cluster was simply
        overloaded.

        However, would it not be possible to automatically start the
        region server again and let it request new regions? It seems
        dangerous to let region servers die under heavy load like this,
        and increase the load further on the remaining nodes...

        Sincerely,
        Peter

        On Mon, Mar 29, 2010 at 19:38, Jean-Daniel Cryans
        <jdcry...@apache.org> wrote:

            We already had an entry in the wiki for this issue but it
            wasn't super explicit about what's happening, so I completely
            rewrote it using the logs from this thread. See
            http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A9

            Also I created a jira about putting that link directly into
            the "We slept Xms, ..." message so that people can get some
            answers quickly.
            See https://issues.apache.org/jira/browse/HBASE-2388

            J-D
