We had a 4GB heap for the region server, on a machine with 8GB of RAM that was also running a DataNode and a ZooKeeper server. We tried the incremental garbage collector before, but had problems with a runaway heap size that led to swapping. We are now running with the parallel GC. When the session expiration problem occurred, we noticed swapping on the node just before. Therefore, we are a bit afraid to increase the heap size further or to try the incremental GC again. We are not running in any virtualized environment.
Thanks for the various responses and recommendations. I think it would be nice to have an option to automatically restart the region server in situations like this.

TIA,
Peter

On Tue, Mar 30, 2010 at 18:25, Patrick Hunt <ph...@apache.org> wrote:
> Are you running in a virtualized environment by chance? (ec2, vmware,
> etc...) vms, esp oversubscribed/overloaded vms, can result in significant
> io/memory related performance problems.
>
> Patrick
>
>
> Peter Falk wrote:
>
>> Thanks Jean-Daniel. I was not clear about what we have already tried, and
>> we have tried all that you recommend in the updated wiki page, including
>> uppin' the zookeepers session timeout. The node was heavily loaded at the
>> time and it seems the cluster was simply overloaded.
>>
>> However, would it not be possible to automatically start the region server
>> again and let it request new regions? Seems to be dangerous to let region
>> servers die under heavy load like this, and increase the load further on
>> remaining nodes...
>>
>> Sincerely,
>> Peter
>>
>> On Mon, Mar 29, 2010 at 19:38, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>
>>> We already had an entry in the wiki for this issue but it wasn't super
>>> explicit about what's happening, so I completely rewrote it using the
>>> logs from this thread. See
>>> http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A9
>>>
>>> Also I created a jira about putting that link directly into the "We
>>> slept Xms, ..." message so that people can get some answers quickly.
>>> See https://issues.apache.org/jira/browse/HBASE-2388
>>>
>>> J-D