We have successfully run a quorum of 5 nodes for a few days. A few hours earlier today, one of our developers reported a failure in his HBase testing scripts. I then ran a test on my own working machine. After examining the scripts and digging into some of the HBase source code, I came up with the first email in this thread.
Thanks for the great job again.

On Tue, Aug 4, 2009 at 3:49 PM, Ryan Rawson<[email protected]> wrote:
> I generally avoid reading the zookeeper log file, it's very very noisy
> and I have never gotten anything useful out of it :-)
>
> There is more work and best practices we need to do surrounding the
> zookeeper, and these will be encoded in our scripts as we figure it
> all out.
>
> The first big step is to make sure to run a quorum on a cluster, and
> the startup scripts facilitate that. Please use it!
>
> have fun!
> -ryan
>
> On Tue, Aug 4, 2009 at 12:42 AM, Angus He<[email protected]> wrote:
>> Thanks for the brilliant comments, Ryan.
>>
>> For each of this not so graceful close, zookeeper will populate its
>> log file with a WARN record that just likes
>> 2009-08-04 15:15:35,831 WARN
>> org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of
>> session 0x122e34dd69b00ad due to java.io.IOException: Read error.
>>
>> It might be confuse some users of HBase, probably we can put some
>> information about this in the documentation.
>>
>>
>> On Tue, Aug 4, 2009 at 2:41 PM, Ryan Rawson<[email protected]> wrote:
>>> We should move the clients to a non-active server API, possibly the
>>> REST one, and avoid using active sessions just for clients. Something
>>> to address in 0.21 I think.
>>>
>>> As for #2, it is recommended now to run a quorum of zookeeper instead
>>> of a single one. This reduces the risk of running out of connections.
>>>
>>> Also the code snippet you listed is a little degenerate, we can never
>>> fully protect ourselves from fork-bomb like code. Your code snippet
>>> suggests that:
>>> - you are creating/closing HTable a lot. Maybe you shouldn't do that?
>>> HTablePool?
>>> - you have 1024+ tables, and need to access them in one client at one time.
>>>
>>> In the mean time, highly consider upgrading to a cluster of 5-7 ZK
>>> hosts. For production, you should consider NOT running them on your
>>> HBase/HDFS/map-reduce nodes.
>>>
>>> Good luck!
>>> -ryan
>>>
>>> On Mon, Aug 3, 2009 at 11:00 PM, Angus He<[email protected]> wrote:
>>>> Hi All,
>>>>
>>>> In HBase 0.20rc, HTable does not explicitly close the connection to
>>>> zookeeper in HTable::close.
>>>> It probably could be better. And in my opinion, it should be for:
>>>>
>>>> 1. It is not well-behaved, although zookeeper is able to detect the
>>>> lost connection after issuing networking I/O operation, .
>>>> 2. It is easy to get zookeeper server stuck with exceptions like "Too
>>>> many connections from /0:0:0:0:0:0:0:1 - max is 30", when user
>>>> write codes like:
>>>>     for (int i = 0; i < 1024; ++i) {
>>>>         HTable table = new HTable("foobar");
>>>>         table.close();
>>>>     }
>>>>
>>>> In the current implementation, different HTable instances share the
>>>> same connection to zookeeper if they have same HBaseConfiguration
>>>> instance. For this, we cannot close the connection directly in HTable,
>>>> but probably we could implement HConnection class with
>>>> reference-counting ability.
>>>>
>>>> Any comments?
>>>>
>>>> --
>>>> Regards
>>>> Angus
>>>>
>>>
>>
>>
>>
>> --
>> Regards
>> Angus
>>
>

--
Regards
Angus
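
P.S. To make the reference-counting idea from the first email in this thread a bit more concrete, here is a minimal sketch in plain Java. It is only an illustration under assumptions, not HBase code: SharedConnection, ConnectionRegistry, acquire and release are hypothetical names, and a String key stands in for the HBaseConfiguration instance that HTable instances currently share a connection on. The point is just that the underlying connection is closed when the last holder releases it, and not before.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a ZooKeeper-backed connection; not an HBase class. */
class SharedConnection {
    private final String configKey;
    private int refCount = 0;
    private boolean open = true;

    SharedConnection(String configKey) {
        this.configKey = configKey;
    }

    String configKey() { return configKey; }
    boolean isOpen()   { return open; }
    int refCount()     { return refCount; }

    /** Records one more holder of this connection. */
    void retain() {
        refCount++;
    }

    /** Drops one holder; closes and returns true when the last holder is gone. */
    boolean release() {
        if (refCount == 0) {
            throw new IllegalStateException("release() without matching retain()");
        }
        refCount--;
        if (refCount == 0) {
            open = false; // a real implementation would close the ZooKeeper session here
            return true;
        }
        return false;
    }
}

/** Hypothetical registry that hands out one shared connection per configuration key. */
class ConnectionRegistry {
    private final Map<String, SharedConnection> connections = new HashMap<>();

    /** Returns the shared connection for the key, creating it on first use. */
    synchronized SharedConnection acquire(String configKey) {
        SharedConnection conn = connections.get(configKey);
        if (conn == null) {
            conn = new SharedConnection(configKey);
            connections.put(configKey, conn);
        }
        conn.retain();
        return conn;
    }

    /** Releases one reference; forgets the connection once nobody holds it. */
    synchronized void release(SharedConnection conn) {
        if (conn.release()) {
            connections.remove(conn.configKey());
        }
    }

    /** Number of connections that are currently open. */
    synchronized int openCount() {
        return connections.size();
    }
}
```

With something like this, the loop from the first email would open and fully close one connection per iteration instead of leaving connections behind, while two tables created from the same configuration would still share a single connection until both are closed.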
