Hi Jean-Daniel,

Thank you for your input - I'll make these changes and try them tonight (the exact values I plan to use are sketched at the bottom of this mail). I think it is probably also a good idea to enable compression now, whilst the load is off the servers.
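Just so I have it written down somewhere, the shell steps I have in mind for switching a table to GZ are roughly the following - a sketch only, 'mytable' and 'cf' are placeholders rather than our real table and column family names, so please shout if I have the procedure wrong:

# from the HBase shell (bin/hbase shell)
disable 'mytable'                                      # take the table offline first
alter 'mytable', {NAME => 'cf', COMPRESSION => 'GZ'}   # switch the family to GZ compression
enable 'mytable'
major_compact 'mytable'                                # rewrite the existing store files compressed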
We have a physical space issue in our server cabinet which will get resolved sometime in March, and we are planning to add an additional 3 servers to the setup, plus maybe an additional one for the NameNode and HBase master. I read somewhere that it is wise to co-locate a datanode and a regionserver on each server. Is this a good idea? Or is there a better way to configure this?

Regards,
Seraph


> From: Jean-Daniel Cryans <[email protected]>
> Reply-To: <[email protected]>
> Date: Mon, 8 Feb 2010 10:11:36 -0800
> To: <[email protected]>
> Subject: Re: Hbase pausing problems
>
> The "too many store files" message is due to this:
>
> <property>
>   <name>hbase.hstore.blockingStoreFiles</name>
>   <value>7</value>
>   <description>
>     If more than this number of StoreFiles exist in any one Store
>     (one StoreFile is written per flush of MemStore), then updates are
>     blocked for this HRegion until a compaction is completed, or
>     until hbase.hstore.blockingWaitTime has been exceeded.
>   </description>
> </property>
>
> This block is there so that the system is not overrun with uncompacted
> files. In the past I saw an import drive the number of store files to more
> than 100, and it was just impossible to compact. The default setting is
> especially low since the default heap size is 1GB; with 3GB you could set it
> to 13-15.
>
> Since you have a high number of regions, consider tweaking this:
>
> <property>
>   <name>hbase.regions.percheckin</name>
>   <value>10</value>
>   <description>Maximum number of regions that can be assigned in a single go
>     to a region server.
>   </description>
> </property>
>
> Since you have such a low number of nodes, a value of 100 would make a lot
> of sense.
>
> On a general note, it seems that your machines are unable to keep up with
> the size of data that's coming in, and lots of compactions (and flushes) need
> to happen. The fact that only 3 machines are doing the work exacerbates the
> problem. Using the configurations I just told you about will lessen the
> problem, but you should really consider using LZO or even GZ, since all you
> care about is storing a lot of data and you only read a few rows per day.
> Enabling GZ won't require any new software on these nodes and there's no
> chance of losing data.
>
> J-D
>
> On Mon, Feb 8, 2010 at 5:28 AM, Seraph Imalia <[email protected]> wrote:
>
>> Hi Guys,
>>
>> I am having another problem with hBase that is probably related to the
>> problems I was emailing you about earlier this year.
>>
>> I have finally had a chance to at least try one of the suggestions you had
>> to help resolve our problems. I increased the heap size per server to 3 Gig
>> and added the following to the hbase-site.xml files on each server last
>> night (I have not enabled compression yet for fear of losing data - I need
>> to wait until I have a long period of time where hBase can be offline, in
>> case there are problems I need to resolve) ...
>>
>> <property>
>>   <name>hbase.regionserver.global.memstore.upperLimit</name>
>>   <value>0.5</value>
>>   <description>Maximum size of all memstores in a region server before new
>>     updates are blocked and flushes are forced. Defaults to 40% of heap.
>>   </description>
>> </property>
>> <property>
>>   <name>hbase.regionserver.global.memstore.lowerLimit</name>
>>   <value>0.48</value>
>>   <description>When memstores are being forced to flush to make room in
>>     memory, keep flushing until we hit this mark. Defaults to 30% of heap.
>>     This value equal to hbase.regionserver.global.memstore.upperLimit causes
>>     the minimum possible flushing to occur when updates are blocked due to
>>     memstore limiting.
>>   </description>
>> </property>
>>
>> ...and then restarted hBase:
>> bin/stop-hbase.sh
>> bin/start-hbase.sh
>>
>> hBase spent about 30 minutes assigning regions to each of the region
>> servers (we now have 2595 regions). When it had finished (which is usually
>> when our client apps are able to start adding rows), the client apps were
>> only able to add rows at an incredibly slow rate (about 1 every second),
>> which was not even enough to cope with the minuscule load we have at 3 AM.
>>
>> I left hBase for about 30 minutes after region assignment had completed and
>> the situation did not improve. I then tried changing the lowerLimit to 0.38
>> and restarting again, which also did not improve the situation. I then
>> removed the above lines by commenting them out (<!-- -->) and restarted
>> hBase again. Again, 30 minutes after it had finished assigning regions, it
>> was no different.
>>
>> I therefore assumed that the problem was not caused by the addition of the
>> properties but rather just by the fact that it had been restarted. I
>> checked the log files very closely and noticed that when I disable the
>> client apps, the regionservers are frantically requesting major compactions
>> and complaining about too many store files for a region.
>>
>> I then assumed that the system is under strain performing housekeeping and
>> there is nothing I can do with my limited knowledge to improve it without
>> contacting you guys about it first. It was 4 AM this morning and I had no
>> choice but to do whatever I could to get our client apps up and running
>> before morning, so I wrote some quick ColdFusion and Java code to insert the
>> data into local MySQL servers so that hBase could have time to do whatever
>> it was doing.
>>
>> It is still compacting, and it is now 9 hours after the last restart, with
>> 0 load from the client apps.
>>
>> Please can you assist by shedding some light on what is actually happening?
>> - Is my thinking correct?
>> - Is it related to the "hBase pausing problems" we are still having?
>> - What do I do to fix it or make it hurry up?
>>
>> Regards,
>> Seraph
>>
>>
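
P.S. For completeness, this is roughly what I am planning to add to hbase-site.xml for the two settings you mention - a sketch based on my reading of your numbers (14 within the 13-15 range, and 100), not anything I have tested yet:

<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>14</value>
  <description>Raised from the default of 7 since the region servers now
    run with a 3GB heap.
  </description>
</property>
<property>
  <name>hbase.regions.percheckin</name>
  <value>100</value>
  <description>Assign more regions per check-in so that our 2595 regions
    come back online faster after a restart.
  </description>
</property>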
