Hi Jean-Daniel,

Thank you for your input - I'll make these changes and try it tonight.  I
think it is probably also a good idea for me to enable compression now
whilst the load is off the servers.
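
When I do, I am thinking of something along these lines from the HBase shell
(the table and column family names below are just placeholders for ours, and
as I understand it the table has to be disabled first):

  disable 'mytable'
  alter 'mytable', {NAME => 'data', COMPRESSION => 'GZ'}
  enable 'mytable'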

We have a physical space issue in our server cabinet which will get resolved
sometime in March, and we are planning to add an additional 3 servers to the
setup, plus possibly another one for the NameNode and HBase master.  I read
somewhere that it is wise to run a DataNode and a RegionServer together on
each server.  Is that the right approach, or is there a better way to
configure this?

Regards,
Seraph 


> From: Jean-Daniel Cryans <[email protected]>
> Reply-To: <[email protected]>
> Date: Mon, 8 Feb 2010 10:11:36 -0800
> To: <[email protected]>
> Subject: Re: Hbase pausing problems
> 
> The "too many store files" is due to this
> 
>   <property>
>     <name>hbase.hstore.blockingStoreFiles</name>
>     <value>7</value>
>     <description>
>     If more than this number of StoreFiles in any one Store
>     (one StoreFile is written per flush of MemStore) then updates are
>     blocked for this HRegion until a compaction is completed, or
>     until hbase.hstore.blockingWaitTime has been exceeded.
>     </description>
>   </property>
> 
> This block is there in order to not overrun the system with uncompacted
> files. In the past I saw an import driving the number of store files to more
> than 100 and it was just impossible to compact. The default setting is
> especially low since the default heap size is 1GB; with 3GB you could set it
> to 13-15.
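> 
> For example, something along these lines in hbase-site.xml (assuming the 3GB
> heap mentioned above; anything in the 13-15 range should behave similarly):
> 
>   <property>
>     <name>hbase.hstore.blockingStoreFiles</name>
>     <value>13</value>
>   </property>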
> 
> Since you have a high number of regions, consider tweaking this:
> 
>   <property>
>     <name>hbase.regions.percheckin</name>
>     <value>10</value>
>     <description>Maximum number of regions that can be assigned in a single
>     go to a region server.
>     </description>
>   </property>
> 
> Since you have such a low number of nodes, a value of 100 would make a lot
> of sense.
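> 
> Concretely, that would be something like:
> 
>   <property>
>     <name>hbase.regions.percheckin</name>
>     <value>100</value>
>   </property>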
> 
> On a general note, it seems that your machines are unable to keep up with
> the size of data that's coming in and lots of compaction (and flushes) need
> to happen. The fact that only 3 machines are doing the work exacerbates the
> problem. Using the configurations I just told you about will lessen the
> problem, but you should really consider using LZO or even GZ since all you
> care about is storing a lot of data and reading only a few rows per day.
> Enabling GZ won't require any new software on these nodes and there's no
> chance of losing data.
> 
> J-D
> 
> On Mon, Feb 8, 2010 at 5:28 AM, Seraph Imalia <[email protected]> wrote:
> 
>> Hi Guys,
>> 
>> I am having another problem with hBase that is probably related to the
>> problems I was emailing you about earlier this year.
>> 
>> I have finally had a chance to try at least one of the suggestions you had
>> to help resolve our problems.  I increased the heap size per server to 3Gig
>> and added the following to the hbase-site.xml files on each server last
>> night (I have not enabled compression yet for fear of losing data - I need
>> to wait until I have a long window where hBase can be offline, in case
>> there are problems I need to resolve) ...
>> 
>> <property>
>>    <name>hbase.regionserver.global.memstore.upperLimit</name>
>>    <value>0.5</value>
>>    <description>Maximum size of all memstores in a region server before new
>>      updates are blocked and flushes are forced. Defaults to 40% of heap
>>    </description>
>> </property>
>> <property>
>>    <name>hbase.regionserver.global.memstore.lowerLimit</name>
>>    <value>0.48</value>
>>    <description>When memstores are being forced to flush to make room in
>>      memory, keep flushing until we hit this mark. Defaults to 30% of heap.
>>      This value equal to hbase.regionserver.global.memstore.upperLimit
>> causes
>>      the minimum possible flushing to occur when updates are blocked due to
>>      memstore limiting.
>>    </description>
>> </property>
>> 
>> ...and then restarted hbase
>> bin/stop-hbase.sh
>> bin/start-hbase.sh
>> 
>> Hbase spent about 30 minutes assigning regions to each of the region
>> servers (we now have 2595 regions).  When it had finished (which is usually
>> when our client apps are able to start adding rows), the client apps were
>> only able to add rows at an incredibly slow rate (about 1 every second),
>> which was not even enough to cope with the minuscule load we have at 3 AM.
>> 
>> I left hBase for about 30 minutes after region assignment had completed and
>> the situation did not improve.  I then tried changing the lowerLimit to 0.38
>> and restarting, which also did not improve the situation.  I then removed
>> the above lines by commenting them out (<!-- -->) and restarted hBase again.
>> Again, 30 minutes later, after it had finished assigning regions, it was no
>> different.
>> 
>> I therefore assumed that the problem was not caused by the addition of the
>> properties but rather just by the fact that it had been restarted.  I
>> checked the log files very closely and I noticed that when I disable the
>> client apps, the regionservers are frantically requesting major compactions
>> and complaining about too many store files for a region.
>> 
>> I then assumed that the system is under strain performing housekeeping and
>> there is nothing I can do with my limited knowledge to improve it without
>> contacting you guys about it first.  It was 4 AM and I had no choice but to
>> do whatever I could to get our client apps up and running before morning,
>> so I wrote some quick ColdFusion and Java code to get the data inserted
>> into local MySQL servers so that hBase could have time to do whatever it
>> was doing.
>> 
>> It is still compacting, and it is now 9 hours after the last restart, with
>> zero load from client apps.
>> 
>> Please can you assist by shedding some light on what is actually happening?
>> - Is my thinking correct?
>> - Is it related to the "hBase pausing problems" we are still having?
>> - What do I do to fix it or make it hurry up?
>> 
>> Regards,
>> Seraph
>> 
>> 


