On Wed, Jan 20, 2010 at 9:37 AM, Seraph Imalia <[email protected]> wrote:
> Does this mean that when 1 regionserver does a memstore flush, the other
> two regionservers are also unavailable for writes? I have watched the logs
> carefully to make sure that not all the regionservers are flushing at the
> same time. Most of the time, only 1 server flushes at a time and in rare
> cases, I have seen two at a time.
>

No. Flush is a background process. Reads and writes go ahead while flushing
is happening.

> > It also looks like you have little RAM space given over to hbase, just 1G?
> > If your traffic is bursty, giving hbase more RAM might help it get over
> > these write humps.
>
> I have it at 1G on purpose. When we first had the problem, I immediately
> thought the problem was resource related, so I increased the hBase RAM to 3G
> (each server has 8G - I was careful to watch for swapping). This made the
> problem worse because each memstore flush took longer, which stopped writing
> for longer, and people started noticing that our system was down during
> those periods.

See above, flushing doesn't block reads/writes. Maybe this was something
else? A GC pause that ran longer because the heap is bigger? You said you had
GC logging enabled. Did you see any long pauses? (Our ZooKeeper brothers
suggest https://gchisto.dev.java.net/ as a help reading GC logs.)

Let me look at your logs to see if I see anything else up there.

> > Clients will be blocked writing regions carried by the affected
> > regionserver only. Your HW is not appropriate to the load as currently
> > set up. You might also consider adding more machines to your cluster.
>
> Hmm... How does hBase decide which region to write to? Is it possible that
> hBase is deciding to write all our current records to one specific region
> that happens to be on the server that is busy doing a memstore flush?
>

Check out the region list in the master UI. See how regions are defined by
their start and end keys. Clients write rows to the region hosting the
pertinent row-span. It's quite possible all writes are going to a single
region on a single server -- which is often an issue -- if your key scheme
has something like the current time for a prefix. (There is a rough sketch
of this pattern at the end of this mail.)

> We are currently inserting about 6 million rows per day.

6M rows is low, even for a cluster as small as yours (though, maybe your
inserts are fat? Big cells, many at a time?).

> SQL Server (which I am so happy to no longer be using for this) was able to
> write (and replicate to a slave) 9 million records (using the same spec'ed
> server). I would like to see hBase cope with the 3 we have given it, at
> least when inserting 6 million. Do you think this is possible or is our
> only answer to throw on more servers?
>

3 servers should be well able to handle that. Tell me more about your schema
-- though, never mind, I can find it in your master log.

St.Ack

> > St.Ack
> >
> >> Thank you for your assistance thus far; please let me know if you need
> >> or discover anything else?
> >>
> >> Regards,
> >> Seraph
> >>
> >>
> >>> From: Jean-Daniel Cryans <[email protected]>
> >>> Reply-To: <[email protected]>
> >>> Date: Mon, 18 Jan 2010 09:49:16 -0800
> >>> To: <[email protected]>
> >>> Subject: Re: Hbase pausing problems
> >>>
> >>> The next step would be to take a look at your region server's log
> >>> around the time of the insert and clients who don't resume after the
> >>> loss of a region server. If you are able to gzip them and put them on
> >>> a public server, it would be awesome.
> >>>
> >>> Thx,
> >>>
> >>> J-D
> >>>
> >>> On Mon, Jan 18, 2010 at 1:03 AM, Seraph Imalia <[email protected]> wrote:
> >>>> Answers below...
> >>>>
> >>>> Regards,
> >>>> Seraph
> >>>>
> >>>>> From: stack <[email protected]>
> >>>>> Reply-To: <[email protected]>
> >>>>> Date: Fri, 15 Jan 2010 10:10:39 -0800
> >>>>> To: <[email protected]>
> >>>>> Subject: Re: Hbase pausing problems
> >>>>>
> >>>>> How many CPUs?
> >>>>
> >>>> 1x Quad Xeon in each server
> >>>>
> >>>>> You are using default JVM settings (see HBASE_OPTS in hbase-env.sh).
> >>>>> You might want to enable GC logging. See the line after HBASE_OPTS in
> >>>>> hbase-env.sh. Enable it. GC logging might tell you about the pauses
> >>>>> you are seeing.
> >>>>
> >>>> I will enable GC logging tonight during our slow time because
> >>>> restarting the regionservers causes the clients to pause indefinitely.
> >>>>
> >>>>> Can you get a fourth server for your cluster and run the master, zk,
> >>>>> and namenode on it and leave the other three servers for regionserver
> >>>>> and datanode (with perhaps replication == 2 as per J-D to lighten the
> >>>>> load on a small cluster)?
> >>>>
> >>>> We plan to double the number of servers in the next few weeks and I
> >>>> will take your advice to put the master, zk and namenode on it (we
> >>>> will need to have a second one on standby should this one crash). The
> >>>> servers will be ordered shortly and will be here in a week or two.
> >>>>
> >>>> That said, I have been monitoring CPU usage and none of them seem
> >>>> particularly busy. The regionserver on each one hovers around 30% all
> >>>> the time and the datanode sits at about 10% most of the time. If we do
> >>>> have a resource issue, it definitely does not seem to be CPU.
> >>>>
> >>>> Increasing RAM did not seem to work either - it just made hBase use a
> >>>> bigger memstore and then it took longer to do a flush.
> >>>>
> >>>>> More notes inline below.
> >>>>>
> >>>>> On Fri, Jan 15, 2010 at 1:33 AM, Seraph Imalia <[email protected]> wrote:
> >>>>>
> >>>>>> Approximately every 10 minutes, our entire coldfusion system pauses
> >>>>>> at the point of inserting into hBase for between 30 and 60 seconds
> >>>>>> and then continues.
> >>>>>>
> >>>>> Yeah, enable GC logging. See if you can make a correlation between
> >>>>> the pause the client is seeing and a GC pause.
> >>>>>
> >>>>>> Investigation...
> >>>>>>
> >>>>>> Watching the logs of the regionserver, the pausing of the coldfusion
> >>>>>> system happens as soon as one of the regionservers starts flushing
> >>>>>> the memstore and recovers again as soon as it is finished flushing
> >>>>>> (recovers as soon as it starts compacting).
> >>>>>
> >>>>> ...though, this would seem to point to an issue with your hardware.
> >>>>> How many disks? Are they misconfigured such that they hold up the
> >>>>> system when they are being heavily written to?
> >>>>>
> >>>>> A regionserver log at DEBUG from around this time so we could look at
> >>>>> it would be helpful.
> >>>>>
> >>>>>> I can recreate the error just by stopping 1 of the regionservers;
> >>>>>> but then starting the regionserver again does not make coldfusion
> >>>>>> recover until I restart the coldfusion servers.
> >>>>>> It is important to note that if I keep the built-in hBase shell
> >>>>>> running, it is happily able to put and get data to and from hBase
> >>>>>> whilst coldfusion is busy pausing/failing.
> >>>>>
> >>>>> This seems odd. Enable DEBUG for the client-side. Do you see the
> >>>>> shell recalibrating, finding new locations for regions, after you
> >>>>> shut down the single regionserver, something that your coldfusion is
> >>>>> not doing? Or, maybe, the shell is putting to a regionserver that has
> >>>>> not been disturbed by your start/stop?
> >>>>>
> >>>>>> I have tried increasing the regionserver's RAM to 3 Gigs and this
> >>>>>> just made the problem worse because it took longer for the
> >>>>>> regionservers to flush the memory store.
> >>>>>
> >>>>> Again, if flushing is holding up the machine, if you can't write a
> >>>>> file in the background without it freezing your machine, then your
> >>>>> machines are anemic or misconfigured?
> >>>>>
> >>>>>> One of the links I found on your site mentioned increasing the
> >>>>>> default value for hbase.regionserver.handler.count to 100; this did
> >>>>>> not seem to make any difference.
> >>>>>
> >>>>> Leave this configuration in place, I'd say.
> >>>>>
> >>>>> Are you seeing 'blocking' messages in the regionserver logs? A
> >>>>> regionserver will stop taking on writes if it thinks it's being
> >>>>> overrun, to prevent itself OOME'ing. Grep the 'multiplier'
> >>>>> configuration in hbase-default.xml.
> >>>>>
> >>>>>> I have double-checked that the memory flush very rarely happens on
> >>>>>> more than 1 regionserver at a time; in fact, in my many hours of
> >>>>>> staring at tails of logs, it only happened once where two
> >>>>>> regionservers flushed at the same time.
> >>>>>
> >>>>> You've enabled DEBUG?
> >>>>>
> >>>>>> My investigations point strongly towards a coding problem on our
> >>>>>> side rather than a problem with the server setup or hBase itself.
> >>>>>
> >>>>> If things were slow from the client's perspective, that might be a
> >>>>> client-side coding problem, but these pauses, unless you have a
> >>>>> fly-by deadlock in your client code, are probably an hbase issue.
> >>>>>
> >>>>>> I say this because whilst I understand why a regionserver would go
> >>>>>> offline during a memory flush, I would expect the other two
> >>>>>> regionservers to pick up the load, especially since the built-in
> >>>>>> hbase shell has no problem accessing hBase whilst a regionserver is
> >>>>>> busy doing a memstore flush.
> >>>>>
> >>>>> HBase does not go offline during a memory flush. It continues to be
> >>>>> available for reads and writes during this time. And see J-D's
> >>>>> response for the incorrect understanding of how loading of regions is
> >>>>> done in an hbase cluster.
> >>>>>
> >>>>> ...
> >>>>>
> >>>>>> I think either I am leaving out code that is required to determine
> >>>>>> which RegionServers are available OR I am keeping too many hBase
> >>>>>> objects in RAM instead of calling their constructors each time (my
> >>>>>> purpose obviously was to improve performance).
> >>>>>
> >>>>> For sure, keep a single instance of HBaseConfiguration at least, and
> >>>>> use it when constructing all HTable and HBaseAdmin instances.
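For illustration, here is a minimal sketch of that shared-configuration
pattern against the 0.20-era client API. The class, table name and column
family below are made up for the example, not taken from this thread:

  import java.io.IOException;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class SharedConfigExample {
    // One HBaseConfiguration for the whole JVM. The client keeps its
    // cluster connection and cached region locations tied to this object,
    // so constructing a fresh configuration per request throws that cache
    // away and forces region lookups all over again.
    private static final HBaseConfiguration CONF = new HBaseConfiguration();

    public static void insert(byte[] rowKey, byte[] value) throws IOException {
      // HTables share the connection held by CONF. They are not
      // thread-safe, so use one per thread rather than one per JVM.
      HTable table = new HTable(CONF, "events");               // table name is hypothetical
      Put put = new Put(rowKey);
      put.add(Bytes.toBytes("d"), Bytes.toBytes("v"), value);  // family/qualifier hypothetical
      table.put(put);
    }
  }

The only point being made here is that CONF is constructed once; everything
else is placeholder.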
> >>>>>> Currently the live system is inserting over 7 Million records per
> >>>>>> day (mostly between 8AM and 10PM), which is not a ridiculously high
> >>>>>> load.
> >>>>>
> >>>>> What size are the records? What is your table schema? How many
> >>>>> regions do you currently have in your table?
> >>>>>
> >>>>> St.Ack
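As referenced above, a rough sketch of the time-prefix hot-spotting problem
and one common workaround. The key layouts are invented for illustration and
are not the schema from this thread:

  import org.apache.hadoop.hbase.util.Bytes;

  public class RowKeySketch {
    // A row key that starts with the current time is monotonically
    // increasing: every new row sorts after the previous one, so all
    // writes land in the single region holding the tail of the table
    // and one regionserver carries the whole insert load.
    static byte[] timePrefixedKey(long eventId) {
      return Bytes.toBytes(System.currentTimeMillis() + "-" + eventId);
    }

    // A common workaround is to lead with something well distributed,
    // for example a small salt derived from the id (or simply lead with
    // the id itself), so consecutive writes spread over many regions.
    static byte[] saltedKey(long eventId) {
      int salt = (int) Math.abs(eventId % 16); // 16 buckets, arbitrary choice
      return Bytes.toBytes(String.format("%02d-%d-%d", salt,
          eventId, System.currentTimeMillis()));
    }
  }

The trade-off is that time-range scans then have to fan out across the salt
buckets, so this only helps when the write pattern, not the read pattern, is
the bottleneck.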

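One more hedged sketch, on the earlier point about clients pausing
indefinitely after a regionserver restart: the client retries its region
lookups a fixed number of times with a pause in between, and those knobs
live on the same shared configuration. The property names below are the
0.20-era ones as I recall them, and the values are arbitrary examples, not
recommendations:

  import org.apache.hadoop.hbase.HBaseConfiguration;

  public class ClientRetryTuning {
    static HBaseConfiguration newTunedConf() {
      HBaseConfiguration conf = new HBaseConfiguration();
      // Pause between client-side retries (milliseconds) and how many
      // retries are attempted before a put/get fails back to the caller.
      conf.setInt("hbase.client.pause", 2000);
      conf.setInt("hbase.client.retries.number", 10);
      return conf;
    }
  }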