Thanks Ryan for sharing your knowledge.

Thanks & Regards
Aseem Puri
-----Original Message-----
From: Ryan Rawson [mailto:ryano...@gmail.com]
Sent: Tuesday, April 14, 2009 1:20 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Some HBase FAQ

Every time the memcache fills up, you get a flush - that is ~64 MB. But lots
of files reduce performance, so a minor compaction just merges them into one
file. Since all the files are sorted, a file-merge sort is fast and
efficient. This is a minor compaction. For major compactions one would need
to do more work to prune old values.

-ryan

On Tue, Apr 14, 2009 at 12:46 AM, Puri, Aseem <aseem.p...@honeywell.com> wrote:

> One more thing I want to ask: the minor compaction definition in the HBase
> documentation is "when the number of MapFiles exceeds a configurable
> threshold, a minor compaction is performed which consolidates the most
> recently written MapFiles".
>
> So when we insert data into an HBase table, 3 to 4 MapFiles are generated
> for one category, but after some time all the MapFiles are combined into
> one file. Is this what we actually call a minor compaction?
>
> Thanks & Regards
> Aseem Puri
>
> -----Original Message-----
> From: Ryan Rawson [mailto:ryano...@gmail.com]
> Sent: Tuesday, April 14, 2009 12:59 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Some HBase FAQ
>
> The write-ahead log is there to recover in crash scenarios. Even if the
> regionserver crashes, recovery from the log will save you.
>
> But even so, during a controlled shutdown the regionserver flushes
> memcache -> disk. If the master dies, this flush should get your data to
> persistent disk.
>
> On Tue, Apr 14, 2009 at 12:24 AM, Puri, Aseem <aseem.p...@honeywell.com> wrote:
>
> > Actually I read that if the HBase master fails the cluster will shut
> > down, so I think in that case the data currently in memcache would also
> > be lost. So maybe it minimizes data loss. It's just what I am thinking;
> > if I am wrong about this please correct me.
> >
> > Thanks & Regards
> > Aseem Puri
> >
> > -----Original Message-----
> > From: Ryan Rawson [mailto:ryano...@gmail.com]
> > Sent: Tuesday, April 14, 2009 12:47 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: Re: Some HBase FAQ
> >
> > I don't understand the rationale for MySQL buffering... HBase handles
> > writes well, it is not a weak point, so just write directly into HBase.
> >
> > -ryan
> >
> > On Tue, Apr 14, 2009 at 12:15 AM, Puri, Aseem <aseem.p...@honeywell.com> wrote:
> >
> > > So it is possible that, without loading the whole region, we can have
> > > just the part of the data in memory that is required.
> > >
> > > Can you also suggest what I should do in the following situation:
> > >
> > > -- My application, which will use HBase, will update a table
> > > frequently. I want your suggestion on which technique I should follow
> > > for write operations:
> > >
> > > a. If there is an update, store the data temporarily in MySQL and
> > > then do a bulk update on HBase after some time.
> > >
> > > Or
> > >
> > > b. If there is an update, update HBase directly instead of writing it
> > > to MySQL first.
> > >
> > > Which approach would you say is more optimized?
> > >
> > > Thanks & Regards
> > > Aseem Puri
> > >
> > > -----Original Message-----
> > > From: Ryan Rawson [mailto:ryano...@gmail.com]
> > > Sent: Tuesday, April 14, 2009 12:33 PM
> > > To: hbase-user@hadoop.apache.org
> > > Subject: Re: Some HBase FAQ
> > >
> > > Only a part of the file on HDFS is read into memory to serve the
> > > request. It is not required to hold the entire file in RAM.
> > >
> > > -ryan
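(For illustration: the kind of request Ryan describes here - reading only a
small part of a file to serve it - is, from the client side, just a
single-row get. The sketch below is written against the 0.19-era Java client
API; the "events" table and "data:payload" column are made up for the
example, exact method signatures vary between releases, and this is a sketch
rather than code from the thread.)

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.Cell;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DirectRead {
        public static void main(String[] args) throws IOException {
            // Open the (hypothetical) "events" table using the cluster
            // configuration found on the classpath (hbase-site.xml).
            HTable table = new HTable(new HBaseConfiguration(), "events");

            // A single-row, single-column read. The regionserver uses the
            // index it holds in RAM to seek to the small part of the file
            // on HDFS that contains this row; the whole file is never
            // pulled into memory.
            Cell cell = table.get("row-0001", "data:payload");
            if (cell != null) {
                System.out.println(Bytes.toString(cell.getValue()));
            }
        }
    }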
> > >
> > > On Mon, Apr 13, 2009 at 11:56 PM, Puri, Aseem <aseem.p...@honeywell.com> wrote:
> > >
> > > > Ryan,
> > > >
> > > > Thanks for the update. Also, please tell me what happens on a read
> > > > operation - is the required region brought into RAM or not?
> > > >
> > > > Thanks & Regards
> > > > Aseem Puri
> > > >
> > > > -----Original Message-----
> > > > From: Ryan Rawson [mailto:ryano...@gmail.com]
> > > > Sent: Tuesday, April 14, 2009 12:23 PM
> > > > To: hbase-user@hadoop.apache.org
> > > > Subject: Re: Some HBase FAQ
> > > >
> > > > Yes, exactly. The regionserver loads the index on start-up in one
> > > > go and holds it in RAM - then it can use this index to do small,
> > > > specific reads from HDFS.
> > > >
> > > > I found that in hbase 0.20 I was using about 700 kB of RAM per 5
> > > > million rows of 40-byte values.
> > > >
> > > > -ryan
> > > >
> > > > On Mon, Apr 13, 2009 at 11:50 PM, Puri, Aseem <aseem.p...@honeywell.com> wrote:
> > > >
> > > > > Hi Ryan,
> > > > >
> > > > > So it means the regionserver holds only the index files of its
> > > > > regions, but not the actual data, which stays on HDFS.
> > > > >
> > > > > Thanks & Regards
> > > > > Aseem Puri
> > > > >
> > > > > -----Original Message-----
> > > > > From: Ryan Rawson [mailto:ryano...@gmail.com]
> > > > > Sent: Tuesday, April 14, 2009 12:16 PM
> > > > > To: hbase-user@hadoop.apache.org
> > > > > Subject: Re: Some HBase FAQ
> > > > >
> > > > > HBase loads the index of the files on start-up; if you ran out of
> > > > > memory for those indexes (which are a fraction of the data size),
> > > > > you'd crash with an OOME.
> > > > >
> > > > > The index is supposed to be a smallish fraction of the total data
> > > > > size.
> > > > >
> > > > > I wouldn't run with less than -Xmx2000m.
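(To put the index-size numbers in rough perspective - this is only an
extrapolation from Ryan's figure of ~700 kB per 5 million rows above, not a
new measurement - a billion 40-byte values would need on the order of

    (1,000,000,000 / 5,000,000) x 700 kB = 200 x 700 kB ~= 140 MB

of index, versus roughly 40 GB of raw data. That is a fraction of a percent,
which is why the indexes fit comfortably inside a 2 GB heap while the data
itself stays on HDFS.)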
> > > > >
> > > > > On Mon, Apr 13, 2009 at 10:48 PM, Puri, Aseem <aseem.p...@honeywell.com> wrote:
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Erik Holstad [mailto:erikhols...@gmail.com]
> > > > > > Sent: Monday, April 13, 2009 9:47 PM
> > > > > > To: hbase-user@hadoop.apache.org
> > > > > > Subject: Re: Some HBase FAQ
> > > > > >
> > > > > > On Mon, Apr 13, 2009 at 7:12 AM, Puri, Aseem <aseem.p...@honeywell.com> wrote:
> > > > > >
> > > > > > > Hi
> > > > > > >
> > > > > > > I am a new HBase user and I have some doubts about how HBase
> > > > > > > works. Things are going fine, but I am not clear on how they
> > > > > > > are happening. Please help me by answering these questions.
> > > > > > >
> > > > > > > 1. I am inserting data into an HBase table and all regions
> > > > > > > get balanced across the various regionservers. But what
> > > > > > > happens when the data grows and there is not enough space on
> > > > > > > the regionservers to accommodate all the regions? Would some
> > > > > > > regions then live only on HDFS and not on any regionserver,
> > > > > > > or would the regionservers stop taking new data?
> > > > > > >
> > > > > > Not really sure what you mean here, but if you are asking what
> > > > > > to do when you are running out of disk space on the
> > > > > > regionservers, the answer is to add another machine or two.
> > > > > >
> > > > > > --- What I want to ask is this: HBase regionservers store
> > > > > > region data on HDFS, so when the HBase master starts, it loads
> > > > > > all the region data from HDFS onto the regionservers. So what
> > > > > > is the scenario if there is not enough space on the
> > > > > > regionservers to accommodate new data? Are some regions swapped
> > > > > > out of a regionserver to make room for new regions and swapped
> > > > > > back in from HDFS when needed, or does something else happen?
> > > > > >
> > > > > > > 2. When I insert data into an HBase table, 3 to 4 MapFiles
> > > > > > > are generated for one category, but after some time all the
> > > > > > > MapFiles are combined into one file. Is this what we actually
> > > > > > > call a minor compaction?
> > > > > > >
> > > > > > When all the current MapFiles and the memcache are combined
> > > > > > into one file, that is called a major compaction; see the
> > > > > > BigTable paper for more details.
> > > > > >
> > > > > > > 3. My application, which will use HBase, will update a table
> > > > > > > frequently. Should I use some other database, such as MySQL,
> > > > > > > as an intermediate store and then do bulk updates on HBase,
> > > > > > > or should I do the updates directly on HBase? Please tell me
> > > > > > > which technique is more optimized for HBase.
> > > > > > >
> > > > > > HBase is fast for reads, which has so far been the main focus
> > > > > > of development; with 0.20 we can hopefully add even faster
> > > > > > random reading to make it a more well-rounded system. Is HBase
> > > > > > too slow for you today when writing to it, and what are your
> > > > > > requirements?
> > > > > >
> > > > > > ---- Basically I asked this question about write operations.
> > > > > > There is no complex requirement. I want your suggestion on
> > > > > > which technique I should follow for writes:
> > > > > >
> > > > > > a. If there is an update, store the data temporarily in MySQL
> > > > > > and then do a bulk update on HBase.
> > > > > >
> > > > > > b. If there is an update, update HBase directly instead of
> > > > > > writing it to MySQL and doing a bulk update on HBase later.
> > > > > >
> > > > > > Which approach would you say is more optimized?
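(Following Ryan's advice earlier in the thread to write directly into HBase:
a direct write with the 0.19-era Java client is only a few lines. The sketch
below is illustrative only - the "events" table and "data:payload" column
are made up, and exact method names vary between releases.)

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.BatchUpdate;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DirectWrite {
        public static void main(String[] args) throws IOException {
            // Connect to the (hypothetical) "events" table via the cluster
            // configuration picked up from hbase-site.xml on the classpath.
            HTable table = new HTable(new HBaseConfiguration(), "events");

            // Each update is keyed by row; columns are "family:qualifier".
            BatchUpdate update = new BatchUpdate("row-0001");
            update.put("data:payload", Bytes.toBytes("some value"));

            // commit() sends the edit to the regionserver, which records it
            // in the write-ahead log and memcache; flushes and compactions
            // happen later in the background.
            table.commit(update);
        }
    }

Since each commit already goes through the write-ahead log and memcache on
the regionserver, there is no need for an external buffer such as MySQL in
front of HBase for this kind of workload.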