There are lots of interesting ways you can design your tables and keys to avoid the single-regionserver hotspotting.

I did an experimental design a long time ago that prepended a random value to every row key, where the value was taken modulo the number of regionservers (or anywhere between 0.5x and 2x the number of regionservers), so for a given timestamp there would be that many potential regions it could go into. This design doesn't make time-range MR jobs very efficient, though, because a single time range is spread out across the entire table... But I'm not sure you can avoid that if you want good distribution; those two requirements are at odds.
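To make that concrete, here's a minimal sketch of the kind of key construction I mean; the bucket count, the key layout, and the names are made up for illustration, not taken from any real schema:

// Sketch only: bucket count, key layout, and names are illustrative.
import java.util.Random;

import org.apache.hadoop.hbase.util.Bytes;

public class SaltedTimeKey {

  // Roughly the number of regionservers (anywhere from 0.5x to 2x works);
  // it has to stay fixed once data has been written with it.
  private static final int NUM_BUCKETS = 16;
  private static final Random RANDOM = new Random();

  /** Key layout: [1-byte bucket][8-byte timestamp][event id]. */
  public static byte[] makeKey(long timestampMillis, String eventId) {
    // A random bucket spreads writes evenly across regions; hashing eventId
    // instead would keep single-row Gets deterministic.
    byte bucket = (byte) RANDOM.nextInt(NUM_BUCKETS);
    return Bytes.add(new byte[] { bucket },
                     Bytes.toBytes(timestampMillis),
                     Bytes.toBytes(eventId));
  }

  /** A time-range scan now has to be issued once per bucket -- that's the cost. */
  public static byte[][] startKeysFor(long startMillis) {
    byte[][] starts = new byte[NUM_BUCKETS][];
    for (int b = 0; b < NUM_BUCKETS; b++) {
      starts[b] = Bytes.add(new byte[] { (byte) b }, Bytes.toBytes(startMillis));
    }
    return starts;
  }
}

On the read side you end up issuing one scan per bucket over [bucket][start]..[bucket][end] and merging the results, which is exactly why the time-range jobs get less efficient.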
You say 2TB of data a day on 10-20 nodes? What kinds of nodes are you expecting to use? In a month, that's 60TB of data, so 3-6TB per node? And that's pre-replication, so you're really talking 9-18TB per node? And you want full random access to that, while running batch MR jobs, while continuously importing more? That seems like a tall order. You'd be adding >1000 regions a day... and on 10-20 nodes?

Do you really need full random access to the entire raw dataset? Could you load into HDFS, run batch jobs against HDFS, but also have some jobs that take the HDFS data, run some aggregations/filters/etc., and then put _that_ data into HBase? (There's a rough sketch of what I mean below, after the quoted thread.)

You also say you're going to delete data. What's the time window you want to keep?

HBase is capable of handling lots of stuff, but you seem to want to process very large datasets (and the trifecta: heavy writes, batch/scanning reads, and random reads) on a very small cluster. 10 nodes is really the bare minimum for any production system serving any reasonably sized dataset (>1TB), unless the individual nodes are very powerful.

JG

On Thu, September 3, 2009 12:15 am, stack wrote:
> On Wed, Sep 2, 2009 at 11:37 PM, Schubert Zhang <[email protected]>
> wrote:
>
>>> Do you need to keep it all? Does some data expire (or can it be
>>> moved offline)?
>>
>> Yes, we need to remove old data which expires.
>
> When does data expire? Or, how many billions of rows should your cluster
> of 10-20 nodes carry at a time?
>
>> The data will arrive with a minutes delay.
>>
>> Usually, we need to write/ingest tens of thousands of new rows. Many rows
>> with the same timestamp.
>
> Will the many rows of the same timestamp all go into the one timestamp row,
> or will the key have a further qualifier such as event type to distinguish
> amongst the updates that arrive at the same timestamp?
>
> What do you see as the approximate write rate, and what do you think its
> spread across timestamps will be? E.g. 10000 updates a second and all of
> the updates fit within a ten-second window?
>
> Sorry for all the questions.
>
> St.Ack
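Going back to the HDFS-first idea above: here's a rough sketch of that flow, assuming a plain MapReduce job over the raw files whose reducer writes only the aggregated rows into HBase. The table name, column family, and the parsing in the mapper are placeholders, not a recommendation for any specific schema:

// Sketch only: "summary", the "agg" family, and the line parsing are placeholders.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class RawToAggregates {

  /** Map raw log lines in HDFS to (aggregationKey, 1); the parsing is a stand-in. */
  static class AggMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] fields = line.toString().split("\t");
      ctx.write(new Text(fields[0]), ONE);  // e.g. aggregate per event type + hour
    }
  }

  /** Sum the counts and write only the aggregated row into HBase. */
  static class AggReducer
      extends TableReducer<Text, LongWritable, ImmutableBytesWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
        throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable v : values) sum += v.get();
      Put put = new Put(Bytes.toBytes(key.toString()));
      put.add(Bytes.toBytes("agg"), Bytes.toBytes("count"), Bytes.toBytes(sum));
      ctx.write(new ImmutableBytesWritable(put.getRow()), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "raw-to-aggregates");
    job.setJarByClass(RawToAggregates.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));  // raw files in HDFS
    job.setMapperClass(AggMapper.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(LongWritable.class);
    // initTableReducerJob wires up TableOutputFormat against the "summary" table
    TableMapReduceUtil.initTableReducerJob("summary", AggReducer.class, job);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

That way the raw 2TB/day stays in flat files in HDFS and only the much smaller aggregate ever turns into HBase regions.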
