On Fri, Sep 4, 2009 at 2:04 AM, Jonathan Gray <[email protected]> wrote:
> There are lots of interesting ways you can design your tables and keys to
> avoid the single-regionserver hotspotting.
>
> I did an experimental design a long time ago that pre-pended a random
> value to every row key, where the value was modulo'd by the number of
> regionservers, or between 1/2 and 2 * #ofRS, so for a given stamp there
> would be that many potential regions it could go into. This design
> doesn't make time-range MR jobs very efficient though, because a single
> range is spread out across the entire table... But I'm not sure you can
> avoid that if you want good distribution; those two requirements are at
> odds.

This way gives us parallelism, but then we cannot scan by time range.
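For illustration, a minimal sketch (plain Java, no HBase client calls) of the
kind of salted key JG describes. The bucket count, the key layout, and the use
of a deterministic hash of an assumed event id instead of the random prefix he
mentions are assumptions made here, not his actual design:

  import java.nio.charset.StandardCharsets;

  public class SaltedKeySketch {
      // Assumed bucket count; JG suggests somewhere between 1/2 and 2x the
      // number of regionservers.
      static final int NUM_BUCKETS = 16;

      // Prefix the time-based key with a bucket id so rows for a given stamp
      // can land in NUM_BUCKETS different regions instead of one. JG describes
      // a random prefix; a hash of a stable field (here an assumed event id)
      // is used instead so a single row can still be located deterministically.
      static byte[] rowKey(long timestamp, String eventId) {
          int bucket = (eventId.hashCode() & 0x7fffffff) % NUM_BUCKETS;
          String key = String.format("%02d-%013d-%s", bucket, timestamp, eventId);
          return key.getBytes(StandardCharsets.UTF_8);
      }

      // The cost JG points out: a time-range scan now needs one start/stop key
      // pair per bucket, i.e. the range is spread across the whole table.
      static String[][] timeRangeScanBounds(long startTs, long stopTs) {
          String[][] bounds = new String[NUM_BUCKETS][2];
          for (int b = 0; b < NUM_BUCKETS; b++) {
              bounds[b][0] = String.format("%02d-%013d", b, startTs);
              bounds[b][1] = String.format("%02d-%013d", b, stopTs);
          }
          return bounds;
      }
  }

An MR job over a time range would then launch one scan per bucket and merge the
results, which is the extra cost of the distribution.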
> You say 2TB of data a day on 10-20 nodes? What kinds of nodes are you
> expecting to use?

Yes, 2TB/day uncompressed. Normal server nodes, e.g. 16GB RAM, 8 cores, and
many SATA disks.

> In a month, that's 60TB of data, so 3-6TB per node? And that's
> pre-replication, so you're talking 9-18TB per node? And you want full
> random access to that, while running batch MR jobs, while continuously
> importing more? Seems that's a tall order. You'd be adding >1000 regions
> a day... and on 10-20 nodes?

Yes, that is what we expect. But our random-read and MR-analysis traffic is
very light.

> Do you really need full random access to the entire raw dataset? Could
> you load into HDFS, run batch jobs against HDFS, but also have some jobs
> that take HDFS data, run some aggregations/filters/etc, and then put
> _that_ data into HBase?

We need to run MR analysis on time-ranged data to produce reports. We also
want to query historical data, using some indexing technique, with low
(second-level) latency. It seems that organizing the data files in HDFS and
storing the metadata and index in HBase makes sense. We are working in this
direction, but it is very difficult to build the index!

> You also say you're going to delete data. What's the time window you want
> to keep?

E.g. 3 months; data that expires will be dropped.

> HBase is capable of handling lots of stuff but you seem to want to process
> very large datasets (and the trifecta: heavy writes, batch/scanning reads,
> and random reads) on a very small cluster. 10 nodes is really
> bare-minimum for any production system serving any reasonably sized
> dataset (>1TB), unless the individual nodes are very powerful.

And the batch/scanning reads and random reads are light. :-) Yes, it is
really difficult and ambitious; the time-series data in our case is larger
than even that of popular websites.

> JG
>
> On Thu, September 3, 2009 12:15 am, stack wrote:
> > On Wed, Sep 2, 2009 at 11:37 PM, Schubert Zhang <[email protected]>
> > wrote:
> >
> >>> Do you need to keep it all? Does some data expire (or can it be
> >>> moved offline)?
> >>
> >> Yes, we need to remove old data when it expires.
> >
> > When does data expire? Or, how many billions of rows should your cluster
> > of 10-20 nodes carry at a time?
> >
> >> The data will arrive with a few minutes' delay.
> >>
> >> Usually, we need to write/ingest tens of thousands of new rows. Many rows
> >> with the same timestamp.
> >
> > Will the many rows of the same timestamp all go into the one timestamp row
> > or will the key have a further qualifier such as event type to distinguish
> > amongst the updates that arrive at the same timestamp?
> >
> > What do you see as the approximate write rate and what do you think its
> > spread across timestamps will be? E.g. 10000 updates a second and all of
> > the updates fit within a ten second window?
> >
> > Sorry for all the questions.
> >
> > St.Ack
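As an illustrative aside on stack's question about a further key qualifier, one
possible composite key might look like the sketch below. The event-type field,
the zero-padded widths, and the per-process sequence counter are assumptions
for illustration only, not anything proposed in the thread:

  import java.nio.charset.StandardCharsets;
  import java.util.concurrent.atomic.AtomicLong;

  public class CompositeKeySketch {
      // Per-process counter; a real design would need something unique across
      // writers (e.g. a client id) to avoid collisions.
      private static final AtomicLong SEQ = new AtomicLong();

      // Timestamp first keeps rows sorted by time; the event type plus a
      // sequence number distinguish the many updates that share a stamp.
      static byte[] rowKey(long timestamp, String eventType) {
          String key = String.format("%013d-%s-%010d",
                  timestamp, eventType, SEQ.getAndIncrement());
          return key.getBytes(StandardCharsets.UTF_8);
      }
  }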
