Have you looked into bulk imports? You can write your data into HDFS and then run a MapReduce job to generate the files that HBase uses to serve data. After the job finishes, there's a utility that moves the files into HBase's directory, at which point your data becomes visible. Check out http://hbase.apache.org/bulk-loads.html for details.
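As a rough sketch of that workflow (the importtsv and completebulkload tools are described on the bulk-loads page linked above; the table name, column mapping, and paths here are placeholders, and the exact jar invocation depends on your HBase version):

```shell
# Step 1: run a MapReduce job that writes HFiles instead of issuing Puts.
# importtsv is the stock example for TSV input; -Dimporttsv.bulk.output
# switches it from live Puts to HFile generation.
hadoop jar $HBASE_HOME/hbase.jar importtsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1 \
  -Dimporttsv.bulk.output=/tmp/mytable-hfiles \
  mytable /user/me/input

# Step 2: hand the generated HFiles over to the region servers.
# This moves the files into HBase's directory; no per-row writes happen.
hadoop jar $HBASE_HOME/hbase.jar completebulkload /tmp/mytable-hfiles mytable
```

Because the HFiles are written by the MR job and only moved at the end, the write path skips the region servers entirely, which sidesteps the hot-region problem for time-ordered keys.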
-Joey

On Fri, Oct 28, 2011 at 10:08 AM, Andreas Reiter <a.rei...@web.de> wrote:
> Hi everybody,
>
> we have the following scenario:
> our clustered web application needs to write records to HBase, and we need
> to support a very high throughput; we expect up to 10-30 thousand requests
> per second, and maybe even more.
>
> Usually this is not a problem for HBase if we use a "random" row key; in
> that case the data is distributed evenly across all region servers.
> But we need to generate our keys based on the current time, so that we can
> run MR jobs over a period of time without processing the whole data set,
> using
> scan.setStartRow(startRow);
> scan.setStopRow(stopRow);
>
> In our case the generated row keys look similar and are therefore going to
> the same region server... so this approach is not really using the power of
> the whole cluster, but only one server, which can be dangerous under very
> high load.
>
> So we are thinking about writing the records first to an HDFS file, and
> running an MR job periodically that reads the finished HDFS files and
> inserts the records into HBase.
>
> What do you guys think about it? Any suggestions would be very appreciated.
>
> regards
> andre

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
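[For context on the key scheme described in the quoted question: if row keys are fixed-width, zero-padded timestamps, lexicographic byte order matches time order, so a time window maps directly onto a scan's start/stop rows. A minimal sketch, not actual HBase client code; the function names are made up for illustration:]

```python
def row_key(ts_millis: int) -> bytes:
    # Fixed-width, zero-padded timestamp so byte order == time order.
    # This is exactly why all recent keys land in the same region.
    return b"%020d" % ts_millis

def scan_range(start_ts: int, stop_ts: int) -> tuple[bytes, bytes]:
    # Start is inclusive, stop is exclusive -- the same convention as
    # Scan.setStartRow / Scan.setStopRow in the HBase client API.
    return row_key(start_ts), row_key(stop_ts)

# A one-hour window selects only the keys whose timestamps fall inside it.
start, stop = scan_range(1319810880000, 1319814480000)
keys = [row_key(t) for t in (1319810000000, 1319812000000, 1319899000000)]
in_window = [k for k in keys if start <= k < stop]
```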