Seems a little slow, but you will have a hard time getting a single thread/process to do more than 10k rows/sec. Make your write buffer larger (20MB?), and don't forget to flush it on shutdown! Use more processes or threads. The usual tricks.
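Roughly what I mean, as a sketch against the 0.20-era HTable API -- the table, family and qualifier names here are just placeholders, substitute your own:

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BulkWriter {
  public static void main(String[] args) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    // "mytable", "cf", "col" are made-up names for illustration.
    HTable table = new HTable(conf, "mytable");

    // Buffer puts client-side instead of round-tripping one row at a time.
    table.setAutoFlush(false);
    table.setWriteBufferSize(20 * 1024 * 1024); // ~20MB write buffer

    try {
      for (long i = 0; i < 1000000; i++) {
        Put put = new Put(Bytes.toBytes("row-" + i));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(i));
        table.put(put); // queued in the write buffer, flushed when full
      }
    } finally {
      // Don't lose the tail of the buffer on shutdown.
      table.flushCommits();
      table.close();
    }
  }
}

If you go multi-threaded, give each thread its own HTable instance (HTable isn't thread safe); the write buffer is per-instance.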
Good luck!

On Mon, Nov 30, 2009 at 3:33 PM, Calvin <calvin.li...@gmail.com> wrote:
> Thanks for the responses. If I can avoid writing a map-reduce job that
> would be preferable (getting map-reduce to work with / depend on my
> existing infrastructure is turning out to be annoying).
>
> I have no good way of randomizing my dataset since it's a very large
> stream of sequential data (ordered by some key). I have a fair number of
> column families (~25) and every column is a long or a double. Having a
> standalone program that writes rows using the HTable / Put API seems to
> run at ~2-5000 rows/sec, which seems ridiculously slow. Is it possible I
> am doing something terribly wrong?
>
> -Calvin
>
> On Mon, Nov 30, 2009 at 5:47 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>
>> Sequentially ordered rows are the worst insert case in HBase - you end
>> up writing all to 1 server even if you have 500. If you could
>> randomize your input, and I have pasted a Randomize.java map reduce
>> that will randomize lines of a file, then your performance will
>> improve.
>>
>> I have seen sustained inserts of 100-300k rows/sec on small rows
>> before. Obviously large blob rows will be slower, since the limiting
>> factor is how fast we can write data to HDFS, thus it isn't the actual
>> row count, but the amount of data involved.
>>
>> Try the Randomize.java, see where that gets you. I think it's on the
>> list archives.
>>
>> -ryan
>>
>>
>> On Mon, Nov 30, 2009 at 2:41 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>> Could you put your data in HDFS and load it from there with a
>>> MapReduce job?
>>>
>>> J-D
>>>
>>> On Mon, Nov 30, 2009 at 2:33 PM, Calvin <calvin.li...@gmail.com> wrote:
>>>> I have a large amount of sequentially ordered rows I would like to
>>>> write to an HBase table. What is the preferred way to do bulk writes
>>>> of multi-column tables in HBase? Using the get/put interface seems
>>>> fairly slow even if I batch writes with table.put(List<Put>).
>>>>
>>>> I have followed the directions on:
>>>> * http://wiki.apache.org/hadoop/PerformanceTuning
>>>> * http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html
>>>>
>>>> Are there any other resources for improving the throughput of my bulk
>>>> writes? On
>>>> http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html
>>>> I see there's a way to write HFiles directly, but HFileOutputFormat
>>>> can only write a single column family at a time (
>>>> https://issues.apache.org/jira/browse/HBASE-1861).
>>>>
>>>> Thanks!
>>>>
>>>> -Calvin
>>>>
>>>
>>>