Seems a little slow, but you will have a hard time getting a single thread/process to do more than 10k rows/sec. Make your write buffer larger (20MB?), and don't forget to flush it on shutdown! Use more processes or threads. The usual tricks.
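Roughly what I mean, as a sketch against the 0.20-era HTable API -- the table, family and qualifier names here are just placeholders, substitute your own:

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BulkWriter {
  public static void main(String[] args) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    // "mytable", "cf", "col" are made-up names for illustration.
    HTable table = new HTable(conf, "mytable");

    // Buffer puts client-side instead of round-tripping one row at a time.
    table.setAutoFlush(false);
    table.setWriteBufferSize(20 * 1024 * 1024); // ~20MB write buffer

    try {
      for (long i = 0; i < 1000000; i++) {
        Put put = new Put(Bytes.toBytes("row-" + i));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(i));
        table.put(put); // queued in the write buffer, flushed when full
      }
    } finally {
      // Don't lose the tail of the buffer on shutdown.
      table.flushCommits();
      table.close();
    }
  }
}

If you go multi-threaded, give each thread its own HTable instance (HTable isn't thread safe); the write buffer is per-instance.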
Good luck!

On Mon, Nov 30, 2009 at 3:33 PM, Calvin <calvin.li...@gmail.com> wrote:
> Thanks for the responses. If I can avoid writing a map-reduce job that
> would be preferable (getting map-reduce to work with / depend on my
> existing infrastructure is turning out to be annoying).
>
> I have no good way of randomizing my dataset since it's a very large
> stream of sequential data (ordered by some key). I have a fair number of
> column families (~25) and every column is a long or a double. Having a
> standalone program that writes rows using the HTable / Put API seems to
> run at ~2-5000 rows/sec, which seems ridiculously slow. Is it possible I
> am doing something terribly wrong?
>
> -Calvin
>
> On Mon, Nov 30, 2009 at 5:47 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>
>> Sequentially ordered rows are the worst insert case in HBase - you end
>> up writing all to 1 server even if you have 500. If you could
>> randomize your input, and I have pasted a Randomize.java map reduce
>> that will randomize lines of a file, then your performance will
>> improve.
>>
>> I have seen sustained inserts of 100-300k rows/sec on small rows
>> before. Obviously large blob rows will be slower, since the limiting
>> factor is how fast we can write data to HDFS, thus it isn't the actual
>> row count, but the amount of data involved.
>>
>> Try the Randomize.java, see where that gets you. I think it's on the
>> list archives.
>>
>> -ryan
>>
>>
>> On Mon, Nov 30, 2009 at 2:41 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>> Could you put your data in HDFS and load it from there with a
>>> MapReduce job?
>>>
>>> J-D
>>>
>>> On Mon, Nov 30, 2009 at 2:33 PM, Calvin <calvin.li...@gmail.com> wrote:
>>>> I have a large amount of sequentially ordered rows I would like to
>>>> write to an HBase table. What is the preferred way to do bulk writes
>>>> of multi-column tables in HBase? Using the get/put interface seems
>>>> fairly slow even if I batch writes with table.put(List<Put>).
>>>>
>>>> I have followed the directions on:
>>>> * http://wiki.apache.org/hadoop/PerformanceTuning
>>>> * http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html
>>>>
>>>> Are there any other resources for improving the throughput of my bulk
>>>> writes? On
>>>> http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html
>>>> I see there's a way to write HFiles directly, but HFileOutputFormat
>>>> can only write a single column family at a time (
>>>> https://issues.apache.org/jira/browse/HBASE-1861).
>>>>
>>>> Thanks!
>>>>
>>>> -Calvin
>>>>
>>>
>>>