On Tue, Dec 29, 2009 at 3:05 PM, <zlatin.balev...@barclayscapital.com> wrote:

>
> The numbers are for a binary format that may not be very compressible.  Most
> of the data will be arriving during an 8-hour window.  It would be keyed
> by a nanosecond timestamp so all records will be unique.


Do you have a mechanism to guarantee no two events land in the same nanosecond?
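If not, a common workaround is to disambiguate the key yourself. Here is a minimal sketch (the 12-byte layout and counter width are my assumptions, not anything HBase requires): append a per-process sequence counter to the nanosecond timestamp so that two events stamped in the same nanosecond still get distinct, time-sorted row keys.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: 12-byte row key = 8-byte big-endian nanosecond timestamp
// followed by a 4-byte per-process counter.  Events stamped in the same
// nanosecond still get distinct keys, and big-endian encoding keeps keys
// sorting in time order for non-negative timestamps.
public class NanoKey {
    private static final AtomicInteger SEQ = new AtomicInteger();

    public static byte[] rowKey(long nanoTimestamp) {
        return ByteBuffer.allocate(12)
                .putLong(nanoTimestamp)        // time component
                .putInt(SEQ.getAndIncrement()) // tie-breaker within a nanosecond
                .array();
    }

    public static void main(String[] args) {
        byte[] a = rowKey(1_262_120_700_000_000_000L);
        byte[] b = rowKey(1_262_120_700_000_000_000L); // same nanosecond
        System.out.println(java.util.Arrays.equals(a, b) ? "collision" : "distinct");
    }
}
```

If events come from multiple writer processes, the counter alone is not enough; you would also need to fold a writer id into the key.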


> Data will be
> kept indefinitely; there will be rare updates/deletions of a small number
> of rows.  The main use case is sequential range scanning and filtering
> of 2^(40+) rows.
>

Is this the whole table or a subset?


>
> There will be several column families; occasionally new ones will be
> added and old ones deprecated.


OK.  Writing hfiles directly when there are multiple column families is not
yet supported, though it should be relatively straightforward to add.  When
the schema changes, the script that writes hfiles directly will need to
change in sympathy.  You'll need to put in place a mechanism to make sure
this happens.

Currently, adding/removing column families requires offlining the table, but
the plan for 0.21 is that it will be possible to do this on a live table.



> That flexibility, the strong data
> consistency, and good scan performance (according to the published
> benchmarks) are the main reasons we're looking at HBase.
>

Ok.



> A question: in the window after the bulk loading MR script has
> finished running but before the meta scan runs, which could be up to a
> minute, how will querying and scanning work?


At the moment it's a little messy.  Let me first make some statements about
how it currently works.

A script that runs at the end of the hfile-writing MapReduce job adds the
newly minted regions to the .META. catalog table.  Each new region added
means a rewrite of the last row in the table -- the last region's endkey is
no longer the marker for end-of-table -- and then the script adds the new
last row, the new end-of-table region.  Locks are row-scoped; locking across
rows, including rows that don't yet exist, is not possible.  So, during the
addition of the new regions, clients could get an inconsistent view of the
end-of-table.
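To make that window concrete, here is a toy model of the two-step rewrite (the names and structure are illustrative only, not the actual .META. schema):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the .META. end-of-table rewrite -- not HBase code.
// A region covers [startKey, endKey); an empty endKey means
// "to end of table".
public class MetaModel {
    static class Region {
        final String startKey;
        String endKey;
        Region(String s, String e) { startKey = s; endKey = e; }
    }

    // Step 1 rewrites the old last region so its endKey no longer marks
    // end-of-table; step 2 appends the new regions, the last of which
    // carries the empty end-of-table endKey.  A reader that fetches the
    // list between the two steps sees no end-of-table region at all --
    // the inconsistent view described above.
    static void appendRegions(List<Region> meta, List<String> newStartKeys) {
        Region oldLast = meta.get(meta.size() - 1);
        oldLast.endKey = newStartKeys.get(0);            // step 1
        for (int i = 0; i < newStartKeys.size(); i++) {  // step 2
            String end = i + 1 < newStartKeys.size() ? newStartKeys.get(i + 1) : "";
            meta.add(new Region(newStartKeys.get(i), end));
        }
    }

    public static void main(String[] args) {
        List<Region> meta = new ArrayList<>();
        meta.add(new Region("", ""));                 // one region spans the table
        appendRegions(meta, List.of("day2", "day3")); // bulk load appends two regions
    }
}
```

Because each row write is individually atomic but the pair is not, no amount of client-side care closes the window; that is why 0.21 moves this bookkeeping somewhere it can be done atomically.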

When the .META. scanner runs (every minute by default), it will notice the
newly added regions.  The newly added regions will not have been assigned,
so it will go ahead and assign them to regionservers out on the cluster.
Regions take a little time to open.

So, a scanner on open gets a list of all the regions that make up the scan
scope.  If we're in the middle of jiggling the end of the table, it could
get an 'off' list of regions if it hits .META. just as it's being
transformed (more detail available if you want it).  Similar for a random
read (get).

At least the random read should repair itself on a subsequent retry (this
will show in your application as some increased latency).
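Client-side, that self-repair is effectively a bounded retry loop; a generic sketch (not the actual HBase client code, and the backoff numbers are invented):

```java
import java.util.concurrent.Callable;

// Generic retry-with-backoff sketch.  A read that lands on a region that
// is not yet online fails, sleeps, and tries again; the application sees
// the wait as extra latency rather than an error, provided the region
// comes online within the retry budget.
public class Retry {
    public static <T> T withRetries(Callable<T> op, int maxAttempts,
                                    long initialBackoffMs) throws Exception {
        long backoff = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();                    // e.g. a get or scanner open
            } catch (Exception e) {
                if (attempt >= maxAttempts) throw e; // budget exhausted
                Thread.sleep(backoff);               // shows up as latency
                backoff *= 2;                        // exponential backoff
            }
        }
    }
}
```

Usage would look like `Retry.withRetries(() -> table.get(row), 10, 100L)` for some hypothetical `table` handle.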

The scanner could hit a condition where it errors out at the end of the
scan.

In HBase 0.21, region state will be managed via queues up in ZooKeeper.
Addition of regions, and the listing of regions that comprise a table, will
be done atomically.  This should help make it so clients always get a
coherent view of a table.  There will be no waiting on a .META. scan to
notice newly added regions and get them assigned (see the master rewrite
'design' near the base of the HBase wiki page).



>  Will they produce inconsistent
> results or just not see the new data?  What about update or delete
> operations?  Is it necessary to suspend/queue those, and if so, is there
> a way to do that within HBase?
>

The only hard part is making it so the new regions are either there or not.
In 0.20, clients can get a grey view.

If you are asking what clients do between the addition of the new regions
and their assignment: they retry until the region is onlined.  This period
is usually fairly short.

Keep asking questions,
St.Ack




>
> Best Regards,
> Zlatin
>
> -----Original Message-----
> From: saint....@gmail.com [mailto:saint....@gmail.com] On Behalf Of
> stack
> Sent: Tuesday, December 29, 2009 4:39 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: hbase bulk writes
>
> You've seen the description at
> http://www.mail-archive.com/hbase-user@hadoop.apache.org/msg06010.html
> of how timeseries data might be added quickly to HBase by just adding
> regions to the tail of a table?  They'd come online as soon as the next
> meta scan ran (usually every minute).
>
> Your schema requires multiple families?
>
> Generally, loading behind the API (writing hfiles directly) will get you
> an order of magnitude improvement and more over loading via the API.
>
> Are your numbers for compressed data?
> St.Ack
>
>
>
> On Tue, Dec 29, 2009 at 12:28 PM,
> <zlatin.balev...@barclayscapital.com>wrote:
>
> >
> > >Can you put your input files under an http server and then write a
> > mapreduce that pulls via HTTP?
> >
> > Greetings,
> >
> > I'm very interested in how much of an improvement HBASE-1861 would
> > result in.  I am planning on inserting between 2^33 and 2^37 records,
> > for an aggregate of 2^43 to 2^45 bytes, on a daily basis.  The records will
> > be sequentially sorted, which I understand is the worst-case scenario
> > for inserting in a live HBase system.  To make things even more
> > interesting, I can't afford any downtime, so any bulk load method will
> > have to append to existing tables.
> > Based on the load rates others are posting, I'm starting to doubt
> > whether this will be possible with HBase at all.  There will be plenty
> > of cpu cores and storage space.
> >
> > Best Regards,
> > Zlatin Balevsky
> > AVP AMM Group,
> > Barclays Capital
> > _______________________________________________
> >
> > This e-mail may contain information that is confidential, privileged
> > or otherwise protected from disclosure. If you are not an intended
> > recipient of this e-mail, do not duplicate or redistribute it by any
> > means. Please delete it and any attachments and notify the sender that
>
> > you have received it in error. Unless specifically indicated, this
> > e-mail is not an offer to buy or sell or a solicitation to buy or sell
>
> > any securities, investment products or other financial product or
> > service, an official confirmation of any transaction, or an official
> > statement of Barclays. Any views or opinions presented are solely
> > those of the author and do not necessarily represent those of
> > Barclays. This e-mail is subject to terms available at the following
> > link: www.barcap.com/emaildisclaimer. By messaging with Barclays you
> > consent to the foregoing.  Barclays Capital is the investment banking
> > division of Barclays Bank PLC, a company registered in England (number
> > 1026167) with its registered office at 1 Churchill Place, London, E14
> 5HP.
> >  This email may relate to or be sent from other members of the
> > Barclays Group.
> > _______________________________________________
> >
