Thank you for your kind explanation!

On Wed, Mar 10, 2010 at 5:36 PM, Andrew Purtell <apurt...@apache.org> wrote:
> > Here, what exactly is the meaning of "materialized"? Would you
> > kindly give more details?
>
> Basically what I am saying is that the analytic computation can produce a
> table of a set of answers to questions which may be asked at some future
> time. Since HBase 0.20.0, random access to table data is of low enough
> latency to host the information directly. So, typically, a batch process of
> user construction will run using TableInputFormat over raw data and output
> cooked results via TableOutputFormat into a table for answering queries
> later in real time. Depending on the use case this is usually called either
> precomputation or materialization. Precomputation is a generic term.
> Materialization (as in "materialized views") I believe was coined by Oracle.
> These terms are used interchangeably to refer to the process of making
> answers to a set of possible queries in advance. To be pedantic I should
> have said precomputation instead of materialization, because the latter
> implies occasional automatic update of the cached data by the database
> engine. Of course HBase does not do that.
>
> Hope that helped,
>
>    - Andy
>
> > ----- Original Message ----
> > From: Hua Su <huas...@gmail.com>
> > To: hbase-user@hadoop.apache.org
> > Sent: Wed, March 10, 2010 1:01:33 AM
> > Subject: Re: Use cases of HBase
> >
> > Hi Purtell,
> >
> > What do you mean by "Since 0.20.0, results of analytic computations over
> > the data can be materialized and served out in real time in response to
> > queries."? Here, what exactly is the meaning of "materialized"? Would you
> > kindly give more details?
> >
> > Thanks!
> >
> > - Hua
> >
> > On Wed, Mar 10, 2010 at 8:12 AM, Andrew Purtell wrote:
> >
> > > I came to this discussion late.
> > >
> > > Ryan and J-D's use case is clearly successful.
> > > In addition to what others have said, I think another case where HBase
> > > really excels is supporting analytics over Big Data (which I define as
> > > on the order of a petabyte). Some of the best performance numbers are
> > > put up by scanners. There is tight integration with the Hadoop
> > > MapReduce framework, not only in terms of API support but also with
> > > respect to efficient task distribution over the cluster -- moving
> > > computation to data -- and there is a favorable interaction with HDFS's
> > > location-aware data placement. Moving computation to data like that is
> > > one major reason why analytics using the MapReduce paradigm can put
> > > conventional RDBMS/data warehouses to shame for substantially less
> > > cost. Since 0.20.0, results of analytic computations over the data can
> > > be materialized and served out in real time in response to queries.
> > > This is a complete solution.
> > >
> > >    - Andy
> > >
> > > > ----- Original Message ----
> > > > From: Ryan Rawson
> > > > To: hbase-user@hadoop.apache.org
> > > > Sent: Tue, March 9, 2010 3:34:55 PM
> > > > Subject: Re: Use cases of HBase
> > > >
> > > > HBase operates more like a write-through cache. Recent writes are in
> > > > memory (aka the memstore). Older data is in the block cache (by
> > > > default 20% of -Xmx). While you can rely on OS buffering, you also
> > > > want a generous helping of block caching directly in HBase's region
> > > > server. We are seeing great performance, and our 95th percentiles
> > > > seem to be related to GC pauses.
> > > >
> > > > So to answer your use case below, the answer is most decidedly 'yes':
> > > > recent values are in memory, and recently read values are served from
> > > > memory as well.
> > > >
> > > > -ryan
> > > >
> > > > On Tue, Mar 9, 2010 at 3:12 PM, Charles Woerner wrote:
> > > > > Ryan, your confidence has me interested in exploring HBase a bit
> > > > > further for some real-time functionality that we're building out.
> > > > > One question about the mem-caching functionality in HBase... Is it
> > > > > write-through or write-back, such that all frequently written items
> > > > > are likely in memory, or is it pull-through via a client query? Or
> > > > > would I be relying on lower-level caching features of the OS and
> > > > > underlying filesystem? In other words, where there are a high
> > > > > number of both reads and writes, and where 90% of all the reads are
> > > > > on recently (5 minutes) written datums, would the HBase
> > > > > architecture help ensure that the most recently written data is
> > > > > already in the cache?
> > > > >
> > > > > On Tue, Mar 9, 2010 at 2:29 PM, Ryan Rawson wrote:
> > > > >
> > > > >> One thing to note is that 10GB is half the memory of a reasonably
> > > > >> sized machine. In fact I have seen 128 GB memcache boxes out
> > > > >> there.
> > > > >>
> > > > >> As for performance, I obviously feel HBase can be performant for
> > > > >> real-time queries. To get a consistent response you absolutely
> > > > >> have to have 95%+ caching in RAM. There is no way to achieve 1-2ms
> > > > >> responses from disk. Throwing enough RAM at the problem, I think
> > > > >> HBase solves this nicely and you won't have to maintain multiple
> > > > >> architectures.
> > > > >>
> > > > >> -ryan
> > > > >>
> > > > >> On Tue, Mar 9, 2010 at 2:08 PM, Jonathan Gray wrote:
> > > > >> > Brian,
> > > > >> >
> > > > >> > I would just reiterate what others have said. If your goal is a
> > > > >> > consistent 1-2ms read latency and your dataset is on the order
> > > > >> > of 10GB... HBase is not a good match.
> > > > >> > It's more than what you need and you'll take unnecessary
> > > > >> > performance hits.
> > > > >> >
> > > > >> > I would look at some of the simpler KV-style stores out there
> > > > >> > like Tokyo Cabinet, Memcached, or BerkeleyDB, or in-memory ones
> > > > >> > like Redis.
> > > > >> >
> > > > >> > JG
> > > > >> >
> > > > >> > -----Original Message-----
> > > > >> > From: jaxzin [mailto:brian.r.jack...@espn3.com]
> > > > >> > Sent: Tuesday, March 09, 2010 12:09 PM
> > > > >> > To: hbase-user@hadoop.apache.org
> > > > >> > Subject: Re: Use cases of HBase
> > > > >> >
> > > > >> > Gary, I looked at your presentation and it was very helpful. But
> > > > >> > I do have a few unanswered questions from it, if you wouldn't
> > > > >> > mind answering them. How big is/was your cluster that handled 3k
> > > > >> > req/sec? And what were the specs on each node (RAM/CPU)?
> > > > >> >
> > > > >> > When you say latency can be good, what do you mean? Is it even
> > > > >> > in the ballpark of 1 ms? We already deal with GC and don't
> > > > >> > expect perfect real-time behavior, so that might be okay with
> > > > >> > me.
> > > > >> >
> > > > >> > P.S. I was at Hadoop World NYC and saw Ryan and Jonathan's
> > > > >> > presentation there but somehow mentally blocked it. Thanks for
> > > > >> > the reminder.
> > > > >> >
> > > > >> > Gary Helmling wrote:
> > > > >> >>
> > > > >> >> Hey Brian,
> > > > >> >>
> > > > >> >> We use HBase to complement MySQL in serving activity-stream
> > > > >> >> type data here at Meetup. It's handling real-time requests
> > > > >> >> involved in 20-25% of our page views, but our latency
> > > > >> >> requirements aren't as strict as yours.
> For > > > what > > > > >> >> it's worth, I did a presentation on our setup which will > hopefully > > > fill > > > > >> in > > > > >> >> some details: > http://www.slideshare.net/ghelmling/hbase-at-meetup > > > > >> >> > > > > >> >> There are also some great presentations by Ryan Rawson and > Jonathan > > > Gray > > > > >> >> on > > > > >> >> how they've used HBase for realtime serving on their sites. > See > > > the > > > > >> >> presentations wiki page: > > > > >> >> http://wiki.apache.org/hadoop/HBase/HBasePresentations > > > > >> >> > > > > >> >> Like Barney, I suspect where you'll hit some issues will be in > your > > > > >> >> latency > > > > >> >> requirements. Depending on how you layout your data and > configure > > > your > > > > >> >> column families, your average latency may be good, but you will > hit > > > some > > > > >> >> pauses as I believe reads block at times during region splits > or > > > > >> >> compactions > > > > >> >> and memstore flushes (unless you have a fairly static data > set). > > > Others > > > > >> >> here should be able to fill in more details. > > > > >> >> > > > > >> >> With a relatively small dataset, you may want to look at the > "in > > > memory" > > > > >> >> configuration option for your column families. > > > > >> >> > > > > >> >> What's your expected workload -- writes vs. reads? types of > reads > > > > >> you'll > > > > >> >> be > > > > >> >> doing: random access vs. sequential? There are a lot of > > > knowledgeable > > > > >> >> folks > > > > >> >> here to offer advice if you can give us some more insight into > what > > > > >> you're > > > > >> >> trying to build. > > > > >> >> > > > > >> >> --gh > > > > >> >> > > > > >> >> > > > > >> >> On Tue, Mar 9, 2010 at 11:21 AM, jaxzin > > > > >> wrote: > > > > >> >> > > > > >> >>> > > > > >> >>> This is exactly the kind of feedback I'm looking for thanks, > > > Barney. 
> > > > >> >>> So it sounds like you cache the data you get from HBase in
> > > > >> >>> session-based memory? Are you using a Java EE HttpSession?
> > > > >> >>> (I'm less familiar with the django/rails equivalents but I'm
> > > > >> >>> assuming they exist.) Or are you using a memory cache provider
> > > > >> >>> like ehcache or memcache(d)?
> > > > >> >>>
> > > > >> >>> Can you tell me more about your experience with latency and
> > > > >> >>> why you say that?
> > > > >> >>>
> > > > >> >>> Barney Frank wrote:
> > > > >> >>> >
> > > > >> >>> > I am using HBase to store visitor-level clickstream-like
> > > > >> >>> > data. At the beginning of the visitor session I retrieve all
> > > > >> >>> > the previous session data from HBase, use it within my app
> > > > >> >>> > server, massage it a little, and serve it to the consumer
> > > > >> >>> > via web services. Where I think you will run into the most
> > > > >> >>> > problems is your latency requirement.
> > > > >> >>> >
> > > > >> >>> > Just my 2 cents from a user.
> > > > >> >>> >
> > > > >> >>> > On Tue, Mar 9, 2010 at 9:45 AM, jaxzin wrote:
> > > > >> >>> >
> > > > >> >>> >> Hi all, I've got a question about how everyone is using
> > > > >> >>> >> HBase. Is anyone using it as an online data store to
> > > > >> >>> >> directly back a web service?
> > > > >> >>> >>
> > > > >> >>> >> The textbook example of a weblink HBase table suggests
> > > > >> >>> >> there would be an associated web front-end to display the
> > > > >> >>> >> information in that HBase table (ex. a search results
> > > > >> >>> >> page), but I'm having trouble finding evidence that anyone
> > > > >> >>> >> is servicing web traffic backed directly by an HBase
> > > > >> >>> >> instance in practice.
> > > > >> >>> >>
> > > > >> >>> >> I'm evaluating whether HBase would be the right tool to
> > > > >> >>> >> provide a few things for a large-scale web service we want
> > > > >> >>> >> to develop at ESPN, and I'd really like to get opinions and
> > > > >> >>> >> experience from people who have already been down this
> > > > >> >>> >> path. No need to reinvent the wheel, right?
> > > > >> >>> >>
> > > > >> >>> >> I can tell you a little about the project goals if it helps
> > > > >> >>> >> give you an idea of what I'm trying to design for:
> > > > >> >>> >>
> > > > >> >>> >> 1) Highly available (it would be a central service and an
> > > > >> >>> >> outage would take down everything)
> > > > >> >>> >> 2) Low latency (1-2 ms; less is better, more isn't
> > > > >> >>> >> acceptable)
> > > > >> >>> >> 3) High throughput (5-10k req/sec at worst-case peak)
> > > > >> >>> >> 4) Unstable traffic (ex. Sunday afternoons during football
> > > > >> >>> >> season)
> > > > >> >>> >> 5) Small data... for now (< 10 GB of total data currently,
> > > > >> >>> >> but HBase could allow us to design differently and store
> > > > >> >>> >> more online)
> > > > >> >>> >>
> > > > >> >>> >> The reason I'm looking at HBase is that we've solved many
> > > > >> >>> >> of our scaling issues with the same basic concepts as HBase
> > > > >> >>> >> (sharding, flattening data to fit in one row, throwing away
> > > > >> >>> >> ACID, etc.) but with home-grown software. I'd like to adopt
> > > > >> >>> >> an active open-source project if it makes sense.
> > > > >> >>> >>
> > > > >> >>> >> Alternatives I'm also looking at: RDBMS fronted with
> > > > >> >>> >> WebSphere eXtreme Scale, RDBMS fronted with
> > > > >> >>> >> Hibernate/ehcache, or (the option I understand the least
> > > > >> >>> >> right now) memcached.
> > > > >> >>> >>
> > > > >> >>> >> Thanks,
> > > > >> >>> >> Brian
> > > > >> >>> >> --
> > > > >> >>> >> View this message in context:
> > > > >> >>> >> http://old.nabble.com/Use-cases-of-HBase-tp27837470p27837470.html
> > > > >> >>> >> Sent from the HBase User mailing list archive at Nabble.com.
> > > > >> >>>
> > > > >> >>> --
> > > > >> >>> View this message in context:
> > > > >> >>> http://old.nabble.com/Use-cases-of-HBase-tp27837470p27838006.html
> > > > >> >>> Sent from the HBase User mailing list archive at Nabble.com.
> > > > >> >
> > > > >> > --
> > > > >> > View this message in context:
> > > > >> > http://old.nabble.com/Use-cases-of-HBase-tp27837470p27841193.html
> > > > >> > Sent from the HBase User mailing list archive at Nabble.com.
> > > > >
> > > > > --
> > > > > ---
> > > > > Thanks,
> > > > >
> > > > > Charles Woerner
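[Editor's note: Andy's precomputation/materialization pattern above -- batch-compute a table of answers from raw data, then serve them later via low-latency point lookups -- can be sketched language-neutrally. The following is a plain-Python stand-in for the TableInputFormat → TableOutputFormat batch job he describes; the event shape and the `materialize`/`serve` helpers are illustrative assumptions, not HBase API.]

```python
from collections import defaultdict

def materialize(raw_events):
    """Batch step: scan raw events and precompute per-user answer rows.

    In HBase terms this is the MapReduce job reading via TableInputFormat
    and writing cooked results via TableOutputFormat; here a dict stands
    in for the output table, keyed the way an HBase row key would be.
    """
    answers = defaultdict(int)
    for event in raw_events:
        answers[event["user"]] += 1
    return dict(answers)

def serve(answers, user):
    """Real-time step: one random-access read against the cooked table."""
    return answers.get(user, 0)

raw = [{"user": "hua"}, {"user": "brian"}, {"user": "hua"}]
table = materialize(raw)     # run periodically, offline
print(serve(table, "hua"))   # answered in real time from cooked data: 2
```

As Andy notes, this is precomputation rather than true materialization: nothing refreshes `table` automatically when new raw events arrive; the batch step simply reruns on a schedule.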
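[Editor's note: Ryan's description of HBase as a write-through cache -- recent writes held in the memstore, older data read through the block cache -- can be sketched as a toy model. The class and attribute names below are illustrative, not the actual region server internals, and the model assumes a flush moves the memstore wholesale to "disk".]

```python
class RegionSketch:
    """Toy model of the read/write path Ryan describes: writes land in
    the memstore; a flush moves them to store files; reads check the
    memstore, then the block cache, then disk (caching what they read)."""

    def __init__(self):
        self.memstore = {}     # recent writes, always in memory
        self.block_cache = {}  # recently read older data
        self.disk = {}         # stand-in for flushed store files

    def put(self, key, value):
        self.memstore[key] = value  # write goes to memory first

    def flush(self):
        self.disk.update(self.memstore)
        self.memstore.clear()

    def get(self, key):
        if key in self.memstore:      # recently written: memory hit
            return self.memstore[key]
        if key in self.block_cache:   # recently read: memory hit
            return self.block_cache[key]
        value = self.disk.get(key)    # cold read from disk...
        if value is not None:
            self.block_cache[key] = value  # ...then cached for next time
        return value

region = RegionSketch()
region.put("row1", "v1")
print(region.get("row1"))  # served from the memstore, no disk access
region.flush()
print(region.get("row1"))  # read from disk once, now in the block cache
```

This is why Charles's workload (90% of reads hitting data written in the last 5 minutes) fits the architecture: those reads are answered by the memstore before any disk path is consulted.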