HBase operates more like a write-through cache.  Recent writes are held in
memory (the memstore).  Older data is served from the block cache (by
default 20% of Xmx).  While you can rely on OS buffering, you also want a
generous helping of block caching directly in HBase's regionserver.
We are seeing great performance, and our 95th-percentile latencies seem
to be related to GC pauses.
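
To put a rough number on that default, here is a back-of-the-envelope sketch in Python. It assumes the block cache is simply a fixed fraction of the regionserver heap (I believe the knob is hfile.block.cache.size in hbase-site.xml, but check that against your version). The 8 GB heap and the function name are made up for illustration.

```python
GIB = 1024 ** 3


def block_cache_bytes(heap_bytes: int, fraction: float = 0.2) -> int:
    """Approximate block cache size as a fraction of the JVM heap (Xmx)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return int(heap_bytes * fraction)


# Hypothetical regionserver started with -Xmx8g and the 20% default.
heap = 8 * GIB
cache = block_cache_bytes(heap)
print(f"{cache / GIB:.1f} GiB of block cache out of {heap / GIB:.0f} GiB heap")
```

With a small heap the default leaves little room for cached blocks, which is why a larger heap or a larger fraction is worth considering for read-heavy serving.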

So for the use case you describe below, the answer is most decidedly 'yes'.
Recently written values are held in memory and are read from memory as well.
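
As a rough illustration of that read path, here is a toy model in Python. It is not HBase code: the class, its methods, and the tiny LRU cache are all invented for this sketch, and real regionservers cache file blocks rather than individual keys.

```python
from collections import OrderedDict


class ToyRegionServer:
    """Toy model only: memstore for recent writes, LRU cache for older data."""

    def __init__(self, cache_capacity: int = 2):
        self.memstore = {}                # recent writes, not yet flushed
        self.block_cache = OrderedDict()  # LRU cache of values read from disk
        self.disk = {}                    # stands in for flushed store files
        self.cache_capacity = cache_capacity

    def put(self, key, value):
        # Writes land in memory first, so a fresh value is readable from RAM.
        self.memstore[key] = value

    def flush(self):
        # Drop cached copies of keys being rewritten, then move the memstore
        # contents to "disk" and start with an empty memstore.
        for key in self.memstore:
            self.block_cache.pop(key, None)
        self.disk.update(self.memstore)
        self.memstore = {}

    def get(self, key):
        """Return (value, where_it_was_found)."""
        if key in self.memstore:
            return self.memstore[key], "memstore"
        if key in self.block_cache:
            self.block_cache.move_to_end(key)  # mark as most recently used
            return self.block_cache[key], "block cache"
        if key in self.disk:
            value = self.disk[key]
            self.block_cache[key] = value
            if len(self.block_cache) > self.cache_capacity:
                self.block_cache.popitem(last=False)  # evict least recently used
            return value, "disk"
        return None, "miss"


rs = ToyRegionServer()
rs.put("row1", "a")
print(rs.get("row1"))  # ('a', 'memstore'): recently written, served from memory
rs.flush()
print(rs.get("row1"))  # ('a', 'disk'): first read after the flush
print(rs.get("row1"))  # ('a', 'block cache'): later reads come from the cache
```

In this model a value that was written a few minutes ago and not yet flushed never touches disk on read, which is the case being asked about.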

-ryan

On Tue, Mar 9, 2010 at 3:12 PM, Charles Woerner
<charleswoer...@gmail.com> wrote:
> Ryan, your confidence has me interested in exploring HBase a bit further for
> some real-time functionality that we're building out. One question about the
> mem-caching functionality in HBase... Is it write-through or write-back such
> that all frequently written items are likely in memory, or is it
> pull-through via a client query? Or would I be relying on lower level
> caching features of the OS and underlying filesystem? In other words, where
> there are a high number of both reads and writes, and where 90% of all the
> reads are on recently (5 minutes) written datums would the HBase
> architecture help ensure that the most recently written data is already in
> the cache?
>
> On Tue, Mar 9, 2010 at 2:29 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>
>> One thing to note is that 10GB is half the memory of a reasonable
>> sized machine. In fact I have seen 128 GB memcache boxes out there.
>>
>> As for performance, I obviously feel HBase can be performant for
>> real-time queries.  To get a consistent response you absolutely have to
>> have 95%+ caching in RAM. There is no way to achieve 1-2ms responses
>> from disk. If you throw enough RAM at the problem, I think HBase solves
>> this nicely and you won't have to maintain multiple architectures.
>>
>> -ryan
>>
>> On Tue, Mar 9, 2010 at 2:08 PM, Jonathan Gray <jl...@streamy.com> wrote:
>> > Brian,
>> >
>> > I would just reiterate what others have said.  If your goal is a
>> > consistent 1-2ms read latency and your dataset is on the order of 10GB...
>> > HBase is not a good match.  It's more than what you need and you'll take
>> > unnecessary performance hits.
>> >
>> > I would look at some of the simpler KV-style stores out there like Tokyo
>> > Cabinet, Memcached, or BerkeleyDB, or the in-memory ones like Redis.
>> >
>> > JG
>> >
>> > -----Original Message-----
>> > From: jaxzin [mailto:brian.r.jack...@espn3.com]
>> > Sent: Tuesday, March 09, 2010 12:09 PM
>> > To: hbase-user@hadoop.apache.org
>> > Subject: Re: Use cases of HBase
>> >
>> >
>> > Gary, I looked at your presentation and it was very helpful.  But I do
>> > have a few unanswered questions from it if you wouldn't mind answering
>> > them.  How big is/was your cluster that handled 3k req/sec?  And what
>> > were the specs on each node (RAM/CPU)?
>> >
>> > When you say latency can be good, what do you mean?  Is it even in the
>> > ballpark of 1 ms?  Because we already deal with the GC and don't expect
>> > perfect real-time behavior.  So that might be okay with me.
>> >
>> > P.S. I was at Hadoop World NYC and saw Ryan and Jonathan's presentation
>> > there but somehow mentally blocked it.  Thanks for the reminder.
>> >
>> >
>> >
>> > Gary Helmling wrote:
>> >>
>> >> Hey Brian,
>> >>
>> >> We use HBase to complement MySQL in serving activity-stream type data
>> >> here at Meetup.  It's handling real-time requests involved in 20-25%
>> >> of our page views, but our latency requirements aren't as strict as
>> >> yours.  For what it's worth, I did a presentation on our setup which
>> >> will hopefully fill in some details:
>> >> http://www.slideshare.net/ghelmling/hbase-at-meetup
>> >>
>> >> There are also some great presentations by Ryan Rawson and Jonathan
>> >> Gray on how they've used HBase for realtime serving on their sites.
>> >> See the presentations wiki page:
>> >> http://wiki.apache.org/hadoop/HBase/HBasePresentations
>> >>
>> >> Like Barney, I suspect where you'll hit some issues will be in your
>> >> latency requirements.  Depending on how you lay out your data and
>> >> configure your column families, your average latency may be good, but
>> >> you will hit some pauses, as I believe reads block at times during
>> >> region splits or compactions and memstore flushes (unless you have a
>> >> fairly static data set).  Others here should be able to fill in more
>> >> details.
>> >>
>> >> With a relatively small dataset, you may want to look at the "in memory"
>> >> configuration option for your column families.
>> >>
>> >> What's your expected workload -- writes vs. reads?  Types of reads
>> >> you'll be doing: random access vs. sequential?  There are a lot of
>> >> knowledgeable folks here to offer advice if you can give us some more
>> >> insight into what you're trying to build.
>> >>
>> >> --gh
>> >>
>> >>
>> >> On Tue, Mar 9, 2010 at 11:21 AM, jaxzin <brian.r.jack...@espn3.com> wrote:
>> >>
>> >>>
>> >>> This is exactly the kind of feedback I'm looking for, thanks, Barney.
>> >>>
>> >>> So it sounds like you cache the data you get from HBase in a
>> >>> session-based memory?  Are you using a Java EE HttpSession? (I'm less
>> >>> familiar with the django/rails equivalents but I'm assuming they
>> >>> exist.)  Or are you using a memory cache provider like ehcache or
>> >>> memcache(d)?
>> >>>
>> >>> Can you tell me more about your experience with latency and why you say
>> >>> that?
>> >>>
>> >>>
>> >>> Barney Frank wrote:
>> >>> >
>> >>> > I am using HBase to store visitor-level clickstream-like data.  At
>> >>> > the beginning of the visitor session I retrieve all the previous
>> >>> > session data from HBase, use it within my app server, massage it a
>> >>> > little, and serve it to the consumer via web services.  Where I
>> >>> > think you will run into the most problems is your latency
>> >>> > requirement.
>> >>> >
>> >>> > Just my 2 cents from a user.
>> >>> >
>> >>> > On Tue, Mar 9, 2010 at 9:45 AM, jaxzin <brian.r.jack...@espn3.com> wrote:
>> >>> >
>> >>> >>
>> >>> >> Hi all, I've got a question about how everyone is using HBase.  Is
>> >>> >> anyone using it as an online data store to directly back a web
>> >>> >> service?
>> >>> >>
>> >>> >> The textbook example of a weblink HBase table suggests there would
>> >>> >> be an associated web front-end to display the information in that
>> >>> >> HBase table (e.g. a search results page), but I'm having trouble
>> >>> >> finding evidence that anyone is servicing web traffic backed
>> >>> >> directly by an HBase instance in practice.
>> >>> >>
>> >>> >> I'm evaluating whether HBase would be the right tool to provide a
>> >>> >> few things for a large-scale web service we want to develop at
>> >>> >> ESPN, and I'd really like to get opinions and experience from
>> >>> >> people who have already been down this path.  No need to reinvent
>> >>> >> the wheel, right?
>> >>> >>
>> >>> >> I can tell you a little about the project goals if it helps give
>> >>> >> you an idea of what I'm trying to design for:
>> >>> >>
>> >>> >> 1) Highly available (It would be a central service and an outage
>> >>> >>    would take down everything)
>> >>> >> 2) Low latency (1-2 ms, less is better, more isn't acceptable)
>> >>> >> 3) High throughput (5-10k req/sec at worst-case peak)
>> >>> >> 4) Unstable traffic (e.g. Sunday afternoons during football season)
>> >>> >> 5) Small data...for now (< 10 GB of total data currently, but HBase
>> >>> >>    could allow us to design differently and store more online)
>> >>> >>
>> >>> >> The reason I'm looking at HBase is that we've solved many of our
>> >>> >> scaling issues with the same basic concepts as HBase (sharding,
>> >>> >> flattening data to fit in one row, throwing away ACID, etc.) but
>> >>> >> with home-grown software.  I'd like to adopt an active open-source
>> >>> >> project if it makes sense.
>> >>> >>
>> >>> >> Alternatives I'm also looking at: RDBMS fronted with WebSphere
>> >>> >> eXtreme Scale, RDBMS fronted with Hibernate/ehcache, or (the option
>> >>> >> I understand the least right now) memcached.
>> >>> >>
>> >>> >> Thanks,
>> >>> >> Brian
>> >>> >> --
>> >>> >> View this message in context:
>> >>> >> http://old.nabble.com/Use-cases-of-HBase-tp27837470p27837470.html
>> >>> >> Sent from the HBase User mailing list archive at Nabble.com.
>> >>> >>
>> >>> >>
>> >>> >
>> >>> >
>> >>>
>> >>> --
>> >>> View this message in context:
>> >>> http://old.nabble.com/Use-cases-of-HBase-tp27837470p27838006.html
>> >>> Sent from the HBase User mailing list archive at Nabble.com.
>> >>>
>> >>>
>> >>
>> >>
>> >
>> > --
>> > View this message in context:
>> > http://old.nabble.com/Use-cases-of-HBase-tp27837470p27841193.html
>> > Sent from the HBase User mailing list archive at Nabble.com.
>> >
>> >
>> >
>>
>
>
>
> --
> ---
> Thanks,
>
> Charles Woerner
>
