Thank you for your kind explanation!

On Wed, Mar 10, 2010 at 5:36 PM, Andrew Purtell <apurt...@apache.org> wrote:
> > Here, what exactly is the meaning of "materialized"? Would you
> > kindly give more details?
>
> Basically what I am saying is that the analytic computation can produce a
> table of a set of answers to questions which may be asked at some future
> time. Since HBase 0.20.0, random access to table data is of low enough
> latency to host the information directly. So, typically, a batch process of
> user construction will run using TableInputFormat over raw data and output
> cooked results via TableOutputFormat into a table for answering queries
> later in real time. Depending on the use case this is usually called either
> precomputation or materialization. Precomputation is a generic term.
> Materialization (as in "materialized views") I believe was coined by Oracle.
> These terms are used interchangeably to refer to the process of making
> answers to a set of possible queries in advance. To be pedantic I should
> have said precomputation instead of materialization, because the latter
> implies occasional automatic update of the cached data by the database
> engine. Of course HBase does not do that.
>
> Hope that helped,
>
>    - Andy
>
> > ----- Original Message ----
> > From: Hua Su <huas...@gmail.com>
> > To: hbase-user@hadoop.apache.org
> > Sent: Wed, March 10, 2010 1:01:33 AM
> > Subject: Re: Use cases of HBase
> >
> > Hi Purtell,
> >
> > What do you mean by "Since 0.20.0, results of analytic computations over
> > the data can be materialized and served out in real time in response to
> > queries."? Here, what exactly is the meaning of "materialized"? Would you
> > kindly give more details?
> >
> > Thanks!
> >
> > - Hua
> >
> > On Wed, Mar 10, 2010 at 8:12 AM, Andrew Purtell wrote:
> >
> > > I came to this discussion late.
> > >
> > > Ryan and J-D's use case is clearly successful.
> > > In addition to what others have said, I think another case where HBase
> > > really excels is supporting analytics over Big Data (which I define as
> > > on the order of a petabyte). Some of the best performance numbers are
> > > put up by scanners. There is tight integration with the Hadoop
> > > MapReduce framework, not only in terms of API support but also with
> > > respect to efficient task distribution over the cluster -- moving
> > > computation to data -- and there is a favorable interaction with HDFS's
> > > location-aware data placement. Moving computation to data like that is
> > > one major reason why analytics using the MapReduce paradigm can put
> > > conventional RDBMS/data warehouses to shame for substantially less
> > > cost. Since 0.20.0, results of analytic computations over the data can
> > > be materialized and served out in real time in response to queries.
> > > This is a complete solution.
> > >
> > >    - Andy
> > >
> > > > ----- Original Message ----
> > > > From: Ryan Rawson
> > > > To: hbase-user@hadoop.apache.org
> > > > Sent: Tue, March 9, 2010 3:34:55 PM
> > > > Subject: Re: Use cases of HBase
> > > >
> > > > HBase operates more like a write-through cache. Recent writes are in
> > > > memory (aka the memstore). Older data is in the block cache (by
> > > > default 20% of -Xmx). While you can rely on OS buffering, you also
> > > > want a generous helping of block caching directly in HBase's region
> > > > server. We are seeing great performance, and our 95th percentiles
> > > > seem to be related to GC pauses.
> > > >
> > > > So to answer your use case below, the answer is most decidedly 'yes':
> > > > recent values are in memory, and recently read values are served from
> > > > memory as well.
> > > >
> > > > -ryan
> > > >
> > > > On Tue, Mar 9, 2010 at 3:12 PM, Charles Woerner wrote:
> > > > > Ryan, your confidence has me interested in exploring HBase a bit
> > > > > further for some real-time functionality that we're building out.
> > > > > One question about the mem-caching functionality in HBase... Is it
> > > > > write-through or write-back, such that all frequently written items
> > > > > are likely in memory, or is it pull-through via a client query? Or
> > > > > would I be relying on lower-level caching features of the OS and
> > > > > underlying filesystem? In other words, where there are a high
> > > > > number of both reads and writes, and where 90% of all the reads are
> > > > > on recently (5 minutes) written datums, would the HBase
> > > > > architecture help ensure that the most recently written data is
> > > > > already in the cache?
> > > > >
> > > > > On Tue, Mar 9, 2010 at 2:29 PM, Ryan Rawson wrote:
> > > > >
> > > > >> One thing to note is that 10GB is half the memory of a reasonably
> > > > >> sized machine. In fact I have seen 128 GB memcache boxes out
> > > > >> there.
> > > > >>
> > > > >> As for performance, I obviously feel HBase can be performant for
> > > > >> real-time queries. To get a consistent response you absolutely
> > > > >> have to have 95%+ caching in RAM. There is no way to achieve 1-2ms
> > > > >> responses from disk. Throwing enough RAM at the problem, I think
> > > > >> HBase solves this nicely and you won't have to maintain multiple
> > > > >> architectures.
> > > > >>
> > > > >> -ryan
> > > > >>
> > > > >> On Tue, Mar 9, 2010 at 2:08 PM, Jonathan Gray wrote:
> > > > >> > Brian,
> > > > >> >
> > > > >> > I would just reiterate what others have said. If your goal is a
> > > > >> > consistent 1-2ms read latency and your dataset is on the order
> > > > >> > of 10GB... HBase is not a good match.
> > > > >> > It's more than what you need and you'll take unnecessary
> > > > >> > performance hits.
> > > > >> >
> > > > >> > I would look at some of the simpler KV-style stores out there
> > > > >> > like Tokyo Cabinet, Memcached, or BerkeleyDB, or in-memory ones
> > > > >> > like Redis.
> > > > >> >
> > > > >> > JG
> > > > >> >
> > > > >> > -----Original Message-----
> > > > >> > From: jaxzin [mailto:brian.r.jack...@espn3.com]
> > > > >> > Sent: Tuesday, March 09, 2010 12:09 PM
> > > > >> > To: hbase-user@hadoop.apache.org
> > > > >> > Subject: Re: Use cases of HBase
> > > > >> >
> > > > >> > Gary, I looked at your presentation and it was very helpful. But
> > > > >> > I do have a few unanswered questions from it, if you wouldn't
> > > > >> > mind answering them. How big is/was your cluster that handled 3k
> > > > >> > req/sec? And what were the specs on each node (RAM/CPU)?
> > > > >> >
> > > > >> > When you say latency can be good, what do you mean? Is it even
> > > > >> > in the ballpark of 1 ms? We already deal with GC and don't
> > > > >> > expect perfect real-time behavior, so that might be okay with
> > > > >> > me.
> > > > >> >
> > > > >> > P.S. I was at Hadoop World NYC and saw Ryan and Jonathan's
> > > > >> > presentation there but somehow mentally blocked it. Thanks for
> > > > >> > the reminder.
> > > > >> >
> > > > >> > Gary Helmling wrote:
> > > > >> >>
> > > > >> >> Hey Brian,
> > > > >> >>
> > > > >> >> We use HBase to complement MySQL in serving activity-stream
> > > > >> >> type data here at Meetup. It's handling real-time requests
> > > > >> >> involved in 20-25% of our page views, but our latency
> > > > >> >> requirements aren't as strict as yours.
> For > > > what > > > > >> >> it's worth, I did a presentation on our setup which will > hopefully > > > fill > > > > >> in > > > > >> >> some details: > http://www.slideshare.net/ghelmling/hbase-at-meetup > > > > >> >> > > > > >> >> There are also some great presentations by Ryan Rawson and > Jonathan > > > Gray > > > > >> >> on > > > > >> >> how they've used HBase for realtime serving on their sites. > See > > > the > > > > >> >> presentations wiki page: > > > > >> >> http://wiki.apache.org/hadoop/HBase/HBasePresentations > > > > >> >> > > > > >> >> Like Barney, I suspect where you'll hit some issues will be in > your > > > > >> >> latency > > > > >> >> requirements. Depending on how you layout your data and > configure > > > your > > > > >> >> column families, your average latency may be good, but you will > hit > > > some > > > > >> >> pauses as I believe reads block at times during region splits > or > > > > >> >> compactions > > > > >> >> and memstore flushes (unless you have a fairly static data > set). > > > Others > > > > >> >> here should be able to fill in more details. > > > > >> >> > > > > >> >> With a relatively small dataset, you may want to look at the > "in > > > memory" > > > > >> >> configuration option for your column families. > > > > >> >> > > > > >> >> What's your expected workload -- writes vs. reads? types of > reads > > > > >> you'll > > > > >> >> be > > > > >> >> doing: random access vs. sequential? There are a lot of > > > knowledgeable > > > > >> >> folks > > > > >> >> here to offer advice if you can give us some more insight into > what > > > > >> you're > > > > >> >> trying to build. > > > > >> >> > > > > >> >> --gh > > > > >> >> > > > > >> >> > > > > >> >> On Tue, Mar 9, 2010 at 11:21 AM, jaxzin > > > > >> wrote: > > > > >> >> > > > > >> >>> > > > > >> >>> This is exactly the kind of feedback I'm looking for thanks, > > > Barney. 
> > > > >> >>> So it sounds like you cache the data you get from HBase in
> > > > >> >>> session-based memory? Are you using a Java EE HttpSession?
> > > > >> >>> (I'm less familiar with the django/rails equivalents but I'm
> > > > >> >>> assuming they exist.) Or are you using a memory cache provider
> > > > >> >>> like ehcache or memcache(d)?
> > > > >> >>>
> > > > >> >>> Can you tell me more about your experience with latency and
> > > > >> >>> why you say that?
> > > > >> >>>
> > > > >> >>> Barney Frank wrote:
> > > > >> >>> >
> > > > >> >>> > I am using HBase to store visitor-level clickstream-like
> > > > >> >>> > data. At the beginning of the visitor session I retrieve all
> > > > >> >>> > the previous session data from HBase, use it within my app
> > > > >> >>> > server, massage it a little, and serve it to the consumer
> > > > >> >>> > via web services. Where I think you will run into the most
> > > > >> >>> > problems is your latency requirement.
> > > > >> >>> >
> > > > >> >>> > Just my 2 cents from a user.
> > > > >> >>> >
> > > > >> >>> > On Tue, Mar 9, 2010 at 9:45 AM, jaxzin wrote:
> > > > >> >>> >
> > > > >> >>> >> Hi all, I've got a question about how everyone is using
> > > > >> >>> >> HBase. Is anyone using it as an online data store to
> > > > >> >>> >> directly back a web service?
> > > > >> >>> >>
> > > > >> >>> >> The textbook example of a weblink HBase table suggests
> > > > >> >>> >> there would be an associated web front-end to display the
> > > > >> >>> >> information in that HBase table (ex. a search results
> > > > >> >>> >> page), but I'm having trouble finding evidence that anyone
> > > > >> >>> >> is servicing web traffic backed directly by an HBase
> > > > >> >>> >> instance in practice.
> > > > >> >>> >>
> > > > >> >>> >> I'm evaluating whether HBase would be the right tool to
> > > > >> >>> >> provide a few things for a large-scale web service we want
> > > > >> >>> >> to develop at ESPN, and I'd really like to get opinions and
> > > > >> >>> >> experience from people who have already been down this
> > > > >> >>> >> path. No need to reinvent the wheel, right?
> > > > >> >>> >>
> > > > >> >>> >> I can tell you a little about the project goals if it helps
> > > > >> >>> >> give you an idea of what I'm trying to design for:
> > > > >> >>> >>
> > > > >> >>> >> 1) Highly available (it would be a central service and an
> > > > >> >>> >> outage would take down everything)
> > > > >> >>> >> 2) Low latency (1-2 ms; less is better, more isn't
> > > > >> >>> >> acceptable)
> > > > >> >>> >> 3) High throughput (5-10k req/sec at worst-case peak)
> > > > >> >>> >> 4) Unstable traffic (ex. Sunday afternoons during football
> > > > >> >>> >> season)
> > > > >> >>> >> 5) Small data... for now (< 10 GB of total data currently,
> > > > >> >>> >> but HBase could allow us to design differently and store
> > > > >> >>> >> more online)
> > > > >> >>> >>
> > > > >> >>> >> The reason I'm looking at HBase is that we've solved many
> > > > >> >>> >> of our scaling issues with the same basic concepts as HBase
> > > > >> >>> >> (sharding, flattening data to fit in one row, throwing away
> > > > >> >>> >> ACID, etc.) but with home-grown software. I'd like to adopt
> > > > >> >>> >> an active open-source project if it makes sense.
> > > > >> >>> >>
> > > > >> >>> >> Alternatives I'm also looking at: RDBMS fronted with
> > > > >> >>> >> WebSphere eXtreme Scale, RDBMS fronted with
> > > > >> >>> >> Hibernate/ehcache, or (the option I understand the least
> > > > >> >>> >> right now) memcached.
> > > > >> >>> >>
> > > > >> >>> >> Thanks,
> > > > >> >>> >> Brian
> > > > >> >>> >> --
> > > > >> >>> >> View this message in context:
> > > > >> >>> >> http://old.nabble.com/Use-cases-of-HBase-tp27837470p27837470.html
> > > > >> >>> >> Sent from the HBase User mailing list archive at Nabble.com.
> > > > >> >>>
> > > > >> >>> --
> > > > >> >>> View this message in context:
> > > > >> >>> http://old.nabble.com/Use-cases-of-HBase-tp27837470p27838006.html
> > > > >> >>> Sent from the HBase User mailing list archive at Nabble.com.
> > > > >> >
> > > > >> > --
> > > > >> > View this message in context:
> > > > >> > http://old.nabble.com/Use-cases-of-HBase-tp27837470p27841193.html
> > > > >> > Sent from the HBase User mailing list archive at Nabble.com.
> > > > >
> > > > > --
> > > > > ---
> > > > > Thanks,
> > > > >
> > > > > Charles Woerner
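[Editor's note: Andy's precomputation/materialization pattern above -- batch-compute a table of answers from raw data, then serve them later via low-latency point lookups -- can be sketched language-neutrally. The following is a plain-Python stand-in for the TableInputFormat → TableOutputFormat batch job he describes; the event shape and the `materialize`/`serve` helpers are illustrative assumptions, not HBase API.]

```python
from collections import defaultdict

def materialize(raw_events):
    """Batch step: scan raw events and precompute per-user answer rows.

    In HBase terms this is the MapReduce job reading via TableInputFormat
    and writing cooked results via TableOutputFormat; here a dict stands
    in for the output table, keyed the way an HBase row key would be.
    """
    answers = defaultdict(int)
    for event in raw_events:
        answers[event["user"]] += 1
    return dict(answers)

def serve(answers, user):
    """Real-time step: one random-access read against the cooked table."""
    return answers.get(user, 0)

raw = [{"user": "hua"}, {"user": "brian"}, {"user": "hua"}]
table = materialize(raw)     # run periodically, offline
print(serve(table, "hua"))   # answered in real time from cooked data: 2
```

As Andy notes, this is precomputation rather than true materialization: nothing refreshes `table` automatically when new raw events arrive; the batch step simply reruns on a schedule.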
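[Editor's note: Ryan's description of HBase as a write-through cache -- recent writes held in the memstore, older data read through the block cache -- can be sketched as a toy model. The class and attribute names below are illustrative, not the actual region server internals, and the model assumes a flush moves the memstore wholesale to "disk".]

```python
class RegionSketch:
    """Toy model of the read/write path Ryan describes: writes land in
    the memstore; a flush moves them to store files; reads check the
    memstore, then the block cache, then disk (caching what they read)."""

    def __init__(self):
        self.memstore = {}     # recent writes, always in memory
        self.block_cache = {}  # recently read older data
        self.disk = {}         # stand-in for flushed store files

    def put(self, key, value):
        self.memstore[key] = value  # write goes to memory first

    def flush(self):
        self.disk.update(self.memstore)
        self.memstore.clear()

    def get(self, key):
        if key in self.memstore:      # recently written: memory hit
            return self.memstore[key]
        if key in self.block_cache:   # recently read: memory hit
            return self.block_cache[key]
        value = self.disk.get(key)    # cold read from disk...
        if value is not None:
            self.block_cache[key] = value  # ...then cached for next time
        return value

region = RegionSketch()
region.put("row1", "v1")
print(region.get("row1"))  # served from the memstore, no disk access
region.flush()
print(region.get("row1"))  # read from disk once, now in the block cache
```

This is why Charles's workload (90% of reads hitting data written in the last 5 minutes) fits the architecture: those reads are answered by the memstore before any disk path is consulted.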