Re: Why is scaling HBase much simpler then scaling a relational db?

Mork0075 Wed, 27 Aug 2008 23:46:47 -0700

Thank you very much for your effort!

> So it really depends on what you want to use it for.  If you're
> thinking about it, you probably have some kind of scale issues.


Not at the moment. Actually our software runs on a single server, web
server/database/file storage/lucene side by side. But we're planing to
create a structured approach, on how to scale in future. For the
database layer there're options like Master/Slave, Master/Master,
Clustering, Sharding AND HBase. HBase offers unlimited scaling
opportunities, with some drawbacks compared to relational dbs at the
moment. One of our considerations is "stay with RDBMS and perhaps got
some huge scaling issues in the future or decide for HBase and PERHAPS
benefit from it in the future". What do you mean, is this the right way
to think about the whole topic?

As i can read, companies like Flickr, Digg and so on, put huge effort in
sharding their databases. Would HBase be suitable for this scenario and
would it had saved the trouble?

> Again, typical small web apps are far simpler to write using SQL
> When building large scale web applications, this is what you want.

and thats the problem :) You don't know when and if your web
applications becomes popular/large. As a computer scientist you're
always anxious to keep the future (growings) in mind (and even more if
it "costs" that less then HBase)

Jonathan Gray schrieb:
> Discussion inline.
> 
>> You example with the friends makes perfectly sense. Can you imagine a
>> scenario where storing the data in column oriented instead of row
>> oriented db (so if you will an counterexample) causes such a huge
>> performance mismatch, like the friends one in row/column comparison?
> 
> There's quite a few advantages in a typical row-oriented database.  For one, 
> normalization is more space efficient, though this is increasingly less 
> important, especially in distributed file systems and with drives as cheap as 
> they are.
> 
> A big difference you'll find between organizations running large relational 
> clusters compared to large hbase/hadoop clusters is the type of hardware 
> used.  Relational databases can (and should) be memory hogs but worst of all 
> they need fast disk i/o.  Large 15k rpm SAS RAID-10 arrays are often used 
> costing tens of thousands of dollars.  Each server might have 15 drives or 
> more, 8 or more cores, and 32 gigs of ram.
> 
> In contrast, HBase/Hadoop clusters are most often made with "commodity" 
> hardware, a decent processor (quad core xeon) and 4+ gb of memory in a 1U 
> runs about $1000.  The i/o matters much less with this architecture and 
> that's where most cost can be associated when purchasing relational database 
> hardware.
> 
> One tradeoff of course is that you will have far more actual machines with 
> HBase/Hadoop but a node going down is not a big deal whereas it can be a much 
> bigger emergency if the number of total nodes is low.
> 
> 
>>From a performance standpoint, there's no contest for relational databases 
>>and their ability to index on different columns and randomly access records.  
>>If you have dense data and you want to query it randomly or with ordering and 
>>limiting/offsetting (in soft realtime), you're going to be in for some tough 
>>times with HBase.  This is definitely where relational DBs shine.  
> 
> In HBase things like this require lots of denormalization and cleverness 
> (though many times table scans are the only way), or in my case writing a 
> separate logic/application/caching layer on top of HBase.  For us, HBase is a 
> store that backs our caches and indexes.  Those are allowed to fail as they 
> can be fully recovered from HBase and also implement LRU-like algorithms as 
> our total dataset is on the order of terabytes.  Some of this caching can be 
> compared to Memcache for SQL, but indexing/joining is unique.
> 
> So it really depends on what you want to use it for.  If you're thinking 
> about it, you probably have some kind of scale issues.  Either in the size of 
> your dataset, your need to process a very large dataset in batch and in 
> parallel, or strong need for replication/fault-tolerance.  Though it's also 
> good to just explore because many of us think this is the direction things 
> are going :)
> 
> And things will get better!  Currently, there is already work being done with 
> indexing.  I have personally created join/merge logic that will be 
> contributed in the future.  But the larger issue at hand is the relatively 
> poor random read performance of HBase compared to that of relational 
> databases.  This is inextricably linked to HDFS, which is most often tuned 
> for batch processing rather than random access (a vast majority of Hadoop 
> users are using it in this way).  However, improvements are definitely being 
> made within both HBase and Hadoop.  You can follow the performance statistics 
> here:  http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation
> 
> Between 0.17 and 0.18, random read performance jumped 60%.  So things are 
> definitely getting better, there's many people working hard at both projects. 
>  I suspect we will also see in-memory tables in the next few months which can 
> help if you have only certain tables that need fast random access.
> 
> I think I got a bit off topic but hopefully you find something useful out of 
> it...
> 
> 
>> Can you please provide an example of "good de-normalization" in HBase
>> and how its held consitent (in your friends example in a relational db,
>> there would be a cascadingDelete)? As i think of the users table: if i
>> delete an user with the userid='123', then if have to walk through all
>> of the other users column-family "friends" to guranty consitency?! Is
>> de-normalization in HBase only used to avoid joins? Our webapp doenst
>> use joins at the moment anyway.
> 
> You lose any concept of foreign keys.  You have a primary key, that's it.  No 
> secondary keys/indexes, no foreign keys.
> 
> It's the responsibility of your application to handle something like deleting 
> a friend and cascading to the friendships.  Again, typical small web apps are 
> far simpler to write using SQL, you become responsible for some of the things 
> that were once handled for you.
> 
> When building large scale web applications, this is what you want.  Control.  
> Like programming in C versus Java (no flame war intended, I love them both), 
> that control comes at the cost of complexity.  But when you need to scale in 
> a relational database you so often end up hacking away at it, trying to trick 
> it to do what you want rather than letting it solve its own problems.  And 
> that's where they shine, solving your problems automatically via SQL/foreign 
> keys/query planning/etc.  Eventually the problems get too hard and things get 
> very slow.
> 
> Another example of "good denormalization" would be something like storing a 
> users "favorite pages".  If we want to query this data in two ways: for a 
> given user, all of his favorites.  Or, for a given favorite, all of the users 
> who have it as a favorite.  Relational database would probably have tables 
> for users, favorites, and userfavorites.  Each link would be stored in one 
> row in the userfavorites table.  We would have indexes on both 'userid' and 
> 'favoriteid' and could thus query it in both ways described above.  In HBase 
> we'd probably put a column in both the users table and the favorites table, 
> there would be no link table.
> 
> That would be a very efficient query in both architectures, with relational 
> performing better much better with small datasets but less so with a large 
> dataset.
> 
> Now asking for the favorites of these 10 users.  That starts to get tricky in 
> HBase and will undoubtedly suffer worse from random reading.  The flexibility 
> of SQL allows us to just ask the database for the answer to that question.  
> In a small dataset it will come up with a decent solution, and return the 
> results to you in a matter of milliseconds.  Now let's make that 
> userfavorites table a few billion rows, and the number of users you're asking 
> for a couple thousand.  The query planner will come up with something but 
> things will fall down and it will end up taking forever.  The worst problem 
> will be in the index bloat.  Insertions to this link table will start to take 
> a very long time.  HBase will perform virtually the same as it did on the 
> small table, if not better because of superior region distribution. 
> 
> And again, slow disks are cheap, fast disks are expensive.
> 
>> As you describe it, its a problem of implementation. BigTable is
>> designed to scale, there are routines to shard the data, desitribute
>> it to the pool of connected servers. Could MySQL perhaps decide
>> tomorrow to implement something similar or does the relational model
>> avoids this?
> 
> Distributed transactions and locking will always incur a cost.  Therefore 
> multi-master approaches will always contain at least some significant 
> performance hit from distribution.  Slave replication can be done very 
> efficiently and effectively but do not really address all the scale issues.  
> Yes, they could and are doing similar things.  But the faster it performs, 
> you more you will be sacrificing in terms of what you can take for granted 
> with single-node MySQL. 
> 
> One fundamental difference here is in storage.  BigTable stores sparse data 
> by row then by column without storing nulls.  MySQL doesn't do that.  You can 
> create link tables to store sparse matrix data but at the costs described 
> above.  And every column in your schema is stored in every row, regardless of 
> whether you use it or not.  And that's not just storage on disk, each time a 
> row is fetched, the entire row will be pulled into memory.
> 
> Also, storing binary data in SQL databases (Postres at least) can be a real 
> nightmare to deal with.  Everything in HBase is byte[] so you can do whatever 
> you want.  An aside, but important to some nonetheless.
> 
> Another is the ability to Map-Reduce.  I have to admit I know nothing about 
> MR and relational databases but it makes little sense to me in a 
> non-distributed file system.  I have actually heard of someone attempting to 
> run MySQL on top of HDFS to get distribution, fault-tolerance, and MR.  Will 
> have to do some more searching.
> 
> 
> Guess that's my rant for the week.  Hope that provides more clarity than 
> confusion.
> 
> 
> Jonathan Gray
> 
> 
>> Jonathan Gray schrieb:
>>> A few very big differences...
>>>
>>> - HBase/BigTable don't have "transactions" in the same way that a
>>> relational database does.  While it is possible (and was just recently
>>> implemented for HBase, see HBASE-669) it is not at the core of this
>>> design.  A major bottleneck of distributed multi-master relational
>>> databases is distributed transactions/locks.
>>>
>>> - There's a very big difference between storage of
>>> relational/row-oriented databases and column-oriented databases.  For
>>> example, if I have a table of 'users' and I need to store friendships
>>> between these users... In a relational database my design is something
>>> like:
>>>
>>> Table: users(pkey = userid)
>>> Table: friendships(userid,friendid,...) which contains one (or maybe
>>> two depending on how it's impelemented) row for each friendship.
>>>
>>> In order to lookup a given users friend, SELECT * FROM friendships
>>> WHERE userid = 'myid';
>>>
>>> This query would use an index on the friendships table to retrieve all
>>> the necessary rows.  Depending on the relational database you might
>>> also be fetching each and every row (entirely) off of disk to be
>>> read.  In a sharded relational database, this would require hitting
>>> every node to get whichever friendships were stored on that node. 
>>> There's lots of room for optimizations here but any way you slice it,
>>> you're likely pulling non-sequential blocks off disk.  When you add in
>>> the overhead of ACID transactions this can get slow.
>>>
>>> The cost of this relational query continues to increase as a user adds
>>> more friends.  You also begin to have practical limits.  If I have
>>> millions of users, each with many thousands of potential friends, the
>>> size of these indexes grow exponentially and things get nasty
>>> quickly.  Rather than friendships, imagine I'm storing activity logs
>>> of actions taken by users.
>>>
>>> In a column-oriented database these things scale continuously with
>>> minimal difference between 10 users and 10,000,000 users, 10
>>> friendships and 10,000 friendships.
>>>
>>> Rather than a friendships table, you could just have a friendships
>>> column family in the users table.  Each column in that family would
>>> contain the ID of a friend.  The value could store anything else you
>>> would have stored in the friendships table in the relational model. 
>>> As column families are stored together/sequentially on a per-row
>>> basis, reading a user with 1 friend versus a user with 10,000 friends
>>> is virtually the same.  The biggest difference is just in the shipping
>>> of this information across the network which is unavoidable.  In this
>>> system a user could have 10,000,000 friends.  In a relational database
>>> the size of the friendship table would grow massively and the indexes
>>> would be out of control.
>>>
>>>
>>> It's certainly possible to make relational databases "scale".  What
>>> that is about is usually massive optimizations, manual sharding, being
>>> very clever about how you query things, and often de-normalizing. 
>>> Index bloat and table bloat can thrash a relational db.
>>>
>>> In HBase, de-normalizing is usually a good thing.  Storage space is
>>> often considered free (not a large cost associated with storing
>>> something in multiple places).  In a relational database this is not
>>> so much the case.
>>>
>>> In HBase, the sharding and distribution come for free out of the box. 
>>> With a relational database you must jump through hoops and often end
>>> up implementing your own sharding/distributed hashing algorithms so
>>> you can distribute across machines.
>>>
>>> Yes, you lose the relational primitives.  Sometimes you wish you could
>>> do a simple join.  But if you get in the right mindset, you learn how
>>> to put your data into this new data model.  And in the end the payoffs
>>> are huge.
>>>
>>>
>>> In addition to all this, with HBase/Hadoop you get MapReduce.  It's
>>> feasible to implement something like that on top of a distributed
>>> relational database but again the complexity is enormous.  With
>>> HBase/Hadoop it's a built-in part of the system, a system which is
>>> very intelligent at keeping logic close to data, etc.
>>>
>>>
>>> To directly answer your question, it's "simpler" to scale HBase
>>> because the scaling comes for free out of the box.  You get automatic
>>> sharding/distribution of storage and queries.  There's nothing simpler
>>> than that.  Distributing a relational database is never simple.
>>>
>>> I hope this starts to shed some light on what the differences are.
>>>
>>>
>>> Jonathan Gray
>>> -----Original Message-----
>>> From: Mork0075 [mailto:[EMAIL PROTECTED] Sent: Thursday, August
>>> 21, 2008 8:48 AM
>>> To: core-user@hadoop.apache.org; [EMAIL PROTECTED]
>>> Subject: Re: Why is scaling HBase much simpler then scaling a
>>> relational db?
>>>
>>> Thank you, but i still don't got it.
>>>
>>> I've read tons of websites and papers, but there's no clear und
>>> founded answer "why use BigTable instead of relational databases".
>>>
>>> MySQL Cluster seams to offer the same scalabilty and level of
>>> abstraction, whithout switching to a non relational pardigm. Lots of
>>> blog posts are highly emotional, without answering the core question:
>>>
>>> "Why RDBMS don't scale and why something like BigTable do". Often you
>>> read something like this:
>>>
>>> "They have also built a system called BigTable, which is a Column
>>> Oriented Database, which splits a table into columns rather than rows
>>> making is much simpler to distribute and parallelize."
>>>
>>> Why?
>>>
>>> Really confusing ... ;)
>>>
>>> Stuart Sierra schrieb:
>>>> On Tue, Aug 19, 2008 at 9:44 AM, Mork0075 <[EMAIL PROTECTED]>
>>>> wrote:
>>>>> Can you please explain, why someone should use HBase for horizontal
>>>>> scaling instead of a relational database? One reason for me would be,
>>>>> that i don't have to implement the sharding logic myself. Are there
>>>>> other?
>>>> A slight tangent -- there are various tools that implement sharding
>>>> over relational databases like MySQL.  Two that I know of are
>>>> DBSlayer,
>>>> http://code.nytimes.com/projects/dbslayer
>>>> and MySQL Proxy,
>>>> http://forge.mysql.com/wiki/MySQL_Proxy
>>>>
>>>> I don't know of any formal comparisons between sharding traditional
>>>> database servers and distributed databases like HBase.
>>>> -Stuart
>>>>
>>>
>>
> 
>

Re: Why is scaling HBase much simpler then scaling a relational db?

Reply via email to