Re: hbase architecture diagrams

stack Tue, 10 Jun 2008 10:11:45 -0700

That is some of the finest art seen by me in a long time. We're located close to MoMA. I'm going to see if we can get you an installation.


Answers inline.


Krzysztof Szlapinski wrote:

hi all,
to better understand how hbase works i started reading this document http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture
and created some diagrams

here they are (png, and svg for editing):

1) hbase hierarchy of objects:
http://www.starline.com.pl/hbase/habase_hierarchy.png
http://www.starline.com.pl/hbase/habase_hierarchy.svg

I'd suggest that Master, Client and RegionServer be peers rather than arranged hierarchically. The client talks only rarely to the master, and then only to ask it where the catalog tables are located. Thereafter it talks exclusively with the regionservers. Have arrows going from the client to both the master and the regionserver.

2) hbase architecture (relations between objects)
http://www.starline.com.pl/hbase/habase_architecture.png
http://www.starline.com.pl/hbase/habase_architecture.svg

Same as comment above.

3) visual representation flush cache operation
http://www.starline.com.pl/hbase/hbase_flush_cache.png
http://www.starline.com.pl/hbase/hbase_flush_cache.svg

Here, flushes are done from the memcache. The diagram doesn't give this impression.

since the documentation says that its information may be out of date please feel free to comment on these diagrams, update them, put them on your sites etc
i got a question too
lets say we have cluster of 3 machines:
- 1 master + region server, and
- 2 region servers
on each machine I got web server that connects to hbase client to get information out from hbase
it is not clear to me where should these clients connect to
should all clients connect directly and only to the master, which will tell them on which region server is the information they are looking for? or can they connect to the region servers and if the information they are looking for is not in them region servers will contact master and fetch the information for the client?

You almost have it.

A client that wants to insert row X into table A needs to figure out which region of table A the row X belongs to. This information is kept in the .META. table. It is a listing of all regions for all tables, keyed by table and the first row in a region, sorted lexicographically. The regions that make up the .META. table are themselves kept in a special catalog table, the -ROOT- table.
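To make the keying concrete, here is a minimal sketch of how a row maps to a region when regions are listed by table and first row in sorted order. It is a toy model, not HBase code: the `META` list, the server names and `find_region` are all made up for illustration, and the empty string standing for the first region's start row is an assumption of the sketch.

```python
import bisect

# Hypothetical .META. contents for table "A", sorted by (table, first row).
# The empty string stands for the first row of the table's first region.
META = [
    (("A", ""), "regionserver-1"),
    (("A", "g"), "regionserver-2"),
    (("A", "p"), "regionserver-3"),
]


def find_region(meta, table, row):
    """Return (first_row, server) of the region of `table` that holds `row`."""
    keys = [key for key, _ in meta]
    # The owning region is the last entry whose key is <= (table, row).
    i = bisect.bisect_right(keys, (table, row)) - 1
    if i < 0 or keys[i][0] != table:
        raise KeyError(f"no region found for table {table!r}")
    return keys[i][1], meta[i][1]


print(find_region(META, "A", "hello"))
```

Because the listing is sorted, the region that holds a row is simply the last entry whose first row is not greater than that row, so a binary search is enough.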

A fresh client -- one that has just started and so has an empty cache -- goes first to the master to ask it where the root region is hosted. Once it has the address of the regionserver hosting the root region, it caches it, and then it goes to the hosting regionserver to read the location of the .META. table region that has the row that contains the region of table A into which X should be inserted. The client goes to the .META. region hosting server after caching its location and reads the location of the region from table A where it should insert X.


Finally it goes to server hosting table A's region and inserts X.

Over time, the client builds up a cache of where regions are located and will rely on this information rather than travel the net to read locations every time it needs to find a region -- until there is a fault. At that time, it will back up the hierarchy of region locations to fix its list of locations and then away it goes again.
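The lookup and caching described above can be sketched roughly as follows. This is a simulation only: `Cluster`, `Client` and every method name here are hypothetical stand-ins rather than the HBase client API, each table is assumed to have a single region, and on a fault the sketch backs up only one level (it re-reads .META.) instead of the whole hierarchy.

```python
class Cluster:
    """Toy stand-in for a cluster; counts location reads so caching is visible."""

    def __init__(self):
        self.root_server = "rs1"           # server hosting the -ROOT- region
        self.meta_server = "rs2"           # server hosting the .META. region
        self.region_server = {"A": "rs3"}  # table -> server (one region per table)
        self.lookups = 0

    def ask_master_for_root(self):
        self.lookups += 1
        return self.root_server

    def read_root(self, server):
        self.lookups += 1
        return self.meta_server

    def read_meta(self, server, table, row):
        self.lookups += 1
        return self.region_server[table]

    def put(self, server, table, row):
        if server != self.region_server[table]:
            raise LookupError("region is not served here")
        return f"{row} stored on {server}"


class Client:
    def __init__(self, cluster):
        self.cluster = cluster
        self.root = None    # cached location of -ROOT-
        self.meta = None    # cached location of .META.
        self.regions = {}   # cached table -> region server

    def locate(self, table, row):
        # Walk master -> -ROOT- -> .META., skipping any level already cached.
        if self.root is None:
            self.root = self.cluster.ask_master_for_root()
        if self.meta is None:
            self.meta = self.cluster.read_root(self.root)
        if table not in self.regions:
            self.regions[table] = self.cluster.read_meta(self.meta, table, row)
        return self.regions[table]

    def insert(self, table, row):
        try:
            return self.cluster.put(self.locate(table, row), table, row)
        except LookupError:
            # Fault: drop the stale entry, read it again from .META., retry once.
            self.regions.pop(table, None)
            return self.cluster.put(self.locate(table, row), table, row)


cluster = Cluster()
client = Client(cluster)
print(client.insert("A", "X"), "after", cluster.lookups, "location reads")
```

In this toy model the first insert costs three location reads (master, -ROOT-, .META.), later inserts cost none, and a moved region costs one more read after the fault.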

Check out the Bigtable paper. It does a better job than I of explaining how this all works.


St.Ack

krzysiek
