Re: Question about how queries are distributed

Leon Mergen Mon, 04 Aug 2008 03:26:15 -0700

Hello Jean-Daniel,

OK, thank you for your response. I was worried that, when using Thrift, the
client would have to route all communication with an HBase regionserver
through the master server. While I still don't quite understand how this is
solved with Thrift, is my understanding correct that the Thrift client code
(as in, the code that I embed in my application) will not query the master
server "after it learns the location of the ROOT HRegion", and from then on
will talk directly to the RegionServers, since the Thrift API actually fully
implements the regular Java HBase client, even when working from a language
such as C++?
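
To make the flow I have in mind concrete, here is a minimal Python sketch. It
is a toy model, not HBase or Thrift code: ToyCluster, ToyClient, lookup_region
and locate are hypothetical names, and it assumes the client caches a region's
location after the first catalog lookup so that later requests for rows in the
same region need no further lookup.

```python
import bisect


class ToyCluster:
    """Hypothetical stand-in for the catalog: maps region start keys to servers."""

    def __init__(self, regions):
        # regions: list of (start_key, server_name); the first start key is "".
        self.regions = sorted(regions)
        self.meta_lookups = 0  # counts simulated catalog lookups

    def lookup_region(self, row):
        """Simulate one catalog lookup: find the region whose key range holds row."""
        self.meta_lookups += 1
        starts = [start for start, _ in self.regions]
        i = bisect.bisect_right(starts, row) - 1
        start, server = self.regions[i]
        # The region ends where the next one starts; None means "no upper bound".
        end = self.regions[i + 1][0] if i + 1 < len(self.regions) else None
        return start, end, server


class ToyClient:
    """Hypothetical client that remembers region locations it has already seen."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.cache = []  # list of (start_key, end_key, server_name)

    def locate(self, row):
        # Serve from the local cache when a known region covers this row.
        for start, end, server in self.cache:
            if start <= row and (end is None or row < end):
                return server
        # Cache miss: ask the catalog once and remember the answer.
        entry = self.cluster.lookup_region(row)
        self.cache.append(entry)
        return entry[2]


cluster = ToyCluster([("", "rs1"), ("m", "rs2")])
client = ToyClient(cluster)
print(client.locate("apple"), cluster.meta_lookups)   # rs1 1
print(client.locate("banana"), cluster.meta_lookups)  # rs1 1 (served from cache)
print(client.locate("zebra"), cluster.meta_lookups)   # rs2 2
```

In this toy model the catalog is consulted only when a row falls outside every
cached region; whether the real client behaves this way is exactly what I am
asking about.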


I always thought Thrift was a simple way to serialize/deserialize data in an
efficient and platform-independent manner, but it sounds like it's more
advanced, which is good. :-)

Regards,

Leon Mergen


On Mon, Aug 4, 2008 at 3:56 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:

> Leon,
>
> The HBase Architecture page in the wiki does give this kind of information,
> specifically here:
> http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#metadata and since
> HBase is a Bigtable clone, reading its paper also gives useful
> information:
> http://labs.google.com/papers/bigtable.html
>
> To make it short, the client queries the .META. table to find the regions
> of the user tables to which it puts and gets data. Thrift only acts as a
> decorator on the Java HBase client.
>
> Until Zookeeper is integrated in HBase (like Chubby for Bigtable), the
> Master is a SPOF but should not have any scalability-related problem.
>
> Hope this helps,
>
> J-D
>
> On Sun, Aug 3, 2008 at 7:22 PM, Leon Mergen <[EMAIL PROTECTED]> wrote:
>
> > Hello,
> >
> > I'm looking for some information on HBase's architecture (out of pure
> > interest), which I wasn't able to find on the HBase site (including the
> > architecture description).
> >
> > Specifically, I am curious how writes/mutations are distributed amongst the
> > servers, and whether this is different when using an interface like Thrift.
> > Is the server for each mutateRow() operation "asked for" at the master
> > server, or is that cached at some level? If not, how is the problem solved
> > that a client only connects to the master server but actually needs to talk
> > to one of the slave servers? Or is the master server a single weak spot
> > that could introduce scalability problems at large (huge) scale?
> >
> > Thanks in advance for any responses!
> >
> > Regards,
> >
> > Leon Mergen
> >
>



-- 
Leon Mergen
http://www.solatis.com
