Re: Question about how queries are distributed

Jean-Daniel Cryans Mon, 04 Aug 2008 08:41:26 -0700

@Leon, I never tried it, but I guess that would be the right thing to do. Maybe
Bryan has something to say about this, since he built it. A fun thing to do
would be to test how far it really scales, but I think it's a lot, since
Thrift is very lightweight.


@Zhao, starting the Thrift server on another machine works, since it's just a
wrapper for the Java client API. You would have to redirect the Thrift
clients though, and Leon's solution would do that.
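
To make the "wrapper" point concrete, here is a toy sketch in Python. It is not HBase or Thrift code: ToyCluster, ToyClient and ToyGateway are made-up names, the catalog is a plain sorted list standing in for .META., and the location cache is an assumption about how a client avoids repeated lookups. It only illustrates that the gateway forwards calls to one ordinary client, and that this client finds the region server for a row and remembers it.

```python
import bisect


class ToyCluster:
    """Hypothetical stand-in for a cluster: a catalog of region start keys."""

    def __init__(self, regions):
        # regions: list of (start_key, server_name), sorted by start_key.
        # The first region must start at "" so every row key is covered.
        self.start_keys = [start for start, _ in regions]
        self.servers = [server for _, server in regions]
        self.catalog_lookups = 0  # counts simulated catalog (.META.) queries

    def locate(self, row):
        """Simulate a catalog query: return (start, end, server) for a row."""
        self.catalog_lookups += 1
        i = bisect.bisect_right(self.start_keys, row) - 1
        end = self.start_keys[i + 1] if i + 1 < len(self.start_keys) else None
        return self.start_keys[i], end, self.servers[i]


class ToyClient:
    """Caches region locations so repeated operations skip the catalog."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.cache = []  # list of (start, end, server) tuples seen so far

    def server_for(self, row):
        # Serve from the cache when a known region already covers this row.
        for start, end, server in self.cache:
            if start <= row and (end is None or row < end):
                return server
        # Otherwise ask the catalog once and remember the answer.
        region = self.cluster.locate(row)
        self.cache.append(region)
        return region[2]


class ToyGateway:
    """Stands in for the Thrift server: it only forwards to one client."""

    def __init__(self, client):
        self.client = client

    def mutate_row(self, row):
        # A real gateway would apply the mutation; here we only report
        # which region server the wrapped client would talk to.
        return self.client.server_for(row)


if __name__ == "__main__":
    cluster = ToyCluster([("", "rs1"), ("m", "rs2")])
    gateway = ToyGateway(ToyClient(cluster))
    for row in ("apple", "zebra", "banana"):
        print(row, "->", gateway.mutate_row(row))
    print("catalog lookups:", cluster.catalog_lookups)
```

In this toy model every remote caller goes through the single gateway, which is the bottleneck Zhao describes. Running several gateways on different machines would spread that load, at the cost of each one keeping its own cache.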

J-D

On Mon, Aug 4, 2008 at 10:39 AM, ZhaoWei <[EMAIL PROTECTED]> wrote:

> I think it is a problem, if I understood the question well. For Thrift
> clients, the only HBase client is the Thrift server. If Thrift clients cannot
> communicate with HBase region servers directly, starting the Thrift server on
> another machine helps only a little, right?
>
> From: "Jean-Daniel Cryans" <[EMAIL PROTECTED]>
> > Ah ok I see what you meant! Yes, the Thrift client communicates with a
> > Thrift server which is bundled with the Master, so the HBase client code
> > doesn't run on the local machine that queries HBase. So yes, there may be a
> > scalability problem if many clients query at the same time. I don't
> > personally use Thrift a lot, but it seems to me that anyone using it in a
> > production environment under heavy load should definitely start
> > the Thrift server on another machine (the same way the Master should not be
> > on the same machine as the Hadoop Namenode).
> >
> > Thank you Leon for asking the question, I'm sure others may have learned
> > something.
> >
> > J-D
> >
> > On Mon, Aug 4, 2008 at 6:25 AM, Leon Mergen <[EMAIL PROTECTED]> wrote:
> >
> > > Hello Jean-Daniel,
> > >
> > > Ok, thank you for your response. I was worried that, when using Thrift,
> > > the client would have to do all of its communication with an HBase
> > > regionserver through the master server. I still don't quite understand
> > > how this is solved with Thrift. As I understand it, the Thrift client
> > > code (as in, the code that I embed in my application) will not query
> > > the master server "after it learns the location of the ROOT HRegion", and
> > > from then on will talk directly to the RegionServers, since the Thrift API
> > > actually fully implements the regular Java HBase client, even when working
> > > from a language such as C++?
> > >
> > > I always thought Thrift was a simple way to serialize/unserialize data in
> > > an efficient and platform-independent manner, but it sounds like it's more
> > > advanced, which is good. :-)
> > >
> > > Regards,
> > >
> > > Leon Mergen
> > >
> > >
> > > On Mon, Aug 4, 2008 at 3:56 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > Leon,
> > > >
> > > > The HBase Architecture page in the wiki does give this kind of
> > > > information, specifically here:
> > > > http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#metadata and since
> > > > HBase is a Bigtable clone, reading its paper also gives useful
> > > > information:
> > > > http://labs.google.com/papers/bigtable.html
> > > >
> > > > To make it short, the client queries the .META. table to find the
> > > > regions of the user tables to which it puts and gets data. Thrift only
> > > > acts as a decorator on the Java HBase client.
> > > >
> > > > Until Zookeeper is integrated in HBase (like Chubby for Bigtable), the
> > > > Master is a SPOF but should not have any scalability-related problem.
> > > >
> > > > Hope this helps,
> > > >
> > > > J-D
> > > >
> > > > On Sun, Aug 3, 2008 at 7:22 PM, Leon Mergen <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I'm looking for some information on HBase's architecture (out of pure
> > > > > interest) that I wasn't able to find on the HBase site
> > > > > (including the architecture description).
> > > > >
> > > > > Specifically, I am curious how writes/mutations are distributed amongst
> > > > > the servers, and whether this is different when using an interface like
> > > > > Thrift. Is the server for each mutateRow() operation "asked for" at the
> > > > > master server, or is that cached at some level? If not, how is the
> > > > > problem solved that a client only connects to the master server but
> > > > > actually needs to talk to one of the slave servers? Or is the master
> > > > > server a single weak spot that could introduce scalability problems at
> > > > > large (huge) scale?
> > > > >
> > > > > Thanks in advance for any responses!
> > > > >
> > > > > Regards,
> > > > >
> > > > > Leon Mergen
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Leon Mergen
> > > http://www.solatis.com
> > >
>
