@Leon, I never tried it, but I guess this would be the right thing to do. Maybe Bryan has something to say on this since he built it. A fun thing to do would be to test how well it really scales, but I think it scales a lot since Thrift is very lightweight.

@Zhao, starting the Thrift server on another machine works since it's just a wrapper for the Java client API. You would have to redirect the Thrift clients though, and Leon's solution would do that.

J-D

On Mon, Aug 4, 2008 at 10:39 AM, ZhaoWei <[EMAIL PROTECTED]> wrote:

> I think it is a problem, if I understood the question well. For Thrift
> clients, the only HBase client is the Thrift server. If Thrift clients
> cannot communicate with HBase region servers directly, starting the Thrift
> server on another machine only helps a little, right?
>
> From: "Jean-Daniel Cryans" <[EMAIL PROTECTED]>
> > Ah ok I see what you meant! Yes, the Thrift client communicates with a
> > Thrift server which is bundled with the Master, so the HBase client code
> > doesn't run on your local machine that queries HBase. So yes, there may
> > be a scalability problem if many, many clients query at the same time. I
> > don't personally use Thrift a lot, but it seems to me that if someone
> > uses it in a production environment with a big load, he/she should
> > definitely start the Thrift server on another machine (the same way the
> > Master should not be with the Hadoop Namenode).
> >
> > Thank you Leon for asking the question, I'm sure others may have learned
> > something.
> >
> > J-D
> >
> > On Mon, Aug 4, 2008 at 6:25 AM, Leon Mergen <[EMAIL PROTECTED]> wrote:
> >
> > > Hello Jean-Daniel,
> > >
> > > Ok, thank you for your response. I was worried that, when using
> > > Thrift, the client would have to do all communication with an HBase
> > > regionserver through the master server. While I still don't quite
> > > understand how it's solved with Thrift, as I understand it, the Thrift
> > > client code (as in, the code that I embed in my application) will not
> > > query the master server "after it learns the location of the ROOT
> > > HRegion", and from then on will talk directly to the RegionServers,
> > > since the Thrift API actually fully implements the regular Java HBase
> > > client, even when working from a language such as C++?
> > >
> > > I always thought Thrift was a simple way to serialize/unserialize data
> > > in an efficient and platform-independent manner, but it sounds like
> > > it's more advanced, which is good. :-)
> > >
> > > Regards,
> > >
> > > Leon Mergen
> > >
> > > On Mon, Aug 4, 2008 at 3:56 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > Leon,
> > > >
> > > > The HBase Architecture page in the wiki does give this kind of
> > > > information, specifically here:
> > > > http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#metadata
> > > > and since HBase is a Bigtable clone, reading its paper also gives
> > > > useful information:
> > > > http://labs.google.com/papers/bigtable.html
> > > >
> > > > To make it short, the client queries the .META. table to find the
> > > > user tables' regions to which it puts and gets data. Thrift only
> > > > acts as a decorator on the Java HBase client.
> > > >
> > > > Until Zookeeper is integrated in HBase (like Chubby for Bigtable),
> > > > the Master is a SPOF but should not have any scalability-related
> > > > problem.
> > > >
> > > > Hope this helps,
> > > >
> > > > J-D
> > > >
> > > > On Sun, Aug 3, 2008 at 7:22 PM, Leon Mergen <[EMAIL PROTECTED]>
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I'm looking for some information on HBase's architecture (out of
> > > > > pure interest), which I wasn't able to find anything about on the
> > > > > HBase site (including the architecture description).
> > > > >
> > > > > Specifically, I am curious how writes/mutations are distributed
> > > > > amongst the servers, and whether this is different when using an
> > > > > interface like Thrift. Is the server for each mutateRow()
> > > > > operation "asked for" at the master server, or is that cached at
> > > > > some level? If not, how is the problem solved that a client only
> > > > > connects to the master server but actually needs to talk to one
> > > > > of the slave servers? Or is the master server a single weak spot
> > > > > that could introduce scalability problems at large (huge) scale?
> > > > >
> > > > > Thanks in advance for any responses!
> > > > >
> > > > > Regards,
> > > > >
> > > > > Leon Mergen
> > >
> > > --
> > > Leon Mergen
> > > http://www.solatis.com
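
For anyone reading this thread later, here is a rough toy model of the lookup path discussed above. It is only a sketch and not HBase or Thrift code: every class and method name (Master, Meta, RegionServer, Client, ThriftGateway, mutate_row) is made up for illustration, and the ROOT and .META. lookups are collapsed into a single catalog for brevity. It just shows the idea that the master is asked for the catalog location once, region locations are cached, and the Thrift server is the process that holds the real client.

```python
class RegionServer:
    """Holds the rows for the regions assigned to it."""
    def __init__(self, name):
        self.name = name
        self.rows = {}

    def put(self, row, value):
        self.rows[row] = value


class Meta:
    """Stands in for the catalog: maps key ranges to region servers."""
    def __init__(self, regions):
        # regions: list of (start_key, end_key, server); end_key None = open
        self.regions = regions
        self.lookups = 0

    def locate(self, row):
        self.lookups += 1
        for start, end, server in self.regions:
            if start <= row and (end is None or row < end):
                return (start, end, server)
        raise KeyError(row)


class Master:
    """Only tells clients where the catalog is; never serves row data."""
    def __init__(self, meta):
        self.meta = meta
        self.lookups = 0

    def locate_meta(self):
        self.lookups += 1
        return self.meta


class Client:
    """Plays the role of the regular client that the Thrift server wraps."""
    def __init__(self, master):
        self.master = master
        self.meta = None
        self.cache = []  # region locations learned so far

    def _locate(self, row):
        for start, end, server in self.cache:
            if start <= row and (end is None or row < end):
                return server              # cache hit: no catalog lookup
        if self.meta is None:
            self.meta = self.master.locate_meta()  # master asked only once
        region = self.meta.locate(row)
        self.cache.append(region)
        return region[2]

    def put(self, row, value):
        self._locate(row).put(row, value)  # talk to the region server directly


class ThriftGateway:
    """Stands in for the Thrift server: remote callers only ever see this."""
    def __init__(self, master):
        self.client = Client(master)

    def mutate_row(self, row, value):
        self.client.put(row, value)


rs1, rs2 = RegionServer("rs1"), RegionServer("rs2")
meta = Meta([("", "m", rs1), ("m", None, rs2)])
master = Master(meta)
gateway = ThriftGateway(master)

for row in ["apple", "banana", "melon", "zebra"]:
    gateway.mutate_row(row, "v")

print(master.lookups, meta.lookups)
print(sorted(rs1.rows), sorted(rs2.rows))
```

In this toy, four writes cost one master lookup and one catalog lookup per region touched, which is the point J-D makes about the Master not being a scalability problem. It also shows Zhao's concern: every write still funnels through the one gateway object, so the gateway, not the master, is the part you would want to move to another machine or run several of.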
