On Thu, Oct 8, 2009 at 7:00 PM, Angel, Eric <ean...@business.com> wrote:
> > Does anyone have any recommendations? I've looked at Katta, but it doesn't > seem to support realtime searching. It also uses hdfs, which I've heard can > be slow. I'm looking to serve 40gb of indexes and support about 1 million > updates per day. > > Hi Eric, As I mentioned in my response to Jason, we at LinkedIn serve our roughly 50million document profile index on a real-time distributed setup (we're serving facets in real-time also), serving tens of millions of queries a day in the 1-10ms latency per node, based on the open source zoie project (built here at LinkedIn) : http://zoie.googlecode.com Zoie doesn't handle the distributed part of the setup, it's just the real-time side. Distribution is done pretty straitgtforwardly in our case though: N shards each getting a different contiguous slice of the user base, each replicated K times, and all N*K nodes get indexing events distributed by a message queue independently. If you have any questions about zoie, let me know. The documentation could get filled in a little further, and it doesn't touch on distributed side of things, so feel free to ping me. -jake