Re: Distributed Lucene.. - clustering as a requirement

Chris Lamprecht Thu, 06 Apr 2006 12:55:23 -0700

What about using lucene just for searching (i.e., no stored fields
except maybe one "ID" primary key field), and using an RDBMS for
storing the actual "documents"?  This way you're using lucene for what
lucene is best at, and using the database for what it's good at.  At
least up to a point -- RDBMSs have their limits too.  OR maybe if you
have a huge dataset, you might want to check out Nutch.


On 4/6/06, Dmitry Goldenberg <[EMAIL PROTECTED]> wrote:
> I firmly believe that clustering support should be a part of Lucene.  We've 
> tried implementing it ourselves and so far have been unsuccessful.  We tried 
> storing Lucene indices in a database that is the back-end repository for our 
> app in a clustered environment and could not overcome the indexing exceptions 
> in our custom Directory implementation.
>
> I think it'd be perfect if some of the Lucene gurus were to implement an 
> RDBMS-backed Directory and post it (in addition to the sleepycat.db package 
> that's currently in contrib).  The nitty-gritties of dealing with Lucene 
> indexing structures at a single byte level are just way too much trouble to 
> deal with for application integrator like myself.
>
> - Dmitry
>
> ________________________________
>
> From: Samuru Jackson [mailto:[EMAIL PROTECTED]
> Sent: Mon 3/6/2006 10:05 AM
> To: [email protected]
> Subject: Re: Distributed Lucene..
>
>
>
> Do you plan to release some kind of a commerical product including an API?
>
> I ask because I'm evaluating different technologies for a prototype
> which is part of my diploma thesis.
>
> The problem is that I have to deal with real huge data amounts and one
> machine is simply not enough to handle those amounts of data.
>
> Lucene seems to be a good choice but it won't scale up for real big
> data amounts. So I thought about expanding the indexes over several
> machines in junks so that it fits into the memory of those machines.
>
> One machine should collect the results and calculate some kind of
> score out of the delivered hits from the machines.
>
> As I'm not familiar with the concrete mechanisms of Lucene this is
> just a naive thought, but I think that such a clustering mechanism
> could become a killer app.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Distributed Lucene.. - clustering as a requirement

Reply via email to