The way I see it, search solutions (on whatever scale) have three components: data aggregation, indexing/searching, and presentation of results. I thought Lucene did only the second part.
So I do not quite follow: why should Lucene be used as a datastore?

Nagesh

On Tue, Jul 29, 2008 at 6:01 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

> I think the answer is it can be done and probably quite well. I also think
> it's informative that Nutch does not use Lucene for this function, as I
> understand it, but that shouldn't stop you either. You might also have a
> look at Apache Jackrabbit, which uses Lucene underneath as a content
> repository.
>
> -Grant
>
>
> On Jul 29, 2008, at 5:34 AM, Ganesh - yahoo wrote:
>
>> Hello all,
>>
>> I am also interested in this. I want to archive the content of the
>> document using Lucene.
>>
>> Is it a good idea to use Lucene as storage engine?
>>
>> Regards
>> Ganesh
>>
>> ----- Original Message ----- From: "Ian Lea" <[EMAIL PROTECTED]>
>> To: <java-user@lucene.apache.org>
>> Sent: Tuesday, July 29, 2008 2:18 PM
>> Subject: Re: Using lucene as a database... good idea or bad idea?
>>
>>
>> John
>>>
>>>
>>> I think it's a great idea, and do exactly this to store 5 million+
>>> documents with info that it takes way too long to get out of our
>>> Oracle database (think days). Not as many docs as you are talking
>>> about, and less data for each doc, but I wouldn't have any concerns
>>> about scaling. There are certainly lucene indexes out there bigger
>>> than what you propose. You can compress the stored data to save some
>>> space. Run times for optimization might get interesting but see
>>> recent threads for suggestions on that. And since you are not too
>>> concerned about performance you may not need to optimize much, or even
>>> at all.
>>>
>>> Of course you need to remember that this is not a DBMS solution in the
>>> sense of transactions, recovery, etc. but I'm sure you are already
>>> aware of that.
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>> On Tue, Jul 29, 2008 at 2:53 AM, John Evans <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I have successfully used Lucene in the "traditional" way to provide
>>>> full-text search for various websites. Now I am tasked with
>>>> developing a data-store to back a web crawler. The crawler can be
>>>> configured to retrieve arbitrary fields from arbitrary pages, so the
>>>> result is that each document may have a random assortment of fields.
>>>> It seems like Lucene may be a natural fit for this scenario since you
>>>> can obviously add arbitrary fields to each document and you can store
>>>> the actual data in the database. I've done some research to make sure
>>>> that it would meet all of our individual requirements (that we can
>>>> iterate over documents, update (delete/replace) documents, etc.) and
>>>> everything looks good. I've also seen a couple of references around
>>>> the net to other people trying similar things... however, I know it's
>>>> not meant to be used this way, so I thought I would post here and ask
>>>> for guidance. Has anyone done something similar? Is there any
>>>> specific reason to think this is a bad idea?
>>>>
>>>> The one thing that I am least certain about is how well it will
>>>> scale. We may reach the point where we have tens of millions of
>>>> documents and a high percentage of those documents may be relatively
>>>> large (10k-50k each). We actually would NOT be expecting/needing
>>>> Lucene's normal extremely fast text search times for this, but we
>>>> would need reasonable times for adding new documents to the index,
>>>> retrieving documents by ID (for iterating over all documents),
>>>> optimizing the index after a series of changes, etc.
>>>>
>>>> Any advice/input/theories anyone can contribute would be greatly
>>>> appreciated.
>>>>
>>>> Thanks,
>>>> -
>>>> John
>>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ