Re: Using lucene as a database... good idea or bad idea?

Grant Ingersoll Tue, 29 Jul 2008 05:32:35 -0700

I think the answer is it can be done and probably quite well. I also
think it's informative that Nutch does not use Lucene for this
function, as I understand it, but that shouldn't stop you either. You
might also have a look at Apache Jackrabbit, which uses Lucene
underneath as a content repository.


-Grant


On Jul 29, 2008, at 5:34 AM, Ganesh - yahoo wrote:

Hello all,
I am also interested in this. I want to archive the content of the document using Lucene.
Is it a good idea to use Lucene as a storage engine?

Regards
Ganesh

----- Original Message ----- From: "Ian Lea" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Tuesday, July 29, 2008 2:18 PM
Subject: Re: Using lucene as a database... good idea or bad idea?
John


I think it's a great idea, and do exactly this to store 5 million+
documents with info that it takes way too long to get out of our
Oracle database (think days).  Not as many docs as you are talking
about, and less data for each doc, but I wouldn't have any concerns
about scaling.  There are certainly lucene indexes out there bigger
than what you propose.  You can compress the stored data to save some
space.  Run times for optimization might get interesting but see
recent threads for suggestions on that.  And since you are not too
concerned about performance you may not need to optimize much, or even
at all.
Of course you need to remember that this is not a DBMS solution in the
sense of transactions, recovery, etc. but I'm sure you are already
aware of that.


--
Ian.
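
The pattern described in this message (documents with differing fields, stored values compressed to save space, and updates done as delete/replace) can be sketched as a toy model. This is only an illustration under stated assumptions: TinyDocStore and its methods are hypothetical names, the store is an in-memory dict, and none of it is Lucene's API. Like the approach discussed here, it has no transactions or recovery.

```python
import zlib


class TinyDocStore:
    """Hypothetical in-memory stand-in for a schemaless document store.

    This is NOT Lucene's API; it only mirrors the ideas in the thread.
    """

    def __init__(self):
        # doc_id -> {field name: zlib-compressed bytes}
        self._docs = {}

    def add(self, doc_id, fields):
        # Each document carries its own set of fields; there is no shared schema.
        self._docs[doc_id] = {
            name: zlib.compress(value.encode("utf-8"))
            for name, value in fields.items()
        }

    def delete(self, doc_id):
        # Deleting a missing document is a no-op.
        self._docs.pop(doc_id, None)

    def update(self, doc_id, fields):
        # An update is a delete followed by an add of the whole document.
        self.delete(doc_id)
        self.add(doc_id, fields)

    def get(self, doc_id):
        # Decompress stored values on retrieval.
        stored = self._docs[doc_id]
        return {name: zlib.decompress(data).decode("utf-8")
                for name, data in stored.items()}

    def ids(self):
        # Sorted IDs, so callers can iterate over all documents.
        return sorted(self._docs)

    def stored_bytes(self):
        # Total size of the compressed field values.
        return sum(len(data)
                   for doc in self._docs.values()
                   for data in doc.values())


store = TinyDocStore()
store.add("page-1", {"url": "http://example.com/a", "body": "lucene " * 2000})
store.add("page-2", {"title": "Another page", "price": "9.99"})
store.update("page-1", {"url": "http://example.com/a", "body": "updated"})
```

Note that update() replaces the whole document, so a caller that wants to change one field has to supply every field it wants to keep.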


On Tue, Jul 29, 2008 at 2:53 AM, John Evans <[EMAIL PROTECTED]> wrote:
Hi All,

I have successfully used Lucene in the "traditional" way to provide
full-text search for various websites. Now I am tasked with developing a
data-store to back a web crawler. The crawler can be configured to retrieve
arbitrary fields from arbitrary pages, so the result is that each document
may have a random assortment of fields. It seems like Lucene may be a
natural fit for this scenario since you can obviously add arbitrary fields
to each document and you can store the actual data in the database. I've
done some research to make sure that it would meet all of our individual
requirements (that we can iterate over documents, update (delete/replace)
documents, etc.) and everything looks good. I've also seen a couple of
references around the net to other people trying similar things... however,
I know it's not meant to be used this way, so I thought I would post here
and ask for guidance. Has anyone done something similar? Is there any
specific reason to think this is a bad idea?

The one thing that I am least certain about is how well it will scale. We
may reach the point where we have tens of millions of documents and a high
percentage of those documents may be relatively large (10k-50k each). We
actually would NOT be expecting/needing Lucene's normal extremely fast text
search times for this, but we would need reasonable times for adding new
documents to the index, retrieving documents by ID (for iterating over all
documents), optimizing the index after a series of changes, etc.

Any advice/input/theories anyone can contribute would be greatly
appreciated.

Thanks,
-
John
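
For a rough sense of the scale in this message, the raw size of the stored document bodies can be estimated with simple arithmetic. This is a back-of-envelope sketch only: it assumes "tens of millions" means somewhere between 10 and 50 million documents, treats every document as large (an upper bound, since only a high percentage are), reads "10k-50k" as 10,000 to 50,000 bytes, and ignores both index structures and any compression of stored fields.

```python
def raw_stored_size_gb(num_docs, avg_doc_bytes):
    """Uncompressed size of the stored document bodies, in decimal gigabytes."""
    return num_docs * avg_doc_bytes / 1e9


# Assumed bounds, not figures from the thread beyond "tens of millions"
# of documents at 10k-50k each.
low = raw_stored_size_gb(10_000_000, 10_000)   # 100.0 GB
high = raw_stored_size_gb(50_000_000, 50_000)  # 2500.0 GB

print(f"raw stored data: {low:.0f} GB to {high:.0f} GB")
```

Under these assumptions the stored content alone falls between roughly 100 GB and 2.5 TB.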
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ







