Re: Using lucene as a database... good idea or bad idea?

Grant Ingersoll Tue, 29 Jul 2008 07:26:34 -0700

Don't connect "database" (i.e. SQL, transactions, etc.) and Lucene. Connect data storage with simple, fast lookup and Lucene.

One field is the key (i.e. the filename), the other field is a binary, stored Field containing the contents of the file. Of course, there are other ways of slicing and dicing, such that one can search (in the fuzzy sense) the content and the key by adding tokenization, etc. This is the more traditional model for Lucene.
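
As an illustration of the key-plus-stored-binary layout described above, here is a minimal sketch. It is not code from this thread: it assumes a recent Lucene release (9.5 or later, for storedFields()), and the class name, the helper methods put/get/roundTrip, and the field names "key" and "contents" are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;

// Hypothetical sketch: Lucene used as a simple key -> bytes store.
public class KeyValueIndexSketch {

    // Store (or replace) the bytes under the given key.
    static void put(IndexWriter writer, String key, byte[] contents) throws IOException {
        Document doc = new Document();
        // The key is indexed as a single untokenized term for exact lookup.
        doc.add(new StringField("key", key, Field.Store.YES));
        // The contents are stored only; they are not indexed or searchable.
        doc.add(new StoredField("contents", contents));
        // updateDocument deletes any existing document with this key, then adds the new one.
        writer.updateDocument(new Term("key", key), doc);
    }

    // Look up the bytes for a key, or return null if the key is absent.
    static byte[] get(IndexSearcher searcher, String key) throws IOException {
        TopDocs hits = searcher.search(new TermQuery(new Term("key", key)), 1);
        if (hits.scoreDocs.length == 0) {
            return null;
        }
        Document doc = searcher.storedFields().document(hits.scoreDocs[0].doc);
        BytesRef ref = doc.getBinaryValue("contents");
        return Arrays.copyOfRange(ref.bytes, ref.offset, ref.offset + ref.length);
    }

    // Write one entry to an in-memory index and read it back.
    static byte[] roundTrip(String key, byte[] contents) throws IOException {
        try (Directory dir = new ByteBuffersDirectory()) {
            IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
            try (IndexWriter writer = new IndexWriter(dir, config)) {
                put(writer, key, contents);
            } // closing the writer commits the change
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return get(new IndexSearcher(reader), key);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] out = roundTrip("notes.txt", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(out, StandardCharsets.UTF_8));
    }
}
```

To make the contents searchable as well, one would presumably add a tokenized field (for example a TextField) alongside the stored one, which is the more traditional model mentioned above.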

Also, have a look at Apache Jackrabbit. It is a content repository that is implemented with Lucene.


-Grant

On Jul 29, 2008, at 10:02 AM, ನಾಗೇಶ್ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S) wrote:

Hi Ian,
Yes, I see that we are discussing an "option" here.
But, as I said before (the three parts to search-based solution), I do not
know (but would like to know) how Lucene (Java only, not Nutch, Solr,
etc.) can be used as a datastore.

Basically, I am not able to connect "database" and Lucene java. :)

Nagesh


On Tue, Jul 29, 2008 at 6:51 PM, Ian Lea <[EMAIL PROTECTED]> wrote:
I don't think that anyone in this thread has said "should", just
"could" - it is a valid option (IMHO).  Personally, I use it as a
store for lucene related data because I know and like and trust it, it
is already there for this project so no need to introduce another
software dependency, and because it is blindingly fast.


--
Ian.
On Tue, Jul 29, 2008 at 1:43 PM, ನಾಗೇಶ್ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S)
<[EMAIL PROTECTED]> wrote:
The way I see it, search solutions (on whatever scale) have three
components: data aggregation, indexing/searching and presentation of
results. I thought Lucene did the second part only.

So, I do not quite follow why Lucene should be used as a datastore.

Nagesh
On Tue, Jul 29, 2008 at 6:01 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
I think the answer is it can be done and probably quite well. I also think
it's informative that Nutch does not use Lucene for this function, as I
understand it, but that shouldn't stop you either. You might also have a
look at Apache Jackrabbit, which uses Lucene underneath as a content
repository.

-Grant


On Jul 29, 2008, at 5:34 AM, Ganesh - yahoo wrote:

Hello all,
I am also interested in this. I want to archive the content of the
document using Lucene.

Is it a good idea to use Lucene as storage engine?

Regards
Ganesh

----- Original Message ----- From: "Ian Lea" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Tuesday, July 29, 2008 2:18 PM
Subject: Re: Using lucene as a database... good idea or bad idea?


John
I think it's a great idea, and do exactly this to store 5 million+
documents with info that it takes way too long to get out of our
Oracle database (think days). Not as many docs as you are talking
about, and less data for each doc, but I wouldn't have any concerns
about scaling. There are certainly lucene indexes out there bigger
than what you propose. You can compress the stored data to save some
space. Run times for optimization might get interesting but see
recent threads for suggestions on that. And since you are not too
concerned about performance you may not need to optimize much, or even
at all.

Of course you need to remember that this is not a DBMS solution in the
sense of transactions, recovery, etc. but I'm sure you are already
aware of that.


--
Ian.
On Tue, Jul 29, 2008 at 2:53 AM, John Evans <[EMAIL PROTECTED]> wrote:
Hi All,
I have successfully used Lucene in the "traditional" way to provide
full-text search for various websites. Now I am tasked with developing a
data-store to back a web crawler. The crawler can be configured to
retrieve arbitrary fields from arbitrary pages, so the result is that
each document may have a random assortment of fields. It seems like
Lucene may be a natural fit for this scenario since you can obviously add
arbitrary fields to each document and you can store the actual data in
the database. I've done some research to make sure that it would meet
all of our individual requirements (that we can iterate over documents,
update (delete/replace) documents, etc.) and everything looks good. I've
also seen a couple of references around the net to other people trying
similar things... however, I know it's not meant to be used this way, so
I thought I would post here and ask for guidance. Has anyone done
something similar? Is there any specific reason to think this is a bad
idea?

The one thing that I am least certain about is how well it will scale.
We may reach the point where we have tens of millions of documents and a
high percentage of those documents may be relatively large (10k-50k
each). We actually would NOT be expecting/needing Lucene's normal
extremely fast text search times for this, but we would need reasonable
times for adding new documents to the index, retrieving documents by ID
(for iterating over all documents), optimizing the index after a series
of changes, etc.

Any advice/input/theories anyone can contribute would be greatly
appreciated.

Thanks,
-
John
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ