Re: indexing size

Bernhard Messer Thu, 09 Sep 2004 01:05:54 -0700

Dmitry Serebrennikov wrote:

Niraj Alok wrote:
Hi PA,
Thanks for the detail! Since we are also using Lucene to store the data, I guess I would not be able to use it.
By the way, I could be wrong, but I think the 35% figure you referenced in your first e-mail does not include any stored fields. The point of the 35%, I think, was to illustrate that the index data structures Lucene uses for searching are efficient. But Lucene does nothing special with stored content: no compression or anything like that. So you end up with the full size of your stored data plus roughly 35% of the size of the indexed data.
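
As a back-of-the-envelope sketch of that arithmetic: the class and method names below are made up for illustration, and the 35% is only the ballpark figure quoted in this thread, not something Lucene guarantees.

```java
final class IndexSizeEstimate {
    // Assumed ratio of index structures to indexed text (the 35% quoted in this thread).
    static final double INDEX_OVERHEAD = 0.35;

    // Stored fields are kept uncompressed, so they count at full size;
    // the searchable index adds roughly INDEX_OVERHEAD of the indexed text.
    static long estimateBytes(long indexedTextBytes, long storedFieldBytes) {
        return storedFieldBytes + Math.round(indexedTextBytes * INDEX_OVERHEAD);
    }
}
```

Under these assumptions, 100 MB of text that is both indexed and stored comes out at about 135 MB, while the same text indexed without storing it would be about 35 MB plus whatever small keys are stored.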

There will be a patch available by the end of this week that allows you to store binary values compressed within a Lucene index. It means that you will be able to store and retrieve whole documents within Lucene in a very efficient way ;-)

regards
bernhard
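
Until such a patch is available, a document can be compressed by the application before it is handed over for storage. The following is only a sketch of that idea using the JDK's java.util.zip classes; it is not the API of the patch mentioned above, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class StoredValueCompression {
    // Deflate the UTF-8 bytes of a document before storing them as a binary value.
    static byte[] compress(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Inflate a value produced by compress() back into the original text.
    static String decompress(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

How much this saves depends entirely on the data; repetitive text shrinks a lot, already-compressed content hardly at all.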

Cheers.
Dmitry.

Regards,
Niraj
----- Original Message -----
From: "petite_abeille" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, September 01, 2004 1:14 PM
Subject: Re: indexing size

Hi Niraj,

On Sep 01, 2004, at 06:45, Niraj Alok wrote:

If I make some of them Field.UnStored, I can see from the javadocs that they will be indexed and tokenized but not stored. If a field is not stored, how can I use it while searching?


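The different types of fields don't affect how you do your search. That
is always the same: an UnStored field is still indexed, so it can be
searched like any other; you just cannot read its value back from a hit.

Using UnStored fields simply means that you use Lucene as a pure index,
for search purposes only, not for storing any data.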

Specifically, the assumption is that your original data lives somewhere
else, outside of Lucene. If this assumption is true, then you can index
everything as UnStored, with the addition of one Keyword per document.
The Keyword field holds some sort of unique identifier which allows you
to retrieve the original data if necessary (e.g. a primary key, a URI,
or whatnot).

Here is an example of this approach:

(1) For indexing, check the indexValuesWithID() method

http://cvs.sourceforge.net/viewcvs.py/zoe/ZOE/Frameworks/SZObject/SZIndex.java?view=markup

Note the addition of a Field.Keyword for each document and the use of
Field.UnStored for everything else.

(2) For fetching, check objectsWithSpecificationAndHitsInStore()

http://cvs.sourceforge.net/viewcvs.py/zoe/ZOE/Frameworks/SZObject/SZFinder.java?view=markup
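
The same pattern can be shown with a toy model in plain Java. This is a sketch only, not Lucene and not the ZOE code linked above: the class SearchOnlyIndex and its methods are hypothetical. The id passed to index() plays the role of the Field.Keyword, and the text plays the role of the Field.UnStored fields: it is tokenized and searchable, but never kept.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SearchOnlyIndex {
    // term -> ids of the documents containing it; the text itself is discarded
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Tokenize the text and record only which document id each term came from.
    void index(String id, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
    }

    // Return the ids of matching documents, sorted; never the document text.
    List<String> search(String term) {
        Set<String> ids = postings.get(term.toLowerCase(Locale.ROOT));
        if (ids == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(ids);
    }
}
```

A search returns only identifiers; the caller then looks each one up in whatever holds the original data (a database, the file system, and so on), which is what keeps the index small.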

HTH.

Cheers,

PA.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

