Comments inline:
rolaren...@earthlink.net wrote:
R2.4
I have been looking through the soon-to-be-superseded (by its 2nd ed.) book "Lucene In Action" (hope it's ok on this newsgroup to say I like that book); also at these two tutorials: http://darksleep.com/lucene/ and http://www.informit.com/articles/article.aspx?p=461633&seqNum=3 and also at the Lucene online docco (http://lucene.apache.org/java/2_4_0/index.html) the last of which has nothing on the topic at all! I've also tried to search http://www.nabble.com/Lucene---Java-Users-f45.html -- but there are almost 10,000 docs there on "Field." so that is too much data.
The book is consistent with the two tutorials, but all three seem to be out of date (and the design less clear) compared to the code: http://lucene.apache.org/java/2_4_0/api/index.html
I have copied some code and it is working for me, but I am a little uncertain how to decide what value of Field.Index and Field.Store to choose in order to get the behavior I'd like. If I read the javadocs, and decide to ignore all the "expert" items, it looks like this:
Field.Store.NO = I'll never see that data again; I wonder why I'd do this?
This is useful in the cases where you have data you want to be able to
search by, but never need to display it.
For example in my application we have complex data like:
kit<Gsfco1>
^
<http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=alleleDetail&id=MGI:3530308>
In one of our searchable indexes we do quite a bit of transformation to
this data, and remove all of the punctuation, etc etc.
so it turns into: kit gsfcol
This is great for searching, cause it allows us to have punctuation
irrelevant search results, but the user simply doesn't care whatsoever.
So at display time, we show them the unmodified, case sensitive version
of this data, which is stored in another field.
Field.Store.YES = good, the data will be stored
Storage takes up space, so if you are ONLY going to search on a piece of
data, and never display it, you should not store it.
Field.Store.COMPRESS = even better, stored and compressed; why would anyone do anything else?
I agree.
========
Field.Index.NO = I cannot search that data, but if I need its value for a given document (e.g., to decorate a result), I can retrieve it (use-case: maybe, the date the document was created -- but why not just make that searchable? I am having a hard time thinking of an actually useful piece of data that could go here and would not want to be one of ANALYZED or NOT_ANALYZED)
Correct, you use this type of data as additional information about the
data you matched on.
Field.Index.ANALYZED = the normal value, I would guess, except in the special
case of stuff not searchable but used to decorate results (Field.Index.NO)
Correct.
Field.Index.NOT_ANALYZED = I can search for this value, but it won't get analyzed, so it is searched for as the very same value I put in (the docco suggests product numbers: any other interesting use-cases anyone can suggest?)
Its highly useful for exact match searching.
=========
thanks in advance for helping me get clearer on this!
-Paul
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
--
Matthew Hall
Software Engineer
Mouse Genome Informatics
mh...@informatics.jax.org
(207) 288-6012
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org