On Jun 9, 2006, at 2:10 AM, Chris Hostetter wrote:
: 2. Recreating the index from scratch will require the moving of the
: heavens and the earth.
:
: My crazy idea - can we add new Documents to the index with the Fields
: we wish to add, and duplicate file IDs? i.e. an entry for file ID Foo
: would consist of two Documents,
: Document X: fileID:<Foo>, contents:<unknown>
: Document Y: fileID:<Foo>, title:<Bar>, url:<www.baz.com>, etc.
:
: It would be no problem to implement different Searcher objects to
: look at specific Fields, we were already leaning in that direction
: anyhow.
you certainly could do that .. but what exactly would the point be? ..
presumably you currently query for "contents:germany" and get back the
fileIDs of files that contain the word germany in their contents -- if you
add another document with the same fileID and a title field and a url
field, and you search for "contents:germany" you're still going to get
back the same document -- it's not going to magically have the other
fields in it just because they have the same fileID.
That kinda would be the point - "contents:germany" would get the same
fileIDs, but "contents:germany title:medicine" would (hopefully) give
us a more specific query.
I suppose you could do the search on contents, get back the fileIDs and
*then* do another search for those fileIDs to get back the titles and urls
... but i can't imagine earth and the heavens are that hard to move that
you'd want to jump through that hoop on every search.
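The two-pass lookup described above can be sketched roughly as follows. This is only an illustration: plain in-memory maps stand in for the indexed Documents, the names TwoPassLookup, searchContents, and fetchTitles are hypothetical (not Lucene API), and the substring test is a stand-in for a real term query.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the two-pass lookup; not Lucene API.
final class TwoPassLookup {

    // Pass 1: collect the fileIDs whose contents mention the term.
    // A case-insensitive substring test stands in for a real term query.
    static List<String> searchContents(Map<String, String> contentsByFileId, String term) {
        String needle = term.toLowerCase();
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : contentsByFileId.entrySet()) {
            if (entry.getValue().toLowerCase().contains(needle)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    // Pass 2: look up the title for each fileID found in pass 1.
    // fileIDs with no metadata entry are skipped.
    static Map<String, String> fetchTitles(List<String> fileIds, Map<String, String> titleByFileId) {
        Map<String, String> titles = new LinkedHashMap<>();
        for (String fileId : fileIds) {
            String title = titleByFileId.get(fileId);
            if (title != null) {
                titles.put(fileId, title);
            }
        }
        return titles;
    }
}
```

Every search pays for both passes, which is the hoop being described.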
Good point. Perhaps the better idea would be to build a separate
index with the fields to be added, and create a MultiSearcher to
operate over both indices.
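One caveat, as far as I understand it: a MultiSearcher merges hits from its sub-indices but still matches each Document on its own, so requiring both a contents term and a title term would mean joining the two hit lists on fileID by hand. A minimal sketch of that join, assuming the fileIDs from each index have already been collected into sets (FileIdJoin and intersect are hypothetical names, not Lucene API):

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for joining hits from two indices on fileID; not Lucene API.
final class FileIdJoin {

    // Returns the fileIDs present in both hit sets, in the order they
    // appear in contentsHits. Neither input set is modified.
    static Set<String> intersect(Set<String> contentsHits, Set<String> titleHits) {
        Set<String> both = new LinkedHashSet<>(contentsHits);
        both.retainAll(titleHits);
        return both;
    }
}
```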
(if you're going to add these new documents with the title and url and all
that -- why can't you add the contents at the same time ... are the
contents stored someplace else that you no longer have access to - but you
do have access to all the other fields???)
That is indeed the case. We have a BerkeleyDB with titles, URLs, that
sort of thing for an on-disk precache, but a) the code written to
actually generate the Lucene index is terrible, b) the resources used
to generate the index are scattered at best, missing at worst, and c)
the person who wrote the code isn't available any more. I was hoping
to find some Brilliant Plan to get this done quickly (we're demoing
for the World Health Organization sometime this week).