On Jun 9, 2006, at 2:10 AM, Chris Hostetter wrote:


: 2. Recreating the index from scratch will require the moving of the
: heavens and the earth.
:
: My crazy idea - can we add new Documents to the index with the Fields
: we wish to add, and duplicate file IDs? i.e. an entry for file ID Foo
: would consist of two Documents,
: Document X: fileID:<Foo>, contents:<unknown>
: Document Y:fileID:<Foo>, title:<Bar>, url:<www.baz.com>, etc.
:
: It would be no problem to implement different Searcher objects to
: look at specific Fields, we were already leaning in that direction
: anyhow.

you certainly could do that .. but what exactly would the point be? ..
presumably you currently query for "contents:germany" and get back the
fileIDs of files that contain the word germany in their contents -- if
you add another document with the same fileID and a title field and a
url field, and you search for "contents:germany" you're still going to
get back the same document -- it's not going to magically have the other
fields in it just because they have the same fileID.

That kinda would be the point - "contents:germany" would get the same
fileIDs, but "contents:germany title:medicine" would (hopefully) give us
a more specific query.
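A quick toy sketch (plain Python with made-up data, standing in for Lucene's per-document field model -- not the Lucene API) of why the combined query comes back empty when the fields live in two separate documents that merely share a fileID:

```python
# Two documents sharing fileID "Foo": one has only contents, the other
# only title/url -- mirroring Document X and Document Y from the thread.
docs = [
    {"fileID": "Foo", "contents": "a study of medicine in germany"},  # Document X
    {"fileID": "Foo", "title": "Medicine", "url": "www.baz.com"},     # Document Y
]

def search(required):
    """Return fileIDs of documents matching EVERY field:term pair,
    the way a single conjunctive query is evaluated per document."""
    hits = []
    for doc in docs:
        if all(term in doc.get(field, "").lower() for field, term in required):
            hits.append(doc["fileID"])
    return hits

print(search([("contents", "germany")]))                         # ['Foo']
print(search([("contents", "germany"), ("title", "medicine")]))  # []
```

The second query finds nothing because no *single* document carries both fields, which is exactly the problem with the duplicate-document scheme.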

I suppose you could do the search on contents, get back the fileIDs and
*then* do another search for those fileIDs to get back the titles and
urls ... but I can't imagine earth and the heavens are that hard to move
that you'd want to jump through that hoop on every search.

Good point. Perhaps the better idea would be to build a separate index
with the fields to be added, and create a MultiSearcher to operate over
both indices.
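One caveat on that idea: a stock MultiSearcher only merges hit lists from independent indices, so the actual join on fileID would still be application logic (Lucene's ParallelReader may also be worth a look here). A toy sketch (plain Python, hypothetical data) of what that join looks like -- the old index answers contents queries, the new one holds the added fields keyed by fileID:

```python
# Old index: fileID -> contents.  New index: fileID -> added fields.
contents_index = {"Foo": "a study of medicine in germany",
                  "Qux": "german cuisine"}
metadata_index = {"Foo": {"title": "Medicine", "url": "www.baz.com"},
                  "Qux": {"title": "Cooking", "url": "www.quux.com"}}

def search(contents_term, title_term=None):
    """fileIDs whose contents match, optionally narrowed by a title
    term looked up in the second index -- the cross-index join."""
    hits = [fid for fid, text in contents_index.items() if contents_term in text]
    if title_term is not None:
        hits = [fid for fid in hits
                if title_term in metadata_index.get(fid, {}).get("title", "").lower()]
    return hits

print(search("germany"))              # ['Foo']
print(search("germany", "medicine"))  # ['Foo'] -- narrowed via the second index
```

The join happens once per query on an already-short hit list, so it avoids re-querying the whole index for every fileID.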

(if you're going to add these new documents with the title and url and
all that -- why can't you add the contents at the same time ... are the
contents stored someplace else that you no longer have access to - but
you do have access to all the other fields???)

That is indeed the case. We have a BerkeleyDB with titles, URLs, that
sort of thing for an on-disk precache, but a) the code written to
actually generate the Lucene index is terrible, b) the resources used to
generate the index are scattered at best, missing at worst, and c) the
person who wrote the code isn't available any more. I was hoping to find
some Brilliant Plan to get this done quickly (we're demoing for the
World Health Organization sometime this week).

