Re: Adding Fields to Documents with UnStored Fields - crazy scheme?

Bob Arens Fri, 09 Jun 2006 00:47:56 -0700


On Jun 9, 2006, at 2:10 AM, Chris Hostetter wrote:


: 2. Recreating the index from scratch will require the moving of the
: heavens and the earth.
:

: My crazy idea - can we add new Documents to the index with the
: Fields we wish to add, and duplicate file IDs? i.e. an entry for fileID Foo
: would consist of two Documents,
: Document X: fileID:<Foo>, contents:<unknown>
: Document Y: fileID:<Foo>, title:<Bar>, url:<www.baz.com>, etc.
:
: It would be no problem to implement different Searcher objects to
: look at specific Fields, we were already leaning in that direction
: anyhow.

you certainly could do that .. but what exactly would the point be? ..
presumably you currently query for "contents:germany" and get back the
fileIDs of files that contain the word germany in their contents -- if you
add another document with the same fileID and a title field and a url
field, and you search for "contents:germany" you're still going to get
back the same document -- it's not going to magically have the other
fields in it just because they have the same fileID.

That kinda would be the point - "contents:germany" would get the same fileIDs, but "contents:germany title:medicine" would (hopefully) give us a more specific query.

I suppose you could do the search on contents, get back the fileIDs and
*then* do another search for those fileIDs to get back the titles and urls...
but i can't imagine earth and the heavens are that hard to move that
you'd want to jump through that hoop on every search.
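
The two-pass lookup described above could be sketched roughly as follows. This is a toy model using plain Python dicts, not Lucene calls; `contents_index`, `metadata_by_file_id`, and the sample data other than the Foo/Bar/www.baz.com entry are hypothetical stand-ins for the existing index and the BerkeleyDB precache.

```python
# Toy stand-ins for the two data sources; names and data are hypothetical.
contents_index = {
    "germany": {"Foo", "Qux"},   # term -> fileIDs whose contents match
    "medicine": {"Qux"},
}
metadata_by_file_id = {
    "Foo": {"title": "Bar", "url": "www.baz.com"},
    "Qux": {"title": "Medicine in Europe", "url": "www.example.org"},
}


def search_contents(term):
    """First pass: return the sorted fileIDs whose contents match the term."""
    return sorted(contents_index.get(term, set()))


def fetch_metadata(file_ids):
    """Second pass: look up title and url per fileID, skipping unknown IDs."""
    return {
        fid: metadata_by_file_id[fid]
        for fid in file_ids
        if fid in metadata_by_file_id
    }


hits = search_contents("germany")
results = fetch_metadata(hits)
```

The cost is the one noted above: every search pays for a second round of lookups, one per matching fileID.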

Good point. Perhaps the better idea would be to build a separate index with the fields to be added, and create a MultiSearcher to operate over both indices.
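
A minimal sketch of the separate-index idea, assuming the application itself combines hits from the two indices by fileID. It models each index as a list of dicts and does not use Lucene's MultiSearcher API; whether MultiSearcher would perform this join on its own is not established in this thread. All names and sample data are hypothetical.

```python
def search(index, field, term):
    """Return the set of fileIDs whose `field` contains `term` as a word."""
    term = term.lower()
    return {
        doc["fileID"]
        for doc in index
        if term in doc.get(field, "").lower().split()
    }


# Existing index: contents only (hypothetical sample data).
contents_idx = [
    {"fileID": "Foo", "contents": "health policy in Germany"},
    {"fileID": "Qux", "contents": "travel notes from Germany"},
]
# Separate index built from the fields to be added.
fields_idx = [
    {"fileID": "Foo", "title": "Medicine today", "url": "www.baz.com"},
    {"fileID": "Qux", "title": "Holiday diary", "url": "www.example.org"},
]


def search_both(contents_term, title_term):
    """Keep only fileIDs that match in both indices (set intersection)."""
    in_contents = search(contents_idx, "contents", contents_term)
    in_title = search(fields_idx, "title", title_term)
    return sorted(in_contents & in_title)


narrowed = search_both("germany", "medicine")
```

Here the intersection plays the role of the narrower "contents:germany title:medicine" query: both files mention Germany, but only one also has a matching title.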

(if you're going to add these new documents with the title and url and all
that -- why can't you add the contents at the same time ... are the
contents stored someplace else that you no longer have access to - but you
do have access to all the other fields???)

That is indeed the case. We have a BerkeleyDB with titles, URLs, that sort of thing for an on-disk precache, but a) the code written to actually generate the Lucene index is terrible, b) the resources used to generate the index are scattered at best, missing at worst, and c) the person who wrote the code isn't available any more. I was hoping to find some Brilliant Plan to get this done quickly (we're demoing for the World Health Organization sometime this week).



