I agree with your characterization. We love the speed and performance of Lucene, but the updating process just doesn't feel right in that context.

I think the common use case that comes to mind is tagging a document. Every time a doc gets tagged, you have to rebuild it, or manage multiple indices, etc. The ParallelReader is supposed to help with this scenario to some extent, but it seems difficult to keep the doc ids in sync across the parallel indices.

I know the problem is hard and I don't know if it is solvable. I was just thinking that perhaps something like the Layers facility in photo editing software might be a good model to start from, whereby we could "mask" the document somehow with the updated information. I haven't dug into the code to see how it would work.
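
To make the layers idea a bit more concrete, here is a minimal in-memory sketch. It is only an illustration under heavy assumptions: the class name LayeredDocs and its methods are made up for this email, nothing like it exists in Lucene, and it only models field values keyed by OID. It says nothing about the hard part, which is how an overlay would interact with the inverted index.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names exist in Lucene.
final class LayeredDocs {
    // Base layer: fields as originally indexed, keyed by OID.
    private final Map<String, Map<String, String>> base = new HashMap<>();
    // Overlay layer: only the fields changed since the base was built.
    private final Map<String, Map<String, String>> overlay = new HashMap<>();

    void addBase(String oid, Map<String, String> fields) {
        base.put(oid, new LinkedHashMap<>(fields));
    }

    // Record a change to one field without touching the base layer.
    void mask(String oid, String field, String value) {
        overlay.computeIfAbsent(oid, k -> new LinkedHashMap<>()).put(field, value);
    }

    // Merged view: overlay fields hide base fields with the same name.
    Map<String, String> view(String oid) {
        Map<String, String> merged = new LinkedHashMap<>(base.getOrDefault(oid, Map.of()));
        merged.putAll(overlay.getOrDefault(oid, Map.of()));
        return merged;
    }

    public static void main(String[] args) {
        LayeredDocs docs = new LayeredDocs();
        docs.addBase("doc-1", Map.of("title", "Lucene in Action", "tags", "search"));
        docs.mask("doc-1", "tags", "search,java"); // tagging touches only the overlay
        System.out.println(docs.view("doc-1"));
    }
}
```

In this model tagging a doc never rewrites the base entry; the cost moves to read time, where every lookup has to merge the two layers.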

I was also thinking about something like an "AsynchronousParallelReader" that takes, at construction, the name of the field that contains the OID and manages where each document lives in each index. That would let us drop the ParallelReader requirement that doc ids stay in sync, at the expense of some extra work. Again, this is a hypothetical for optimizing high-update environments, and I am not sure how fast it would be.
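
The bookkeeping such a reader would need can be sketched roughly as follows. This is just the OID-to-doc-id mapping, not a reader: OidLocator is a hypothetical name, the doc ids are made-up numbers, and how the map would be kept current as segments merge is exactly the "extra work" I have not thought through.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: OidLocator is not a Lucene class.
final class OidLocator {
    // One map per sub-index: OID -> that index's current internal doc id.
    private final List<Map<String, Integer>> docIdByOid = new ArrayList<>();

    OidLocator(int indexCount) {
        for (int i = 0; i < indexCount; i++) {
            docIdByOid.add(new HashMap<>());
        }
    }

    // Called whenever a document is (re)added to one sub-index.
    void record(int index, String oid, int docId) {
        docIdByOid.get(index).put(oid, docId);
    }

    // Doc id of this OID in every sub-index; -1 where it is absent.
    int[] locate(String oid) {
        int[] ids = new int[docIdByOid.size()];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = docIdByOid.get(i).getOrDefault(oid, -1);
        }
        return ids;
    }

    public static void main(String[] args) {
        OidLocator loc = new OidLocator(2);
        loc.record(0, "doc-1", 17); // index 0: large, rarely changing content
        loc.record(1, "doc-1", 3);  // index 1: tags
        loc.record(1, "doc-1", 42); // re-tagged: only the tags index gets a new doc id
        System.out.println(Arrays.toString(loc.locate("doc-1"))); // prints [17, 42]
    }
}
```

The point is that the two indices no longer need matching doc ids; the price is one extra lookup per index for every document touched.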

On May 8, 2007, at 2:24 AM, Chris Hostetter wrote:


: I am not sure I agree with that.

i don't think i understand what part you don't agree with :)

: Document management systems are quite common these days, and people
: are used to "checking out" a document, making changes, and checking
: the entire document back in.
:
: In many ways Lucene can be viewed as a self-contained document mngt
: system if you store every field.

agreed.

: If the user is savvy enough to 'rebuild' their documents from an
: external source, then the fields do not need to be stored (just the
: OID field for convenience).

it's this rebuilding that people tend to dislike about the delete/re-add process that's currently necessary to "update" a document in Lucene .. people don't want to have to be savvy enough to rebuild their documents from an external source, they want to throw a bunch of docs in, do some
searches, pull a doc out, modify one field and throw it back in again.

at least: that's how i would characterize most questions about "updating"
docs.

if the issue was just one of supporting an updateDoc(Document) method
where the client is expected to "rebuild" the entire doc before calling the
method, then we've already got that ... it's
IndexWriter.updateDocument(Term,Document).





-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://lucene.grantingersoll.com
http://www.paperoftheweek.com/


