Re: Various Ideas from ApacheCon

Grant Ingersoll Tue, 08 May 2007 04:10:49 -0700

I agree with your characterization. We love the speed and performance of Lucene, but the updating process just doesn't feel right in that context.

The common use case that comes to mind is tagging a document. Every time a doc gets tagged, you have to rebuild it, or manage multiple indices, etc. The ParallelReader is supposed to help with this scenario to some extent, but it seems difficult to keep doc ids in sync.

I know the problem is hard and I don't know if it is solvable. I was just thinking that perhaps something like the Layers facility in photo editing software might be a good model to start from, whereby we could "mask" the document somehow with the updated information. I haven't dug into the code to see how it would work.

I was also thinking about something like an "AsynchronousParallelReader" that takes, on construction, the designation of the field that contains the OID and manages where each document lives in each index, so we could drop the ParallelReader's requirement that doc ids stay in sync, at the expense of some extra work. Again, this is a hypothetical to optimize high-update environments, and I am not sure how fast it would be.
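
To make the idea a bit more concrete, here is a rough sketch in plain Java of what I mean by joining on the OID instead of on doc ids. It is a toy model under loud assumptions: it does not use any Lucene classes, OidJoinSketch and its methods are hypothetical names, and each "index" is just a list of field maps whose position stands in for the doc id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model, not Lucene: an "index" is a list, position = doc id. */
class OidJoinSketch {
    private final String oidField;
    private final List<List<Map<String, String>>> indexes = new ArrayList<>();
    // One OID -> doc id table per registered index.
    private final List<Map<String, Integer>> oidToDocId = new ArrayList<>();

    /** The OID field name is fixed at construction, as described above. */
    OidJoinSketch(String oidField) {
        this.oidField = oidField;
    }

    /** Registers an index and records where each OID lives in it. */
    void addIndex(List<Map<String, String>> index) {
        Map<String, Integer> positions = new HashMap<>();
        for (int docId = 0; docId < index.size(); docId++) {
            String oid = index.get(docId).get(oidField);
            if (oid != null) {
                positions.put(oid, docId);
            }
        }
        indexes.add(index);
        oidToDocId.add(positions);
    }

    /** Merges the fields held for one OID across all indexes, whatever the doc ids are. */
    Map<String, String> document(String oid) {
        Map<String, String> merged = new HashMap<>();
        for (int i = 0; i < indexes.size(); i++) {
            Integer docId = oidToDocId.get(i).get(oid);
            if (docId != null) {
                merged.putAll(indexes.get(i).get(docId));
            }
        }
        return merged;
    }
}
```

The extra work is the per-index OID table, which would have to be rebuilt or patched whenever one of the indexes changes. How that cost compares with keeping doc ids aligned is exactly the part I am unsure about.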


On May 8, 2007, at 2:24 AM, Chris Hostetter wrote:

: I am not sure I agree with that.

i don't think i understand what part you don't agree with :)

: Document management systems are quite common these days, and people
: are used to "checking out" a document, making changes, and checking
: the entire document back in.
:
: In many ways Lucene can be viewed as a self-contained document mngt
: system if you store every field.

agreed.

: If the user is savvy enough to 'rebuild' their documents from an
: external source, then the fields do not need to be stored (just the
: OID field for convenience).

it's this rebuilding that people tend to dislike about the delete/re-add
process that's currently necessary to "update" a document in Lucene ...
people don't want to have to be savvy enough to rebuild their documents
from an external source, they want to throw a bunch of docs in, do some
searches, pull a doc out, modify one field and throw it back in again.
at least: that's how i would characterize most questions about "updating"
docs.

if the issue was just one of supporting an updateDoc(Document) method
where the client is expected to "rebuild" the entire doc before calling the
method, then we've already got that ... it's
IndexWriter.updateDocument(Term,Document).
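
as a rough illustration of why that still forces the rebuild, here is a toy, in-memory model in plain Java. it is not Lucene code: ToyIndexSketch and its methods are hypothetical names, and the only behaviour it is meant to mirror is that an update deletes the docs matching the term and then adds the supplied doc as a brand new one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy index, not Lucene: a doc id is a position in the list. */
class ToyIndexSketch {
    private final List<Map<String, String>> docs = new ArrayList<>();
    private final List<Boolean> deleted = new ArrayList<>();

    /** Appends a document and returns its newly assigned doc id. */
    int addDocument(Map<String, String> doc) {
        docs.add(new HashMap<>(doc));
        deleted.add(Boolean.FALSE);
        return docs.size() - 1;
    }

    /** Marks every live doc whose field equals value as deleted, then adds the replacement. */
    int updateDocument(String field, String value, Map<String, String> replacement) {
        for (int id = 0; id < docs.size(); id++) {
            if (!deleted.get(id) && value.equals(docs.get(id).get(field))) {
                deleted.set(id, Boolean.TRUE);
            }
        }
        return addDocument(replacement);
    }

    /** Returns the first live document whose field equals value, or null if none. */
    Map<String, String> find(String field, String value) {
        for (int id = 0; id < docs.size(); id++) {
            if (!deleted.get(id) && value.equals(docs.get(id).get(field))) {
                return docs.get(id);
            }
        }
        return null;
    }
}
```

in this model, calling updateDocument("oid", "42", doc) with a doc that only carries the oid and one new tag field leaves a live doc that has lost every other field and has a new doc id, which is why the client has to rebuild the whole thing first.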





-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://lucene.grantingersoll.com
http://www.paperoftheweek.com/


