DM Smith wrote on 07/07/2006 07:07 PM:
> Otis,
> First let me say, I don't want to rehash the arguments for or
> against Java 1.5.
> This is an emotional issue for people on both sides.
> However, I think you have identified that the core people need to
> make a decision and the rest of us need to go with it.

It would be most helpful to have clarity on this issue.

> On Jul 7, 2006, at 1:17 PM, Otis Gospodnetic wrote:
>
>> Hi Chuck,
>>
>> I think bulk update would be good (although I'm not sure how it would
>> be different from batching deletes and adds, but I'm sure there is a
>> difference, or else you wouldn't have done it).

Bulk update works by rewriting all segments that contain a document to be modified in a single linear pass. This is orders of magnitude faster than delete/add if the set of documents to be updated is large, especially if only a few small fields are mutable on Documents that have many possibly large immutable fields. E.g., on a somewhat slow development machine I updated several fields on 1,000,000 large documents in 43 seconds.

There is an existing patch in JIRA that takes this same approach (LUCENE-382). However, the limitations in that patch are substantial: it works only on optimized indexes, stored fields are not updated, updates are independent of the existing field value, etc. These limitations make that implementation unsuitable for many use cases. My implementation eliminates all of those limitations, providing a fast, flexible solution for applying an arbitrary value transformation to selected documents and fields in the index (doc.field.new_value = f(doc, field.old_value, doc.other_field_values) for arbitrary f).

It also works with ParallelReader (and the ParallelWriter I've already contributed). This allows the mutable fields to be segregated into a separate subindex. Only that subindex need be updated. This alone is an enormous advantage over a large number of delete/adds, where the same optimization is not possible due to the doc-id synchronization requirements of ParallelReader.
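
To make the shape of that transformation concrete, here is a minimal Java sketch of new_value = f(doc, old_value, other_field_values) applied in one linear pass. It is only an illustration under stated assumptions, not the code discussed in this message: it does not touch Lucene's index format or API, a "segment" is modeled as an in-memory list of field maps, and the names FieldTransform, BulkUpdateSketch and rewriteSegment are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical callback (not a Lucene type): new_value = f(doc, field, old_value).
// The doc argument gives f access to the other field values.
interface FieldTransform {
    String apply(Map<String, String> doc, String field, String oldValue);
}

final class BulkUpdateSketch {

    // Toy model: a segment is a list of documents, a document is a field map,
    // and a document's position in the list plays the role of its doc id.
    static List<Map<String, String>> rewriteSegment(
            List<Map<String, String>> segment,
            Predicate<Map<String, String>> selected,
            String field,
            FieldTransform f) {
        List<Map<String, String>> rewritten = new ArrayList<>(segment.size());
        boolean changed = false;
        for (Map<String, String> doc : segment) {      // single linear pass
            Map<String, String> copy = new HashMap<>(doc);
            if (selected.test(doc)) {
                copy.put(field, f.apply(doc, field, doc.get(field)));
                changed = true;
            }
            rewritten.add(copy);                       // order (doc id) is preserved
        }
        // A segment that contains no selected document is kept as-is.
        return changed ? rewritten : segment;
    }

    public static void main(String[] args) {
        Map<String, String> a = new HashMap<>();
        a.put("id", "a");
        a.put("body", "large immutable text");
        a.put("views", "1");
        Map<String, String> b = new HashMap<>();
        b.put("id", "b");
        b.put("body", "more immutable text");
        b.put("views", "5");

        List<Map<String, String>> segment = new ArrayList<>();
        segment.add(a);
        segment.add(b);

        // Increment "views" on document "b" only; the new value depends on the old one.
        List<Map<String, String>> out = rewriteSegment(
                segment,
                doc -> "b".equals(doc.get("id")),
                "views",
                (doc, field, old) -> String.valueOf(Integer.parseInt(old) + 1));
        System.out.println(out);
    }
}
```

Because documents keep their positions in the rewritten list, the toy doc ids do not shift. That is the property that matters when the mutable fields live in a separate subindex that must stay aligned with the others.
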

There is a substantial amount of code required to do this, and it is completely dependent on the index representation. To simplify merge issues with ongoing Lucene changes, I had to copy and edit certain private methods out of the existing index code (and make extensive use of the package-only APIs). Beyond the normal benefits of open sourcing code, my interest in contributing this is to see the index code refactored to take bulk update into account. That interest is increased by the current focus on a new flexible index representation. I would like to see bulk update as one of the operations supported in the new representation.

>> So I think you should contribute your code. This will give us a real
>> example of having something possibly valuable, and written with 1.5
>> features, so we can finalize 1.4 vs. 1.5 discussion, probably with a
>> vote on lucene-dev.

I doubt any single contribution will change anyone's mind. I would like to have clarity on the 1.5 decision before deciding whether or not to contribute this and other things. My ParallelWriter contribution, which also requires 1.5, is already sitting in JIRA. I only work in 1.5 and use its features extensively. I don't think about 1.4 at all, and so have no idea how heavily dependent the code in question is on 1.5. Unfortunately, I won't be able to contribute anything substantial to Lucene so long as it has a 1.4 requirement.

Chuck