Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))
Andi Vajda wrote:
> I'd be interested in doing this but what is it that we're after in 'supporting gcj' actually?

I think it would be sufficient to:

1. Compile only .jar and .class with gcj (not .java).
2. Pass all unit tests on a single platform.

This would provide an existence proof that Lucene can run under GCJ, and doesn't require solving GCJ's porting issues.

> Even when only compiling .jar -> .so with gcj, a number of patches still need to be applied:
> http://svn.osafoundation.org/pylucene/trunk/patches.lucene

The patches to JavaCC-generated code should probably really become JavaCC patches. Have you looked into that?

Most of the rest look like reasonable changes to Lucene, except perhaps the native matches, which look a bit fishy for Lucene's trunk.

Doug

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))
On Tue, 11 Jul 2006, Doug Cutting wrote:
> Andi Vajda wrote:
>> I'd be interested in doing this but what is it that we're after in 'supporting gcj' actually?
>
> I think it would be sufficient to:
>
> 1. Compile only .jar and .class with gcj (not .java).
> 2. Pass all unit tests on a single platform.

Just last week, a PyLucene user got it to work on Solaris. I have no access to a Solaris machine to validate this. If I had my choice of platform, I'd pick one of (in order of preference):

- Mac OS X (Intel or PPC)
- a recent Red Hat Linux, since this is the one most gcj developers use
- Ubuntu 6.06

As for the version of gcj, I'd suggest using:

- Mac OS X Intel: gcj 4.0.2 (heavily patched)
- Mac OS X PPC: gcj 3.4.6
- Red Hat Linux: I'd try 4.2.0, downgrading until I find one that works, probably 4.1.1
- Ubuntu 6.06: gcj 3.4.6

Unless junit can be made to run compiled under gcj, I see some more work on the unit tests side. This could be interesting too...

>> Even when only compiling .jar -> .so with gcj, a number of patches still need to be applied:
>> http://svn.osafoundation.org/pylucene/trunk/patches.lucene
>
> The patches to JavaCC-generated code should probably really become JavaCC patches. Have you looked into that?

Yes, I filed bug 53 almost two years ago, it's not gone very far :(
https://javacc.dev.java.net/issues/show_bug.cgi?id=53

> Most of the rest look like reasonable changes to Lucene, except perhaps the native matches, which look a bit fishy for Lucene's trunk.

The native match patches are required because the libgcj that comes with gcj 3.4.x doesn't provide a regular expressions implementation. This is solved in PyLucene by using Python's. I think gcj 4 comes with regex support, but gcj 4 is not yet well supported on most platforms.

For the gcj platform story, see this pylucene-dev post I sent recently:
http://lists.osafoundation.org/pipermail/pylucene-dev/2006-June/001106.html

Andi..
Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))
Andi Vajda wrote:
> Just last week, a PyLucene user got it to work on Solaris. I have no access to a Solaris machine to validate this. If I had my choice of platform, I'd pick one of (in order of preference):
>
> - Mac OS X (Intel or PPC)
> - a recent Red Hat Linux, since this is the one most gcj developers use
> - Ubuntu 6.06

The Apache machine where we run nightly builds runs Solaris. My first platform of choice would be Ubuntu.

> Unless junit can be made to run compiled under gcj, I see some more work on the unit tests side. This could be interesting too...

A search for "gcj junit" finds:
http://www.mail-archive.com/user@ant.apache.org/msg19104.html

> Yes, I filed bug 53 almost two years ago, it's not gone very far :(
> https://javacc.dev.java.net/issues/show_bug.cgi?id=53

Probably this would get fixed more quickly if someone contributed a patch to JavaCC. Even if it were not committed, we could build our own version of JavaCC. Any intrepid volunteers?

Doug
Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))
On Jul 11, 2006, at 12:17 AM, Daniel John Debrunner wrote:
> Doug Cutting wrote:
>> Since GCJ is effectively available on all platforms, we could say that we will start accepting 1.5 features when a GCJ release supports those features. Does that seem reasonable?
>
> Seems potentially a little strange to me. Does this mean Lucene would be limited to the set of 1.5 features actually implemented by GCJ? So if there is a 1.5 feature that is not supported by GCJ (while others are) it cannot be used?
>
> Seems more natural to support the complete 1.5 as defined by Sun/Java, not the subset implemented by one open source compiler.

Eclipse has a built-in compiler called ecj and it can compile Java 1.6 code today. However, unless classes are provided at runtime for linking, one will get build errors.

The same is true with gcj. It still does not fully support the Java 1.4 classes (almost there...), though it supports all the language features. However, on Fedora, Eclipse is built with ecj, and to me this demonstrates that it is close enough for most use cases.

Gcj will have support for the language features before it supports all the new classes. In terms of Lucene, I believe that the most important classes that are wanted are the concurrency ones. (At least that is how I have read the posts here.)

I think the measure of readiness is not that it compiles today with gcj, but that the Java 1.5 classes and features that are likely to be used by Lucene are implemented and pass all Lucene tests.
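
The distinction drawn above between 1.5 language features and 1.5 library classes can be illustrated with a small sketch. The class and method names (`Java5Sample`, `countTerms`) are hypothetical and are not taken from Lucene; the only point is that a compiler needs the first kind of support to accept the source, while the runtime class library needs the second kind to link and run it.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical illustration only; not Lucene code.
class Java5Sample {
    // Counts how often each term occurs in the given list.
    static ConcurrentMap<String, Integer> countTerms(List<String> terms) {
        // Library classes added in 1.5: java.util.concurrent.*
        ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<String, Integer>();
        // Language features added in 1.5: generics, enhanced for loop, autoboxing.
        for (String term : terms) {
            Integer old = counts.get(term);
            counts.put(term, old == null ? 1 : old + 1);
        }
        return counts;
    }
}
```

A compiler with 1.5 language support but an incomplete class library would accept the loop and the generic types, yet still fail on the `java.util.concurrent` imports.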
Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))
On Jul 11, 2006, at 3:51 AM, Doug Cutting wrote:
> Andi Vajda wrote:
>> I'd be interested in doing this but what is it that we're after in 'supporting gcj' actually?
>
> I think it would be sufficient to:
>
> 1. Compile only .jar and .class with gcj (not .java).
> 2. Pass all unit tests on a single platform.
>
> This would provide an existence proof that Lucene can run under GCJ, and doesn't require solving GCJ's porting issues.

For me the platform of choice would be Mac OS X, since 10.3 will never have Java 5. (IIRC, 10.4 has only been out for about a year.) Most of the other platforms will.
Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))
On Tue, 11 Jul 2006, Doug Cutting wrote:
> Probably this would get fixed more quickly if someone contributed a patch to JavaCC. Even if it were not committed, we could build our own version of JavaCC. Any intrepid volunteers?

For patches that seem too kludgy to make it into Lucene's sources (for example, to work around the lack of proper exception support under Windows gcj in the query parser), a compromise could be to keep these patches in a separate file and apply them to the Lucene sources before building them with gcj. This is how PyLucene is built today.

Some patches have already been incorporated into the Lucene sources (for example, in Searcher.java, to work around gcj bug 15411).

Of course, the long-term goal should be to no longer have any patches at all. I've been working on PyLucene for about two and a half years now and the number of patches has remained fairly stable.

A nice side effect of trying to support gcj with Java Lucene by including it in the Lucene test framework could be that the gcj developers might be more inclined to take a look at gcj-related issues that are thus made much easier to reproduce.

Andi..
Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))
On Tue, 11 Jul 2006, DM Smith wrote:
> Eclipse has a built-in compiler called ecj and it can compile Java 1.6 code today. However, unless classes are provided at runtime for linking, one will get build errors.

It looks like ecj is going to replace the gcj java front-end compiler, thereby making the 1.5 language features available to gcj. In the meantime, the classpath project is working towards adding support for all JRE classes. I'm quite optimistic that we should see a 1.5 capable gcj this year.

This isn't saying much, however, about which platforms, besides Red Hat Linux, this gcj would be producing stable executables for. For example, gcj on Windows is very far behind and is getting very little development time these days.

Andi..
Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))
On Tue, 11 Jul 2006, robert engels wrote:
> It's been years and GCJ still doesn't have anywhere near full 1.4 classpath libraries. So now if we want to write code for Lucene we have to know what libraries are available for GCJ? GCJ is a joke.

It looks like classpath is quite close to 100% 1.4 JRE support:
http://www.kaffe.org/~stuart/japi/htmlout/h-jdk14-classpath.html

Of course, earlier gcj versions, such as 3.4.x, come with a libgcj based on an earlier version of classpath with bigger holes (regex support, for example). Things are moving in the right direction, however...

Andi..
Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))
Andi Vajda wrote:
> On Sat, 8 Jul 2006, Doug Cutting wrote:
>> Since GCJ is effectively available on all platforms, we could say that we will start accepting 1.5 features when a GCJ release supports those features. Does that seem reasonable?
>
> +1

If we use this criterion, then we should probably officially support GCJ. Ideally we should run nightly unit tests with GCJ. Andi, would you be interested in helping to set this up?

Our unit test scripts are at:
https://svn.apache.org/repos/asf/lucene/java/nightly/

These are run on lucene.zones.apache.org, a Solaris box. If you (or someone else) are willing, then I can make you an account on this machine and you can alter the nightly build process to include testing against the most recent GCJ release.

Doug
Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))
Andi Vajda wrote:
> On Mon, 10 Jul 2006, Doug Cutting wrote:
>> Andi Vajda wrote:
>>> On Sat, 8 Jul 2006, Doug Cutting wrote:
>>>> Since GCJ is effectively available on all platforms, we could say that we will start accepting 1.5 features when a GCJ release supports those features. Does that seem reasonable?
>>>
>>> +1
>>
>> If we use this criterion, then we should probably officially support GCJ. Ideally we should run nightly unit tests with GCJ. Andi, would you be interested in helping to set this up?

This is interesting to me. Is the nightly build environment difficult to replicate?

> I'd be interested in doing this but what is it that we're after in 'supporting gcj' actually?

There is some advantage in using gcj as a measure of usability in the context of a free (as in beer) java, such that for a given target platform, one can deliver executables and shared libraries without requiring virtual machine runtimes. The second advantage is to give a simple method to nightly test contributions using new features. The third advantage seems to be a reduction in computational load on servers running native code.

> - running a fully compiled program linked against a lucene.so? If so, which platforms? The gcj story is very different on each and every platform, including different linuxes, and gcj is not well supported on some platforms at all.

This seems to be the case, since on an updated fedora core 5 with gcj (GCC) 4.1.1 20060525 (Red Hat 4.1.1-1), the Makefile modifications required are trivial.

> - running java bytecode with the gcj VM (gij, I believe)? If the .java code needs to be compiled with gcj then a number of patches still need to be applied against the Java lucene sources.
>
> PyLucene is built by compiling .java -> .jar using a regular JDK (Apple's or Blackdown) and using gcj to compile from .jar -> .so, thereby working around all the gcj java front-end bugs.
>
> Even when only compiling .jar -> .so with gcj, a number of patches still need to be applied:
> http://svn.osafoundation.org/pylucene/trunk/patches.lucene

The last time I checked for src/gcj/Makefile (revision 420696), all that was required was to fix the name of the lucene archive file to match what is actually generated, e.g., $(BUILD)/lucene-core-[0-9].*.jar and add the FieldCache* to the names to skip . . .

Not having contributed to lucene yet, is it required to generate a 'patch' to add to jira, or is the following output from a simple `svn diff` sufficient for experimentation?

Index: src/gcj/Makefile
===================================================================
--- src/gcj/Makefile (revision 420696)
+++ src/gcj/Makefile (working copy)
@@ -8,7 +8,7 @@
 CORE=$(BUILD)/classes/java
 SRC=.
-CORE_OBJ:=$(subst .jar,.a,$(wildcard $(BUILD)/lucene-[0-9]*.jar))
+CORE_OBJ:=$(subst .jar,.a,$(wildcard $(BUILD)/lucene-core-[0-9]*.jar))
 CORE_JAVA:=$(shell find $(ROOT)/src/java -name '*.java')
 CORE_HEADERS=\
@@ -55,7 +55,7 @@
 # yet accept from .class files.
 # NOTE: Change when http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15501 is fixed.
 $(CORE_OBJ) : $(CORE_JAVA)
-	$(GCJ) $(GCJFLAGS) -c -I $(CORE) -o $@ `find $(ROOT)/src/java -name '*.java' -not -name '*Sort*' -not -name 'Span*'` `find $(CORE) -name '*.class' -name '*Sort*' -or -name 'Span*'`
+	$(GCJ) $(GCJFLAGS) -c -I $(CORE) -o $@ `find $(ROOT)/src/java -name '*.java' -not -name '*Sort*' -not -name 'Span*' -not -name 'FieldCache*'` `find $(CORE) -name '*.class' -name '*Sort*' -or -name 'Span*' -or -name 'FieldCache*'`
 # generate object code from jar files using gcj
 %.a : %.jar

more, l8r, v

--
The future is here. It's just not evenly distributed yet.
-- William Gibson, quoted by Whitfield Diffie
Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))
Doug Cutting wrote:
> Since GCJ is effectively available on all platforms, we could say that we will start accepting 1.5 features when a GCJ release supports those features. Does that seem reasonable?

Seems potentially a little strange to me. Does this mean Lucene would be limited to the set of 1.5 features actually implemented by GCJ? So if there is a 1.5 feature that is not supported by GCJ (while others are) it cannot be used?

Seems more natural to support the complete 1.5 as defined by Sun/Java, not the subset implemented by one open source compiler.

Dan.
Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))
Agreed.

I think those that are reliant on GCJ should plan on expending the effort to do whatever backporting is needed to make Lucene work on it. It should also be a GCJ branch or version.

Seems silly to support 1.5 and not do it this way.

On Jul 10, 2006, at 11:17 PM, Daniel John Debrunner wrote:
> Doug Cutting wrote:
>> Since GCJ is effectively available on all platforms, we could say that we will start accepting 1.5 features when a GCJ release supports those features. Does that seem reasonable?
>
> Seems potentially a little strange to me. Does this mean Lucene would be limited to the set of 1.5 features actually implemented by GCJ? So if there is a 1.5 feature that is not supported by GCJ (while others are) it cannot be used?
>
> Seems more natural to support the complete 1.5 as defined by Sun/Java, not the subset implemented by one open source compiler.
>
> Dan.
Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))
robert engels wrote:
> Seems silly to support 1.5 and not do it this way.

Sometimes a little silliness is some serious fun! Just give me a rubber nose, since I am just clowning around trying to build Andi's kewly contrib/db using gcj on the slightly stylish db-4.4.20 and je-3.0.12 . . .

> On Jul 10, 2006, at 11:17 PM, Daniel John Debrunner wrote:
>> Doug Cutting wrote:
>>> Since GCJ is effectively available on all platforms, we could say that we will start accepting 1.5 features when a GCJ release supports those features. Does that seem reasonable?
>>
>> Seems potentially a little strange to me. Does this mean Lucene would be limited to the set of 1.5 features actually implemented by GCJ? So if there is a 1.5 feature that is not supported by GCJ (while others are) it cannot be used?
>>
>> Seems more natural to support the complete 1.5 as defined by Sun/Java, not the subset implemented by one open source compiler.

Do you have a different favorite open source java compiler for 1.5?

more, l8r, v

--
The future is here. It's just not evenly distributed yet.
-- William Gibson, quoted by Whitfield Diffie
Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))
Vic Bancroft wrote:
>> On Jul 10, 2006, at 11:17 PM, Daniel John Debrunner wrote:
>>> Doug Cutting wrote:
>>>> Since GCJ is effectively available on all platforms, we could say that we will start accepting 1.5 features when a GCJ release supports those features. Does that seem reasonable?
>>>
>>> Seems potentially a little strange to me. Does this mean Lucene would be limited to the set of 1.5 features actually implemented by GCJ? So if there is a 1.5 feature that is not supported by GCJ (while others are) it cannot be used?
>>>
>>> Seems more natural to support the complete 1.5 as defined by Sun/Java, not the subset implemented by one open source compiler.
>
> Do you have a different favorite open source java compiler for 1.5?

No, I just think the platform for Lucene (or any Java project) should be defined by the spec (JDK 1.4, 1.5 or 1.6), not a single (possibly partial) implementation of the spec.

Dan.
Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))
Chuck Williams wrote:
> I doubt any single contribution will change anyone's mind. I would like to have clarity on the 1.5 decision before deciding whether or not to contribute this and other things. My ParallelWriter contribution, which also requires 1.5, is already sitting in jira.

Sitting in Jira is better than not sitting in Jira, no?

> I only work in 1.5 and use its features extensively. I don't think about 1.4 at all, and so have no idea how heavily dependent the code in question is on 1.5.
>
> Unfortunately, I won't be able to contribute anything substantial to Lucene so long as it has a 1.4 requirement.

The 1.5 decision requires a consensus. You're making ultimatums, which does not help to build consensus. By stating an inflexible position you've become a fact that informs the process.

I think we should try to minimize the number of inconvenienced people. Both developers and users are people. Some developers are happy to continue in 1.4, adding new features that users who are confined to 1.4 JVMs will be able to use. Other developers will only contribute 1.5 code, perhaps (unless we find a technical workaround) excluding users confined to 1.4 JVMs. But it is difficult to compare the inconvenience of a developer who refuses to code back-compatibly to a user who is deprived of new features.

Since GCJ is effectively available on all platforms, we could say that we will start accepting 1.5 features when a GCJ release supports those features. Does that seem reasonable?

Doug
Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))
On Sat, 8 Jul 2006, Doug Cutting wrote:
> Since GCJ is effectively available on all platforms, we could say that we will start accepting 1.5 features when a GCJ release supports those features. Does that seem reasonable?

+1

Andi..
Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))
On Jul 8, 2006, at 12:41 PM, Doug Cutting wrote:
> Since GCJ is effectively available on all platforms, we could say that we will start accepting 1.5 features when a GCJ release supports those features. Does that seem reasonable?

I have been doing a bit of reading on GCJ compatibility. I think it is going to come in 2 parts:

1) It supports all the new language features of Java 1.5.
2) It has an implementation of all the new classes and methods that Lucene uses.

For me the test is that it is released for Mac OS X.

With these three things, I'd be happy :)

DM Smith, stick in the mud :)
Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))
Doug Cutting wrote on 07/08/2006 09:41 AM:
> Chuck Williams wrote:
>> I only work in 1.5 and use its features extensively. I don't think about 1.4 at all, and so have no idea how heavily dependent the code in question is on 1.5.
>>
>> Unfortunately, I won't be able to contribute anything substantial to Lucene so long as it has a 1.4 requirement.
>
> The 1.5 decision requires a consensus. You're making ultimatums, which does not help to build consensus. By stating an inflexible position you've become a fact that informs the process.

My statement was not intended as an ultimatum at all. Rather, it is simply a fact. I prefer to contribute to Lucene, but my workload simply does not allow time to be spent on backporting.

> I think we should try to minimize the number of inconvenienced people. Both developers and users are people. Some developers are happy to continue in 1.4, adding new features that users who are confined to 1.4 JVMs will be able to use. Other developers will only contribute 1.5 code, perhaps (unless we find a technical workaround) excluding users confined to 1.4 JVMs. But it is difficult to compare the inconvenience of a developer who refuses to code back-compatibly to a user who is deprived of new features.

Doug, respectfully, this issue is inflammatory in its nature. I've found a couple of your comments to be inflammatory, although I suspect you did not intend them that way. Specifically, the term "refuses" above and your prior comment about considering use of your veto power if the committers were to vote to move to 1.5.

I'm not refusing to do anything. I am overwhelmed in a crunch for the next several months and am simply informing the community that I have code that others may find valuable and that might be contributed, but that it requires 1.5 and that I cannot backport it. I cannot unilaterally decide to contribute the code; I need the agreement of the company I'm working for. They are only interested in the contribution if there is interest in having it in the core.

These are simply facts. I suspect I'm not the only person in this kind of situation.

> Since GCJ is effectively available on all platforms, we could say that we will start accepting 1.5 features when a GCJ release supports those features. Does that seem reasonable?

Seems like a reasonable compromise to me. If I had a vote on this it would be +1.

Chuck
Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))
On Jul 8, 2006, at 12:56 PM, Chuck Williams wrote:
> I prefer to contribute to Lucene, but my workload simply does not allow time to be spent on backporting.

I'll stand by my offer to do the backporting when it is possible and does not do violence to the implementation. I'd prefer to wait until the patch that is in Jira is ready to be applied. At that point post the request here and I'll see if it is doable.
Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))
Hi Chuck,

I think bulk update would be good (although I'm not sure how it would be different from batching deletes and adds, but I'm sure there is a difference, or else you wouldn't have done it).

Java 1.5 - no conclusion, but personally I felt:

- no strong arguments for 1.4, only a few people argued for it
- very little interest from 1.4 adversaries in helping with backporting to 1.4 or updating the build system to do the retro thing with 1.5 code

So I think you should contribute your code. This will give us a real example of having something possibly valuable, and written with 1.5 features, so we can finalize the 1.4 vs. 1.5 discussion, probably with a vote on lucene-dev.

Otis

- Original Message
From: Chuck Williams [EMAIL PROTECTED]
To: java-dev@lucene.apache.org
Sent: Thursday, July 6, 2006 5:07:41 PM
Subject: Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

robert engels wrote on 07/06/2006 12:24 PM:
> I guess we just chose a much simpler way to do this...
>
> Even with your code changes, to see the modification made using the IndexWriter, it must be closed, and a new IndexReader opened.
>
> So a far simpler way is to get the collection of updates first, then
>
> using opened indexreader,
> for each doc in collection
>   delete document using key
> endfor
> open indexwriter
> for each doc in collection
>   add document
> endfor
> open indexreader
>
> I don't see how your way is any faster. You must always flush to disk and open the indexreader to see the changes.

Bulk updates however require yet another approach. Sorry to change topics here, but I'm wondering if there was a final decision on the question of java 1.5 in the core. If I submitted a bulk update capability that required java 1.5, would it be eligible for inclusion in the core or not?

Chuck
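
The delete-then-add batching that Robert describes in the quoted pseudocode can be sketched roughly as follows. This is only an illustration under stated assumptions: `ToyIndex` and `BatchUpdate` are hypothetical in-memory stand-ins, not the Lucene `IndexReader`/`IndexWriter` API, and the `opens` counter exists only to show that the whole batch costs a fixed number of reader/writer opens regardless of batch size.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an index; this is NOT the Lucene API.
class ToyIndex {
    // Live documents, keyed by a unique key field.
    private final Map<String, String> docs = new LinkedHashMap<String, String>();
    // Number of times a reader or writer was (re)opened.
    int opens = 0;

    void openReader() { opens++; }
    void openWriter() { opens++; }
    void deleteByKey(String key) { docs.remove(key); }
    void add(String key, String body) { docs.put(key, body); }
    String get(String key) { return docs.get(key); }
    int size() { return docs.size(); }
}

class BatchUpdate {
    // Applies a batch of updates as one delete pass followed by one add pass.
    static void apply(ToyIndex index, Map<String, String> updates) {
        index.openReader();                        // using opened indexreader
        for (String key : updates.keySet()) {
            index.deleteByKey(key);                // delete document using key
        }
        index.openWriter();                        // open indexwriter
        for (Map.Entry<String, String> e : updates.entrySet()) {
            index.add(e.getKey(), e.getValue());   // add document
        }
        index.openReader();                        // reopen to see the changes
    }
}
```

In this sketch an update to a document is a full replacement: the old version is removed in the first pass and the new version is added in the second.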
Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))
Otis,

First let me say, I don't want to rehash the arguments for or against Java 1.5. We can all go back and read the last two major threads on the issue. I don't think there is anything new to say.

However, I think statements like:

- "no strong arguments" (I think the arguments were reasonable)
- "only a few people argued for it" (Only a few argued against it)
- "very little interest" (Very few votes are on any Jira issue, so what does that say)
- "adversaries" (I am not an adversary, I am a very interested party with a personal interest in the outcome)

are inflammatory.

I am willing to do the back port if it is possible and if it does not do violence to the implementation. There are a number of patches sitting in Jira and it is not clear to me which are even close to being applied. I am not interested in doing work on patches that are old or might sit around for a while until they are applied (and therefore become out of sync). If the patches are identified as being worthy of being applied and are also identified as being Java 1.5, I will port them and their tests if it makes sense.

It has already been granted that contrib allows Java 1.5. So I presume that the build has been updated to allow for 1.5 in contrib and not in core. If this is not the case, I think that the first committer (or submitter) of Java 1.5 code to contrib has the responsibility to change the build system (or at least ensure that it is done).

As to the build system, I am not the right person to see that it works. I am using Eclipse to do the builds. I maintain 2 workspaces: one with core only, which is Java 1.4.2, and the other with core and contrib, which is Java 1.5. I have done this so I can help back port to Java 1.4.

However, I think you have identified that the core people need to make a decision and the rest of us need to go with it. So, I suggest that Doug convene such a meeting of the minds and communicate the decision to the rest of us.

DM

On Jul 7, 2006, at 1:17 PM, Otis Gospodnetic wrote:
> Hi Chuck,
>
> I think bulk update would be good (although I'm not sure how it would be different from batching deletes and adds, but I'm sure there is a difference, or else you wouldn't have done it).
>
> Java 1.5 - no conclusion, but personally I felt:
>
> - no strong arguments for 1.4, only a few people argued for it
> - very little interest from 1.4 adversaries in helping with backporting to 1.4 or updating the build system to do the retro thing with 1.5 code
>
> So I think you should contribute your code. This will give us a real example of having something possibly valuable, and written with 1.5 features, so we can finalize the 1.4 vs. 1.5 discussion, probably with a vote on lucene-dev.
>
> Otis
>
> - Original Message
> From: Chuck Williams [EMAIL PROTECTED]
> To: java-dev@lucene.apache.org
> Sent: Thursday, July 6, 2006 5:07:41 PM
> Subject: Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)
>
> robert engels wrote on 07/06/2006 12:24 PM:
>> I guess we just chose a much simpler way to do this...
>>
>> Even with your code changes, to see the modification made using the IndexWriter, it must be closed, and a new IndexReader opened.
>>
>> So a far simpler way is to get the collection of updates first, then
>>
>> using opened indexreader,
>> for each doc in collection
>>   delete document using key
>> endfor
>> open indexwriter
>> for each doc in collection
>>   add document
>> endfor
>> open indexreader
>>
>> I don't see how your way is any faster. You must always flush to disk and open the indexreader to see the changes.
>
> Bulk updates however require yet another approach. Sorry to change topics here, but I'm wondering if there was a final decision on the question of java 1.5 in the core. If I submitted a bulk update capability that required java 1.5, would it be eligible for inclusion in the core or not?
>
> Chuck
Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))
DM Smith wrote on 07/07/2006 07:07 PM:
> Otis,
>
> First let me say, I don't want to rehash the arguments for or against Java 1.5.

This is an emotional issue for people on both sides.

> However, I think you have identified that the core people need to make a decision and the rest of us need to go with it.

It would be most helpful to have clarity on this issue.

> On Jul 7, 2006, at 1:17 PM, Otis Gospodnetic wrote:
>> Hi Chuck,
>>
>> I think bulk update would be good (although I'm not sure how it would be different from batching deletes and adds, but I'm sure there is a difference, or else you wouldn't have done it).

Bulk update works by rewriting all segments that contain a document to be modified in a single linear pass. This is orders of magnitude faster than delete/add if the set of documents to be updated is large, especially if only a few small fields are mutable on Documents that have many possibly large immutable fields. E.g., on a somewhat slow development machine I updated several fields on 1,000,000 large documents in 43 seconds.

There is an existing patch in jira that takes this same approach (LUCENE-382). However, the limitations in that patch are substantial: only optimized indexes, stored fields are not updated, updates are independent of the existing field value, etc. These limitations make that implementation unsuitable for many use cases. My implementation eliminates all of those limitations, providing a fast, flexible solution for applying an arbitrary value transformation to selected documents and fields in the index (doc.field.new_value = f(doc, field.old_value, doc.other_field_values) for arbitrary f).

It also works with ParallelReader (and the ParallelWriter I've already contributed). This allows the mutable fields to be segregated into a separate subindex. Only that subindex need be updated. This alone is an enormous advantage over a large number of delete/adds, where the same optimization is not possible due to the doc-id synchronization requirements of ParallelReader.

There is a substantial amount of code required to do this, and it is completely dependent on the index representation. To simplify merge issues with ongoing Lucene changes, I had to copy and edit certain private methods out of the existing index code (and make extensive use of the package-only APIs).

Beyond the normal benefits of open sourcing code, my interest in contributing this is to see the index code refactored to take bulk update into account. This is increased by the current focus on a new flexible index representation. I would like to see bulk update as one of the operations supported in the new representation.

>> So I think you should contribute your code. This will give us a real example of having something possibly valuable, and written with 1.5 features, so we can finalize the 1.4 vs. 1.5 discussion, probably with a vote on lucene-dev.

I doubt any single contribution will change anyone's mind. I would like to have clarity on the 1.5 decision before deciding whether or not to contribute this and other things. My ParallelWriter contribution, which also requires 1.5, is already sitting in jira.

I only work in 1.5 and use its features extensively. I don't think about 1.4 at all, and so have no idea how heavily dependent the code in question is on 1.5.

Unfortunately, I won't be able to contribute anything substantial to Lucene so long as it has a 1.4 requirement.

Chuck
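
The shape of the bulk update described above (rewrite each affected segment once, applying new_value = f(doc, old_value, other fields) to selected documents) can be sketched with toy data structures. This is not Chuck's implementation and not Lucene's index format: segments here are plain lists of field maps, and `BulkUpdateSketch`, `Transform`, and the `"id"` field are all hypothetical names chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model: a segment is a list of documents, a document is a field map.
class BulkUpdateSketch {
    // Hypothetical transformation f(doc, field, oldValue) -> newValue.
    interface Transform {
        String apply(Map<String, String> doc, String field, String oldValue);
    }

    // Rewrites every segment that contains a selected document in one linear
    // pass; segments without selected documents are left untouched.
    // Returns the number of segments that were rewritten.
    static int bulkUpdate(List<List<Map<String, String>>> segments,
                          Set<String> selectedIds, String field, Transform f) {
        int rewritten = 0;
        for (int s = 0; s < segments.size(); s++) {
            List<Map<String, String>> segment = segments.get(s);
            boolean touched = false;
            for (Map<String, String> doc : segment) {
                if (selectedIds.contains(doc.get("id"))) { touched = true; break; }
            }
            if (!touched) continue;  // reuse the existing segment as-is

            // Copy the segment, transforming only the selected documents.
            List<Map<String, String>> copy = new ArrayList<Map<String, String>>();
            for (Map<String, String> doc : segment) {
                Map<String, String> out = new HashMap<String, String>(doc);
                if (selectedIds.contains(doc.get("id"))) {
                    out.put(field, f.apply(doc, field, doc.get(field)));
                }
                copy.add(out);
            }
            segments.set(s, copy);
            rewritten++;
        }
        return rewritten;
    }
}
```

Because the transformation receives the whole document, the new value may depend on the old value and on other fields, which is the flexibility the message contrasts with LUCENE-382.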