1) That is my point. In this case, they are not copying the impl; they are requesting changes to the format.
I just think there are better ways of doing interoperability than file formats. In almost all cases where I've encountered (or built!) systems that did integration based on a known file format, it bit me in the ass in the end... and/or severely limited the ability of myself or others to change...

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 11, 2006 2:29 PM
To: java-dev@lucene.apache.org
Subject: Re: Taking a step back

I don't want to get into this (so I'm replying!?), but I just want to point out 2 things:

1) So far we've never had a situation where Java Lucene was held back because of interoperability. Ports tend to copy the implementation and adapt to Java Lucene.

2) Solr already does the HTTP server thing that you are describing, I believe.

Otis

----- Original Message ----
From: Robert Engels <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Thursday, May 11, 2006 1:37:17 PM
Subject: RE: Taking a step back

I disagree with that a bit. I have found that certain languages lend themselves far better to certain file formats (that is, if an operation is very efficient to perform in a particular language, using a file format that allows the usage of that operation directly will often lead to much better performance). This is often true with byte ordering on particular hardware platforms.

That is the whole reason this is an issue. Others can read the modified UTF, it is just not as efficient for them!

But more importantly, I don't think Lucene (or others) should be "held back" attempting to adhere to a standardized file format.

Take databases for example. Many available. All use different file formats, but all can be accessed with (pretty much) standardized SQL (using different drivers). I think Lucene could offer a similar approach at the API level, maybe an embedded TCP/IP interface / command processor (similar to an HTTP server).
You are always going to have interoperability issues (sometimes even when using Java, but rarely), so I say dump the burden on the others, and just make Lucene the best Java search engine possible.

Without starting some sort of flame war, I can't think of any advantages to not running a Java version of Lucene, but that is just my opinion. It would be fairly straightforward to convert all of Lucene to C, and provide a Java binding, but why???

-----Original Message-----
From: Marvin Humphrey [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 11, 2006 12:08 PM
To: java-dev@lucene.apache.org
Subject: Re: Taking a step back

On May 10, 2006, at 8:02 AM, Robert Engels wrote:

> The file format issue however is a non-issue. If you want interoperability
> between systems do it via remote invocation and IIOP, or some HTTP
> interface. This is far easier to control, especially through version
> change cycles - otherwise all platforms need to be updated together - which
> is very hard to do (unless you are using Java with WORA!).
>
> I also don't understand why Lucene doesn't focus on being THE JAVA search
> engine. Anything that I think detracts from that moving forward should be
> out of scope.

I really don't relish the prospect that this might degenerate into a language argument, but I think it falls to me to respond, since the patch I submitted on Monday opens up a lot of possibilities for interop.

I don't necessarily disagree. Abandoning all attempts at interop has its advantages. One unfortunate albeit unavoidable aspect of Lucene is that it is tightly bound to its file format. In a perfect world, the file reading/writing apparatus would be modular: the index would be read into memory using a plugin, manipulated, then saved using another plugin. That doesn't work, obviously, because indexes are commonly too large to be read into available RAM, and so the I/O stuff is scattered over the entire library, which makes maintaining compatibility laborious.
However, Lucene has to make some effort to track its file format definition, so that it may live up to the commitments for backwards-compatibility codified earlier in this thread. This is currently done using the File Formats document (though that document is incomplete and buggy). There's not much difference between supporting the files written by an earlier version of Lucene and supporting the files written by another implementation of Lucene which adhere to the same spec.

The only question is whether there are Java-specific optimizations which are so advantageous that they outweigh the benefits of interchange. There is no inherent advantage in using Modified UTF-8 over standard UTF-8, and the UTF-8 code I supplied actually speeds up Lucene by a couple percent because it simplifies some conditionals -- all of the performance hit comes from using a bytecount as the String prefix. I have good reasons to believe that this can go away, not the least of which is that I've actually written a working implementation in Perl/C which uses bytecounts and I know where all the bottlenecks are.

There are also advantages to keeping the file format public, both for Java Lucene and for the larger Apache Lucene project. Of course there's the raw usefulness of interchange. For instance, it might be nice to whip up a little script in Perl or Ruby which works with your existing rig -- especially if there's a CPAN module that offers functionality you need which isn't available yet in Java, or you'd benefit from a near-instantaneous startup time.

But more important, I'd argue, is that having all implementations share a common file format means that all the authors have an amplified interest in coordinating, communicating, and contributing. Just as learning new languages, programming or natural, broadens an individual's horizons, so does working out an implementation based on Lucene's data structures in another language lead to fresh thinking.
The more cross-pollination of ideas from various authors and, by proxy, their extended communities, the more all of the sub-projects gain and the faster Apache Lucene as a whole advances.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
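
Editorial illustration of the Modified UTF-8 versus standard UTF-8 point raised in Marvin's message: a minimal sketch, not code from Lucene or from the patch under discussion. It assumes "Modified UTF-8" means the encoding Java documents for DataOutput.writeUTF (U+0000 written as the two bytes C0 80, and characters outside the Basic Multilingual Plane written as two three-byte surrogate halves), and that the alternative to a bytecount String prefix is a count of UTF-16 code units. The names modified_utf8 and prefix_counts are hypothetical, and the real index format's integer encoding of the prefix is not modeled here.

```python
def modified_utf8(text: str) -> bytes:
    """Encode text as Java-style Modified UTF-8 (no length prefix)."""
    out = bytearray()
    # Walk the string as UTF-16 code units, the way Java sees its chars.
    utf16 = text.encode("utf-16-be")
    for i in range(0, len(utf16), 2):
        unit = (utf16[i] << 8) | utf16[i + 1]
        if 0x0001 <= unit <= 0x007F:
            # Plain ASCII except NUL: one byte.
            out.append(unit)
        elif unit <= 0x07FF:
            # Two bytes; this branch also covers U+0000 (C0 80).
            out.append(0xC0 | (unit >> 6))
            out.append(0x80 | (unit & 0x3F))
        else:
            # Three bytes; surrogate halves are encoded one by one.
            out.append(0xE0 | (unit >> 12))
            out.append(0x80 | ((unit >> 6) & 0x3F))
            out.append(0x80 | (unit & 0x3F))
    return bytes(out)


def prefix_counts(text: str) -> dict:
    """Return the two candidate String prefixes plus encoded sizes."""
    return {
        # Count of UTF-16 code units (what Java's String.length() reports).
        "char_count": len(text.encode("utf-16-be")) // 2,
        # Byte count of the standard UTF-8 encoding.
        "utf8_bytes": len(text.encode("utf-8")),
        # Byte count of the Modified UTF-8 encoding.
        "modified_utf8_bytes": len(modified_utf8(text)),
    }


if __name__ == "__main__":
    for sample in ["lucene", "h\u00e9llo", "\x00", "\U0001D11E"]:
        print(repr(sample), modified_utf8(sample).hex(), prefix_counts(sample))
```

For text in the Basic Multilingual Plane without NUL characters the two encodings produce identical bytes, so a reader in another language only hits the difference on U+0000 and on supplementary characters. The prefix is a separate question: with a char count the reader must decode to know where the string ends, while with a bytecount it can skip or copy the bytes directly.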