The tests will be for backwards compatibility with previous versions of Lucene, using the described process of checking serialized objects encoded by previous versions into the test code base, similar to how the old-format CFS index files are included in the test tree. A sketch of such a test is below.
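A minimal sketch of what such a test could look like, assuming JUnit 3 style and a serialized TermQuery written by an earlier release and checked into the test tree (the file path, class name, and stored object are hypothetical):

import java.io.FileInputStream;
import java.io.ObjectInputStream;

import junit.framework.TestCase;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class TestSerializationBackCompat extends TestCase {

  public void testDeserializeOldTermQuery() throws Exception {
    // Bytes written by an earlier release's ObjectOutputStream, checked in
    // alongside the old-format CFS index files (path is hypothetical).
    ObjectInputStream in = new ObjectInputStream(
        new FileInputStream("src/test/serialized/termQuery.24.bin"));
    try {
      Query old = (Query) in.readObject();
      // The deserialized query must equal one built with the current API.
      assertEquals(new TermQuery(new Term("field", "value")), old);
    } finally {
      in.close();
    }
  }
}

Each release that changes a Serializable class would add a new checked-in file, so the suite grows the same way the index back-compat tests do.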
There is an elegance to the RemoteSearcher type of code that lets one focus on queries and algorithms and ignore the fact that the search is running over N machines. Protocol buffers seem okay. However, given the way Lucene allows customizations in things like SortComparatorSource, I do not see how protocol buffers can be used with custom Java classes the way Java serialization works (see the sketch after the quoted thread below). If Lucene 3.0 allows greater customization, such as with scorers, similarities, and queries, then marrying data with code in a grid environment using protocol buffers gets ugly. Protocol buffers are nice and could be added to a distributed Lucene environment, but the cost of implementing them vs. Java serialization is much higher.

Uber-distributed search may not be the most common use case for Lucene right now, but as its capabilities improve, people will try to use Lucene in distributed grid environments. One could conceivably execute arbitrarily complex coordinated operations over the standard Lucene 3.0 APIs without tearing down processes and other worries. Oracle has PL/SQL; Lucene effectively uses Java in the same role for customized query operations. It would seem natural to at least support Java as a way to execute customized queries, with the customized queries being dynamically loaded Java objects.

In the marketplace, Lucene seems to be a good platform for realtime search-based data processing, at least compared to Sphinx and MG4J. A little further into the future, with SSDs, it should be possible to perform in-place replacement of inverted index data using Lucene (at which point it resembles a database), and the ability to execute remote code may be very useful. Hopefully the 3.0 APIs will have a goal of being open enough for this.

On Fri, Dec 5, 2008 at 2:40 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:

> Jason Rutherglen wrote:
>
>> I think it's best to implement Externalizable as long as someone is
>> willing to maintain it. I commit to maintaining the Externalizable code.
>
> We need to agree to maintain things as a community, not as individuals.
> We can't rely on any particular individual being around in the future.
>
>> This will ensure forward compatibility between serialized versions, make
>> the serialized objects smaller, and make serialization faster.
>
> If we want to promise compatibility we need to scope it and test it. We
> cannot in good faith promise that Query will be serially compatible
> forever, nor should we make any promises that we don't test. So if you
> choose to continue promoting this route, please specify the scope of
> compatibility and your plans to add tests for it.
>
>> Apparently it matters enough for Hadoop to implement Writable in all
>> over-the-wire classes.
>
> I'm not sure what you're saying here. As I've said before, Hadoop is
> moving away from Writable because it is too fragile as classes change. As
> a part of the preparations for Hadoop 1.0 we are agreeing on serialization
> back-compatibility requirements and what technology we will use to support
> these. Hadoop is at its core a distributed system, while Lucene is not.
> Even then, Hadoop will continue to require that one update all nodes in a
> cluster in a coordinated manner, so only end-user protocols need be
> cross-version compatible, not internal protocols. I do not yet see a
> strong analogy here.
>
> Doug
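Back to the custom-class point above: with Java serialization, a user-defined SortComparatorSource rides along inside a serialized Sort with no extra schema work, which a .proto definition has no way to express. A minimal sketch against the Lucene 2.x API; the comparator class, field name, and main() scaffolding are illustrative only, not part of Lucene:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.ScoreDocComparator;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortComparatorSource;
import org.apache.lucene.search.SortField;

public class CustomSortSerializationDemo {

  // User-defined comparator source. Java serialization ships its class
  // identity over the wire automatically; a .proto schema has no way to
  // name an arbitrary Java class.
  static class MyComparatorSource implements SortComparatorSource, Serializable {
    public ScoreDocComparator newComparator(IndexReader reader, String field)
        throws IOException {
      // Never invoked in this demo; serialization does not call it.
      throw new UnsupportedOperationException("demo only");
    }
  }

  public static void main(String[] args) throws Exception {
    Sort sort = new Sort(new SortField("price", new MyComparatorSource()));

    // Round-trip the Sort, custom class and all, in a few lines.
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    ObjectOutputStream out = new ObjectOutputStream(bytes);
    out.writeObject(sort);
    out.close();

    ObjectInputStream in = new ObjectInputStream(
        new ByteArrayInputStream(bytes.toByteArray()));
    Sort copy = (Sort) in.readObject();
    in.close();
    System.out.println("round-tripped: " + copy);
  }
}

Note the catch: the receiving JVM must still be able to load MyComparatorSource, which is exactly the marrying-data-with-code issue above. Java serialization at least names and locates the class automatically; protocol buffers would need a separate code-shipping mechanism bolted on.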