The tests would cover backwards compatibility with previous versions of
Lucene, following the process described: serialized objects encoded by
earlier releases are checked into the test code base, much as CFS index
files from older versions are already included in the test tree.
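
Something like the following sketch is the shape I have in mind; the
resource name query.v24.ser and the test class are hypothetical, the
point being to deserialize bytes written by an older release and
compare them against a freshly constructed object:

    import java.io.InputStream;
    import java.io.ObjectInputStream;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.TermQuery;

    // Hypothetical back-compat test.  "query.v24.ser" would be written
    // by an older release's ObjectOutputStream and checked into the
    // test tree, analogous to the checked-in CFS files.
    public class TestSerializedBackCompat {
      public void testTermQuery() throws Exception {
        InputStream in = getClass().getResourceAsStream("query.v24.ser");
        ObjectInputStream ois = new ObjectInputStream(in);
        TermQuery old = (TermQuery) ois.readObject();
        ois.close();
        TermQuery current = new TermQuery(new Term("field", "value"));
        if (!current.equals(old))
          throw new AssertionError("2.4 serialized form no longer matches");
      }
    }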

There is an elegance to the RemoteSearcher style of code: it lets one
focus on queries and algorithms while ignoring the fact that the search
runs over N machines.
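
For reference, the client side of that pattern is tiny (hostnames and
RMI binding names below are made up):

    import java.rmi.Naming;
    import org.apache.lucene.search.MultiSearcher;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.search.Searcher;

    // Sketch: N remote indexes presented as one logical Searcher.
    // Each host runs an RMI-exported RemoteSearchable.
    public class GridClient {
      public static void main(String[] args) throws Exception {
        String[] nodes = { "//hostA/searchable", "//hostB/searchable" };
        Searchable[] shards = new Searchable[nodes.length];
        for (int i = 0; i < nodes.length; i++)
          shards[i] = (Searchable) Naming.lookup(nodes[i]);
        Searcher searcher = new MultiSearcher(shards);
        // From here on, queries and algorithms are written exactly as
        // they would be against a single local index.
      }
    }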

Protocol buffers seem okay.  However, given the way Lucene allows
customization through things like SortComparatorSource, I do not see
how protocol buffers can carry custom Java classes the way Java
serialization does.  If Lucene 3.0 allows even greater customization,
such as user-defined scorers, similarities, and queries, then marrying
data with code in a grid environment using protocol buffers gets ugly.
Protocol buffers are nice and could be added to a distributed Lucene
environment, but the cost of implementing them is much higher than the
cost of Serialization.
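
To make the contrast concrete, here is a sketch of what serialization
handles for free (MyTermQuery is a made-up user subclass; any
Serializable class behaves the same way):

    import java.io.ByteArrayOutputStream;
    import java.io.ObjectOutputStream;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.TermQuery;

    // Hypothetical user-defined query class, unknown to Lucene itself.
    class MyTermQuery extends TermQuery {
      MyTermQuery(Term t) { super(t); setBoost(2.0f); }
    }

    public class SerializeCustom {
      public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        // Serialization records the concrete class name, so the remote
        // end reconstructs MyTermQuery with no schema declared ahead of
        // time.  A .proto file would have to enumerate every message
        // type up front, which arbitrary user subclasses defeat.
        out.writeObject(new MyTermQuery(new Term("body", "lucene")));
        out.close();
      }
    }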

Uber-distributed search may not be the most common use case for Lucene
right now, but as its capabilities improve people will try to use
Lucene in a distributed grid environment.  One could conceivably
execute arbitrarily complex coordinated operations over the standard
Lucene 3.0 APIs without tearing down processes and other worries.
Oracle has PL/SQL; Lucene effectively uses Java for customized query
operations the way Oracle uses PL/SQL.  It would seem natural to at
least support Java as a way to execute customized queries, with the
customized queries shipped as dynamically loaded Java objects.
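
A rough sketch of what dynamically loaded query objects could look like
on a search node (the jar URL and class name are assumptions for
illustration; RMI's codebase mechanism does roughly this during
deserialization):

    import java.net.URL;
    import java.net.URLClassLoader;
    import org.apache.lucene.search.Query;

    // Sketch: a node fetches user-supplied query code and instantiates
    // it, then runs it against the local index like a built-in query.
    public class LoadRemoteQuery {
      public static void main(String[] args) throws Exception {
        URL jar = new URL("http://client-host/jobs/custom-query.jar");
        ClassLoader loader = new URLClassLoader(new URL[] { jar });
        Class<?> clazz = loader.loadClass("com.example.MyCustomQuery");
        Query q = (Query) clazz.newInstance();
        // q can now be passed to the local IndexSearcher.
      }
    }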

In the marketplace, Lucene looks like a good platform for realtime
search-based data processing, at least compared to Sphinx and MG4J.

A little further into the future, with SSDs it should be possible to
perform in-place replacement of inverted index data in Lucene (at which
point it starts to resemble a database), and the ability to execute
remote code may be very useful.  Hopefully the 3.0 APIs will have a
goal of being open enough for this.


On Fri, Dec 5, 2008 at 2:40 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:

> Jason Rutherglen wrote:
>
>> I think it's best to implement Externalizable as long as someone is
>> willing to maintain it.  I commit to maintaining the Externalizable code.
>>
>
> We need to agree to maintain things as a community, not as individuals.  We
> can't rely on any particular individual being around in the future.
>
>> This will ensure forward compatibility between serialized versions, make
>> the serialized objects smaller, and make serialization faster.
>>
>
> If we want to promise compatibility we need to scope it and test it.  We
> cannot in good faith promise that Query will be serially compatible forever,
> nor should we make any promises that we don't test.  So if you choose to
> continue promoting this route, please specify the scope of compatibility and
> your plans to add tests for it.
>
>> Apparently it matters enough for Hadoop to implement Writable in all of
>> its over-the-wire classes.
>>
>
> I'm not sure what you're saying here.  As I've said before, Hadoop is
> moving away from Writable because it is too fragile as classes change. As a
> part of the preparations for Hadoop 1.0 we are agreeing on serialization
> back-compatibility requirements and what technology we will use to support
> these.  Hadoop is at its core a distributed system, while Lucene is not.
>  Even then, Hadoop will continue to require that one update all nodes in a
> cluster in a coordinated manner, so only end-user protocols need be
> cross-version compatible, not internal protocols.  I do not yet see a strong
> analogy here.
>
>
> Doug
>
