On Mar 13, 2007, at 2:38 AM, Michael Busch wrote:
> Global field semantics make our life with FI much easier in a
> single index. But even with global field semantics we would have
> the same problem with the IndexWriter.addIndexes() method, no? I'm
> curious about how you solved that conflict in KinoSearch?
I didn't.
The KinoSearch equivalent of IndexWriter.addIndexes() fails if either
you attempt to add an index created using a different subclass of
Schema, or if any mismatches are detected when comparing field name
=> spec pairings. No conflict resolution is attempted -- only
validation.
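A rough sketch of that validation-only approach, in Python rather than KinoSearch's Perl. Every name here (Schema, ArticleSchema, ProductSchema, validate_for_add, SchemaMismatch) is hypothetical and not the KinoSearch API, and checking only the field names that both schemas share is an assumption made for the illustration:

```python
class SchemaMismatch(Exception):
    """Raised when two indexes cannot be combined as-is."""


class Schema:
    """Hypothetical schema: a fixed mapping of field name => spec."""

    def __init__(self, fields):
        self.fields = dict(fields)


class ArticleSchema(Schema):
    pass


class ProductSchema(Schema):
    pass


def validate_for_add(target, incoming):
    """Validate only; never try to reconcile differences."""
    # Indexes built with a different Schema subclass are rejected outright.
    if type(target) is not type(incoming):
        raise SchemaMismatch(
            f"schema class differs: {type(target).__name__} "
            f"vs {type(incoming).__name__}"
        )
    # Assumption: only field names present in both schemas are compared.
    for name, spec in incoming.fields.items():
        if name in target.fields and target.fields[name] != spec:
            raise SchemaMismatch(
                f"field {name!r}: {target.fields[name]!r} != {spec!r}"
            )
```

The function either returns quietly or raises; there is no branch that picks a winner between two conflicting specs.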
By committing to resolving all field property conflicts, Lucene
creates two problems for itself.
First, there's the burden of writing, maintaining, and using the
conflict resolution code for each property. Sometimes this code is
problematic, as illustrated by a Michael McCandless post to java-user
from this morning:
> Note, however, that you must do this for all Field instances by that
> same field name because whenever Lucene merges segments, if even one
> Document did not disable norms then this will "spread" so that all
> documents keep their norms, for the same field name.
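The "spreading" described there can be modelled in a few lines of Python. This is only a sketch of the rule as stated in the quoted post, not Lucene's merge code, and merge_omit_norms is a made-up name:

```python
def merge_omit_norms(flags):
    """Resolve the omit-norms property for one field name during a merge.

    `flags` holds one boolean per contributing document or segment:
    True means norms were disabled there. Norms stay omitted only if
    every contributor omitted them, so a single False "spreads".
    """
    return all(flags)


# Three contributors disabled norms, one did not: the merged field keeps norms.
merged = merge_omit_norms([True, True, False, True])
```

Each boolean property needs a rule of this kind, plus agreement on which value wins.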
Second, Lucene limits the kinds of properties that may be attached to
field names to those where conflict resolution is possible, and which
may be expressed entirely via a single boolean value. If you want to
hang more sophisticated semantics off of field names, it is necessary
to apply ad-hoc solutions outside the system:
PerFieldAnalyzerWrapper, subclassing Similarity and making lengthNorm()
polymorphic depending on field name, etc.
Things get easier to control, grok, and extend if all per-field
behaviors are determined by a single class rather than spread out.
An Analyzer spec can be associated with a field name permanently,
eliminating analyzer mismatches. So can a Similarity implementation
and, soon, a posting format.
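To make the single-class idea concrete, here is a minimal Python sketch in which one spec object per field name carries both the analysis and the length normalization. FieldSpec, SCHEMA, index_field, and the two toy analyzers are all hypothetical names invented for the example, not KinoSearch or Lucene classes:

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical bundle of every per-field behavior."""
    analyze: Callable[[str], List[str]]   # stands in for an Analyzer
    length_norm: Callable[[int], float]   # stands in for Similarity.lengthNorm


def whitespace_lower(text):
    return text.lower().split()


def keyword(text):
    # Treat the whole value as a single token.
    return [text]


# One place to look up everything about a field name.
SCHEMA = {
    "id": FieldSpec(analyze=keyword, length_norm=lambda n: 1.0),
    "body": FieldSpec(
        analyze=whitespace_lower,
        length_norm=lambda n: 1.0 / math.sqrt(n) if n else 0.0,
    ),
}


def index_field(name, text):
    """Analyze and normalize a value using only the field's spec."""
    spec = SCHEMA[name]
    tokens = spec.analyze(text)
    return tokens, spec.length_norm(len(tokens))
```

Because the spec is looked up by field name, there is no per-value setting that could disagree with it, which is the trade described in the next paragraph.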
Every feature that accumulates adds to the pressure on Lucene's
conflict resolution system and acts as a drag on innovation (because
we are reluctant to complicate the interface further, as Yonik was
with segOmitNorms). By trading away a certain amount of flexibility
with regard to what properties may be hung off of individual field
values, that pressure is released, and we get a simplified code base
and increased freedom to hang a greater diversity of properties off
of individual field names.
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/