On Mar 13, 2007, at 2:38 AM, Michael Busch wrote:

> Global field semantics make our life with FI much easier in a single index. But even with global field semantics we would have the same problem with the IndexWriter.addIndexes() method, no? I'm curious about how you solved that conflict in KinoSearch?

I didn't.

The KinoSearch equivalent of IndexWriter.addIndexes() fails if you attempt to add an index created using a different subclass of Schema, or if any mismatches are detected when comparing field name => spec pairings. No conflict resolution is attempted -- only validation.
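
A minimal sketch of the validate-only approach, in Java for the benefit of Lucene readers. Everything here is hypothetical: the FieldSpec and Schema classes, their properties, and checkCompatible() are illustrative names, not KinoSearch's actual API. It also assumes that a field unknown to the target schema counts as a mismatch, which the description above does not spell out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical field spec: these three properties are illustrative only.
final class FieldSpec {
    final boolean indexed;
    final boolean stored;
    final String analyzerName;

    FieldSpec(boolean indexed, boolean stored, String analyzerName) {
        this.indexed = indexed;
        this.stored = stored;
        this.analyzerName = analyzerName;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FieldSpec)) return false;
        FieldSpec other = (FieldSpec) o;
        return indexed == other.indexed
            && stored == other.stored
            && Objects.equals(analyzerName, other.analyzerName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(indexed, stored, analyzerName);
    }
}

// Hypothetical schema: a fixed mapping of field name => spec.
class Schema {
    final Map<String, FieldSpec> fields = new LinkedHashMap<>();

    // Validate only: throw on any disagreement, never try to reconcile.
    void checkCompatible(Schema incoming) {
        if (incoming.getClass() != getClass()) {
            throw new IllegalArgumentException(
                "different Schema subclass: " + incoming.getClass().getName());
        }
        for (Map.Entry<String, FieldSpec> e : incoming.fields.entrySet()) {
            FieldSpec ours = fields.get(e.getKey());
            // Assumption: a field unknown to the target also counts as a mismatch.
            if (!e.getValue().equals(ours)) {
                throw new IllegalArgumentException(
                    "field spec mismatch for '" + e.getKey() + "'");
            }
        }
    }
}
```

An addIndexes() equivalent would call checkCompatible() once per incoming index before touching any data, so a failed check leaves the target index unchanged.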

By committing to resolving all field property conflicts, Lucene creates two problems for itself.

First, there's the burden of writing, maintaining, and using the conflict resolution code for each property. Sometimes this code is problematic, as illustrated by a Michael McCandless post to java-user from this morning:

  Note, however, that you must do this for all Field instances by that
  same field name because whenever Lucene merges segments, if even one
  Document did not disable norms then this will "spread" so that all
  documents keep their norms, for the same field name.
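
The "spreading" in that quote can be modeled as a logical AND over the omit-norms flag. The sketch below is a simplified illustration of the behavior as described, not Lucene's actual merge code; NormsMerge and mergedOmitNorms() are hypothetical names.

```java
// Simplified model of the resolution rule described in the quote above.
final class NormsMerge {
    // Norms stay omitted for a field only if every contributing
    // document omitted them; a single holdout turns norms back on.
    static boolean mergedOmitNorms(boolean[] omitNormsPerDocument) {
        for (boolean omit : omitNormsPerDocument) {
            if (!omit) {
                return false;
            }
        }
        return true;
    }
}
```

Under this model, one Field instance created without disabling norms is enough to undo the setting for that field name across the whole merged segment.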

Second, Lucene limits the kinds of properties that may be attached to field names to those where conflict resolution is possible, and which may be expressed entirely via a single boolean value. If you want to hang more sophisticated semantics off of field names, it is necessary to apply ad-hoc solutions outside the system: PerFieldAnalyzerWrapper, subclassing Similarity and making lengthNorm() polymorphic depending on field name, etc.

Things get easier to control, grok, and extend if all per-field behaviors are determined by a single class rather than spread out. An Analyzer spec can be associated with a field name permanently, eliminating analyzer mismatches. So can a Similarity implementation and, soon, a posting format.
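
As a rough sketch of what "determined by a single class" could look like, the Java below binds an analyzer and a length norm to each field name exactly once. FieldBehavior and FieldRegistry are hypothetical names, and the analyzer and length norm are reduced to plain functions for brevity; neither Lucene nor KinoSearch is claimed to work this way.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.IntToDoubleFunction;

// Hypothetical bundle of everything that varies per field.
final class FieldBehavior {
    final Function<String, List<String>> analyzer; // text => tokens
    final IntToDoubleFunction lengthNorm;          // token count => norm

    FieldBehavior(Function<String, List<String>> analyzer,
                  IntToDoubleFunction lengthNorm) {
        this.analyzer = analyzer;
        this.lengthNorm = lengthNorm;
    }
}

// Hypothetical registry: one definition per field name, fixed once set.
final class FieldRegistry {
    private final Map<String, FieldBehavior> byName = new HashMap<>();

    void define(String field, FieldBehavior behavior) {
        // Refuse redefinition, so a field cannot drift between analyzers.
        if (byName.putIfAbsent(field, behavior) != null) {
            throw new IllegalStateException("field already defined: " + field);
        }
    }

    FieldBehavior lookup(String field) {
        FieldBehavior behavior = byName.get(field);
        if (behavior == null) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        return behavior;
    }
}
```

Indexing and search code would both go through lookup(), so there is no second place where a field's analyzer or scoring could be set differently.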

Every feature that accumulates adds to the pressure on Lucene's conflict resolution system and acts as a drag on innovation (because we are reluctant to complicate the interface further, as Yonik was with segOmitNorms). By trading away a certain amount of flexibility with regard to what properties may be hung off of individual field values, that pressure is released, and we get a simplified code base and increased freedom to hang a greater diversity of properties off of individual field names.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


