On Jan 14, 2006, at 5:45 PM, Robert Kirchgessner wrote:

> Well, I think merging segments should be possible only if
> the field definitions are consistent throughout the segments.
> Merging inconsistent segments looks to me like an error at worst
> and bad design at the very least. But I may just not have met an
> appropriate use case yet...

Lucene allows the user to change field definitions on the fly. That's like an SQL database which auto-adapts the table definition with each INSERT. It's impressive that Lucene can do that, but look under the hood and you'll see that it ain't easy, or cheap.
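To make "not easy, or cheap" concrete, here is a minimal sketch of the bookkeeping a merge has to do when every segment carries its own field table. It is a simplified, hypothetical model, not Lucene's actual code: a segment is a list of `(name, FieldDef)` pairs whose positions are its local field numbers, and conflicting flags are resolved by OR-ing them (an assumed policy for illustration).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    indexed: bool
    stored: bool


def merge_field_tables(segments):
    """Reconcile per-segment field tables (hypothetical model).

    Each segment is a list of (name, FieldDef); a field's position in the
    list is its segment-local field number. Returns the merged table plus,
    for each segment, a list mapping local field numbers to merged ones.
    """
    merged = {}  # name -> FieldDef; dicts preserve insertion order
    for segment in segments:
        for name, fd in segment:
            old = merged.get(name)
            if old is None:
                merged[name] = fd
            else:
                # Assumed policy: a flag set in any segment wins.
                merged[name] = FieldDef(old.indexed or fd.indexed,
                                        old.stored or fd.stored)
    numbers = {name: i for i, name in enumerate(merged)}
    # Every posting and stored field of every segment has to be rewritten
    # through one of these remap tables while merging.
    remaps = [[numbers[name] for name, _ in segment] for segment in segments]
    return merged, remaps


seg_a = [("title", FieldDef(True, True)), ("body", FieldDef(True, False))]
seg_b = [("body", FieldDef(False, True)), ("url", FieldDef(False, True))]
merged, remaps = merge_field_tables([seg_a, seg_b])
# merged: title(indexed, stored), body(indexed, stored), url(stored only)
# remaps: [[0, 1], [1, 2]]
```

Even in this toy version, the same field ends up with a different number in each segment, and its flags can change as a side effect of merging.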

Significant, probably substantial, performance gains are possible if field definitions are frozen per IndexWriter. That's what KinoSearch does, and it's the primary reason that KinoSearch is an order of magnitude faster than Plucene at building indexes.
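A rough sketch of what freezing buys: a writer whose field set is fixed when it is opened assigns field numbers once and reuses them for every document it writes, so nothing has to be reconciled later. The class and method names are hypothetical, invented for illustration; this is not KinoSearch's API.

```python
class FrozenSchemaWriter:
    """Hypothetical writer whose field set is fixed at construction."""

    def __init__(self, field_names):
        # Field numbers are assigned exactly once, in declaration order.
        self._numbers = {name: i for i, name in enumerate(field_names)}
        if len(self._numbers) != len(field_names):
            raise ValueError("duplicate field name in schema")
        self.docs = []

    def field_number(self, name):
        return self._numbers[name]

    def add_document(self, doc):
        """Add a dict of field name -> value; undeclared fields are rejected."""
        unknown = sorted(set(doc) - set(self._numbers))
        if unknown:
            raise ValueError(f"fields not in schema: {unknown}")
        # Store values keyed by the fixed field number; no per-document
        # or per-segment field table has to be built or merged.
        self.docs.append(sorted((self._numbers[n], v) for n, v in doc.items()))
```

The cost moves to the user: adding a field means opening a writer with a new schema rather than letting the index adapt silently.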

The only problem I see with freezing field definitions per index involves document collections that are difficult to re-index from scratch but might occasionally need field-definition changes. I suspect that's an edge case. Anyone care to disabuse me of that notion? People doing large-scale web spidering would probably be affected.

My radical suggestion:

    * Require fields to be defined when the index
      is first created.
    * Store field definitions in a single per-index,
      human-readable file.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

