On 09/02/2012 02:49, Marvin Humphrey wrote:
> After reviewing the Lucy::Simple code, I realized that we can avoid breaking
> compat with only a few extra lines.
>
>    * If the index exists during new(), extract the schema and type from what's
>      on disk.
>    * Otherwise, create a new EasyAnalyzer for the type.
>
> That way, we avoid a schema conflict crash when indexes built by Lucy::Simple
> prior to 0.4.0 are read by 0.4.0 or above.

That's a good idea.
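
To make the two cases concrete, here is a rough sketch in Python rather than in
Lucy's own API. The schema.json file name, the dictionary layout, and the
open_simple_index() helper are all made up for illustration; only the
"reuse what is on disk, otherwise create an EasyAnalyzer" logic comes from the
proposal.

```python
import json
import os

DEFAULT_LANGUAGE = "en"  # hypothetical default, not taken from Lucy


def open_simple_index(path, language=DEFAULT_LANGUAGE):
    """Return the schema to use for the index at `path`.

    Hypothetical on-disk layout: a single schema.json file per index.
    """
    schema_file = os.path.join(path, "schema.json")

    # Case 1: the index already exists, so reuse whatever schema and
    # analyzer were recorded when it was built. This keeps old indexes
    # readable and avoids a schema conflict.
    if os.path.exists(schema_file):
        with open(schema_file) as fh:
            return json.load(fh)

    # Case 2: no index yet, so build a fresh schema around the new analyzer.
    schema = {"analyzer": "EasyAnalyzer", "language": language}
    os.makedirs(path, exist_ok=True)
    with open(schema_file, "w") as fh:
        json.dump(schema, fh)
    return schema
```

The point of the design is that the constructor never overwrites an existing
schema; the new analyzer only appears in indexes created from scratch.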

> However, CaseFolder and Normalizer presumably have slightly different case
> mappings, thus the subclassing change is a back compat break.  It shouldn't be
> a horrible break (depending on how close the mappings are) because it will
> only affect search-time, screwing up the results only for terms which contain
> code points whose mapping has changed.

The German sharp s ("ß") is one code point that CaseFolder and Normalizer
handle differently: CaseFolder leaves it untouched, whereas Normalizer
converts it to "ss". Fortunately, the Snowball stemmer also converts sharp s
to "ss", so many users should be fine.
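
For illustration only, the same kind of mismatch can be reproduced with
Python's built-in string methods. This is an analogy, not Lucy code:
str.lower() stands in for an analyzer that leaves "ß" alone, and
str.casefold() for one that maps it to "ss". The index_term and query_term
names are made up.

```python
# Stand-ins (assumptions, not Lucy's analyzers):
#   str.lower()    -> simple lowercasing, keeps the sharp s
#   str.casefold() -> full Unicode case folding, maps sharp s to "ss"

word = "Straße"

index_term = word.lower()      # term as written by the old analysis chain
query_term = word.casefold()   # term as produced by the new analysis chain

print(index_term)                # straße
print(query_term)                # strasse
print(index_term == query_term)  # False: the query misses the old term

# Words without an affected code point are unaffected by the switch.
print("Haus".lower() == "Haus".casefold())  # True
```

Only terms containing a code point whose mapping differs are affected, which
is why the breakage is limited to a subset of search results.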

> I don't think we should outright remove CaseFolder without a really good
> reason, because that will force almost all of our users to change their code
> and then reindex from scratch.  But a subtle compat break might be OK,
> especially since you can update all the docs in place after upgrading and only
> suffer during a window of time from slightly degraded search results.

+1

Nick
