On 09/02/2012 02:49, Marvin Humphrey wrote:
After reviewing the Lucy::Simple code, I realized that we can avoid breaking compat with only a few extra lines.

* If the index exists during new(), extract the schema and type from what's on disk.
* Otherwise, create a new EasyAnalyzer for the type.

That way, we avoid a schema conflict crash when indexes built by Lucy::Simple prior to 0.4.0 are read by 0.4.0 or above.
That's a good idea.
However, CaseFolder and Normalizer presumably have slightly different case mappings, thus the subclassing change is a back compat break. It shouldn't be a horrible break (depending on how close the mappings are) because it will only affect search-time, screwing up the results only for terms which contain code points whose mapping has changed.
The German sharp s ("ß") is handled differently by the CaseFolder and the Normalizer. The CaseFolder leaves it untouched, whereas the Normalizer converts it to "ss". Fortunately, the snowball stemmer also converts sharp s to "ss", so many users should be fine.
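To make the search-time effect concrete, here is a minimal Python sketch. It is only an analogy, not Lucy code: Python's str.lower() stands in for a mapping that leaves the sharp s untouched, and str.casefold() stands in for one that expands it to "ss". The function names and the tiny in-memory "index" are made up for illustration, and no stemmer is involved.

```python
# Illustration only: str.lower() and str.casefold() stand in for the two
# case mappings discussed above. Nothing here calls into Lucy.

def old_style(term: str) -> str:
    """Plain lowercasing, which leaves the sharp s untouched."""
    return term.lower()


def new_style(term: str) -> str:
    """Full Unicode case folding, which expands the sharp s to "ss"."""
    return term.casefold()


# Terms as an index built with the old mapping would have stored them.
indexed_terms = {old_style(t) for t in ["Straße", "Haus"]}


def found(query: str) -> bool:
    """Look up a query term that was normalized with the new mapping."""
    return new_style(query) in indexed_terms


print(found("Haus"))    # True: the mapping is the same for this term
print(found("Straße"))  # False: "strasse" is not in the old index
```

Only the term containing the changed code point stops matching; everything else behaves as before, which is why the breakage should be limited until the affected documents are reindexed.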
I don't think we should outright remove CaseFolder without a really good reason, because that will force almost all of our users to change their code and then reindex from scratch. But a subtle compat break might be OK, especially since you can update all the docs in place after upgrading and only suffer during a window of time from slightly degraded search results.
+1

Nick
