Sounds like the right approach! Perhaps both ways should be allowed: the analyzer stored with the index is used by default but can be overridden explicitly in the API (I'm not sure about the query parser, though). The easiest usage pattern would then be to specify the analyzer once and use it from then on, both for adding documents and for querying. For special needs one could specify a different analyzer, in which case the programmer takes responsibility for keeping the index and the query results consistent.

Another question is whether to serialize the analyzer, or to provide a factory that instantiates one by name and store only the name. Serializing would make index stores more portable across Lucene installations, since the analyzer class does not need to be present. But instantiating by name would allow analyzers that have non-serializable dependencies (for example, an analyzer that calls a native WordNet API to expand synonyms).

In our use, I don't see us moving index stores between Lucene installations that are configured with different classes in the class path. We do move them between similarly configured installations, though.

What about class versioning? I can't think of clear advantages one way or the other, but it seems to be an issue worth considering.

================================================

1 (complicated way): When the index store is created, register an analyzer for each field (it could be the same one). A serialized copy of the analyzer is stored in the index base, and queries on that field are automatically processed with it.

2 (simpler, less complete way): Have a way of telling the query parser that "these fields use these analyzers", or at the very least, "these fields don't get tokenized with an analyzer."

_______________________________________________
Lucene-dev mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/lucene-dev
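
To make the "default from the index, explicit override in the API" idea and the "instantiate by name" option a bit more concrete, here is a rough sketch in plain Java. None of this is Lucene API: `SimpleAnalyzer`, `AnalyzerFactory`, `FieldAnalyzers` and the two toy analyzers are hypothetical names made up for illustration, and the in-memory map only stands in for whatever the index store would actually persist.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for an analyzer; this is NOT Lucene's Analyzer class.
interface SimpleAnalyzer {
    List<String> tokenize(String text);
}

// Splits on whitespace and lower-cases each token.
class LowercaseWhitespaceAnalyzer implements SimpleAnalyzer {
    @Override
    public List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }
}

// Keeps the whole field value as one token, for fields that are not tokenized.
class WholeValueAnalyzer implements SimpleAnalyzer {
    @Override
    public List<String> tokenize(String text) {
        return Collections.singletonList(text);
    }
}

// Instantiates analyzers by name, so that only the name has to be stored.
class AnalyzerFactory {
    private final Map<String, Supplier<SimpleAnalyzer>> suppliers = new HashMap<>();

    void register(String name, Supplier<SimpleAnalyzer> supplier) {
        suppliers.put(name, supplier);
    }

    SimpleAnalyzer create(String name) {
        Supplier<SimpleAnalyzer> supplier = suppliers.get(name);
        if (supplier == null) {
            throw new IllegalArgumentException("Unknown analyzer name: " + name);
        }
        return supplier.get();
    }
}

// Per-field analyzer names; an assumption about what the index would persist.
class FieldAnalyzers {
    private final AnalyzerFactory factory;
    private final String defaultName;
    private final Map<String, String> nameByField = new HashMap<>();

    FieldAnalyzers(AnalyzerFactory factory, String defaultName) {
        this.factory = factory;
        this.defaultName = defaultName;
    }

    void setAnalyzerName(String field, String name) {
        nameByField.put(field, name);
    }

    // Default path: use the analyzer registered for the field, else the default.
    List<String> analyze(String field, String text) {
        String name = nameByField.getOrDefault(field, defaultName);
        return factory.create(name).tokenize(text);
    }

    // Override path: the caller supplies the analyzer and owns consistency.
    List<String> analyze(String field, String text, SimpleAnalyzer override) {
        return override.tokenize(text);
    }
}

class AnalyzerRegistryDemo {
    public static void main(String[] args) {
        AnalyzerFactory factory = new AnalyzerFactory();
        factory.register("lowercase", LowercaseWhitespaceAnalyzer::new);
        factory.register("whole", WholeValueAnalyzer::new);

        FieldAnalyzers fields = new FieldAnalyzers(factory, "lowercase");
        fields.setAnalyzerName("id", "whole");

        System.out.println(fields.analyze("body", "Hello World")); // [hello, world]
        System.out.println(fields.analyze("id", "Doc 42"));        // [Doc 42]
        System.out.println(fields.analyze("id", "Doc 42",
                new LowercaseWhitespaceAnalyzer()));               // [doc, 42]
    }
}
```

The same `analyze(field, text)` call would serve both indexing and querying, which is what keeps the two consistent by default. On the versioning question, one thing to keep in mind is that standard Java serialization stores object state rather than class code, so reading a serialized analyzer back still needs a compatible class on the class path.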
