On Fri, May 22, 2009 at 10:40:03PM +0400, Earwin Burrfoot wrote:
> >> Custom analyzers.
> > No problem.
> How are they recorded in the index?

Analyzers must implement dump() and load(), which convert the Analyzer to/from
a JSON-izable data structure.  They end up as JSON in
index_dir/schema_NNN.json.

Custom subclasses must be loaded by whatever app wants to read the index,
naturally.
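
To make the round trip concrete, here is a minimal sketch in Python. Only the dump()/load() pair comes from the description above; the class name, the "_class" key, the registry, and the helper function are all made up for illustration and are not the actual KS API.

```python
import json


class StopwordAnalyzer:
    """Hypothetical analyzer; only the dump()/load() idea is from the thread."""

    def __init__(self, stopwords, lowercase=True):
        self.stopwords = sorted(set(stopwords))
        self.lowercase = lowercase

    def dump(self):
        # Reduce the object to plain dicts/lists/strings that JSON can hold.
        return {
            "_class": type(self).__name__,
            "stopwords": self.stopwords,
            "lowercase": self.lowercase,
        }

    @classmethod
    def load(cls, data):
        # Rebuild an equivalent analyzer from the dumped structure.
        return cls(stopwords=data["stopwords"], lowercase=data["lowercase"])

    def analyze(self, text):
        if self.lowercase:
            text = text.lower()
        return [tok for tok in text.split() if tok not in self.stopwords]


# The reading app has to know every class named in the schema file
# (assumed lookup mechanism, for illustration only).
ANALYZER_REGISTRY = {"StopwordAnalyzer": StopwordAnalyzer}


def analyzer_from_json(serialized):
    data = json.loads(serialized)
    return ANALYZER_REGISTRY[data["_class"]].load(data)
```

If a class named in the file is missing from the registry, the lookup fails, which is the "must be loaded by whatever app wants to read the index" requirement in practice.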

> >> Intentionally different analyzers for indexing and searching.
> > No problem.  That only makes sense in the context of QueryParser, and the KS
> > QueryParser allows you to supply an analyzer which overrides the Schema.
> But well, it differs from analyzer used for indexation in one or two
> options, and shares a heap of others.

A constructor argument solves that problem, doesn't it?  Am I missing
something?
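
What I have in mind is roughly the following Python sketch. The class, its options, and the toy synonym table are hypothetical; the point is only that one class with one differing constructor argument gives you two analyzers that share everything else.

```python
class ExampleAnalyzer:
    """Hypothetical analyzer with several shared options and one that differs."""

    # Toy data, for illustration only.
    SYNONYMS = {"car": ["automobile"]}

    def __init__(self, lowercase=True, min_length=2, expand_synonyms=False):
        self.lowercase = lowercase
        self.min_length = min_length
        self.expand_synonyms = expand_synonyms

    def analyze(self, text):
        if self.lowercase:
            text = text.lower()
        tokens = [t for t in text.split() if len(t) >= self.min_length]
        if self.expand_synonyms:
            expanded = []
            for tok in tokens:
                expanded.append(tok)
                expanded.extend(self.SYNONYMS.get(tok, []))
            tokens = expanded
        return tokens


# Same class, same defaults, one option flipped for search time.
index_analyzer = ExampleAnalyzer()
query_analyzer = ExampleAnalyzer(expand_synonyms=True)
```

The first instance would go into the Schema; the second is the one you would supply to the QueryParser as the override.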

> >> Using this analyzer without any index at all - like I do highlight on
> >> a separate machine to minimize GC pauses, or tag docs by running a
> >> heap of queries against MemoryIndex.
> > No problem.  Distribute a Schema subclass among several machines.
> You mean read an index on one machine, create Analyzer, serialize it
> and send over the wire to other machines? I hope that's either a joke
> or I misunderstood you.

Please.  

How did your Analyzer class get on the other machines?  Do the same thing with
your Schema subclass.

> Storing a list of stopwords in the index sounds fun. Storing a fat
> synonym/morphology dictionary while completely analogous, is no longer
> fun.

So, don't store that whole dictionary in the serialized Analyzer -- just store
a version number.  Make the synonym data class data.  

If it's reasonable to key multiple versions of the class data off of the
version number constructor argument, do that.  If not and an index was built
with a version of the Analyzer that is no longer supported, either throw an
exception or intentionally ignore the mismatch and serve screwed up search
results.  Your call.
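
A minimal Python sketch of that scheme, assuming the dictionaries ship with the code and are keyed by an integer version. The class name, the table contents, and the strict flag are invented for illustration.

```python
class SynonymAnalyzer:
    """Hypothetical analyzer that serializes only a version number."""

    # Class data: distributed with the code, never written into the index.
    SYNONYMS_BY_VERSION = {
        1: {"car": ["automobile"]},
        2: {"car": ["automobile", "auto"]},
    }

    def __init__(self, version, strict=True):
        table = self.SYNONYMS_BY_VERSION.get(version)
        if table is None:
            if strict:
                raise ValueError("unsupported analyzer version: %r" % (version,))
            # Deliberately ignore the mismatch and use the newest data.
            table = self.SYNONYMS_BY_VERSION[max(self.SYNONYMS_BY_VERSION)]
        self.version = version
        self.synonyms = table

    def dump(self):
        # Only the version goes into the serialized schema.
        return {"version": self.version}

    @classmethod
    def load(cls, data, strict=True):
        return cls(data["version"], strict=strict)
```

With strict=True an index built against an unsupported version fails loudly at load time; with strict=False it opens anyway and searches run against whatever the newest dictionary says.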

Marvin Humphrey

