Hi Team,

Currently Oak uses a hard-coded analyzer. As part of OAK-2177 this
needs to be opened up for extension. Possible approaches:

A - Create analyzer using reflection
------------------------------------------

Jackrabbit [1] used to support changing the analyzer. We can do the
same by capturing the analyzer class name and instantiating it.

Pros : Simple to implement
Cons : Would not work well in OSGi, as the analyzer class has to be
visible to the class loader of the Oak bundle
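
A rough sketch of what A boils down to. ReflectiveLoader and instantiate
are names made up for this mail, and a JDK class stands in for the
analyzer so the snippet runs without Lucene on the classpath; in Oak the
expected type would be the Lucene Analyzer class.

```java
/** Sketch of approach A: turn a configured class name into an instance. */
final class ReflectiveLoader {

    /**
     * Loads className, checks that it is a subtype of expectedType and
     * calls its public no-argument constructor.
     */
    static <T> T instantiate(String className, Class<T> expectedType) {
        try {
            // Class.forName resolves against the class loader of this class.
            // Inside an OSGi bundle that loader only sees imported packages.
            Class<?> raw = Class.forName(className);
            Class<? extends T> typed = raw.asSubclass(expectedType);
            return typed.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot create " + className, e);
        }
    }

    public static void main(String[] args) {
        // StringBuilder is only a stand-in for an Analyzer implementation.
        CharSequence cs = instantiate("java.lang.StringBuilder", CharSequence.class);
        System.out.println(cs.getClass().getName());
    }
}
```

This only covers analyzers with a no-argument constructor. Anything
that needs constructor arguments would require extra configuration.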

B - Make it configurable via content
----------------------------------------------

Elasticsearch provides a content-based DSL to create analyzers [1]. We
could implement something similar.

Pros : End user usability improves quite a bit. No need to code, just configure!
Cons : Supporting full configuration via content is complex to implement
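
To give an idea of B, here is a toy model of a content-driven analysis
chain. Everything in it is an assumption for illustration: the property
names (tokenizer, filters, stopwords), the supported values, and the use
of a plain Map in place of the index definition node. It mirrors neither
the Elasticsearch DSL nor any existing Oak format, and a real
implementation would build Lucene components instead of string lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy model of approach B: the analysis chain is described by content. */
final class ContentDrivenAnalysis {

    /** Tokenizes text and applies the filters listed in the definition, in order. */
    static List<String> analyze(Map<String, Object> definition, String text) {
        // Only a whitespace tokenizer is modelled in this sketch.
        Object tokenizer = definition.get("tokenizer");
        if (!"whitespace".equals(tokenizer)) {
            throw new IllegalArgumentException("Unknown tokenizer: " + tokenizer);
        }
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        for (String filter : strings(definition.get("filters"))) {
            tokens = applyFilter(filter, tokens, definition);
        }
        return tokens;
    }

    private static List<String> applyFilter(String filter, List<String> tokens,
                                            Map<String, Object> definition) {
        List<String> out = new ArrayList<>();
        switch (filter) {
            case "lowercase":
                for (String t : tokens) {
                    out.add(t.toLowerCase(Locale.ROOT));
                }
                return out;
            case "stop":
                // Stop words are read from the same definition, i.e. from content.
                Set<String> stop = new HashSet<>(strings(definition.get("stopwords")));
                for (String t : tokens) {
                    if (!stop.contains(t)) {
                        out.add(t);
                    }
                }
                return out;
            default:
                throw new IllegalArgumentException("Unknown filter: " + filter);
        }
    }

    /** Reads a multi-valued property as strings; an absent property is empty. */
    private static List<String> strings(Object value) {
        List<String> out = new ArrayList<>();
        if (value instanceof List) {
            for (Object o : (List<?>) value) {
                out.add(String.valueOf(o));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> def = new HashMap<>();
        def.put("tokenizer", "whitespace");
        def.put("filters", Arrays.asList("lowercase", "stop"));
        def.put("stopwords", Arrays.asList("the"));
        System.out.println(analyze(def, "The Quick Fox")); // prints [quick, fox]
    }
}
```

Even this toy version needs a lookup from names to components and error
handling for unknown names, which hints at where the implementation
complexity comes from.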

C - Lookup Analyzer via OSGi
----------------------------------------

Make use of the OSGi Service Registry to look up the analyzer by name.
The user specifies the analyzer as part of the index config:

"title" : {
    "boost" : 1.5,
    "analyzer" : "AnalyzerA"
}

Oak can ship some default analyzers (lowercase etc.) and others can be
looked up from the Service Registry (SR). For configuration we have two
options:

1. Provide an AnalyzerFactory - The factory is passed the index
definition NodeState corresponding to the analyzer element. It can use
this to configure the analyzer, say by reading stop word data from
content.

2. No default support - Analyzer providers are expected to register the
analyzer fully configured. They can probably use the repository API to
look up their config.
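
The difference between the two options could look roughly like this.
This is a sketch under assumptions, not a proposed API: the nested
Analyzer interface is a stand-in for the Lucene class, a Map stands in
for the NodeState, and the create method, the StopWordFactory class and
the stopwords property are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch contrasting the two configuration options for approach C. */
final class AnalyzerConfigOptions {

    /** Stand-in for the Lucene Analyzer type, to keep the sketch dependency-free. */
    interface Analyzer {
        boolean keeps(String token);
    }

    /** Option 1: Oak hands the analyzer part of the index definition to a factory. */
    interface AnalyzerFactory {
        Analyzer create(Map<String, Object> analyzerDefinition);
    }

    /** Option 1 example: the stop words are read from content. */
    static final class StopWordFactory implements AnalyzerFactory {
        @Override
        public Analyzer create(Map<String, Object> analyzerDefinition) {
            Set<String> stopWords = new HashSet<>();
            Object raw = analyzerDefinition.get("stopwords");
            if (raw instanceof List) {
                for (Object word : (List<?>) raw) {
                    stopWords.add(String.valueOf(word));
                }
            }
            return token -> !stopWords.contains(token);
        }
    }

    /** Option 2 example: the provider configures the analyzer itself, then registers it. */
    static Analyzer preconfigured() {
        Set<String> stopWords = new HashSet<>(Arrays.asList("a", "the"));
        return token -> !stopWords.contains(token);
    }
}
```

With option 1 the same factory can serve several index definitions with
different settings. With option 2 each differently configured analyzer
has to be registered under its own name.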

Pros : Full extensibility. Makes use of OSGi.
Cons :
1. Need to export Lucene classes
2. Need to deal with the dynamic nature of OSGi (we can simply throw
an exception if the analyzer is not found)
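
A minimal sketch of the lookup by name, including the "throw if not
found" behaviour. NamedRegistry is a hypothetical class and a concurrent
map stands in for the Service Registry. A real implementation would
probably track services through the OSGi framework instead. Checking
the built-in defaults first is also just an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of approach C: resolve an analyzer by the name given in the index config. */
final class NamedRegistry<T> {

    // Defaults shipped with Oak, fixed at construction time.
    private final Map<String, T> builtIn;

    // Stand-in for the OSGi Service Registry: entries may come and go at runtime.
    private final Map<String, T> dynamic = new ConcurrentHashMap<>();

    NamedRegistry(Map<String, T> builtIn) {
        this.builtIn = new HashMap<>(builtIn);
    }

    void register(String name, T service) {
        dynamic.put(name, service);
    }

    void unregister(String name) {
        dynamic.remove(name);
    }

    /** Returns the service for name, or throws if nothing is registered under it. */
    T lookup(String name) {
        T service = builtIn.get(name);
        if (service == null) {
            service = dynamic.get(name);
        }
        if (service == null) {
            throw new IllegalStateException("No analyzer registered under name: " + name);
        }
        return service;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new HashMap<>();
        defaults.put("lowercase", "built-in lowercase analyzer");
        NamedRegistry<String> registry = new NamedRegistry<>(defaults);
        registry.register("AnalyzerA", "custom analyzer A");
        System.out.println(registry.lookup("AnalyzerA"));
        registry.unregister("AnalyzerA");
        try {
            registry.lookup("AnalyzerA");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The open question with throwing is what should happen to an index
whose analyzer disappears while it is in use.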

Chetan Mehrotra
[1] http://wiki.apache.org/jackrabbit/IndexingConfiguration#Index_Analyzers
