Hi Team,

Currently Oak uses a hard-coded analyzer. As part of OAK-2177 this needs to be opened up for extension. Possible approaches:

A - Create analyzer using reflection
------------------------------------------

Jackrabbit [1] used to support changing the analyzer. We can do the same by capturing the analyzer class name and instantiating it.

Pros : Simple to implement
Cons : Would not work well in OSGi

B - Make it configurable via content
----------------------------------------------

Elasticsearch provides a content-based DSL to create analyzers [1]. We could implement something similar.

Pros : End user usability improves quite a bit. No need to code, just configure!
Cons : Implementation complexity to support full configuration via content

C - Lookup Analyzer via OSGi
----------------------------------------

Make use of the OSGi Service Registry (SR) to look up the analyzer by name. The user provides the analyzer name as part of the config:

    "title" : {
        "boost" : 1.5,
        "analyzer" : "AnalyzerA"
    }

Oak can ship some default analyzers (lowercase etc.) and others can be looked up from the SR. For configuring the analyzer we have two options:

1. Provide an AnalyzerFactory - The factory can be provided with the index definition NodeState corresponding to the analyzer element. This can be used to configure the analyzer, say by reading stop word data from content
2. No default support - Analyzer providers are expected to register the analyzer fully configured. They can probably use the repository API to look up their config

Pros :
1. Full extensibility
2. Makes use of OSGi

Cons :
1. Need to export Lucene classes
2. Need to deal with the dynamic nature of OSGi etc. (we can simply throw an exception if the analyzer is not found)

Chetan Mehrotra

[1] http://wiki.apache.org/jackrabbit/IndexingConfiguration#Index_Analyzers
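
To make option C a bit more concrete, here is a rough sketch of the lookup-by-name idea, including the "throw an exception if the analyzer is not found" behaviour. Everything in it is an assumption for illustration only: `TextAnalyzer` is a stand-in for the Lucene Analyzer class so that the sketch compiles without Lucene, and `AnalyzerRegistry` is a plain in-memory map standing in for the OSGi Service Registry. None of these names exist in Oak.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

public class AnalyzerLookupSketch {

    /** Hypothetical stand-in for a Lucene Analyzer: maps input text to analyzed text. */
    interface TextAnalyzer extends UnaryOperator<String> {}

    /** Hypothetical in-memory registry playing the role of the OSGi Service Registry. */
    static class AnalyzerRegistry {
        private final Map<String, TextAnalyzer> analyzers = new ConcurrentHashMap<>();

        AnalyzerRegistry() {
            // A default analyzer shipped out of the box (assumed name "lowercase").
            analyzers.put("lowercase", text -> text.toLowerCase(Locale.ROOT));
        }

        /** Called when a provider registers a fully configured analyzer. */
        void register(String name, TextAnalyzer analyzer) {
            analyzers.put(name, analyzer);
        }

        /** Called when the provider goes away (services can come and go at runtime). */
        void unregister(String name) {
            analyzers.remove(name);
        }

        /** Resolves the name from the index definition; fails loudly if it is missing. */
        TextAnalyzer lookup(String name) {
            TextAnalyzer analyzer = analyzers.get(name);
            if (analyzer == null) {
                throw new IllegalStateException("No analyzer registered under name: " + name);
            }
            return analyzer;
        }
    }

    public static void main(String[] args) {
        AnalyzerRegistry registry = new AnalyzerRegistry();

        // A custom analyzer registered under the name used in the config above.
        registry.register("AnalyzerA", text -> text.trim().toUpperCase(Locale.ROOT));

        System.out.println(registry.lookup("lowercase").apply("Oak"));
        System.out.println(registry.lookup("AnalyzerA").apply("  title text "));

        // After the provider disappears, the lookup fails with an exception.
        registry.unregister("AnalyzerA");
        try {
            registry.lookup("AnalyzerA");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In a real OSGi setup the register/unregister calls would be driven by service events rather than invoked directly, and the AnalyzerFactory variant would additionally receive the index definition NodeState.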