On Mon, Aug 25, 2008 at 5:37 PM, Karl Wettin <[EMAIL PROTECTED]> wrote:
> > Is this the specific use case, that you want to handle composite words as > in javaFieldAndClassNames? There is no native support for that in Lucene to > my knowledge, but it should not be too hard to implement a TokenStream that > tokenize such composite words in to single tokens. You probably want to keep > the original token too though. > > Another alternative is creating an ngram index. > > Finally you might want to look at the org.apache.lucene.analysis.compound > package in contrib/analyzers. > > Solr has WordDelimiterFilter which splits on case transition (and many more). It is exposed through WordDelimiterFilterFactory. http://lucene.apache.org/solr/api/org/apache/solr/analysis/WordDelimiterFilterFactory.html http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/analysis/WordDelimiterFilter.java?revision=684908&view=markup -- Regards, Shalin Shekhar Mangar.