On Mon, Aug 25, 2008 at 5:37 PM, Karl Wettin <[EMAIL PROTECTED]> wrote:

>
> Is this your specific use case: you want to handle composite words such as
> javaFieldAndClassNames? To my knowledge there is no native support for that
> in Lucene, but it should not be too hard to implement a TokenStream that
> tokenizes such composite words into separate tokens. You probably want to
> keep the original token too, though.
>
> Another alternative is to create an n-gram index.
>
> Finally you might want to look at the org.apache.lucene.analysis.compound
> package in contrib/analyzers.
>
>
Solr has a WordDelimiterFilter, which splits words on case transitions (among
many other things). It is exposed through WordDelimiterFilterFactory.

http://lucene.apache.org/solr/api/org/apache/solr/analysis/WordDelimiterFilterFactory.html
http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/analysis/WordDelimiterFilter.java?revision=684908&view=markup
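If you would rather roll your own TokenStream as Karl suggested, the core of it
is just the splitting logic. Below is a minimal plain-Java sketch of that logic
only, not WordDelimiterFilter's actual implementation and not tied to any
Lucene API. The class name CaseSplitSketch and the keepOriginal flag are
hypothetical. It assumes a split on lower-to-upper case transitions and on
non-alphanumeric characters is enough; wiring it into a TokenStream (and
lowercasing afterwards) is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splitting logic a custom TokenStream could delegate to.
public class CaseSplitSketch {

    // Splits at lower->upper case transitions and at non-alphanumeric chars.
    // If keepOriginal is true and the word was actually split, the original
    // word is emitted first so exact-match queries still work.
    static List<String> split(String word, boolean keepOriginal) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                // Delimiter such as '_' or '.': end the current part.
                flush(current, parts);
                continue;
            }
            // current is non-empty only if word.charAt(i - 1) was appended.
            if (current.length() > 0
                    && Character.isLowerCase(word.charAt(i - 1))
                    && Character.isUpperCase(c)) {
                flush(current, parts);
            }
            current.append(c);
        }
        flush(current, parts);
        if (keepOriginal && parts.size() > 1) {
            parts.add(0, word);
        }
        return parts;
    }

    // Moves any buffered characters into the result list.
    private static void flush(StringBuilder current, List<String> parts) {
        if (current.length() > 0) {
            parts.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(split("javaFieldAndClassNames", true));
    }
}
```

Running main should print
[javaFieldAndClassNames, java, Field, And, Class, Names]. Note that a run of
capitals such as "XMLParser" is not split by this sketch; handling that is one
of the cases where WordDelimiterFilter is the better choice.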

-- 
Regards,
Shalin Shekhar Mangar.
