I am currently working on migrating a project from an old version of Solr
to Elasticsearch, and came across a funny (to me at least) difference in
the "default" behavior of JapanesePartOfSpeechStopFilterFactory.

If JapanesePartOfSpeechStopFilterFactory is given empty args, it does
nothing. It doesn't load any stop tags, and just passes along the
TokenStream passed to create(). (By comparison, the Elasticsearch filter
will default to loading the stop tags shipped in the Kuromoji analyzer
JAR.) So, for many years, my project was not actually using
JapanesePartOfSpeechStopFilter, even though I thought it was.
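To make the pitfall concrete, here is a minimal, Lucene-free Java sketch of the behavior described above. Everything in it is a hypothetical simplification, not the real Lucene API: the class `PosStopModel`, its `Token` record, the placeholder tags (`noun`, `particle`, `verb`), and the `resources` map that stands in for the resource loader reading a stop tags file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the factory behavior described above (not Lucene code).
class PosStopModel {
    // Simplified token: surface form plus a placeholder part-of-speech tag.
    record Token(String surface, String pos) {}

    // Mirrors the reported behavior: with no "tags" entry in args, no stop tags
    // are loaded and the input is returned untouched. `resources` stands in for
    // the resource loader; unknown names yield an empty set in this model.
    static List<Token> create(Map<String, String> args,
                              Map<String, Set<String>> resources,
                              List<Token> input) {
        String tagsFile = args.get("tags");
        if (tagsFile == null) {
            return input; // silent pass-through: nothing is filtered
        }
        Set<String> stopTags = resources.getOrDefault(tagsFile, Set.of());
        List<Token> output = new ArrayList<>();
        for (Token token : input) {
            if (!stopTags.contains(token.pos())) {
                output.add(token); // keep tokens whose tag is not a stop tag
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<Token> input = List.of(
                new Token("Tokyo", "noun"),
                new Token("ni", "particle"),
                new Token("iku", "verb"));
        Map<String, Set<String>> resources = Map.of("stoptags.txt", Set.of("particle"));

        // Empty args: looks configured, but filters nothing.
        System.out.println("no tags arg: " + create(Map.of(), resources, input).size() + " tokens");
        // Explicit tags file: the particle token is removed.
        System.out.println("tags arg: " + create(Map.of("tags", "stoptags.txt"), resources, input).size() + " tokens");
    }
}
```

Nothing in the empty-args call fails or warns, which is exactly why the misconfiguration can go unnoticed for years.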

I would like to create an issue and submit a patch, in case other users out
there are unknowingly misconfiguring the filter factory the same way, but I'm
not sure which of these approaches is best:

1. If someone doesn't specify the tags argument, then throw an exception
(because the user probably doesn't know what they're doing).
2. If someone doesn't specify the tags argument, then load the default stop
tags (like JapaneseAnalyzer does).

I would lean more toward 1, to avoid a silent change in behavior.
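The two options differ only in how a missing `tags` argument is resolved. As a minimal sketch, again Lucene-free and hypothetical (`StopTagsPolicySketch`, `MissingTagsPolicy`, `resolveStopTags`, the placeholder file name and the placeholder default tags are illustrative names, not Lucene's API), option 1 fails fast with an `IllegalArgumentException` while option 2 falls back to a bundled default set:

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the two proposed behaviors (not Lucene code).
class StopTagsPolicySketch {
    // Option 1 = THROW, option 2 = LOAD_DEFAULTS.
    enum MissingTagsPolicy { THROW, LOAD_DEFAULTS }

    // Placeholder for the stop tags bundled with the analyzer; the real ones
    // are loaded from a resource file in the Kuromoji JAR.
    static final Set<String> DEFAULT_STOP_TAGS = Set.of("particle", "auxiliary-verb");

    static Set<String> resolveStopTags(Map<String, String> args,
                                       Map<String, Set<String>> resources,
                                       MissingTagsPolicy policy) {
        String tagsFile = args.get("tags");
        if (tagsFile != null) {
            Set<String> tags = resources.get(tagsFile);
            if (tags == null) {
                throw new IllegalArgumentException("Stop tags resource not found: " + tagsFile);
            }
            return tags; // explicit configuration always wins
        }
        return switch (policy) {
            // Option 1: make the misconfiguration visible at startup.
            case THROW -> throw new IllegalArgumentException("The 'tags' argument is required");
            // Option 2: behave like the analyzer and use the bundled defaults.
            case LOAD_DEFAULTS -> DEFAULT_STOP_TAGS;
        };
    }

    public static void main(String[] args) {
        Map<String, Set<String>> resources = Map.of("stoptags.txt", Set.of("particle"));
        try {
            resolveStopTags(Map.of(), resources, MissingTagsPolicy.THROW);
        } catch (IllegalArgumentException e) {
            System.out.println("option 1: " + e.getMessage());
        }
        Set<String> defaults = resolveStopTags(Map.of(), resources, MissingTagsPolicy.LOAD_DEFAULTS);
        System.out.println("option 2: " + defaults.size() + " default tags");
    }
}
```

Option 1 turns a silently ineffective configuration into an explicit error without changing what any working configuration filters, whereas option 2 would quietly start removing tokens for every existing setup that omits `tags`.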
