I opened a JIRA issue for this: https://issues.apache.org/jira/browse/SOLR-3589. It
also lists a couple of other related mailing list posts.
On Thu, Jun 28, 2012 at 12:18 PM, Tom Burton-West wrote:
> Hello,
>
> My previous e-mail with a CJK example has received no replies. I
> verified that this problem also occurs for English. For example, in the
> case of the word "fire-fly", the ICUTokenizer and the WordDelimiterFilter
> both split this into two tokens, "fire" and "fly".
>
> With an edismax query and a must-match of 2 (q={!edismax mm=2}), if the
> words are entered separately as [fire fly], the edismax parser honors the
> mm parameter and does the equivalent of a Boolean AND query. However, if
> the words are entered as a hyphenated word [fire-fly], the tokenizer splits
> it into two tokens, "fire" and "fly", and the edismax parser does the
> equivalent of a Boolean OR query.
>
> I'm not sure I understand the output of the debugQuery, but judging by the
> number of hits returned it appears that edismax is not honoring the mm
> parameter. Am I missing something, or is this a bug?
>
> I'd like to file a JIRA issue, but want to find out if I am missing
> something here.
>
> Details of several queries are appended below.
>
> Tom Burton-West
>
> edismax mm=2 query with hyphenated word [fire-fly]
>
>
> {!edismax mm=2}fire-fly
> {!edismax mm=2}fire-fly
> +DisjunctionMaxQuery(((ocr:fire ocr:fly)))
> +((ocr:fire ocr:fly))
>
>
> Entered as separate words [fire fly] numFound="184962"
> edismax mm=2
>
> {!edismax mm=2}fire fly
> {!edismax mm=2}fire fly
>
> +((DisjunctionMaxQuery((ocr:fire)) DisjunctionMaxQuery((ocr:fly)))~2)
>
>
> Regular Boolean AND query: [fire AND fly] numFound="184962"
> fire AND fly
> fire AND fly
> +ocr:fire +ocr:fly
> +ocr:fire +ocr:fly
>
> Regular Boolean OR query: [fire OR fly] numFound="366047"
>
> fire OR fly
> fire OR fly
> ocr:fire ocr:fly
> ocr:fire ocr:fly
>
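
For anyone following the thread, the difference between the parsed queries quoted above can be illustrated with a small toy model. This is only a sketch, not Solr's code: `analyze`, `toy_parse`, `matches`, `hits` and the three sample documents are hypothetical names and data made up for illustration. It assumes that mm is counted against the whitespace-separated clauses the parser sees before analysis, and that mm is capped at the number of such clauses. Under those assumptions, [fire-fly] is a single clause whose tokens are simply OR'ed together, so mm=2 never gets a chance to require both terms.

```python
import re


def analyze(chunk):
    """Toy stand-in for the tokenizer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", chunk.lower()) if t]


def toy_parse(query, mm):
    """Split on whitespace first, then analyze each chunk separately.

    Returns (clauses, effective_mm). Each clause is the list of tokens
    produced from one whitespace-separated chunk.
    """
    clauses = [analyze(chunk) for chunk in query.split()]
    # Assumption: mm cannot exceed the number of top-level clauses.
    effective_mm = min(mm, len(clauses))
    return clauses, effective_mm


def matches(doc_text, query, mm):
    """A clause is satisfied if ANY of its tokens occurs in the document."""
    doc_tokens = set(analyze(doc_text))
    clauses, effective_mm = toy_parse(query, mm)
    satisfied = sum(
        1 for tokens in clauses if any(t in doc_tokens for t in tokens)
    )
    return satisfied >= effective_mm


# Hypothetical sample documents, not taken from the real index.
docs = {
    "both": "the fire attracted a fly",
    "fire_only": "a fire in the barn",
    "fly_only": "a fly on the wall",
}


def hits(query, mm):
    """Names of the sample documents that match the query under mm."""
    return sorted(name for name, text in docs.items() if matches(text, query, mm))


print(hits("fire fly", 2))   # two clauses, both required
print(hits("fire-fly", 2))   # one clause, its tokens are OR'ed
```

In this model [fire fly] behaves like the AND query and [fire-fly] behaves like the OR query, which is the same pattern as the numFound values in the quoted message. Whether this is really what edismax does internally is exactly the question raised in the JIRA issue.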