[jira] [Commented] (SOLR-4153) eDismax: Misinterpretation of hyphens

2012-12-13 Thread Leonhard Maylein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530805#comment-13530805
 ] 

Leonhard Maylein commented on SOLR-4153:


Thanks. I had not noticed that the default for
autoGeneratePhraseQueries
changed in schema version 1.4. My bad.


 eDismax: Misinterpretation of hyphens
 -

 Key: SOLR-4153
 URL: https://issues.apache.org/jira/browse/SOLR-4153
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Affects Versions: 4.0
Reporter: Leonhard Maylein

 The eDismax parser treats hyphens as an OR operator:
 q: 
   british history 1815-1914
 qf: 
   ti sw
 Parsed as:
 (+((DisjunctionMaxQuery((sw:british | ti:british)) 
 DisjunctionMaxQuery((sw:history | ti:history)) DisjunctionMaxQuery(((sw:1815 
 sw:1914) | (ti:1815 ti:1914~3))/no_coord
 What is the reason for this behavior? Wouldn't it be better
 to treat 'term1-term2' as a PhraseQuery term1 term2 (as the 
 WordDelimiterFilter does)?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4141) EDismax: Strange combination of subqueries with parentheses

2012-12-12 Thread Leonhard Maylein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonhard Maylein updated SOLR-4141:
---

Issue Type: Improvement  (was: Bug)

 EDismax: Strange combination of subqueries with parentheses
 ---

 Key: SOLR-4141
 URL: https://issues.apache.org/jira/browse/SOLR-4141
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Affects Versions: 4.0
Reporter: Leonhard Maylein

 fi = field name, mm=100% (all examples)
 The query 'fi:a fi:b'
 (parsed query: '(+((fi:a fi:b)~2))/no_coord')
 is interpreted as 'fi:a AND fi:b'
 This also applies to the queries '(fi:a fi:b)' and
 'fi:(a b)'.
 But the query '(fi:a fi:b) (fi:a fi:b)'
 (parsed query: '(+(((fi:a fi:b) (fi:a fi:b))~2))/no_coord')
 shows the same result as 'fi:a OR fi:b'.
 I'm not sure, but I think this is a bug, isn't it?
 If this is intended behavior, I think it is very difficult
 to explain to a searcher.




[jira] [Commented] (SOLR-4141) EDismax: Strange combination of subqueries with parentheses

2012-12-12 Thread Leonhard Maylein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529864#comment-13529864
 ] 

Leonhard Maylein commented on SOLR-4141:


In combination with the WordDelimiterFilter, setting mm to 100%
is worthless even if you do not have explicit sub-queries, because
of the implicit sub-queries for search terms split up by the
WordDelimiterFilter (camel-case words, words with hyphens, or
letters followed by a digit).

I have changed the Type of this issue from bug to improvement.

 EDismax: Strange combination of subqueries with parentheses
 ---

 Key: SOLR-4141
 URL: https://issues.apache.org/jira/browse/SOLR-4141
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Affects Versions: 4.0
Reporter: Leonhard Maylein

 fi = field name, mm=100% (all examples)
 The query 'fi:a fi:b'
 (parsed query: '(+((fi:a fi:b)~2))/no_coord')
 is interpreted as 'fi:a AND fi:b'
 This also applies to the queries '(fi:a fi:b)' and
 'fi:(a b)'.
 But the query '(fi:a fi:b) (fi:a fi:b)'
 (parsed query: '(+(((fi:a fi:b) (fi:a fi:b))~2))/no_coord')
 shows the same result as 'fi:a OR fi:b'.
 I'm not sure, but I think this is a bug, isn't it?
 If this is intended behavior, I think it is very difficult
 to explain to a searcher.




[jira] [Updated] (SOLR-4153) eDismax: Misinterpretation of hyphens

2012-12-12 Thread Leonhard Maylein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonhard Maylein updated SOLR-4153:
---

Issue Type: Improvement  (was: Bug)

 eDismax: Misinterpretation of hyphens
 -

 Key: SOLR-4153
 URL: https://issues.apache.org/jira/browse/SOLR-4153
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Affects Versions: 4.0
Reporter: Leonhard Maylein

 The eDismax parser treats hyphens as an OR operator:
 q: 
   british history 1815-1914
 qf: 
   ti sw
 Parsed as:
 (+((DisjunctionMaxQuery((sw:british | ti:british)) 
 DisjunctionMaxQuery((sw:history | ti:history)) DisjunctionMaxQuery(((sw:1815 
 sw:1914) | (ti:1815 ti:1914~3))/no_coord
 What is the reason for this behavior? Wouldn't it be better
 to treat 'term1-term2' as a PhraseQuery term1 term2 (as the 
 WordDelimiterFilter does)?




[jira] [Commented] (SOLR-4153) eDismax: Misinterpretation of hyphens

2012-12-12 Thread Leonhard Maylein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529868#comment-13529868
 ] 

Leonhard Maylein commented on SOLR-4153:


This problem is related to SOLR-4141.

 eDismax: Misinterpretation of hyphens
 -

 Key: SOLR-4153
 URL: https://issues.apache.org/jira/browse/SOLR-4153
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Affects Versions: 4.0
Reporter: Leonhard Maylein

 The eDismax parser treats hyphens as an OR operator:
 q: 
   british history 1815-1914
 qf: 
   ti sw
 Parsed as:
 (+((DisjunctionMaxQuery((sw:british | ti:british)) 
 DisjunctionMaxQuery((sw:history | ti:history)) DisjunctionMaxQuery(((sw:1815 
 sw:1914) | (ti:1815 ti:1914~3))/no_coord
 What is the reason for this behavior? Wouldn't it be better
 to treat 'term1-term2' as a PhraseQuery term1 term2 (as the 
 WordDelimiterFilter does)?




[jira] [Commented] (SOLR-4160) eDismax should not split search terms between letters and digits

2012-12-12 Thread Leonhard Maylein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529870#comment-13529870
 ] 

Leonhard Maylein commented on SOLR-4160:


This problem is related to SOLR-4141.

 eDismax should not split search terms between letters and digits
 

 Key: SOLR-4160
 URL: https://issues.apache.org/jira/browse/SOLR-4160
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.0
Reporter: Leonhard Maylein

 The eDismax handler parses the query
 is:038729080x into
 +((is:038729080 is:x)~2)
 The query parser should not separate camel
 case words or mixtures of letters and digits.
 This is the job of the analyzers.
 Otherwise there are special types of data
 (like isbn or issn numbers) which could not be
 searched via the eDismax query parser.




[jira] [Commented] (SOLR-4160) eDismax should not split search terms between letters and digits

2012-12-11 Thread Leonhard Maylein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528782#comment-13528782
 ] 

Leonhard Maylein commented on SOLR-4160:


Yes, you are right, it's the WordDelimiterFilter. Sorry.

We never had this problem before because, without
edismax, both parts are combined with AND. Now, with edismax,
it is an OR combination :-(

 eDismax should not split search terms between letters and digits
 

 Key: SOLR-4160
 URL: https://issues.apache.org/jira/browse/SOLR-4160
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.0
Reporter: Leonhard Maylein

 The eDismax handler parses the query
 is:038729080x into
 +((is:038729080 is:x)~2)
 The query parser should not separate camel
 case words or mixtures of letters and digits.
 This is the job of the analyzers.
 Otherwise there are special types of data
 (like isbn or issn numbers) which could not be
 searched via the eDismax query parser.




[jira] [Created] (SOLR-4160) eDismax should not split search terms between letters and digits

2012-12-10 Thread Leonhard Maylein (JIRA)
Leonhard Maylein created SOLR-4160:
--

 Summary: eDismax should not split search terms between letters and 
digits
 Key: SOLR-4160
 URL: https://issues.apache.org/jira/browse/SOLR-4160
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.0
Reporter: Leonhard Maylein


The eDismax handler parses the query
is:038729080x into
+((is:038729080 is:x)~2)

The query parser should not separate camel
case words or mixtures of letters and digits.
This is the job of the analyzers.

Otherwise there are special types of data
(like isbn or issn numbers) which could not be
searched via the eDismax query parser.
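
A minimal sketch of the kind of split described above, assuming a simplified model in which a token is broken at every transition between letters and digits and at any non-alphanumeric character. The function name `split_letters_digits` is hypothetical. This is plain Python, not Solr code, and it ignores camel-case handling.

```python
import re
from typing import List


def split_letters_digits(token: str) -> List[str]:
    """Return the maximal runs of ASCII letters or of digits found in token."""
    # Each alternative matches one homogeneous run, so a change from digits
    # to letters (or a hyphen in between) starts a new part.
    return re.findall(r"[A-Za-z]+|[0-9]+", token)


print(split_letters_digits("038729080x"))  # ['038729080', 'x']
print(split_letters_digits("1815-1914"))   # ['1815', '1914']
```

Under this model an ISBN with a trailing check character always yields two parts, so whether those parts are later combined with AND or OR decides whether the identifier is still searchable.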




[jira] [Created] (SOLR-4153) eDismax: Misinterpretation of hyphens

2012-12-07 Thread Leonhard Maylein (JIRA)
Leonhard Maylein created SOLR-4153:
--

 Summary: eDismax: Misinterpretation of hyphens
 Key: SOLR-4153
 URL: https://issues.apache.org/jira/browse/SOLR-4153
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.0
Reporter: Leonhard Maylein


The eDismax parser treats hyphens as an OR operator:

q: 
  british history 1815-1914
qf: 
  ti sw

Parsed as:

(+((DisjunctionMaxQuery((sw:british | ti:british)) 
DisjunctionMaxQuery((sw:history | ti:history)) DisjunctionMaxQuery(((sw:1815 
sw:1914) | (ti:1815 ti:1914~3))/no_coord

What is the reason for this behavior? Wouldn't it be better
to treat 'term1-term2' as a PhraseQuery term1 term2 (as the 
WordDelimiterFilter does)?
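
The difference between the observed parse and the suggested one can be sketched as follows. This is an illustrative model only: `render_subquery` and its `auto_phrase` flag are hypothetical names that mimic, as strings, the two ways a single user term that analysis has split into several parts might be turned into a sub-query. It does not call Solr, and the rendering is an assumption modelled on the parsed query shown above.

```python
from typing import List


def render_subquery(field: str, parts: List[str], auto_phrase: bool) -> str:
    """Render the sub-query for one user term that analysis split into parts."""
    if len(parts) == 1:
        return f"{field}:{parts[0]}"
    if auto_phrase:
        # Phrase form: the parts must occur next to each other, in order.
        return f'{field}:"{" ".join(parts)}"'
    # Boolean form: each part becomes an optional clause of its own.
    return "(" + " ".join(f"{field}:{p}" for p in parts) + ")"


print(render_subquery("sw", ["1815", "1914"], auto_phrase=False))  # (sw:1815 sw:1914)
print(render_subquery("sw", ["1815", "1914"], auto_phrase=True))   # sw:"1815 1914"
```

The first form corresponds to the `(sw:1815 sw:1914)` clause in the report; the second is the phrase form the report asks for.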






[jira] [Commented] (SOLR-4141) EDismax: Strange combination of subqueries with parentheses

2012-12-04 Thread Leonhard Maylein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509590#comment-13509590
 ] 

Leonhard Maylein commented on SOLR-4141:


Oh, what a pity!
Then I have to check the user query in the front end first to
decide whether the edismax parser is applicable or not.
If the user types in parentheses, there is no chance to use the
edismax handler because the user would be very surprised by
the result.
Or, perhaps, I have to prefix every word within the parentheses
with a plus sign (and if there are camel-case words, I have to
separate the word parts first). But I think this is not the
whole purpose of the edismax query parser.

 EDismax: Strange combination of subqueries with parentheses
 ---

 Key: SOLR-4141
 URL: https://issues.apache.org/jira/browse/SOLR-4141
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.0
Reporter: Leonhard Maylein

 fi = field name, mm=100% (all examples)
 The query 'fi:a fi:b'
 (parsed query: '(+((fi:a fi:b)~2))/no_coord')
 is interpreted as 'fi:a AND fi:b'
 This also applies to the queries '(fi:a fi:b)' and
 'fi:(a b)'.
 But the query '(fi:a fi:b) (fi:a fi:b)'
 (parsed query: '(+(((fi:a fi:b) (fi:a fi:b))~2))/no_coord')
 shows the same result as 'fi:a OR fi:b'.
 I'm not sure, but I think this is a bug, isn't it?
 If this is intended behavior, I think it is very difficult
 to explain to a searcher.




[jira] [Created] (SOLR-4141) EDismax: Strange combination of subqueries with parentheses

2012-12-03 Thread Leonhard Maylein (JIRA)
Leonhard Maylein created SOLR-4141:
--

 Summary: EDismax: Strange combination of subqueries with 
parentheses
 Key: SOLR-4141
 URL: https://issues.apache.org/jira/browse/SOLR-4141
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.0
Reporter: Leonhard Maylein


fi = field name, mm=100% (all examples)

The query 'fi:a fi:b'
(parsed query: '(+((fi:a fi:b)~2))/no_coord')
is interpreted as 'fi:a AND fi:b'

This also applies to the queries '(fi:a fi:b)' and
'fi:(a b)'.

But the query '(fi:a fi:b) (fi:a fi:b)'
(parsed query: '(+(((fi:a fi:b) (fi:a fi:b))~2))/no_coord')
shows the same result as 'fi:a OR fi:b'.

I'm not sure, but I think this is a bug, isn't it?
If this is intended behavior, I think it is very difficult
to explain to a searcher.
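
The two parsed queries can be compared with a small model. It assumes the usual reading of the parsed-query notation: a `~2` suffix is the minimum number of optional clauses that must match at that level, and a group of optional clauses without such a suffix matches as soon as one of its clauses does. The classes `Term` and `AnyOf` and the function `matches` are hypothetical names for this sketch, not Lucene classes.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple, Union

Doc = Set[Tuple[str, str]]  # a document as a set of (field, term) pairs


@dataclass
class Term:
    field: str
    text: str


@dataclass
class AnyOf:
    """A group of optional clauses; at least `minimum` of them must match."""
    clauses: List[Union[Term, "AnyOf"]]
    minimum: int = 1


def matches(node: Union[Term, AnyOf], doc: Doc) -> bool:
    if isinstance(node, Term):
        return (node.field, node.text) in doc
    hits = sum(1 for clause in node.clauses if matches(clause, doc))
    return hits >= node.minimum


a, b = Term("fi", "a"), Term("fi", "b")
flat = AnyOf([a, b], minimum=2)                            # (fi:a fi:b)~2
nested = AnyOf([AnyOf([a, b]), AnyOf([a, b])], minimum=2)  # ((fi:a fi:b) (fi:a fi:b))~2

only_a = {("fi", "a")}
print(matches(flat, only_a))    # False: both terms are required
print(matches(nested, only_a))  # True: each inner group is satisfied by 'a' alone
```

In this model the minimum of 2 is applied only to the two outer groups, and each group is already satisfied by a single term, which reproduces the 'fi:a OR fi:b' behaviour described above.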





[jira] [Created] (SOLR-4130) eDismax: Terms are skipped for phrase boost when using parenthese

2012-11-30 Thread Leonhard Maylein (JIRA)
Leonhard Maylein created SOLR-4130:
--

 Summary: eDismax: Terms are skipped for phrase boost when using 
parenthese
 Key: SOLR-4130
 URL: https://issues.apache.org/jira/browse/SOLR-4130
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.0
Reporter: Leonhard Maylein


I've tried the following combination with the eDismax handler
in SOLR 4.0.0:

q: +sw:(a b) +ti:(c d)
qf: freitext exttext^0.5
pf: freitext^6 exttext^3

The result is:

<str name="rawquerystring">+sw:(a b) +ti:(c d)</str>

<str name="querystring">+sw:(a b) +ti:(c d)</str>

<str name="parsedquery">(((sw:a sw:b) +(ti:c ti:d)) DisjunctionMaxQuery((freitext:b d^6.0)) DisjunctionMaxQuery((exttext:b d^3.0)))/no_coord</str>

All terms are (equally) qualified by a field (field sw for the terms a and b,
field ti for the terms c and d).
Why does the eDismax handler use only the terms b and d to build the phrase boost
query?
It appears that some terms have been skipped for the phrase boost.

Moreover, in my opinion, fielded terms should not be used in the phrase boost
except for the specified field.
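
The debug entries quoted above can also be extracted from a response programmatically, which makes it easier to compare parsed queries across many test inputs. The sketch below assumes an XML response in which the debug entries are `str` elements carrying a `name` attribute. `SAMPLE` is a hand-written fragment that reuses the strings from this report, not captured server output, and `debug_entry` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hand-written fragment (assumption); only the query strings come from the report.
SAMPLE = """<response>
  <lst name="debug">
    <str name="rawquerystring">+sw:(a b) +ti:(c d)</str>
    <str name="parsedquery">(((sw:a sw:b) +(ti:c ti:d)) DisjunctionMaxQuery((freitext:b d^6.0)) DisjunctionMaxQuery((exttext:b d^3.0)))/no_coord</str>
  </lst>
</response>"""


def debug_entry(xml_text: str, name: str) -> Optional[str]:
    """Return the text of the first <str name="..."> element, or None."""
    root = ET.fromstring(xml_text)
    for node in root.iter("str"):
        if node.get("name") == name:
            return node.text
    return None


parsed = debug_entry(SAMPLE, "parsedquery")
# The phrase boost clauses contain only the terms b and d.
print("freitext:b d^6.0" in parsed)  # True
```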





[jira] [Commented] (SOLR-3377) eDismax: A fielded query wrapped by parens is not recognized

2012-11-28 Thread Leonhard Maylein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505370#comment-13505370
 ] 

Leonhard Maylein commented on SOLR-3377:


I do not agree that this issue is solved.

I've tried the following combination with SOLR 4.0.0

q: +sw:(a b) +ti:(c d)
qf: freitext exttext^0.5
pf: freitext^6 exttext^3

The result is:

<str name="rawquerystring">+sw:(a b) +ti:(c d)</str>

<str name="querystring">+sw:(a b) +ti:(c d)</str>

<str name="parsedquery">(+(+(sw:a sw:b) +(ti:c ti:d)) DisjunctionMaxQuery((freitext:b d^6.0)) DisjunctionMaxQuery((exttext:b d^3.0)))/no_coord</str>

There should be no splitting on the qf/pf fields and therefore no 
DisjunctionMaxQueries.

The query '+(sw:a sw:b) +(ti:c ti:d)' works as expected.

 eDismax: A fielded query wrapped by parens is not recognized
 

 Key: SOLR-3377
 URL: https://issues.apache.org/jira/browse/SOLR-3377
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 3.6
Reporter: Jan Høydahl
Assignee: Yonik Seeley
Priority: Critical
 Fix For: 4.0-BETA

 Attachments: SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch, 
 SOLR-3377.patch


 As reported by bernd on the user list, a query like this
 {{q=(name:test)}}
 will yield 0 hits in 3.6 while it worked in 3.5. It works without the parens.




[jira] [Commented] (SOLR-2368) Improve extended dismax (edismax) parser

2012-11-28 Thread Leonhard Maylein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505374#comment-13505374
 ] 

Leonhard Maylein commented on SOLR-2368:


Please consider also incorporating SOLR-3377, which is marked as fixed but
is not completely solved (see my comment on SOLR-3377).

 Improve extended dismax (edismax) parser
 

 Key: SOLR-2368
 URL: https://issues.apache.org/jira/browse/SOLR-2368
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Yonik Seeley
  Labels: QueryParser

 This is a mother issue to track further improvements for eDismax parser.
 The goal is to be able to deprecate and remove the old dismax once edismax 
 satisfies all usecases of dismax.




[jira] [Commented] (SOLR-3377) eDismax: A fielded query wrapped by parens is not recognized

2012-11-28 Thread Leonhard Maylein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505554#comment-13505554
 ] 

Leonhard Maylein commented on SOLR-3377:


Ok, I understand.
The phrase boost queries are separate from the normal query expansion via the
qf parameter.

But all terms are (equally) qualified by a field (field sw for the terms a and
b, field ti for the terms c and d).
Why does the eDismax handler use only the terms b and d to build the phrase boost
query?
Isn't that a bug?



 eDismax: A fielded query wrapped by parens is not recognized
 

 Key: SOLR-3377
 URL: https://issues.apache.org/jira/browse/SOLR-3377
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 3.6
Reporter: Jan Høydahl
Assignee: Yonik Seeley
Priority: Critical
 Fix For: 4.0-BETA

 Attachments: SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch, 
 SOLR-3377.patch


 As reported by bernd on the user list, a query like this
 {{q=(name:test)}}
 will yield 0 hits in 3.6 while it worked in 3.5. It works without the parens.




[jira] Commented: (SOLR-1731) ArrayIndexOutOfBoundsException when highlighting

2010-07-22 Thread Leonhard Maylein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891018#action_12891018
 ] 

Leonhard Maylein commented on SOLR-1731:


We have the same problem whenever we search for a word which has synonyms 
defined.

 ArrayIndexOutOfBoundsException when highlighting
 

 Key: SOLR-1731
 URL: https://issues.apache.org/jira/browse/SOLR-1731
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 1.4
Reporter: Tim Underwood
Priority: Minor

 I'm seeing a java.lang.ArrayIndexOutOfBoundsException when trying to 
 highlight for certain queries.  The error seems to be an issue with the 
 combination of the ShingleFilterFactory, PositionFilterFactory and the 
 LengthFilterFactory. 
 Here's my fieldType definition:
 <fieldType name="textSku" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
   <analyzer type="index">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     <filter class="solr.LengthFilterFactory" min="2" max="100"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.ShingleFilterFactory" maxShingleSize="8" outputUnigrams="true"/>
     <filter class="solr.PositionFilterFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     <filter class="solr.LengthFilterFactory" min="2" max="100"/> <!-- works if this is commented out -->
   </analyzer>
 </fieldType>
 Here's the field definition:
 <field name="sku_new" type="textSku" indexed="true" stored="true" omitNorms="true"/>
 Here's a sample doc:
 <add>
 <doc>
   <field name="id">1</field>
   <field name="sku_new">A 1280 C</field>
 </doc>
 </add>
 Doing a query for sku_new:"A 1280 C" and requesting highlighting throws the 
 exception (full stack trace below):  
 http://localhost:8983/solr/select/?q=sku_new%3A%22A+1280+C%22&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=sku_new&fl=*
 If I comment out the LengthFilterFactory from my query analyzer section 
 everything seems to work.  Commenting out just the PositionFilterFactory also 
 makes the exception go away and seems to work for this specific query.
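
 For anyone reproducing this from a script, the request can be assembled with Python's standard library. This is a sketch under the assumptions of the report: a local Solr at the host and path shown above and the same parameter values. Nothing is sent; the code only builds the URL.

```python
from urllib.parse import urlencode

# Host and path as given in the report; adjust for your own installation.
BASE = "http://localhost:8983/solr/select/"

params = {
    "q": 'sku_new:"A 1280 C"',  # phrase query against the sku_new field
    "version": "2.2",
    "start": "0",
    "rows": "10",
    "indent": "on",
    "hl": "on",                 # enable highlighting
    "hl.fl": "sku_new",         # field to highlight
    "fl": "*",
}

# safe="*" keeps the asterisk literal; quotes and the colon are percent-encoded
# and spaces become '+'.
url = BASE + "?" + urlencode(params, safe="*")
print(url)
```

 Building the query string this way keeps the parameter separators and the encoding of the quoted phrase correct, which is easy to get wrong when the URL is typed by hand.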
 Full stack trace:
 java.lang.ArrayIndexOutOfBoundsException: -1
 at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:202)
 at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:414)
 at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216)
 at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184)
 at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:226)
 at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335)
 at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
 at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
 at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
 at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
 at org.mortbay.jetty.Server.handle(Server.java:285)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
 at