Hi Folks,
I'm getting the following error after using a custom filter:
SEVERE: org.apache.solr.common.SolrException:
org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token PR
2823.000000 A0.200000 S0.819880 exceeds length of provided text sized 15
As the error suggests, the input value is PR2823.A2S81988 (15 chars), while
the token produced by my filter is longer than that. I have been told that
the correctOffset() method of the CharFilter class can be used to resolve
this, but as far as I can tell it only returns a corrected offset; it
doesn't set one.
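To check that I am reading the error correctly, here is a minimal standalone sketch of the length check I believe the highlighter performs. It does not use Lucene at all; the class and method names are my own, and I am assuming the token starts at offset 0 and that the line wrap in the error message stands for a single space.

```java
// Standalone sketch, no Lucene dependency. All names here are hypothetical.
final class OffsetCheckSketch {

    /** True when a token's offsets point past the end of the text being highlighted. */
    static boolean exceedsText(int startOffset, int endOffset, String text) {
        return startOffset > text.length() || endOffset > text.length();
    }

    public static void main(String[] args) {
        // Original field value as stored.
        String stored = "PR2823.A2S81988";
        // Token text as quoted in the error (line wrap read as a space: an assumption).
        String token = "PR 2823.000000 A0.200000 S0.819880";

        // Without offset correction, the end offset is measured in the modified text.
        int uncorrectedEnd = token.length();
        System.out.println(exceedsText(0, uncorrectedEnd, stored)); // true

        // With correction, the end offset would map back into the original text.
        int correctedEnd = stored.length();
        System.out.println(exceedsText(0, correctedEnd, stored)); // false
    }
}
```

If this reading is right, the exception goes away only when the token's end offset is expressed in terms of the original 15-character value.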
I have included some details below.
Kind Regards,
Luke
In my schema I have:
<fieldType name="LCNormalized" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer>
    <charFilter class="com.test.solr.analysis.LukesTestCharFilterFactory"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
and the factory and filter classes are:
public class LukesTestCharFilterFactory extends BaseCharFilterFactory {
    public CharStream create(CharStream input) {
        return new LukesTestCharFilter(input);
    }
}
public final class LukesTestCharFilter extends BaseCharFilter
{
    ...
    public LukesTestCharFilter(CharStream input) {
        super(input);
        try {
            // Load the whole input into a string
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int len;
            while ((len = input.read(buf)) >= 0) {
                sb.append(buf, 0, len);
            }
            String original = sb.toString();
            String modified = getLCShelfkey(original);
            // Replace the wrapped stream with the normalized text
            CharStream result = CharReader.get(new StringReader(modified));
            this.input = result;
            // Return value is discarded here; this call has no visible effect
            this.input.correctOffset(modified.length());
        } catch (IOException e) {
            System.err.println("There was a problem parsing input. Skipping.");
        }
    }
    ...
}
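The constructor above calls correctOffset() and throws the result away, which is why I say it only "returns the value". My current understanding is that a correction has to be registered first and is only looked up afterwards. Below is a minimal standalone model of that idea. It is not Lucene's implementation: the class and method names are hypothetical, and whether BaseCharFilter's addOffCorrectMap(int off, int cumulativeDiff) plays the registering role is an assumption on my part.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone model of cumulative offset correction; not Lucene's code.
final class OffsetCorrectionSketch {
    // Each entry is {outputOffset, cumulativeDiff}, added in increasing offset order.
    private final List<int[]> corrections = new ArrayList<>();

    /** Register: from outputOffset onward, add cumulativeDiff to reach the original offset. */
    void addCorrection(int outputOffset, int cumulativeDiff) {
        corrections.add(new int[] {outputOffset, cumulativeDiff});
    }

    /** Pure lookup: maps an offset in the modified text back to the original text. */
    int correct(int outputOffset) {
        int diff = 0;
        for (int[] c : corrections) {
            if (c[0] > outputOffset) {
                break; // later corrections do not apply yet
            }
            diff = c[1];
        }
        return outputOffset + diff;
    }

    public static void main(String[] args) {
        String original = "PR2823.A2S81988";
        // Normalized text as quoted in the error (line wrap read as a space: an assumption).
        String modified = "PR 2823.000000 A0.200000 S0.819880";

        OffsetCorrectionSketch sketch = new OffsetCorrectionSketch();
        // Map the end of the modified text back onto the end of the original text.
        sketch.addCorrection(modified.length(), original.length() - modified.length());

        System.out.println(sketch.correct(0) == 0);
        System.out.println(sketch.correct(modified.length()) == original.length());
    }
}
```

In this model only the end-of-text offset is remapped, which is enough for a single token covering the whole value (as KeywordTokenizer produces), but it would not give sensible offsets for tokens that end partway through the text.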