RE: charFilter

Osullivan L . Thu, 13 Sep 2012 03:45:14 -0700

Hi Folks,

I'm getting the following error after using a custom filter:


SEVERE: org.apache.solr.common.SolrException: 
org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token PR  
2823.000000 A0.200000 S0.819880 exceeds length of provided text sized 15

As the error suggests, the input value is PR2823.A2S81988 (15 chars). I have 
been informed that the correctOffset() method of the CharFilter class can be 
used to resolve this issue, but as far as I can tell, all it does is return a 
corrected value; it doesn't set one. 

I have included some details below.

Kind Regards,

Luke

In my schema I have:

    <fieldType name="LCNormalized" class="solr.TextField"
               sortMissingLast="true" omitNorms="true">
        <analyzer>
          <charFilter class="com.test.solr.analysis.LukesTestCharFilterFactory"/>
          <tokenizer class="solr.KeywordTokenizerFactory"/>
        </analyzer>
    </fieldType>

and the classes are:

public class LukesTestCharFilterFactory extends BaseCharFilterFactory {

        public CharStream create(CharStream input) {
                return new LukesTestCharFilter(input);
        }
}

public final class LukesTestCharFilter extends BaseCharFilter
{
 ...
  public LukesTestCharFilter(CharStream input) {
      super(input);
      try {
          // Load the whole input into a string
          StringBuilder sb = new StringBuilder();
          char[] buf = new char[1024];

          int len;
          while ((len = input.read(buf)) >= 0) {
              sb.append(buf, 0, len);
          }

          String original = sb.toString();
          String modified = getLCShelfkey(original);
          CharStream result = CharReader.get(new StringReader(modified));

          this.input = result;
          // correctOffset() only returns a corrected offset; the return
          // value is discarded here, so this call changes nothing.
          this.input.correctOffset(modified.length());
      } catch (IOException e) {
          System.err.println("There was a problem parsing input.  Skipping.");
      }
  }
 ...
}
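
For illustration, here is a minimal, Lucene-free sketch of the idea behind
offset correction: a table that records, from a given offset in the filtered
text onward, how far offsets must be shifted to point back into the original
text. The class OffsetCorrectionSketch and its methods addCorrection() and
correct() are hypothetical names made up for this example and are not part of
Lucene or Solr. As far as I understand it, BaseCharFilter keeps a comparable
table that subclasses fill through its protected addOffCorrectMap(int off,
int cumulativeDiff) method, and correctOffset() only reads from that table.
The filtered string below approximates the token text from the error message
(its exact whitespace is an assumption).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of an offset correction table. Not Lucene's API.
 */
class OffsetCorrectionSketch {

    // Offsets in the filtered text at which a new cumulative difference applies.
    private final List<Integer> offsets = new ArrayList<>();
    // Cumulative difference (original offset minus filtered offset) from there on.
    private final List<Integer> diffs = new ArrayList<>();

    /** Record that from filteredOffset onward, original = filtered + cumulativeDiff. */
    void addCorrection(int filteredOffset, int cumulativeDiff) {
        offsets.add(filteredOffset);
        diffs.add(cumulativeDiff);
    }

    /** Map an offset in the filtered text back to an offset in the original text. */
    int correct(int filteredOffset) {
        int diff = 0;
        // Entries are assumed to be added in increasing offset order;
        // the last entry at or before filteredOffset wins.
        for (int i = 0; i < offsets.size(); i++) {
            if (offsets.get(i) <= filteredOffset) {
                diff = diffs.get(i);
            } else {
                break;
            }
        }
        return filteredOffset + diff;
    }

    public static void main(String[] args) {
        String original = "PR2823.A2S81988"; // 15 chars, from the post
        // Approximation of the filtered token; exact whitespace is an assumption.
        String modified = "PR  2823.000000 A0.200000 S0.819880";

        OffsetCorrectionSketch map = new OffsetCorrectionSketch();
        // One entry at the end of the filtered text: the end offset of a token
        // spanning the whole field is pulled back to the original length.
        map.addCorrection(modified.length(), original.length() - modified.length());

        System.out.println(map.correct(0));                 // start offset is unchanged: 0
        System.out.println(map.correct(modified.length())); // end offset maps to 15
    }
}
```

A single entry like this is only adequate when one token covers the whole
field, as with KeywordTokenizer; offsets strictly inside the filtered text
are left uncorrected by this sketch.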
