Hi

I am trying to pinpoint a mismatch between the offsets produced by the Lucene indexing process and the original document content: when I use those offsets to take substrings of the original content, the substrings don't line up with the tokens.
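To make the symptom concrete, here is a minimal stdlib-only simulation. It is not Lucene's code. The class `OffsetDrift` and its toy whitespace `tokenize` method are hypothetical. The sketch assumes that something strips "\r" before tokenization and shows that offsets computed on the stripped text drift by one character per removed "\r" when they are applied to the original string.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetDrift {
    // Hypothetical whitespace tokenizer (not Lucene's): returns {start, end}
    // pairs, end exclusive, relative to whatever string it is given.
    static List<int[]> tokenize(String text) {
        List<int[]> spans = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                spans.add(new int[] {start, i});
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String original = "alpha\r\nbeta\r\ngamma";
        // Simulate a step that normalizes "\r\n" to "\n" before tokenization.
        String indexed = original.replace("\r\n", "\n");
        for (int[] span : tokenize(indexed)) {
            String expected = indexed.substring(span[0], span[1]);
            String actual = original.substring(span[0], span[1]);
            // Escape control characters so the drift is visible on one line.
            System.out.println(expected + " -> "
                + actual.replace("\r", "\\r").replace("\n", "\\n"));
        }
    }
}
```

Here "alpha" still maps correctly, but the offsets for "beta" select "\nbet" in the original and those for "gamma" select "\r\ngam", which is the kind of cumulative drift you would see if "\r" is removed upstream.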

I debugged as far as I could, but I lost track of what Lucene does at line 298 of DefaultIndexingChain (Lucene 5.3.0):

for (IndexableField field : docState.doc) {
  fieldCount = processField(field, fieldGen, fieldCount);
}

At this point I can see that the content field I am interested in (one of the IndexableFields) has already had every "\r" removed from its Windows-style "\r\n" line endings. However, I cannot trace how these IndexableFields are created or how the raw content is passed to them.
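Whatever the cause turns out to be, one possible workaround is to normalize "\r\n" to "\n" yourself before indexing and keep a map from normalized positions back to original positions. This is similar in spirit to how a Lucene CharFilter corrects offsets, but the class below (`CrlfOffsetMap`) is a hypothetical stdlib-only sketch, not a Lucene API. It assumes the offsets you get back are relative to the normalized text.

```java
import java.util.Arrays;

public class CrlfOffsetMap {
    final String normalized;
    // toOriginal[i] is the index in the original text of normalized char i;
    // the extra last slot maps the end-of-text offset.
    final int[] toOriginal;

    CrlfOffsetMap(String original) {
        StringBuilder sb = new StringBuilder(original.length());
        int[] map = new int[original.length() + 1];
        int n = 0;
        for (int i = 0; i < original.length(); i++) {
            char c = original.charAt(i);
            // Drop '\r' only when it is the first half of a "\r\n" pair.
            if (c == '\r' && i + 1 < original.length() && original.charAt(i + 1) == '\n') {
                continue;
            }
            map[n++] = i;
            sb.append(c);
        }
        map[n] = original.length();
        normalized = sb.toString();
        toOriginal = Arrays.copyOf(map, n + 1);
    }

    // Map a [start, end) span on the normalized text to the original text.
    // The end is derived from the last included char so that a dropped '\r'
    // just after the token is not pulled into the span.
    int[] toOriginalSpan(int start, int end) {
        int origStart = toOriginal[start];
        int origEnd = (end > start) ? toOriginal[end - 1] + 1 : origStart;
        return new int[] {origStart, origEnd};
    }

    public static void main(String[] args) {
        String original = "alpha\r\nbeta\r\ngamma";
        CrlfOffsetMap m = new CrlfOffsetMap(original);
        int[] span = m.toOriginalSpan(6, 10); // "beta" in the normalized text
        System.out.println(original.substring(span[0], span[1]));
    }
}
```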

I am certain that my program passed strings containing many "\r\n" sequences.
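One cheap way to confirm this at each stage is to count the "\r\n" pairs in the string right before it is put into the field, and again on whatever value you later inspect. The `CrlfCheck.countCrlf` helper below is a hypothetical illustration using only `String.indexOf`.

```java
public class CrlfCheck {
    // Count non-overlapping "\r\n" pairs in s.
    static int countCrlf(String s) {
        int count = 0;
        for (int i = s.indexOf("\r\n"); i >= 0; i = s.indexOf("\r\n", i + 2)) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String content = "line one\r\nline two\r\nline three";
        // Log this just before building the field, then compare later.
        System.out.println("CRLF pairs: " + countCrlf(content));
    }
}
```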

So my question is: is this removal of "\r" deliberate?

Thanks


