lucene deliberately removes \r (windows carriage char)

Ziqi Zhang Sat, 03 Oct 2015 08:44:04 -0700

Hi

I am trying to pinpoint a mismatch between the offsets produced by the Lucene indexing process and the original document content: when I use those offsets to take substrings of the original content, the results do not line up.
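To make the symptom concrete, here is a minimal sketch in plain Java (no Lucene involved). The class name OffsetShiftDemo, the helper names, and the sample string are hypothetical, and stripCarriageReturns is only a stand-in for whatever is removing the "\r" characters. It shows how offsets computed on text without "\r" point at the wrong characters when applied to the original text.

```java
final class OffsetShiftDemo {
    // Hypothetical stand-in for whatever step removes '\r' from the content.
    static String stripCarriageReturns(String text) {
        return text.replace("\r", "");
    }

    // Cuts [start, end) out of source; the offsets may have been computed
    // on a different version of the text.
    static String slice(String source, int start, int end) {
        return source.substring(start, end);
    }

    public static void main(String[] args) {
        String original = "foo\r\nbar\r\nbaz";            // Windows line endings
        String stripped = stripCarriageReturns(original);  // "foo\nbar\nbaz"

        // Offsets of the last word, computed on the stripped text.
        int start = stripped.indexOf("baz");
        int end = start + "baz".length();

        // The offsets are correct for the text they were computed on ...
        System.out.println(slice(stripped, start, end).equals("baz"));
        // ... but not for the original: each removed '\r' before the word
        // shifts the window one character to the left.
        System.out.println(slice(original, start, end).equals("baz"));
    }
}
```

In this sketch the shift at any position equals the number of "\r" characters removed before it, which matches a mismatch that grows further into the document.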

I debugged as far as I could, but I lost track of Lucene at line 298 of DefaultIndexingChain (Lucene 5.3.0):


for (IndexableField field : docState.doc) {
  fieldCount = processField(field, fieldGen, fieldCount);
}

Basically, at this point I can see that the content field I am interested in (one of the IndexableField instances) has already had every "\r" removed from the "\r\n" (Windows) newline sequences in its content. But I am unable to trace how these IndexableField instances are generated, or how the raw content is passed to them.


I am certain that my program passed strings containing many "\r\n" sequences.
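One way to double-check this is to count the "\r\n" sequences at each hand-off. The helper below is a small sketch in plain Java; the class and method names are hypothetical and not part of Lucene.

```java
final class CarriageReturnCheck {
    // Counts non-overlapping occurrences of the two-character sequence "\r\n".
    static int countCrLf(String text) {
        int count = 0;
        int from = 0;
        while ((from = text.indexOf("\r\n", from)) != -1) {
            count++;
            from += 2; // continue searching after this "\r\n"
        }
        return count;
    }

    public static void main(String[] args) {
        String windowsText = "line one\r\nline two\r\nline three";
        String unixText = windowsText.replace("\r", "");

        System.out.println(countCrLf(windowsText)); // 2
        System.out.println(countCrLf(unixText));    // 0
    }
}
```

Calling countCrLf on the string just before it is added to the document, and again on the value seen in the debugger (for example, what the field's stringValue() returns, assuming the content was supplied as a String rather than a Reader), should show at which step the count drops to zero.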

So the question is: is this (i.e., removing \r) deliberate?

Thanks



