Hi
I am trying to pinpoint a mismatch in the offsets produced by the
Lucene indexing process: when I use those offsets to take substrings of
the original document content, the results do not line up.
I debugged as far as I could, but I lost track of what Lucene is doing
at line 298 of DefaultIndexingChain (Lucene 5.3.0):
    for (IndexableField field : docState.doc) {
        fieldCount = processField(field, fieldGen, fieldCount);
    }
At this point I can see that the content field I am interested in (one
of the IndexableField instances) already has every "\r" removed from the
Windows "\r\n" line endings in its content. However, I am unable to
trace how these IndexableField objects are generated, or how the raw
content is passed to them.
I am certain that my program passed in strings containing many "\r\n"
sequences. So my question is: is this removal of "\r" deliberate?
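To make the symptom concrete, here is a minimal plain-Java sketch (no
Lucene involved) of what I believe is happening. The class and method
names are made up for illustration, and stripCarriageReturns is only an
assumed stand-in for whatever step drops the "\r"; it is not Lucene's
actual code.

```java
// Plain-Java illustration only: no Lucene classes are used here.
class OffsetShiftDemo {

    // Hypothetical stand-in for whatever step removes "\r" before indexing.
    static String stripCarriageReturns(String text) {
        return text.replace("\r", "");
    }

    // Applies [start, end) offsets to a text, as I do with the indexed offsets.
    static String slice(String text, int start, int end) {
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String original = "first line\r\nsecond line\r\nthird";
        String stripped = stripCarriageReturns(original);

        // Offsets of the token "second", measured on the stripped text.
        int start = stripped.indexOf("second");
        int end = start + "second".length();

        // On the stripped text the offsets select the token itself.
        System.out.println("[" + slice(stripped, start, end) + "]");
        // On the original text the same offsets land one character early,
        // because one "\r" precedes the token.
        System.out.println("[" + slice(original, start, end) + "]");
    }
}
```

If this is indeed the cause, the shift grows by one character for every
"\r\n" that precedes a token, which would match what I am seeing.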
Thanks