On Sat, Sep 7, 2013 at 7:44 AM, Benson Margulies <ben...@basistech.com> wrote:
> In Japanese, compounds are just decompositions of the input string. In
> other languages, compounds can manufacture entire tokens from thin
> air. In those cases, it's something of a question how to decide on the
> offsets. I think that you're right, eventually, insofar as there's
> some offset in the original that might as well be blamed for any given
> component.
>
Why change the offsets, then? Offsets are for highlighting. Let the whole compound be highlighted when it's a match in search results. That's transparent and totally accurate as to what is happening: this is why we do highlighting, to help the user make a relevance assessment about the document, not to help the end user debug the analysis chain or anything like that.
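To make the suggestion concrete, here is a minimal plain-Java sketch. It assumes a hypothetical `Token` class (term text plus start/end character offsets) and a hypothetical `decompound` helper that is handed the component strings by some dictionary step. Neither is Lucene API. The only point it shows is that each component reuses the compound's own offsets, so anything highlighting from offsets marks the whole word.

```java
import java.util.ArrayList;
import java.util.List;

public class CompoundOffsets {

    // Hypothetical token: term text plus [start, end) character offsets into the original input.
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return term + "[" + start + "," + end + ")";
        }
    }

    // Emits the compound itself followed by each component. Every component
    // reuses the compound's offsets instead of inventing new ones.
    static List<Token> decompound(Token compound, List<String> parts) {
        List<Token> out = new ArrayList<>();
        out.add(compound);
        for (String part : parts) {
            out.add(new Token(part, compound.start, compound.end));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Das Donaudampfschiff legt an";
        String word = "Donaudampfschiff";
        int start = text.indexOf(word);
        Token compound = new Token(word, start, start + word.length());

        // The split is supplied by hand here; a real decompounder would look it up.
        List<Token> tokens = decompound(compound, List.of("donau", "dampf", "schiff"));
        for (Token t : tokens) {
            // Each token maps back to the span a highlighter would mark.
            System.out.println(t + " -> " + text.substring(t.start, t.end));
        }
    }
}
```

Running `main` prints each token with the slice of input its offsets cover. All three components map back to the full word `Donaudampfschiff`, which is exactly the "highlight the whole compound" behavior argued for above.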