Hello Mikhail,

I see in the link you sent that PositionIncrementAttribute determines the position of a token relative to the previous token in a TokenStream, and that it is used in phrase searching. I am not doing phrase searching. Would you mind explaining how it can help me?

Thanks,
Ira

-----Original Message-----
From: Mikhail Khludnev [mailto:m...@apache.org]
Sent: Tuesday, June 26, 2018 12:33 PM
To: java-user@lucene.apache.org
Subject: Re: How search code files for words which contains a given substrings?

Hello, Ira.

Note the difference between offset
https://lucene.apache.org/core/7_3_0/core/org/apache/lucene/analysis/tokenattributes/OffsetAttribute.html
and position
https://lucene.apache.org/core/7_3_0/core/org/apache/lucene/analysis/tokenattributes/PositionIncrementAttribute.html
in Lucene terminology.

Please make sure you don't rebuild existing functionality
https://lucene.apache.org/core/7_3_1/highlighter/org/apache/lucene/search/highlight/package-summary.html#package.description

On Tue, Jun 26, 2018 at 10:57 AM Gordin, Ira <ira.gor...@sap.com> wrote:

> Hi all,
>
> I started to work on a project which currently searches code files for
> words which contain a given substring.
> Currently it uses WhitespaceTokenizer and a regex query which wraps the
> searched substring with '.*'.
> For example, if one searches for 'a', the query will be '/.*a.*/'. In this
> way, in the text 'Mama loves banana', it will find the tokens 'Mama' and
> 'banana'.
> Now I need to get the start and end positions of the matched tokens in
> the line, and the line number.
> With TokenStream I can get the start and end positions of 'Mama' and
> 'banana' in the full text. But I need the positions of 'a'.
> I see 2 options.
> Option 1: perform an additional search in the returned token.
> Option 2: use NGramTokenizer or NGramTokenFilter (not sure which of
> them), and in this way I hope I will get the 'a' positions in the
> TokenStream.
> An additional question: how can I get the line numbers and the positions
> inside the line?
>
> Many thanks in advance for your help,
> Ira
>

--
Sincerely yours
Mikhail Khludnev
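
For Option 1 in the quoted message, a minimal sketch in plain Java (no Lucene calls) might look like the following. It assumes the start and end offsets of each matched token in the full text are already known, for example as reported by OffsetAttribute, and that lines are separated by '\n'. The class and method names (SubstringLocator, lineStarts, whitespaceTokens, locate) are hypothetical and chosen only for illustration; whitespaceTokens is a stand-in for the real tokenizer so that the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubstringLocator {

    /** Offsets at which each line begins: 0, then one past every '\n'. */
    public static int[] lineStarts(String text) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0);
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                starts.add(i + 1);
            }
        }
        int[] result = new int[starts.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = starts.get(i);
        }
        return result;
    }

    /** Stand-in for a whitespace tokenizer: {start, end} offsets of each token. */
    public static List<int[]> whitespaceTokens(String text) {
        List<int[]> tokens = new ArrayList<>();
        int start = -1;
        for (int i = 0; i <= text.length(); i++) {
            boolean boundary = i == text.length() || Character.isWhitespace(text.charAt(i));
            if (!boundary && start < 0) {
                start = i;
            } else if (boundary && start >= 0) {
                tokens.add(new int[] {start, i});
                start = -1;
            }
        }
        return tokens;
    }

    /**
     * Searches again inside one token spanning [tokenStart, tokenEnd) of the full text.
     * Each hit is {line (1-based), start column (0-based), end column (exclusive)}.
     */
    public static List<int[]> locate(String text, int[] lineStarts,
                                     int tokenStart, int tokenEnd, String needle) {
        List<int[]> hits = new ArrayList<>();
        if (needle.isEmpty()) {
            return hits;
        }
        String token = text.substring(tokenStart, tokenEnd);
        // Advance by one character so that overlapping occurrences are reported too.
        for (int i = token.indexOf(needle); i >= 0; i = token.indexOf(needle, i + 1)) {
            int offset = tokenStart + i;
            // binarySearch returns -(insertionPoint) - 1 when the offset is not a line start.
            int pos = Arrays.binarySearch(lineStarts, offset);
            int lineIndex = pos >= 0 ? pos : -pos - 2;
            int column = offset - lineStarts[lineIndex];
            hits.add(new int[] {lineIndex + 1, column, column + needle.length()});
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Mama loves banana\nx = data";
        int[] starts = lineStarts(text);
        for (int[] token : whitespaceTokens(text)) {
            for (int[] hit : locate(text, starts, token[0], token[1], "a")) {
                System.out.println("line " + hit[0] + ", columns " + hit[1] + "-" + hit[2]);
            }
        }
    }
}
```

The line-start table is built once per file, so mapping an offset to a line number and column is a binary search rather than a rescan of the text. Whether this is preferable to the n-gram approach in Option 2 or to the existing highlighter package depends on the index setup, which this sketch does not address.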