RE: How search code files for words which contains a given substrings?

Gordin, Ira Tue, 26 Jun 2018 03:30:01 -0700

Hello Mikhail,

I see in the link you sent that PositionIncrementAttribute determines the 
position of this token relative to the previous Token in a TokenStream, used in 
phrase searching.
I am not doing phrase searching.
Would you mind explaining how it can help me?

Thanks,
Ira

-----Original Message-----
From: Mikhail Khludnev [mailto:[email protected]] 
Sent: Tuesday, June 26, 2018 12:33 PM
To: [email protected]
Subject: Re: How search code files for words which contains a given substrings?

Hello, Ira.
Note the difference between offset
https://lucene.apache.org/core/7_3_0/core/org/apache/lucene/analysis/tokenattributes/OffsetAttribute.html
and
position
https://lucene.apache.org/core/7_3_0/core/org/apache/lucene/analysis/tokenattributes/PositionIncrementAttribute.html
in Lucene terminology.
Please make sure you aren't rebuilding existing functionality; see the highlighter package:
https://lucene.apache.org/core/7_3_1/highlighter/org/apache/lucene/search/highlight/package-summary.html#package.description
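To make the offset/position distinction concrete, here is a small stdlib-only Java illustration. It is not Lucene code: the class OffsetsVsPositions and its tokenize method are hypothetical and only mimic what a whitespace tokenizer reports through OffsetAttribute (character start/end offsets into the text) and through PositionIncrementAttribute (increments summed into a token ordinal, here always +1 per token). Extra whitespace shifts the offsets but leaves the positions unchanged.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, stdlib-only stand-in for a whitespace TokenStream (not Lucene code). */
final class OffsetsVsPositions {

    static final class Tok {
        final String term;
        final int startOffset; // char index of the first character (cf. OffsetAttribute)
        final int endOffset;   // char index one past the last character
        final int position;    // token ordinal, accumulated from position increments

        Tok(String term, int startOffset, int endOffset, int position) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + " offsets=[" + startOffset + "," + endOffset + ") position=" + position;
        }
    }

    static List<Tok> tokenize(String text) {
        List<Tok> tokens = new ArrayList<>();
        int position = -1; // start at -1 so the first increment of 1 yields position 0
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++; // separators move offsets forward but never create a position
            }
            if (i == text.length()) {
                break;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            position += 1; // increment of 1 per token, nothing removed in between
            tokens.add(new Tok(text.substring(start, i), start, i, position));
        }
        return tokens;
    }

    public static void main(String[] args) {
        tokenize("Mama loves banana").forEach(System.out::println);
        tokenize("  Mama   loves banana").forEach(System.out::println);
    }
}
```

In both runs "banana" has position 2, while its offsets differ because of the extra spaces.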

On Tue, Jun 26, 2018 at 10:57 AM Gordin, Ira <[email protected]> wrote:

> Hi all,
> I started working on a project that currently searches code files for words
> containing a given substring.
> It currently uses WhitespaceTokenizer and a regex query that wraps the
> searched substring with '.*'.
> For example, if one searches for 'a', the query will be '/.*a.*/'. This
> way, in the text 'Mama loves banana', it finds the tokens 'Mama' and
> 'banana'.
> Now I need to get the start and end positions of the matches within the
> line, as well as the line number.
> With TokenStream I can get the start and end positions of 'Mama' and
> 'banana' in the full text, but I need the positions of 'a'.
> I see two options.
> Option 1: perform an additional search within each returned token.
> Option 2: use NGramTokenizer or NGramTokenFilter (I'm not sure which) and
> hope to get the positions of 'a' from the TokenStream.
> An additional question: how can I get the line numbers and the positions
> within each line?
> Many thanks in advance for your help,
> Ira
>
>
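Option 1, together with the line-number question, can be sketched in plain Java. The sketch assumes the start offset of each matched token is already known (in Lucene it would come from OffsetAttribute.startOffset() while consuming the TokenStream) and that the indexed text is exactly the file content, so offsets are Java char indexes into it. SubstringHits and its methods are hypothetical helpers, not a Lucene API. Each occurrence of the substring inside the token becomes an absolute offset, which is then mapped to a 1-based line and column using a table of line start offsets and a binary search.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for Option 1: locate a substring inside a matched token. */
final class SubstringHits {

    /** One occurrence of the searched substring; line and column are 1-based. */
    static final class Hit {
        final int start;  // absolute offset of the first char
        final int end;    // absolute offset one past the last char
        final int line;
        final int column;

        Hit(int start, int end, int line, int column) {
            this.start = start;
            this.end = end;
            this.line = line;
            this.column = column;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ") line " + line + ", column " + column;
        }
    }

    /** Offset at which each line begins: 0, then one past every '\n' ("\r\n" also works). */
    static int[] lineStarts(String text) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0);
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                starts.add(i + 1);
            }
        }
        return starts.stream().mapToInt(Integer::intValue).toArray();
    }

    /** 0-based index of the line containing the given offset. */
    static int lineIndex(int offset, int[] lineStarts) {
        int idx = Arrays.binarySearch(lineStarts, offset);
        // A miss returns -(insertionPoint) - 1; the containing line is insertionPoint - 1.
        return idx >= 0 ? idx : -idx - 2;
    }

    /** Every occurrence of needle inside token; tokenStart is the token's absolute start offset. */
    static List<Hit> hitsInToken(String token, int tokenStart, String needle, int[] lineStarts) {
        if (needle.isEmpty()) {
            throw new IllegalArgumentException("needle must not be empty");
        }
        List<Hit> hits = new ArrayList<>();
        // Advance by one char so overlapping occurrences are reported too.
        for (int i = token.indexOf(needle); i >= 0; i = token.indexOf(needle, i + 1)) {
            int start = tokenStart + i;
            int line = lineIndex(start, lineStarts);
            hits.add(new Hit(start, start + needle.length(), line + 1, start - lineStarts[line] + 1));
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Mama loves banana\nsay aha";
        int[] starts = lineStarts(text);
        // 11 and 22 stand in for the start offsets a TokenStream would report for these tokens.
        System.out.println(hitsInToken("banana", 11, "a", starts));
        System.out.println(hitsInToken("aha", 22, "a", starts));
    }
}
```

The line-start table is built once per file, so each offset lookup costs O(log lines).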

-- 
Sincerely yours
Mikhail Khludnev
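For Option 2, the stdlib sketch below simulates what a character n-gram tokenizer conceptually produces: every gram of length minGram..maxGram inside each whitespace-separated word, each carrying its own offsets. With minGram = maxGram = the substring length, the grams equal to the substring are exactly the hits. CharNGrams is a hypothetical class, not Lucene's NGramTokenizer. How the real classes behave should be checked against the 7.x javadoc; as far as I know, NGramTokenFilter keeps the original token's offsets for all of its grams, while NGramTokenizer gives each gram its own offsets but does not split on whitespace unless isTokenChar is overridden. Supporting arbitrary substring lengths also means indexing every gram length in range, which multiplies the number of terms.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of char n-grams with per-gram offsets (not Lucene's NGramTokenizer). */
final class CharNGrams {

    static final class Gram {
        final String text;
        final int startOffset;
        final int endOffset;

        Gram(String text, int startOffset, int endOffset) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return text + " [" + startOffset + "," + endOffset + ")";
        }
    }

    /** Grams of length minGram..maxGram lying inside a whitespace-separated word (UTF-16 chars). */
    static List<Gram> grams(String text, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("need 1 <= minGram <= maxGram");
        }
        List<Gram> grams = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int wordStart = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int wordEnd = i;
            // Each gram gets offsets of its own, not the offsets of the enclosing word.
            for (int start = wordStart; start < wordEnd; start++) {
                for (int len = minGram; len <= maxGram && start + len <= wordEnd; len++) {
                    grams.add(new Gram(text.substring(start, start + len), start, start + len));
                }
            }
        }
        return grams;
    }

    /** Start offsets of needle, found among grams of exactly needle.length() chars. */
    static List<Integer> startsOf(String text, String needle) {
        List<Integer> starts = new ArrayList<>();
        for (Gram g : grams(text, needle.length(), needle.length())) {
            if (g.text.equals(needle)) {
                starts.add(g.startOffset);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(grams("ab cd", 1, 2));
        System.out.println(startsOf("Mama loves banana", "a"));
    }
}
```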
