Hi all,
I have started working on a project that currently searches code files for
words that contain a given substring.
It currently uses WhitespaceTokenizer and a regex query that wraps the
searched substring in '.*'.
For example, if one searches for 'a', the query will be '/.*a.*/'. This way, in
the text 'Mama loves banana', it will find the tokens 'Mama' and 'banana'.
Now I need to get the line number and the start and end positions of the
matched tokens within that line.
With TokenStream I can get the start and end positions of 'Mama' and 'banana'
in the full text, but what I need are the positions of 'a' itself.
I see two options.
Option 1: perform an additional search inside each returned token.
Option 2: use NGramTokenizer or NGramTokenFilter (I am not sure which of them),
in the hope that this gives me the positions of 'a' in the TokenStream.
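To make Option 1 concrete, here is a rough sketch of what I have in mind. It is plain Java with no Lucene calls. It assumes I already have the token text and its start offset in the full text (for example from TokenStream), and that offsets are counted in Java chars. The class and method names are only placeholders of mine.

```java
import java.util.ArrayList;
import java.util.List;

final class SubstringOffsets {
    /**
     * Returns the [start, end) offsets, in full-text coordinates, of every
     * occurrence of needle inside a single token.
     * tokenStart is the token's start offset in the full text (assumed known).
     */
    static List<int[]> find(String token, int tokenStart, String needle) {
        List<int[]> hits = new ArrayList<>();
        if (needle.isEmpty()) {
            return hits; // nothing sensible to report for an empty substring
        }
        int from = 0;
        while (true) {
            int idx = token.indexOf(needle, from);
            if (idx < 0) {
                break;
            }
            hits.add(new int[] { tokenStart + idx, tokenStart + idx + needle.length() });
            from = idx + 1; // advance by one so overlapping matches are also found
        }
        return hits;
    }
}
```

For 'Mama loves banana', the token 'banana' starts at offset 11, so find("banana", 11, "a") would give the ranges [12,13), [14,15) and [16,17). The search is case-sensitive, which I believe matches the behaviour of the regex query.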
An additional question: how can I get the line numbers and the positions inside
the line?
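One approach I can imagine, independent of Lucene, is to record where each line starts in the full text and then map a character offset to a line and a column with a binary search. The sketch below assumes that the offsets refer to the same String that was tokenized and that lines are separated by '\n'. LineIndex is a hypothetical helper name of mine, not an existing class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LineIndex {
    // lineStarts[i] is the offset of the first character of line i (0-based)
    private final int[] lineStarts;

    LineIndex(String text) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0);
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                starts.add(i + 1); // the next line begins right after the newline
            }
        }
        lineStarts = starts.stream().mapToInt(Integer::intValue).toArray();
    }

    /** 0-based number of the line that contains the given offset. */
    int lineOf(int offset) {
        int pos = Arrays.binarySearch(lineStarts, offset);
        // When not found, binarySearch returns -(insertionPoint) - 1;
        // the containing line is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** 0-based position of the given offset inside its line. */
    int columnOf(int offset) {
        return offset - lineStarts[lineOf(offset)];
    }
}
```

With the text "Mama loves\nbanana", offset 12 (the first 'a' of 'banana') would map to line 1, column 1. Add 1 to both values if 1-based numbering is wanted.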
Many thanks in advance for your help,
Ira