Hi all,
I have started working on a project that searches code files for words that
contain a given substring.
It currently uses WhitespaceTokenizer and a regex query that wraps the
searched substring in '.*'.
For example, if one searches for 'a', the query is '/.*a.*/'. This way, in
the text 'Mama loves banana', it finds the tokens 'Mama' and 'banana'.
I now need to get the start and end positions of the matched substring within
its line, along with the line number.
With TokenStream I can get the start and end positions of 'Mama' and 'banana' in
the full text, but I need the positions of 'a'.
I see two options.
Option 1: perform an additional search inside each returned token.
Option 2: use NGramTokenizer or NGramTokenFilter (I am not sure which), in the
hope that the TokenStream will then give me the positions of 'a'.
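To make Option 1 concrete, here is a rough sketch in plain Java (no Lucene
classes). It assumes the token text and its start offset in the full text have
already been read from the TokenStream, for example via CharTermAttribute and
OffsetAttribute.startOffset(). The class name SubstringInToken and the method
find are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class SubstringInToken {
    /**
     * Finds every occurrence of needle inside token and returns
     * {start, end} offsets (end exclusive) relative to the full text.
     * tokenStartOffset is assumed to be the token's start offset in the text.
     */
    static List<int[]> find(String token, int tokenStartOffset, String needle) {
        List<int[]> hits = new ArrayList<>();
        if (needle.isEmpty()) {
            return hits; // an empty needle would match everywhere; report nothing
        }
        int from = 0;
        while (true) {
            int idx = token.indexOf(needle, from);
            if (idx < 0) {
                break;
            }
            hits.add(new int[] {tokenStartOffset + idx,
                                tokenStartOffset + idx + needle.length()});
            from = idx + 1; // step by one so overlapping matches are also found
        }
        return hits;
    }
}
```

In 'Mama loves banana' the token 'banana' starts at offset 11, so
SubstringInToken.find("banana", 11, "a") reports the ranges 12-13, 14-15 and
16-17. The search is case-sensitive, like String.indexOf.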
An additional question: how can I get the line numbers and the positions inside
the line?
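One way I can imagine doing this, sketched in plain Java under the assumption
that the whole file content is available as a String and that the offsets are
character offsets into that String: record where each line starts, then
binary-search that list for a given offset. The class name LineIndex and the
method locate are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class LineIndex {
    // Offset of the first character of each line, in ascending order.
    private final List<Integer> lineStarts = new ArrayList<>();

    LineIndex(String text) {
        lineStarts.add(0); // the first line always starts at offset 0
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                lineStarts.add(i + 1); // the next line starts after the newline
            }
        }
    }

    /** Returns {line number (1-based), position in line (0-based)}. */
    int[] locate(int offset) {
        int pos = Collections.binarySearch(lineStarts, offset);
        // Not found: pos == -(insertionPoint) - 1, and the containing line
        // is the one just before the insertion point.
        int line = pos >= 0 ? pos : -pos - 2;
        return new int[] {line + 1, offset - lineStarts.get(line)};
    }
}
```

For the text "int a;\nMama loves banana\n", new LineIndex(text).locate(8)
gives line 2, position 1, which is the first 'a' of 'Mama'. Only '\n' is
treated as a line break here; with "\r\n" line endings the positions are still
correct, because the '\r' sits at the end of its line.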
Many thanks in advance for your help,
Ira
