DelimitedTermFrequencyTokenFilter

Edward Ribeiro Thu, 28 Nov 2019 13:52:45 -0800

Hi,

Does anyone have an example of DelimitedTermFrequencyTokenFilter usage that
they could share?


I have been banging my head against the wall trying to make it work (
https://gist.github.com/eribeiro/ebb24feb3fd84931b7c288b9b716ed49 ) and I
don't know what I am doing wrong.

I am creating a custom analyzer that uses a WhitespaceTokenizer to split a
string like "a|10 b|2 c|9" and passes the resulting tokens to
DelimitedTermFrequencyTokenFilter. I add the text to the document through a
custom field, set up so that it is indexed without positions and offsets.

The debugger shows that the string is being parsed correctly by
DelimitedTermFrequencyTokenFilter and that its char and term attributes are
properly set up. However, when I inspect the index via Luke, the term
frequency of each term is 1. Curiously, the output of my snippet shows the
correct total term frequency, as seen below:

field="text",maxDoc=1,docCount=1,sumTotalTermFreq=123,sumDocFreq=3
a|10 b|23 c|90
SumTotalTermFreq: 123
SumDocFreq: 3
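
As a cross-check of the numbers above, here is a minimal plain-Java sketch
(no Lucene dependency) that parses the same "term|freq" input and adds up the
frequencies. It only mirrors the arithmetic behind the reported statistics;
it is not how DelimitedTermFrequencyTokenFilter is implemented, and the class
and method names (TermFreqCheck, parse, sumTotalTermFreq) are hypothetical
names chosen for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TermFreqCheck {

    // Splits the text on whitespace and reads each token as "term|freq".
    // Assumption for this sketch: a token without '|' counts as frequency 1.
    static Map<String, Integer> parse(String text) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields a single empty token
            }
            int bar = token.lastIndexOf('|');
            String term = bar < 0 ? token : token.substring(0, bar);
            int freq = bar < 0 ? 1 : Integer.parseInt(token.substring(bar + 1));
            // Repeated terms accumulate their frequencies.
            freqs.merge(term, freq, Integer::sum);
        }
        return freqs;
    }

    // Sum of all per-term frequencies for a single document.
    static long sumTotalTermFreq(Map<String, Integer> freqs) {
        long total = 0;
        for (int f : freqs.values()) {
            total += f;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs = parse("a|10 b|23 c|90");
        System.out.println("SumTotalTermFreq: " + sumTotalTermFreq(freqs));
        // With a single document, each distinct term contributes 1 to sumDocFreq.
        System.out.println("SumDocFreq: " + freqs.size());
    }
}
```

With one document, 10 + 23 + 90 = 123 and there are three distinct terms, so
the field-level statistics agree with the input even though Luke shows a
per-term frequency of 1.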

Cheers,
Edward
PS: I am a Lucene newbie so it may be something quite stupid.
