Re: DelimitedTermFrequencyTokenFilter

Alan Woodward Fri, 29 Nov 2019 02:13:21 -0800

I think it’s working fine - Luke is showing you the docFreq of the term, which 
will be 1 as it only appears in a single document.
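
To make the difference between these statistics concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: the class TermStatsSketch and its methods are hypothetical names that only model how the statistics are defined, assuming the same "term|freq" input format as in the quoted message.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of index statistics; NOT the Lucene API.
public class TermStatsSketch {

    /** Parses one document such as "a|10 b|23 c|90" into term -> frequency. */
    public static Map<String, Integer> parse(String doc) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        for (String token : doc.trim().split("\\s+")) {
            int bar = token.lastIndexOf('|');
            // A token without a delimiter counts as a single occurrence.
            String term = bar < 0 ? token : token.substring(0, bar);
            int freq = bar < 0 ? 1 : Integer.parseInt(token.substring(bar + 1));
            freqs.merge(term, freq, Integer::sum);
        }
        return freqs;
    }

    /** Number of documents that contain the term at least once. */
    public static int docFreq(List<Map<String, Integer>> docs, String term) {
        int count = 0;
        for (Map<String, Integer> doc : docs) {
            if (doc.containsKey(term)) {
                count++;
            }
        }
        return count;
    }

    /** Total occurrences of the term across all documents. */
    public static long totalTermFreq(List<Map<String, Integer>> docs, String term) {
        long total = 0;
        for (Map<String, Integer> doc : docs) {
            total += doc.getOrDefault(term, 0);
        }
        return total;
    }

    /** Sum of docFreq over all terms in the field. */
    public static long sumDocFreq(List<Map<String, Integer>> docs) {
        long total = 0;
        for (Map<String, Integer> doc : docs) {
            total += doc.size();
        }
        return total;
    }

    /** Sum of totalTermFreq over all terms in the field. */
    public static long sumTotalTermFreq(List<Map<String, Integer>> docs) {
        long total = 0;
        for (Map<String, Integer> doc : docs) {
            for (int freq : doc.values()) {
                total += freq;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> docs = List.of(parse("a|10 b|23 c|90"));
        for (String term : docs.get(0).keySet()) {
            System.out.println(term + ": docFreq=" + docFreq(docs, term)
                    + ", totalTermFreq=" + totalTermFreq(docs, term));
        }
        System.out.println("sumTotalTermFreq=" + sumTotalTermFreq(docs)
                + ", sumDocFreq=" + sumDocFreq(docs));
    }
}
```

With a single document, every term has a docFreq of 1 regardless of its frequency inside that document, while the frequencies add up to the sumTotalTermFreq of 123 and the sumDocFreq of 3 reported in the quoted output.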


> On 28 Nov 2019, at 21:51, Edward Ribeiro <edward.ribe...@gmail.com> wrote:
> 
> Hi,
> 
> Does anyone have an example of DelimitedTermFrequencyTokenFilter usage that
> they could share?
> 
> I have been banging my head against the wall trying to make it work
> (https://gist.github.com/eribeiro/ebb24feb3fd84931b7c288b9b716ed49) and I
> don't know what I am doing wrong.
> 
> I am creating a custom analyzer that uses a WhitespaceTokenizer to parse a
> string like "a|10 b|2 c|9" and passes the tokens to
> DelimitedTermFrequencyTokenFilter. I am adding the text to the document
> through a custom field so that it is indexed without positions and offsets.
> 
> The debugger shows the string is being correctly parsed by DTFTF and its char 
> and term attributes are properly set up. But the term frequency of each term 
> is 1 when I inspect the index via Luke. Curiously, the output of my snippet 
> shows the correct total term frequency as seen below:
> 
> field="text",maxDoc=1,docCount=1,sumTotalTermFreq=123,sumDocFreq=3
> a|10 b|23 c|90
> SumTotalTermFreq: 123
> SumDocFreq: 3
> 
> Cheers,
> Edward
> PS: I am a Lucene newbie so it may be something quite stupid. 
>
