[GitHub] [lucenenet] rclabo edited a comment on issue #569: Int64Field tokenized

GitBox Fri, 10 Dec 2021 08:48:49 -0800


rclabo edited a comment on issue #569:
URL: https://github.com/apache/lucenenet/issues/569#issuecomment-991120781



   Emre – It’s a really good question.  I’ve wondered the same thing myself.
Your question prompted me to do a bit of digging, and this is the
conclusion I reached:
   
   It seems that Lucene considers the step of converting an Int64Field into a
trie structure for indexing to be a form of tokenization.  While the approach
does not use an Analyzer per se, Lucene does substantially change the form of
the number before putting that new representation into the index.  A
non-tokenized field is placed in the inverted index as-is, as a single term.
That is not the case for numbers: what is placed in the inverted index is a
set of terms forming a trie structure that corresponds to the number.  That
trie structure often has 8 terms, but the number of terms will vary based on
the numeric Field’s NumericPrecisionStep.
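
   To make the relationship between the precision step and the number of
terms concrete, here is a minimal sketch in Python.  It is not Lucene's code
or API: `trie_terms` is a hypothetical helper, and it ignores the actual
prefix-coded byte encoding that Lucene writes.  It only assumes that one term
is emitted per shift value, starting at 0 and increasing by the precision
step while the shift is still below the bit width of the value.

```python
def trie_terms(value, precision_step, bits=64):
    """Return one (shift, prefix) pair per term for a signed integer.

    Hypothetical illustration only; not the Lucene implementation.
    """
    if precision_step < 1:
        raise ValueError("precision_step must be >= 1")
    mask = (1 << bits) - 1
    # Flip the sign bit so unsigned ordering matches signed ordering.
    sortable = (value & mask) ^ (1 << (bits - 1))
    # Each term keeps only the bits above `shift`, i.e. a coarser prefix.
    return [(shift, sortable >> shift)
            for shift in range(0, bits, precision_step)]


for step in (4, 8, 16):
    print(step, len(trie_terms(1234567890, step)))
```

   Under those assumptions a 64-bit value produces 16 terms at a precision
step of 4, 8 terms at a step of 8, and 4 terms at a step of 16, which is why
a single numeric value ends up as several terms rather than one.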
   
   One piece of code that sheds a bit of light on this is
https://github.com/apache/lucenenet/blob/Lucene.Net_4_8_0_beta00015/src/Lucene.Net/Document/Field.cs#L168
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
