rclabo edited a comment on issue #569:
URL: https://github.com/apache/lucenenet/issues/569#issuecomment-991120781
Emre – It’s a really good question. I’ve wondered the same thing before as
well. Your question prompted me to do a bit of digging and this is the
conclusion I reached:
It seems that Lucene considers the step of converting an Int64Field into a
trie structure for indexing to be a form of tokenization. While the approach
does not use an Analyzer per se, Lucene does greatly change the form of the
number before putting that new representation into the index. Non-tokenized
field values are placed directly in the inverted index, which is not the case
for numbers, since what is placed in the inverted index is a trie structure
corresponding to the number. That trie structure often has 8 terms, which are
placed in the inverted index, but the number of terms will vary based on the
numeric field’s NumericPrecisionStep.
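
To make the relationship between the precision step and the number of terms
concrete, here is a rough sketch in Python. It is not Lucene's actual
encoding: the real implementation also flips the sign bit and prefix-codes
each term into bytes, which is skipped here. The names trie_terms,
precision_step and value_bits are made up for illustration. The only idea it
shows is that one term is emitted per shift, each keeping fewer of the low
bits, so a 64-bit value yields ceil(64 / precision step) terms.

```python
def trie_terms(value, precision_step, value_bits=64):
    """Return one (shift, prefix) pair per term for a non-negative integer.

    Simplified illustration only; Lucene's real terms are byte-encoded.
    """
    if precision_step < 1:
        raise ValueError("precision_step must be >= 1")
    terms = []
    shift = 0
    while shift < value_bits:
        # Drop the lowest `shift` bits; what remains is a coarser prefix.
        terms.append((shift, value >> shift))
        shift += precision_step
    return terms


# Assumed example value and precision steps, chosen only for illustration.
print(len(trie_terms(1234567890123, precision_step=8)))  # 8
print(len(trie_terms(1234567890123, precision_step=4)))  # 16
```

Under these assumptions, a precision step of 8 on a 64-bit value gives the 8
terms mentioned above, and a smaller step gives more terms (and so a larger
index, but fewer terms to visit per range query).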
One piece of code that sheds a bit of light on this is:
https://github.com/apache/lucenenet/blob/Lucene.Net_4_8_0_beta00015/src/Lucene.Net/Document/Field.cs#L168