Re: [PR] JAMES-4046 Upgrade Lucene [james-project]

via GitHub Mon, 12 Aug 2024 16:17:50 -0700


gautamworah96 commented on PR #2342:
URL: https://github.com/apache/james-project/pull/2342#issuecomment-2285055678

Hi @uschindler

I didn't understand a few details:

> Lucene does not know how the document was indexed originally, the
> information is not stored anywhere in index. When you then reindex the document
> it will apply default settings for all the fields in the Document (which is
> analyzed/tokenized with the default analyzer). This will transform all
> StringField instances to TextField.

Let's take the code
[here](https://github.com/tigase/james-project/commit/e5fe4010131f085754cfadcffdb14224612bb848)
as a reference for this discussion. At L1293 of
[LuceneMessageSearchIndex.java](https://github.com/tigase/james-project/commit/e5fe4010131f085754cfadcffdb14224612bb848#diff-a7c2a3c5cdb7e4a2914c899409991e27df6b25ad54488f197bc533193e3a03d0),
they tried a TermQuery on the ID field, and that failed too. That should at
least have worked, right? Sure, Lucene does not store how a field was indexed,
but it would still have stored the untokenized ID as a term (without any record
of whether or not it was tokenized), wouldn't it? All the TermQuery had to do
was match against the terms indexed in the ID field.

Even the updates made on L1282 used the same StringField.
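
To make the question concrete, here is a toy model of the behaviour the quoted explanation describes. It is a sketch only: it does not use Lucene, the class `ReindexSketch`, its methods, and the ID value `MBX-42:UID-7` are all hypothetical, and the "analyzer" is a simplified stand-in (lower-case, split on non-alphanumerics) rather than Lucene's actual default. It assumes the ID reaches the index as a tokenized field on the second write, which is exactly the point under discussion.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy inverted index for a single field. NOT Lucene; illustration only.
public class ReindexSketch {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Models an untokenized field: the whole value becomes one term.
    static List<String> keywordTerms(String value) {
        return List.of(value);
    }

    // Models a tokenized field with a simplified, hypothetical analyzer:
    // lower-case the value, then split on anything that is not a letter or digit.
    static List<String> analyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    void index(int docId, List<String> terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
        }
    }

    void delete(int docId) {
        postings.values().forEach(ids -> ids.remove(docId));
        postings.values().removeIf(Set::isEmpty);
    }

    // Models an exact-term lookup: the query term is not analyzed.
    boolean exactTermMatches(String term) {
        return postings.containsKey(term);
    }

    public static void main(String[] args) {
        String id = "MBX-42:UID-7"; // hypothetical ID value
        ReindexSketch idx = new ReindexSketch();

        // First write: ID indexed verbatim, so the exact-term lookup hits.
        idx.index(1, keywordTerms(id));
        System.out.println("after first index:  " + idx.exactTermMatches(id));

        // Re-write of the same document, but the ID now goes through the analyzer.
        idx.delete(1);
        idx.index(1, analyzedTerms(id));
        System.out.println("after re-index:     " + idx.exactTermMatches(id));
        System.out.println("terms now in index: " + analyzedTerms(id));
    }
}
```

In this model the exact-term lookup only fails once the ID has been written through the analyzer; as long as every write keeps the value as a single verbatim term, the lookup keeps matching, which is what the question above is getting at.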

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
