Re: [PR] JAMES-4046 Upgrade Lucene [james-project]

via GitHub Sat, 10 Aug 2024 15:20:36 -0700


woj-tek commented on PR #2342:
URL: https://github.com/apache/james-project/pull/2342#issuecomment-2282298701


   Some more information: it looks like the ID field (`flags-<uid>-<mid>`) 
ends up being tokenized for some reason, and because the tokenizer follows the 
spec, it splits the value at the dashes:
   
   > Lucene's StandardTokenizer for 9.11 uses the Unicode Text Segmentation
   > algorithm, as specified in Unicode Standard Annex #29
   > <http://unicode.org/reports/tr29/>.
   > That standard contains a "-" as a word breaker.
   
   I was looking at `createFlagsDocument()` (it adds the ID of the mailbox and 
of the message as separate fields) and at the `update()` method. `update()` 
uses UID+MID in the search query to find the document, but then updates it via 
a Term (which requires a string, hence the fallback to the ID_FIELD with 
`flags-<uid>-<mid>`). Maybe instead of the `updateDocument()` method that 
requires a Term we could use the one that takes a query (which we could re-use 
from the initial search for the UID+MID combination):
   
   ```
   writer.updateDocuments(queryBuilder.build(), List.of(doc));
   ```
   
   From a quick test it seems to work OK.
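
To illustrate why a lookup by the full ID string can miss once the field is tokenized, here is a minimal plain-Java sketch. It is a toy model and does not use Lucene: `TokenizedIdDemo`, `tokenize`, `index` and `lookupExact` are hypothetical names, and splitting on non-alphanumeric characters is only an assumed stand-in for what the tokenizer does to a value like `flags-42-7`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only, NOT Lucene: shows how a tokenized ID defeats exact-term lookup.
class TokenizedIdDemo {

    // Assumed stand-in for analysis: split on runs of non-alphanumeric characters.
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.split("[^A-Za-z0-9]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase());
            }
        }
        return tokens;
    }

    // Minimal inverted index: token -> positions (in the input list) of the IDs containing it.
    static Map<String, List<Integer>> index(List<String> ids) {
        Map<String, List<Integer>> postings = new HashMap<>();
        for (int docId = 0; docId < ids.size(); docId++) {
            for (String token : tokenize(ids.get(docId))) {
                postings.computeIfAbsent(token, k -> new ArrayList<>()).add(docId);
            }
        }
        return postings;
    }

    // Exact-term lookup: the lookup value is compared as-is, without being tokenized.
    static List<Integer> lookupExact(Map<String, List<Integer>> postings, String term) {
        return postings.getOrDefault(term, List.of());
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> postings = index(List.of("flags-42-7", "flags-42-8"));
        System.out.println(tokenize("flags-42-7"));
        System.out.println(lookupExact(postings, "flags-42-7"));
        System.out.println(lookupExact(postings, "flags"));
    }
}
```

In this model the full ID is never stored as a single token, so the exact lookup for `flags-42-7` finds nothing, while the fragment `flags` matches both entries. That is the motivation for selecting the document with the UID+MID query rather than with a Term on the ID field.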


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


