woj-tek commented on PR #2342: URL: https://github.com/apache/james-project/pull/2342#issuecomment-2282298701
Some more information: it looks like, for some reason, the ID field (`flags-<uid>-<mid>`) ends up being tokenized, and because the tokenizer follows the spec, it splits the value at the dashes:

> Lucene's StandardTokenizer for 9.11 uses the Unicode Text Segmentation
> algorithm, as specified in Unicode Standard Annex #29
> <http://unicode.org/reports/tr29/>.
> That standard contains a "-" as a word breaker.

I was looking at `createFlagsDocument()` (it adds the ID of the mailbox and of the message as separate fields) and at the `update()` method. `update()` uses UID+MID in the search query to find the document, but then calls `updateDocument()` with a `Term` (which requires a string, hence the fallback to ID_FIELD with `flags-<uid>-<mid>`). Maybe, instead of the `updateDocument()` method that requires a `Term`, we could use the one that takes a query (re-using the query from the initial UID+MID search):

```
writer.updateDocuments(queryBuilder.build(), List.of(doc));
```

From a quick test, it seems to work OK.
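To illustrate why an exact `Term` lookup can miss a tokenized ID, here is a toy model in plain Java. It is not Lucene code: the class `TokenizedIdSketch`, its `tokenize` and `index` helpers, and the sample IDs are all hypothetical, and the splitter is only a rough stand-in for the UAX #29 word-break rules (it simply splits on anything that is not a letter or digit).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TokenizedIdSketch {
    // Simplified stand-in for a word-segmenting tokenizer (an assumption, not
    // Lucene's implementation): split on any non-letter, non-digit character.
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase());
            }
        }
        return tokens;
    }

    // Toy inverted index: indexed term -> ids of the documents containing it.
    // With tokenized=false the whole value is stored as a single term.
    static Map<String, List<Integer>> index(List<String> values, boolean tokenized) {
        Map<String, List<Integer>> postings = new HashMap<>();
        for (int docId = 0; docId < values.size(); docId++) {
            String value = values.get(docId);
            List<String> terms = tokenized ? tokenize(value) : List.of(value);
            for (String term : terms) {
                postings.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
            }
        }
        return postings;
    }

    public static void main(String[] args) {
        // Made-up IDs in the flags-<uid>-<mid> shape.
        List<String> ids = List.of("flags-12-34", "flags-12-35");
        Map<String, List<Integer>> tokenized = index(ids, true);
        Map<String, List<Integer>> verbatim = index(ids, false);

        // The full ID is not a term in the tokenized index, so an exact lookup misses.
        System.out.println(tokenized.containsKey("flags-12-34")); // false
        // The fragments are terms instead, and they are shared between documents.
        System.out.println(tokenized.get("12"));                  // [0, 1]
        // Indexed verbatim, the full ID identifies exactly one document.
        System.out.println(verbatim.get("flags-12-34"));          // [0]
    }
}
```

In this model, the two ways out are the same as in the comment above: either index the ID as a single untokenized term, or select the document by a query on the separate UID and MID fields instead of by the combined ID.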
