Hi Tim,

> I have a question about the BaseX ft:normalize function. What kind of Unicode 
> normalization is performed by this function, and how might it be implemented 
> using standard XPath functions?

The function is based on the custom BaseX full-text tokenization, which
normalizes case, removes diacritics and (if enabled) applies
language-based stemming. Reproducing this behavior with standard XPath
would be rather challenging, which is mostly why we introduced
ft:tokenize and ft:normalize. If you are looking for a starting
point, you could begin with the FtTokenize Java class [1].
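
To illustrate the first two steps (case and diacritics), here is a
rough sketch in plain Java. It is not the BaseX code and will not give
identical results: the class name ApproxNormalize is made up for this
example, it assumes that NFD decomposition followed by dropping
combining marks is an acceptable stand-in for diacritics removal, and
it ignores tokenization and stemming entirely.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, NOT part of BaseX: approximates case folding
// and diacritics removal with standard JDK classes only.
final class ApproxNormalize {

    static String normalize(String input) {
        // Step 1: locale-independent lower-casing.
        String lower = input.toLowerCase(Locale.ROOT);
        // Step 2: canonical decomposition (NFD) splits an accented letter
        // into its base letter plus one or more combining marks.
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        // Step 3: drop all characters in the Unicode "Mark" category.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Input is a French dessert name with three accented letters,
        // written with Unicode escapes to stay encoding-safe.
        System.out.println(normalize("Cr\u00E8me Br\u00FBl\u00E9e")); // prints "creme brulee"
    }
}
```

The same three steps map onto standard XPath as lower-case(),
normalize-unicode($s, 'NFD') and replace(..., '\p{M}', ''). Letters
that have no canonical decomposition (for example "ø" or "ß") pass
through unchanged in this sketch, so expect differences to
ft:normalize.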

Hope this helps,
Christian

[1] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/query/func/ft/FtTokenize.java#L31-L51
