This is probably difficult: in BaseX, fuzzy matching is implemented using the Levenshtein distance between two strings [1]. Similarity is therefore a relation between pairs of paragraphs rather than an intrinsic property of an individual paragraph, so there is no per-paragraph value that could serve as a grouping key.
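
To illustrate the pairwise nature, here is a rough sketch that compares every paragraph with every later one. It assumes the BaseX Strings Module function strings:levenshtein, which, as far as I recall, returns a similarity between 0.0 and 1.0. The 0.8 threshold and the <similar-pair> element name are arbitrary choices of mine.

```xquery
(: Sketch only. Assumes strings:levenshtein($a, $b) from the BaseX
   Strings Module returns a similarity in the range 0.0 to 1.0. :)
for $a in //p
(: only paragraphs after $a, so each unordered pair is visited once :)
for $b in //p[. >> $a]
(: 0.8 is an arbitrary threshold, not a recommended value :)
where strings:levenshtein(string($a), string($b)) ge 0.8
return <similar-pair>{ $a, $b }</similar-pair>
```

This yields pairs, not groups, and the number of comparisons grows quadratically with the number of paragraphs.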

You should look into content fingerprinting or clustering techniques instead, which assign each paragraph a key or cluster label that can then be grouped on.
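
As a very simple example of a fingerprint, the sketch below reduces each paragraph to its sorted set of lower-cased words and groups on that. local:fingerprint is a hypothetical helper, not part of BaseX, and the normalization steps are assumptions of mine. It uses only standard XQuery 3.1 functions.

```xquery
(: Hypothetical helper: derives a crude fingerprint from one string by
   lower-casing it, dropping punctuation, and joining the sorted set of
   distinct words. :)
declare function local:fingerprint($text as xs:string) as xs:string {
  let $clean  := replace(lower-case(normalize-space($text)), '[^\p{L}\p{N} ]', '')
  let $tokens := distinct-values(tokenize($clean, ' ')[. ne ''])
  return string-join(sort($tokens), ' ')
};

for $x in //p
let $key := local:fingerprint(string($x))
group by $key
(: keep only keys shared by more than one paragraph :)
where count($x) > 1
return <similar-paragraphs>{ $x }</similar-paragraphs>
```

Note that this only groups paragraphs that differ in case, punctuation, word order or repeated words. A single typo changes the key, so it is not a replacement for real fuzzy matching. More robust fingerprints (shingling, MinHash and the like) follow the same pattern of computing a key per paragraph.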

[1] https://docs.basex.org/wiki/Full-Text#Fuzzy_Querying


On 12.11.2020 00:00, Graydon Saunders wrote:
Hello --

Is there some way to assign the abstraction of a fuzzy match to a variable, so that something like

for $x in //p
   let $key := get-fuzzy-match-value($x)
   group by $key
   return <similar-paragraphs>{$x}</similar-paragraphs>

would be possible?

I'm supposing this is one of those things that's either easy or impossible.

Thanks!
Graydon
