Hello,

we have used the Apriori algorithm to detect long identical text passages (https://link.springer.com/chapter/10.1007/978-3-030-86159-9_34). It works quite well. I am not sure whether Frieda Jsi has published the code, but it is quite easy to implement, or I can send you our code.
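For anyone who wants to try the idea, the following is a minimal Python sketch of an Apriori-style, level-wise search for long identical passages. It is an assumption about how Apriori transfers to text, not the code from the paper above: documents play the role of transactions, contiguous token n-grams play the role of itemsets, and the Apriori property is that every sub-sequence of a shared passage is itself shared. So an (n+1)-gram is only counted if both of its n-gram parts were already found to be shared. The function names `frequent_ngrams` and `maximal_passages`, and the whitespace tokenization, are hypothetical choices for illustration.

```python
from collections import Counter

Gram = tuple[str, ...]  # an n-gram as a tuple of tokens (hypothetical alias)


def frequent_ngrams(docs: list[list[str]], min_support: int = 2) -> dict[int, set[Gram]]:
    """Level-wise (Apriori-style) search for n-grams occurring in at least
    `min_support` of the given token lists. Returns {n: set of shared n-grams}."""
    levels: dict[int, set[Gram]] = {}
    prev: set[Gram] = set()
    n = 1
    while True:
        support: Counter = Counter()
        for doc in docs:
            seen: set[Gram] = set()  # count each n-gram at most once per document
            for i in range(len(doc) - n + 1):
                gram = tuple(doc[i:i + n])
                # Apriori pruning: an n-gram can only be shared if both of its
                # (n-1)-gram sub-sequences were shared at the previous level.
                if n > 1 and (gram[:-1] not in prev or gram[1:] not in prev):
                    continue
                seen.add(gram)
            support.update(seen)
        current = {g for g, c in support.items() if c >= min_support}
        if not current:  # no shared n-grams of this length: stop
            return levels
        levels[n] = current
        prev = current
        n += 1


def maximal_passages(levels: dict[int, set[Gram]], min_len: int) -> list[Gram]:
    """Shared n-grams of length >= min_len that are not part of a longer
    shared n-gram, longest first."""
    result: list[Gram] = []
    for n, grams in levels.items():
        if n < min_len:
            continue
        longer = levels.get(n + 1, set())
        # A longer shared passage containing g would contain a shared
        # (n+1)-gram that starts or ends with g, so level n+1 suffices.
        covered = {h[:-1] for h in longer} | {h[1:] for h in longer}
        result.extend(g for g in grams if g not in covered)
    return sorted(result, key=len, reverse=True)


doc_a = "the quick brown fox jumps over the lazy dog".split()
doc_b = "a quick brown fox jumps over a sleeping dog".split()
for passage in maximal_passages(frequent_ngrams([doc_a, doc_b]), min_len=3):
    print(" ".join(passage))  # prints: quick brown fox jumps over
```

Each level rescans all documents, so the cost grows with the length of the longest shared passage; for very long document pairs, suffix-array or hashing approaches may scale better, but the level-wise version is short and easy to check.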
Best,
Christian

_______________________________________________
Corpora mailing list -- corpora@list.elra.info