(groovy-website) branch asf-site updated: add Jaccard description

paulk Sat, 01 Feb 2025 04:15:52 -0800

This is an automated email from the ASF dual-hosted git repository.

paulk pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/groovy-website.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 925b866  add Jaccard description
925b866 is described below

commit 925b866cf6c323b30c3d2fe0bb117f8aff4bed23
Author: Paul King <[email protected]>
AuthorDate: Sat Feb 1 22:15:36 2025 +1000

    add Jaccard description
---
 site/src/site/blog/groovy-text-similarity.adoc | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/site/src/site/blog/groovy-text-similarity.adoc b/site/src/site/blog/groovy-text-similarity.adoc
index fefac4d..be88b46 100644
--- a/site/src/site/blog/groovy-text-similarity.adoc
+++ b/site/src/site/blog/groovy-text-similarity.adoc
@@ -249,6 +249,7 @@ Rather than show the results of all algorithms for all pairs, let's just show a
 that give us insight into which similarity measures might be most useful for our game.
 
 A first observation is the usefulness of Jaccard with k=1 (looking at the set of individual letters).
+Here we can imagine that `bear` might be our guess and `bare` might be the hidden word.
 
 ++++
 <pre>
@@ -270,6 +271,10 @@ Jaccard (debatty k=1)           0.00 <span style="color:green">▏</span>
 
 We can rule out all letters from our guess!
 
+What about Jaccard looking at multi-letter sequences? Well, if you were trying to determine
+whether a social media account `@elton_john` might be the same person as the email `[email protected]`,
+Jaccard with higher values of k would help you out.
+
 ++++
 <pre>
       elton john VS john elton
@@ -284,6 +289,10 @@ Jaccard (debatty k=3)           0.00 <span style="color:red">▏</span>
 </pre>
 ++++
 
+Note that for "Elton John" backwards, Jaccard with higher values of k quickly drops to zero, but when just swapping
+the words (like our social media account and email with punctuation removed), it remains high. So higher
+values of k for Jaccard definitely have their place, but are perhaps not needed for our game.
+
 ++++
 <pre>
       bear VS bean

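The Jaccard measure discussed in the diff can be sketched roughly as follows: split each string into its set of k-character substrings (k-shingles), then divide the size of the intersection of the two sets by the size of their union. With k=1 this compares sets of individual letters; larger k compares multi-letter sequences. This is a minimal hand-rolled Java sketch, not the API of the debatty java-string-similarity library used in the blog. The class and method names are hypothetical, and treating two strings with no shingles as identical is an assumption of the sketch.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical, hand-rolled sketch of Jaccard similarity over k-shingles.
// This is NOT the debatty (java-string-similarity) API; it only mirrors the idea.
class JaccardSketch {

    // Collect every contiguous substring of length k (the "k-shingles").
    static Set<String> shingles(String s, int k) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            result.add(s.substring(i, i + k));
        }
        return result;
    }

    // Jaccard similarity: |A intersect B| / |A union B| over the two shingle sets.
    static double jaccard(String a, String b, int k) {
        Set<String> sa = shingles(a, k);
        Set<String> sb = shingles(b, k);
        Set<String> union = new HashSet<>(sa);
        union.addAll(sb);
        if (union.isEmpty()) {
            return 1.0; // assumption: two strings with no shingles count as identical
        }
        Set<String> intersection = new HashSet<>(sa);
        intersection.retainAll(sb);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // k=1 compares sets of letters, so anagrams such as bear/bare score 1.0
        System.out.printf(Locale.ROOT, "bear vs bare k=1: %.2f%n", jaccard("bear", "bare", 1));

        // Swapping words keeps most multi-letter shingles; reversing the string destroys them
        String original = "elton john";
        String swapped = "john elton";
        String reversed = new StringBuilder(original).reverse().toString();
        for (int k = 1; k <= 3; k++) {
            System.out.printf(Locale.ROOT, "k=%d swapped: %.2f reversed: %.2f%n",
                    k, jaccard(original, swapped, k), jaccard(original, reversed, k));
        }
    }
}
```

With this sketch, `bear` vs `bare` scores 1.00 at k=1 because both words use the same four letters. For `elton john`, the word-swapped string scores 1.00, 0.80 and 0.45 for k=1 to 3, while the reversed string (`nhoj notle`) scores 1.00 at k=1 and 0.00 from k=2 onwards. The library's exact numbers may differ if it normalises its input differently.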