This is an automated email from the ASF dual-hosted git repository.

paulk pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/groovy-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 9cc19f7  start fleshing out other sections
9cc19f7 is described below

commit 9cc19f729784d6ca37cb004c37353bda67b0a526
Author: Paul King <[email protected]>
AuthorDate: Tue Jan 28 20:58:51 2025 +1000

    start fleshing out other sections
---
 site/src/site/blog/groovy-text-similarity.adoc | 55 +++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 2 deletions(-)

diff --git a/site/src/site/blog/groovy-text-similarity.adoc b/site/src/site/blog/groovy-text-similarity.adoc
index 1b9924e..2433293 100644
--- a/site/src/site/blog/groovy-text-similarity.adoc
+++ b/site/src/site/blog/groovy-text-similarity.adoc
@@ -34,14 +34,38 @@ Handling cases explicitly like this soon becomes tedious.
 We'll look at some libraries which can help us handle comparisons
 in more general ways.
 
-== Simple comparisons
-
 First, we'll examine three libraries for performing similarity matching:
 
 * info.debatty:java-string-similarity
 * org.apache.commons:commons-text Apache Commons Text
 * commons-codec:commons-codec Apache Commons Codec for Soundex
 
+Then we'll look at some deep learning options.
+
+== Simple String Metrics
+
+https://en.wikipedia.org/wiki/Levenshtein_distance[Levenshtein],
+https://en.wikipedia.org/wiki/Jaccard_index[Jaccard],
+https://en.wikipedia.org/wiki/Hamming_distance[Hamming],
+https://en.wikipedia.org/wiki/Longest_common_subsequence[LongestCommonSubsequence],
+https://en.wikipedia.org/wiki/Jaro_distance[JaroWinkler].
+
+++++
+<pre>
+      there VS their
+JaroWinklerSimilarity                     0.91 <span style="color:green">██████████████████▏</span>
+JaroWinkler                               0.91 <span style="color:green">██████████████████▏</span>
+Jaccard (debatty k=1)                     0.80 <span style="color:green">████████████████▏</span>
+RatcliffObershelp                         0.80 <span style="color:green">████████████████▏</span>
+JaccardSimilarity (commons text k=1)      0.80 <span style="color:green">████████████████▏</span>
+NormalizedLevenshtein                     0.60 <span style="color:red">████████████▏</span>
+Cosine                                    0.33 <span style="color:red">██████▏</span>
+Jaccard (debatty k=2)                     0.33 <span style="color:red">██████▏</span>
+SorensenDice                              0.33 <span style="color:red">██████▏</span>
+Jaccard (debatty k=3)                     0.20 <span style="color:red">████▏</span>
+</pre>
+++++
+
 == Phonetic Algorithms
 
 https://en.wikipedia.org/wiki/Phonetic_algorithm[Phonetic algorithms] map words into representations of their pronunciation. They are often used for spell checkers, searching, data deduplication and speech to text systems.
@@ -139,6 +163,31 @@ hippo|hippopotamus  50%            40%            40%
 </pre>
 ++++
 
+== Deep Learning
+
+----
+    Cows eat grass
+Bovines convert grass to milk (0.80)
+Bulls consume hay (0.69)
+Bulls trample grass (0.68)
+Dogs play in the grass (0.65)
+The grass is green (0.62)
+
+    Poodles are cute
+Dachshunds are delightful (0.63)
+Dogs play in the grass (0.56)
+The grass is green (0.44)
+Bovines convert grass to milk (0.40)
+One two three (0.38)
+
+    The water is turquoise
+The sea is blue (0.72)
+The sky is blue (0.65)
+The grass is green (0.53)
+One two three (0.43)
+Dogs play in the grass (0.35)
+----
+
 == Further information
 
 Source code for this post:
@@ -150,3 +199,5 @@ Other referenced sites:
 * https://commons.apache.org/proper/commons-text/
 * https://commons.apache.org/proper/commons-codec/
 * https://github.com/tdebatty/java-string-similarity
+* https://github.com/OpenRefine/OpenRefine
+* https://djl.ai/
