This is an automated email from the ASF dual-hosted git repository.
paulk pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/groovy-website.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 3218c3d flesh out examples
3218c3d is described below
commit 3218c3dc87f9a7b5dfa68ae44e1cca564095aff1
Author: Paul King <[email protected]>
AuthorDate: Sat Feb 8 21:43:27 2025 +1000
flesh out examples
---
site/src/site/blog/groovy-text-similarity.adoc | 279 +++++++++++++++++++++++++
1 file changed, 279 insertions(+)
diff --git a/site/src/site/blog/groovy-text-similarity.adoc b/site/src/site/blog/groovy-text-similarity.adoc
index b256214..bdcd103 100644
--- a/site/src/site/blog/groovy-text-similarity.adoc
+++ b/site/src/site/blog/groovy-text-similarity.adoc
@@ -893,6 +893,285 @@ green cat ██████▏ cat ███▏ hi
feline █████▏ bare ███▏ bear ▏
cow █▏ bear ███▏
----
+== Playing the game
+
+=== Round 1
+
+----
+Possible letters: a b c d e f g h i j k l m n o p q r s t u v w x y z
+Guess the hidden word (turn 1): aftershock
+LongestCommonSubsequence 0
+Levenshtein Distance: 10, Insert: 0, Delete: 3, Substitute: 7
+Jaccard 0%
+JaroWinkler PREFIX 0% / SUFFIX 0%
+Phonetic Metaphone=AFTRXK 47% / Soundex=A136 0%
+Meaning Angle 45% / Use 21% / ConceptNet 2% / Glove -4% / FastText 19%
+
+Possible letters: b d g i j l m n p q u v w x y z
+Guess the hidden word (turn 2): fruit
+LongestCommonSubsequence 2
+Levenshtein Distance: 6, Insert: 2, Delete: 0, Substitute: 4
+Jaccard 22%
+JaroWinkler PREFIX 56% / SUFFIX 45%
+Phonetic Metaphone=FRT 39% / Soundex=F630 0%
+Meaning Angle 64% / Use 41% / ConceptNet 37% / Glove 31% / FastText 44%
+
+Possible letters: b d g i j l m n p q u v w x y z
+Guess the hidden word (turn 3): buzzing
+LongestCommonSubsequence 4
+Levenshtein Distance: 3, Insert: 0, Delete: 0, Substitute: 3
+Jaccard 50%
+JaroWinkler PREFIX 71% / SUFFIX 80%
+Phonetic Metaphone=BSNK 58% / Soundex=B252 50%
+Meaning Angle 44% / Use 19% / ConceptNet -9% / Glove -2% / FastText 24%
+
+Possible letters: b d g i j l m n p q u v w x y z
+Guess the hidden word (turn 4): pulling
+LongestCommonSubsequence 5
+Levenshtein Distance: 2, Insert: 0, Delete: 0, Substitute: 2
+Jaccard 71%
+JaroWinkler PREFIX 85% / SUFFIX 87%
+Phonetic Metaphone=PLNK 80% / Soundex=P452 75%
+Meaning Angle 48% / Use 25% / ConceptNet -8% / Glove 3% / FastText 29%
+
+Possible letters: b d g i j l m n p q u v w x y z
+Guess the hidden word (turn 5): pudding
+LongestCommonSubsequence 7
+Levenshtein Distance: 0, Insert: 0, Delete: 0, Substitute: 0
+Jaccard 100%
+JaroWinkler PREFIX 100% / SUFFIX 100%
+Phonetic Metaphone=PTNK 100% / Soundex=P352 100%
+Meaning Angle 100% / Use 100% / ConceptNet 100% / Glove 100% / FastText 100%
+
+Congratulations, you guessed correctly!
+----
+
+=== Round 2
+
+----
+Possible letters: a b c d e f g h i j k l m n o p q r s t u v w x y z
+Guess the hidden word (turn 1): bail
+LongestCommonSubsequence 1
+Levenshtein Distance: 7, Insert: 4, Delete: 0, Substitute: 3
+Jaccard 22% (2/9) 2 / 9
+JaroWinkler PREFIX 42% / SUFFIX 46%
+Phonetic Metaphone=BL 38% / Soundex=B400 25%
+Meaning Angle 46% / Use 40% / ConceptNet 0% / Glove 0% / FastText 31%
+
+Possible letters: a b c d e f g h i j k l m n o p q r s t u v w x y z
+Guess the hidden word (turn 2): leg
+LongestCommonSubsequence 2
+Levenshtein Distance: 6, Insert: 5, Delete: 0, Substitute: 1
+Jaccard 25% (2/8) 1 / 4
+JaroWinkler PREFIX 47% / SUFFIX 0%
+Phonetic Metaphone=LK 38% / Soundex=L200 0%
+Meaning Angle 50% / Use 18% / ConceptNet 11% / Glove 13% / FastText 37%
+
+Possible letters: a b c d e f g h i j k l m n o p q r s t u v w x y z
+Guess the hidden word (turn 3): languish
+LongestCommonSubsequence 2
+Levenshtein Distance: 8, Insert: 0, Delete: 0, Substitute: 8
+Jaccard 15% (2/13) 2 / 13
+JaroWinkler PREFIX 50% / SUFFIX 50%
+Phonetic Metaphone=LNKX 34% / Soundex=L522 0%
+Meaning Angle 46% / Use 12% / ConceptNet -11% / Glove -4% / FastText 25%
+
+Possible letters: a b c d e f g h i j k l m n o p q r s t u v w x y z
+Guess the hidden word (turn 4): election
+LongestCommonSubsequence 5
+Levenshtein Distance: 4, Insert: 0, Delete: 0, Substitute: 4
+Jaccard 40% (4/10) 2 / 5
+JaroWinkler PREFIX 83% / SUFFIX 75%
+Phonetic Metaphone=ELKXN 50% / Soundex=E423 75%
+Meaning Angle 47% / Use 13% / ConceptNet -5% / Glove -7% / FastText 26%
+
+Possible letters: a b c d e f g h i j k l m n o p q r s t u v w x y z
+Guess the hidden word (turn 5): elevator
+LongestCommonSubsequence 8
+Levenshtein Distance: 0, Insert: 0, Delete: 0, Substitute: 0
+Jaccard 100% (7/7) 1
+JaroWinkler PREFIX 100% / SUFFIX 100%
+Phonetic Metaphone=ELFTR 100% / Soundex=E413 100%
+Meaning Angle 100% / Use 100% / ConceptNet 100% / Glove 100% / FastText 100%
+
+Congratulations, you guessed correctly!
+----
+
+=== Round 3
+
+Let's take a first guess with a 10-letter (all distinct) word.
+
+----
+Possible letters: a b c d e f g h i j k l m n o p q r s t u v w x y z
+Guess the hidden word (turn 1): aftershock
+LongestCommonSubsequence 3
+Levenshtein Distance: 8, Insert: 1, Delete: 3, Substitute: 4
+Jaccard 33%
+JaroWinkler PREFIX 56% / SUFFIX 56%
+Phonetic Metaphone=AFTRXK 32% / Soundex=A136 25%
+Meaning Angle 41% / Use 20% / ConceptNet -4% / Glove -13% / FastText 11%
+----
+
+This tells us:
+
+* We did two more deletes than inserts, so
+[fuchsia]#the hidden word has 8 characters#.
+* If the hidden word has 8 characters, why would we ever do an insert, i.e. make the guess longer?
+Doing the insert (and the subsequent deletes) must have made it possible to get 3 letters into the correct position.
+* Soundex tells us that it either starts with A and the other consonant
+groupings are wrong, or it doesn't start with A and one consonant grouping is
+correct. Metaphone of 32% means we probably have two consonant groupings correct.
+* Our guess has 10 distinct letters. Jaccard of 33% tells us
+that we have 4/12 or 5/15 letters correct. If we had 5 letters correct,
+there would be up to 3 letters we don't have, but adding 3 to the 10 in our guess
+doesn't give 15. So we have 4 of 12 letters. There must be up to 4 letters we
+don't have. Adding those 4 to our 10 gives 14, but we know there are only 12
+distinct letters, so the answer has two duplicates or a triple.
+I.e. [fuchsia]#the answer has 6 distinct letters#.
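
The length and Jaccard deductions above are mechanical enough to sketch in code. The following Groovy snippet is only a sketch, not the game's implementation: `hiddenLength` and `jaccardCandidates` are hypothetical helper names, and it assumes the Jaccard score is computed over distinct letters and shown rounded to the nearest whole percent.

```groovy
// Hypothetical helper: the edits turn the guess into the hidden word,
// so each insert adds one character and each delete removes one.
int hiddenLength(String guess, int inserts, int deletes) {
    guess.length() + inserts - deletes
}

// Hypothetical helper: every (shared, union) pair consistent with a Jaccard clue.
// Assumes Jaccard = shared distinct letters / union of distinct letters,
// displayed as a percentage rounded to the nearest integer.
List<Map> jaccardCandidates(int guessDistinct, int hiddenLen, int percent) {
    def candidates = []
    // extra = distinct letters of the hidden word that are missing from the guess
    for (extra in 0..hiddenLen) {
        int union = guessDistinct + extra
        for (shared in 0..guessDistinct) {
            // the hidden word cannot have more distinct letters than characters
            if (shared + extra > hiddenLen) continue
            if (Math.round(100.0d * shared / union) == percent) {
                candidates << [shared: shared, union: union, hiddenDistinct: shared + extra]
            }
        }
    }
    candidates
}

int length = hiddenLength('aftershock', 1, 3)
def candidates = jaccardCandidates(10, length, 33)
println "length=$length candidates=$candidates"
```

Under those assumptions, the only surviving candidate for this turn is 4 shared letters out of a union of 12, which agrees with the 8 characters and 6 distinct letters worked out by hand.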
+
+The letters `e` and `s` are very common. Let's pick a word with
+2 of each that matches what we know from LCS.
+
+----
+Possible letters: a b c d e f g h i j k l m n o p q r s t u v w x y z
+Guess the hidden word (turn 1): aftershock
+LongestCommonSubsequence 3
+Levenshtein Distance: 8, Insert: 1, Delete: 3, Substitute: 4
+Jaccard 33% (4/12) 1 / 3
+JaroWinkler PREFIX 56% / SUFFIX 56%
+Phonetic Metaphone=AFTRXK 32% / Soundex=A136 25%
+Meaning Angle 41% / Use 20% / ConceptNet -4% / Glove -13% / FastText 11%
+
+Possible letters: a b c d e f g h i j k l m n o p q r s t u v w x y z
+Guess the hidden word (turn 2): patriate
+LongestCommonSubsequence 2
+Levenshtein Distance: 7, Insert: 0, Delete: 0, Substitute: 7
+Jaccard 20% (2/10) 1 / 5
+JaroWinkler PREFIX 47% / SUFFIX 47%
+Phonetic Metaphone=PTRT 38% / Soundex=P363 0%
+Meaning Angle 39% / Use 23% / ConceptNet 13% / Glove 0% / FastText 27%
+
+Possible letters: a b c d e f g h i j k l m n o p q r s t u v w x y z
+Guess the hidden word (turn 3): tarragon
+LongestCommonSubsequence 3
+Levenshtein Distance: 5, Insert: 0, Delete: 0, Substitute: 5
+Jaccard 71% (5/7) 5 / 7
+JaroWinkler PREFIX 68% / SUFFIX 68%
+Phonetic Metaphone=TRKN 50% / Soundex=T625 25%
+Meaning Angle 46% / Use 4% / ConceptNet -7% / Glove 5% / FastText 26%
+
+Possible letters: a b c d e f g h i j k l m n o p q r s t u v w x y z
+Guess the hidden word (turn 4): kangaroo
+LongestCommonSubsequence 8
+Levenshtein Distance: 0, Insert: 0, Delete: 0, Substitute: 0
+Jaccard 100% (6/6) 1
+JaroWinkler PREFIX 100% / SUFFIX 100%
+Phonetic Metaphone=KNKR 100% / Soundex=K526 100%
+Meaning Angle 100% / Use 100% / ConceptNet 100% / Glove 100% / FastText 100%
+
+Congratulations, you guessed correctly!
+----
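
The LongestCommonSubsequence values in this round can be reproduced with the textbook dynamic-programming algorithm. The snippet below is a minimal sketch under that assumption. `lcsLength` is a hypothetical helper name, and the game itself may well delegate to a library instead.

```groovy
// Hypothetical helper: length of the longest common subsequence of two words.
int lcsLength(String a, String b) {
    // table[i][j] holds the LCS length of the first i chars of a and the first j chars of b
    int[][] table = new int[a.length() + 1][b.length() + 1]
    for (int i = 1; i <= a.length(); i++) {
        for (int j = 1; j <= b.length(); j++) {
            if (a.charAt(i - 1) == b.charAt(j - 1)) {
                // matching characters extend the best subsequence of both prefixes
                table[i][j] = table[i - 1][j - 1] + 1
            } else {
                // otherwise carry over the better result from dropping one character
                table[i][j] = Math.max(table[i - 1][j], table[i][j - 1])
            }
        }
    }
    table[a.length()][b.length()]
}

['aftershock', 'patriate', 'tarragon', 'kangaroo'].each { guess ->
    println "$guess: ${lcsLength(guess, 'kangaroo')}"
}
```

A subsequence keeps letters in order but allows gaps, which is why `aftershock` can share `a`, `r`, `o` with the hidden word even though those letters are not adjacent.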
+
+* Our Jaccard is now 1/11. That must be the 6 letters we tried plus
+5 others in the hidden word, so our correct letter isn't one of the duplicates.
+I.e. [fuchsia]#there is no S or E in the word#.
+* Our Soundex indicates the word doesn't start with S, which confirms our
+previously derived fact.
+* Our Metaphone has dropped markedly. We know the S shouldn't be there
+but with only 10%, only one of F or R is probably correct, and we
+probably need a K or T from turn 1.
+
+Let's try duplicates for `o` and `r`, and also match LCS from previous guesses.
+
+----
+Possible letters: a b c d e f g h i j k l m n o p q r s t u v w x y z
+Guess the hidden word (turn 3): motorcar
+LongestCommonSubsequence 2
+Levenshtein Distance: 8, Insert: 0, Delete: 0, Substitute: 8
+Jaccard 33% (3/9) 1 / 3
+JaroWinkler PREFIX 47% / SUFFIX 47%
+Phonetic Metaphone=MTRKR 43% / Soundex=M362 0%
+Meaning Angle 44% / Use 20% / ConceptNet -4% / Glove 6% / FastText 33%
+----
+
+* Soundex indicates that the word doesn't start with M.
+* Our Jaccard is now 3/9. That must mean .
+
+=== Round 4
+
+----
+Possible letters: a b c d e f g h i j k l m n o p q r s t u v w x y z
+Guess the hidden word (turn 1): aftershock
+LongestCommonSubsequence 3
+Levenshtein Distance: 8, Insert: 0, Delete: 4, Substitute: 4
+Jaccard 50%
+JaroWinkler PREFIX 61% / SUFFIX 49%
+Phonetic Metaphone=AFTRXK 33% / Soundex=A136 25%
+Meaning Angle 44% / Use 11% / ConceptNet -7% / Glove 1% / FastText 15%
+----
+
+What do we know?
+
+* We deleted 4 letters, so [fuchsia]#the hidden word has 6 letters#.
+* Jaccard of 50% is either 5/10 or 6/12. With 6/12, the 6 shared letters would
+already fill the 6-letter hidden word, leaving no room for the 2 additional
+letters that a union of 12 requires, so it's 5/10. That means we need to pick 5 letters
+from aftershock, duplicate one of them, and we'll have all the letters.
+* Phonetic clues suggest it probably doesn't start with A.
+
+In aftershock, F, H, and K are probably the least common letters. Let's pick a 6-letter word from
+the remaining 7 letters that abides by our LCS clue. We know this can't be right because
+we aren't duplicating a letter yet, but we just want to narrow down the possibilities.
+
+----
+Possible letters: a b c d e f g h i j k l m n o p q r s t u v w x y z
+Guess the hidden word (turn 2): coarse
+LongestCommonSubsequence 3
+Levenshtein Distance: 4, Insert: 0, Delete: 0, Substitute: 4
+Jaccard 57% (4/7) 4 / 7
+JaroWinkler PREFIX 67% / SUFFIX 67%
+Phonetic Metaphone=KRS 74% / Soundex=C620 75%
+Meaning Angle 51% / Use 12% / ConceptNet 5% / Glove 23% / FastText 26%
+----
+
+This tells us:
+
+* We now have 4 of the 5 distinct letters (we should discard 2).
+* Phonetics indicate we are close but not very close yet.
+From the Metaphone value of KRS, we should drop one and keep two.
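
The "4 of the 5" count follows from rearranging the Jaccard fraction once we know how many distinct letters each word has. Here is a small sketch of that arithmetic. `sharedLetters` is a hypothetical helper name, and it assumes Jaccard is the shared distinct letters divided by the union of distinct letters.

```groovy
// Hypothetical helper: solve num/den = s / (g + h - s) for the shared count s.
// Cross-multiplying gives s * (num + den) = num * (g + h).
int sharedLetters(int guessDistinct, int hiddenDistinct, int num, int den) {
    int total = num * (guessDistinct + hiddenDistinct)
    assert total % (num + den) == 0 : 'clue is inconsistent with the letter counts'
    total.intdiv(num + den)
}

// "coarse" has 6 distinct letters, the hidden word has 5, and the clue was 4/7
int shared = sharedLetters(6, 5, 4, 7)
int toDiscard = 6 - shared
println "shared=$shared, discard=$toDiscard"
```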
+
+Let's assume C and E are wrong and bring in the other common letter, T.
+We need to find a word that matches the LCS conditions from previous guesses,
+and we'll duplicate one letter, S.
+
+----
+Possible letters: a b c d e f g h i j k l m n o p q r s t u v w x y z
+Guess the hidden word (turn 3): roasts
+LongestCommonSubsequence 3
+Levenshtein Distance: 6, Insert: 0, Delete: 0, Substitute: 6
+Jaccard 67% (4/6) 2 / 3
+JaroWinkler PREFIX 56% / SUFFIX 56%
+Phonetic Metaphone=RSTS 61% / Soundex=R232 25%
+Meaning Angle 54% / Use 25% / ConceptNet 18% / Glove 18% / FastText 31%
+----
+
+We learned:
+
+* Phonetics dropped, so maybe S wasn't the correct letter to bring in.
+We want the K (from the letter C) and R from the previous guess.
+* Also, the semantic meaning has bumped up to warm (from cold for previous guesses).
+Maybe the hidden word is related to roasts.
+
+Let's try a word starting with C, related to roasts.
+
+----
+Possible letters: a b c d e f g h i j k l m n o p q r s t u v w x y z
+Guess the hidden word (turn 4): carrot
+LongestCommonSubsequence 6
+Levenshtein Distance: 0, Insert: 0, Delete: 0, Substitute: 0
+Jaccard 100% (5/5) 1
+JaroWinkler PREFIX 100% / SUFFIX 100%
+Phonetic Metaphone=KRT 100% / Soundex=C630 100%
+Meaning Angle 100% / Use 100% / ConceptNet 100% / Glove 100% / FastText 100%
+
+Congratulations, you guessed correctly!
+----
+
+Success!
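
With the answer known, we can double-check the Jaccard clues from this round. The snippet below is a sketch that assumes Jaccard is computed over the sets of distinct letters. `jaccardCounts` is a hypothetical helper name rather than anything from the game's source.

```groovy
// Hypothetical helper: [shared, union] counts over the distinct letters of two words.
List<Integer> jaccardCounts(String guess, String hidden) {
    Set<String> g = guess.toList() as Set    // distinct letters of the guess
    Set<String> h = hidden.toList() as Set   // distinct letters of the hidden word
    [g.intersect(h).size(), (g + h).size()]
}

['aftershock', 'coarse', 'roasts', 'carrot'].each { guess ->
    def (shared, union) = jaccardCounts(guess, 'carrot')
    println "$guess: $shared/$union"
}
```

The counts line up with the 50%, 4/7, 4/6 and 5/5 values reported in the game output for turns 1 to 4.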
+
== Further information [[further_info]]
Source code for this post: