Re: [Pharo-users] New methods for the String class

Pharo4Stef Wed, 26 Feb 2014 00:51:25 -0800

> 
> We could have an information retrieval API for approximate string matching, 
> e.g. Levenshtein distance (already implemented, in various versions) and 
> Hamming distance; these two are the simplest and most widely used edit distances.
> Then there are longest common subsequence and longest common substring (they 
> are implemented in a package called "Fuzz", #longestCommonSubsequenceWith: ). 
> There is also the shift-or algorithm adapted for approximate matches (also 
> implemented); fuzzy phrasing is another world as well. Many applications use 
> the Damerau edit distance. Bioinformatics uses Needleman-Wunsch and 
> Smith-Waterman, but there they are called "aligners" :) You don't want to 
> code the optimized versions in Smalltalk, though; some say it could take years.
> Every edit distance out there has its own requirements, and none is better 
> than the others in all cases. For example, Jaro-Winkler is useful for short, 
> one-word strings.
>


I’m not sure that all these edit distances should be part of the String core 
API.
What would be good, though, is to have a chapter describing them. Such a chapter 
would work well alongside the bioSmalltalk one :)
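
To make the point about keeping these out of the core string API a bit more concrete, here is a minimal sketch of the two simplest distances mentioned above, Hamming and Levenshtein, written as standalone functions rather than as methods on the string class. It is in Python only for illustration; the function names are hypothetical and do not correspond to any existing Pharo selectors or to the "Fuzz" package.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (plain dynamic programming,
    keeping only the previous row of the table)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, x in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(previous[j] + 1,          # delete x
                               current[j - 1] + 1,       # insert y
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]
```

Something like this can live in a separate package and be loaded only by those who need it, which is roughly the alternative to extending String directly.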


> You have a lot of options for research. The Smalltalkers here are very 
> experienced and clever, and always give great advice, so don't be afraid to ask.
> 
> Cheers,
> 
> Hernán
> 
>  
> -- 
> Cheers,
> Daniela Meneses
>
