> 
> We can have an information retrieval API for approximate string matching, e.g. 
> Levenshtein distance (already implemented, in various versions) and Hamming 
> distance; these two are the simplest and most widely used edit distances.
> Then you have longest common subsequence and longest common substring (they 
> are implemented in a package called "Fuzz", see 
> #longestCommonSubsequenceWith: ). There is also shift-or adapted for 
> approximate matching (also implemented); fuzzy phrasing is another world 
> altogether. Many applications use the Damerau edit distance. Bioinformatics 
> uses Needleman-Wunsch and Smith-Waterman, although there they are called 
> "aligners" :) You don't want to code the optimized versions in Smalltalk, 
> though; some say it could take years.
> All edit distances out there have specific requirements, and none is better 
> than the others in all cases. For example, Jaro-Winkler is useful for short, 
> one-word strings.
> 

I’m not sure that all these edit distances should be part of the core String 
API.
What would be good, though, is to have a chapter describing them. Such a chapter 
would go well with the bioSmalltalk one :)
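
To make the difference between the two simplest distances mentioned above 
concrete, here is a minimal sketch of the kind of example such a chapter could 
open with. It is written in Python only for brevity; the function names 
`hamming` and `levenshtein` are made up for this illustration and are not the 
selectors of any existing Pharo package.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (unoptimized two-row version)."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]  # turning the first i characters of a into "" takes i deletions
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]
```

Hamming only counts substitutions, so it refuses strings of different length, 
while Levenshtein also allows insertions and deletions: turning "flaw" into 
"lawn" costs 2 edits (drop the "f", append an "n").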


> You have a lot of options for research. Smalltalkers here are very 
> experienced and clever, and always give great advice, so don't be afraid to 
> ask.
> 
> Cheers,
> 
> Hernán
> 
>  
> -- 
> Cheers,
> Daniela Meneses
> 
