--- Begin Message ---
Thank You!
-cam

On Tue, Jul 28, 2015 at 11:39 AM, Cameron Sanders via Pharo-users <
pharo-users@lists.pharo.org> wrote:

>
> ---------- Forwarded message ----------
> From: Cameron Sanders <camsand...@aol.com>
> To: Any question about pharo is welcome <pharo-users@lists.pharo.org>
> Cc:
> Date: Tue, 28 Jul 2015 11:00:11 -0400
> Subject: Re: [Pharo-users] New methods for the String class
> What fuzzy-string matching tools & packages are available today?
>
> -cam
>
> On Wed, Feb 26, 2014 at 9:09 AM, Hernán Morales Durand <
> hernan.mora...@gmail.com> wrote:
>
>>
>> 2014-02-26 7:10 GMT-03:00 Norbert Hartl <norb...@hartl.name>:
>>
>>>
>>> On 26.02.2014 at 09:50, Pharo4Stef <pharo4s...@free.fr> wrote:
>>>
>>> We can have an information retrieval API for approximate string matching,
>>> e.g. Levenshtein distance (already implemented, various versions) and
>>> Hamming distance; both are the most used and simplest edit distances.
>>> Then you have Longest common subsequence and Longest common substring
>>> (they are implemented in a package called "Fuzz",
>>> #longestCommonSubsequenceWith: ). There is also shift-or adapted for
>>> approximate matches (also implemented); fuzzy phrasing is another world
>>> as well. Many applications use Damerau edit distance. Bioinformatics uses
>>> Needleman-Wunsch and Smith-Waterman, but they call them "aligners" :)
>>> but you don't want to code the optimized version in Smalltalk, some say
>>> it could take years.
>>> All edit distances out there have specific requirements and none is
>>> better than the others in all cases. For example, Jaro-Winkler is useful
>>> for one-word short strings.
>>>
>>>
>>> I’m not sure that all these edit distances should be part of the String
>>> core API.
>>> Now what would be good is to have a chapter describing them. This
>>> chapter would work well with the bioSmalltalk one :)
>>>
>>> I’m pretty sure they shouldn’t. Most of these are most likely for
>>> special applications. So they are a perfect candidate for a string
>>> extension package. A real modular entity that could load each of them
>>> individually would be perfect, but we don’t have the proper tools yet.
>>> Unless, of course, each of those algorithms is composed of multiple
>>> classes and would fit naturally in a package.
>>>
>>
>> Absolutely for a separate package for information retrieval algorithms.
>> From what I've seen, some algorithms require optimization through dynamic
>> programming (automata, matrices, etc.) and that would lead to multiple
>> classes, assuming you don't want to dirty the String class.
>>
>>
>>> But the most important prerequisite would be to make a separate package
>>> out of it. Did I understand correctly that those are part of bioSmalltalk?
>>>
>>
>> No. Those algorithms are spread over different packages in repositories
>> like SqueakSource, Cincom Store, etc.
>>
>> Hernán
>>
>
--- End Message ---
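
For readers following the thread: the two distances described above as the simplest, Hamming and Levenshtein, can be sketched in a few lines. This is only an illustration written in Python, not the code of the "Fuzz" package or of any Pharo package mentioned in the thread. The function names hamming_distance and levenshtein_distance are made up for this example, and the Levenshtein version is the plain two-row dynamic-programming variant, not an optimized one.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the rows short
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]  # turning i characters into "" takes i deletions
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute, or free match
            ))
        previous = current
    return previous[-1]
```

With these definitions, hamming_distance("karolin", "kathrin") and levenshtein_distance("kitten", "sitting") both give 3. Only two rows of the matrix are kept at a time, which is the kind of dynamic-programming bookkeeping that, as noted in the thread, tends to push a real implementation out of String and into classes of its own.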
Re: [Pharo-users] New methods for the String class
Cameron Sanders via Pharo-users Tue, 28 Jul 2015 19:45:28 -0700