Re: [LANG] Algorithm for fuzzy string matching
Yes, that would be the plan, I guess :-)

2014-05-02 15:58 GMT+02:00 Gary Gregory :
> So, keep SU as a kitchen sink and refactor for 4.0? I'm OK with that.
>
> Gary

--
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter
Re: [LANG] Algorithm for fuzzy string matching
So, keep SU as a kitchen sink and refactor for 4.0? I'm OK with that.

Gary

On Fri, May 2, 2014 at 7:03 AM, Benedikt Ritter wrote:
> Hi Gary,
>
> we had a discussion about this some time ago, where I proposed to create a
> new class (let's call it StringMetrics) and move Levenshtein and Jaro-Winkler
> to it. We decided not to do this in 3.x, since SU already has 180+ methods
> which will have to be split up in the next major release.
>
> Benedikt

--
E-Mail: garydgreg...@gmail.com | ggreg...@apache.org
Java Persistence with Hibernate, Second Edition <http://www.manning.com/bauer3/>
JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
Spring Batch in Action <http://www.manning.com/templier/>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory
Re: [LANG] Algorithm for fuzzy string matching
Hi Gary,

we had a discussion about this some time ago, where I proposed to create a
new class (let's call it StringMetrics) and move Levenshtein and Jaro-Winkler
to it. We decided not to do this in 3.x, since SU already has 180+ methods
which will have to be split up in the next major release.

Benedikt

2014-05-02 13:00 GMT+02:00 Gary Gregory :
> Do we really want this in SU or should it live in its own class?
>
> Gary
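For illustration only, a dedicated metrics class along the lines of the StringMetrics proposal might look like the sketch below. The class name StringMetrics comes from the message; the method name levenshteinDistance, its signature, and the two-row dynamic-programming implementation are assumptions, not Commons Lang code. Jaro-Winkler is left out to keep the sketch short.

```java
/**
 * Hypothetical sketch of a separate metrics class, as proposed in the thread.
 * Not part of Commons Lang; the method name and signature are illustrative.
 */
public final class StringMetrics {

    private StringMetrics() {
        // static utility class, no instances
    }

    /**
     * Levenshtein edit distance: the minimum number of single-character
     * insertions, deletions or substitutions that turn s into t.
     * Keeps only two rows of the DP table, so extra memory grows with t.length().
     */
    public static int levenshteinDistance(CharSequence s, CharSequence t) {
        if (s == null || t == null) {
            throw new IllegalArgumentException("Strings must not be null");
        }
        int n = s.length();
        int m = t.length();
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];

        // Row 0: turning the empty prefix of s into the first j chars of t.
        for (int j = 0; j <= m; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i; // delete all i chars of s's prefix
            for (int j = 1; j <= m; j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            // Swap rows: the current row becomes the previous one.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }

    public static void main(String[] args) {
        System.out.println(levenshteinDistance("kitten", "sitting")); // 3
    }
}
```

Grouping the metrics this way would keep them together without growing SU further, which is the trade-off being weighed for the next major release.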
Re: [LANG] Algorithm for fuzzy string matching
Do we really want this in SU or should it live in its own class?

Gary

Original message
From: Benedikt Ritter
Date: 05/02/2014 04:15 (GMT-05:00)
To: Commons Developers List
Subject: Re: [LANG] Algorithm for fuzzy string matching

Since nobody had objections against adding this, I'll apply this patch.

Benedikt
Re: [LANG] Algorithm for fuzzy string matching
Since nobody had objections against adding this, I'll apply this patch.

Benedikt

2014-04-28 17:47 GMT+02:00 Benedikt Ritter :
> Hi all,
>
> we have a nice PR for StringUtils at GitHub:
> https://github.com/apache/commons-lang/pull/20
>
> It adds a new string matching algorithm to StringUtils that calculates a
> score for the similarity between two strings. This kind of fuzzy matching is
> known from editors like Sublime Text, TextMate or Atom.
>
> I think this is a very useful feature, but as the contributor points out,
> there is no scientific paper or thesis that provides a reference for the
> implementation. So, unlike our implementations of the Levenshtein or
> Jaro-Winkler algorithms, this is not _the one_ implementation of a fuzzy
> string matching score.
>
> So before adding this, I'd like to hear how others feel about this feature.
>
> Regards,
> Benedikt
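The PR's code is not reproduced in this thread. As a rough illustration of the editor-style scoring described in the proposal, the sketch below assumes one simple scheme: each query character is looked up in the term, in order and case-insensitively; every match earns one point, and a match directly after the previous match earns a two-point bonus. The class name FuzzyScoreSketch, the score method and its Locale parameter are hypothetical, and this is not claimed to be the code from PR #20.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of an editor-style fuzzy score.
 * Assumed scoring: +1 per query character found in order in the term,
 * +2 bonus when that match directly follows the previous match.
 */
public final class FuzzyScoreSketch {

    private FuzzyScoreSketch() {
        // static utility class, no instances
    }

    public static int score(CharSequence term, CharSequence query, Locale locale) {
        if (term == null || query == null || locale == null) {
            throw new IllegalArgumentException("Arguments must not be null");
        }
        // Case-insensitive comparison using the caller's locale.
        String t = term.toString().toLowerCase(locale);
        String q = query.toString().toLowerCase(locale);

        int score = 0;
        int termIndex = 0;
        int previousMatch = Integer.MIN_VALUE; // no match yet

        for (int queryIndex = 0; queryIndex < q.length(); queryIndex++) {
            char queryChar = q.charAt(queryIndex);
            // Scan forward from the last match; unmatched characters add nothing.
            while (termIndex < t.length()) {
                int current = termIndex++;
                if (t.charAt(current) == queryChar) {
                    score++;
                    if (previousMatch + 1 == current) {
                        score += 2; // consecutive-match bonus
                    }
                    previousMatch = current;
                    break;
                }
            }
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(score("Workshop", "wo", Locale.ENGLISH)); // 4
        System.out.println(score("Workshop", "ws", Locale.ENGLISH)); // 2
        System.out.println(score("Apache Software Foundation", "asf", Locale.ENGLISH)); // 3
    }
}
```

Because this score is not normalized to a fixed range, it is only meaningful for ranking several candidate terms against the same query, which is how editors like those mentioned above use it.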