Hi Rob,

LCS can still be useful for bioinformatics/genetics, so I'd say it's worth 
including. In Java, if I ever needed it, I would probably look for it in 
Biojava (I just checked and couldn't easily find it there).
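
To make the idea concrete, here is a minimal sketch of a distance built on 
the longest common substring. It is only an illustration under stated 
assumptions: the class and method names are hypothetical (not an existing 
Commons Text or Biojava API), and the formula 
distance = |a| + |b| - 2 * LCS(a, b) is one plausible way to turn the 
substring length into a distance, not something settled in TEXT-32.

```java
public class LcsDistanceSketch {

    /**
     * Length of the longest common substring (a contiguous run of
     * characters) shared by a and b, computed with the classic
     * dynamic-programming table, keeping only two rows in memory.
     */
    public static int longestCommonSubstringLength(CharSequence a, CharSequence b) {
        int[] prev = new int[b.length() + 1];
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            int[] curr = new int[b.length() + 1];
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    // Extend the common run that ended at (i - 1, j - 1).
                    curr[j] = prev[j - 1] + 1;
                    best = Math.max(best, curr[j]);
                }
                // Otherwise curr[j] stays 0: the run is broken here.
            }
            prev = curr;
        }
        return best;
    }

    /**
     * Hypothetical distance (an assumption for illustration): the number
     * of characters in both inputs that fall outside the shared substring.
     */
    public static int distance(CharSequence a, CharSequence b) {
        return a.length() + b.length() - 2 * longestCommonSubstringLength(a, b);
    }

    public static void main(String[] args) {
        // "ATTAC" is the longest shared run, so the distance is 7 + 5 - 2 * 5.
        System.out.println(distance("GATTACA", "ATTAC")); // prints 2
    }
}
```

Because only one contiguous run is counted, two strings that share several 
short fragments still score as far apart, which is why the measure can feel 
coarse. Swapping in the longest common subsequence would give a 
finer-grained variant.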


As for the other string distances, I always look at this GitHub project:

https://github.com/tdebatty/java-string-similarity

There is also Talend (I think Data Quality has some string distances). 
However, I think having the API design and a few string distances 
implemented could be enough for a 1.0. Then we can add more in later 
releases.


Cheers
Bruno



----- Original Message -----
> From: Rob Tompkins <chtom...@gmail.com>
> To: Commons Developers List <dev@commons.apache.org>
> Sent: Monday, 19 December 2016 3:47 PM
> Subject: [text][TEXT-32] Regarding more edit distances.
> 
> Hello,
> 
> With the thought that we want more “edit distances”/“similarity scores” in 
> the codebase for the potential 1.0 release of TEXT, I’ve opened an associated 
> Jira (TEXT-32). I was wondering if any folks had any input about further 
> ideas.
> 
> The first idea that I stumbled upon was an edit distance based upon the 
> longest common substring. It feels a tad coarse, but that doesn’t 
> necessarily mean that it’s not worth including.
> 
> Other thoughts and ideas?
> 
> Cheers,
> -Rob
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
> 
