[EMAIL PROTECTED] wrote:
> Hello
>
> I am looking for Python code that takes as input a list of strings
> (mostly similar, but not necessarily, and rather short: say, no longer
> than 50 chars) and that computes and outputs the Python regular
> expression that matches these string values (not necessarily strictly;
> perhaps the code is able to determine patterns, i.e. families of
> strings...).
>
> Thanks for any idea
>
I'm not sure what your application is, but genomicists and proteomicists
have found that hidden Markov models (HMMs) can be very powerful for
developing pattern models. Perhaps have a look at "Biological Sequence
Analysis" by Durbin et al.
Also, a very cool regex-based algorithm was developed at IBM:
http://cbcsrv.watson.ibm.com/Tspd.html
But I think HMMs are the way to go. Check out HMMER at WUSTL by Sean
Eddy and colleagues:
http://hmmer.janelia.org/
http://selab.janelia.org/people/eddys/
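
If you want something quick to try before building an HMM, here is a
minimal sketch of a much cruder idea. To be clear, it is neither an HMM
nor the IBM algorithm: it just maps each character to a broad class
(ASCII letter, ASCII digit, or the literal character), collapses runs of
the same class, and groups strings that share the resulting signature
into one family. The function names (char_class, signature,
infer_patterns) are made up for illustration, and it assumes mostly
ASCII input.

```python
import re
import string
from collections import defaultdict
from itertools import groupby


def char_class(ch):
    """Map one character to a regex fragment for its broad class."""
    if ch in string.digits:
        return r"\d"
    if ch in string.ascii_letters:
        return "[A-Za-z]"
    # Anything else (punctuation, spaces, non-ASCII) is kept literally.
    return re.escape(ch)


def signature(s):
    """Collapse runs of the same class, e.g. 'ab12' -> ('[A-Za-z]', '\\d')."""
    return tuple(key for key, _ in groupby(char_class(c) for c in s))


def infer_patterns(strings):
    """Group strings into families by signature; return {regex: members}."""
    families = defaultdict(list)
    for s in strings:
        families[signature(s)].append(s)
    # Each class in the signature becomes "one or more of that class".
    return {
        "".join(part + "+" for part in sig): members
        for sig, members in families.items()
    }


samples = ["abc-123", "xy-9", "2024-01-05", "1999-12-31"]
for regex, members in infer_patterns(samples).items():
    # Every member of a family matches the regex derived for it.
    assert all(re.fullmatch(regex, m) for m in members)
    print(regex, members)
```

The patterns this produces are deliberately loose (every run becomes
"one or more"), so they will also match strings you never showed it.
Tightening them, or scoring how well a new string fits a family, is
where a statistical model like an HMM starts to pay off.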
James
--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095
http://www.jamesstroud.com/