On Tuesday 29 January 2008 12:11, Tolkin, Steve wrote:
> Suppose I have 2 or more lists that are (conceptually) sublists of the
> same underlying list.
> I want to reconstruct the underlying list.  In other words the order of
> the elements agrees in all the lists, but there is no sort condition.
>
> Example:
> List 1: dog, cat, mouse
> List 2: dog, shark, mouse, elephant
>
> There are 2 possible outputs, and I do not care which one I get.

Sounds like you are trying to solve an alignment problem. It's a common thing 
to do in NLP (especially in speech recognition) and I think also when working 
with gene sequences. If you are interested in surveying what's available it 
might be worth to search for "align" or "alignment" on CPAN, and to look for 
BioPerl stuff.

As an quick suggestion, I have used Algorithm::Diff in the past for similar 
problems. It uses the longest common subsequence algorithm, which provides a 
minimal difference solution to the alignment problem. The sdiff() function 
from that module should get you within a hair's breadth of the solution 
(you'd have only to extract the second or third element of each array ref 
that's returned).

Another typical algorithm for sequence comparison is the minimum edit 
distance, for which there are a couple of modules on CPAN: Text::Levenshtein, 
and Text::WagnerFischer. But those modules only give you the distance, not 
the alignment...

Bernardo
 
_______________________________________________
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm

Reply via email to