[Boston.pm] merging lists that are ordered but not sorted
I am looking for a perl program that will solve the following problem. Suppose I have 2 or more lists that are (conceptually) sublists of the same underlying list. I want to reconstruct the underlying list. In other words the order of the elements agrees in all the lists, but there is no sort condition. Example: List 1: dog, cat, mouse List 2: dog, shark, mouse, elephant There are 2 possible outputs, and I do not care which one I get. The reason that I have not just coded this up is that it seems it require an unbounded amount of look ahead. Also, when there are more than 2 lists, I think I need to read from all of them before making a decision about which element can be safely output. Thanks, Steve -- Steven Tolkin[EMAIL PROTECTED] 508-787-9006 Fidelity Investments 400 Puritan Way M3B Marlborough MA 01752 There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates. ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] merging lists that are ordered but not sorted
On Jan 29, 2008 12:11 PM, Tolkin, Steve [EMAIL PROTECTED] wrote: I want to reconstruct the underlying list. In other words the order of the elements agrees in all the lists, but there is no sort condition. Example: List 1: dog, cat, mouse List 2: dog, shark, mouse, elephant There are 2 possible outputs, and I do not care which one I get. The reason that I have not just coded this up is that it seems it require an unbounded amount of look ahead. Also, when there are more than 2 lists, I think I need to read from all of them before making a decision about which element can be safely output. What comes to mind is indexing all of the words on the maximum depth they occur in any list. Then you output all the elements of max_depth 1 (dog), all the elements of max_depth 2 (cat, shark), max_depth 3 (mouse), max_depth 4 (elephant). You still have to read all the lists at least once, though. David ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] merging lists that are ordered but not sorted
On Tuesday 29 January 2008 12:11, Tolkin, Steve wrote: Suppose I have 2 or more lists that are (conceptually) sublists of the same underlying list. I want to reconstruct the underlying list. In other words the order of the elements agrees in all the lists, but there is no sort condition. Example: List 1: dog, cat, mouse List 2: dog, shark, mouse, elephant There are 2 possible outputs, and I do not care which one I get. Sounds like you are trying to solve an alignment problem. It's a common thing to do in NLP (especially in speech recognition) and I think also when working with gene sequences. If you are interested in surveying what's available it might be worth to search for align or alignment on CPAN, and to look for BioPerl stuff. As an quick suggestion, I have used Algorithm::Diff in the past for similar problems. It uses the longest common subsequence algorithm, which provides a minimal difference solution to the alignment problem. The sdiff() function from that module should get you within a hair's breadth of the solution (you'd have only to extract the second or third element of each array ref that's returned). Another typical algorithm for sequence comparison is the minimum edit distance, for which there are a couple of modules on CPAN: Text::Levenshtein, and Text::WagnerFischer. But those modules only give you the distance, not the alignment... Bernardo ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] merging lists that are ordered but not sorted
On Tue, 29 Jan 2008, Tolkin, Steve wrote: I want to reconstruct the underlying list. In other words the order of the elements agrees in all the lists, but there is no sort condition. Example: List 1: dog, cat, mouse List 2: dog, shark, mouse, elephant There are 2 possible outputs, and I do not care which one I get. Out of curiosity, does it have to handle something like this? List 1: dog, cat, mouse List 2: dog, shark, mouse, elephant List 3: apple, pear, orange That is, outliers, I guess. Or this? List 1: dog, cat, mouse List 2: dog, shark, mouse, elephant List 3: elephant, dog That is, loops, I guess. Seems like edge cases like that could make this non-determistic. -- Chris Devers ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] merging lists that are ordered but not sorted
On Tue, Jan 29, 2008 at 12:11:56PM -0500, Tolkin, Steve wrote: I am looking for a perl program that will solve the following problem. Suppose I have 2 or more lists that are (conceptually) sublists of the same underlying list. I want to reconstruct the underlying list. In other words the order of the elements agrees in all the lists, but there is no sort condition. Example: List 1: dog, cat, mouse List 2: dog, shark, mouse, elephant There are 2 possible outputs, and I do not care which one I get. Sounds like a problem begging for tsort. 'man tsort' or 'info coreutils tsort'. Should be on most unixoid systems. You'd need to pre-process your data into pairs of strings and feed that to tsort, which will produce one of the possible totally ordered outputs. $ tsort EOS dog cat cat mouse dog shark shark mouse mouse elephant EOS -Gyepi ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] merging lists that are ordered but not sorted
On Jan 29, 2008 10:57 AM, David Golden [EMAIL PROTECTED] wrote: On Jan 29, 2008 12:11 PM, Tolkin, Steve [EMAIL PROTECTED] wrote: I want to reconstruct the underlying list. In other words the order of the elements agrees in all the lists, but there is no sort condition. Example: List 1: dog, cat, mouse List 2: dog, shark, mouse, elephant There are 2 possible outputs, and I do not care which one I get. The reason that I have not just coded this up is that it seems it require an unbounded amount of look ahead. Also, when there are more than 2 lists, I think I need to read from all of them before making a decision about which element can be safely output. Yes, this is true. You need an unbounded look ahead. Because you might have 5 lists, and the last item of list 1 might be the first item of list 2, the last of list 2 might be the first of last 3, etc. Until you've read each list in full you can't know how to solve it. What comes to mind is indexing all of the words on the maximum depth they occur in any list. Then you output all the elements of max_depth 1 (dog), all the elements of max_depth 2 (cat, shark), max_depth 3 (mouse), max_depth 4 (elephant). List 1: dog, cat, mouse, rat, chimp List 2: chimp, gorilla, kangaroo The algorithm you described would output gorilla before rat. You still have to read all the lists at least once, though. That is logically unavoidable. Here is some lightly tested code to do what is asked. It has, though, no error checking. sub merge_lists { my @list_infos = map { { list = $_, depth = {}, position = 0, done = 0, } } @_; # Create indexes. for my $list_info (@list_infos) { my $list = $list_info-{list}; my $depth = $list_info-{depth}; for my $i (0..$#$list) { $depth-{$list-[$i]} = $i; } } my @merged_list; FIND_ELEMENT: while (@list_infos) { LIST: for my $list_info (@list_infos) { my $element = $list_info-{list}-[ $list_info-{position} ]; # Is this element one that can go next? for my $other_list_info (@list_infos) { my $other_position = $other_list_info-{position}; my $other_depth = $other_list_info-{depth}-{$element}; if ($other_depth and $other_depth $other_list_info-{position}) { # Oops, we'll have to take an element from another list. next LIST; } } # We've found our element, let's do bookkeeping push @merged_list, $element; for my $other_list_info (@list_infos) { my $other_list = $other_list_info-{list}; my $other_position = $other_list_info-{position}; if ( $element eq $other_list-[$other_position] ) { if ($other_position == $#$other_list) { $other_list_info-{done} = 1; } else { $other_list_info-{position}++; } } } # And cleanup. @list_infos = grep {not $_-{done}} @list_infos; next FIND_ELEMENT; } } return @merged_list; } And to run it just call: merge_list([qw(dog cat mouse)], [qw(dog shark mouse elephant)]; Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] merging lists that are ordered but not sorted
On Tue, 2008-01-29 at 12:04 -0800, Ben Tilly wrote: [snip] That is logically unavoidable. Yup, and in fact provably true. The problem is the same as finding a topological ordering among a directed acyclic graph. Wikipedia can inform you more on the topic ( http://en.wikipedia.org/wiki/Topological_sort ) but a better reference is the canonical Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein. - Alex -- Networking -- only one letter away from not working ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm