[Boston.pm] merging lists that are ordered but not sorted

2008-01-29 Thread Tolkin, Steve
I am looking for a perl program that will solve the following problem.
Suppose I have 2 or more lists that are (conceptually) sublists of the
same underlying list.
I want to reconstruct the underlying list.  In other words the order of
the elements agrees in all the lists, but there is no sort condition.

Example:
List 1: dog, cat, mouse
List 2: dog, shark, mouse, elephant

There are 2 possible outputs, and I do not care which one I get.

The reason that I have not just coded this up is that it seems it
require an unbounded amount of look ahead.  Also, when there are more
than 2 lists, I think I need to read from all of them before making a
decision about which element can be safely output.

Thanks,
Steve
-- 
Steven Tolkin[EMAIL PROTECTED] 508-787-9006
Fidelity Investments   400 Puritan Way M3B Marlborough MA 01752
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates.
 
___
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm


Re: [Boston.pm] merging lists that are ordered but not sorted

2008-01-29 Thread David Golden
On Jan 29, 2008 12:11 PM, Tolkin, Steve [EMAIL PROTECTED] wrote:
 I want to reconstruct the underlying list.  In other words the order of
 the elements agrees in all the lists, but there is no sort condition.

 Example:
 List 1: dog, cat, mouse
 List 2: dog, shark, mouse, elephant

 There are 2 possible outputs, and I do not care which one I get.

 The reason that I have not just coded this up is that it seems it
 require an unbounded amount of look ahead.  Also, when there are more
 than 2 lists, I think I need to read from all of them before making a
 decision about which element can be safely output.

What comes to mind is indexing all of the words on the maximum depth
they occur in any list.  Then you output all the elements of max_depth
1 (dog), all the elements of max_depth 2 (cat, shark), max_depth 3
(mouse), max_depth 4 (elephant).

You still have to read all the lists at least once, though.

David
 
___
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm


Re: [Boston.pm] merging lists that are ordered but not sorted

2008-01-29 Thread Bernardo Rechea


On Tuesday 29 January 2008 12:11, Tolkin, Steve wrote:
 Suppose I have 2 or more lists that are (conceptually) sublists of the
 same underlying list.
 I want to reconstruct the underlying list.  In other words the order of
 the elements agrees in all the lists, but there is no sort condition.

 Example:
 List 1: dog, cat, mouse
 List 2: dog, shark, mouse, elephant

 There are 2 possible outputs, and I do not care which one I get.

Sounds like you are trying to solve an alignment problem. It's a common thing 
to do in NLP (especially in speech recognition) and I think also when working 
with gene sequences. If you are interested in surveying what's available it 
might be worth to search for align or alignment on CPAN, and to look for 
BioPerl stuff.

As an quick suggestion, I have used Algorithm::Diff in the past for similar 
problems. It uses the longest common subsequence algorithm, which provides a 
minimal difference solution to the alignment problem. The sdiff() function 
from that module should get you within a hair's breadth of the solution 
(you'd have only to extract the second or third element of each array ref 
that's returned).

Another typical algorithm for sequence comparison is the minimum edit 
distance, for which there are a couple of modules on CPAN: Text::Levenshtein, 
and Text::WagnerFischer. But those modules only give you the distance, not 
the alignment...

Bernardo
 
___
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm


Re: [Boston.pm] merging lists that are ordered but not sorted

2008-01-29 Thread Chris Devers
On Tue, 29 Jan 2008, Tolkin, Steve wrote:

 I want to reconstruct the underlying list.  In other words the order of
 the elements agrees in all the lists, but there is no sort condition.
 
 Example:
 List 1: dog, cat, mouse
 List 2: dog, shark, mouse, elephant
 
 There are 2 possible outputs, and I do not care which one I get.

Out of curiosity, does it have to handle something like this?

List 1: dog, cat, mouse
List 2: dog, shark, mouse, elephant
List 3: apple, pear, orange

That is, outliers, I guess. 

Or this?

List 1: dog, cat, mouse
List 2: dog, shark, mouse, elephant
List 3: elephant, dog

That is, loops, I guess.

Seems like edge cases like that could make this non-determistic.


-- 
Chris Devers
 
___
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm


Re: [Boston.pm] merging lists that are ordered but not sorted

2008-01-29 Thread Gyepi SAM
On Tue, Jan 29, 2008 at 12:11:56PM -0500, Tolkin, Steve wrote:
 I am looking for a perl program that will solve the following problem.
 Suppose I have 2 or more lists that are (conceptually) sublists of the
 same underlying list.
 I want to reconstruct the underlying list.  In other words the order of
 the elements agrees in all the lists, but there is no sort condition.
 
 Example:
 List 1: dog, cat, mouse
 List 2: dog, shark, mouse, elephant
 
 There are 2 possible outputs, and I do not care which one I get.


Sounds like a problem begging for tsort.
'man tsort' or 'info coreutils tsort'.
Should be on most unixoid systems.

You'd need to pre-process your data into pairs of strings
and feed that to tsort, which will produce one of the possible
totally ordered outputs.

$ tsort EOS
dog cat
cat mouse
dog shark
shark mouse
mouse elephant
EOS

-Gyepi
 
___
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm


Re: [Boston.pm] merging lists that are ordered but not sorted

2008-01-29 Thread Ben Tilly
On Jan 29, 2008 10:57 AM, David Golden [EMAIL PROTECTED] wrote:
 On Jan 29, 2008 12:11 PM, Tolkin, Steve [EMAIL PROTECTED] wrote:
  I want to reconstruct the underlying list.  In other words the order of
  the elements agrees in all the lists, but there is no sort condition.
 
  Example:
  List 1: dog, cat, mouse
  List 2: dog, shark, mouse, elephant
 
  There are 2 possible outputs, and I do not care which one I get.
 
  The reason that I have not just coded this up is that it seems it
  require an unbounded amount of look ahead.  Also, when there are more
  than 2 lists, I think I need to read from all of them before making a
  decision about which element can be safely output.

Yes, this is true.  You need an unbounded look ahead.  Because you
might have 5 lists, and the last item of list 1 might be the first
item of list 2, the last of list 2 might be the first of last 3, etc.
Until you've read each list in full you can't know how to solve it.

 What comes to mind is indexing all of the words on the maximum depth
 they occur in any list.  Then you output all the elements of max_depth
 1 (dog), all the elements of max_depth 2 (cat, shark), max_depth 3
 (mouse), max_depth 4 (elephant).

List 1: dog, cat, mouse, rat, chimp
List 2: chimp, gorilla, kangaroo

The algorithm you described would output gorilla before rat.

 You still have to read all the lists at least once, though.

That is logically unavoidable.

Here is some lightly tested code to do what is asked.  It has, though,
no error checking.

sub merge_lists {
  my @list_infos
= map { {
list = $_,
depth = {},
position = 0,
done = 0,
  } } @_;

  # Create indexes.
  for my $list_info (@list_infos) {
my $list = $list_info-{list};
my $depth = $list_info-{depth};

for my $i (0..$#$list) {
  $depth-{$list-[$i]} = $i;
}
  }

  my @merged_list;
  FIND_ELEMENT: while (@list_infos) {
LIST: for my $list_info (@list_infos) {
  my $element = $list_info-{list}-[ $list_info-{position} ];

  # Is this element one that can go next?
  for my $other_list_info (@list_infos) {
my $other_position = $other_list_info-{position};
my $other_depth = $other_list_info-{depth}-{$element};
if ($other_depth and $other_depth  $other_list_info-{position}) {
  # Oops, we'll have to take an element from another list.
  next LIST;
}
  }

  # We've found our element, let's do bookkeeping
  push @merged_list, $element;
  for my $other_list_info (@list_infos) {
my $other_list = $other_list_info-{list};
my $other_position = $other_list_info-{position};

if ( $element eq $other_list-[$other_position] ) {
  if ($other_position == $#$other_list) {
$other_list_info-{done} = 1;
  }
  else {
$other_list_info-{position}++;
  }
}
  }

  # And cleanup.
  @list_infos = grep {not $_-{done}} @list_infos;
  next FIND_ELEMENT;
}
  }

  return @merged_list;
}

And to run it just call:

merge_list([qw(dog cat mouse)], [qw(dog shark mouse elephant)];

Cheers,
Ben
 
___
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm


Re: [Boston.pm] merging lists that are ordered but not sorted

2008-01-29 Thread Alex Vandiver

On Tue, 2008-01-29 at 12:04 -0800, Ben Tilly wrote:
 [snip]
 That is logically unavoidable.

Yup, and in fact provably true.  The problem is the same as finding a
topological ordering among a directed acyclic graph.  Wikipedia can
inform you more on the topic
( http://en.wikipedia.org/wiki/Topological_sort ) but a better reference
is the canonical Introduction to Algorithms by Cormen, Leiserson,
Rivest, and Stein.
 - Alex

-- 
Networking -- only one letter away from not working

 
___
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm