Re: [Boston.pm] merging lists that are ordered but not sorted

2008-01-30 Thread Uri Guttman
> "TS" == Tolkin, Steve <[EMAIL PROTECTED]> writes:

  TS> As another aside, if people are interested I can send 77 lines of data
  TS> for each of these 2008 model year cars: Camry, Accord, Infiniti_G35,
  TS> Impreza, Altima, Audi_A4, Volvo_S40, Saab_9_3
  TS> I would not mind off-list opinions on any of these cars.  In general I
  TS> want a car with width <= 70.7 inches (the Accord at 71.7" is probably
  TS> too wide to fit in my garage), and would like AWD.

well, knowing the size of the real world data is an important fact. you
said you didn't want to scan all the date before the sort but as you
have learned, tsort has to do that. 77 lines for each car is a tiny
amount of data which shouldn't have caused you to worry so much about
efficiency. even a bubble sort is cool for 6 elements!

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]    http://www.sysarch.com --
-  Perl Architecture, Development, Training, Support, Code Review  --
---  Search or Offer Perl Jobs  - http://jobs.perl.org  -
-  Gourmet Hot Cocoa Mix    http://bestfriendscocoa.com -
 
___
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm


Re: [Boston.pm] merging lists that are ordered but not sorted

2008-01-30 Thread Tolkin, Steve
I am replying to myself to thank all the Perl mongers who replied with
help.
Indeed, my problem is topological sort, as stated by Alex Vandiver and
Gyepi SAM.
I did not see that because the input format is different from that
required by the Unix tsort program.

A search for: tsort "perl power tools"
found this
http://search.cpan.org/src/CWEST/ppt-0.14/html/commands/tsort/index.html
which leads directly to the perl code I used this to solve the problem.
Note the strange name -- tcsort not tsort.  (Perhaps in homage to Tom
Christiansen, the prime mover of the very useful but moribund ppt
project.)

I earlier found tsort.exe (port to Windows) inside
coreutils-5.3.0-bin.zip at
http://sourceforge.net/project/showfiles.php?group_id=23617_id=1
42775 Unfortunately this tsort.exe depends on libintl3.dll which was not
in the *.zip file and which I could not find anywhere.
Aside: Does anyone know where I can get a libintl3.dll ?

Both versions of tsort require the 2 values on each input row be
separated by one space.  Fortunately I was able to transform my data
into this format.

Major kudos to Ben Tilly!  He wrote from scratch a perl program that
solved the problem.   (Since he put in the effort to write this I took
some extra time to test it.  It produced the same output as tsort,
because the lists overlapped enough to overcome the fact that the output
order is in general not deterministic.)

I think the problem statement I gave was clear enough.  Any cycle in the
input is an error.  The tsort program in perl simply reports "cycle
detected" without any information as to which elements are on the cycle.

My use was not related to alignment of DNA.  It was part of a personal
mashup to combine data about cars that I scraped from e.g.
http://autos.yahoo.com/toyota_camry_se_v6-specs/?p=all
The actual values in the list are strings such as these:
Cylinders
Horsepower @ RPM
Fuel Economy Cty/Hwy

As another aside, if people are interested I can send 77 lines of data
for each of these 2008 model year cars: Camry, Accord, Infiniti_G35,
Impreza, Altima, Audi_A4, Volvo_S40, Saab_9_3
I would not mind off-list opinions on any of these cars.  In general I
want a car with width <= 70.7 inches (the Accord at 71.7" is probably
too wide to fit in my garage), and would like AWD.


Thanks,
Steve


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Tolkin, Steve
Sent: Tuesday, January 29, 2008 12:12 PM
To: Boston Perl Mongers
Subject: [Boston.pm] merging lists that are ordered but not sorted

I am looking for a perl program that will solve the following problem.
Suppose I have 2 or more lists that are (conceptually) sublists of the
same underlying list.
I want to reconstruct the underlying list.  In other words the order of
the elements agrees in all the lists, but there is no sort condition.

Example:
List 1: dog, cat, mouse
List 2: dog, shark, mouse, elephant

There are 2 possible outputs, and I do not care which one I get.

The reason that I have not just coded this up is that it seems it
require an unbounded amount of look ahead.  Also, when there are more
than 2 lists, I think I need to read from all of them before making a
decision about which element can be safely output.

Thanks,
Steve
-- 
Steven Tolkin[EMAIL PROTECTED] 508-787-9006
Fidelity Investments   400 Puritan Way M3B Marlborough MA 01752
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates.
 
___
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm
 
___
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm


Re: [Boston.pm] merging lists that are ordered but not sorted

2008-01-30 Thread Ted Zlatanov
On Tue, 29 Jan 2008 16:27:16 -0500 (EST) Chris Devers <[EMAIL PROTECTED]> 
wrote: 

CD> On Tue, 29 Jan 2008, Tolkin, Steve wrote:
>> I want to reconstruct the underlying list.  In other words the order of
>> the elements agrees in all the lists, but there is no sort condition.
...
CD> Out of curiosity, does it have to handle something like this?
...
CD> That is, "outliers", I guess. 
...
CD> That is, "loops", I guess.

CD> Seems like edge cases like that could make this non-determistic.

I think the requirements are too vague, as your list of exceptions
shows.  Steve should either specify the narrow cases for which he cares
(he didn't say the 123/1435 example he posted was the only necessary
one) or should give the general behavior desired for merging of any list
A1..An with another list B1..Bm

It sounds like a fun problem, but as specified we're all only guessing
what it is.

Ted
 
___
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm


Re: [Boston.pm] merging lists that are ordered but not sorted

2008-01-29 Thread Jean-Baptiste Nivoit
http://search.cpan.org/~vparseval/List-MoreUtils-0.22/lib/List/MoreUtils.pm
has primitives that look a little like what you're looking for, but
quite exactly the same. Maybe you could adapt one of those utilities?

On Jan 29, 2008 12:11 PM, Tolkin, Steve <[EMAIL PROTECTED]> wrote:
> I am looking for a perl program that will solve the following problem.
> Suppose I have 2 or more lists that are (conceptually) sublists of the
> same underlying list.
> I want to reconstruct the underlying list.  In other words the order of
> the elements agrees in all the lists, but there is no sort condition.
>
> Example:
> List 1: dog, cat, mouse
> List 2: dog, shark, mouse, elephant
>
> There are 2 possible outputs, and I do not care which one I get.
>
> The reason that I have not just coded this up is that it seems it
> require an unbounded amount of look ahead.  Also, when there are more
> than 2 lists, I think I need to read from all of them before making a
> decision about which element can be safely output.
>
> Thanks,
> Steve
> --
> Steven Tolkin[EMAIL PROTECTED] 508-787-9006
> Fidelity Investments   400 Puritan Way M3B Marlborough MA 01752
> There is nothing so practical as a good theory.  Comments are by me,
> not Fidelity Investments, its subsidiaries or affiliates.
>
> ___
> Boston-pm mailing list
> Boston-pm@mail.pm.org
> http://mail.pm.org/mailman/listinfo/boston-pm
>



-- 
  jb.
 
___
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm


Re: [Boston.pm] merging lists that are ordered but not sorted

2008-01-29 Thread Chris Devers
On Tue, 29 Jan 2008, Tolkin, Steve wrote:

> I want to reconstruct the underlying list.  In other words the order of
> the elements agrees in all the lists, but there is no sort condition.
> 
> Example:
> List 1: dog, cat, mouse
> List 2: dog, shark, mouse, elephant
> 
> There are 2 possible outputs, and I do not care which one I get.

Out of curiosity, does it have to handle something like this?

List 1: dog, cat, mouse
List 2: dog, shark, mouse, elephant
List 3: apple, pear, orange

That is, "outliers", I guess. 

Or this?

List 1: dog, cat, mouse
List 2: dog, shark, mouse, elephant
List 3: elephant, dog

That is, "loops", I guess.

Seems like edge cases like that could make this non-determistic.


-- 
Chris Devers
 
___
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm


Re: [Boston.pm] merging lists that are ordered but not sorted

2008-01-29 Thread Gyepi SAM
On Tue, Jan 29, 2008 at 12:11:56PM -0500, Tolkin, Steve wrote:
> I am looking for a perl program that will solve the following problem.
> Suppose I have 2 or more lists that are (conceptually) sublists of the
> same underlying list.
> I want to reconstruct the underlying list.  In other words the order of
> the elements agrees in all the lists, but there is no sort condition.
> 
> Example:
> List 1: dog, cat, mouse
> List 2: dog, shark, mouse, elephant
> 
> There are 2 possible outputs, and I do not care which one I get.


Sounds like a problem begging for tsort.
'man tsort' or 'info coreutils tsort'.
Should be on most unixoid systems.

You'd need to pre-process your data into pairs of strings
and feed that to tsort, which will produce one of the possible
totally ordered outputs.

$ tsort 

Re: [Boston.pm] merging lists that are ordered but not sorted

2008-01-29 Thread Bernardo Rechea


On Tuesday 29 January 2008 12:11, Tolkin, Steve wrote:
> Suppose I have 2 or more lists that are (conceptually) sublists of the
> same underlying list.
> I want to reconstruct the underlying list.  In other words the order of
> the elements agrees in all the lists, but there is no sort condition.
>
> Example:
> List 1: dog, cat, mouse
> List 2: dog, shark, mouse, elephant
>
> There are 2 possible outputs, and I do not care which one I get.

Sounds like you are trying to solve an alignment problem. It's a common thing 
to do in NLP (especially in speech recognition) and I think also when working 
with gene sequences. If you are interested in surveying what's available it 
might be worth to search for "align" or "alignment" on CPAN, and to look for 
BioPerl stuff.

As an quick suggestion, I have used Algorithm::Diff in the past for similar 
problems. It uses the longest common subsequence algorithm, which provides a 
minimal difference solution to the alignment problem. The sdiff() function 
from that module should get you within a hair's breadth of the solution 
(you'd have only to extract the second or third element of each array ref 
that's returned).

Another typical algorithm for sequence comparison is the minimum edit 
distance, for which there are a couple of modules on CPAN: Text::Levenshtein, 
and Text::WagnerFischer. But those modules only give you the distance, not 
the alignment...

Bernardo
 
___
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm


Re: [Boston.pm] merging lists that are ordered but not sorted

2008-01-29 Thread Alex Vandiver

On Tue, 2008-01-29 at 12:04 -0800, Ben Tilly wrote:
> [snip]
> That is logically unavoidable.

Yup, and in fact provably true.  The problem is the same as finding a
topological ordering among a directed acyclic graph.  Wikipedia can
inform you more on the topic
( http://en.wikipedia.org/wiki/Topological_sort ) but a better reference
is the canonical "Introduction to Algorithms" by Cormen, Leiserson,
Rivest, and Stein.
 - Alex

-- 
Networking -- only one letter away from not working

 
___
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm


Re: [Boston.pm] merging lists that are ordered but not sorted

2008-01-29 Thread Ben Tilly
On Jan 29, 2008 10:57 AM, David Golden <[EMAIL PROTECTED]> wrote:
> On Jan 29, 2008 12:11 PM, Tolkin, Steve <[EMAIL PROTECTED]> wrote:
> > I want to reconstruct the underlying list.  In other words the order of
> > the elements agrees in all the lists, but there is no sort condition.
> >
> > Example:
> > List 1: dog, cat, mouse
> > List 2: dog, shark, mouse, elephant
> >
> > There are 2 possible outputs, and I do not care which one I get.
> >
> > The reason that I have not just coded this up is that it seems it
> > require an unbounded amount of look ahead.  Also, when there are more
> > than 2 lists, I think I need to read from all of them before making a
> > decision about which element can be safely output.

Yes, this is true.  You need an unbounded look ahead.  Because you
might have 5 lists, and the last item of list 1 might be the first
item of list 2, the last of list 2 might be the first of last 3, etc.
Until you've read each list in full you can't know how to solve it.

> What comes to mind is indexing all of the words on the maximum depth
> they occur in any list.  Then you output all the elements of max_depth
> 1 (dog), all the elements of max_depth 2 (cat, shark), max_depth 3
> (mouse), max_depth 4 (elephant).

List 1: dog, cat, mouse, rat, chimp
List 2: chimp, gorilla, kangaroo

The algorithm you described would output gorilla before rat.

> You still have to read all the lists at least once, though.

That is logically unavoidable.

Here is some lightly tested code to do what is asked.  It has, though,
no error checking.

sub merge_lists {
  my @list_infos
= map { {
list => $_,
depth => {},
position => 0,
done => 0,
  } } @_;

  # Create indexes.
  for my $list_info (@list_infos) {
my $list = $list_info->{list};
my $depth = $list_info->{depth};

for my $i (0..$#$list) {
  $depth->{$list->[$i]} = $i;
}
  }

  my @merged_list;
  FIND_ELEMENT: while (@list_infos) {
LIST: for my $list_info (@list_infos) {
  my $element = $list_info->{list}->[ $list_info->{position} ];

  # Is this element one that can go next?
  for my $other_list_info (@list_infos) {
my $other_position = $other_list_info->{position};
my $other_depth = $other_list_info->{depth}->{$element};
if ($other_depth and $other_depth > $other_list_info->{position}) {
  # Oops, we'll have to take an element from another list.
  next LIST;
}
  }

  # We've found our element, let's do bookkeeping
  push @merged_list, $element;
  for my $other_list_info (@list_infos) {
my $other_list = $other_list_info->{list};
my $other_position = $other_list_info->{position};

if ( $element eq $other_list->[$other_position] ) {
  if ($other_position == $#$other_list) {
$other_list_info->{done} = 1;
  }
  else {
$other_list_info->{position}++;
  }
}
  }

  # And cleanup.
  @list_infos = grep {not $_->{done}} @list_infos;
  next FIND_ELEMENT;
}
  }

  return @merged_list;
}

And to run it just call:

merge_list([qw(dog cat mouse)], [qw(dog shark mouse elephant)];

Cheers,
Ben
 
___
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm


Re: [Boston.pm] merging lists that are ordered but not sorted

2008-01-29 Thread David Golden
On Jan 29, 2008 12:11 PM, Tolkin, Steve <[EMAIL PROTECTED]> wrote:
> I want to reconstruct the underlying list.  In other words the order of
> the elements agrees in all the lists, but there is no sort condition.
>
> Example:
> List 1: dog, cat, mouse
> List 2: dog, shark, mouse, elephant
>
> There are 2 possible outputs, and I do not care which one I get.
>
> The reason that I have not just coded this up is that it seems it
> require an unbounded amount of look ahead.  Also, when there are more
> than 2 lists, I think I need to read from all of them before making a
> decision about which element can be safely output.

What comes to mind is indexing all of the words on the maximum depth
they occur in any list.  Then you output all the elements of max_depth
1 (dog), all the elements of max_depth 2 (cat, shark), max_depth 3
(mouse), max_depth 4 (elephant).

You still have to read all the lists at least once, though.

David
 
___
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm