Re: [Boston.pm] merging lists that are ordered but not sorted
> "TS" == Tolkin, Steve <[EMAIL PROTECTED]> writes: TS> As another aside, if people are interested I can send 77 lines of data TS> for each of these 2008 model year cars: Camry, Accord, Infiniti_G35, TS> Impreza, Altima, Audi_A4, Volvo_S40, Saab_9_3 TS> I would not mind off-list opinions on any of these cars. In general I TS> want a car with width <= 70.7 inches (the Accord at 71.7" is probably TS> too wide to fit in my garage), and would like AWD. well, knowing the size of the real world data is an important fact. you said you didn't want to scan all the date before the sort but as you have learned, tsort has to do that. 77 lines for each car is a tiny amount of data which shouldn't have caused you to worry so much about efficiency. even a bubble sort is cool for 6 elements! uri -- Uri Guttman -- [EMAIL PROTECTED] http://www.sysarch.com -- - Perl Architecture, Development, Training, Support, Code Review -- --- Search or Offer Perl Jobs - http://jobs.perl.org - - Gourmet Hot Cocoa Mix http://bestfriendscocoa.com - ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] merging lists that are ordered but not sorted
I am replying to myself to thank all the Perl mongers who replied with help. Indeed, my problem is topological sort, as stated by Alex Vandiver and Gyepi SAM. I did not see that because the input format is different from that required by the Unix tsort program. A search for: tsort "perl power tools" found this http://search.cpan.org/src/CWEST/ppt-0.14/html/commands/tsort/index.html which leads directly to the perl code I used this to solve the problem. Note the strange name -- tcsort not tsort. (Perhaps in homage to Tom Christiansen, the prime mover of the very useful but moribund ppt project.) I earlier found tsort.exe (port to Windows) inside coreutils-5.3.0-bin.zip at http://sourceforge.net/project/showfiles.php?group_id=23617_id=1 42775 Unfortunately this tsort.exe depends on libintl3.dll which was not in the *.zip file and which I could not find anywhere. Aside: Does anyone know where I can get a libintl3.dll ? Both versions of tsort require the 2 values on each input row be separated by one space. Fortunately I was able to transform my data into this format. Major kudos to Ben Tilly! He wrote from scratch a perl program that solved the problem. (Since he put in the effort to write this I took some extra time to test it. It produced the same output as tsort, because the lists overlapped enough to overcome the fact that the output order is in general not deterministic.) I think the problem statement I gave was clear enough. Any cycle in the input is an error. The tsort program in perl simply reports "cycle detected" without any information as to which elements are on the cycle. My use was not related to alignment of DNA. It was part of a personal mashup to combine data about cars that I scraped from e.g. http://autos.yahoo.com/toyota_camry_se_v6-specs/?p=all The actual values in the list are strings such as these: Cylinders Horsepower @ RPM Fuel Economy Cty/Hwy As another aside, if people are interested I can send 77 lines of data for each of these 2008 model year cars: Camry, Accord, Infiniti_G35, Impreza, Altima, Audi_A4, Volvo_S40, Saab_9_3 I would not mind off-list opinions on any of these cars. In general I want a car with width <= 70.7 inches (the Accord at 71.7" is probably too wide to fit in my garage), and would like AWD. Thanks, Steve -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tolkin, Steve Sent: Tuesday, January 29, 2008 12:12 PM To: Boston Perl Mongers Subject: [Boston.pm] merging lists that are ordered but not sorted I am looking for a perl program that will solve the following problem. Suppose I have 2 or more lists that are (conceptually) sublists of the same underlying list. I want to reconstruct the underlying list. In other words the order of the elements agrees in all the lists, but there is no sort condition. Example: List 1: dog, cat, mouse List 2: dog, shark, mouse, elephant There are 2 possible outputs, and I do not care which one I get. The reason that I have not just coded this up is that it seems it require an unbounded amount of look ahead. Also, when there are more than 2 lists, I think I need to read from all of them before making a decision about which element can be safely output. Thanks, Steve -- Steven Tolkin[EMAIL PROTECTED] 508-787-9006 Fidelity Investments 400 Puritan Way M3B Marlborough MA 01752 There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates. ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] merging lists that are ordered but not sorted
On Tue, 29 Jan 2008 16:27:16 -0500 (EST) Chris Devers <[EMAIL PROTECTED]> wrote: CD> On Tue, 29 Jan 2008, Tolkin, Steve wrote: >> I want to reconstruct the underlying list. In other words the order of >> the elements agrees in all the lists, but there is no sort condition. ... CD> Out of curiosity, does it have to handle something like this? ... CD> That is, "outliers", I guess. ... CD> That is, "loops", I guess. CD> Seems like edge cases like that could make this non-determistic. I think the requirements are too vague, as your list of exceptions shows. Steve should either specify the narrow cases for which he cares (he didn't say the 123/1435 example he posted was the only necessary one) or should give the general behavior desired for merging of any list A1..An with another list B1..Bm It sounds like a fun problem, but as specified we're all only guessing what it is. Ted ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] merging lists that are ordered but not sorted
http://search.cpan.org/~vparseval/List-MoreUtils-0.22/lib/List/MoreUtils.pm has primitives that look a little like what you're looking for, but quite exactly the same. Maybe you could adapt one of those utilities? On Jan 29, 2008 12:11 PM, Tolkin, Steve <[EMAIL PROTECTED]> wrote: > I am looking for a perl program that will solve the following problem. > Suppose I have 2 or more lists that are (conceptually) sublists of the > same underlying list. > I want to reconstruct the underlying list. In other words the order of > the elements agrees in all the lists, but there is no sort condition. > > Example: > List 1: dog, cat, mouse > List 2: dog, shark, mouse, elephant > > There are 2 possible outputs, and I do not care which one I get. > > The reason that I have not just coded this up is that it seems it > require an unbounded amount of look ahead. Also, when there are more > than 2 lists, I think I need to read from all of them before making a > decision about which element can be safely output. > > Thanks, > Steve > -- > Steven Tolkin[EMAIL PROTECTED] 508-787-9006 > Fidelity Investments 400 Puritan Way M3B Marlborough MA 01752 > There is nothing so practical as a good theory. Comments are by me, > not Fidelity Investments, its subsidiaries or affiliates. > > ___ > Boston-pm mailing list > Boston-pm@mail.pm.org > http://mail.pm.org/mailman/listinfo/boston-pm > -- jb. ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] merging lists that are ordered but not sorted
On Tue, 29 Jan 2008, Tolkin, Steve wrote: > I want to reconstruct the underlying list. In other words the order of > the elements agrees in all the lists, but there is no sort condition. > > Example: > List 1: dog, cat, mouse > List 2: dog, shark, mouse, elephant > > There are 2 possible outputs, and I do not care which one I get. Out of curiosity, does it have to handle something like this? List 1: dog, cat, mouse List 2: dog, shark, mouse, elephant List 3: apple, pear, orange That is, "outliers", I guess. Or this? List 1: dog, cat, mouse List 2: dog, shark, mouse, elephant List 3: elephant, dog That is, "loops", I guess. Seems like edge cases like that could make this non-determistic. -- Chris Devers ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] merging lists that are ordered but not sorted
On Tue, Jan 29, 2008 at 12:11:56PM -0500, Tolkin, Steve wrote: > I am looking for a perl program that will solve the following problem. > Suppose I have 2 or more lists that are (conceptually) sublists of the > same underlying list. > I want to reconstruct the underlying list. In other words the order of > the elements agrees in all the lists, but there is no sort condition. > > Example: > List 1: dog, cat, mouse > List 2: dog, shark, mouse, elephant > > There are 2 possible outputs, and I do not care which one I get. Sounds like a problem begging for tsort. 'man tsort' or 'info coreutils tsort'. Should be on most unixoid systems. You'd need to pre-process your data into pairs of strings and feed that to tsort, which will produce one of the possible totally ordered outputs. $ tsort
Re: [Boston.pm] merging lists that are ordered but not sorted
On Tuesday 29 January 2008 12:11, Tolkin, Steve wrote: > Suppose I have 2 or more lists that are (conceptually) sublists of the > same underlying list. > I want to reconstruct the underlying list. In other words the order of > the elements agrees in all the lists, but there is no sort condition. > > Example: > List 1: dog, cat, mouse > List 2: dog, shark, mouse, elephant > > There are 2 possible outputs, and I do not care which one I get. Sounds like you are trying to solve an alignment problem. It's a common thing to do in NLP (especially in speech recognition) and I think also when working with gene sequences. If you are interested in surveying what's available it might be worth to search for "align" or "alignment" on CPAN, and to look for BioPerl stuff. As an quick suggestion, I have used Algorithm::Diff in the past for similar problems. It uses the longest common subsequence algorithm, which provides a minimal difference solution to the alignment problem. The sdiff() function from that module should get you within a hair's breadth of the solution (you'd have only to extract the second or third element of each array ref that's returned). Another typical algorithm for sequence comparison is the minimum edit distance, for which there are a couple of modules on CPAN: Text::Levenshtein, and Text::WagnerFischer. But those modules only give you the distance, not the alignment... Bernardo ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] merging lists that are ordered but not sorted
On Tue, 2008-01-29 at 12:04 -0800, Ben Tilly wrote: > [snip] > That is logically unavoidable. Yup, and in fact provably true. The problem is the same as finding a topological ordering among a directed acyclic graph. Wikipedia can inform you more on the topic ( http://en.wikipedia.org/wiki/Topological_sort ) but a better reference is the canonical "Introduction to Algorithms" by Cormen, Leiserson, Rivest, and Stein. - Alex -- Networking -- only one letter away from not working ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] merging lists that are ordered but not sorted
On Jan 29, 2008 10:57 AM, David Golden <[EMAIL PROTECTED]> wrote: > On Jan 29, 2008 12:11 PM, Tolkin, Steve <[EMAIL PROTECTED]> wrote: > > I want to reconstruct the underlying list. In other words the order of > > the elements agrees in all the lists, but there is no sort condition. > > > > Example: > > List 1: dog, cat, mouse > > List 2: dog, shark, mouse, elephant > > > > There are 2 possible outputs, and I do not care which one I get. > > > > The reason that I have not just coded this up is that it seems it > > require an unbounded amount of look ahead. Also, when there are more > > than 2 lists, I think I need to read from all of them before making a > > decision about which element can be safely output. Yes, this is true. You need an unbounded look ahead. Because you might have 5 lists, and the last item of list 1 might be the first item of list 2, the last of list 2 might be the first of last 3, etc. Until you've read each list in full you can't know how to solve it. > What comes to mind is indexing all of the words on the maximum depth > they occur in any list. Then you output all the elements of max_depth > 1 (dog), all the elements of max_depth 2 (cat, shark), max_depth 3 > (mouse), max_depth 4 (elephant). List 1: dog, cat, mouse, rat, chimp List 2: chimp, gorilla, kangaroo The algorithm you described would output gorilla before rat. > You still have to read all the lists at least once, though. That is logically unavoidable. Here is some lightly tested code to do what is asked. It has, though, no error checking. sub merge_lists { my @list_infos = map { { list => $_, depth => {}, position => 0, done => 0, } } @_; # Create indexes. for my $list_info (@list_infos) { my $list = $list_info->{list}; my $depth = $list_info->{depth}; for my $i (0..$#$list) { $depth->{$list->[$i]} = $i; } } my @merged_list; FIND_ELEMENT: while (@list_infos) { LIST: for my $list_info (@list_infos) { my $element = $list_info->{list}->[ $list_info->{position} ]; # Is this element one that can go next? for my $other_list_info (@list_infos) { my $other_position = $other_list_info->{position}; my $other_depth = $other_list_info->{depth}->{$element}; if ($other_depth and $other_depth > $other_list_info->{position}) { # Oops, we'll have to take an element from another list. next LIST; } } # We've found our element, let's do bookkeeping push @merged_list, $element; for my $other_list_info (@list_infos) { my $other_list = $other_list_info->{list}; my $other_position = $other_list_info->{position}; if ( $element eq $other_list->[$other_position] ) { if ($other_position == $#$other_list) { $other_list_info->{done} = 1; } else { $other_list_info->{position}++; } } } # And cleanup. @list_infos = grep {not $_->{done}} @list_infos; next FIND_ELEMENT; } } return @merged_list; } And to run it just call: merge_list([qw(dog cat mouse)], [qw(dog shark mouse elephant)]; Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] merging lists that are ordered but not sorted
On Jan 29, 2008 12:11 PM, Tolkin, Steve <[EMAIL PROTECTED]> wrote: > I want to reconstruct the underlying list. In other words the order of > the elements agrees in all the lists, but there is no sort condition. > > Example: > List 1: dog, cat, mouse > List 2: dog, shark, mouse, elephant > > There are 2 possible outputs, and I do not care which one I get. > > The reason that I have not just coded this up is that it seems it > require an unbounded amount of look ahead. Also, when there are more > than 2 lists, I think I need to read from all of them before making a > decision about which element can be safely output. What comes to mind is indexing all of the words on the maximum depth they occur in any list. Then you output all the elements of max_depth 1 (dog), all the elements of max_depth 2 (cat, shark), max_depth 3 (mouse), max_depth 4 (elephant). You still have to read all the lists at least once, though. David ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm