On Fri, Oct 30, 2009 at 12:37 PM, "Parag Kalra" <paragka...@gmail.com> scribbled:

> Hello Folks,
>
> This is my first post here.
>
> I am trying to emulate the Linux 'sort' command through Perl. I got the
> following code from the Internet to sort the text file:
>
> # cat sort.pl
> my $column_number = 2; # Sorting by 3rd column since 0-origin based
> my $prev = "";
> for (
>     map  { $_->[0] }
>     sort { $a->[1] cmp $b->[1] }
>     map  { [$_, (split)[$column_number]] }
>     <>
> ) {
>     print unless $_ eq $prev;
>     $prev = $_;
> }
>
> Suppose I want to sort the data of a text file having the following rows &
> columns:
>
> # cat test.out
> jhvXgF U13GWt 3OvMCf VMkAWj
> 4ewejk pFnjd4 ie0hZF pPipQJ
> 4ewejk 4sqprx ie0hZF cqtexi
> FT9mWp d4fgMB gvZRJU XRRu0N
> hnzI2c GXAXWF 6xKH7A 3dLh18
>
> When I sort it using the 'sort' command by 3rd column I get the following
> output:
>
> # sort -u -k 3 test.out
> jhvXgF U13GWt 3OvMCf VMkAWj
> hnzI2c GXAXWF 6xKH7A 3dLh18
> FT9mWp d4fgMB gvZRJU XRRu0N
> 4ewejk 4sqprx ie0hZF cqtexi
> 4ewejk pFnjd4 ie0hZF pPipQJ
>
> However, when I sort the same text file by 3rd column using the piece of
> code, I get the following:
> jhvXgF U13GWt 3OvMCf VMkAWj
> hnzI2c GXAXWF 6xKH7A 3dLh18
> FT9mWp d4fgMB gvZRJU XRRu0N
> 4ewejk pFnjd4 ie0hZF pPipQJ
> 4ewejk 4sqprx ie0hZF cqtexi
>
> The difference can be seen in the last 2 row values of the 2nd column.
>
> The reason being 'ie0hZF' is repeated twice in the 3rd column and also the
> corresponding values in the 1st column are the same - '4ewejk' - so the
> discrepancy has occurred in the 2nd column.
> Can anybody help me fix the bug in the above code?

It's not a bug, it's a "feature"!

For the two lines in question, the values in the selected "key" column are
equal. Therefore, if all you require is that the lines appear in order by
that key column, the order of these two lines does not matter. If you want
lines with equal keys to be sorted into some specific order, then you need
to specify what that order should be. There are two obvious possibilities:

1. Lines with equal keys appear in the order in which they were present in
   the input.
2. Lines with equal keys appear in order according to some other column.

Approach 1 is called a "stable sort". As you can see, Perl's sort gave you a
stable result here: the last two lines are in the order in which they appear
in the original file. (Perl only guarantees this if you ask for it with
"use sort 'stable';".) The Unix sort is not stable by default: lines whose
keys compare equal are ordered by a last-resort comparison of the whole
line, so here the '4sqprx' line comes out before the 'pFnjd4' line. Note
also that "-k 3" means "from field 3 to the end of the line", not field 3
alone; use "-k 3,3" to restrict the key to a single field. Whether or not
you consider this a "bug" depends upon your application. You can get a
stable sort with the -s option to Unix sort (may depend upon your version).

Approach 2 is called a "compound sort" (or something like that). You can add
additional sort criteria to the sort subroutine by using the or (||)
operator. One column is the primary sort key; another is the secondary sort
key, etc. This example will sort by column 3, then, if the entries in
column 3 are equal, by column 2 (untested):

for (
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] || $a->[2] cmp $b->[2] }
    map  { [$_, (split)[2,1]] }
    <>
) {
    print;
}
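
The compound sort extends to any number of key columns. Below is a minimal sketch of that, not a drop-in replacement for sort(1). The name sort_by_columns and its calling convention are made up for illustration. It compares the listed 0-based columns in priority order and, if they all tie, falls back to comparing the whole line, which is how POSIX sort orders lines whose keys compare equal when -s is not given. It does not remove duplicates the way -u does.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): sort lines by the given
# 0-based whitespace-separated columns, in priority order. Lines that tie on
# every listed column are ordered by comparing the whole line.
sub sort_by_columns {
    my ($lines, @cols) = @_;
    return
        map  { $_->[0] }                          # unwrap: keep only the line
        sort {
            my $cmp = 0;
            for my $i (1 .. scalar @cols) {       # slot 0 holds the line itself
                $cmp = ($a->[$i] // '') cmp ($b->[$i] // '');
                last if $cmp;                     # first differing key decides
            }
            $cmp || ($a->[0] cmp $b->[0]);        # last resort: whole line
        }
        map  { [ $_, (split)[@cols] ] }           # wrap: [line, key1, key2, ...]
        @$lines;
}
```

Something like "print sort_by_columns([<>], 2, 1);" would then sort the input by column 3 and break ties on column 2. With the five sample lines, sorting on column 3 alone should put the '4sqprx' line before the 'pFnjd4' line, as the Unix sort does.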