On Fri, Oct 30, 2009 at 12:37 PM, "Parag Kalra"
<paragka...@gmail.com> scribbled:

> Hello Folks,
> 
> This is my first post here.
> 
> I am trying to emulate the Linux 'sort' command in Perl. I found the
> following code on the Internet to sort a text file:
> 
> # cat sort.pl
> my $column_number = 2; # Sorting by 3rd column since 0-origin based
> my $prev = "";
> for (
>   map { $_->[0] }
>   sort { $a->[1] cmp $b->[1] }
>   map { [$_, (split)[$column_number]] }
>   <>
> ) {
>   print unless $_ eq $prev;
>   $prev = $_;
> }
> 
> Suppose I want to sort the data of text file having following rows &
> columns:
> 
> # cat test.out
> jhvXgF    U13GWt    3OvMCf    VMkAWj
> 4ewejk    pFnjd4    ie0hZF    pPipQJ
> 4ewejk    4sqprx    ie0hZF    cqtexi
> FT9mWp    d4fgMB    gvZRJU    XRRu0N
> hnzI2c    GXAXWF    6xKH7A    3dLh18
> 
> When I sort it using the 'sort' command by the 3rd column, I get the
> following output:
> 
> # sort -u -k 3 test.out
> jhvXgF    U13GWt    3OvMCf    VMkAWj
> hnzI2c    GXAXWF    6xKH7A    3dLh18
> FT9mWp    d4fgMB    gvZRJU    XRRu0N
> 4ewejk    4sqprx    ie0hZF    cqtexi
> 4ewejk    pFnjd4    ie0hZF    pPipQJ
> 
> However, when I sort the same text file by the 3rd column using the piece
> of code, I get the following:
> jhvXgF    U13GWt    3OvMCf    VMkAWj
> hnzI2c    GXAXWF    6xKH7A    3dLh18
> FT9mWp    d4fgMB    gvZRJU    XRRu0N
> 4ewejk    pFnjd4    ie0hZF    pPipQJ
> 4ewejk    4sqprx    ie0hZF    cqtexi
> 
> The difference can be seen in the last 2 rows of the 2nd column.
> 
> The reason is that 'ie0hZF' appears twice in the 3rd column and the
> corresponding values in the 1st column are also the same ('4ewejk'), so
> the discrepancy has occurred in the 2nd column.
> Can anybody help me fix the bug in the above code?

It's not a bug, it's a "feature"!

For the two lines in question, the values in the selected "key" columns are
equal. Therefore, if all you require is that the lines appear in order by
that key column, the order of these two lines does not matter.

If you want lines with equal keys to be sorted into some specific order,
then you need to specify what that order might be. There are two obvious
possibilities:

1. Lines with equal keys appear in the order in which they were present in
the input.

2. Lines with equal keys appear in order according to some other column.

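Approach 1. is called a "stable sort". As you can see, Perl's sort behaved
as a stable sort here: the last two lines are in the order they appear in
the original file. The Unix sort is not stable by default: lines whose keys
compare equal are ordered by a last-resort comparison of the whole line.
Note also that -k 3 defines a key that runs from column 3 to the end of the
line, so column 4 takes part in the comparison as well, which is why the
"cqtexi" line comes out ahead of the "pPipQJ" line; use -k 3,3 to restrict
the key to column 3 alone. Whether or not you consider this a "bug" depends
upon your application. You can request a stable sort with the -s option to
Unix sort (may depend upon your version).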
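
If you would rather not depend on the sort algorithm being stable, you can
make the input order an explicit tie-breaker. The following is only a
sketch: the name stable_sort_by_column is made up for illustration, and it
assumes whitespace-separated columns as in your sample data.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original code): sort lines by one
# whitespace-separated column (0-based), keeping lines with equal keys in
# their input order. The input position is stored next to the key and used
# as the secondary comparison.
sub stable_sort_by_column {
    my ($column, @lines) = @_;
    my $i = 0;
    return map  { $_->[0] }
           sort { $a->[1] cmp $b->[1] || $a->[2] <=> $b->[2] }
           map  { my $key = (split ' ', $_)[$column];
                  [ $_, defined $key ? $key : '', $i++ ] }
           @lines;
}

my @rows = (
    "jhvXgF    U13GWt    3OvMCf    VMkAWj\n",
    "4ewejk    pFnjd4    ie0hZF    pPipQJ\n",
    "4ewejk    4sqprx    ie0hZF    cqtexi\n",
    "FT9mWp    d4fgMB    gvZRJU    XRRu0N\n",
    "hnzI2c    GXAXWF    6xKH7A    3dLh18\n",
);

# Sort by the 3rd column (index 2); the two 'ie0hZF' lines keep their
# original relative order.
print stable_sort_by_column(2, @rows);
```

The extra array element costs a little memory, but the result no longer
depends on which sort implementation your perl uses.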

Approach 2. is called a "compound sort" (or something like that). You can
add additional sort criteria to the sort subroutine by using the or (||)
operator. One column is the primary sort key; another is the secondary sort
key, etc. This example will sort by column 3, then, if the entries in column
3 are equal, column 2 (untested):

 for (
   map { $_->[0] }
   sort { $a->[1] cmp $b->[1] || $a->[2] cmp $b->[2] }
   map { [$_, (split)[2,1] ] }
   <>
 ) {
   print;
 }
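
The same idea can be extended to any number of key columns. Here is a rough
sketch under the same assumptions (whitespace-separated columns, plain
string comparison); sort_by_columns is a made-up name, and the sample rows
are copied from your test.out.

```perl
use strict;
use warnings;

# Hypothetical helper: sort lines by several columns in priority order.
# $columns is an array reference of 0-based column indexes, most
# significant first. Missing columns are treated as empty strings.
sub sort_by_columns {
    my ($columns, @lines) = @_;
    return map  { $_->[0] }
           sort {
               my $order = 0;
               # Elements 1 .. N of each record hold the extracted keys.
               for my $k (1 .. $#$a) {
                   $order = $a->[$k] cmp $b->[$k];
                   last if $order;    # first differing key decides
               }
               $order;
           }
           map  { my $line = $_;
                  my @f    = split ' ', $line;
                  [ $line, map { defined $_ ? $_ : '' } @f[@$columns] ] }
           @lines;
}

my @rows = (
    "jhvXgF    U13GWt    3OvMCf    VMkAWj\n",
    "4ewejk    pFnjd4    ie0hZF    pPipQJ\n",
    "4ewejk    4sqprx    ie0hZF    cqtexi\n",
    "FT9mWp    d4fgMB    gvZRJU    XRRu0N\n",
    "hnzI2c    GXAXWF    6xKH7A    3dLh18\n",
);

# Column 3 is the primary key, column 2 the secondary key.
print sort_by_columns([2, 1], @rows);
```

With these five rows the '4sqprx' line is printed before the 'pFnjd4' line,
which is the same order your sort command produced. Note that neither this
sketch nor the loop above suppresses duplicate lines; keep the $prev check
from your original script if you need that.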


