On Mar 31, 2010, at 8:36 PM, Barry Brevik wrote:

> I'm having a problem sorting data items that are alpha-numeric strings.
> I know how to do it if the string is all alpha or all numeric, but the
> combo eludes me.
> 
> Take as example the following code. It is my desire that the machine
> names be processed in the order that they have been loaded into the
> hash. This is an example only- the machine names will not actually be
> loaded in order. Also, there will not always be "dashes" separating the
> alpha from the numbers:
> 
>  use strict;
>  use warnings;
> 
>  my %mdata =
>  (
>    'CALIBRATION1',    1,
>    'CALIBRATION02',   1,
>    'LABVIEW-1',       1,
>    'LABVIEW-2',       1,
>    'LABVIEW-4',       1,
>    'LABVIEW-11',      1,
>    'LABVIEW-12',      1,
>    'LABVIEW-114',     1,
>    'YESTECH-L3-RW1',  1,
>    'YESTECH-L03-RW2', 1,
>    'YESTECH-L4-RW125',1
>  );
> 
>  foreach my $key (sort(keys(%mdata)))
>  {
>    print "$key\n";
>  }
> 
> The output of this code is as follows, and you can see that the sort
> order is not what I wanted:
> 
>  CALIBRATION02
>  CALIBRATION1
>  LABVIEW-1
>  LABVIEW-11
>  LABVIEW-114
>  LABVIEW-12
>  LABVIEW-2
>  LABVIEW-4
>  YESTECH-L03-RW2
>  YESTECH-L3-RW1
>  YESTECH-L4-RW125
> 
> Any ideas on how to get this to come out in the "right" order?

It happens that I just had to solve that problem, but in C, not Perl. For what 
it's worth, it took 59 lines of C code, and still has one technical bug that 
I've decided to leave in because I can't think of an easy solution and it 
should only turn up in pathological circumstances (specifically, an aisle in a 
supermarket with an aisle number over 18 digits long). 

You'll have to use the SUBNAME or BLOCK version of the sort function, and do 
your own left-to-right scan, comparing one character at a time, except that 
when you hit a digit on both sides, you scan to the end or the first non-digit 
on each side, convert both sequences of digits to integers (being careful that 
a leading 0 doesn't force you into octal) and then compare the two resulting 
numbers. As soon as you hit an inequality, or one key runs out before the other 
does, you can conclude that a>b or a<b. If you run out of both keys at once, 
you can conclude that a==b. If you are working with UTF-8 or UTF-16 data, you 
will also have to normalize the characters to UTF-32 first.
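
For illustration, the scan described above might be sketched in Perl roughly as follows. This is only a sketch under some assumptions: natural_cmp is a name made up for the example, only the ASCII digits 0-9 count as digits, and digit runs are assumed short enough to compare exactly as Perl numbers. The key list is the one from the original post.

```perl
use strict;
use warnings;

# Hypothetical comparator (the name is invented for this sketch).
# Walks both keys left to right. Single characters are compared as text,
# but when both sides are at a digit, the whole digit runs are compared
# as numbers instead.
sub natural_cmp {
    my ($x, $y) = @_;
    my ($i, $j) = (0, 0);

    while ($i < length $x && $j < length $y) {
        my $cx = substr $x, $i, 1;
        my $cy = substr $y, $j, 1;

        if ($cx =~ /[0-9]/ && $cy =~ /[0-9]/) {
            # Digit on both sides: take each run up to the first non-digit.
            my ($dx) = substr($x, $i) =~ /^([0-9]+)/;
            my ($dy) = substr($y, $j) =~ /^([0-9]+)/;

            # Perl converts a string such as "02" to the decimal number 2;
            # octal only applies to literals in source code and to oct().
            # Very long runs lose precision here (assumption: they do not occur).
            my $r = $dx <=> $dy;
            return $r if $r;

            $i += length $dx;
            $j += length $dy;
        }
        else {
            my $r = $cx cmp $cy;
            return $r if $r;
            $i++;
            $j++;
        }
    }

    # The key with characters left over sorts last; none left on either
    # side means the keys compare equal.
    return (length($x) - $i) <=> (length($y) - $j);
}

my @names = qw(
    YESTECH-L4-RW125 LABVIEW-12 CALIBRATION02 LABVIEW-2 YESTECH-L03-RW2
    LABVIEW-114 CALIBRATION1 LABVIEW-1 YESTECH-L3-RW1 LABVIEW-11 LABVIEW-4
);

print "$_\n" for sort { natural_cmp($a, $b) } @names;
```

With these keys the names should come out in the order they were listed in the original hash. Note that keys differing only in leading zeros (CALIBRATION1 and CALIBRATION01, say) compare equal under this scheme, so their relative order is left to sort.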

Alternatively, if your keys follow enough of a pattern, you can try creating a 
normalized form of the data and then sorting that instead. It will probably be 
faster that way, but you have to be certain that your normalized form and your 
normalization routine will work for all inputs, whereas the above algorithm 
will work for any arbitrary input, provided that there are no digit strings 
longer than will fit in an integer.
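
A rough sketch of the normalized-form approach, again in Perl and again under assumptions: every run of digits is zero-padded to a fixed width so that an ordinary string sort puts the keys in numeric order. The width of 10 and the names $WIDTH, pad_digits and normalize_key are invented for the example. The padding is done on strings, so nothing is converted to a number, but the code refuses any run longer than the chosen width.

```perl
use strict;
use warnings;

# Assumed upper bound on the number of significant digits in any run.
my $WIDTH = 10;

# Hypothetical helper: left-pad one run of digits with zeros to $WIDTH.
sub pad_digits {
    my ($digits) = @_;
    $digits =~ s/^0+(?=[0-9])//;    # "02" becomes "2"; a lone "0" is kept
    die "digit run longer than $WIDTH digits: $digits\n"
        if length($digits) > $WIDTH;
    return ('0' x ($WIDTH - length $digits)) . $digits;
}

# Hypothetical helper: build the normalized form of a whole key.
sub normalize_key {
    my ($key) = @_;
    (my $norm = $key) =~ s/([0-9]+)/pad_digits($1)/ge;
    return $norm;
}

my @names = qw(
    YESTECH-L4-RW125 LABVIEW-12 CALIBRATION02 LABVIEW-2 YESTECH-L03-RW2
    LABVIEW-114 CALIBRATION1 LABVIEW-1 YESTECH-L3-RW1 LABVIEW-11 LABVIEW-4
);

# Normalize each key once, sort on the normalized form with a plain
# string comparison, then recover the original names.
my @sorted = map  { $_->[1] }
             sort { $a->[0] cmp $b->[0] }
             map  { [ normalize_key($_), $_ ] } @names;

print "$_\n" for @sorted;
```

Each key is normalized only once rather than on every comparison, which is where the speed advantage would come from.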

-- 
John W Kennedy
"...if you had to fall in love with someone who was evil, I can see why it was 
her."
  -- "Alias"



_______________________________________________
ActivePerl mailing list
[email protected]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
