On Mar 31, 2010, at 8:36 PM, Barry Brevik wrote:
> I'm having a problem sorting data items that are alpha-numeric strings.
> I know how to do it if the string is all alpha or all numeric, but the
> combo eludes me.
>
> Take as example the following code. It is my desire that the machine
> names be processed in the order that they have been loaded into the
> hash. This is an example only- the machine names will not actually be
> loaded in order. Also, there will not always be "dashes" separating the
> alpha from the numbers:
>
> use strict;
> use warnings;
>
> my %mdata =
> (
> 'CALIBRATION1', 1,
> 'CALIBRATION02', 1,
> 'LABVIEW-1', 1,
> 'LABVIEW-2', 1,
> 'LABVIEW-4', 1,
> 'LABVIEW-11', 1,
> 'LABVIEW-12', 1,
> 'LABVIEW-114', 1,
> 'YESTECH-L3-RW1', 1,
> 'YESTECH-L03-RW2', 1,
> 'YESTECH-L4-RW125',1
> );
>
> foreach my $key (sort(keys(%mdata)))
> {
> print "$key\n";
> }
>
> The output of this code is as follows, and you can see that the sort
> order is not what I wanted:
>
> CALIBRATION02
> CALIBRATION1
> LABVIEW-1
> LABVIEW-11
> LABVIEW-114
> LABVIEW-12
> LABVIEW-2
> LABVIEW-4
> YESTECH-L03-RW2
> YESTECH-L3-RW1
> YESTECH-L4-RW125
>
> Any ideas on how to get this to come out in the "right" order?
It happens that I just had to solve that problem, but in C, not Perl. For what
it's worth, it took 59 lines of C code, and still has one technical bug that
I've decided to leave in because I can't think of an easy solution and it
should only turn up in pathological circumstances (specifically, an aisle in a
supermarket with an aisle number over 18 digits long).
You'll have to use the SUBNAME or BLOCK version of the sort function and do
your own left-to-right scan, comparing one character at a time. The exception
is when you hit a digit on both sides: scan to the end or the first non-digit
on each side, convert both sequences of digits to integers (being careful that
a leading 0 doesn't force you into octal), and then compare the two resulting
numbers. As soon as you hit an inequality, or one key runs out before the other
does, you can conclude that a>b or a<b. If you run out of both keys at once,
you can conclude that a==b. If you are working with UTF-8 or UTF-16 data, you
will also have to normalize the characters to UTF-32 first.
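
For what it's worth, here is a rough Perl sketch of that scan. The name
natural_cmp is just something made up for illustration, and it assumes plain
ASCII digits and digit runs short enough to compare as ordinary Perl numbers.
Perl converts a string like "02" to a number in decimal, so the leading-zero
worry should not bite here the way it can in C.

```perl
use strict;
use warnings;

# natural_cmp is a hypothetical helper name, not a built-in.
# Returns -1, 0 or 1, in the same spirit as cmp and <=>.
sub natural_cmp {
    my ($x, $y) = @_;
    my ($i, $j) = (0, 0);    # current scan positions in $x and $y

    while ($i < length($x) && $j < length($y)) {
        my $cx = substr($x, $i, 1);
        my $cy = substr($y, $j, 1);

        if ($cx =~ /[0-9]/ && $cy =~ /[0-9]/) {
            # A digit on both sides: take each whole run of digits
            # and compare the runs as numbers.
            my ($dx) = substr($x, $i) =~ /^([0-9]+)/;
            my ($dy) = substr($y, $j) =~ /^([0-9]+)/;
            my $r = $dx <=> $dy;
            return $r if $r;
            $i += length($dx);
            $j += length($dy);
        }
        else {
            # Otherwise compare a single character on each side.
            my $r = $cx cmp $cy;
            return $r if $r;
            $i++;
            $j++;
        }
    }

    # One key (or both) ran out: whichever has characters left is greater.
    return (length($x) - $i) <=> (length($y) - $j);
}

# Example use with a few of the names from the original post.
my @names = qw(LABVIEW-11 LABVIEW-2 CALIBRATION02 CALIBRATION1);
print "$_\n" for sort { natural_cmp($a, $b) } @names;
```

In the original program this would go in as sort { natural_cmp($a, $b) }
keys(%mdata), and for the eleven names posted it should produce the order in
which they were listed. Note that "L3" and "L03" compare as equal at that
point, so the decision falls to whatever comes after them.
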
Alternatively, if your keys follow enough of a pattern, you can try creating a
normalized form of the data and then sorting that instead. It will probably be
faster that way, but you have to be certain that your normalized form and your
normalization routine will work for all inputs, whereas the above algorithm
will work for any arbitrary input, provided that there are no digit strings
longer than will fit in an integer.
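
A minimal sketch of the normalized-form idea, under the assumption that no run
of digits is ever longer than 10 characters once leading zeros are dropped.
The names $WIDTH and normalize_key are invented for this example. Each run of
digits is padded with zeros to a fixed width, so that a plain string sort puts
the numbers in numeric order.

```perl
use strict;
use warnings;

my $WIDTH = 10;    # assumed upper bound on significant digits per run

# normalize_key is a hypothetical helper name. It drops leading zeros from
# every run of digits and then left-pads the run with zeros to $WIDTH.
sub normalize_key {
    my ($key) = @_;
    (my $norm = $key) =~ s/0*([0-9]+)/('0' x ($WIDTH - length($1))) . $1/ge;
    return $norm;
}

# Example use with a few of the names from the original post.
my @names = qw(LABVIEW-11 LABVIEW-2 CALIBRATION02 CALIBRATION1);

# Normalize each key once, then sort the originals by their normalized form.
my %norm   = map  { $_ => normalize_key($_) } @names;
my @sorted = sort { $norm{$a} cmp $norm{$b} } @names;

print "$_\n" for @sorted;
```

Because the padding is done on strings, no integer conversion is involved, but
a digit run longer than $WIDTH would be left unpadded and sort in the wrong
place, which is exactly the sort of input you have to rule out beforehand.
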
--
John W Kennedy
"...if you had to fall in love with someone who was evil, I can see why it was
her."
-- "Alias"
_______________________________________________
ActivePerl mailing list