A couple of takeaways from all of this, at least for me:

1. Rather surprisingly, strcmp() and strcasecmp() do not necessarily order
strings the same way *even when neither string is mixed case*! For example,
for ("ABC", "123") strcmp() would return a negative value but strcasecmp()
would return a positive one (untested). Consider the following bit of
program logic, which would fail: Build a table of userids from some system
source. Because they come from the system they are all uppercase, so they
could be sorted using strcmp(). When a user enters a userid, look it up in
the table using a binary search. Since the user might enter the id in mixed
case, search using strcasecmp(). The binary search would fail, because the
table is not sorted in the order that strcasecmp() expects. (Yes, you could
first uppercase the user input and use strcmp(). This is an illustrative
example, not a "problem.")

2. One has to be very careful mixing strcasecmp() with roll-your-own
compares such as if ( tolower(left[i]) != tolower(right[i]) ) ... I am
checking my code for this issue.
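
A minimal sketch of the failure in point 1, for anyone who wants to try it.
The ("ABC", "123") case depends on EBCDIC, so this sketch assumes an ASCII
platform instead, where the same kind of mismatch shows up for characters
that fall between 'Z' and 'a', such as '_'. It also assumes a strcasecmp()
that folds to lowercase, as POSIX describes for the POSIX locale. The table
contents and the names cmp_exact, cmp_nocase and sort_then_find are made up
for illustration.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strcasecmp() (POSIX) */

/* qsort()/bsearch() adapters; each element is a `const char *`. */
int cmp_exact(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

int cmp_nocase(const void *a, const void *b)
{
    return strcasecmp(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical table of all-uppercase userids. */
static const char *ids[] = { "A_", "AB", "AC" };
enum { N_IDS = sizeof ids / sizeof ids[0] };

/* Sort a copy of the table with sort_cmp, then binary search it for key
   with find_cmp. Returns 1 if the key was found, 0 otherwise. */
int sort_then_find(const char *key,
                   int (*sort_cmp)(const void *, const void *),
                   int (*find_cmp)(const void *, const void *))
{
    const char *table[N_IDS];

    memcpy(table, ids, sizeof ids);
    qsort(table, N_IDS, sizeof table[0], sort_cmp);
    return bsearch(&key, table, N_IDS, sizeof table[0], find_cmp) != NULL;
}
```

On an ASCII system, sort_then_find("a_", cmp_exact, cmp_nocase) should miss
the entry: strcmp() sorts "A_" last ('_' is 0x5F, above 'C'), while
strcasecmp() places it first ('_' is below 'b' and 'c'). Sorting and
searching with the same comparator finds it.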

Charles


-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of Charles Mills
Sent: Thursday, June 1, 2017 4:45 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: strcasecmp() comparing punctuation in ASCII?

> strcasecmp() is obliged to convert all upper-case letters into
> lower-case for the comparison

I don't think it is as simple-minded as that (no offense -- you're not
simple-minded either <g>).

I think @John McKown pretty much nailed it. It's an "abstract" compare that
just happens (well, a little more than just happens) to largely conform to
ASCII.

I think the -37 and 122 confirm that it is not a simple "subtract one ASCII
value from another." And yes, I picked 'Z' and '0' specifically because they
order differently ASCII versus EBCDIC. I've deleted my test code now but an
interesting case would be 'A' and 'b', which differ in order between EBCDIC
and ASCII also. A guess would be that the result would be +1, because they
should be one entry apart in the abstract table, unlike their code points,
which are x'20' or x'40' apart. ('A' and 'a' of course compare equal --
that's the whole point, isn't it?)

Yes, the results were astonishing. I assumed a bug in my code.

My code is not only working correctly, it's superfluous! I do initial
development and alpha test on Windows. I have some
hard-coded-in-collating-order tables that I binary search. Most of them use
all-alpha keys and so that order does not matter ASCII (Windows) versus
EBCDIC (MVS). A few tables have mixed alpha, numerics and/or punctuation and
so those I sort into collating sequence on start-up. Well, I needn't have
bothered! As you indicate, lower_bound results are consistent between
Windows and MVS.

strcasecmp() is obliged to convert all upper-case letters into lower-case
for the comparison.

Lower-case 'z' in EBCDIC is 0xa9, '0' in EBCDIC is 0xf0.  0xA9 is less than
0xF0; so I would expect strcasecmp() to return a value less than zero.

But - as you point out - in ASCII, 'z' is 0x7a, and '0' is 0x30.  0x7a is
greater than 0x30, so on an ASCII platform I would expect this to return a
positive value (note that 0x7a - 0x30 is 74 - not the 122 that IBM returned,
although both are positive.)
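
The arithmetic above can be checked with a few lines of C. This is only a
sketch of the "subtract one code point from the other" model being ruled
out here, not a claim about how any library implements strcasecmp(). The
helper byte_diff and the constant names are made up; the code point values
are the ones quoted in this message.

```c
/* Signed difference of two single-byte code points. */
int byte_diff(unsigned char a, unsigned char b)
{
    return (int)a - (int)b;
}

/* Code points for 'z' and '0' as quoted above. */
enum {
    ASCII_z  = 0x7A, ASCII_0  = 0x30,
    EBCDIC_z = 0xA9, EBCDIC_0 = 0xF0
};
```

byte_diff(ASCII_z, ASCII_0) is 74 and byte_diff(EBCDIC_z, EBCDIC_0) is -71,
so neither plain subtraction produces the 122 that was reported.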

IBM must be mapping the values to some abstract code-points and then
subtracting those...

It _would_ mean that the strcasecmp() results are consistent between ASCII
and EBCDIC environments... which might sometimes be a desirable
characteristic - and other times not...
For IBM-MAIN subscribe / signoff / archive access instructions, send email
to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
