[Warning: rather long. But useful, I hope.]
Adriano wrote:
> Can someone look at StringToGuid/GuidToString and confirm that they are
> broken for big-endian (the code branch when legacy == false)?
Unfortunately, I don't have access to a big-endian machine. But the code of
those functions is flawed for both big- and little-endian machines.
GuidToString uses this format string:
{%02hx%02hx%02hx%02hx-%02hx%02hx-%02hx%02hx-%02hx%02hx-%02hx%02hx%02hx%02hx%02hx%02hx}
and creates the ASCII representation thusly:
    sprintf(buffer, GUID_NEW_FORMAT_UPPER,
        USHORT(guid->data[0] & 0xFF), USHORT(guid->data[0] >> 8),
        USHORT(guid->data[1] & 0xFF), USHORT(guid->data[1] >> 8),
        USHORT(guid->data[2] & 0xFF), USHORT(guid->data[2] >> 8),
        USHORT(guid->data[3] & 0xFF), USHORT(guid->data[3] >> 8),
        USHORT(guid->data[4] & 0xFF), USHORT(guid->data[4] >> 8),
        USHORT(guid->data[5] & 0xFF), USHORT(guid->data[5] >> 8),
        USHORT(guid->data[6] & 0xFF), USHORT(guid->data[6] >> 8),
        USHORT(guid->data[7] & 0xFF), USHORT(guid->data[7] >> 8));
It reads the guid structure as a sequence of 8 words (which it isn't), and then
writes out each 'word' LSB first. This is wrong: network order is MSB first.
On big-endian machines, this means that every 2 adjacent bytes are swapped.
On little-endian machines, funnily enough, the second half of the string turns
out right. It consists of 1-byte fields. They are read as two-byte words, but
because little-endian words are LSB-first, the sprintf puts them back in the
right order.
The first half is a different story: the 4-byte dword and the two 2-byte words
are printed with their bytes reversed on little-endian machines.
(You may notice that the above claim is not consistent with the observed
behaviour of UUID_TO_CHAR. I'll get to that a little later.)
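To make the word-wise behaviour concrete, here is a minimal sketch that mimics
the formatting above for a host of either endianness. The helper name
wordwiseLsbFirst and its bigEndianHost flag are made up for illustration; they
are not part of the Firebird sources. The endianness is simulated, so the
sketch gives the same result wherever it is run.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: mimics the word-wise formatting on a host of the
// given endianness. mem[] is the GUID struct exactly as it lies in memory.
std::string wordwiseLsbFirst(const std::uint8_t mem[16], bool bigEndianHost)
{
    std::string out;
    for (int i = 0; i < 8; ++i)
    {
        // The value that guid->data[i] would have on such a host.
        const unsigned hi = bigEndianHost ? mem[2 * i] : mem[2 * i + 1];
        const unsigned lo = bigEndianHost ? mem[2 * i + 1] : mem[2 * i];
        const unsigned word = (hi << 8) | lo;

        // As in the original call: low byte first, then high byte.
        char pair[5];
        const int n = std::snprintf(pair, sizeof(pair), "%02X%02X",
                                    word & 0xFF, word >> 8);
        assert(n == 4);
        out += pair;
    }
    return out;
}
```

With memory bytes 11 22 33 ... FF, the simulated little-endian host prints the
bytes in memory order, and the simulated big-endian host prints every pair
swapped (2211 4433 ...). Since a little-endian host stores data1, data2 and
data3 LSB-first in memory, "memory order" means reversed for those fields.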
To get GuidToString right, the format string should be:
{%08lX-%04hX-%04hX-%02hX%02hX-%02hX%02hX%02hX%02hX%02hX%02hX}
and the command:
    sprintf(buffer, GUID_RIGHT_FORMAT_UPPER,
        guid->data1, guid->data2, guid->data3,
        guid->data4[0], guid->data4[1], guid->data4[2], guid->data4[3],
        guid->data4[4], guid->data4[5], guid->data4[6], guid->data4[7]);
This works on both big- and little-endian machines. No need to do any
byte-position juggling ourselves: the compiler knows the byte order.
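A self-contained sketch of this field-wise approach follows. SketchGuid is a
hypothetical stand-in for FB_GUID (the real definition may differ), and
guidToStringSketch is an illustrative name. The explicit casts are an addition
of this sketch: they make each argument match its conversion specifier
regardless of how wide long is on the platform.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for FB_GUID; the real definition may differ.
struct SketchGuid
{
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t data4[8];
};

std::string guidToStringSketch(const SketchGuid& g)
{
    // Widen a byte of data4 to the type that %X expects.
    auto b = [&g](int i) { return static_cast<unsigned>(g.data4[i]); };

    char buffer[39]; // 38 characters plus the terminating NUL
    const int n = std::snprintf(buffer, sizeof(buffer),
        "{%08lX-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X}",
        static_cast<unsigned long>(g.data1),
        static_cast<unsigned>(g.data2), static_cast<unsigned>(g.data3),
        b(0), b(1), b(2), b(3), b(4), b(5), b(6), b(7));
    assert(n == 38);
    return buffer;
}
```

The fields are formatted as numbers, so the result does not depend on how the
host lays out their bytes in memory.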
What goes for GuidToString also goes for StringToGuid. They are each other's
complement (or rather: inverse function).
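The inverse direction can be sketched along the same lines with sscanf. Again,
SketchGuid and stringToGuidSketch are hypothetical names, not the Firebird
ones, and this is not a strict validator: sscanf tolerates things such as
lower-case digits or a sign in front of a field.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical stand-in for FB_GUID; the real definition may differ.
struct SketchGuid
{
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t data4[8];
};

// Parses "{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}" field by field.
bool stringToGuidSketch(const char* text, SketchGuid& g)
{
    assert(text != nullptr);

    unsigned long d1;
    unsigned short d2, d3;
    unsigned char d4[8];
    int consumed = 0; // set by %n only if the closing brace matched

    const int fields = std::sscanf(text,
        "{%8lx-%4hx-%4hx-%2hhx%2hhx-%2hhx%2hhx%2hhx%2hhx%2hhx%2hhx}%n",
        &d1, &d2, &d3,
        &d4[0], &d4[1], &d4[2], &d4[3], &d4[4], &d4[5], &d4[6], &d4[7],
        &consumed);

    if (fields != 11 || consumed != 38 || text[consumed] != '\0')
        return false;

    g.data1 = static_cast<std::uint32_t>(d1);
    g.data2 = d2;
    g.data3 = d3;
    std::memcpy(g.data4, d4, sizeof(d4));
    return true;
}
```

As with the formatting sketch, the values are read as numbers, so no manual
byte swapping is involved.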
Now to evlUuidToChar. This function reads the 16-char OCTETS string and
produces the 36-char ASCII string.
At a certain point, it creates a GUID record like this:
const FB_GUID* guid = reinterpret_cast<const FB_GUID*>(data);
That is not the right way, because the data are in network order (at least they
should be - this is the 16-char string).
So guid will be wrong on little-endian machines and right on big-endian
machines.
More precisely, on little-endian machines:
- data1, data2 and data3 all have their bytes reversed (i.e. not in
little-endian host order);
- data4 is OK, because this is an array of *bytes*.
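One portable alternative to the reinterpret_cast is to assemble the multi-byte
fields from the network-order bytes with shifts. The sketch below assumes a
hypothetical SketchGuid layout (the real FB_GUID may differ) and an
illustrative function name; it shows the idea, not the actual fix.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for FB_GUID; the real definition may differ.
struct SketchGuid
{
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t data4[8];
};

// Builds a host-order GUID from 16 bytes in network order (MSB first).
SketchGuid guidFromNetworkOrder(const std::uint8_t* b)
{
    assert(b != nullptr);

    SketchGuid g;
    g.data1 = (static_cast<std::uint32_t>(b[0]) << 24) |
              (static_cast<std::uint32_t>(b[1]) << 16) |
              (static_cast<std::uint32_t>(b[2]) << 8) |
              static_cast<std::uint32_t>(b[3]);
    g.data2 = static_cast<std::uint16_t>((b[4] << 8) | b[5]);
    g.data3 = static_cast<std::uint16_t>((b[6] << 8) | b[7]);
    std::memcpy(g.data4, b + 8, 8); // single bytes: no ordering issue
    return g;
}
```

Shifts operate on values rather than on memory layout, so the same code yields
data1 == 0x11223344 for the bytes 11 22 33 44 on any host.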
Then, for UUID_TO_CHAR:
    case funUuidBroken:
        GuidToString(buffer, guid, false);
        break;
So here, the network-order byte string is fed to GuidToString, which reads it
as a series of host-order words and then swaps each word's bytes before
outputting them to the 36-char ASCII string.
The effect on little-endian machines is that all the multi-byte (d)words, which
are reversed in guid, are put right again by the flawed GuidToString. And the
bytes of the array, which are already in the right place, are also output
correctly by GuidToString (as shown earlier).
On my little-endian machine:
select UUID_TO_CHAR(x'11223344556677889900AABBCCDDEEFF') from rdb$database
-> 11223344-5566-7788-9900-AABBCCDDEEFF
However, on big-endian machines, GuidToString swaps every pair of bytes in the
guid struct (which already has the correct order for those machines), so the
output will be wrong there.
For UUID_TO_CHAR2:
    case funUuidRfc:
        sprintf(buffer, GUID_NEW_FORMAT_LOWER,
            USHORT((guid->data1 >> 24) & 0xFF), USHORT((guid->data1 >> 16) & 0xFF),
            USHORT((guid->data1 >> 8) & 0xFF), USHORT(guid->data1 & 0xFF),
            USHORT((guid->data2 >> 8) & 0xFF), USHORT(guid->data2 & 0xFF),
            USHORT((guid->data3 >> 8) & 0xFF), USHORT(guid->data3 & 0xFF),
            USHORT(guid->data4[0]), USHORT(guid->data4[1]),
            USHORT(guid->data4[2]), USHORT(guid->data4[3]),
            USHORT(guid->data4[4]), USHORT(guid->data4[5]),
            USHORT(guid->data4[6]), USHORT(guid->data4[7]));
        break;
Because data1, data2 and data3 are in network order but the code expects them
in host order, on little-endian machines the bytes in those fields will be
reversed in the output string. The bytes in the array are fine.
Indeed:
select UUID_TO_CHAR2(x'11223344556677889900AABBCCDDEEFF') from rdb$database
-> 44332211-6655-8877-9900-aabbccddeeff
Mind you: because of the swap in data3, combined with the flaw in GEN_UUID, the
output of UUID_TO_CHAR2 *looks* fine on little-endian machines, because the 4
(version number) appears in the right position.
On big-endian machines, UUID_TO_CHAR2 should work fine, because big-endian host
order is the same as network order (and also "natural" order, the way we write
our binary, octal, decimal and hexadecimal numbers).
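The UUID_TO_CHAR2 behaviour on both kinds of host can be reproduced with a
small simulation. The names hostRead and rfcBranchOnHost are invented for this
sketch; the endianness is passed in as a flag rather than taken from the
machine, so the result does not depend on where the sketch is run.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Value a host of the given endianness sees when it reads 'size' bytes
// at p as one integer (this is what the reinterpret_cast amounts to).
static std::uint32_t hostRead(const std::uint8_t* p, int size, bool bigEndianHost)
{
    std::uint32_t v = 0;
    for (int i = 0; i < size; ++i)
        v = (v << 8) | p[bigEndianHost ? i : size - 1 - i];
    return v;
}

// Hypothetical re-enactment of the funUuidRfc branch on network-order input.
std::string rfcBranchOnHost(const std::uint8_t b[16], bool bigEndianHost)
{
    const std::uint32_t data1 = hostRead(b, 4, bigEndianHost);
    const std::uint32_t data2 = hostRead(b + 4, 2, bigEndianHost);
    const std::uint32_t data3 = hostRead(b + 6, 2, bigEndianHost);
    auto d4 = [b](int i) { return static_cast<unsigned>(b[8 + i]); };

    // Printing each field MSB-first as a number gives the same text as
    // the shift-and-mask sequence in the original code.
    char buffer[37];
    const int n = std::snprintf(buffer, sizeof(buffer),
        "%08lx-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x",
        static_cast<unsigned long>(data1),
        static_cast<unsigned>(data2), static_cast<unsigned>(data3),
        d4(0), d4(1), d4(2), d4(3), d4(4), d4(5), d4(6), d4(7));
    assert(n == 36);
    return buffer;
}
```

For the input x'11223344556677889900AABBCCDDEEFF', the simulated little-endian
host yields 44332211-6655-8877-9900-aabbccddeeff, matching the output shown
above; the simulated big-endian host yields
11223344-5566-7788-9900-aabbccddeeff.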
BTW, I didn't look at evlCharToUuid, but I guess similar things are happening
there, because on little-endian machines CHAR_TO_UUID works fine despite the
flaws in StringToGuid, and CHAR_TO_UUID2 doesn't.
So, all in all, our five UUID functions are all flawed on at least one type of
platform. But it's not hard to get them right, with code that is even simpler
than what we have now (and without the need for the '2' functions).
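As one possible reading of "simpler", and not necessarily the fix intended
here: if the 16 octets are already in network order, they can be written out
one byte at a time, with no GUID struct in between. The function name below is
made up for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Formats 16 network-order bytes as the 36-character UUID text.
std::string uuidToCharSketch(const std::uint8_t b[16])
{
    static const char hex[] = "0123456789ABCDEF";

    std::string out;
    for (int i = 0; i < 16; ++i)
    {
        // Dashes go before bytes 4, 6, 8 and 10 (the 8-4-4-4-12 grouping).
        if (i == 4 || i == 6 || i == 8 || i == 10)
            out += '-';
        out += hex[b[i] >> 4];
        out += hex[b[i] & 0x0F];
    }
    assert(out.size() == 36);
    return out;
}
```

Nothing here reads a multi-byte integer from memory, so host byte order never
enters the picture.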
Cheers,
Paul Vinkenoog
(why do I always do these things at night?)
Firebird-Devel mailing list, web interface at
https://lists.sourceforge.net/lists/listinfo/firebird-devel