On Fri, 22 May 2009 13:36:11 +0000, Bodvar Bjorgvinsson <bodvar at gmail.com> wrote:
>Regarding the "endianess", I had a problem some 13 years ago with some >UNIX software that was supposed to work on Linux. It did not. I sent a >query to an Icelandic guy on the "Basic Linux Training" list I >subscribed to and he came up with a solution. Then he expained to me >that there was a difference between Linux an UNIX that one used big >endian and the other little endian in the same code of software. In current computer systems, there are two kinds of "endianess", called "LSB (Least Significant Byte) first" and "MSB (Most Significant Byte) first". For any given system, what determines this is not the operating system (Linux, Windows, etc.), it's the processor (CPU). All Intel CPUs are LSB first; others, like Sun SPARC and Motorola 68K, are MSB first. So Linux on a Sun SPARC would be MSB first, but on an Intel box it would be LSB first. Technically, the difference is indeed *byte* order, not *bit* order (which is constant). Suppose you have a hex number 0xABCD. The most significant byte is 0xAB; the least significant byte is 0xCD. Now imagine that you store this number in memory at address 0. ;-) You will get: Location SPARC Intel 00000000 0xAB 0xCD 00000001 0xCD 0xAB Well-designed programs where portability matters will work with *either* CPU. They do this by not caring what the storage order in memory is, and always accessing multibyte numbers through a set of functions that work regardless of byte order. For example, Mif2Go was originally developed on a Sun SPARC system, then ported to Windows very easily because it followed those design rules. There's actually a third flavor, but it was used only on the DEC PDP-11. Since the last of those is probably in the Smithsonian, you won't see it in current software. It is the same as Intel for two-byte numbers (shorts) but switches the byte pairs for 4-byte numbers (longs). So the number 0x12345678 is 0x34, 0x12, 0x78, 0x56. Endianness also affects Unicode, in the UTF-16 and UTF-32 encodings of it, but *not* in UTF-8. It is the reason for the UTF-16 BOM (Byte Order Mark), U+FEFF, In UTF-16 Big-endian (MSB first), the bytes are 0xFE 0xFF. In UTF-16 Little-endian (LSB first), they are 0xFF 0xFE. UTF-32 adds two zero bytes, before it for Big and after for Little. The Unicode BOM may also be used as an encoding signature, but I digress... ;-) Good thing it's Friday, eh? HTH! -- Jeremy H. Griffith, at Omni Systems Inc. <jeremy at omsys.com> http://www.omsys.com/