Russel Winder wrote:
<minor-rant>

On Thu, 2011-02-17 at 10:13 +0100, Don wrote:
[ . . . ]
Me too. A word is two bytes. Any other definition seems to be pretty useless.

Sounds like people have been living with 8- and 16-bit processors for
too long.

A word is the natural length of an integer item in the processor.  It is
necessarily machine specific.  E.g. the DEC-10 had 9-bit bytes and a 36-bit
word, the IBM 370 had an 8-bit byte and a 32-bit word, though addresses were
24-bit.  ix86 follows IBM with an 8-bit byte and a 32-bit word.

Yes, I know. It's true but I think rather useless.
We need a name for an 8-bit quantity, and a 16-bit quantity, and higher powers of two. 'byte' is an established name for the first one, even though historically there were 9-bit bytes. IMHO 'word' wasn't such a bad name for the second one, even though its etymology comes from the machine word size of some specific early processors. But the equally arbitrary name 'short' has become widely accepted.

The really interesting question is whether on x86_64 the word is 32-bit
or 64-bit.

With the rising importance of the SIMD instruction set, you could even argue that it is 128 bits in many cases...


The whole concept of "machine word" seems very archaic and incorrect to me anyway. It assumes that the data registers and address registers are the same size, which is very often not true.

Machine words are far from archaic.  Even on the JVM, if you don't know
the length of the word on the machine you are executing on, how do you
know the set of values that can be represented?  In floating point
numbers, if you don't know the length of the word, how do you know the
accuracy of the computation?

Yes, but they're not necessarily the same number. There is a native size for every type of operation, but it's not universal across all operations.
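
To make the "not the same number" point concrete, here is a minimal C sketch that derives the integer range from one width and the floating-point spacing from another. The helper names (max_unsigned, max_signed, unit_spacing) are invented for this illustration, and it assumes two's-complement integers and a binary floating-point format.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value of an unsigned integer that is `bits` wide (1..64). */
uint64_t max_unsigned(unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    return bits == 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/* Largest value of a two's-complement signed integer that is
   `bits` wide (2..64): one bit is spent on the sign. */
int64_t max_signed(unsigned bits)
{
    assert(bits >= 2 && bits <= 64);
    return (int64_t)max_unsigned(bits - 1);
}

/* Gap between 1.0 and the next representable value for a binary
   floating-point format with `mant_bits` significant bits
   (24 for IEEE 754 single precision, 53 for double precision). */
double unit_spacing(unsigned mant_bits)
{
    double eps = 1.0;
    for (unsigned i = 1; i < mant_bits; ++i)
        eps /= 2.0;
    return eps;
}
```

The integer range depends only on the integer width (max_signed(36) covers the 36-bit DEC-10 word mentioned earlier), while the accuracy depends only on the number of significand bits, which is a different number again.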

I don't think there's a way you can define "machine word" in a way which is terribly useful. By the time you've got something unambiguous and well-defined, it doesn't have many interesting properties. It's valid in such limited cases that you'd be better off with a clearer name.

Clearly data registers and address registers can be different lengths;
it is not the job of a programming language that compiles to native code
to ignore this and attempt to homogenize things beyond what is
reasonable.

Agreed, and this is I think what makes the concept of "machine word" not very helpful.


If you are working in native code then word length is a crucial property
since it can change depending on which processor you compile for.

For example, on an 8-bit machine (eg, 6502 or Z80), the accumulator was only 8 bits, yet size_t was definitely 16 bits.

The 8051 was only surpassed a couple of years ago by ARMs as the most
numerous processor on the planet.  8-bit processors may only have had
8-bit ALUs -- leading to an hypothesis that the word was 8-bits -- but
the word length was effectively 16-bit due to the hardware support for
multi-byte integer operations.

The 6502 was restricted to 8 bits in almost every way. About half of the instructions that involved 16-bit quantities would wrap on page boundaries. jmp (0x7FF) would do an indirect jump, getting the low byte from address 0x7FF and the high byte from 0x700 !!
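
The page-wrapping indirect jump can be sketched in a few lines of C. This is only a model of the behaviour described above, not an emulator; the function names and the flat 64 KiB memory array are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address from which the high byte of the jump target is fetched
   for JMP (ptr). The low byte comes from ptr itself. Only the low
   8 bits of the pointer are incremented, so the page (the upper
   8 bits) never changes. */
uint16_t jmp_indirect_high_addr(uint16_t ptr)
{
    return (uint16_t)((ptr & 0xFF00u) | ((ptr + 1u) & 0x00FFu));
}

/* Resolve JMP (ptr) against a 64 KiB memory image. */
uint16_t jmp_indirect_target(const uint8_t *mem, uint16_t ptr)
{
    assert(mem != NULL);
    uint8_t lo = mem[ptr];
    uint8_t hi = mem[jmp_indirect_high_addr(ptr)];
    return (uint16_t)(lo | (hi << 8));
}
```

With the pointer at 0x07FF the high byte is read from 0x0700 rather than 0x0800; for any pointer that does not sit on the last byte of a page the two addresses are simply consecutive.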


It's quite plausible that at some time in the future we'll get a machine with 128-bit registers and data bus, but retaining the 64 bit address bus. So we could get a size_t which is smaller than the machine word.

In summary: size_t is not the machine word.

Agreed !
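
For anyone who wants to check this on their own toolchain, a small C sketch along the following lines reports the widths side by side. The struct and function names are made up for this example; the only thing the C standard itself promises is the minimum widths (16 bits for int and size_t, 32 for long).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Widths, in bits, of a few types on the platform this is compiled for. */
struct widths {
    unsigned int_bits;
    unsigned long_bits;
    unsigned size_t_bits;
    unsigned pointer_bits;
};

struct widths native_widths(void)
{
    struct widths w;
    w.int_bits     = (unsigned)(sizeof(int) * CHAR_BIT);
    w.long_bits    = (unsigned)(sizeof(long) * CHAR_BIT);
    w.size_t_bits  = (unsigned)(sizeof(size_t) * CHAR_BIT);
    w.pointer_bits = (unsigned)(sizeof(void *) * CHAR_BIT);
    return w;
}

/* Print the widths, one per line, to the given stream. */
void print_widths(FILE *out)
{
    assert(out != NULL);
    struct widths w = native_widths();
    fprintf(out, "int:    %u bits\n", w.int_bits);
    fprintf(out, "long:   %u bits\n", w.long_bits);
    fprintf(out, "size_t: %u bits\n", w.size_t_bits);
    fprintf(out, "void *: %u bits\n", w.pointer_bits);
}
```

Calling print_widths(stdout) on a typical LP64 x86_64 Linux build should show a 32-bit int next to a 64-bit size_t, so even there no single number can be called "the word".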

As long as the address bus is less wide than an integer, there are no
apparent problems using integers as addresses.  The problem comes when
addresses are wider than integers.  A good statically-typed programming
language should manage this by having integers and addresses as distinct
sets.  C and C++ have led people astray.  There should be an appropriate
set of integer types and an appropriate set of address types and using
one from the other without active conversion is always going to lead to
problems.

Indeed.
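
As a rough illustration of keeping addresses and integers apart even in C, one can wrap the address in a struct so that every crossing between the two sets is an explicit call. The address_t type and its helper functions are hypothetical, and the sketch assumes a flat address space in which uintptr_t arithmetic matches byte offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* An address is its own type: it cannot be added to, compared with,
   or assigned from a plain integer without going through a helper. */
typedef struct { uintptr_t raw; } address_t;

/* Explicit conversion from a pointer to an address. */
address_t address_from_pointer(const void *p)
{
    address_t a = { (uintptr_t)p };
    /* Sanity-check the pointer -> uintptr_t -> pointer round trip. */
    assert((const void *)a.raw == p);
    return a;
}

/* Explicit conversion from an address back to a pointer. */
void *address_to_pointer(address_t a)
{
    return (void *)a.raw;
}

/* Move an address by a signed number of bytes. */
address_t address_offset(address_t base, ptrdiff_t bytes)
{
    address_t a = { base.raw + (uintptr_t)bytes };
    return a;
}

/* Signed distance in bytes between two addresses. */
ptrdiff_t address_distance(address_t from, address_t to)
{
    return (ptrdiff_t)(to.raw - from.raw);
}
```

Because address_t is a struct, something like "address + 1" or "int i = address" is rejected by the compiler, which is the kind of active conversion argued for above.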


Do not be afraid of the word.  Fear leads to anger.  Anger leads to
hate.  Hate leads to suffering. (*)

</minor-rant>

(*) With apologies to Master Yoda (**) for any misquote.

(**) Or more likely whoever his script writer was.
