(Will whoever is prepending this "Re: *** GMX Spamverdacht ***" header
please STOP.)
On Monday, August 18, 2003, at 09:01 AM, Dave Howe wrote:
randomness is a funny thing. a truly random file can be *anything* that
is the right length - including an excerpt from William Shakespeare
(provided you have enough monkeys). it is most likely to be random
garbage, for a large sample - but for a very small sample the odds of it
being an ordered sample are surprisingly good.
Quibble: only surprising if one misunderstands probability. (Not saying
you do, just quibbling with any claim that readily calculated
probabilities can be "surprising.")
the obvious example here is a double coin test - two bits. an ordered
sample would be both heads or both tails; a disordered sample would be
one head and one tail. in practice, in a larger trial (say 32
double-throws) you would expect about half the results to be ordered,
and half disordered.
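The two-coin expectation above can be checked with a short sketch - first by exact enumeration, then with a seeded 32-throw trial (the seed and trial count are illustrative choices, not from the post):

```python
import itertools
import random

# Exact count: of the 4 equally likely two-coin outcomes,
# HH and TT are "ordered"; HT and TH are "disordered".
outcomes = list(itertools.product("HT", repeat=2))
ordered = [o for o in outcomes if o[0] == o[1]]
print(len(ordered), "of", len(outcomes))  # 2 of 4, i.e. probability 1/2

# A seeded trial of 32 double-throws: roughly half should come out ordered.
rng = random.Random(0)
trials = [(rng.choice("HT"), rng.choice("HT")) for _ in range(32)]
n_ordered = sum(1 for a, b in trials if a == b)
print(n_ordered, "ordered out of 32")
```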
as you reach three coins, the odds of a completely ordered result
decrease (from 2 in 2^2, to 2 in 2^3). for four coins, you still have
only the same two compressible results. consider all 2^4 outcomes:
HHHH HHHT HHTH HHTT
HTHH HTHT HTTH HTTT
THHH THHT THTH THTT
TTHH TTHT TTTH TTTT
in a large trial, you would expect to see each of these once every 2^4
tests, on average. obviously HHHH and TTTT are very compressible.
assuming a run-length encoding, I don't think any of the others are
compressible at all.

"We should not march into Baghdad. To occupy Iraq would
instantly shatter our coalition, turning the whole Arab
world against us and make a broken tyrant into a latter-
day Arab hero. Assigning young soldiers to a fruitless
hunt for a securely entrenched dictator and condemning
them to fight in what would be an unwinnable urban guerrilla
war, it could only plunge that part of the world into ever
greater instability."
--George H. W. Bush, "A World Transformed", 1998
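The run-length claim above can be checked directly. The encoding below is a naive illustrative scheme (each maximal run becomes the symbol followed by a count digit) - an assumption on my part, since the post doesn't pin down a particular encoding:

```python
from itertools import product

def rle(s):
    # Naive run-length encoding: each maximal run becomes
    # the symbol followed by the run length, e.g. "HHHH" -> "H4".
    out = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        out.append(s[i] + str(j - i))
        i = j
    return "".join(out)

# Which of the 16 four-coin sequences actually get shorter?
compressed = [s for s in ("".join(p) for p in product("HT", repeat=4))
              if len(rle(s)) < len(s)]
print(compressed)  # → ['HHHH', 'TTTT']
```

Under this scheme only the all-heads and all-tails sequences shrink; every other outcome encodes to the same length or longer (HTHT becomes "H1T1H1T1", twice its original size).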
For any sequence of n fair tosses, the "length after compression" of
the outcome is roughly, modulo some constants and factors, (1 - 2^(-n))
times the uncompressed length. In other words, as n gets large, nearly
all strings have a "length after compression" that is close to the
original length, i.e., little compression. As n gets arbitrarily large,
an arbitrarily small fraction of strings have a short, concise
description (are "compressed").
--Tim May