Re: *** GMX Spamverdacht *** Re: *** GMX Spamverdacht *** Re: paradoxes of randomness

2003-08-18 Thread Dave Howe
Sarad AV wrote:
 Will the information obtained from the 2^32 tests have
 a zero compression rate?
 If one of the occurrences should yield all heads and one
 occurrence yields all tails, there appears to be scope
 for compression.
In that one case, yes.
The compression will vary based on the deviation from an ideally
compressible sample - for *any* 32-bit pattern from 0x00000000 to
0xFFFFFFFF you would expect to see it once in 2^32 trials (by
definition), therefore you will get a mix of compressible and
incompressible patterns, with incompressible patterns being more likely
(simply because there are more of them in the full range of available
patterns 0x00000000 to 0xFFFFFFFF).
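
As a rough illustration - a minimal Python sketch, assuming a naive
run-length encoding that spends 6 bits per run (the encoding is my
assumption, not something from the thread) - one can sample random
32-bit words and count how many actually beat their raw length:

    import random

    def rle_length_bits(bits):
        # Naive run-length cost: 1 bit for the run's symbol plus 5 bits
        # for its length (runs in a 32-bit word never exceed 32).
        runs = 1
        for a, b in zip(bits, bits[1:]):
            if a != b:
                runs += 1
        return runs * 6

    random.seed(1)
    trials = 100_000
    compressible = sum(
        rle_length_bits(format(random.getrandbits(32), "032b")) < 32
        for _ in range(trials)
    )
    print(compressible, "of", trials, "random 32-bit words beat their raw length")

Only words with five or fewer runs come out shorter here, and those are
a vanishing fraction of the 2^32 possibilities, so nearly every sampled
word is incompressible under this scheme.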

 If the output is random, then it will have no
 mathematical structure, so I shouldn't be able to
 compress it at all.
Randomness is a funny thing. A truly random file can be *anything* that
is the right length - including an excerpt from William Shakespeare
(provided you have enough monkeys).
It is most likely to be random garbage, for a large sample - but for a
very small sample the odds of it being an ordered sample are
surprisingly good.
The obvious example here is a double coin test - two bits. An ordered
sample would be both heads or both tails; a disordered sample would be
one head and one tail. In practice, you would expect half the results
from a larger trial (say 32 double-throws) to be ordered samples, and
half disordered.
As you reach three coins, the odds of a completely ordered result
decrease (from 2 in 2^2 to 2 in 2^3). For four coins, you still have
the same two compressible results. Consider:
HHHH HHHT HHTH HHTT
HTHH HTHT HTTH HTTT
THHH THHT THTH THTT
TTHH TTHT TTTH TTTT

In a large trial, you would expect to see each of these once every 2^4
tests, on average. Obviously HHHH and TTTT are very compressible.
Assuming a run-length encoding, I don't think any of the others are
compressible at all.
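
To make the four-coin case concrete, here is a minimal Python sketch
(the toy encoder, writing one symbol plus one digit per run, is my own
assumption) that enumerates all 2^4 outcomes:

    from itertools import product

    def rle(s):
        # Encode each run as <symbol><count>:
        # "HHHH" -> "H4", "HTHT" -> "H1T1H1T1".
        out, run = [], 1
        for a, b in zip(s, s[1:] + " "):
            if a == b:
                run += 1
            else:
                out.append(a + str(run))
                run = 1
        return "".join(out)

    for coins in product("HT", repeat=4):
        s = "".join(coins)
        enc = rle(s)
        print(s, "->", enc, "(shorter)" if len(enc) < len(s) else "")

Only HHHH and TTTT come out strictly shorter (H4 and T4); patterns like
HHTT encode to H2T2, the same length, and alternating patterns get
longer - consistent with the claim above.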



Re: *** GMX Spamverdacht *** Re: *** GMX Spamverdacht *** Re: *** GMX Spamverdacht *** Re: paradoxes of randomness

2003-08-18 Thread Tim May
(Will whoever is prepending this Re: *** GMX Spamverdacht *** header
please STOP.)

On Monday, August 18, 2003, at 09:01 AM, Dave Howe wrote:
Randomness is a funny thing. A truly random file can be *anything* that
is the right length - including an excerpt from William Shakespeare
(provided you have enough monkeys).
It is most likely to be random garbage, for a large sample - but for a
very small sample the odds of it being an ordered sample are
surprisingly good.
Quibble: only surprising if one misunderstands probability. (Not saying 
you do, just quibbling with any claim that readily calculated 
probabilities can be surprising.)



The obvious example here is a double coin test - two bits. An ordered
sample would be both heads or both tails; a disordered sample would be
one head and one tail. In practice, you would expect half the results
from a larger trial (say 32 double-throws) to be ordered samples, and
half disordered.
As you reach three coins, the odds of a completely ordered result
decrease (from 2 in 2^2 to 2 in 2^3). For four coins, you still have
the same two compressible results. Consider:
HHHH HHHT HHTH HHTT
HTHH HTHT HTTH HTTT
THHH THHT THTH THTT
TTHH TTHT TTTH TTTT

In a large trial, you would expect to see each of these once every 2^4
tests, on average. Obviously HHHH and TTTT are very compressible.
Assuming a run-length encoding, I don't think any of the others are
compressible at all.
For any sequence of n fair tosses, the length after compression of the
outcome is roughly, modulo some constants and factors, (1 - 2^(-n))
times the uncompressed length. In other words, as n gets large, nearly
all strings have a length after compression that is close to the
original length, i.e., little compression. As n gets arbitrarily large,
an arbitrarily small fraction of strings have a short, concise
description (are compressible).
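
The counting argument behind this can be sketched in a few lines of
Python (the particular n and k values are just for illustration):

    # There are fewer than 2^(n-k) descriptions shorter than n-k bits
    # (2^0 + 2^1 + ... + 2^(n-k-1) of them), so at most a 2^-k fraction
    # of all n-bit strings can be compressed by k or more bits.
    n = 32
    for k in (1, 4, 8, 16):
        bound = 2.0 ** -k
        print(f"at most {bound:.6f} of {n}-bit strings shrink by >= {k} bits")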

--Tim May