Re: Random Data Compressed 100:1 (Guffaw)

2002-01-16 Thread Eric Cordian

Seems our intrepid reporter has given Mr. Saint George, the ZeoSync CEO,
his own writeup in Wired News.

http://www.wired.com/news/technology/0,1282,49599,00.html

Particularly interesting is the following claim...

  What hasn't been previously proven, we're proving. We can compress
   every single permutation of an N-member set. These are going to be the
   details that we're going to be announcing in a few days.

Last I counted, there were N! permutations of an N-member set, and N^N
sequences of N elements taken from an N-member set.  

For big N, and even moderately sized N, N! << N^N.

I hope Mr. Saint George hasn't spent the last 12 years of his life
discovering that permutations are compressible.
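
For the curious, here is a rough Python sketch of why that result would be
unremarkable (N=8 is an arbitrary toy size): any permutation of N items can
be ranked into about log2(N!) bits via its Lehmer code, which is strictly
less than the N*log2(N) bits an arbitrary length-N sequence over N symbols
needs.

import math
from itertools import permutations

def lehmer_index(perm):
    # Rank a permutation of 0..n-1 as a single integer in [0, n!).
    n = len(perm)
    index = 0
    for i in range(n):
        smaller = sum(1 for j in range(i + 1, n) if perm[j] < perm[i])
        index += smaller * math.factorial(n - 1 - i)
    return index

N = 8
print(math.log2(math.factorial(N)))  # ~15.3 bits for any permutation of 8 items
print(N * math.log2(N))              # 24 bits for an arbitrary length-8 sequence

# All N! permutations get distinct ranks below N!, so each fits in
# ceil(log2(N!)) bits -- which says nothing about the N**N possible sequences.
assert sorted(lehmer_index(p) for p in permutations(range(N))) == \
       list(range(math.factorial(N)))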

-- 
Eric Michael Cordian 0+
O:.T:.O:. Mathematical Munitions Division
Do What Thou Wilt Shall Be The Whole Of The Law




Re: Random Data Compressed 100:1 (Guffaw)

2002-01-09 Thread Eugene Leitl

On Tue, 8 Jan 2002, Steve Schear wrote:

 combinations/permutations and auto correlations to code for the runs.  I
 say attempted, because I was never able to find acceptable algorithms to
 satisfy my requirement.  I still believe these algorithms exist, it was
 just my limitations in identifying the underlying math needed.

http://www.google.com/search?q=IFS+image+compression&sourceid=opera&num=100&ie=utf-8&oe=utf-8




Re: Random Data Compressed 100:1 (Guffaw)

2002-01-09 Thread Ken Brown

Michael Motyka wrote:


 Here we go :
 
 As Eric correctly points out, true random input simply cannot be
 compressed, it doesn't matter how clever you are or how much time and
 computing power you have access to.
 
 This is a statement of belief isn't it? Odd.

No, it's a logical consequence of his definition of random - no more a
statement of belief than 1+1=2 is. If you use the word to mean
something else then it might or might not be true in your universe.

Ken




Re: CDR: Re: Random Data Compressed 100:1 (Guffaw)

2002-01-09 Thread F. Marc de Piolenc

I think they may be referring to a random string of _ASCII characters_.
That would be subject to compression because it is not random at the bit
level. 

But 100:1? I have no idea how to achieve that.

Marc de Piolenc

Declan McCullagh wrote:

  What exactly is random data? Does it have to appear to be random? Does
  it have to pass some set of statistical tests to be random? If a string
 
 I'm naturally skeptical of this claim (until I can verify it for
 myself), but I do not believe the claim is we can encode random data
 at 100:1.  They seem to be talking about 100:1 lossless compression
 of real-world data, which is generally not random.
 
 -DEclan

-- 
Remember September 11, 2001 but don't forget July 4, 1776

Rather than make war on the American people and their
liberties, ...Congress should be looking for ways
to empower them to protect themselves when
warranted.

They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety. - Benjamin Franklin





Re: Random Data Compressed 100:1 (Guffaw)

2002-01-09 Thread Ken Brown

Eric Cordian wrote:
 
 Declan opines:
 
  I'm naturally skeptical of this claim (until I can verify it for
  myself), but I do not believe the claim is we can encode random data
  at 100:1.
 
 From the article:
 
 ZeoSync said its scientific team had succeeded on a small scale in
  compressing random information sequences in such a way as to allow the
  same data to be compressed more than 100 times over -- with no data
  loss.
 
 Now of course it's possible they were horribly misquoted.  Still, it is
 worrisome that so many people quoted in the article think such algorithmic
 gymnastics are mathematically possible.

The overpowering stench of snake oil pervades the ether

I particularly liked:

The techniques described by ZeoSync would mark a break with the dozens
of existing compression technologies, including MPEG for video and music
and JPEG for pictures and graphics are able to compact data at
compression rates up to 10 times the original. These algorithms
typically work by eliminating long strings of identifiable bits of data
such as blue sky, green grass or the white background of a snow-covered
landscape.

Which sounds like someone who doesn't know what they are talking about
being misreported by someone who doesn't understand (*)

ZeoSync said its scientific team had succeeded on a small scale


The company's claims, which are yet to be demonstrated in any public
forum

ZeoSync, whose Web site can be located at http://www.zeosync.com/; 


Among the scientific team working with ZeoSync is Steve Smale, one of
America's most renowned mathematicians. Smale is an emeritus professor
at the University of California at Berkeley and the 1966 winner of the
Fields Prize, the Nobel Prize for researchers in this field. He could
not be reached for comment on his role in the project.

I bet. 



Ken Brown 

(*) clue for those who have too much of a life to bother with things
like file structures and encoding (though why would they be reading
cypherpunks?) - JPEG is a lossy compression method that tries to recode
real-world pictures in a way that *looks* almost as good as the real
thing but takes up less space, by smoothing out gradual changes of
colour (and other stuff as well). It doesn't typically work by
eliminating long strings of identifiable bits of data. And it doesn't
compress up to 10 times; it compresses as much as you like, with a
trade-off between file size and image quality. MPEG, AFAIK, is similar.
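
A rough sketch of that trade-off, assuming the Pillow library is available
(the synthetic gradient image and the quality settings are arbitrary):

import os
from PIL import Image  # Pillow, assumed installed

img = Image.linear_gradient("L").convert("RGB")  # smooth 256x256 ramp
for quality in (95, 75, 30, 5):
    name = f"ramp_q{quality}.jpg"
    img.save(name, "JPEG", quality=quality)
    print(quality, os.path.getsize(name), "bytes")  # size falls with quality

Same picture every time, at whatever ratio you ask for; what changes is how
blocky and smeared it looks when you decode it.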




Re: CDR: Re: Random Data Compressed 100:1 (Guffaw)

2002-01-09 Thread measl


On Wed, 9 Jan 2002, Ken Brown wrote:

 Michael Motyka wrote:
 
 
  Here we go :
  
  As Eric correctly points out, true random input simply cannot be
  compressed, it doesn't matter how clever you are or how much time and
  computing power you have access to.
  
  This is a statement of belief isn't it? Odd.
 
 No, it's a logical consequence of his definition of random - no more a
 statement of belief than 1+1=2 is. If you use the word to mean
 something else then it might or might not be true in your universe.

So, it depends on what the definition of is is?  ;-)

-- 
Yours, 
J.A. Terranson
[EMAIL PROTECTED]

If Governments really want us to behave like civilized human beings, they
should give serious consideration towards setting a better example:
Ruling by force, rather than consensus; the unrestrained application of
unjust laws (which the victim-populations were never allowed input on in
the first place); the State policy of justice only for the rich and 
elected; the intentional abuse and occasional destruction of entire
populations merely to distract an already apathetic and numb electorate...
This type of demagoguery must surely wipe out the fascist United States
as surely as it wiped out the fascist Union of Soviet Socialist Republics.

The views expressed here are mine, and NOT those of my employers,
associates, or others.  Besides, if it *were* the opinion of all of
those people, I doubt there would be a problem to bitch about in the
first place...






Re: Random Data Compressed 100:1 (Guffaw)

2002-01-09 Thread Steve Schear

At 12:06 PM 1/9/2002 +0100, Eugene Leitl wrote:
On Tue, 8 Jan 2002, Steve Schear wrote:

  combinations/permutations and auto correlations to code for the runs.  I
  say attempted, because I was never able to find acceptable algorithms to
  satisfy my requirement.  I still believe these algorithms exist, it was
  just my limitations in identifying the underlying math needed.

http://www.google.com/search?q=IFS+image+compression&sourceid=opera&num=100&ie=utf-8&oe=utf-8

I knew the guys behind Fractal compression, but that's not what I was 
trying to do.  They looked for a higher-level order; I was just looking to 
take advantage of local correlations and the ability to identify identical 
runs in the output of deterministic generators whose references could be 
expressed very efficiently (e.g., correlation coefficient/seed value, 
displacement) compared with the data being compressed.  It's a highly 
asymmetric approach, so real-time applications were not considered.

IFS eventually became practical, but it came from a time when proprietary 
solutions were king and missed the open-standards revolution that powered 
JPEG (which, while generally as good at compression, was open to 
improvement, well studied and not chained by patents).

steve




Re: Random Data Compressed 100:1 (Guffaw)

2002-01-09 Thread Michael Motyka

Ken Brown [EMAIL PROTECTED] wrote :

Michael Motyka wrote:
 Here we go :
 
 As Eric correctly points out, true random input simply cannot be
 compressed, it doesn't matter how clever you are or how much time and
 computing power you have access to.
 
 This is a statement of belief isn't it? Odd.

No, it's a logical consequence of his definition of random - no more a
statement of belief than 1+1=2 is. If you use the word to mean
something else then it might or might not be true in your universe.

Ken

Perhaps my universe is off in all six degrees but answer this - which is
it : 

random data cannot be compressed 

or

random data may include recognizable and even repeating patterns 

A string of random data may include anything. Compressibility will vary
with the particular data and the algorithm used. Lucky with one input
set, unlucky with another. I wouldn't try to base a business on getting
lucky with the customer's input data but the absolute statement of
incompressibility is wrong unless you preselect your data sets. Then
they're no longer random and the definition of random becomes circular.

Mike




Re: Random Data Compressed 100:1 (Guffaw)

2002-01-09 Thread Tim May

On Tuesday, January 8, 2002, at 07:12 PM, F. Marc de Piolenc wrote:

 I think they may be referring to a random string of _ASCII characters_.
 That would be subject to compression because it is not random at the bit
 level.

 But 100:1? I have no idea how to achieve that.

The entropy of ordinary written English is about 2.2 bits per character. 
(Meaning, of course, that each new character contributes not much 
surprise, i.e., is somewhat predictable. Entropies of various sources  
can be experimentally measured.)
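
A rough Python sketch of one such measurement; sample.txt is a placeholder
for any large chunk of ordinary English, and the order-k model used here is
the simplest possible one:

import math
from collections import Counter, defaultdict

def conditional_entropy(text, k):
    # Estimate H(next character | previous k characters) in bits/character.
    contexts = defaultdict(Counter)
    for i in range(k, len(text)):
        contexts[text[i - k:i]][text[i]] += 1
    total = len(text) - k
    h = 0.0
    for counts in contexts.values():
        n_ctx = sum(counts.values())
        for n in counts.values():
            h -= (n / total) * math.log2(n / n_ctx)
    return h

text = open("sample.txt").read().lower()
for k in range(4):
    print(k, round(conditional_entropy(text, k), 2))

Raw letter frequencies (k=0) give roughly 4 bits per character; the estimate
drops as the context grows, heading toward the 1-2.5 bit range usually
quoted for English.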

Others, particularly Eric Cordian, have already written at length on 
this claim of 100:1 compression of any file.

Steve Smale at Berkeley is a famous and respected mathematician. He may 
be unaware of these claims, he may be senile, he may have worked on some 
small (no pun intended) aspect as a paid consultant, or the actual 
compression may be lossy and oriented toward replacing JPEG with a 
better lossy algorithm (as Iterated Systems, IIRC, has been doing with 
fractal compression).

The pigeonhole principle is not just a theory, it's the _law_. You 
can't stuff 64 distinct things into 32 boxes without a lot of them being 
put into the same box. Which means reversing (decompressing) to the 
original won't work.
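
The counting version of that argument fits in a few lines of Python (the
range of n is arbitrary):

# There are 2**n bit strings of length n, but only 2**n - 1 strings of any
# shorter length (counting the empty string), so no lossless scheme can map
# every n-bit input to a strictly shorter output.
for n in range(1, 11):
    inputs = 2 ** n
    shorter_outputs = sum(2 ** k for k in range(n))  # lengths 0 .. n-1
    assert shorter_outputs == inputs - 1
    print(f"n={n}: {inputs} inputs, only {shorter_outputs} shorter outputs")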

The Chaitin-Kolmogorov view of algorithmic information theory is the 
best model we have.

By the way, for all of you who struggled to understand Godel's Theorem 
expressed in the logic formulation that Godel first used--and that was 
the basis of Newman's fairly popular exposition--there are very 
understandable formulations of Godel's Theorem in terms of information 
theory and compressibility. Chaitin has a very clean one (no Kleene pun 
intended). Rudy Rucker does a good job of explaining it in Mind Tools. 
Chaitin also has some good books and articles, though some of them jump 
too quickly into stuff like computing Omega...his Scientific 
American-level articles on the foundations of mathematics are better 
places to start...included in some of his books.


--Tim May
A democracy cannot exist as a permanent form of government. It can only 
exist until the voters discover that they can vote themselves money from 
the Public Treasury. From that moment on, the majority always votes for 
the candidate promising the most benefits from the Public Treasury with 
the result that a democracy always collapses over loose fiscal policy 
always followed by dictatorship. --Alexander Fraser Tyler




Random Data Compressed 100:1 (Guffaw)

2002-01-08 Thread Eric Cordian

Someone else needs to read the comp.compression FAQ.

http://www.reuters.com/news_article.jhtml?type=technologynews&StoryID=498720

-

NEW YORK (Reuters) - A Florida research start-up working with a team of
renowned mathematicians said on Monday it had achieved a breakthrough that
overcomes the previously known limits of compression used to store and
transmit data.
   
If proven and successfully commercialized, the discovery asserted by
ZeoSync Corp of West Palm Beach, Florida could overturn half a century of
thinking in the field of lossless data compression and undermine business
assumptions on which the telecommunications and other digital industries
are based.
   
Lossless compression is the process by which computer data can be
compacted, stored and then restored with complete fidelity, or zero loss
of data. Existing compression techniques typically shed redundant data in
order to conserve space.
   
ZeoSync said its scientific team had succeeded on a small scale in
compressing random information sequences in such a way as to allow the
same data to be compressed more than 100 times over -- with no data loss.
That would be at least an order of magnitude beyond current known
algorithms for compacting data.

[Someone needs to learn the pigeonhole principle.  You can not give
 each member of a set of 2^(N*100) things its own unique identifier
 taken from a set of 2^N things.]
   
The company's claims, which are yet to be demonstrated in any public
forum, could vastly boost the ability of computer disks to store text,
music and video -- if ZeoSync's formulae succeed in scaling up to handle
massive amounts of data.
   
The same compression technique might one day make high-speed Internet
access cheaper and widely available across the globe, posing a threat to
huge investments in telecommunications network capacity, an industry
analyst said.
   
"Either this research is the next 'Cold Fusion' scam that dies away or
it's the foundation for a Nobel Prize. I don't have an answer to which one
it is yet," said David Hill, a data storage analyst with Boston-based
Aberdeen Group.

[David needs to read the comp.compression FAQ too.]
   
In 1989, a group of Utah researchers scandalized the scientific world with
claims -- quickly found to be unsupported -- that the long-sought answer
to the problem of Cold Fusion had been discovered.
   
Hill, the only independent analyst briefed ahead of the announcement, said
ZeoSync's claims were theoretically feasible, but years away from
definitive proof. He will require more information before evaluating
ZeoSync's claims, he said.

[You know, if you run the random data through the algorithm a second
 time, I bet you would get 10,000:1 compression.]
   
ZeoSync, whose Web site can be located at http://www.zeosync.com/, was
founded by Peter St. George, an Internet and financial services
entrepreneur, who has a background in telecommunications research.
   
ZeoSync, with 30 full-time employees, said it had collaborated with
experts from Harvard University, Massachusetts Institute of Technology,
Stanford University and researchers in Warsaw, Moscow, and Beijing.
   
BREAK WITH CURRENT TECHNOLOGIES
   
Among the scientific team working with ZeoSync is Steve Smale, one of
America's most renowned mathematicians. Smale is an emeritus professor at
the University of California at Berkeley and the 1966 winner of the Fields
Prize, the Nobel Prize for researchers in this field. He could not be
reached for comment on his role in the project.
   
Peter St. George, ZeoSync founder and chief executive, said his company's
technique challenged the foundations of digital communications theory
spelled out by AT&T's Dr. Claude Shannon in his classic 1948 treatise on
Information Theory.
   
Shannon, who died in 2001, had proposed that any type of information, from
human speech to computer keyboards to storage disks is limited by the
capacity of the data channel over which it flows. This is the basis of
digital communications theory.
   
"What we've developed is a new plateau in communications theory," St.
George said. "We are expecting to produce the enormous capacity of analog
signaling, with the benefit of the noise-free integrity of digital
communications."
   
The techniques described by ZeoSync would mark a break with the dozens of
existing compression technologies, including MPEG for video and music and
JPEG for pictures and graphics are able to compact data at compression
rates up to 10 times the original. These algorithms typically work by
eliminating long strings of identifiable bits of data such as blue sky,
green grass or the white background of a snow-covered landscape.
   
In a statement, ZeoSync said its techniques, once fully developed, will
offer compression ratios that are anticipated to approach the
hundreds-to-one range.
   
Using mathematical terminology, the company said its technique
intentionally randomizes naturally occurring patterns to form
entropy-like random sequences.
 

Re: Random Data Compressed 100:1 (Guffaw)

2002-01-08 Thread Michael Motyka

Eric Cordian [EMAIL PROTECTED] wrote :
Someone else needs to read the comp.compression FAQ.

http://www.reuters.com/news_article.jhtml?type=technologynews&StoryID=498720

-

NEW YORK (Reuters) - A Florida research start-up working with a team of
renowned mathematicians said on Monday it had achieved a breakthrough that
overcomes the previously known limits of compression used to store and
transmit data.
...
ZeoSync said its scientific team had succeeded on a small scale in
compressing random information sequences in such a way as to allow the
same data to be compressed more than 100 times over -- with no data loss.
That would be at least an order of magnitude beyond current known
algorithms for compacting data.
...
-- 
Eric Michael Cordian 0+

There may be a way to derive & patch together pseudorandom sequence
generators, who knows. If there is anything real about this I wonder how
long it takes to compress large blocks of arbitrary input data?
Geological time scales anyone?
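
A rough sketch of where those timescales come from, with Python's stock
PRNG standing in for whatever family of generators you might patch together
(the 2-byte target is arbitrary): describing a block of input as a seed you
have to rediscover means searching the seed space, and the expected work
multiplies by 256 for every extra byte the generator's output has to match.

import random

def find_seed(target, max_seed):
    # Brute-force search for a seed whose first len(target) output bytes
    # match the target block.
    n = len(target)
    for seed in range(max_seed):
        rng = random.Random(seed)
        if bytes(rng.getrandbits(8) for _ in range(n)) == target:
            return seed
    return None

print(find_seed(b"OK", 1 << 20))  # a 2-byte run: ~2**16 trials on average
# Each additional byte multiplies the expected search by 256; a 16-byte run
# is already around 2**128 trials.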

Mike




Re: Random Data Compressed 100:1 (Guffaw)

2002-01-08 Thread georgemw

On 8 Jan 2002, at 9:51, Michael Motyka wrote:

 Eric Cordian [EMAIL PROTECTED] wrote :
 Someone else needs to read the comp.compression FAQ.
 
 http://www.reuters.com/news_article.jhtml?type=technologynews&StoryID=498720
 
 -
 
 NEW YORK (Reuters) - A Florida research start-up working with a team of
 renowned mathematicians said on Monday it had achieved a breakthrough that
 overcomes the previously known limits of compression used to store and
 transmit data.
 ...
 ZeoSync said its scientific team had succeeded on a small scale in
 compressing random information sequences in such a way as to allow the
 same data to be compressed more than 100 times over -- with no data loss.
 That would be at least an order of magnitude beyond current known
 algorithms for compacting data.
 ...
 -- 
 Eric Michael Cordian 0+
 
 There may be a way to derive & patch together pseudorandom sequence
 generators, who knows. If there is anything real about this I wonder how
 long it takes to compress large blocks of arbitrary input data?
 Geological time scales anyone?
 
 Mike
 

A meaningless question.  As Eric correctly points out, true
random input simply cannot be compressed, it doesn't matter
how clever you are or how much time and computing power you
have access to.  Conversely, if you're just interested in 
compressing the results of your own pseudo-random number
generator, it's trivial to do it instantly: just store the seed
and the desired file size.
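
A minimal sketch of that last point, with Python's built-in Mersenne Twister
standing in for your own pseudo-random number generator (the seed and size
are arbitrary):

import random

def expand(seed, nbytes):
    # Regenerate the "file" from the stored description.
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(nbytes))

seed, nbytes = 20020108, 1_000_000
original = expand(seed, nbytes)          # a megabyte of pseudo-random bytes
compressed = (seed, nbytes)              # the whole "archive": a few bytes
assert expand(*compressed) == original   # lossless, and effectively instant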

George
 
 





Re: Random Data Compressed 100:1 (Guffaw)

2002-01-08 Thread Eric Cordian

Declan opines:

 I'm naturally skeptical of this claim (until I can verify it for
 myself), but I do not believe the claim is we can encode random data
 at 100:1. 

From the article:

ZeoSync said its scientific team had succeeded on a small scale in
 compressing random information sequences in such a way as to allow the
 same data to be compressed more than 100 times over -- with no data
 loss.

Now of course it's possible they were horribly misquoted.  Still, it is
worrisome that so many people quoted in the article think such algorithmic
gymnastics are mathematically possible.

-- 
Eric Michael Cordian 0+
O:.T:.O:. Mathematical Munitions Division
Do What Thou Wilt Shall Be The Whole Of The Law




Re: Random Data Compressed 100:1 (Guffaw)

2002-01-08 Thread Steve Schear

At 02:27 PM 1/8/2002 -0700, Michael Motyka wrote:
[EMAIL PROTECTED] wrote :
 
 On 8 Jan 2002, at 9:51, Michael Motyka wrote:
 
  Eric Cordian [EMAIL PROTECTED] wrote :
  Someone else needs to read the comp.compression FAQ.



What exactly is random data? Does it have to appear to be random? Does
it have to pass some set of statistical tests to be random? If a string
of bits from a radiation source spells Merry Christmas and a Happy New
Year in ASCII, is it non-random - a message from above? If a string of
bits from a natural source (decay etc.) matches a string of bits from
some PRNG, is it somehow disqualified as truly random data?

I think the classification random does not rule out the presence of
local patterns, however obscure. If a substring of data happens to match
what is generated by some PRNG, then that substring can be compressed
to {generator, seed, count}. The geological timescale might be involved
in matching generator outputs to input data sections or in deriving
generators for subsections of data.

I came up with a similar approach in the late '80s (I may have even 
discussed it on the list a year or so back).  I was interested in 
compressing image data, which has quite a bit of inter-pixel 
correlation.  My approach was to run a Hilbert space-filling curve through 
the image (it looks like a tightly wound maze).  This allows one to 
maintain most of the pixel correlation while transforming from an array to 
a line.  Then I analyzed the auto correlation of the runs of various lengths 
and attempted to create generators which could produce the required 
combinations/permutations and auto correlations to code for the runs.  I 
say attempted, because I was never able to find acceptable algorithms to 
satisfy my requirement.  I still believe these algorithms exist, it was 
just my limitations in identifying the underlying math needed.
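
For anyone curious, here is a rough sketch of just the linearization step --
the standard Hilbert-curve index mapping, with a toy 8x8 grid; this is only
the scan order, not the run-coding part I never cracked:

def hilbert_index(side, x, y):
    # Map (x, y) in a side-by-side grid (side a power of two) to its
    # position along the Hilbert curve, so that pixels which are close in
    # 2-D tend to stay close in the 1-D ordering.
    d = 0
    s = side // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                    # rotate/flip the quadrant
            if rx == 1:
                x, y = side - 1 - x, side - 1 - y
            x, y = y, x
        s //= 2
    return d

side = 8
scan = sorted(((x, y) for y in range(side) for x in range(side)),
              key=lambda p: hilbert_index(side, *p))
# Walking the pixels in `scan` order moves to an adjacent pixel at every
# step, which is what preserves the inter-pixel correlation in the 1-D
# stream.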

steve