Re: Random Data Compressed 100:1 (Guffaw)
Seems our intrepid reporter has given Mr. Saint George, the ZeoSync CEO, his own writeup in Wired News.

http://www.wired.com/news/technology/0,1282,49599,00.html

Particularly interesting is the following claim...

"What hasn't been previously proven, we're proving. We can compress every single permutation of an N-member set. These are going to be the details that we're going to be announcing in a few days."

Last I counted, there were N! permutations of an N-member set, and N^N sequences of N elements taken from an N-member set. For large N, and even for moderately sized N, N! << N^N. I hope Mr. Saint George hasn't spent the last 12 years of his life discovering that permutations are compressible.

--
Eric Michael Cordian 0+
O:.T:.O:. Mathematical Munitions Division
"Do What Thou Wilt Shall Be The Whole Of The Law"
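Eric's counting point is easy to make concrete. A permutation of an N-member set can always be encoded as a single integer below N!, so it needs only about log2(N!) bits instead of the naive N*log2(N); for N = 64 that is roughly 296 bits versus 384. Here is a minimal sketch of the standard Lehmer-code ranking -- ordinary textbook material in Python, emphatically not ZeoSync's method:

    from math import factorial

    def rank_permutation(perm):
        """Map a permutation of 0..n-1 to its lexicographic index in [0, n!)."""
        items = list(range(len(perm)))
        rank = 0
        for p in perm:
            i = items.index(p)
            rank += i * factorial(len(items) - 1)
            items.pop(i)
        return rank

    def unrank_permutation(rank, n):
        """Inverse of rank_permutation: recover the permutation from its index."""
        items = list(range(n))
        perm = []
        for k in range(n - 1, 0, -1):
            i, rank = divmod(rank, factorial(k))
            perm.append(items.pop(i))
        perm.append(items.pop())
        return perm

    assert unrank_permutation(rank_permutation([2, 0, 1]), 3) == [2, 0, 1]

So permutations are indeed compressible relative to writing each element out in full -- a savings of a few dozen percent, not two orders of magnitude, and it says nothing about arbitrary sequences, of which there are N^N.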
Re: Random Data Compressed 100:1 (Guffaw)
On Tuesday, January 8, 2002, at 07:12 PM, F. Marc de Piolenc wrote:

> I think they may be referring to a random string of _ASCII characters_.
> That would be subject to compression because it is not random at the bit
> level.
>
> But 100:1? I have no idea how to achieve that.

The entropy of ordinary written English is about 2.2 bits per character. (Meaning, of course, that each new character contributes not much "surprise," i.e., is somewhat predictable. Entropies of various sources can be experimentally measured.)

Others, particularly Eric Cordian, have already written at length on this claim of "100:1 compression of any file."

Steve Smale at Berkeley is a famous and respected mathematician. He may be unaware of these claims, he may be senile, he may have worked on some small (no pun intended) aspect as a paid consultant, or the actual compression may be lossy and oriented toward replacing JPEG with a better lossy algorithm (as Iterated Systems, IIRC, has been doing with fractal compression).

The pigeonhole principle is not just a "theory," it's the _law_. You can't stuff 64 distinct things into 32 boxes without a lot of them being put into the same box. Which means reversing (decompressing) to the original won't work. The Chaitin-Kolmogorov view of algorithmic information theory is the best model we have.

By the way, for all of you who struggled to understand Godel's Theorem expressed in the "logic" formulation that Godel first used--and that was the basis of Newman's fairly popular exposition--there are very understandable formulations of Godel's Theorem in terms of information theory and compressibility. Chaitin has a very clean one (no Kleene pun intended). Rudy Rucker does a good job of explaining it in "Mind Tools." Chaitin also has some good books and articles, though some of them jump too quickly into stuff like computing Omega...his "Scientific American"-level articles on the foundations of mathematics are better places to start...included in some of his books.

--Tim May

"A democracy cannot exist as a permanent form of government. It can only exist until the voters discover that they can vote themselves money from the Public Treasury. From that moment on, the majority always votes for the candidate promising the most benefits from the Public Treasury with the result that a democracy always collapses over loose fiscal policy always followed by dictatorship." --Alexander Fraser Tyler
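Tim's remark that entropies can be experimentally measured is easy to try at home. The crudest version ignores context entirely and uses only single-character frequencies -- a sketch in Python; it lands near 4 bits/char for English prose, and it is conditioning on preceding characters that pushes the estimate down toward the figure Tim cites:

    from collections import Counter
    from math import log2

    def order0_entropy(text):
        """Shannon entropy in bits per character, ignoring inter-character context."""
        counts = Counter(text)
        n = len(text)
        return -sum((c / n) * log2(c / n) for c in counts.values())

    # sample.txt is a hypothetical name: any plain-text file will do.
    # English prose typically scores ~4.1 bits/char on this order-0 estimate;
    # uniformly random bytes score 8, the maximum for an 8-bit alphabet.
    print(order0_entropy(open("sample.txt").read()))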
Re: Random Data Compressed 100:1 (Guffaw)
Ken Brown <[EMAIL PROTECTED]> wrote :

> Michael Motyka wrote:
>> Here we go :
>>
>> "As Eric correctly points out, true random input simply cannot be
>> compressed, it doesn't matter how clever you are or how much time and
>> computing power you have access to."
>>
>> This is a statement of belief isn't it? Odd.
>
> No, it's a logical consequence of his definition of "random" & no more a
> statement of belief than "1+1=2" is. If you use the word to mean
> something else then it might or might not be true in your universe.
>
> Ken

Perhaps my universe is off in all six degrees, but answer this -- which is it:

  random data cannot be compressed

or

  random data may include recognizable and even repeating patterns

A string of random data may include anything. Compressibility will vary with the particular data and the algorithm used. Lucky with one input set, unlucky with another. I wouldn't try to base a business on getting lucky with the customer's input data, but the absolute statement of incompressibility is wrong unless you preselect your data sets. Then they're no longer random and the definition of random becomes circular.

Mike
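Mike's "lucky with one input set, unlucky with another" can be watched directly. Over all 2^(8n) strings of n bytes, no injective code can shorten them all, and any real codec shows it -- a sketch using Python's zlib, though any DEFLATE implementation behaves the same way:

    import os
    import zlib

    blob = os.urandom(1 << 16)             # 64 KiB straight from the OS entropy pool
    packed = zlib.compress(blob, 9)
    print(len(blob), len(packed))          # packed is almost always slightly LONGER

    text = b"the quick brown fox " * 3000  # highly redundant input of similar size
    print(len(text), len(zlib.compress(text, 9)))  # shrinks enormously

The occasional lucky random block that happens to shrink is paid for by all the blocks that grow; averaged over uniformly random inputs the codec cannot win, which is the precise sense in which "random data cannot be compressed."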
Re: Random Data Compressed 100:1 (Guffaw)
At 12:06 PM 1/9/2002 +0100, Eugene Leitl wrote:

> On Tue, 8 Jan 2002, Steve Schear wrote:
>
>> combinations/permutations and auto correlations to code for the runs. I
>> say attempted, because I was never able to find acceptable algorithms to
>> satisfy my requirement. I still believe these algorithms exist, it was
>> just my limitations in identifying the underlying math needed.
>
> http://www.google.com/search?q=IFS+image+compression&sourceid=opera&num=100&ie=utf-8&oe=utf-8

I knew the guys behind fractal compression, but that's not what I was trying to do. They looked for a higher-level order; I was just looking to take advantage of local correlations and the ability to identify identical runs in the output of deterministic generators whose references could be expressed very efficiently (e.g., correlation coefficient/seed value, displacement) compared with the data being compressed. It's a highly asymmetric approach, so real-time applications were not considered.

IFS eventually became practical, but it came from a time when proprietary solutions were king and missed the open-standards revolution which powered JPEG (which, while generally only as good at compression, was open to improvement, well studied, and not chained by patents).

steve
Re: CDR: Re: Random Data Compressed 100:1 (Guffaw)
On Wed, 9 Jan 2002, Ken Brown wrote:

> Michael Motyka wrote:
>>
>> Here we go :
>>
>> "As Eric correctly points out, true random input simply cannot be
>> compressed, it doesn't matter how clever you are or how much time and
>> computing power you have access to."
>>
>> This is a statement of belief isn't it? Odd.
>
> No, it's a logical consequence of his definition of "random" & no more a
> statement of belief than "1+1=2" is. If you use the word to mean
> something else then it might or might not be true in your universe.

So, it depends on what the definition of "is" is? ;-)

--
Yours,
J.A. Terranson
[EMAIL PROTECTED]

If Governments really want us to behave like civilized human beings, they should give serious consideration towards setting a better example: ruling by force rather than consensus; the unrestrained application of unjust laws (which the victim-populations were never allowed input on in the first place); the State policy of justice only for the rich and elected; the intentional abuse and occasional destruction of entire populations merely to distract an already apathetic and numb electorate... This type of demagoguery must surely wipe out the fascist United States as surely as it wiped out the fascist Union of Soviet Socialist Republics.

The views expressed here are mine, and NOT those of my employers, associates, or others. Besides, if it *were* the opinion of all of those people, I doubt there would be a problem to bitch about in the first place...
Re: Random Data Compressed 100:1 (Guffaw)
Eric Cordian wrote:
>
> Declan opines:
>
>> I'm naturally skeptical of this claim (until I can verify it for
>> myself), but I do not believe the claim is "we can encode random data
>> at 100:1."
>
> From the article:
>
> "ZeoSync said its scientific team had succeeded on a small scale in
> compressing random information sequences in such a way as to allow the
> same data to be compressed more than 100 times over -- with no data
> loss."
>
> Now of course it's possible they were horribly misquoted. Still, it is
> worrisome that so many people quoted in the article think such algorithmic
> gymnastics are mathematically possible.

The overpowering stench of snake oil pervades the ether.

I particularly liked:

"The techniques described by ZeoSync would mark a break with the dozens of existing compression technologies, including MPEG for video and music and JPEG for pictures and graphics are able to compact data at compression rates up to 10 times the original. These algorithms typically work by eliminating long strings of identifiable bits of data such as blue sky, green grass or the white background of a snow-covered landscape."

Which sounds like someone who doesn't know what they are talking about being misreported by someone who doesn't understand. (*)

"ZeoSync said its scientific team had succeeded on a small scale"

"The company's claims, which are yet to be demonstrated in any public forum"

"ZeoSync, whose Web site can be located at http://www.zeosync.com/"

"Among the scientific team working with ZeoSync is Steve Smale, one of America's most renowned mathematicians. Smale is an emeritus professor at the University of California at Berkeley and the 1966 winner of the Fields Prize, the Nobel Prize for researchers in this field. He could not be reached for comment on his role in the project."

I bet.

Ken Brown

(*) Clue for those who have too much of a life to bother with things like file structures and encoding (though why would they be reading cypherpunks?): JPEG is a lossy compression method that tries to recode "real world" pictures in a way that *looks* almost as good as the real thing but takes up less space, by smoothing out gradual changes of colour (and other stuff as well). It doesn't "typically work by eliminating long strings of identifiable bits of data". And it doesn't compress "up to 10 times"; it compresses as much as you like, with a trade-off between file size and image quality. MPEG, AFAIK, is similar.
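Ken's footnote about the quality/size trade-off is a one-knob experiment. A sketch assuming Python with the Pillow imaging library and any photograph of yours ("photo.png" is a hypothetical filename):

    import os
    from PIL import Image  # Pillow

    img = Image.open("photo.png")        # any real-world photograph
    for q in (95, 75, 25, 5):
        img.convert("RGB").save("out_q%d.jpg" % q, quality=q)
        print(q, os.path.getsize("out_q%d.jpg" % q))

File size falls smoothly as the quality setting drops -- there is no fixed "10x" limit, only a dial trading fidelity for size, exactly as Ken says.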
Re: CDR: Re: Random Data Compressed 100:1 (Guffaw)
I think they may be referring to a random string of _ASCII characters_. That would be subject to compression because it is not random at the bit level.

But 100:1? I have no idea how to achieve that.

Marc de Piolenc

Declan McCullagh wrote:
>
>> What exactly is random data? Does it have to appear to be random? Does
>> it have to pass some set of statistical tests to be random? If a string
>
> I'm naturally skeptical of this claim (until I can verify it for
> myself), but I do not believe the claim is "we can encode random data
> at 100:1." They seem to be talking about 100:1 lossless compression
> of real-world data, which is generally not random.
>
> -Declan

--
Remember September 11, 2001 but don't forget July 4, 1776

Rather than make war on the American people and their liberties, ...Congress should be looking for ways to empower them to protect themselves when warranted.

"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." - Benjamin Franklin
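Marc's best case can be bounded with one line of arithmetic. A byte drawn uniformly from the 95 printable ASCII characters carries log2(95) ~= 6.57 bits of entropy but occupies 8 bits on disk, so the ideal lossless ratio on such a stream is 8/6.57 ~= 1.22:1 (or 8/7 ~= 1.14:1 for all 128 ASCII codes) -- two orders of magnitude shy of 100:1. To check, assuming Python:

    from math import log2

    print(8 / log2(95))   # ~1.218:1, the ceiling for uniform printable ASCII
    print(8 / log2(128))  # ~1.143:1, the ceiling for uniform 7-bit ASCII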
Re: Random Data Compressed 100:1 (Guffaw)
Michael Motyka wrote:

> Here we go :
>
> "As Eric correctly points out, true random input simply cannot be
> compressed, it doesn't matter how clever you are or how much time and
> computing power you have access to."
>
> This is a statement of belief isn't it? Odd.

No, it's a logical consequence of his definition of "random" & no more a statement of belief than "1+1=2" is. If you use the word to mean something else then it might or might not be true in your universe.

Ken
Re: Random Data Compressed 100:1 (Guffaw)
On Tue, 8 Jan 2002, Steve Schear wrote:

> combinations/permutations and auto correlations to code for the runs. I
> say attempted, because I was never able to find acceptable algorithms to
> satisfy my requirement. I still believe these algorithms exist, it was
> just my limitations in identifying the underlying math needed.

http://www.google.com/search?q=IFS+image+compression&sourceid=opera&num=100&ie=utf-8&oe=utf-8
Re: Random Data Compressed 100:1 (Guffaw)
At 02:27 PM 1/8/2002 -0700, Michael Motyka wrote:

> [EMAIL PROTECTED] wrote :
>>
>> On 8 Jan 2002, at 9:51, Michael Motyka wrote:
>>
>>> Eric Cordian <[EMAIL PROTECTED]> wrote :
>>>> Someone else needs to read the comp.compression FAQ.
>
> What exactly is random data? Does it have to appear to be random? Does
> it have to pass some set of statistical tests to be random? If a string
> of bits from a radiation source spells "Merry Christmas and a Happy New
> Year" in ASCII is it non-random - a message from above? If a string of
> bits from a natural source ( decay etc ) matches a string of bits from
> some PRNG is it somehow disqualified as truly random data?
>
> I think the classification random does not rule out the presence of
> local patterns however obscure. If a substring of data happens to match
> what is generated by some PRNG then that substring can be "compressed"
> to {generator, seed, count}. The geological timescale might be involved
> in matching generator outputs to input data sections or deriving
> generators for subsections of data.

I came up with a similar approach in the late '80s (I may have even discussed it on the list a year or so back). I was interested in compressing image data, which has quite a bit of inter-pixel correlation. My approach was to run a Hilbert space-filling curve through the image (it looks like a tightly wound maze). This allows one to maintain most of the pixel correlation while transforming from an array to a line. Then I analyzed the autocorrelation of the runs of various lengths and attempted to create generators which could produce the required combinations/permutations and autocorrelations to code for the runs. I say attempted, because I was never able to find acceptable algorithms to satisfy my requirement. I still believe these algorithms exist; it was just my limitations in identifying the underlying math needed.

steve
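For anyone who wants to reproduce the linearization step Steve describes, the standard distance-to-coordinate mapping for a Hilbert curve is short. A generic textbook sketch in Python -- not Steve's actual code:

    def hilbert_d2xy(n, d):
        """Map distance d along the Hilbert curve to (x, y) on an n x n grid.
        n must be a power of two; d runs from 0 to n*n - 1."""
        x = y = 0
        s = 1
        while s < n:
            rx = 1 & (d // 2)
            ry = 1 & (d ^ rx)
            if ry == 0:                 # rotate this quadrant into place
                if rx == 1:
                    x, y = s - 1 - x, s - 1 - y
                x, y = y, x
            x += s * rx
            y += s * ry
            d //= 4
            s *= 2
        return x, y

    # Walk a 256x256 image in Hilbert order, keeping most 2-D neighbors
    # adjacent in the resulting 1-D stream:
    # scan = [img[y][x] for x, y in (hilbert_d2xy(256, d) for d in range(256 * 256))]

Unlike a row-by-row raster scan, which tears vertical neighbors 256 pixels apart, the Hilbert walk preserves locality in both axes, which is exactly why it helps before run-oriented coding.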
Re: Random Data Compressed 100:1 (Guffaw)
Declan opines:

> I'm naturally skeptical of this claim (until I can verify it for
> myself), but I do not believe the claim is "we can encode random data
> at 100:1."

From the article:

"ZeoSync said its scientific team had succeeded on a small scale in compressing random information sequences in such a way as to allow the same data to be compressed more than 100 times over -- with no data loss."

Now of course it's possible they were horribly misquoted. Still, it is worrisome that so many people quoted in the article think such algorithmic gymnastics are mathematically possible.

--
Eric Michael Cordian 0+
O:.T:.O:. Mathematical Munitions Division
"Do What Thou Wilt Shall Be The Whole Of The Law"
Re: Random Data Compressed 100:1 (Guffaw)
On 8 Jan 2002, at 9:51, Michael Motyka wrote:

> Eric Cordian <[EMAIL PROTECTED]> wrote :
>> Someone else needs to read the comp.compression FAQ.
>>
>> http://www.reuters.com/news_article.jhtml?type=technologynews&StoryID=498720
>>
>> -
>>
>> NEW YORK (Reuters) - A Florida research start-up working with a team of
>> renowned mathematicians said on Monday it had achieved a breakthrough that
>> overcomes the previously known limits of compression used to store and
>> transmit data.
>> ...
>> ZeoSync said its scientific team had succeeded on a small scale in
>> compressing random information sequences in such a way as to allow the
>> same data to be compressed more than 100 times over -- with no data loss.
>> That would be at least an order of magnitude beyond current known
>> algorithms for compacting data.
>> ...
>> --
>> Eric Michael Cordian 0+
>
> There may be a way to derive & patch together pseudorandom sequence
> generators, who knows. If there is anything real about this I wonder how
> long it takes to compress large blocks of arbitrary input data?
> Geological time scales anyone?
>
> Mike

A meaningless question. As Eric correctly points out, true random input simply cannot be compressed; it doesn't matter how clever you are or how much time and computing power you have access to.

Conversely, if you're just interested in "compressing" the results of your own pseudo-random number generator, it's trivial to do it instantly: just store the seed and the desired file size.

George
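George's trivial case is worth spelling out, because it is exactly the trap a rigged demo can exploit: pseudorandom output passes statistical tests yet is generated from a few bytes of state. A sketch assuming Python's random module, though the principle holds for any deterministic generator:

    import random

    def pack(seed, nbytes):
        """'Compress' the output of a known PRNG to just (seed, length)."""
        return (seed, nbytes)

    def unpack(seed, nbytes):
        """Regenerate the stream exactly: the decompressor contains the generator."""
        rng = random.Random(seed)
        return bytes(rng.getrandbits(8) for _ in range(nbytes))

    stream = unpack(42, 1000000)                  # a megabyte that looks random
    assert unpack(*pack(42, 1000000)) == stream   # recovered from ~16 bytes

A demo that feeds its own generator's output back into its "compressor" can claim any ratio it likes; bits from radioactive decay offer no such shortcut, which is why "compresses random sequences" claims deserve the guffaw.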
Re: Random Data Compressed 100:1 (Guffaw)
Eric Cordian <[EMAIL PROTECTED]> wrote :

> Someone else needs to read the comp.compression FAQ.
>
> http://www.reuters.com/news_article.jhtml?type=technologynews&StoryID=498720
>
> -
>
> NEW YORK (Reuters) - A Florida research start-up working with a team of
> renowned mathematicians said on Monday it had achieved a breakthrough that
> overcomes the previously known limits of compression used to store and
> transmit data.
> ...
> ZeoSync said its scientific team had succeeded on a small scale in
> compressing random information sequences in such a way as to allow the
> same data to be compressed more than 100 times over -- with no data loss.
> That would be at least an order of magnitude beyond current known
> algorithms for compacting data.
> ...
> --
> Eric Michael Cordian 0+

There may be a way to derive & patch together pseudorandom sequence generators, who knows. If there is anything real about this I wonder how long it takes to compress large blocks of arbitrary input data? Geological time scales anyone?

Mike
Random Data Compressed 100:1 (Guffaw)
Someone else needs to read the comp.compression FAQ.

http://www.reuters.com/news_article.jhtml?type=technologynews&StoryID=498720

-

NEW YORK (Reuters) - A Florida research start-up working with a team of renowned mathematicians said on Monday it had achieved a breakthrough that overcomes the previously known limits of compression used to store and transmit data.

If proven and successfully commercialized, the discovery asserted by ZeoSync Corp of West Palm Beach, Florida could overturn half a century of thinking in the field of lossless data compression and undermine business assumptions on which the telecommunications and other digital industries are based.

Lossless compression is the process by which computer data can be compacted, stored and then restored with complete fidelity, or zero loss of data. Existing compression techniques typically shed redundant data in order to conserve space.

ZeoSync said its scientific team had succeeded on a small scale in compressing random information sequences in such a way as to allow the same data to be compressed more than 100 times over -- with no data loss. That would be at least an order of magnitude beyond current known algorithms for compacting data.

[Someone needs to learn the pigeonhole principle. You cannot give each member of a set of 2^(N*100) things its own unique identifier taken from a set of 2^N things.]

The company's claims, which are yet to be demonstrated in any public forum, could vastly boost the ability of computer disks to store text, music and video -- if ZeoSync's formulae succeed in scaling up to handle massive amounts of data.

The same compression technique might one day make high-speed Internet access cheaper and widely available across the globe, posing a threat to huge investments in telecommunications network capacity, an industry analyst said.

"Either this research is the next 'Cold Fusion' scam that dies away or it's the foundation for a Nobel Prize. I don't have an answer to which one it is yet," said David Hill, a data storage analyst with Boston-based Aberdeen Group.

[David needs to read the comp.compression FAQ too.]

In 1989, a group of Utah researchers scandalized the scientific world with claims -- quickly found to be unsupported -- that the long-sought answer to the problem of cold fusion had been discovered.

Hill, the only independent analyst briefed ahead of the announcement, said ZeoSync's claims were theoretically feasible, but years away from definitive proof. He will require more information before evaluating ZeoSync's claims, he said.

[You know, if you run the random data through the algorithm a second time, I bet you would get 10,000:1 compression.]

ZeoSync, whose Web site can be located at http://www.zeosync.com/, was founded by Peter St. George, an Internet and financial services entrepreneur, who has a background in telecommunications research. ZeoSync, with 30 full-time employees, said it had collaborated with experts from Harvard University, Massachusetts Institute of Technology, Stanford University and researchers in Warsaw, Moscow, and Beijing.

BREAK WITH CURRENT TECHNOLOGIES

Among the scientific team working with ZeoSync is Steve Smale, one of America's most renowned mathematicians. Smale is an emeritus professor at the University of California at Berkeley and the 1966 winner of the Fields Prize, the Nobel Prize for researchers in this field. He could not be reached for comment on his role in the project.

Peter St. George, ZeoSync founder and chief executive, said his company's technique challenged the foundations of digital communications theory spelled out by AT&T's Dr. Claude Shannon in his classic 1948 treatise on Information Theory. Shannon, who died in 2001, had proposed that any type of information, from human speech to computer keyboards to storage disks, is limited by the capacity of the data channel over which it flows. This is the basis of digital communications theory.

"What we've developed is a new plateau in communications theory," St. George said. "We are expecting to produce the enormous capacity of analog signaling, with the benefit of the noise-free integrity of digital communications."

The techniques described by ZeoSync would mark a break with the dozens of existing compression technologies, including MPEG for video and music and JPEG for pictures and graphics are able to compact data at compression rates up to 10 times the original. These algorithms typically work by eliminating long strings of identifiable bits of data such as blue sky, green grass or the white background of a snow-covered landscape.

In a statement, ZeoSync said its techniques, "once fully developed, will offer compression ratios that are anticipated to approach the hundreds-to-one range."

Using mathematical terminology, the company said its technique "intentionally randomizes naturally occurring patterns to form entropy-like random s