Re: Compression of random binary data
On 4/11/18 9:29 PM, cuddlycave...@gmail.com wrote:
> I’m replying to your post on January 28th. Nice carefully chosen
> non-random numbers, Steven D'Aprano. Was just doing what you asked, but
> you don’t remember 😂😂😂

Best practice is to include a quote of the thing you are replying to. It
makes it much easier for people to follow the thread of the discussion,
especially when there are large gaps in the timeline.

--Ned.
Re: Compression of random binary data
I’m replying to your post on January 28th. Nice carefully chosen
non-random numbers, Steven D'Aprano. Was just doing what you asked, but
you don’t remember 😂😂😂
Re: Compression of random binary data
On Tue, 10 Apr 2018 23:36:27 -0700, cuddlycaveman wrote:

[snip a number of carefully chosen, non-random numbers shown in binary]

> Don’t know if that helps

Helps what? With no context, we don't know who you are replying to, what
they asked, or why you think this is helpful.

According to my archives, such as they are, you appear to be responding
to a post made in either July 2016 or October 2017.

-- Steve
Re: Compression of random binary data
387420479
00110011 00111000 00110111 00110100 00110010 00110000 00110100 00110111 00111001
72 bits

Equal to (9^9)-10
00101000 00111001 01011110 00111001 00101001 00101101 00110001 00110000
64 bits

387420499
00110011 00111000 00110111 00110100 00110010 00110000 00110100 00111001 00111001
72 bits

Equal to (9^9)+10
00101000 00111001 01011110 00111001 00101001 00101011 00110001 00110000
64 bits

Or

387,420,489 387,420,499
00110011 00111000 00110111 00101100 00110100 00110010 00110000 00101100 00110100 00111000 00111001 00100000 00110011 00111000 00110111 00101100 00110100 00110010 00110000 00101100 00110100 00111001 00111001
184 bits

Equal to ((9^9)-10) ((9^9)+10)
00101000 00101000 00111001 01011110 00111001 00101001 00101101 00110001 00110000 00101001 00100000 00101000 00101000 00111001 01011110 00111001 00101001 00101011 00110001 00110000 00101001
168 bits

Or

387420479 387420499
00110011 00111000 00110111 00110100 00110010 00110000 00110100 00110111 00111001 00001010 00110011 00111000 00110111 00110100 00110010 00110000 00110100 00111001 00111001
152 bits

Equal to (9^9)-10 (9^9)+10
00101000 00111001 01011110 00111001 00101001 00101101 00110001 00110000 00001010 00101000 00111001 01011110 00111001 00101001 00101011 00110001 00110000
136 bits

Don’t know if that helps
Re: Compression of random binary data
On Sat, 27 Jan 2018 21:26:06 -0800 (PST), pendrysamm...@gmail.com wrote:
> If it is then show him this
>
> 387,420,489
> =
> 00110011 00111000 00110111 00101100 00110100 00110010 00110000 ...

To save the casual reader a moment of disorientation, the above binary
string is just the ASCII representation of the text string "387,420,489".

> 9^9 = ⬇️ (^ = to the power of)
> = 387,420,489
>
> But
>
> 9^9
> =
> 00111001 01011110 00111001

Similarly, this is the ASCII representation of "9^9". Our self-confessedly
intermittently sober correspondent appears to have discovered the backside
of the principle that a short text expression can generate a large number.

--
To email me, substitute nowhere->runbox, invalid->com.
Re: Compression of random binary data (Posting On Python-List Prohibited)
On 2018-01-28, pendrysamm...@gmail.com wrote:
> I have it in my head, just need someone to write the program for me, I
> know nothing about data compression or binary data other than 1s and 0s
> and that you can not take 2 numbers without a possible value more or
> less than themselves and compress them. I have been working for 1 1/2
> years on a solution, just need help with programming.
>
> If someone could get ahold of me when I am sober I could probably
> explain it a lot better, but I said fuck it I’ll show everyone a
> possibility instead of trying to do it on my own.

Wow... Just... wow.

-- Grant
Re: Compression of random binary data
On Sat, 27 Jan 2018 21:50:24 -0800, pendrysammuel wrote:
> 387,420,489 is a number with only 2 repeating binary sequences

Okay. Now try these two numbers:

387420479

387420499

-- Steve
Re: Compression of random binary data
On Sat, 27 Jan 2018 22:14:46 -0800, pendrysammuel wrote:
> I have it in my head, just need someone to write the program for me,

Sure, my rate is $150 an hour.

> I know nothing about data compression or binary data other than 1s and
> 0s and that you can not take 2 numbers without a possible value more or
> less than themselves and compress them, I have been working for 1 1/2
> years on a solution, just need help with programming.

Sounds like you should have spent two hours learning a bit about data
compression (start with Wikipedia) instead of wasting 1.5 years trying to
guess a solution for something you know nothing about.

> If someone could get ahold of me when I am sober

O_o

Make that $550 an hour. Payable in advance.

-- Steve
Re: Compression of random binary data (Posting On Python-List Prohibited)
Lawrence D’Oliveiro:

In other words yes, I just need to be sober first.
Re: Compression of random binary data (Posting On Python-List Prohibited)
I have it in my head, just need someone to write the program for me. I
know nothing about data compression or binary data other than 1s and 0s
and that you can not take 2 numbers without a possible value more or less
than themselves and compress them. I have been working for 1 1/2 years on
a solution, just need help with programming.

If someone could get ahold of me when I am sober I could probably explain
it a lot better, but I said fuck it I’ll show everyone a possibility
instead of trying to do it on my own.
Re: Compression of random binary data
387,420,489 is a number with only 2 repeating binary sequences.

In binary 387,420,489 is expressed as

00110011 00111000 00110111 00101100 00110100 00110010 00110000 00101100 00110100 00111000 00111001

387,420,489 can be simplified to 9^9, or nine to the power of nine.

In binary 9^9 is represented by

00111001 01011110 00111001

9^9 and 387,420,489 are the same thing; they are both a representation of
numbers or data, yet in binary 9^9 is 64 bits shorter than 387,420,489.

Not a solution, just help.
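The comparison above is between two ASCII text renderings, not between the
numbers themselves, and it is worth making that distinction concrete. A
minimal Python check (the variable names are just illustrative): the
string "387,420,489" costs 88 bits as ASCII, the string "9^9" costs 24
bits, but the integer that both denote fits in 29 bits either way.

text_long = "387,420,489"
text_short = "9^9"
value = 9 ** 9                  # 387420489

print(len(text_long) * 8)       # 88 bits of ASCII text
print(len(text_short) * 8)      # 24 bits of ASCII text
print(value.bit_length())       # 29 bits for the integer itself

So the 64-bit saving comes from choosing a shorter description of the same
number, not from compressing data, which is the point several replies in
this thread go on to make.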
Re: Compression of random binary data
On Sun, Jan 28, 2018 at 4:26 PM, wrote:
> If it is then show him this
>
> 387,420,489
> =
> 00110011 00111000 00110111 00101100 00110100 00110010 00110000 00101100
> 00110100 00111000 00111001
>
> 9^9 = ⬇️ (^ = to the power of)
> = 387,420,489
>
> But
>
> 9^9
> =
> 00111001 01011110 00111001

I have no idea what you're responding to or how this has anything to do
with the thread you're replying to, but I would advise you to look into
the difference between exponentiation and exclusive-or. Even so, I don't
know what result you got.

ChrisA
Compression of random binary data
If it is then show him this

387,420,489
=
00110011 00111000 00110111 00101100 00110100 00110010 00110000 00101100 00110100 00111000 00111001

9^9 = ⬇️ (^ = to the power of)
= 387,420,489

But

9^9
=
00111001 01011110 00111001
Re: Compression of random binary data
Gregory Ewing writes:
> Ben Bacarisse wrote:
>> But that has to be about the process that gives rise to the data, not
>> the data themselves.
>>
>> If I say: "here is some random data..." you can't tell if it is or is
>> not from a random source. I can, as a parlour trick, compress and
>> recover this "random data" because I chose it.
>
> Indeed. Another way to say it is that you can't conclude anything about
> the source from a sample size of one.
>
> If you have a large enough sample, then you can estimate a probability
> distribution, and calculate an entropy.
>
>> I think the argument that you can't compress arbitrary data is simpler
>> ... it's obvious that it includes the results of previous compressions.
>
> What? I don't see how "results of previous compressions" comes into it.
> The source has an entropy even if you're not doing compression at all.

Maybe we are talking at cross purposes. A claim to be able to compress
arbitrary data leads immediately to the problem that iterating the
compression will yield zero-size results. That, to me, is a simpler
argument than talking about data from a random source.

-- Ben.
Re: Compression of random binary data
On Sun, 29 Oct 2017 01:56 pm, Stefan Ram wrote:
> If the entropy of an individual message is not defined, then it is
> still available to be defined. I define it to be log2(1/p), where p is
> the probability of this message. I also choose a unit for it, which I
> call "bit".

That is exactly the definition of self-information:

https://en.wikipedia.org/wiki/Self-information

See also:

https://en.wikipedia.org/wiki/Entropy_(information_theory)

which lists several forms of related measures of information:

- the self-information of an individual message or symbol taken from a
  given probability distribution;

- the entropy of a given probability distribution of messages or symbols;

- the entropy rate of a stochastic process.

It also suggests a connection between information entropy and
thermodynamic entropy, namely that the information entropy of a system is
the amount of additional information needed to determine the microstate
of a system (the states of all its particles), given the macrostate
(identified by the bulk thermodynamic parameters such as temperature,
volume, energy). More here:

https://physics.stackexchange.com/questions/263197/is-information-entropy-the-same-as-thermodynamic-entropy

-- Steve
Re: Compression of random binary data
On Sun, 29 Oct 2017 06:03 pm, Chris Angelico wrote:
> On Sun, Oct 29, 2017 at 6:00 PM, Ian Kelly wrote:
>> On Oct 28, 2017 5:53 PM, "Chris Angelico" wrote:
>>> One bit. It might send the message, or it might NOT send the message.
>>
>> Not sending the message is equivalent to having a second possible
>> message.
>
> Okay, now we're getting seriously existential. Is a non-message a
> message?

Is zero a number? Is bald a hair colour? Is atheism a religion?

Aristotle seriously argued that the smallest whole number was two. Zero
was clearly nothing at all, and one wasn't a *number*, it was *unity*. A
number is, by definition, a multitude of units.

https://philosophy.stackexchange.com/questions/19533/why-does-aristotle-suggest-one-is-not-a-number

http://classics.mit.edu/Aristotle/metaphysics.14.xiv.html

"No message" can be a message. British nuclear submarines during the Cold
War had orders that if they failed to receive any transmissions from
London within a certain time period, they were to crack open the secret
orders from the Prime Minister. Nobody knows what is in the orders, as
they were destroyed when the PM left office, but they are popularly
supposed to have included:

- fire your missiles at Russia;

- surrender to whoever won the war;

- try to flee to Australia or New Zealand and join up with whatever
  remnants of the British Empire still exist;

- do whatever you want.

It makes a good, but ultimately futile, exercise to try to guess what the
various PMs would have said.

-- Steve
Re: Compression of random binary data
On Sun, 29 Oct 2017 02:31 pm, Gregory Ewing wrote:
> Steve D'Aprano wrote:
>> I don't think that's right. The entropy of a single message is a
>> well-defined quantity, formally called the self-information.
>>
>> https://en.wikipedia.org/wiki/Self-information
>
> True, but it still depends on knowing (or assuming) the probability of
> getting that particular message out of the set of all possible
> messages.

Indeed.

> This is *not* what danceswithnumbers did when he calculated the
> "entropy" of his example bit sequences. He didn't define the set they
> were drawn from or what their probabilities were.

I'm not defending or supporting Danceswithnumbers in his confusion. I'm
just pointing out that an entropy-like measure of information for
individual messages does exist, and people do often call it "entropy".

-- Steve
Re: Compression of random binary data
Chris Angelico wrote:
> One bit. It might send the message, or it might NOT send the message.

The entropy formula assumes that you are definitely going to send one of
the possible messages. If not sending a message is a possibility, then
you need to include an empty message in the set of messages.

Another way to think about it: The receiver can be in one of N possible
states, and you want to ensure that it's in a particular state. To do
that, there must be N possible messages that you can send. If N = 1, the
receiver is already in the right state, so you don't need to send it a
message at all.

Example: If the message is "I love you", and your SO already knows that,
you don't need to tell them. (Not according to information theory, at
least.)

-- Greg
Re: Compression of random binary data
On Sun, Oct 29, 2017 at 6:00 PM, Ian Kelly wrote:
> On Oct 28, 2017 5:53 PM, "Chris Angelico" wrote:
>> One bit. It might send the message, or it might NOT send the message.
>
> Not sending the message is equivalent to having a second possible
> message.

Okay, now we're getting seriously existential. Is a non-message a
message?

ChrisA
Re: Compression of random binary data
On Oct 28, 2017 5:53 PM, "Chris Angelico" wrote:
> One bit. It might send the message, or it might NOT send the message.

Not sending the message is equivalent to having a second possible
message.
Re: Compression of random binary data
On Sun, Oct 29, 2017 at 2:08 PM, Gregory Ewing wrote:
> Stefan Ram wrote:
>> Well, then one can ask about the entropy of a data source that only is
>> emitting this message.
>
> You can, but it's still the *source* that has the entropy, not the
> message.
>
> (And the answer in that case is that the entropy is zero. If there's
> only one possible message you can ever send, you don't need to send it
> at all!)

One bit. It might send the message, or it might NOT send the message. And
I have known situations in which that is exactly the system used. Back
before mobile phones were common, you could sometimes use a payphone to
cause someone's phone to ring, but you couldn't actually speak on it. So
you had one message you could send: "bing bring". A pre-arranged meaning
for that message might be "I'm at the railway station, please come and
pick me up"... but there's still *some* information in the mere fact of
the call.

ChrisA
Re: Compression of random binary data
Steve D'Aprano wrote:
> I don't think that's right. The entropy of a single message is a
> well-defined quantity, formally called the self-information.
>
> https://en.wikipedia.org/wiki/Self-information

True, but it still depends on knowing (or assuming) the probability of
getting that particular message out of the set of all possible messages.

This is *not* what danceswithnumbers did when he calculated the "entropy"
of his example bit sequences. He didn't define the set they were drawn
from or what their probabilities were.

-- Greg
Re: Compression of random binary data
Stefan Ram wrote:
> Well, then one can ask about the entropy of a data source that only is
> emitting this message.

You can, but it's still the *source* that has the entropy, not the
message.

(And the answer in that case is that the entropy is zero. If there's only
one possible message you can ever send, you don't need to send it at
all!)

-- Greg
Re: Compression of random binary data
On Sun, Oct 29, 2017 at 1:32 PM, Chris Angelico wrote:
> On Sun, Oct 29, 2017 at 1:18 PM, Gregory Ewing wrote:
>> You're missing something fundamental about what entropy is in
>> information theory.
>>
>> It's meaningless to talk about the entropy of a single message.
>> Entropy is a function of the probability distribution of *all* the
>> messages you might want to send.
>
> Which is where a lot of "password strength" confusion comes from. How
> much entropy is there in the password "U3ZINVp3PT0="? Strong password
> or weak? What about "dark-next-sure-secret"?
> "with-about-want-really-going"? They were generated by, respectively:
> double-MIME-encoding four bytes from /dev/random (32 bits of entropy),
> picking four words from the top 1024 (40 bits), and picking 5 words
> from the top 64 (30 bits). But just by looking at the passwords
> themselves, you can't tell that.

To clarify: The "top 1024 words" concept is XKCD 936 style password
generation, using my tabletop gaming room's resident parrot. So it's
based on the frequency of words used by D&D players. YMMV if you use a
different corpus :)

ChrisA
Re: Compression of random binary data
On Sun, Oct 29, 2017 at 1:18 PM, Gregory Ewing wrote:
> You're missing something fundamental about what entropy is in
> information theory.
>
> It's meaningless to talk about the entropy of a single message. Entropy
> is a function of the probability distribution of *all* the messages you
> might want to send.

Which is where a lot of "password strength" confusion comes from. How
much entropy is there in the password "U3ZINVp3PT0="? Strong password or
weak? What about "dark-next-sure-secret"?
"with-about-want-really-going"? They were generated by, respectively:
double-MIME-encoding four bytes from /dev/random (32 bits of entropy),
picking four words from the top 1024 (40 bits), and picking 5 words from
the top 64 (30 bits). But just by looking at the passwords themselves,
you can't tell that.

ChrisA
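The bit counts quoted there come straight from the size of each scheme's
output space: a password drawn uniformly from N equally likely
possibilities carries log2(N) bits of entropy, regardless of what the
resulting string looks like. A minimal sketch of the arithmetic (the
schemes are as described above; the variable names are illustrative):

import math

random_bytes = 4 * math.log2(256)   # 4 bytes from /dev/random: 32.0 bits
four_words   = 4 * math.log2(1024)  # 4 words, 1024-word list: 40.0 bits
five_words   = 5 * math.log2(64)    # 5 words, 64-word list: 30.0 bits

print(random_bytes, four_words, five_words)

The entropy lives in the generation procedure, not in the password string
you end up looking at, which is exactly why inspection alone can't tell a
strong password from a weak one.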
Re: Compression of random binary data
Steve D'Aprano wrote:
> Random data = any set of data generated by "a source of random".

Any set of data generated by Grant Thompson?

https://www.youtube.com/user/01032010814

-- Greg
Re: Compression of random binary data
danceswithnumb...@gmail.com wrote:
> 10101011
> This equals
> 61611
> This can be represented using 0-6
> log2(7)*5 = 14.0367746103 bits
>
> 11010101
> This equals
> 54543
> This can be represented using 0-5
> log2(6)*5 = 12.9248125036 bits

You're missing something fundamental about what entropy is in information
theory.

It's meaningless to talk about the entropy of a single message. Entropy
is a function of the probability distribution of *all* the messages you
might want to send.

What you calculated in your first example relates to this situation: You
want to send someone messages consisting of 5 symbols drawn from the set
{0, 1, 2, 3, 4, 5, 6}, where all such messages have equal probability. In
that case, you need an average of about 14.03 bits for each message.

Note that this has essentially nothing to do with the particular sequence
of bits you started with.

Your second calculation was for a similar situation, except that the set
of symbols is just {0, 1, 2, 3, 4, 5}. There are fewer messages of length
5 that can be constructed from that set, so the number of bits needed is
smaller.

> In reality you can express 54543 with 10 bits.

Again, this statement is meaningless. You can't say *anything* about the
number of bits needed to represent that particular number, without
knowing what *other* numbers you might want to represent.

-- Greg
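Greg's two figures drop straight out of the size of the message space:
with M equally likely messages, the cost is log2(M) bits, and for strings
of 5 symbols over an alphabet of size k that is 5*log2(k). A quick check
of the numbers in question, plus the "10 bits" claim:

import math

print(5 * math.log2(7))   # 5 symbols from {0..6}: ~14.0368 bits
print(5 * math.log2(6))   # 5 symbols from {0..5}: ~12.9248 bits
print(math.log2(54544))   # even granting the range 0..54543: ~15.7 bits

So even if the agreed message set were "every integer from 0 to 54543",
the number 54543 would cost 16 bits, not 10; with no agreed message set,
the question has no answer at all.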
Re: Compression of random binary data
Ben Bacarisse wrote:
> But that has to be about the process that gives rise to the data, not
> the data themselves.
>
> If I say: "here is some random data..." you can't tell if it is or is
> not from a random source. I can, as a parlour trick, compress and
> recover this "random data" because I chose it.

Indeed. Another way to say it is that you can't conclude anything about
the source from a sample size of one.

If you have a large enough sample, then you can estimate a probability
distribution, and calculate an entropy.

> I think the argument that you can't compress arbitrary data is simpler
> ... it's obvious that it includes the results of previous compressions.

What? I don't see how "results of previous compressions" comes into it.
The source has an entropy even if you're not doing compression at all.

-- Greg
Re: Compression of random binary data
On Oct 28, 2017 10:30 AM, "Stefan Ram" wrote:
> Well, then one can ask about the entropy of a data source that only is
> emitting this message. (If it needs to be endless: that only is
> emitting this message repeatedly.)

If there is only one possible message then the entropy is zero.

-1.0 * log2(1.0) == 0
Re: Compression of random binary data
On Sun, 29 Oct 2017 07:03 am, Peter Pearson wrote:
> On Thu, 26 Oct 2017 19:26:11 -0600, Ian Kelly wrote:
>> . . . Shannon entropy is correctly calculated for a data source, not
>> an individual message . . .
>
> Thank you; I was about to make the same observation. When people talk
> about the entropy of a particular message, you can bet they're headed
> for confusion.

I don't think that's right. The entropy of a single message is a
well-defined quantity, formally called the self-information. The entropy
of a data source is the expected value of the self-information of all
possible messages coming from that data source.

https://en.wikipedia.org/wiki/Self-information

We can consider the entropy of a data source as analogous to the
population mean, and the entropy of a single message as the sample mean.
A single message is like a sample taken from the set of all possible
messages.

Self-information is sometimes also called "surprisal" as it is a measure
of how surprising an event is. If your data source is a coin toss, then
actually receiving a Heads has a self-information ("entropy") of 1 bit.
That's not very surprising. If your data source is a pair of fair,
independent dice, then the self-information of receiving a two and a four
is 5.170 bits.

It's a logarithmic scale, not linear: if the probability of a message or
event is p, the self-information of that event is log_2(1/p) bits.

-- Steve
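Both of those figures follow directly from the log2(1/p) formula. Here is
a minimal sketch, taking "a two and a four" to mean one specific ordered
outcome of the 36 equally likely rolls of two distinguishable dice (that
assumption is what makes the answer come out at 5.17 bits):

import math

def self_information(p):
    """Surprisal, in bits, of an event that occurs with probability p."""
    return math.log2(1 / p)

print(self_information(1 / 2))    # heads on a fair coin: 1.0 bit
print(self_information(1 / 36))   # (2, 4) on two fair dice: ~5.1699 bits

If instead you read "a two and a four" as unordered (either die may show
the two), p doubles to 2/36 and the surprisal drops by exactly one bit.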
Re: Compression of random binary data
On Thu, 26 Oct 2017 19:26:11 -0600, Ian Kelly wrote:
> . . . Shannon entropy is correctly calculated for a data source, not an
> individual message . . .

Thank you; I was about to make the same observation. When people talk
about the entropy of a particular message, you can bet they're headed for
confusion.

--
To email me, substitute nowhere->runbox, invalid->com.
Re: Compression of random binary data
Steve D'Aprano writes:
> On Fri, 27 Oct 2017 09:53 am, Ben Bacarisse wrote:
>> A source of random can be defined but "random data" is much more
>> elusive.
>
> Random data = any set of data generated by "a source of random".

(I had an editing error there; it should be "a source of random data".)

Yes, that's a fine definition, but it has the disadvantage of not being a
verifiable property of the thing defined -- you can't know, from the data
themselves, if they constitute random data.

You would not care about a compression program that worked on some data
that looks random, you'd want to present your own data for compression
(and then you can use a random source with confidence because the data
are yours). That's the big win (for me) of talking about "arbitrary
data".

-- Ben.
Re: Compression of random binary data
On Fri, 27 Oct 2017 09:53 am, Ben Bacarisse wrote:
> A source of random can be defined but "random data" is much more
> elusive.

Random data = any set of data generated by "a source of random".

-- Steve
Re: Compression of random binary data
On Thu, Oct 26, 2017 at 8:48 PM, wrote:
> Shouldn't that be?
>
> py> 16 * (-7/16 * math.log2(7/16) - 6/16 * math.log2(6/16))

No, that's failing to account for 3/16 of the probability space.
Re: Compression of random binary data
On Thu, Oct 26, 2017 at 8:19 PM, wrote:
> It looks like that averages my two examples.

I don't know how you can look at two numbers and then look at a third
number that is larger than both of them and conclude it is the average.

> Huh, by the way, that equation is really cool... why does it return a
> high bit count when compared to >>> dec to bin?

I don't understand what you're asking. The binary representation of
either of those numbers is 16 bits, which is larger than my 15.8.
Re: Compression of random binary data
Marko Rauhamaa writes:
> Ben Bacarisse :
>>> In this context, "random data" really means "uniformly distributed
>>> data", i.e. any bit sequence is equally likely to be presented as
>>> input. *That's* what information theory says can't be compressed.
>>
>> But that has to be about the process that gives rise to the data, not
>> the data themselves. No finite collection of bits has the property you
>> describe.
>
> Correct. Randomness is meaningful only in reference to future events.
> Once the events take place, they cease to be random.
>
> A finite, randomly generated collection of bits can be tested against a
> probabilistic hypothesis using statistical methods.

But beware of parlour tricks. You can't reliably test for random-looking
data that are, in fact, carefully crafted. If the claim is about
compressing arbitrary data, then the claimant won't mind testing inputs
chosen by me! The only reason this matters is that such people usually
won't reveal the algorithm.

-- Ben.
Re: Compression of random binary data
Ben Bacarisse :
>> In this context, "random data" really means "uniformly distributed
>> data", i.e. any bit sequence is equally likely to be presented as
>> input. *That's* what information theory says can't be compressed.
>
> But that has to be about the process that gives rise to the data, not
> the data themselves. No finite collection of bits has the property you
> describe.

Correct. Randomness is meaningful only in reference to future events.
Once the events take place, they cease to be random.

A finite, randomly generated collection of bits can be tested against a
probabilistic hypothesis using statistical methods.

Marko
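To make that last point concrete, here is one minimal example of such a
test: compare the count of ones in a bit collection against what
Binomial(n, 1/2) predicts for a fair source. (This is only a frequency
test, and it illustrates Ben's parlour-trick caveat above: carefully
crafted data can pass it while being perfectly predictable.)

import math

def frequency_test_p_value(bits):
    """Two-sided p-value, via the normal approximation, for the
    hypothesis that each bit was an independent fair coin flip."""
    n, ones = len(bits), sum(bits)
    z = (ones - n / 2) / math.sqrt(n / 4)
    return math.erfc(abs(z) / math.sqrt(2))

print(frequency_test_p_value([1, 0] * 500))  # 1.0: passes the test,
                                             # yet the sequence is
                                             # trivially predictable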
Re: Compression of random binary data
Shouldn't that be?

py> 16 * (-7/16 * math.log2(7/16) - 6/16 * math.log2(6/16))
Re: Compression of random binary data
It looks like that averages my two examples.

Huh, by the way, that equation is really cool... why does it return a
high bit count when compared to >>> dec to bin?
Re: Compression of random binary data
On Thu, Oct 26, 2017 at 2:38 PM, wrote:
> Thomas Jollans wrote:
>> On 2017-10-25 23:22, danceswi...@gmail.com wrote:
>>> With every transform the entropy changes,
>>
>> That's only true if the "transform" loses or adds information.
>>
>> If it loses information, that's lossy compression, which is only
>> useful in very specific (but also extremely common) circumstances.
>>
>> If it adds information, that's poetry, not compression.
>
> Not true! You can transform a stream lossless, and change its entropy
> without adding to or taking away. These two streams 16 bits are the
> same but one is reversed.
>
> 10101011
> This equals
> 61611
> This can be represented using 0-6
> log2(7)*5 = 14.0367746103 bits
>
> 11010101
> This equals
> 54543
> This can be represented using 0-5
> log2(6)*5 = 12.9248125036 bits
>
> In reality you can express 54543 with 10 bits.

I don't know what you're calculating here but it isn't Shannon entropy.
Shannon entropy is correctly calculated for a data source, not an
individual message, but if we assume the two numbers above to be the
result of a Bernoulli process with probabilities matching the frequencies
of the bits in the numbers, then the total entropy of 16 events is:

py> 16 * (-9/16 * math.log2(9/16) - 7/16 * math.log2(7/16))
15.81919053261596

Approximately 15.8 bits. This is the same no matter what order the events
occur in.
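Ian's number is just the Bernoulli-source entropy H(p) = -p*log2(p) -
(1-p)*log2(1-p), scaled by the 16 events. A small helper (illustrative,
not from the thread) reproduces it and makes the order-independence
explicit, since only the counts enter the formula:

import math

def total_entropy_bits(ones, total):
    """Total Shannon entropy of `total` independent events whose
    empirical frequency of ones is ones/total."""
    p = ones / total
    per_event = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return total * per_event

print(total_entropy_bits(9, 16))  # ~15.819 bits, whatever the bit order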
Re: Compression of random binary data
Gregory Ewing writes:
> Ben Bacarisse wrote:
>> The trouble is a pedagogic one. Saying "you can't compress random
>> data" inevitably leads (though, again, this is just my experience) to
>> endless attempts to define random data.
>
> It's more about using terms without making sure everyone agrees on the
> definitions being used.
>
> In this context, "random data" really means "uniformly distributed
> data", i.e. any bit sequence is equally likely to be presented as
> input. *That's* what information theory says can't be compressed.

But that has to be about the process that gives rise to the data, not the
data themselves. No finite collection of bits has the property you
describe.

If I say: "here is some random data..." you can't tell if it is or is not
from a random source. I can, as a parlour trick, compress and recover
this "random data" because I chose it.

A source of random can be defined but "random data" is much more elusive.

>> I think "arbitrary data" (thereby including the results of compression
>> by said algorithm) is the best way to make progress.
>
> I'm not sure that's much better, because it doesn't home in on the most
> important thing, which is the probability distribution.

I think the argument that you can't compress arbitrary data is simpler to
make. You don't have to define it (except in the very simplest terms) and
it's obvious that it includes the results of previous compressions.

-- Ben.
Re: Compression of random binary data
Thomas Jollans wrote:
> On 2017-10-25 23:22, danceswi...@gmail.com wrote:
>> With every transform the entropy changes,
>
> That's only true if the "transform" loses or adds information.
>
> If it loses information, that's lossy compression, which is only useful
> in very specific (but also extremely common) circumstances.
>
> If it adds information, that's poetry, not compression.

Not true! You can transform a stream lossless, and change its entropy
without adding to or taking away. These two streams 16 bits are the same
but one is reversed.

10101011
This equals
61611
This can be represented using 0-6
log2(7)*5 = 14.0367746103 bits

11010101
This equals
54543
This can be represented using 0-5
log2(6)*5 = 12.9248125036 bits

In reality you can express 54543 with 10 bits.
Re: Compression of random binary data
On 2017-10-24 22:30, Steve D'Aprano wrote:
> On Wed, 25 Oct 2017 07:09 am, Peter J. Holzer wrote:
>> On 2017-10-23 04:21, Steve D'Aprano wrote:
>>> On Mon, 23 Oct 2017 02:29 pm, Stefan Ram wrote:
>>>
>>> If the probability of certain codes (either single codes, or
>>> sequences of codes) are non-equal, then you can take advantage of
>>> that by encoding the common cases into a short representation, and
>>> the uncommon and rare cases into a longer representation. As you say:
>>>
>>>> Otherwise, if ( 0, 0 ) is much more frequent, we can encode ( 0, 0 )
>>>> by "0" and ( 0, 1 ) by "101", ( 1, 0 ) by "110", and ( 1, 1 ) by
>>>> "111". And we could then use /less/ than two bits on the average.
>>>
>>> That's incorrect. On average you use 2.5 bits.
>>>
>>> (1*1 bit + 3*3 bits divided by four possible outcomes, makes 2.5
>>> bits.)
>>
>> I disagree. If the distribution is not equal, then the average needs
>> to take the different probabilities into account.
>
> I think I would call that the *weighted* average rather than the
> average.

If there are 4 guys who are 180 cm tall and one who is 2 metres tall,
would you say their average height is 184 cm ((180 * 4 + 200 * 1) / (4 +
1)) or 190 cm ((180 + 200) / 2)? For me that's the same situation.

> Regardless of what we call it, of course both you and Stefan are right
> in how to calculate it, and such a variable-length scheme can be used
> to compress the data.
>
> E.g. given the sequence 00000011 which would take 8 bits in the obvious
> encoding, we can encode it as "000111" which takes only 6 bits.
>
> But the cost of this encoding scheme is that *some* bit sequences
> expand, e.g. the 8 bit sequence 11111100 is encoded as "1111111110"
> which requires 10 bits.

Right. This is most obvious in Huffman encoding, where each symbol is
replaced by a sequence of bits which is directly related to the frequency
of that symbol. So the letter 'e' might be encoded in 3 or 4 bits (in a
corpus of text I happen to have lying around, about 1 in 11 characters is
an 'e', and ld(11) = 3.46) while the tilde (about 1 character in 21000)
would be encoded in 14 or 15 bits.

That's basically how all lossless compression algorithms work: Replacing
more frequent bit sequences with shorter sequences and replacing less
frequent sequences with longer ones. (Although most operate on variable
length byte sequences, not individual bytes. I find the LZW algorithm (as
used in Unix compress and GIF images) a particularly instructive
example.)

> The end result is that averaged over all possible bit sequences (of a
> certain size), this encoding scheme requires MORE space than the
> obvious 0/1 bits.
>
> But in practice we don't care much, because the data sequences we care
> about are usually not "all possible bit sequences", but a heavily
> restricted subset where there are lots of 00 pairs and fewer 01, 10,
> and 11 pairs.

Right.

hp

-- Peter J. Holzer | http://www.hjp.at/
Re: Compression of random binary data
On 2017-10-25 23:22, danceswithnumb...@gmail.com wrote:
> With every transform the entropy changes,

That's only true if the "transform" loses or adds information.

If it loses information, that's lossy compression, which is only useful
in very specific (but also extremely common) circumstances.

If it adds information, that's poetry, not compression.

--
Thomas Jollans
Re: Compression of random binary data
On Thu, 26 Oct 2017 08:22 am, danceswithnumb...@gmail.com wrote:
> with each pass you can compress until the entropy is so random it can
> no longer be compressed.

Which is another way of saying that you cannot compress random binary
data.

-- Steve
Re: Compression of random binary data
So if the theoretical min compression limit (log2(n)*(x)) has a 3%
margin, but your transform has a less than 3% inflate rate at most, then
there is room for the transform to compress below the theoretical min.

With every transform the entropy changes; the potential for greater
compression also changes. With each pass you can compress until the
entropy is so random it can no longer be compressed.
Re: Compression of random binary data
> Whatever you do, you'll find that *on average* you will need *at least*
> 34 bits to be able to represent all possible 10-digit decimal numbers.
> Some might be shorter, but then others will be longer, and the average
> won't be less than 34.

The theoretical limit for arbitrary numbers 0 - 9 must be viewed as an
average, not as an impossibility to, in some cases, be able to compress
to or under 34. This is obvious by the decimal to binary function.
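For reference, the 34-bit figure being quoted comes straight from
counting: there are 10**10 ten-digit decimal strings, so a fixed-length
binary code covering all of them needs ceil(log2(10**10)) bits. A
one-line check:

import math

print(math.log2(10 ** 10))             # ~33.219 bits on average
print(math.ceil(math.log2(10 ** 10)))  # 34 bits for a fixed-length code

Some individual numbers have shorter binary forms, but any scheme that
shortens them must lengthen others; the average cannot drop below 33.22
bits.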
Re: Compression of random binary data
On 10/24/17, Richard Damon wrote:
> My understanding of the 'Random Data Compressibility' challenge is that
> it requires that the compression take ANY/ALL strings of up to N bits,
> and generate an output stream no longer than the input stream, and
> sometimes less.

That's incorrect, at least of the challenge being discussed. Here's the
link:

http://marknelson.us/2012/10/09/the-random-compression-challenge-turns-ten/

The challenge is just to take a single known file of a million random
digits and make it smaller, including the size of the decompresser and
without hiding data. So far in 15 years nobody has succeeded even at
this, and nobody knows whether it's impossible. For instance it may be
the case that the data in the file happens to be the nth prime, in which
case it could simply be compressed to the number n with a decompresser
that calculates primes.

> It admits that given non-uniformly distributed data, it is possible to
> compress some patterns, the common ones, and expand others, the
> uncommon ones, to lower the net average length. What it says can't be
> done is to have a compression method that compresses EVERY input
> pattern. That is where the 'Pigeon Hole' principle comes into play,
> which the people who claim to be able to compress random data like to
> ignore or just attempt to say doesn't apply.

There is a second challenge on that page that is as you state, but the
page admits that this second challenge is a bit of a troll since this is
provably impossible.
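The specific target in that first challenge is itself instructive: a
million uniformly random decimal digits carry 10**6 * log2(10) bits of
information, which is (not coincidentally) the 415,241-byte size of the
challenge file mentioned elsewhere in the thread. A quick computation:

import math

bits = 1_000_000 * math.log2(10)
print(bits / 8)   # ~415241.0 bytes: the information-theoretic floor

So the file is already at its own entropy limit; shaving even a few bytes
off it, decompresser included, would be remarkable.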
Re: Compression of random binary data
Lele Gaifax wrote:
> That's simple enough: of course one empty file would be
> "music.mp3.zip.zip.zip", while the other would be
> "movie.avi.zip.zip.zip.zip.zip"... some sort of
> https://en.wikipedia.org/wiki/Water_memory applied to file system
> entries :-)

If you're allowed to alternate between two compression methods, then the
way you decompress music.mp3.zip.zip.tgz.zip...tgz.zip.tgz is to output 0
each time zip was applied and 1 each time tar/gz was applied.

You may be able to take some shortcuts in some cases, e.g. anything
beginning with "movie.flv" almost certainly contains a cute kitten video.
(Unless it's found inside an encrypted disk partition, in which case it
contains porn.)

-- Greg
Re: Compression of random binary data
Ben Bacarisse wrote:
> The trouble is a pedagogic one. Saying "you can't compress random data"
> inevitably leads (though, again, this is just my experience) to endless
> attempts to define random data.

It's more about using terms without making sure everyone agrees on the
definitions being used.

In this context, "random data" really means "uniformly distributed
data", i.e. any bit sequence is equally likely to be presented as input.
*That's* what information theory says can't be compressed.

> I think "arbitrary data" (thereby including the results of compression
> by said algorithm) is the best way to make progress.

I'm not sure that's much better, because it doesn't home in on the most
important thing, which is the probability distribution.

-- Greg
Re: Compression of random binary data
Steve D'Aprano wrote:
> - Encrypted data looks very much like random noise.

There's actually a practical use for that idea. If you can feed the
output of an encryption algorithm through a compressor and make it
smaller, it means there is a cryptographic weakness in the algorithm that
could potentially be exploited. Good encryption algorithms produce output
that looks almost completely random to anyone who doesn't know how to
decrypt it.

-- Greg
Re: Compression of random binary data
On 10/24/17 6:30 PM, Steve D'Aprano wrote:
> On Wed, 25 Oct 2017 07:09 am, Peter J. Holzer wrote:
>> On 2017-10-23 04:21, Steve D'Aprano wrote:
>>> On Mon, 23 Oct 2017 02:29 pm, Stefan Ram wrote:
>>>
>>> If the probability of certain codes (either single codes, or
>>> sequences of codes) are non-equal, then you can take advantage of
>>> that by encoding the common cases into a short representation, and
>>> the uncommon and rare cases into a longer representation. As you say:
>>>
>>>> Otherwise, if ( 0, 0 ) is much more frequent, we can encode ( 0, 0 )
>>>> by "0" and ( 0, 1 ) by "101", ( 1, 0 ) by "110", and ( 1, 1 ) by
>>>> "111". And we could then use /less/ than two bits on the average.
>>>
>>> That's incorrect. On average you use 2.5 bits.
>>>
>>> (1*1 bit + 3*3 bits divided by four possible outcomes, makes 2.5
>>> bits.)
>>
>> I disagree. If the distribution is not equal, then the average needs
>> to take the different probabilities into account.
>
> I think I would call that the *weighted* average rather than the
> average.
>
> Regardless of what we call it, of course both you and Stefan are right
> in how to calculate it, and such a variable-length scheme can be used
> to compress the data.
>
> E.g. given the sequence 00000011 which would take 8 bits in the obvious
> encoding, we can encode it as "000111" which takes only 6 bits.
>
> But the cost of this encoding scheme is that *some* bit sequences
> expand, e.g. the 8 bit sequence 11111100 is encoded as "1111111110"
> which requires 10 bits.
>
> The end result is that averaged over all possible bit sequences (of a
> certain size), this encoding scheme requires MORE space than the
> obvious 0/1 bits.
>
> But in practice we don't care much, because the data sequences we care
> about are usually not "all possible bit sequences", but a heavily
> restricted subset where there are lots of 00 pairs and fewer 01, 10,
> and 11 pairs.

My understanding of the 'Random Data Compressibility' challenge is that
it requires that the compression take ANY/ALL strings of up to N bits,
and generate an output stream no longer than the input stream, and
sometimes less. It admits that given non-uniformly distributed data, it
is possible to compress some patterns, the common ones, and expand
others, the uncommon ones, to lower the net average length. What it says
can't be done is to have a compression method that compresses EVERY input
pattern. That is where the 'Pigeon Hole' principle comes into play, which
the people who claim to be able to compress random data like to ignore or
just attempt to say doesn't apply.
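The pigeonhole argument is easy to make concrete by counting: there are
2**n bit strings of exactly n bits, but only 2**n - 1 strings of fewer
than n bits (empty string included), so no lossless (injective) encoder
can map every n-bit input to a shorter output. A tiny sketch, with
hypothetical names:

def strings_shorter_than(n):
    """Number of bit strings of length 0 .. n-1, empty string included."""
    return sum(2 ** k for k in range(n))  # == 2**n - 1

n = 8
inputs = 2 ** n                    # 256 possible 8-bit inputs
outputs = strings_shorter_than(n)  # 255 possible shorter outputs
print(inputs, outputs)             # 256 > 255: at least one input
                                   # cannot shrink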
Re: Compression of random binary data
On Wed, Oct 25, 2017 at 9:11 AM, Steve D'Aprano wrote:
> On Wed, 25 Oct 2017 02:40 am, Lele Gaifax wrote:
>> Steve D'Aprano writes:
>>> But given an empty file, how do you distinguish the empty file you
>>> get from 'music.mp3' and the identical empty file you get from
>>> 'movie.avi'?
>>
>> That's simple enough: of course one empty file would be
>> "music.mp3.zip.zip.zip", while the other would be
>> "movie.avi.zip.zip.zip.zip.zip"... some sort of
>> https://en.wikipedia.org/wiki/Water_memory applied to file system
>> entries :-)
>
> Does that mean if I name an empty file
>
> serenity2-by-joss-whedon.avi.zip.zip.zip.zip.zip
>
> Dancerswithnumbers' magic algorithm will recreate the movie from some
> alternative universe where it actually exists?
>
> Awesome.

Yes, but then you'd get

dmca-takedown-request.pdf.zip.zip.zip.zip.zip.zip.zip

which would also be empty.

ChrisA
Re: Compression of random binary data
On Wed, 25 Oct 2017 07:09 am, Peter J. Holzer wrote:
> On 2017-10-23 04:21, Steve D'Aprano wrote:
>> On Mon, 23 Oct 2017 02:29 pm, Stefan Ram wrote:
>>
>> If the probability of certain codes (either single codes, or sequences
>> of codes) are non-equal, then you can take advantage of that by
>> encoding the common cases into a short representation, and the
>> uncommon and rare cases into a longer representation. As you say:
>>
>>> Otherwise, if ( 0, 0 ) is much more frequent, we can encode ( 0, 0 )
>>> by "0" and
>>>
>>> ( 0, 1 ) by "101",
>>> ( 1, 0 ) by "110", and
>>> ( 1, 1 ) by "111".
>>>
>>> And we could then use /less/ than two bits on the average.
>>
>> That's incorrect. On average you use 2.5 bits.
>>
>> (1*1 bit + 3*3 bits divided by four possible outcomes, makes 2.5
>> bits.)
>
> I disagree. If the distribution is not equal, then the average needs to
> take the different probabilities into account.

I think I would call that the *weighted* average rather than the average.

Regardless of what we call it, of course both you and Stefan are right in
how to calculate it, and such a variable-length scheme can be used to
compress the data.

E.g. given the sequence 00000011 which would take 8 bits in the obvious
encoding, we can encode it as "000111" which takes only 6 bits.

But the cost of this encoding scheme is that *some* bit sequences expand,
e.g. the 8 bit sequence 11111100 is encoded as "1111111110" which
requires 10 bits.

The end result is that averaged over all possible bit sequences (of a
certain size), this encoding scheme requires MORE space than the obvious
0/1 bits.

But in practice we don't care much, because the data sequences we care
about are usually not "all possible bit sequences", but a heavily
restricted subset where there are lots of 00 pairs and fewer 01, 10, and
11 pairs.

-- Steve
Re: Compression of random binary data
On Wed, 25 Oct 2017 02:40 am, Lele Gaifax wrote:
> Steve D'Aprano writes:
>> But given an empty file, how do you distinguish the empty file you get
>> from 'music.mp3' and the identical empty file you get from
>> 'movie.avi'?
>
> That's simple enough: of course one empty file would be
> "music.mp3.zip.zip.zip", while the other would be
> "movie.avi.zip.zip.zip.zip.zip"... some sort of
> https://en.wikipedia.org/wiki/Water_memory applied to file system
> entries :-)

Does that mean if I name an empty file

serenity2-by-joss-whedon.avi.zip.zip.zip.zip.zip

Dancerswithnumbers' magic algorithm will recreate the movie from some
alternative universe where it actually exists?

Awesome.

-- Steve
Re: Compression of random binary data
On Tue, Oct 24, 2017 at 12:20 AM, Gregory Ewing wrote:
> danceswithnumb...@gmail.com wrote:
>> I did that quite a while ago. 352,954 kb.
>
> Are you sure? Does that include the size of all the code, lookup
> tables, etc. needed to decompress it?

My bet is that danceswithnumbers does indeed have a file of that size
which is in some way derived from the million random digits, but without
any means of losslessly "decompressing" it (thus making it junk data).
Re: Compression of random binary data
On Tue, 24 Oct 2017 14:51:37 +1100, Steve D'Aprano wrote:
> On Tue, 24 Oct 2017 01:27 pm, danceswithnumb...@gmail.com wrote:
>> Yes! Decode reverse is easy.. sorry so excited i could shout.
>
> Then this should be easy for you:
>
> http://marknelson.us/2012/10/09/the-random-compression-challenge-turns-ten/
>
> All you need to do is compress this file:
>
> http://marknelson.us/attachments/million-digit-challenge/AMillionRandomDigits.bin
>
> to less than 415241 bytes, and you can win $100.

Then, on Mon, 23 Oct 2017 21:13:00 -0700 (PDT), danceswithnumbers wrote:
> I did that quite a while ago.

But 352,954 kb > 415241 bytes, by several orders of magnitude; so you
didn't "do that". (Or are we using the European decimal point?)

If you're claiming 352,954 *bytes*, not kb, I invite you to explain why
you have not collected Mark Nelson's $100 prize, and untold fame and
glory; failing which, your credibility will evaporate.

--
To email me, substitute nowhere->runbox, invalid->com.
Re: Compression of random binary data
On 2017-10-23 04:21, Steve D'Aprano wrote:
> On Mon, 23 Oct 2017 02:29 pm, Stefan Ram wrote:
>
> If the probability of certain codes (either single codes, or sequences
> of codes) are non-equal, then you can take advantage of that by
> encoding the common cases into a short representation, and the uncommon
> and rare cases into a longer representation. As you say:
>
>> Otherwise, if ( 0, 0 ) is much more frequent, we can encode ( 0, 0 )
>> by "0" and
>>
>> ( 0, 1 ) by "101",
>> ( 1, 0 ) by "110", and
>> ( 1, 1 ) by "111".
>>
>> And we could then use /less/ than two bits on the average.
>
> That's incorrect. On average you use 2.5 bits.
>
> (1*1 bit + 3*3 bits divided by four possible outcomes, makes 2.5 bits.)

I disagree. If the distribution is not equal, then the average needs to
take the different probabilities into account.

Let's assume that (0, 0) has a probability of 75 %, (0, 1) a probability
of 15 % and (1, 0) and (1, 1) a probability of 5 % each. Then the average
length is

0.75 * 1 bit + 0.15 * 3 bits + 0.05 * 3 bits + 0.05 * 3 bits = 1.5 bits.

hp

-- Peter J. Holzer | http://www.hjp.at/
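The weighted average is mechanical once the code lengths and a
distribution summing to 1 are fixed; a short sketch using the scheme
above (the 1-bit code for the common pair, 3-bit codes for the rest):

def expected_code_length(table):
    """table maps symbol -> (probability, code length in bits)."""
    return sum(p * bits for p, bits in table.values())

table = {
    (0, 0): (0.75, 1),  # "0"
    (0, 1): (0.15, 3),  # "101"
    (1, 0): (0.05, 3),  # "110"
    (1, 1): (0.05, 3),  # "111"
}
print(expected_code_length(table))  # 1.5 bits per pair, vs 2.0 plain

The same function with a uniform distribution (0.25 each) returns the 2.5
bits of the unweighted calculation, which is exactly the disagreement
being hashed out above.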
Re: Compression of random binary data
On 24/10/2017 16:40, Lele Gaifax wrote:
> Steve D'Aprano writes:
>> But given an empty file, how do you distinguish the empty file you get
>> from 'music.mp3' and the identical empty file you get from
>> 'movie.avi'?
>
> That's simple enough: of course one empty file would be
> "music.mp3.zip.zip.zip", while the other would be

I swear this looks like the lyrics of something or another... "Music MP3
- zip - zip - zip"

TJG
Re: Compression of random binary data
Steve D'Aprano writes:
> But given an empty file, how do you distinguish the empty file you get
> from 'music.mp3' and the identical empty file you get from 'movie.avi'?

That's simple enough: of course one empty file would be
"music.mp3.zip.zip.zip", while the other would be
"movie.avi.zip.zip.zip.zip.zip"... some sort of
https://en.wikipedia.org/wiki/Water_memory applied to file system entries
:-)

ciao, lele.

--
nickname: Lele Gaifax | "When I live off what I thought yesterday,
real: Emanuele Gaifas | I will start fearing those who copy me."
l...@metapensiero.it  | -- Fortunato Depero, 1929.
Re: Compression of random binary data
Steve D'Aprano writes:
> On Tue, 24 Oct 2017 06:46 pm, danceswithnumb...@gmail.com wrote:
>> Greg, you're very smart, but you are missing a big key. I'm not
>> padding, you are still thinking inside the box, and will never solve
>> this by doing so. Yes! At least you see my accomplishment, this will
>> compress any random file.
>
> Talk is cheap.

But highly prized. Most Usenet cranks only want to be talked to (they see
it as being taken seriously, no matter how rude the respondents are) so
for the cost of something cheap (a little babbling) they get an endless
stream of highly prized attention.

-- Ben.
Re: Compression of random binary data
Steve D'Aprano writes:
> On Tue, 24 Oct 2017 09:23 pm, Ben Bacarisse wrote:
>> Forget random data. For one thing it's hard to define,
>
> That bit is true.
>
>> but more importantly no one cares about it.
>
> But that's wrong.

All generalisations are false. I was being hyperbolic.

> For instance:
>
> - Encrypted data looks very much like random noise. With more and more
>   data traversing the internet in encrypted form, the ability to
>   compress random noise would be worth billions.
>
> - Compressed data looks somewhat like random noise (with a bit of
>   structure). The more it is compressed, the more random it looks. If
>   you could compress random noise, you could take already compressed
>   data, and compress it again, saving even more space.
>
> - Many multimedia formats (images, sound, video) are compressed using
>   dedicated encoders. The better the encoder, the more it compresses
>   the data (whether lossy or not), the harder it is to compress it
>   further. If you could compress random noise, you could compress JPGs,
>   MP3s, h265-encoded MKVs, etc, saving even more storage and
>   transmission costs.

But these are not random data. We care about these because they are
highly structured, non-random data.

> And most importantly:
>
> - Random data is a superset of the arbitrary structured data you
>   mention below. If we could compress random data, then we could
>   compress any data at all, no matter how much or little structure it
>   contained.

Yes, that's part of my point. Arbitrary data includes random data but it
avoids arguments about what random means.

> This is why the ability to compress random data (if it were possible,
> which it is not) is interesting. It's not because people want to be
> able to compress last night's lottery numbers, or tables of random
> digits.

The trouble is a pedagogic one. Saying "you can't compress random data"
inevitably leads (though, again, this is just my experience) to endless
attempts to define random data. My preferred way out of that is to talk
about algorithmic complexity but for your average "I've got a perfect
compression algorithm" poster, that is a step too far.

I think "arbitrary data" (thereby including the results of compression by
said algorithm) is the best way to make progress.

>> Then you publish in a major journal. Post the link to the journal
>> article when you are done.
>
> These days there are plenty of predatory journals which will be happy
> to take Dancerswithnumber's money in return for publishing it in a junk
> journal.

Sure, but you usually get a huge advantage -- a text to criticise. Your
average Usenet crank will keep changing what they say to avoid being
pinned down. Plus you get to note the fact that the journal is junk.

-- Ben.
Re: Compression of random binary data
On Tue, 24 Oct 2017 06:46 pm, danceswithnumb...@gmail.com wrote:
> Greg, you're very smart, but you are missing a big key. I'm not
> padding, you are still thinking inside the box, and will never solve
> this by doing so. Yes! At least you see my accomplishment, this will
> compress any random file.

Talk is cheap.

-- Steve
Re: Compression of random binary data
On Tue, 24 Oct 2017 09:23 pm, Ben Bacarisse wrote: > Forget random data. For one thing it's hard to define, That bit is true. > but more importantly no one cares about it. But that's wrong. For instance: - Encrypted data looks very much like random noise. With more and more data traversing the internet in encrypted form, the ability to compress random noise would be worth billions. - Compressed data looks somewhat like random noise (with a bit of structure). The more it is compressed, the more random it looks. If you could compress random noise, you could take already compressed data, and compress it again, saving even more space. - Many multimedia formats (images, sound, video) are compressed using dedicated encoders. The better the encoder, the more it compresses the data (whether lossy or not) the harder it is to compress it further. If you could compress random noise, you could compress JPGs, MP3s, h265-encoded MKVs, etc, saving even more storage and transmission costs. And most importantly: - Random data is a superset of the arbitrary structured data you mention below. If we could compress random data, then we could compress any data at all, no matter how much or little structure it contained. This is why the ability to compress random data (if it were possible, which it is not) is interesting. Its not because people want to be able to compress last night's lottery numbers, or tables of random digits. > By its very nature, random data is > not interesting. What people want is a reversible compression algorithm > that works on *arbitrary data* -- i.e. on *any* file at all, no matter > how structured and *non-random* it is. In a sense you are right. Compressing randomly generated data would be a parlour trick and not specifically very useful. But if you had such an algorithm, that would change the face of the world. It would be as revolutionary and paradigm breaking as a perpetual motion machine, or discovery of a new continent the size of China in the middle of the Atlantic, or that π actually does equal 22/7 exactly. And just as impossible. > For example, run the complete works of Shakespeare through your program. > The result is very much not random data, but that's the sort of data > people want to compress. If you can compress the output of your > compressor you have made a good start. Of course what you really want > to be able to do is to compress the output that results from compressing > your compressed out. And, of course, you should not stop there. Since > you can compress *any* data (not just the boring random stuff) you can > keep going -- compressing the compressed output again and again until > you end up with a zero-length file. Indeed. That proof by contradiction is yet another reason we know we can't compress random data -- that is to say, *arbitrary* data. If we had a compression program which could guarantee to ALWAYS shrink ANY file by at least one bit, then we could just apply it over and over again, shrinking the compressed file again and again, until we were left with a zero-byte file: original.dat = 110 MB original.dat.zip.zip.zip.zip.zip.zip.zip = 0 MB And then reverse the process, to turn an empty file back into the original. But given an empty file, how do you distinguish the empty file you get from 'music.mp3' and the identical empty file you get from 'movie.avi'? Obviously you cannot. So we know that the only way to *guarantee* to shrink every possible file is if the compression is lossy. > Then you publish in a major journal. 
> Post the link to the journal article when you are done.

These days there are plenty of predatory journals which will be happy to take Danceswithnumbers' money in return for publishing it in a junk journal.

https://en.wikipedia.org/wiki/Predatory_open_access_publishing

-- Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure enough, things got worse.
-- https://mail.python.org/mailman/listinfo/python-list
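Steve's observation that compressed and encrypted data resemble random noise is easy to check with nothing but the standard library. A minimal sketch (mine, for illustration; zlib stands in for any real compressor, and os.urandom for "random or encrypted" data):

    import os
    import zlib

    data = os.urandom(100_000)       # stand-in for random (or encrypted) data
    once = zlib.compress(data, 9)
    twice = zlib.compress(once, 9)   # "compress the compressed output"
    print(len(data), len(once), len(twice))
    # Typically prints something like: 100000 100017 100035
    # -- each pass makes the data slightly *bigger*, never smaller.

The exact byte counts vary with the input, but the direction never does: already-incompressible input picks up a few bytes of framing overhead on every pass.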
Re: Compression of random binary data
On 24 October 2017 at 12:04, Ben Bacarisse wrote:
> Paul Moore writes:
>> On 24 October 2017 at 11:23, Ben Bacarisse wrote:
>>> For example, run the complete works of Shakespeare through your program. The result is very much not random data, but that's the sort of data people want to compress. If you can compress the output of your compressor you have made a good start. Of course what you really want to be able to do is to compress the output that results from compressing your compressed output. And, of course, you should not stop there. Since you can compress *any* data (not just the boring random stuff) you can keep going -- compressing the compressed output again and again until you end up with a zero-length file.
>>
>> Oh, and just for fun, if you are able to guarantee compressing arbitrary data, then
>
> It's a small point, but you are replying to a post of mine and saying "you". That could make people think that /I/ am claiming to have a perfect compression algorithm.

Sorry. I intended the meaning "If one is able to..." but I was unclear. My bad.

>> 1. Take a document you want to compress.
>> 2. Compress it using your magic algorithm. The result is smaller.
>> 3. Compress the compressed data. The result is still smaller.
>> 4. Repeat until you hit 0 bytes.
>
> Isn't this just repeating what I said? I must not have written it clearly enough.

More accurately, I didn't read it carefully enough. Again sorry.

However, I guess it serves as an example of a compression algorithm - we can trivially compress the content of our two posts into a single post with just as much information content, by deleting my post :-)

Paul
-- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
Paul Moore writes:
> On 24 October 2017 at 11:23, Ben Bacarisse wrote:
>> For example, run the complete works of Shakespeare through your program. The result is very much not random data, but that's the sort of data people want to compress. If you can compress the output of your compressor you have made a good start. Of course what you really want to be able to do is to compress the output that results from compressing your compressed output. And, of course, you should not stop there. Since you can compress *any* data (not just the boring random stuff) you can keep going -- compressing the compressed output again and again until you end up with a zero-length file.
>
> Oh, and just for fun, if you are able to guarantee compressing arbitrary data, then

It's a small point, but you are replying to a post of mine and saying "you". That could make people think that /I/ am claiming to have a perfect compression algorithm.

> 1. Take a document you want to compress.
> 2. Compress it using your magic algorithm. The result is smaller.
> 3. Compress the compressed data. The result is still smaller.
> 4. Repeat until you hit 0 bytes.

Isn't this just repeating what I said? I must not have written it clearly enough.

-- Ben.
-- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On 24 October 2017 at 11:23, Ben Bacarisse wrote:
> For example, run the complete works of Shakespeare through your program. The result is very much not random data, but that's the sort of data people want to compress. If you can compress the output of your compressor you have made a good start. Of course what you really want to be able to do is to compress the output that results from compressing your compressed output. And, of course, you should not stop there. Since you can compress *any* data (not just the boring random stuff) you can keep going -- compressing the compressed output again and again until you end up with a zero-length file.

Oh, and just for fun, if you are able to guarantee compressing arbitrary data, then

1. Take a document you want to compress.
2. Compress it using your magic algorithm. The result is smaller.
3. Compress the compressed data. The result is still smaller.
4. Repeat until you hit 0 bytes.

Congratulations - apparently you have a reversible algorithm that compresses every data set to an empty file. (Caveat - there's actually "hidden data" here, as you need to know how many compressions it takes to hit 0 bytes. Because you decrease the size every time, though, that number must be no greater than the size of the original file).

Paul
-- https://mail.python.org/mailman/listinfo/python-list
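Here is what Paul's loop actually does when you try it with a real compressor -- a sketch of mine, using a deliberately repetitive input so the first pass has something to remove:

    import zlib

    data = b"To be, or not to be, that is the question. " * 2000
    for i in range(5):
        out = zlib.compress(data, 9)
        print(f"pass {i + 1}: {len(data):6d} -> {len(out):6d} bytes")
        data = out
    # Pass 1 shrinks ~88 KB of repetitive text to a few hundred bytes.
    # Every later pass makes the data slightly *bigger*: the output of
    # pass 1 is already close to random, so there is nothing left to remove.

The fixed point is reached from the other direction than the magic algorithm would require: instead of marching down to 0 bytes, the sizes bottom out after one pass and then creep upward.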
Re: Compression of random binary data
On Tue, 24 Oct 2017 05:20 pm, Gregory Ewing wrote:
> danceswithnumb...@gmail.com wrote:
>> I did that quite a while ago. 352,954 kb.
>
> Are you sure? Does that include the size of all the code, lookup tables, etc. needed to decompress it?
>
> But even if you have, you haven't disproved the theorem about compressing random data. All you have is a program that compresses *that particular* sequence of a million digits.
>
> To disprove the theorem, you would need to exhibit an algorithm that can compress *any* sequence of a million digits to less than 415,241 bytes.

Indeed -- but let's give Danceswithnumbers his due. *IF* he is right (a very big "if" indeed) about being able to compress the Rand Corporation "Million Random Digits" in binary form, as given, that alone would be an impressive trick.

Compressing the digits in text form is not impressive in the least. As Ben Bacarisse pointed out, most of us will probably already have half a dozen programs that do that.

-- Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure enough, things got worse.
-- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
danceswithnumb...@gmail.com writes:
> Finally figured out how to turn this into a random binary compression program. Since my transform can compress more than dec to binary. Then i took a random binary stream,

Forget random data. For one thing it's hard to define, but more importantly no one cares about it. By its very nature, random data is not interesting. What people want is a reversible compression algorithm that works on *arbitrary data* -- i.e. on *any* file at all, no matter how structured and *non-random* it is.

For example, run the complete works of Shakespeare through your program. The result is very much not random data, but that's the sort of data people want to compress. If you can compress the output of your compressor you have made a good start. Of course what you really want to be able to do is to compress the output that results from compressing your compressed output. And, of course, you should not stop there. Since you can compress *any* data (not just the boring random stuff) you can keep going -- compressing the compressed output again and again until you end up with a zero-length file.

Then you publish in a major journal. Post the link to the journal article when you are done.

-- Ben.
-- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On 24 October 2017 at 09:43, Gregory Ewing wrote: > Paul Moore wrote: >> >> But that's not "compression", that's simply using a better encoding. >> In the technical sense, "compression" is about looking at redundancies >> that go beyond the case of how effectively you pack data into the >> bytes available. > > > There may be a difference in the way the terms are used, but > I don't think there's any fundamental difference. Compression > is about finding clever ways to make the encoding better. Agreed - I was trying (probably futilely, given the way this thread has gone...) to make a distinction between purely local properties that are typically considered in "how you encode the data" and the detection of more global patterns, which is where what are typically referred to as "compression" algorithms get their power. But sadly, I don't think the OP is actually interested in understanding the background, so the distinction wasn't really worth making :-( > Either way, the information-theoretic limits on the number > of bits needed are the same. Precisely. Paul -- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
Paul Moore wrote: But that's not "compression", that's simply using a better encoding. In the technical sense, "compression" is about looking at redundancies that go beyond the case of how effectively you pack data into the bytes available. There may be a difference in the way the terms are used, but I don't think there's any fundamental difference. Compression is about finding clever ways to make the encoding better. Either way, the information-theoretic limits on the number of bits needed are the same. -- Greg -- https://mail.python.org/mailman/listinfo/python-list
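Greg's point that the information-theoretic limits are the same either way can be made concrete. A Huffman code for ten equally likely digits -- a "clever encoding" in Paul's sense -- lands almost exactly on the entropy floor. A quick sketch (my numbers, not from the thread):

    import math

    # Huffman tree for 10 equally likely symbols: six codes of 3 bits,
    # four codes of 4 bits (6/8 + 4/16 = 1 satisfies Kraft's inequality).
    lengths = [3] * 6 + [4] * 4
    print(sum(lengths) / 10)   # 3.4  bits per digit, on average
    print(math.log2(10))       # 3.3219... bits per digit, the floor

No amount of cleverness in the encoding gets below log2(10) bits per uniformly random digit; compression algorithms win only by exploiting non-uniformity or inter-symbol structure.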
Re: Compression of random binary data
danceswithnumb...@gmail.com wrote: My 8 year old can decode this back into base 10, Keep in mind that your 8 year old has more information than just the 32 bits you wrote down -- he can also see that there *are* 32 bits and no more. That's hidden information that you're not counting. -- Greg -- https://mail.python.org/mailman/listinfo/python-list
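Greg's "hidden information" is easy to demonstrate: strip away knowledge of the length and a decoder can no longer tell two different messages apart. A tiny sketch (mine, not from the thread):

    a = "00101"   # one 5-bit message
    b = "101"     # a different 3-bit message
    print(int(a, 2), int(b, 2))   # 5 5 -- identical by value alone!
    # The lengths (5 bits vs 3 bits) are the side information the
    # "8 year old" silently uses when he can *see* how many bits
    # were written down.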
Re: Compression of random binary data
No leading zeroes are being dropped off... wish this board had an edit button.
-- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On 23.10.17 at 12:13, Marko Rauhamaa wrote:
> Thomas Jollans :
>> On 2017-10-23 11:32, danceswithnumb...@gmail.com wrote:
>>> According to this website. This is an uncompressable stream.
>>>
>>> https://en.m.wikipedia.org/wiki/Incompressible_string
>>>
>>> 1234999988884321
>>
>> No, it's not. According to that article, that string is incompressible by a particular algorithm. I can see no more general claims.
>
> Here's a compression algorithm that manages to compress that string into a 0-bit string:
>
> * If the original string is 1234999988884321 (whatever that means), return the empty bit string.
> * Otherwise, prepend a don't-care bit to the original string and return the result of the concatenation.

...and that's why there is the "Kolmogorov complexity". You need to append the decompression program to the data to show how much you really saved, which will turn out to be nothing compared to the "trivial decompressor"

    print "1234999988884321"

Christian
-- https://mail.python.org/mailman/listinfo/python-list
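Christian's point in runnable form -- Marko's special-case scheme works, but only because the whole string now lives inside the decompressor (a sketch, not from the thread):

    KNOWN = "1234999988884321"

    def compress(msg: str) -> str:
        # Marko's algorithm: the known string maps to nothing at all.
        return "" if msg == KNOWN else "x" + msg   # "x" is the don't-care symbol

    def decompress(code: str) -> str:
        return KNOWN if code == "" else code[1:]

    assert decompress(compress(KNOWN)) == KNOWN    # 16 digits -> 0 symbols!
    assert decompress(compress("777")) == "777"    # everything else grows by 1

Counting program plus output, Kolmogorov-style, the total got bigger, not smaller -- every other input now pays a one-symbol tax to subsidise the single favoured string.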
Re: Compression of random binary data
Greg, you're very smart, but you are missing a big key. I'm not padding, you are still thinking inside the box, and will never solve this by doing so. Yes! At least you see my accomplishment, this will compress any random file. -- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
danceswithnumb...@gmail.com wrote:
> Compress this:
>
> 4135124325
>
> Bin to dec...still very large
>
> 11110110
> 01111000
> 11111101
> 01100101

Wait right there! You're cheating by dropping off leading 0 bits. The maximum value of a 10 digit decimal number is 9999999999, which in hex is 2540be3ff. That's 34 bits. That's in line with the theoretical number of bits needed:

log2(10) * 10 = 33.219

So the binary version of your number above is really

00 11110110 01111000 11111101 01100101

You may think you can get away without storing or transmitting those leading 0 bits, because the decoder can always pad out the data as needed. But to do that, the decoder needs to know *how many* bits to pad out to. That information somehow needs to be part of the encoded data.

You need to imagine you're sending the data to someone over a wire. The only thing you can send along the wire are ones and zeroes. You can't convey any information by timing, or turning off the power, or anything like that. How is the receiver going to know when he's got the whole message?

There are only two possibilities. Either you decide in advance that all messages will be exactly the same length, which in this case means always sending exactly 34 bits. Or you add some extra bits to the message -- prepend the length in binary, or add an end-of-message code at the end, or something like that. Whatever you do, you'll find that *on average* you will need *at least* 34 bits to be able to represent all possible 10-digit decimal numbers. Some might be shorter, but then others will be longer, and the average won't be less than 34.

> New compression method:
>
> 11000101
> 11000111
> 0100
>
> A full byte less than bin.

You need to be *very* careful about what you're claiming here. Are you saying that your algorithm compresses *all possible* sequences of 10 decimal digits to 3 bytes or less? Or can some of them come out longer?

-- Greg
-- https://mail.python.org/mailman/listinfo/python-list
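Greg's fixed-length option, in code -- both ends agree in advance that every message is exactly 34 bits, so no length information travels on the wire (a sketch, not from the thread):

    def encode(n: int) -> str:
        assert 0 <= n < 10**10
        return format(n, "034b")    # fixed width: 34 bits covers 0..9,999,999,999

    def decode(bits: str) -> int:
        assert len(bits) == 34      # the length agreed on in advance
        return int(bits, 2)

    bits = encode(4135124325)
    print(bits)           # 0011110110011110001111110101100101
    print(decode(bits))   # 4135124325

The two leading zeros are not waste; they are what lets the receiver decode by position alone.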
Re: Compression of random binary data
Gregory Ewing : > What you *can't* do is compress 16 random decimal digits to less than > 6.64 bytes. More precisely: Regardless of the compression scheme, the probability of shortening the next bit sequence is less than 0.5 if the bits are distributed evenly, randomly and independently. Often the distribution is not even, or there is interdependence between the bits. Then, a compression scheme can do miracles. Marko -- https://mail.python.org/mailman/listinfo/python-list
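The counting behind Marko's bound, for anyone who wants to see it (a sketch, not from the thread) -- there simply aren't enough shorter bit strings to go around:

    n = 16
    print(2**n)                           # 65536 possible n-bit inputs
    print(sum(2**k for k in range(n)))    # 65535 strings shorter than n bits

Even a scheme that used *every* string of fewer than n bits as an output could shrink at most 65535 of the 65536 possible inputs, and a real (prefix-free, self-delimiting) code does considerably worse -- which is where the below-0.5 probability for evenly distributed, independent bits comes from.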
Re: Compression of random binary data
danceswithnumb...@gmail.com wrote: I did that quite a while ago. 352,954 kb. Are you sure? Does that include the size of all the code, lookup tables, etc. needed to decompress it? But even if you have, you haven't disproved the theorem about compressing random data. All you have is a program that compresses *that particular* sequence of a million digits. To disprove the theorem, you would need to exhibit an algorithm that can compress *any* sequence of a million digits to less than 415,241 bytes. -- Greg -- https://mail.python.org/mailman/listinfo/python-list
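Where Greg's 415,241-byte figure comes from is a one-liner (a quick check of mine, not from the thread):

    import math

    # Information content of a million random decimal digits, in bytes:
    print(1_000_000 * math.log2(10) / 8)   # 415241.01...

The binary form of the Rand million-digit file is 415,241 bytes -- already sitting on the theoretical floor, which is exactly why nobody has claimed the prize for shrinking it.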
Re: Compression of random binary data
danceswithnumb...@gmail.com wrote:
> 1234999988884321
>
> It only takes seven 8 bit bytes to represent this

This is not surprising. The theoretical minimum size for 16 arbitrary decimal digits is:

log2(10) * 16 = 53.15 bits = 6.64 bytes

I think you misunderstand what is meant by the phrase "random data cannot be compressed". A string of decimal digits represented in ASCII is *not* completely random. There are 256 possible values for each byte, and you're only using 10 of them.

What you *can't* do is compress 16 random decimal digits to less than 6.64 bytes. If you have something that you think can do better than that, we would like to see it.

-- Greg
-- https://mail.python.org/mailman/listinfo/python-list
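Both of Greg's numbers check out in an interpreter (a sketch, not from the thread):

    import math

    print(math.log2(10) * 16)                # 53.15... bits, the 6.64-byte floor
    print((1234999988884321).bit_length())   # 51 bits for this particular value
    print((10**16 - 1).bit_length())         # 54 bits in the worst case

Either way you need seven 8-bit bytes for a fixed-width binary form -- a whisker above the 6.64-byte average floor, and nothing resembling compression.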
Re: Compression of random binary data
On Tue, Oct 24, 2017 at 2:28 AM, Paul Moore wrote: > Hope this helps put the subject into context. Compression is a very > technical subject, to "do it right". Special cases can be worked out, > sure, but the "hidden assumptions" in a method are what make the > difference between a "compression algorithm" and a "way of storing my > particular data more efficiently". And to put *this* into context and perspective, both of the above are extremely useful tools/algorithms. Generalized compression algos are used all the time, special-purpose lossy compression algos too, and "way[s] of storing my particular data more efficiently" are utterly crucial to many applications. But they're not universal compression schemes. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On Tue, 24 Oct 2017 03:13 pm, danceswithnumb...@gmail.com wrote: > I did that quite a while ago. 352,954 kb. Sure you did. Let's see the code you used. -- Steve “Cheer up,” they said, “things could be worse.” So I cheered up, and sure enough, things got worse. -- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
I did that quite a while ago. 352,954 kb. -- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On Tue, 24 Oct 2017 01:27 pm, danceswithnumb...@gmail.com wrote: > Finally figured out how to turn this into a random binary compression > program. Since my transform can compress more than dec to binary. Then i > took a random binary stream, changed it to a decimal stream 0-9 tranformed > it into a compressed/encrypted binary stream 23.7% smaller. Smaller than the original binary stream? I don't think so. There's nothing clever about "compressing" a random binary stream if first you expand it, then remove the extra space you added and claim victory because the result is smaller than the expanded version. > Yes! Decode reverse is easy..sorry so excited i could shout. Then this should be easy for you: http://marknelson.us/2012/10/09/the-random-compression-challenge-turns-ten/ All you need to do is compress this file: http://marknelson.us/attachments/million-digit-challenge/AMillionRandomDigits.bin to less than 415241 bytes, and you can win $100. -- Steve “Cheer up,” they said, “things could be worse.” So I cheered up, and sure enough, things got worse. -- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
Finally figured out how to turn this into a random binary compression program. Since my transform can compress more than dec to binary, I then took a random binary stream, changed it to a decimal stream 0-9, transformed it into a compressed/encrypted binary stream 23.7% smaller. Yes! Decode reverse is easy... sorry, so excited I could shout. Jon Hutton
-- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On Mon, Oct 23, 2017 at 1:42 PM, wrote:
> Wow, do programmers actually use ZSCII? That is huge. So much wasted space.

Not really. ZSCII is only relevant if you're writing Z-code or a Z-code interpreter. Those in turn are only relevant if you're writing Infocom games.
-- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
Wow, do programmers actually use ZSCII? That is huge. So much wasted space.
-- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On 2017-10-23, alister via Python-list wrote:
>>>>>> 1234999988884321
>>>>>>
>>>>>> It only takes seven 8 bit bytes to represent this
>>>>>
>>>>> Would you care to provide the seven 8-bit bytes you propose to use?
>>>>> Paul
>>>>
>>>> I would suspect he is using BCD & storing 2 values in each byte. That is not what is meant by "you can't compress random data". His compression is simply removing redundant space from an inefficient coding.
>>>
>>> I suspect he is using ASCII and storing one value in each byte.
>>
>> There's also ZSCII, which stores roughly 3 characters every 2 bytes. Since all the digits are in A2, this sequence would take up 7 bytes in ZSCII as well.
>>
>> http://inform-fiction.org/zmachine/standards/z1point0/sect03.html
>
> not sure how 16 characters can be represented by either ascii or zscii in only 8 bytes

Oops! I hastily counted completely wrong. It's 12 bytes (six 16-bit words) in ZSCII version 2, using a shift-lock.

       *16 bits*
12  -> 0 00101 01001 01010
349 -> 0 01011 01100 10001
999 -> 0 10001 10001 10001
888 -> 0 10000 10000 10000
843 -> 0 10000 01100 01011
21  -> 1 01010 01001 00101

-- Neil Cerutti
-- https://mail.python.org/mailman/listinfo/python-list
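Neil's table can be reproduced in a few lines of Python. This is my reading of the spec he links (Z-character 5 is the version-2 shift-lock into alphabet A2, where digit d sits at Z-character 8 + d, and the top bit of a word marks the end of the string) -- a sketch, not from the thread:

    def zscii_words(digits: str):
        # Shift-lock into A2, then one 5-bit Z-character per digit.
        zchars = [5] + [8 + int(d) for d in digits]
        while len(zchars) % 3:
            zchars.append(5)                 # pad the final word
        words = []
        for i in range(0, len(zchars), 3):
            a, b, c = zchars[i:i + 3]
            end = 0x8000 if i + 3 == len(zchars) else 0   # end-of-string bit
            words.append(end | (a << 10) | (b << 5) | c)
        return words

    for w in zscii_words("1234999988884321"):
        print(format(w, "016b"))   # six 16-bit words = 12 bytes

Run it and the six words match Neil's table bit for bit.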
Re: Compression of random binary data
Good point. I hope it has a use other than a cute toy... I don't see it yet.
-- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On 2017-10-23 17:39, danceswithnumb...@gmail.com wrote:
> Thanks Paul... blunt to the point.
>
> My 8 year old can decode this back into base 10, I still have to help him a bit going from base 10 to 8 bit bytes... it's incredibly simple to decode. No dictionary, can easily be done with pencil and paper, does not rely on redundancies.

It is, is it? Well, you know the rules: "Code or it didn't happen."
-- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On 2017-10-23 07:39, Steve D'Aprano wrote: > By the way: here is a very clever trick for hiding information in the file > system: > > http://www.patrickcraig.co.uk/other/compression.php > > > but as people point out, the information in the file, plus the information in > the file system, ends up being *larger* than the original. Well done to > Patrick Craig for finding a clever loophole to "win" the challenge, but he > did so without actually compressing the original data. Thanks Steve that was very entertaining. -- Thomas Jollans -- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
Thanks Paul... blunt to the point.

My 8 year old can decode this back into base 10, I still have to help him a bit going from base 10 to 8 bit bytes... it's incredibly simple to decode. No dictionary, can easily be done with pencil and paper, does not rely on redundancies.

Jon Hutton
-- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
Just trying to find a practical application for this alg. Not really useful as it stands now. Jon Hutton
-- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On 23 October 2017 at 15:29, wrote:
> I'm really not trolling, and even though some are sarcastic i am learning from your comments.

I'm willing to believe that, but if you're trying to claim you have "compressed" data (in a way that satisfies the technical, information-theoretic meaning of the word), you need to be *really* careful not to include hidden information in your assumptions.

For example, if I have a string made up of only the numbers 0-7, then I can trivially (octal) store that data in 3 bits per digit. But that's not compression, as there's "hidden information" in the knowledge that you're using a restricted character set. Factor in that information and your string only contains 3 bits of information per digit.

Using bytes (characters, if you assume a 1-byte encoding) to hold just the digits 0-9 is inefficient (there's 256 bytes and you're only using 10 of them), and "of course" you can hold that data more efficiently. But that's not "compression", that's simply using a better encoding. In the technical sense, "compression" is about looking at redundancies that go beyond the case of how effectively you pack data into the bytes available.

> Dec to bin is not bad at removing wasted space

Yep, you're not talking about what people usually refer to as compression, but rather about optimising an encoding.

> , but there is a better way. Here is an example. How would you compress these numbers.

One digit out of 10 = log2(10) bits of information. So getting down to 4 bits per digit is about encoding. You can go further by using a variable-length encoding and "extra knowledge" about which digits come up most commonly to give the common digits shorter representations. That's called Huffman coding. You can use the fact that repeated copies of the same digit come up together a lot to replace them by digit + count. That's run-length encoding. There are other more complex approaches. But what *all* of these have in common is that if you provide random input (within the limits of what you support - digit strings here) then you'll *always* get at least one input that encodes to a longer output than your input.

> Compress this:
>
> 4135124325

10 digits = 10 x log2(10) bits of information = a bit under 34 bits of information

> Bin to dec...still very large
>
> 11110110
> 01111000
> 11111101
> 01100101

4x8 = 32 bits, but there's probably a couple of leading zeros needed if you want to encode all 10-digit numbers.

> New compression method:
>
> 11000101
> 11000111
> 0100
>
> A full byte less than bin.

You'll need to explain how to decode this, in a way that can be used to decode the encodings of arbitrary 10-digit numbers, and with any leading zeroes that are needed to cover the whole space of 10-digit numbers, before you can claim you've compressed 10-digit numbers using only 24 bits. And if you do that, you've basically overturned most of information theory, so I'd start by assuming there's a flaw in your argument - sorry about that... ;-)

Hope this helps put the subject into context. Compression is a very technical subject, to "do it right". Special cases can be worked out, sure, but the "hidden assumptions" in a method are what make the difference between a "compression algorithm" and a "way of storing my particular data more efficiently".

> I know many are skeptical... that's okay... this has taken 6 years, I'm not going to insult your intelligence with something so juvenile as dec to bin. I'm really trying to find an application for this since it only deals with digits 0-9 or 0-20 or other strange combinations.
> Wait did you just give it away in that small example... no, because it also encrypts in a way that has never been done. The compression is better than log(256)÷log(10)... wait, isn't that impossible, binary is minimalistic. I agree that binary is minimalistic, but the architecture is not, making all arguments conjecture... not laws.

No-one is going to accept a claim that an algorithm you're not willing to publish is valid. This is about maths/science, not "proprietary algorithms" or anything like that. If you don't publish your methods, people will simply point at information-theoretic proofs and say "either you're missing something, or your approach doesn't work in cases that I care about, so thanks but no thanks".

Paul
-- https://mail.python.org/mailman/listinfo/python-list
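Paul's "better encoding, not compression" point, made concrete (a sketch of mine, not from the thread). The sentinel digit is the price of the leading-zero problem discussed above:

    def pack(digits: str) -> bytes:
        # Prepend a sentinel "1" so leading zeros survive the trip through int().
        n = int("1" + digits)
        return n.to_bytes((n.bit_length() + 7) // 8, "big")

    def unpack(data: bytes) -> str:
        return str(int.from_bytes(data, "big"))[1:]

    packed = pack("4135124325")
    print(len(packed))                                   # 5 bytes, always
    assert unpack(packed) == "4135124325"
    assert unpack(pack("0004135124")) == "0004135124"    # leading zeros survive

Every 10-digit string packs to exactly 5 bytes and never fewer -- an encoding sitting on its floor, which is precisely why no input comes out "a full byte less" than every other.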
Re: Compression of random binary data
I'm really not trolling, and even though some are sarcastic i am learning from your comments.

Dec to bin is not bad at removing wasted space, but there is a better way. Here is an example. How would you compress these numbers. If you look for redundancy and then code to a bulky dictionary or change it to binary, one would inflate, the other would only get it down to...

Compress this:

4135124325

Bin to dec...still very large

11110110
01111000
11111101
01100101

New compression method:

11000101
11000111
0100

A full byte less than bin.

I know many are skeptical... that's okay... this has taken 6 years, I'm not going to insult your intelligence with something so juvenile as dec to bin. I'm really trying to find an application for this since it only deals with digits 0-9 or 0-20 or other strange combinations. Wait did you just give it away in that small example... no, because it also encrypts in a way that has never been done. The compression is better than log(256)÷log(10)... wait, isn't that impossible, binary is minimalistic. I agree that binary is minimalistic, but the architecture is not, making all arguments conjecture... not laws.
-- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On Mon, 23 Oct 2017 13:40:59 +0000, Neil Cerutti wrote:
> On 2017-10-23, Chris Angelico wrote:
>> On Mon, Oct 23, 2017 at 11:18 PM, alister via Python-list wrote:
>>> On Mon, 23 Oct 2017 10:41:55 +0100, Paul Moore wrote:
>>>> On 23 October 2017 at 10:32, wrote:
>>>>> According to this website. This is an uncompressable stream.
>>>>>
>>>>> https://en.m.wikipedia.org/wiki/Incompressible_string
>>>>>
>>>>> 1234999988884321
>>>>>
>>>>> It only takes seven 8 bit bytes to represent this
>>>>
>>>> Would you care to provide the seven 8-bit bytes you propose to use?
>>>> Paul
>>>
>>> I would suspect he is using BCD & storing 2 values in each byte. That is not what is meant by "you can't compress random data". His compression is simply removing redundant space from an inefficient coding.
>>
>> I suspect he is using ASCII and storing one value in each byte.
>
> There's also ZSCII, which stores roughly 3 characters every 2 bytes. Since all the digits are in A2, this sequence would take up 7 bytes in ZSCII as well.
>
> http://inform-fiction.org/zmachine/standards/z1point0/sect03.html

not sure how 16 characters can be represented by either ascii or zscii in only 8 bytes

-- I fear explanations explanatory of things explained.
-- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On 2017-10-23, Chris Angelico wrote:
> On Mon, Oct 23, 2017 at 11:18 PM, alister via Python-list wrote:
>> On Mon, 23 Oct 2017 10:41:55 +0100, Paul Moore wrote:
>>> On 23 October 2017 at 10:32, wrote:
>>>> According to this website. This is an uncompressable stream.
>>>>
>>>> https://en.m.wikipedia.org/wiki/Incompressible_string
>>>>
>>>> 1234999988884321
>>>>
>>>> It only takes seven 8 bit bytes to represent this
>>>
>>> Would you care to provide the seven 8-bit bytes you propose to use?
>>> Paul
>>
>> I would suspect he is using BCD & storing 2 values in each byte. That is not what is meant by "you can't compress random data". His compression is simply removing redundant space from an inefficient coding.
>
> I suspect he is using ASCII and storing one value in each byte.

There's also ZSCII, which stores roughly 3 characters every 2 bytes. Since all the digits are in A2, this sequence would take up 7 bytes in ZSCII as well.

http://inform-fiction.org/zmachine/standards/z1point0/sect03.html

-- Neil Cerutti
-- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On Mon, Oct 23, 2017 at 11:18 PM, alister via Python-list wrote:
> On Mon, 23 Oct 2017 10:41:55 +0100, Paul Moore wrote:
>> On 23 October 2017 at 10:32, wrote:
>>> According to this website. This is an uncompressable stream.
>>>
>>> https://en.m.wikipedia.org/wiki/Incompressible_string
>>>
>>> 1234999988884321
>>>
>>> It only takes seven 8 bit bytes to represent this
>>
>> Would you care to provide the seven 8-bit bytes you propose to use?
>> Paul
>
> I would suspect he is using BCD & storing 2 values in each byte. That is not what is meant by "you can't compress random data". His compression is simply removing redundant space from an inefficient coding.

I suspect he is using ASCII and storing one value in each byte.

ChrisA
-- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On Mon, 23 Oct 2017 10:41:55 +0100, Paul Moore wrote:
> On 23 October 2017 at 10:32, wrote:
>> According to this website. This is an uncompressable stream.
>>
>> https://en.m.wikipedia.org/wiki/Incompressible_string
>>
>> 1234999988884321
>>
>> It only takes seven 8 bit bytes to represent this
>
> Would you care to provide the seven 8-bit bytes you propose to use?
> Paul

I would suspect he is using BCD & storing 2 values in each byte. That is not what is meant by "you can't compress random data". His compression is simply removing redundant space from an inefficient coding.

Can you compress that sequence on paper when you only have the values 0-9 to work with (forget computer representation & removing un-used bits)?

-- Old age and treachery will overcome youth and skill.
-- https://mail.python.org/mailman/listinfo/python-list
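alister's BCD guess, sketched out (mine, not from the thread) -- two digits per byte gives 8 bytes for 16 digits, which is exactly why the claimed seven looked suspicious:

    def bcd_encode(digits: str) -> bytes:
        assert len(digits) % 2 == 0
        # Two decimal digits per byte: high nibble, low nibble.
        return bytes(int(digits[i]) << 4 | int(digits[i + 1])
                     for i in range(0, len(digits), 2))

    def bcd_decode(data: bytes) -> str:
        return "".join(f"{b >> 4}{b & 0xF}" for b in data)

    packed = bcd_encode("1234999988884321")
    print(packed.hex(), len(packed))   # 1234999988884321 8
    assert bcd_decode(packed) == "1234999988884321"

BCD "removes redundant space from an inefficient coding" at a flat 4 bits per digit; getting to 7 bytes requires treating the whole string as one binary integer, which is still just a better encoding.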
Re: Compression of random binary data
danceswithnumb...@gmail.com writes: > ... First let me clarify before you lump this in with > perpetual motion, or cold fusion. It is a mapping solution to compress > ANY i repeat ANY random file with numbers of only 0 - 9 such as are in > the million rand numbers page. Entirely possible. Of course it is. I have half a dozen programs on this machine that do it. I don't need yours. It's possible you want to claim something beyond what you stated, but no one is going to get excited until you actually claim such a thing. -- Ben. -- https://mail.python.org/mailman/listinfo/python-list
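Ben isn't exaggerating: any off-the-shelf compressor beats 8 bits per digit on such a file. A quick sketch with the standard library (mine; the resulting figure is approximate and varies run to run):

    import random
    import zlib

    digits = "".join(random.choice("0123456789") for _ in range(1_000_000))
    packed = zlib.compress(digits.encode("ascii"), 9)
    print(8 * len(packed) / len(digits), "bits per digit")
    # Prints something like 4 bits per digit -- well under the 8 bits/digit
    # of the ASCII file, though still above the log2(10) = 3.32-bit floor.

So "compressing a file of digits 0-9" is table stakes, not a breakthrough; the only interesting claim would be beating the 3.32-bit floor, which is the part that can't be done.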
Re: Compression of random binary data
Thomas Jollans :
> On 2017-10-23 11:32, danceswithnumb...@gmail.com wrote:
>> According to this website. This is an uncompressable stream.
>>
>> https://en.m.wikipedia.org/wiki/Incompressible_string
>>
>> 1234999988884321
>
> No, it's not. According to that article, that string is incompressible by a particular algorithm. I can see no more general claims.

Here's a compression algorithm that manages to compress that string into a 0-bit string:

* If the original string is 1234999988884321 (whatever that means), return the empty bit string.
* Otherwise, prepend a don't-care bit to the original string and return the result of the concatenation.

>> It only takes seven 8 bit bytes to represent this
>
> You amaze me.
>
> >>> bin(1234999988884321)
> '0b100011000110011100111010111101000101001001101100001'
> >>> len(bin(1234999988884321)[2:])
> 51
> >>> 7 * 8
> 56

So danceswithnumbers made a correct statement! Anyway, a fun troll.

Marko
-- https://mail.python.org/mailman/listinfo/python-list