Re: [music-dsp] about entropy encoding
Peter S, your combative attitude is unwelcome. It seems that you are less interested in grasping these topics than you are in hectoring myself and other list members. Given that and the dubious topicality of this thread, this will be my last response to you. I hope that you find a healthy way to address the source of your hostility, and also that you gain more insight into Information Theory. My apologies to the list for encouraging this unfortunate tangent.

E

On Thu, Jul 16, 2015 at 8:38 PM, Peter S peter.schoffhau...@gmail.com wrote:

On 17/07/2015, Ethan Duni ethan.d...@gmail.com wrote:
> What are these better estimators? It seems that you have several estimators in mind but I can't keep track of what they all are. I urge you to slow down, collect your thoughts, and spend a bit more time editing your posts for clarity (and length).

I urge you to pay more attention and read more carefully. I do not want to repeat myself several times. (Others will think it's repetitive and boring.) [And fuck the "spend more time" part, I already spent 30+ hours editing.]

> And what is "entropy per bit"? Entropy is measured in bits, in the first place. Did you mean entropy per symbol or something?

Are you implying that a bit is not a symbol? A bit *is* a symbol. So of course, I meant that.

> Entropy is measured in bits, in the first place.

According to IEC 80000-13, entropy is measured in shannons:
https://en.wikipedia.org/wiki/Shannon_%28unit%29
For historical reasons, "bits" is often used synonymously with "shannons".

> Maybe you could try using this brain to interact in a good-faith way.

Faith belongs to church.

> The entropy of a signal - as opposed to entropy rate - is not a well-defined quantity, generally speaking.

Its exact value is not well-defined, yet it is *certain* to be non-zero. (Unless you have only 1 particular signal with 100% probability.)
> The standard quantity of interest in the signal context is entropy rate

Another standard quantity of interest in the signal context is entropy.
https://en.wikipedia.org/wiki/Entropy_%28information_theory%29
Quote: "Entropy is a measure of unpredictability of information content."

> If you want to talk about signal entropy, distinct from the entropy rate, then you need to do some additional work to specify what you mean by that.

Let me give you an example. You think that a constant signal has no randomness, thus no entropy (zero bits). Let's do a little thought experiment: I have a constant signal that I want to transmit to you over some noiseless discrete channel. Since you think a constant signal has zero entropy, I send you _nothing_ (precisely zero bits). Now try to reconstruct my constant signal from the nothing that you received from me! Can you? There's a very high chance you can't.

Let me give you a hint. My constant signal is 16-bit signed PCM, and its first sample is sampled from uniform distribution noise. What is the 'entropy' of my constant signal? Answer: since the first sample is sampled from uniform distribution noise, the probability of you successfully guessing my constant signal is 1/65536. Hence, it has an entropy of log2(65536) = 16 bits. In other words, I need to send you all 16 bits of the first sample for you to be able to reconstruct my constant signal with 100% certainty. Without receiving those 16 bits, you cannot reconstruct my constant signal with 100% certainty. That's the measure of its uncertainty or unpredictability.

So you (falsely) thought a constant signal has zero randomness and thus zero entropy, yet it turns out that when I sample that constant signal from the output of 16-bit uniform distribution white noise, my constant signal has 16 bits of entropy. And if I want to transmit it to you, then I need to send you a minimum of 16 bits for you to be able to reconstruct it, despite it being a constant signal.
It may have an asymptotic 'entropy rate' of zero, yet that doesn't mean that the total entropy is zero. So the 'entropy rate' doesn't tell you the entropy of the signal. The total entropy (uncertainty, unpredictability, randomness) in this particular constant signal is 16 bits. Hence, its entropy is nonzero - in this case, 16 bits - and if I want to send this constant signal to you in a message, I need to send a minimum of 16 bits. The 'entropy rate' doesn't tell you the full picture.

> Switching back and forth between talking about narrow parametric classes of signals (like rectangular waves) and general signals is going to lead to confusion, and it is on you to keep track of that and write clearly.

When I say square wave, I mean square wave. When I say arbitrary signal, I mean arbitrary signal. What's confusing about that? What makes you unable to follow?

> Moreover, the (marginal) entropy of (classes of) signals is generally not an interesting quantity in the
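The arithmetic of the thought experiment above can be checked with a short sketch (Python used here for illustration; the function names are mine, not from the thread):

```python
import math

# The "constant" 16-bit signal whose single repeated value is drawn
# uniformly at random. The receiver's uncertainty about that value is
# log2(65536) = 16 bits, even though the entropy *rate* (entropy per
# sample) tends to zero as the signal grows.

def entropy_of_uniform_choice(num_outcomes):
    """Entropy in bits of one draw from a uniform distribution."""
    return math.log2(num_outcomes)

FIRST_SAMPLE_BITS = entropy_of_uniform_choice(2 ** 16)  # 16.0 bits

def total_entropy_constant_signal(n_samples):
    """Total entropy of the whole constant signal: still 16 bits,
    because samples 2..n are fully determined by sample 1."""
    return FIRST_SAMPLE_BITS  # independent of n_samples

def entropy_rate(n_samples):
    """Entropy per sample -> 0 as the signal grows."""
    return total_entropy_constant_signal(n_samples) / n_samples
```

So a 1600-sample constant signal still carries 16 bits total, while its per-sample entropy rate is a mere 0.01 bits: nonzero entropy, vanishing rate.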
Re: [music-dsp] about entropy encoding
Dear Ethan,

You suggested that I be short and concise. My kind recommendation to you:
1) Read "A Mathematical Theory of Communication".
2) Try to understand Theorem 2.
3) Try to see: when p_i != 1, then H != 0.

I hope this exercise will help you grasp this topic.

Best regards,
Peter

On 17/07/2015, Ethan Duni ethan.d...@gmail.com wrote:
> Peter S, your combative attitude is unwelcome. [...]

--
dupswapdrop -- the music-dsp mailing list and website: subscription info, FAQ, source code archive, list archive, book reviews, dsp links
http://music.columbia.edu/cmc/music-dsp
http://music.columbia.edu/mailman/listinfo/music-dsp
Re: [music-dsp] about entropy encoding
On 17/07/2015, robert bristow-johnson r...@audioimagination.com wrote:
> On 7/17/15 1:26 AM, Peter S wrote:
>> On 17/07/2015, robert bristow-johnson r...@audioimagination.com wrote:
>>> in your model, is one sample (from the DSP semantic) the same as a message (from the Information Theory semantic)?
>> A message can be anything - it can be a sample, a bit, a combination of samples or bits, a set of parameters representing a square wave, whatever.
> doesn't answer my question.

It does, it just depends on the model.

In the 1-bit square wave duty cycle estimator, a message is a bit.
In the English text compression experiment, a message is a character.
In the white noise entropy estimation experiment, a message is a byte.
In the binary waveform entropy experiment, a message is a string of bits.
In the bitflip counter, a message is an event that two consecutive bits differ.
In the parametric square wave thought experiment, a message is a set of parameters describing a square wave.

Whatever arbitrarily chosen thing I send to you over a channel becomes a message. There's no universal definition of what a message is; depending on the particular model, it can be literally anything.
Re: [music-dsp] about entropy encoding
On 7/17/15 1:26 AM, Peter S wrote:
> On 17/07/2015, robert bristow-johnson r...@audioimagination.com wrote:
>> in your model, is one sample (from the DSP semantic) the same as a message (from the Information Theory semantic)?
> A message can be anything - it can be a sample, a bit, a combination of samples or bits, a set of parameters representing a square wave, whatever.

doesn't answer my question.

is your measure the same as

   SUM  { prob(message) * -log2(prob(message)) }
   all messages

?

> Yes. In this particular example, a message means a set of parameters that describe a parametric square wave. To distinguish between two or more messages, at least 1 bit is needed.

and this question doesn't get answered until the previous one does.

--
r b-j   r...@audioimagination.com

"Imagination is more important than knowledge."
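The quantity rbj writes out - the sum over all messages of prob(message) * -log2(prob(message)) - is ordinary Shannon entropy, and can be sketched in a few lines (illustrative code, not from the thread):

```python
import math

def shannon_entropy(probs):
    """Entropy in bits of a discrete distribution.

    probs: probabilities of each possible message; they should sum to 1.
    Terms with p == 0 contribute nothing (lim p*log2(p) = 0 as p -> 0).
    """
    return sum(-p * math.log2(p) for p in probs if p > 0)

# A fair coin (two equally likely messages) carries exactly 1 bit:
shannon_entropy([0.5, 0.5])
# A certain outcome (p = 1) carries zero bits - the point of Peter's
# "when p_i != 1, then H != 0" remark, read in reverse:
shannon_entropy([1.0])
# Four equally likely messages carry log2(4) = 2 bits:
shannon_entropy([0.25, 0.25, 0.25, 0.25])
```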
Re: [music-dsp] about entropy encoding
A linear predictor[1] tries to predict the next sample as the linear combination of previous samples:

   x'[n] = SUM [i=1..k] a_i * x[n-i]

where x'[n] is the predicted sample, and a_1, a_2, ... a_k are the prediction coefficients (weights). This is often called linear predictive coding (LPC), and one typical application is to encode the formants in telephony using the source-filter model of speech[2]. Another typical application is waveform encoding in lossless wave compression codecs[3] like Shorten, FLAC, WavPack, MPEG-4 ALS, etc.

The simplest possible form of a linear predictor is when a_1 = 1, and all other coefficients a_2, a_3, ... a_k = 0. So it simply predicts the next sample as

   x'[n] = x[n-1]

In other words, it predicts the next sample to be the same as the previous one. This special case of linear prediction is called delta encoding[4], where the next sample is predicted to be the same, and only the difference between two consecutive samples is encoded. One compressed audio file format that uses this type of encoding is the Amiga IFF 8SVX file format[5].

If we simplify this delta encoding further and apply it to the smallest finite field, GF(2)[6], then we get:

   b'[n] = b[n-1]

where b[n] is the nth bit, and b'[n] is the prediction for the nth bit. In this model, if we try to estimate the prediction error, then - since GF(2) only has two elements, 0 and 1 - the difference (delta) between two consecutive bits can be either 0 or 1. Since the addition operation in GF(2) is XOR, the difference (delta) between two consecutive bits equals A XOR B, which gives 1 for a 0-1 or 1-0 transition, and zero for two identical bits.

To calculate the total prediction error, we simply need to sum the total number of binary transitions in the binary string:

   E = SUM [i=2..N] ( b[i] XOR b[i-1] )

where N is the total number of bits, and E is the total error between the prediction and the actual samples.
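The error sum E above translates directly into code (a minimal illustrative sketch, not part of the original post):

```python
def bitflip_count(bits):
    """Total prediction error of the GF(2) predictor b'[n] = b[n-1]:
    the per-bit error is b[i] XOR b[i-1], so the sum is simply the
    number of 0-1 and 1-0 transitions in the bit sequence."""
    return sum(bits[i] ^ bits[i - 1] for i in range(1, len(bits)))

bitflip_count([0, 0, 0, 0])     # constant bits: 0 transitions, perfectly predicted
bitflip_count([0, 1, 0, 1])     # alternating bits: 3 transitions, every bit mispredicted
bitflip_count([0, 0, 1, 1, 0])  # 2 transitions
```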
So an algorithm that counts bitflips in a binary string can be thought of as the sum of the errors of the simplest possible linear predictor over GF(2), with a_1 = 1 and a_2, a_3, ... a_k = 0, that simply predicts that the next bit will be the same as the previous bit. By counting the binary transitions, we get the sum of 'deltas', or the sum of the 'prediction error' of binary delta coding. If we think of entropy as 'unpredictability' and we expect it to (possibly nonlinearly) correlate with the sum of errors between a prediction and actual samples, then this simplistic linear predictor can be thought of as a very simplistic approximation to entropy.

Of course, this is an extremely simplistic and dumb predictor - it is the simplest possible linear predictor that can possibly exist, as GF(2) is the smallest finite field, and it only has a single coefficient a_1 = 1. It cannot account for periodicity, previous bits, statistical probabilities, or other correlations. It can be thought of as an exercise in trying to create the simplest possible predictor.

Despite this, and however simple this predictor may seem, by extending this idea further I could successfully create a lossless PCM wave compression algorithm with compression ratios comparable to those of codecs using more complex linear predictors, like FLAC or Shorten. (More on this later.)

Best regards,
Peter

References:
[1] https://en.wikipedia.org/wiki/Linear_prediction
[2] https://en.wikipedia.org/wiki/Source%E2%80%93filter_model_of_speech_production
[3] https://en.wikipedia.org/wiki/Linear_predictive_coding#Applications
[4] https://en.wikipedia.org/wiki/Delta_encoding
[5] https://en.wikipedia.org/wiki/8SVX
[6] https://en.wikipedia.org/wiki/GF%282%29
Re: [music-dsp] about entropy encoding
On 7/17/15 2:28 AM, Peter S wrote:
> You suggested me to be short and concise. My kind recommendation to you:
> 1) Read "A Mathematical Theory of Communication".
> 2) Try to understand Theorem 2.
> 3) Try to see, when p_i != 1, then H != 0.
> I hope this exercise will help you grasp this topic.

listen, one thing i have to remind myself here (because if i don't, i'm gonna get embarrassed) is to not underestimate either the level of scholarship nor the level of practical experience doing things related to music (or at least audio) and DSP. just a friendly caution to consider the same.

On 7/17/15 3:27 PM, Peter S wrote:
> A linear predictor[1] tries to predict the next sample as the linear combination of previous samples [...] This is often called linear predictive coding (LPC),

listen, i understand LPC. i also understand (i remember having multiple email exchanges with Claude Cellier about the topic in the '90s) how to use LPC to reduce the word size (if the LPC is effective, the LPC guess for the following sample is, statistically speaking, usually close to the actual sample, so the word size is smaller and the probability on each of those reduced-width words is higher) followed by Huffman coding on those often-used delta values. if you have a non-flat spectrum, LPC will make it flat, and if you have a non-uniform p.d.f., Huffman coding will similarly take advantage of that. you can combine the two and tweak it to the point that the losslessly compressed file can't get much more compressed without adding more to the model (like periodicity, so that the LPC doesn't just use the most current N samples to predict the future sample, but also uses a bunch of samples that are P samples prior, where P is the best-fit period for a quasi-periodic input).
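The word-size reduction described above - an effective predictor leaves small residuals that need fewer bits before entropy coding - can be sketched with the simplest predictor, x'[n] = x[n-1] (a toy illustration with made-up sample values, not anyone's actual implementation):

```python
# Sketch: prediction residuals of the first-order delta predictor
# x'[n] = x[n-1]. For a slowly varying signal, residuals cluster near
# zero and occupy fewer bits than the raw samples, which is what makes
# a subsequent Huffman (or similar entropy) coding stage effective.

def delta_residuals(samples):
    """Residuals e[n] = x[n] - x'[n] = x[n] - x[n-1], for n >= 1."""
    return [samples[i] - samples[i - 1] for i in range(1, len(samples))]

smooth = [100, 102, 103, 105, 106, 108]  # slowly varying toy signal
res = delta_residuals(smooth)            # small values: [2, 1, 2, 1, 2]
# The raw samples need ~7 bits each; the residuals fit in ~2 bits each,
# and their skewed distribution is exactly what Huffman coding exploits.
```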
i know how to do that (and have done MATLAB implementations, but never anything that saw the commercial light-of-day). i also know how i might do it for music that is the combination of a few sources, if i get good pitch-detection data from a source separation alg, which is another big, difficult problem.

i know how to do this stuff. i know how to teach this stuff, and in a previous lifetime have done it. (i like to use a recurring character of George Carlin's: Al Sleet, the Hippy-Dippy Weatherman, to illustrate Shannon's use of message probability, or frequency-of-occurrence, as a natural measure of the true information content a message has.) so be careful, lest you lose people's interest, to restrain any patronizing tone. we're not as dumb as you think.

On 7/17/15 1:38 PM, Peter S wrote:
> It does, it just depends on the model. In the 1-bit square wave duty cycle estimator, a message is a bit. In the English text compression experiment, a message is a character. [...] Whatever arbitrarily chosen thing I send to you over a channel becomes a message. There's no universal definition of what a message is; depending on the particular model, it can be literally anything.

so here's my question(s) ...
please read carefully, because i'm the guy who gets to decide if my question really got answered, and i am still pretty much clueless about it: what is the atom of information that you are counting to create a histogram?

maybe it's sophisticated: first you count every 1-bit message and create a two-column histogram. then rewind the soundfile and count every 2-bit message, grouped adjacently, and add that to the histogram. then rewind the soundfile again and count each 3-bit message, and add that to the histogram. continue to some defined word-width limit.

i can sorta see a generic measure of entropy doing that, but it assumes messages that are adjacent bits. you can add to that messages that are grouped together in any imagined fashion, not just adjacent bits. there are virtually uncountable ways of grouping together the bits of a binary file to measure entropy. i simply cannot believe you did it every-which-way. so, because i am too lazy to look around at previous posts,
Re: [music-dsp] about entropy encoding
I tested a simple, first-order histogram-based entropy estimate idea on various 8-bit signed waveforms (message = sample, no correlations analyzed). Only trivial (non-bandlimited) waveforms were analyzed.

Method:
1) Signal is trivially turned into a histogram.
2) Probabilities are assumed based on histogram frequencies.
3) Entropy rate of the probability distribution is calculated.

Detailed results:
http://morpheus.spectralhead.com/entropy/histogram/

Observations:

White noise:
- as window length increases, estimate converges to 1.
- as amplitude decreases, estimate converges to ~0.2.

Square wave:
- estimate is about 0.125, regardless of frequency or amplitude.

Saw wave:
- as frequency increases, converges to ~0.2.
- as frequency decreases, converges to 1.

Sine wave:
- as frequency increases, converges to ~0.17.
- as frequency decreases, converges to ~0.95.
- as amplitude decreases, converges to ~0.2.

Constant signal:
- estimate is always zero.

Conclusions: As expected, being first-order, it is overall a poor estimator. For some corner cases (white noise, constant signal), it gives the correct result. For other, periodic signals, it gives an estimate with varying amounts of error. Since it doesn't analyze correlations, it cannot distinguish a low-frequency saw wave from white noise. It gives a rather poor estimate for most low-frequency periodic waveforms, and does somewhat better for high frequencies.

-P
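The three-step method above can be sketched as follows. The normalization by log2(256) = 8 bits is my assumption, chosen so the estimate lands in [0, 1] and matches reported values like white noise converging to 1; the original post does not spell this detail out.

```python
import math
from collections import Counter

def histogram_entropy_estimate(samples_8bit):
    """First-order entropy estimate of a sequence of 8-bit samples.

    1) Build a histogram of sample values.
    2) Treat relative frequencies as probabilities.
    3) Compute the Shannon entropy of that distribution, normalized
       by the 8-bit maximum (assumption: normalization to [0, 1]).
    Correlations between samples are deliberately ignored, so e.g. a
    low-frequency saw wave looks just like white noise to this estimate.
    """
    counts = Counter(samples_8bit)
    n = len(samples_8bit)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / 8.0  # normalize by log2(256) = 8 bits

histogram_entropy_estimate([42] * 1000)       # constant signal: 0.0
histogram_entropy_estimate(list(range(256)))  # flat histogram: 1.0
```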