Re: [music-dsp] about entropy encoding
On 18/07/2015, Peter S wrote:
> In the tests that gave good results for *some* test material, I
> simply grouped adjacent bits. (*)

(*) ...and of course, building histograms from characters or samples also means 'grouping adjacent bits', since a character means "8 adjacent bits", a sample means "N adjacent bits", etc.

When the source material is white noise, you can group bits in any fashion (bits, octets = bytes, multi-byte samples, etc.) and it will always give a uniform probability distribution (and hence a normalized entropy of ~1), regardless of how you group the bits. For other (more correlated) signals, this won't work as well.

--
dupswapdrop -- the music-dsp mailing list and website: subscription info, FAQ, source code archive, list archive, book reviews, dsp links
http://music.columbia.edu/cmc/music-dsp
http://music.columbia.edu/mailman/listinfo/music-dsp
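The claim above can be checked empirically. A minimal sketch (the function name, normalization choice, and window sizes are my own illustration): group the same white-noise bit stream three different ways, build a histogram of each grouping, and compute the histogram entropy normalized by its maximum. All three come out close to 1.

```python
import math
import random
from collections import Counter

def normalized_entropy(symbols):
    """Shannon entropy of the symbol histogram, normalized to [0, 1]
    by the maximum possible entropy, log2(number of distinct symbols)."""
    counts = Counter(symbols)
    total = len(symbols)
    h = sum((c / total) * -math.log2(c / total) for c in counts.values())
    max_h = math.log2(len(counts)) if len(counts) > 1 else 1.0
    return h / max_h

random.seed(1)
bits = [random.getrandbits(1) for _ in range(2 ** 16)]

# Group the same white-noise bit stream three different ways:
as_bits  = bits
as_bytes = [tuple(bits[i:i + 8]) for i in range(0, len(bits), 8)]
as_words = [tuple(bits[i:i + 16]) for i in range(0, len(bits), 16)]

for name, syms in [("bits", as_bits), ("bytes", as_bytes), ("16-bit", as_words)]:
    print(name, round(normalized_entropy(syms), 3))  # all close to 1.0
```

For a correlated signal (e.g. a low-frequency saw wave), the byte-level histogram would no longer be uniform and the estimates for different groupings would diverge, which is exactly the point made above.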
Re: [music-dsp] about entropy encoding
On 18/07/2015, robert bristow-johnson wrote:
> listen, one thing i have to remind myself here (because if i don't, i'm
> gonna get embarrassed) is to not underestimate either the level of
> scholarship nor the level of practical experience doing things related
> to music (or at least audio) and DSP.

Failure to understand the difference between the concepts of 'entropy' and 'entropy rate' indicates something lacking in the scholarship level. Failure to understand that log2(p) != 0 when p != 1 may also indicate something lacking in the scholarship level (or some conceptual misunderstanding). The notion of 'entropy' is already well-defined in the literature, so it is not my task to define it; it is better to point to the references in the literature.

> listen, i understand LPC.

Great! I never said or thought or implied that you don't. I merely noticed and expressed that a bitflip counter is equivalent to an error estimator for a subset of linear predictors (namely, binary delta coding), without implying anything about your knowledge or understanding. (And for that matter, I did not address that particular message to you personally; I just thought this particular topic could be interesting for *some* readers.)

> i know how to do this stuff. i know how to teach this stuff and in a
> previous lifetime have done it.

Great. When did I say or imply you don't? LPC is actually a fairly basic concept.

> so be careful, lest you lose people's interest, to restrain any
> patronizing tone. we're not as dumb as you think.

When did I say you're dumb? I merely pointed out to Ethan Duni that "entropy" != "entropy rate". Failure to understand this will result in failure to understand that, except in some rare corner cases, virtually all signals have entropy. That doesn't imply that he is dumb (I don't think he is dumb); it just implies that he doesn't understand the difference between entropy and entropy rate.
Hence, I pointed to a standard piece of literature that defines it, specifically because he asked me to define it.

If you guys are eager enough to be patronizing and to point out the supposedly missing areas of my knowledge, then I'll also be eager to point out where your knowledge may be lacking. That's a fair deal, isn't it? After you, Robert, told me that I am a crackpot, giving me a link to the "Crackpot index" (I still remember - I don't forget), I think I was not even harsh (and in light of that, you sound hypocritical).

Now let's get back to the topic.

> so here's my question(s) ... please read carefully because i'm the guy

> i can sorta see a generic measure of entropy doing that, but it assumes
> messages that are adjacent bits. you can add to that messages that are
> grouped together in any imagined fashion, not just adjacent bits.

Exactly. That would give another correlation measure.

> there are virtually uncountable ways of grouping together the bits of a
> binary file to measure entropy. i simply cannot believe you did it
> every-which-way.

Exactly. I did not do it in every possible way, though that would be interesting to test (in a given window). For a finite number of bits, the number of combinations is finite, though for high N it will certainly be impractical to compute. I would expect that the more correlations you analyze, the more accurate a measure you'll get.

> so, because i am too lazy to look around at previous posts, what is the
> *specific* way or ways you grouped together the bits into messages so
> that you could count their frequency of occurrence and get a measure of
> mean bits of information per message?

First, let me point out that there's no universally good answer to this question; in general, it depends on the message and the source material (and the length of the analysis window, the amount of input, etc.).
In my tests, I got fairly good results on *certain* test data by using a sliding window, essentially what you described (grouping adjacent bits and building a histogram). If I understand right, the frequency of m1m2m3...mN will give the relative probability (transition probability) of mN occurring after m1m2m3...mN-1. I got this idea from Shannon's paper, where he does the same for letters to analyze transition probabilities of English text.

If the bits/symbols are grouped in a different fashion, then the occurrence of that pattern would indicate the relative probabilities of bits/symbols being in that particular position relative to each other (I haven't tested this).

In the tests that I'm currently doing, I find that what worked well for simple 1-bit signals doesn't work well for multi-bit (audio) signals, indicating that a different measure is needed. As said, there are a zillion ways of analyzing possible correlations, so I cannot give you a universally good measure. In fact, that's what I'd like to find out myself as well. So this simple measure has very limited application, and I find that it doesn't work well for more complex signals. I have some furt
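The sliding-window idea described above can be sketched in a few lines (function names are my own illustration, not the author's actual code): build a histogram of adjacent N-bit groups, then estimate the transition probability of a bit given its (N-1)-bit context from relative frequencies.

```python
from collections import Counter

def bit_ngram_histogram(bits, n):
    """Count occurrences of every adjacent n-bit pattern (sliding window)."""
    return Counter(tuple(bits[i:i + n]) for i in range(len(bits) - n + 1))

def transition_probability(bits, context, next_bit):
    """Estimate P(next_bit | context) as the relative frequency of
    context+next_bit among all windows that start with context."""
    n = len(context) + 1
    counts = bit_ngram_histogram(bits, n)
    total = sum(c for pat, c in counts.items() if pat[:-1] == tuple(context))
    if total == 0:
        return 0.0
    return counts[tuple(context) + (next_bit,)] / total

bits = [0, 1, 0, 1, 0, 1, 0, 1]           # perfectly alternating signal
p = transition_probability(bits, [0], 1)  # P(1 | previous bit was 0)
print(p)                                  # -> 1.0 for this signal
```

This mirrors Shannon's letter-frequency analysis of English text, with bits in place of letters: a longer context captures more correlation, at the cost of an exponentially larger histogram.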
Re: [music-dsp] about entropy encoding
On 7/17/15 2:28 AM, Peter S wrote:
> Dear Ethan,
> You suggested me to be short and concise. My kind recommendation to you:
> 1) Read "A Mathematical Theory of Communication".
> 2) Try to understand Theorem 2.
> 3) Try to see, when p_i != 1, then H != 0.
> I hope this exercise will help you grasp this topic.

listen, one thing i have to remind myself here (because if i don't, i'm gonna get embarrassed) is to not underestimate either the level of scholarship nor the level of practical experience doing things related to music (or at least audio) and DSP. just a friendly caution to consider the same.

On 7/17/15 3:27 PM, Peter S wrote:
> A linear predictor[1] tries to "predict" the next sample as the linear
> combination of previous samples as
>
>     x'[n] = SUM [i=1..k] a_i * x[n-i]
>
> where x'[n] is the predicted sample, and a_1, a_2 ... a_k are the
> prediction coefficients (weights). This is often called linear
> predictive coding (LPC),

listen, i understand LPC. i also understand (i remember having multiple email exchanges with Claude Cellier about the topic in the '90s) how to use LPC to reduce the word size (if the LPC is effective, the LPC guess for the following sample is, statistically speaking, usually close to the actual sample, so the word size is smaller and the probability on each of those reduced-width words is higher) followed by Huffman coding on those often-used delta values. if you have a non-flat spectrum, LPC will make it flat, and if you have a non-uniform p.d.f., Huffman coding will similarly take advantage of that. you can combine the two and tweak it to the point that the losslessly compressed file can't get much more compressed without adding more to the model (like periodicity, so that the LPC doesn't just use the most current N samples to predict the future sample, but also uses a bunch of samples that are P samples prior, where P is the best-fit period for a quasi-periodic input).
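The LPC-then-entropy-coding pipeline described above can be sketched as follows (a hypothetical illustration under my own names and test data, not anyone's actual codec): compute prediction residuals, which for a well-matched predictor are small and narrowly distributed, and therefore cheap to Huffman-code afterwards.

```python
def lpc_residuals(samples, coeffs):
    """Residuals e[n] = x[n] - round(sum a_i * x[n-i]); the first
    len(coeffs) samples are passed through verbatim as warm-up."""
    k = len(coeffs)
    out = list(samples[:k])
    for n in range(k, len(samples)):
        pred = round(sum(a * samples[n - i - 1] for i, a in enumerate(coeffs)))
        out.append(samples[n] - pred)
    return out

# A slowly rising ramp: a first-order predictor (a_1 = 1, i.e. delta
# coding) leaves residuals much smaller than the raw samples, so fewer
# bits per word are needed before the Huffman stage.
x = [10, 11, 13, 14, 16, 17, 19, 20]
print(lpc_residuals(x, [1]))  # -> [10, 1, 2, 1, 2, 1, 2, 1]
```

Decoding reverses the process exactly (add the prediction back), which is what makes the scheme lossless.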
i know how to do that (and have done MATLAB implementations, but never anything that saw the commercial light-of-day). i also know how i might do it for music that is the combination of a few sources, if i get good pitch-detection data from a source separation alg, which is another big, difficult problem. i know how to do this stuff. i know how to teach this stuff and in a previous lifetime have done it. (i like to use a recurring character of George Carlin: Al Sleet, the Hippy-Dippy Weatherman, to illustrate Shannon's use of message probability, or frequency-of-occurrence, as a natural measure of the true information content a message has.)

so be careful, lest you lose people's interest, to restrain any patronizing tone. we're not as dumb as you think.

On 7/17/15 1:38 PM, Peter S wrote:
> On 17/07/2015, robert bristow-johnson wrote:
>> On 7/17/15 1:26 AM, Peter S wrote:
>>> On 17/07/2015, robert bristow-johnson wrote:
>>>> in your model, is one sample (from the DSP semantic) the same as a
>>>> "message" (from the Information Theory semantic)?
>>> A "message" can be anything - it can be a sample, a bit, a combination
>>> of samples or bits, a set of parameters representing a square wave,
>>> whatever.
>> doesn't answer my question.
> It does, it just depends on the model.
> In the 1-bit square wave duty cycle estimator, a message is a "bit".
> In the English text compression experiment, a message is "a character".
> In the white noise entropy estimation experiment, a message is a "byte".
> In the binary waveform entropy experiment, a message is "a string of bits".
> In the bitflip counter, a message is the event that "two consecutive bits differ".
> In the parametric squarewave thought experiment, a message is "a set of
> parameters describing a square wave".
> Whatever arbitrarily chosen thing I send to you over a channel becomes
> a "message". There's no universal definition of what a "message" is;
> depending on the particular model, it can be literally anything.

so here's my question(s) ...
please read carefully because i'm the guy who gets to decide if my question really got answered, and i am still pretty much clueless about it: what is the atom of information that you are counting to create a histogram?

maybe it's sophisticated: first you count every 1-bit message and create a two-column histogram. then rewind the soundfile and count every 2-bit message, grouped adjacently, and add that to the histogram. then rewind the soundfile again and count each 3-bit message, and add that to the histogram. continue to some defined word-width limit.

i can sorta see a generic measure of entropy doing that, but it assumes messages that are adjacent bits. you can add to that messages that are grouped together in any imagined fashion, not just adjacent bits. there are virtually uncountable ways of grouping together the bits of a binary file to measure entropy. i simply cannot believe you did it every-which-way.

so, because i am too lazy to look around at previous posts, what is the *specific* way or
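The multi-pass counting scheme sketched in the question above could look like this (a hypothetical illustration of the described procedure, not code from the thread): one histogram per word width, each pass counting adjacent non-overlapping groups of that width.

```python
from collections import Counter

def multiwidth_histograms(bits, max_width):
    """One histogram per word width 1..max_width, each pass counting
    adjacent (non-overlapping) groups of that many bits."""
    hists = {}
    for w in range(1, max_width + 1):
        groups = [tuple(bits[i:i + w]) for i in range(0, len(bits) - w + 1, w)]
        hists[w] = Counter(groups)
    return hists

h = multiwidth_histograms([0, 1, 1, 0, 1, 1], 2)
print(h[1][(1,)], h[1][(0,)])  # -> 4 2  (four ones, two zeros)
print(h[2][(0, 1)])            # -> 1   (each adjacent pair occurs once here)
```

As the question notes, this only covers *adjacent* groupings; non-adjacent groupings would need a combinatorially larger family of passes.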
Re: [music-dsp] about entropy encoding
I tested a simple, first-order histogram-based entropy estimate idea on various 8-bit signed waveforms (message = sample, no correlations analyzed). Only trivial (non-bandlimited) waveforms were analyzed.

Method:
1) The signal is trivially turned into a histogram.
2) Probabilities are assumed based on histogram frequencies.
3) The entropy rate of the probability distribution is calculated.

Detailed results:
- http://morpheus.spectralhead.com/entropy/histogram/

Observations:

White noise:
- as window length increases, estimate converges to 1.
- as amplitude decreases, estimate converges to ~0.2.

Square wave:
- estimate is about 0.125, regardless of frequency or amplitude.

Saw wave:
- as frequency increases, converges to ~0.2.
- as frequency decreases, converges to 1.

Sine wave:
- as frequency increases, converges to ~0.17.
- as frequency decreases, converges to ~0.95.
- as amplitude decreases, converges to ~0.2.

Constant signal:
- estimate is always zero.

Conclusions: As expected, being first-order, it is overall a poor estimator. For some corner cases (white noise, constant signal), it gives the correct result. For other, periodic signals, it gives an estimate with varying amounts of error. Since it doesn't analyze correlations, it cannot distinguish a low-frequency saw wave from white noise. It gives a rather poor estimate for most low-frequency periodic waveforms, and does somewhat better for high frequencies.

-P
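The three method steps above can be sketched as follows (the function name is my own; normalizing by the 8-bit word size matches the scale of the estimates reported above, e.g. 0.125 for a square wave, since a two-valued signal has 1 bit of entropy per 8-bit sample):

```python
import math
from collections import Counter

def first_order_entropy_estimate(samples, bits_per_sample=8):
    """1) Build a histogram of sample values.
       2) Take histogram frequencies as probabilities.
       3) Compute the entropy of that distribution, normalized by
          the word size so the estimate lies in [0, 1]."""
    counts = Counter(samples)
    total = len(samples)
    h = sum((c / total) * -math.log2(c / total) for c in counts.values())
    return h / bits_per_sample

square = [100, -100] * 512  # 50% duty cycle square wave, two values
print(first_order_entropy_estimate(square))    # -> 0.125 (1 bit / 8 bits)

constant = [42] * 1024
print(first_order_entropy_estimate(constant))  # -> 0.0
```

This also shows why the estimator cannot distinguish a low-frequency saw wave from white noise: both produce a near-flat histogram, and sample ordering never enters the calculation.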
Re: [music-dsp] about entropy encoding
A linear predictor[1] tries to "predict" the next sample as the linear combination of previous samples:

    x'[n] = SUM [i=1..k] a_i * x[n-i]

where x'[n] is the predicted sample, and a_1, a_2, ... a_k are the prediction coefficients (weights). This is often called linear predictive coding (LPC), and one typical application is encoding the formants in telephone telecommunications using the source-filter model of speech[2]. Another typical application is waveform encoding in lossless wave compression codecs[3] like Shorten, FLAC, WavPack, MPEG-4 ALS, etc.

The simplest possible form of a linear predictor is when a_1 = 1 and all other coefficients a_2, a_3, ... a_k = 0. So it simply predicts the next sample as

    x'[n] = x[n-1]

In other words, it predicts the next sample to be the same as the previous one. This special case of linear prediction is called "delta encoding"[4], where the next sample is predicted to be the same, and only the difference between two consecutive samples is encoded. One compressed audio file format that uses this type of encoding is the Amiga IFF 8SVX file format[5].

If we simplify this delta encoding further and apply it to the smallest finite field GF(2)[6], then we get:

    b'[n] = b[n-1]

where b[n] is the nth bit, and b'[n] is the prediction for the nth bit. In this model, if we try to estimate the prediction error, then - since GF(2) only has two elements, 0 and 1 - the difference (delta) between two consecutive bits can be either 0 or 1. Since the addition operation in GF(2) is XOR, the difference (delta) between two consecutive bits equals A XOR B, which gives 1 for a 0->1 or 1->0 transition, and gives 0 for two identical bits. To calculate the total prediction error, we simply need to sum the total number of binary transitions in the binary string:

    E = SUM [i=2..N] ( b[i] XOR b[i-1] )

where N is the total number of bits, and E is the total error between the prediction and the actual samples.
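The error sum E above reduces to a few lines of code (a minimal sketch; the function name is my own): XOR each bit with its predecessor and count the ones, i.e. count the bitflips.

```python
def delta_prediction_error(bits):
    """Total error E of the GF(2) delta predictor b'[n] = b[n-1]:
    the number of transitions (bitflips) in the bit string."""
    return sum(bits[i] ^ bits[i - 1] for i in range(1, len(bits)))

print(delta_prediction_error([0, 0, 1, 1, 0, 1]))  # -> 3 transitions
print(delta_prediction_error([1, 1, 1, 1]))        # -> 0 (perfectly predicted)
```

A constant bit string gives E = 0 (the predictor is always right), while a maximally alternating string gives E = N - 1 (the predictor is always wrong), which is why the bitflip count tracks 'unpredictability' in the sense discussed below.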
So an algorithm that counts bitflips in a binary string can be thought of as summing the errors of the simplest possible linear predictor over GF(2), with a_1 = 1 and a_2, a_3, ... a_k = 0, which simply "predicts" that the next bit will be the same as the previous bit. By counting the binary transitions, we get the sum of 'deltas', or the sum of the 'prediction error' of binary delta coding.

If we think of entropy as 'unpredictability' and we expect it to (possibly nonlinearly) correlate with the sum of errors between a prediction and the actual samples, then this simplistic linear predictor can be thought of as a very crude approximation to entropy. Of course, this is an extremely simplistic and dumb predictor - it is the simplest possible linear predictor that can exist, as GF(2) is the smallest finite field, and it only has a single coefficient a_1 = 1. It cannot account for periodicity, previous bits, statistical probabilities, or other correlations. It can be thought of as an exercise in trying to create the "simplest possible predictor".

Despite this, and however simple this predictor may seem, by extending this idea further I could successfully create a lossless PCM wave compression algorithm with compression ratios comparable to those of codecs using more complex linear predictors, like FLAC or Shorten. (More on this later.)

Best regards,
Peter

References:
[1] https://en.wikipedia.org/wiki/Linear_prediction
[2] https://en.wikipedia.org/wiki/Source%E2%80%93filter_model_of_speech_production
[3] https://en.wikipedia.org/wiki/Linear_predictive_coding#Applications
[4] https://en.wikipedia.org/wiki/Delta_encoding
[5] https://en.wikipedia.org/wiki/8SVX
[6] https://en.wikipedia.org/wiki/GF%282%29
Re: [music-dsp] about entropy encoding
On 17/07/2015, robert bristow-johnson wrote:
> On 7/17/15 1:26 AM, Peter S wrote:
>> On 17/07/2015, robert bristow-johnson wrote:
>>> in your model, is one sample (from the DSP semantic) the same as a
>>> "message" (from the Information Theory semantic)?
>> A "message" can be anything - it can be a sample, a bit, a combination
>> of samples or bits, a set of parameters representing a square wave,
>> whatever.
>
> doesn't answer my question.

It does, it just depends on the model.

In the 1-bit square wave duty cycle estimator, a message is a "bit".
In the English text compression experiment, a message is "a character".
In the white noise entropy estimation experiment, a message is a "byte".
In the binary waveform entropy experiment, a message is "a string of bits".
In the bitflip counter, a message is the event that "two consecutive bits differ".
In the parametric squarewave thought experiment, a message is "a set of parameters describing a square wave".

Whatever arbitrarily chosen thing I send to you over a channel becomes a "message". There's no universal definition of what a "message" is; depending on the particular model, it can be literally anything.
Re: [music-dsp] about entropy encoding
On 7/17/15 1:26 AM, Peter S wrote:
> On 17/07/2015, robert bristow-johnson wrote:
>> in your model, is one sample (from the DSP semantic) the same as a
>> "message" (from the Information Theory semantic)?
> A "message" can be anything - it can be a sample, a bit, a combination
> of samples or bits, a set of parameters representing a square wave,
> whatever.

doesn't answer my question.

is your measure the same as

    SUM{ prob(message) * -log2(prob(message)) }
    all messages

?

> Yes. In this particular example, a "message" means "a set of parameters
> that describe a parametric square wave". To distinguish between two or
> more messages, at least 1 bit is needed.

and this question doesn't get answered until the previous one does.

--
r b-j                  r...@audioimagination.com
"Imagination is more important than knowledge."
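The measure written out above, SUM over all messages of prob(message) * -log2(prob(message)), is the standard Shannon entropy of the message distribution. A minimal sketch, with probabilities estimated from frequency of occurrence (the function name is my own):

```python
import math
from collections import Counter

def shannon_entropy(messages):
    """SUM over all messages of prob(message) * -log2(prob(message)),
    with prob estimated as the relative frequency of each message."""
    counts = Counter(messages)
    total = len(messages)
    return sum((c / total) * -math.log2(c / total) for c in counts.values())

# Two equally likely messages need exactly 1 bit to tell apart:
print(shannon_entropy(["A", "B", "A", "B"]))  # -> 1.0
```

The result depends entirely on what one chooses to call a "message" (bit, byte, parameter set), which is exactly the point of contention in the exchange above.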