Re: [music-dsp] about entropy encoding

2015-07-17 Thread Ethan Duni
Peter S, your combative attitude is unwelcome. It seems that you are less
interested in grasping these topics than you are in hectoring me and
other list members. Given that and the dubious topicality of this thread,
this will be my last response to you. I hope that you find a healthy way to
address the source of your hostility, and also that you gain more insight
into Information Theory.

My apologies to the list for encouraging this unfortunate tangent.

E

On Thu, Jul 16, 2015 at 8:38 PM, Peter S peter.schoffhau...@gmail.com
wrote:

 On 17/07/2015, Ethan Duni ethan.d...@gmail.com wrote:
  What are these better estimators? It seems that you have several
 estimators
  in mind but I can't keep track of what they all are,
  I urge you to slow down, collect your thoughts, and
  spend a bit more time editing your posts for clarity (and length).

 I urge you to pay more attention and read more carefully.
 I do not want to repeat myself several times.
 (Others will think it's repetitive and boring.)

 [And fuck the "spend more time" part; I already spent 30+ hours editing.]

  And what is entropy per bit? Entropy is measured in bits, in the first
  place. Did you mean entropy per symbol or something?

 Are you implying that a bit is not a symbol?
 A bit *is* a symbol. So of course, I meant that.

  Entropy is measured in bits, in the first place.

 According to IEC 80000-13, entropy is measured in shannons:
 https://en.wikipedia.org/wiki/Shannon_%28unit%29

 For historical reasons, 'bit' is often used synonymously with 'shannon'.

  Maybe you could try using this brain to interact in a good-faith way.

 Faith belongs in church.

  The entropy of a signal - as opposed to entropy rate - is not a
  well-defined quantity, generally speaking.

 Its exact value is not well-defined, yet it is *certain* to be non-zero.
 (Unless you have only one particular signal with 100% probability.)

  The standard quantity of interest in the signal context is entropy rate

 Another standard quantity of interest in the signal context is entropy.

 https://en.wikipedia.org/wiki/Entropy_%28information_theory%29

 Quote:
 Entropy is a measure of unpredictability of information content.

  If you want to talk about signal entropy, distinct from the entropy
 rate,
  then you need to do some additional work to specify what you mean by
 that.

 Let me give you an example.

 You think that a constant signal has no randomness, thus no entropy
 (zero bits).
 Let's do a little thought experiment:

 I have a constant signal that I want to transmit to you over some
 noiseless discrete channel. Since you think a constant signal has zero
 entropy, I send you _nothing_ (precisely zero bits).

 Now try to reconstruct my constant signal from the nothing that you
 received from me! Can you?

 .
 .
 .

 There's a very high chance you can't. Let me give you a hint. My
 constant signal is 16-bit signed PCM, and its first sample is
 sampled from uniform distribution noise.

 What is the 'entropy' of my constant signal?

 Answer: since the first sample is sampled from uniform distribution
 noise, the probability of you successfully guessing my constant signal
 is 1/65536. Hence, it has an entropy of log2(65536) = 16 bits. In
 other words, I need to send you all 16 bits of the first sample
 for you to be able to reconstruct my constant signal with 100%
 certainty. Without receiving those 16 bits, you cannot reconstruct my
 constant signal with 100% certainty. That's the measure of its
 uncertainty or unpredictability.

 So you (falsely) thought a constant signal has zero randomness and
 thus zero entropy, yet it turns out that when I sample that constant
 signal from the output of 16-bit uniform distribution white noise,
 my constant signal has 16 bits of entropy. And if I want to
 transmit it to you, I need to send you a minimum of 16 bits for
 you to be able to reconstruct it, even though it's a constant signal.

 It may have an asymptotic 'entropy rate' of zero, yet that doesn't
 mean that the total entropy is zero. So the 'entropy rate' doesn't
 tell you the entropy of the signal. The total entropy (uncertainty,
 unpredictability, randomness) in this particular constant signal is 16
 bits. Hence, its entropy is nonzero - in this case, 16 bits - and if I
 want to send it to you in a message, I need to send a minimum of 16
 bits. The 'entropy rate' doesn't tell you the full picture.
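
 To spell out that arithmetic, here is a tiny Python sketch (purely
 illustrative):

     import math

     # The first sample is drawn uniformly from the 2^16 possible 16-bit PCM
     # values; every later sample just repeats it, so all of the signal's
     # uncertainty sits in that first sample.
     num_values = 2 ** 16
     p = 1.0 / num_values
     entropy_bits = -num_values * p * math.log2(p)   # = log2(65536)
     print(entropy_bits)                             # prints 16.0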

  Switching back and forth between talking about narrow parametric classes
 of
  signals (like rectangular waves) and general signals is going to lead to
  confusion, and it is on you to keep track of that and write clearly.

 When I say square wave, I mean square wave.
 When I say arbitrary signal, I mean arbitrary signal.
 What's confusing about that? What makes you unable to follow?

  Moreover, the (marginal) entropy of (classes of) signals is generally not
  an interesting quantity in the 

Re: [music-dsp] about entropy encoding

2015-07-17 Thread Peter S
Dear Ethan,

You suggested that I be short and concise.
My kind recommendation to you:

1) Read A Mathematical Theory of Communication.
2) Try to understand Theorem 2.
3) Try to see that when p_i != 1, then H != 0.
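
For reference, Theorem 2 of that paper states that the only measure H
satisfying the stated assumptions has the form

    H = -K * SUM [i=1..n] p_i * log(p_i)

where K is a positive constant (K = 1 with base-2 logarithms, giving H in
bits). Each term -p_i * log(p_i) is zero only when p_i is 0 or 1, so as
soon as any symbol has a probability strictly between 0 and 1, H > 0.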

I hope this exercise will help you grasp this topic.

Best regards,
Peter


On 17/07/2015, Ethan Duni ethan.d...@gmail.com wrote:
 Peter S, your combative attitude is unwelcome. It seems that you are less
 interested in grasping these topics than you are in hectoring me and
 other list members. Given that and the dubious topicality of this thread,
 this will be my last response to you. I hope that you find a healthy way to
 address the source of your hostility, and also that you gain more insight
 into Information Theory.

 My apologies to the list for encouraging this unfortunate tangent.

 E


Re: [music-dsp] about entropy encoding

2015-07-17 Thread Peter S
On 17/07/2015, robert bristow-johnson r...@audioimagination.com wrote:
 On 7/17/15 1:26 AM, Peter S wrote:
 On 17/07/2015, robert bristow-johnsonr...@audioimagination.com  wrote:
 in your model, is one sample (from the DSP semantic) the same as a
 message (from the Information Theory semantic)?
 A message can be anything - it can be a sample, a bit, a combination
 of samples or bits, a set of parameters representing a square wave,
 whatever.

 doesn't answer my question.

It does; it just depends on the model.

In the 1-bit square wave duty cycle estimator, a message is a bit.
In the English text compression experiment, a message is a character.
In the white noise entropy estimation experiment, a message is a byte.
In the binary waveform entropy experiment, a message is a string of bits.
In the bitflip counter, a message is the event that two consecutive bits differ.
In the parametric squarewave thought experiment, a message is a set of
parameters describing a square wave.

Whatever arbitrarily chosen thing I send to you over a channel
becomes a message. There's no universal definition of what a
message is; depending on the particular model, it can be literally
anything.


Re: [music-dsp] about entropy encoding

2015-07-17 Thread robert bristow-johnson

On 7/17/15 1:26 AM, Peter S wrote:

On 17/07/2015, robert bristow-johnsonr...@audioimagination.com  wrote:

in your model, is one sample (from the DSP semantic) the same as a
message (from the Information Theory semantic)?

A message can be anything - it can be a sample, a bit, a combination
of samples or bits, a set of parameters representing a square wave,
whatever.


doesn't answer my question.


is your measure the same as

  SUM{ prob(message) * -log2(prob(message)) }
 all messages

?

Yes. In this particular example, a message means a set of
parameters that describe a parametric square wave. To distinguish
between two or more messages, at least 1 bit is needed.


and this question doesn't get answered until the previous one does.

--

r b-j  r...@audioimagination.com

Imagination is more important than knowledge.





Re: [music-dsp] about entropy encoding

2015-07-17 Thread Peter S
A linear predictor[1] tries to predict the next sample as the linear
combination of previous samples as

x'[n] = SUM [i=1..k] a_i * x[n-i]

where x'[n] is the predicted sample, and a_1, a_2 ... a_k are the
prediction coefficients (weights). This is often called linear
predictive coding (LPC), and one typical application is to encode the
formants in telephony using the source-filter model of
speech[2]. Another typical application is waveform encoding in
lossless wave compression codecs[3] like Shorten, FLAC, WavPack,
MPEG-4 ALS, etc.
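
As a rough Python sketch (illustrative only, not the code of any
particular codec), the predictor and its residual look like this:

    def lpc_residual(x, a):
        """x: input samples; a: prediction coefficients a_1 .. a_k.
        Returns the residuals e[n] = x[n] - x'[n] for n >= k."""
        k = len(a)
        residual = []
        for n in range(k, len(x)):
            # x'[n] = SUM [i=1..k] a_i * x[n-i]
            prediction = sum(a[i] * x[n - 1 - i] for i in range(k))
            residual.append(x[n] - prediction)
        return residual

With a = [1] this reduces to the delta encoder described below:
e[n] = x[n] - x[n-1].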

The simplest possible form of a linear predictor is when a_1 = 1 and
all other coefficients a_2, a_3, ..., a_k = 0. So it simply predicts
the next sample as

x'[n] = x[n-1]

In other words, it predicts the next sample to be the same as the previous one.

This special case of linear prediction is called delta encoding[4],
where the next sample is predicted to be the same, and only the
difference between two consecutive samples is encoded. One compressed
audio file format that uses this type of encoding is the Amiga IFF
8SVX file format[5].

If we simplify this delta encoding further and apply it to the
smallest finite field GF(2)[6], then we get:

b'[n] = b[n-1]

where b[n] is the n-th bit, and b'[n] is the prediction for the n-th
bit. In this model, if we try to estimate the prediction error, then -
since GF(2) only has two elements: 0 and 1 - the difference (delta)
between two consecutive bits can be either 0 or 1.

Since the addition operation in GF(2) is XOR, the difference (delta)
between two consecutive bits equals A XOR B, which gives 1 for a 0-1 or
1-0 transition and 0 for two identical bits. To calculate
the total prediction error, we simply need to sum the total number of
binary transitions in the binary string:

E = SUM [i=2..N] ( b[i] XOR b[i-1] )

where N is the total number of bits, and E is the total error between
the prediction and the actual samples.

So an algorithm that counts bitflips in a binary string can be
thought of as summing the errors of the simplest possible linear
predictor over GF(2), with a_1 = 1 and a_2, a_3, ..., a_k = 0, which
simply predicts that the next bit will be the same as the previous
bit. By counting the binary transitions, we get the sum of 'deltas',
or the sum of 'prediction errors', of binary delta coding.
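
In Python, this whole 'predictor' collapses to a one-liner (again just a
sketch):

    def bitflip_count(bits):
        """bits: a sequence of 0/1 integers.
        Returns E = SUM [i=2..N] (b[i] XOR b[i-1]), the number of transitions."""
        return sum(bits[i] ^ bits[i - 1] for i in range(1, len(bits)))

    # Example: 0,0,1,1,0 contains two transitions.
    assert bitflip_count([0, 0, 1, 1, 0]) == 2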

If we think of entropy as 'unpredictability' and expect it to
(possibly nonlinearly) correlate with the sum of errors between a
prediction and the actual samples, then the error count of this
simple predictor can be thought of as a very crude approximation to entropy.

Of course, this is an extremely simplistic and dumb predictor - it is
the simplest possible linear predictor that can exist, as GF(2) is the
smallest finite field, and it has only a single coefficient a_1 = 1.
It cannot account for periodicity, bits further back than the previous
one, statistical probabilities, or other correlations. It can be
thought of as an exercise in trying to create the simplest possible
predictor.

Despite this, and however simple this predictor may seem, by extending
this idea further I could successfully create a lossless PCM wave
compression algorithm with compression ratios comparable to those of
codecs using more complex linear predictors, like FLAC or Shorten.

(More on this later.)

Best regards,
Peter

References:
[1] https://en.wikipedia.org/wiki/Linear_prediction
[2] 
https://en.wikipedia.org/wiki/Source%E2%80%93filter_model_of_speech_production
[3] https://en.wikipedia.org/wiki/Linear_predictive_coding#Applications
[4] https://en.wikipedia.org/wiki/Delta_encoding
[5] https://en.wikipedia.org/wiki/8SVX
[6] https://en.wikipedia.org/wiki/GF%282%29


Re: [music-dsp] about entropy encoding

2015-07-17 Thread robert bristow-johnson

On 7/17/15 2:28 AM, Peter S wrote:

Dear Ethan,

You suggested me to be short and concise.
My kind recommendation to you:

1) Read A Mathematical Theory of Communication.
2) Try to understand Theorem 2.
3) Try to see, when p_i != 1, then H != 0.

I hope this exercise will help you grasp this topic.


listen, one thing i have to remind myself here (because if i don't, i'm 
gonna get embarrassed) is to not underestimate either the level of 
scholarship or the level of practical experience doing things related 
to music (or at least audio) and DSP.  just a friendly caution to 
consider the same.



On 7/17/15 3:27 PM, Peter S wrote:

A linear predictor[1] tries to predict the next sample as the linear
combination of previous samples as

 x'[n] = SUM [i=1..k] a_i * x[n-i]

where x'[n] is the predicted sample, and a_1, a_2 ... a_k are the
prediction coefficients (weights). This is often called linear
predictive coding (LPC),


listen, i understand LPC.  i also understand (i remember having multiple 
email exchanges with Claude Cellier about the topic in the '90s) how to 
use LPC to reduce the word size (if the LPC is effective, the LPC guess 
for the following sample is, statistically-speaking, usually close to 
the actual sample, so the word size is smaller and the probability on 
each of those reduced-width words is higher) followed by Huffman coding 
on those often used delta values.  if you have a non-flat spectrum, LPC 
will make it flat, and if you have a non-uniform p.d.f., Huffman coding 
will similarly take advantage of that.  you can combine the two and 
tweak it to the point that the losslessly compressed file can't get much 
more compressed without adding more to the model (like periodicity, so 
that the LPC doesn't just use the most current N samples to predict the 
future sample, but also uses a bunch of samples that are P samples 
prior, where P is the best-fit period for a quasi-periodic input).  i 
know how to do that (and have done MATLAB implementations, but never anything 
that saw the commercial light-of-day).
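
in rough python, the flattening effect i'm describing looks something 
like this (just a toy sketch of the idea, not anybody's shipping codec):

    import math
    from collections import Counter

    def empirical_entropy_bits(symbols):
        # first-order (i.i.d.) entropy estimate of a symbol stream, in bits/symbol
        counts = Counter(symbols)
        total = len(symbols)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def first_order_residual(x):
        # the a_1 = 1 predictor: guess that each sample equals the previous one
        return [x[n] - x[n - 1] for n in range(1, len(x))]

    # for a slowly varying signal the residuals cluster near zero, so their
    # histogram is peaky and an entropy coder (e.g. huffman) needs fewer bits
    ramp = [n // 8 for n in range(4096)]                        # toy non-flat signal
    print(empirical_entropy_bits(ramp))                         # 9 bits/sample
    print(empirical_entropy_bits(first_order_residual(ramp)))   # ~0.54 bits/sample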


i also know how i might do it for music that is the combination of a few 
sources, if i get good pitch-detection data from a source separation 
alg, which is another big, difficult problem.


i know how to do this stuff.  i know how to teach this stuff and in a 
previous lifetime have done it.  (i like to use a recurring character of 
George Carlin: Al Sleet, the Hippy-Dippy Weatherman, to illustrate 
Shannon's use of message probability, or frequency-of-occurrence, as a 
natural measure of the true information content a message has.)


so be careful, lest you lose people's interest, to restrain any 
patronizing tone.  we're not as dumb as you think.


On 7/17/15 1:38 PM, Peter S wrote:

On 17/07/2015, robert bristow-johnsonr...@audioimagination.com  wrote:

On 7/17/15 1:26 AM, Peter S wrote:

On 17/07/2015, robert bristow-johnsonr...@audioimagination.com   wrote:

in your model, is one sample (from the DSP semantic) the same as a
message (from the Information Theory semantic)?

A message can be anything - it can be a sample, a bit, a combination
of samples or bits, a set of parameters representing a square wave,
whatever.

doesn't answer my question.

It does, it just depends on the model.

In the 1-bit square wave duty cycle estimator, a message is a bit.
In the English text compression experiment, a message is a character.
In the white noise entropy estimation experiment, a message is a byte.
In the binary waveform entropy experiment, a message is a string of bits.
In the bitflip counter, a message is the event that two consecutive bits differ.
In the parametric squarewave thought experiment, a message is a set of
parameters describing a square wave.

Whatever arbitrarily chosen thing I send to you over a channel
becomes a message. There's no universal definition of what a
message is; depending on the particular model, it can be literally
anything.


so here's my question(s) ... please read carefully because i'm the guy 
who gets to decide if my question really got answered, and i am still 
pretty much clueless about it:


what is the atom of information that you are counting to create a histogram?

maybe it's sophisticated: first you count every 1-bit message and create 
a two-column histogram.


then rewind the soundfile and count every 2-bit message, grouped 
adjacently and add that to the histogram.


then rewind the soundfile again and count each 3-bit message, and add 
that to the histogram.


continue to some defined word-width limit.
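
in code, the kind of tally i'm imagining would be something like this (a 
guess at the procedure, not necessarily what was actually done):

    from collections import Counter

    def adjacent_bit_histograms(bits, max_width):
        """count every adjacently-grouped 1-bit, 2-bit, ..., max_width-bit
        'message' in the bit stream and pool them into one histogram."""
        hist = Counter()
        for width in range(1, max_width + 1):
            for start in range(0, len(bits) - width + 1, width):
                hist[tuple(bits[start:start + width])] += 1
        return hist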

i can sorta see a generic measure of entropy doing that, but it assumes 
messages that are adjacent bits.  you can add to that messages that are 
grouped together in any imagined fashion, not just adjacent bits.


there are virtually uncountable ways of grouping together the bits of a 
binary file to measure entropy.  i simply cannot believe you did it 
every-which-way.


so, because i am too lazy to look around at previous posts, 

Re: [music-dsp] about entropy encoding

2015-07-17 Thread Peter S
I tested a simple, first-order histogram-based entropy estimate idea
on various 8-bit signed waveforms (message=sample, no correlations
analyzed). Only trivial (non-bandlimited) waveforms were analyzed.

Method:
1) Signal is trivially turned into a histogram.
2) Probabilities assumed based on histogram frequencies.
3) Entropy rate of the probability distribution is calculated.
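
In rough Python (a sketch of the method, not the exact test code; the
estimate is normalized by the maximum 8-bit entropy log2(256) so that it
falls in [0, 1]):

    import math
    from collections import Counter

    def histogram_entropy_estimate(samples):
        """First-order estimate: histogram the sample values, treat relative
        frequencies as probabilities, return -sum(p*log2(p)) / log2(256)."""
        counts = Counter(samples)
        total = len(samples)
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        return h / math.log2(256)   # normalized to [0, 1] for 8-bit samples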

Detailed results:
-
http://morpheus.spectralhead.com/entropy/histogram/

Observations:
-
White noise:
- as window length increases, estimate converges to 1.
- as amplitude decreases, estimate converges to ~0.2.

Square wave:
- estimate is about 0.125, regardless of frequency or amplitude.

Saw wave:
- as frequency increases, converges to ~0.2.
- as frequency decreases, converges to 1.

Sine wave:
- as frequency increases, converges to ~0.17.
- as frequency decreases, converges to ~0.95.
- as amplitude decreases, converges to ~0.2

Constant signal:
- estimate is always zero.

Conclusions:

As expected for a first-order method, it is overall a poor estimator.
For some corner cases (white noise, constant signal) it gives the
correct result. For other, periodic signals it gives an estimate with
varying amounts of error. Since it doesn't analyze correlations, it
cannot distinguish a low-frequency saw wave from white noise. It gives
a rather poor estimate for most low-frequency periodic waveforms, and
does somewhat better for high frequencies.

-P
--
dupswapdrop -- the music-dsp mailing list and website:
subscription info, FAQ, source code archive, list archive, book reviews, dsp links
http://music.columbia.edu/cmc/music-dsp
http://music.columbia.edu/mailman/listinfo/music-dsp