On 17/07/2015, Ethan Duni <ethan.d...@gmail.com> wrote:
> What are these better estimators? It seems that you have several estimators
> in mind but I can't keep track of what they all are,
> I urge you to slow down, collect your thoughts, and
> spend a bit more time editing your posts for clarity (and length).

I urge you to pay more attention and read more carefully.
I do not want to repeat myself several times.
(Others will think it's repetitive and boring.)

[And fuck the "spend more time" part; I already spent 30+ hours editing.]

> And what is "entropy per bit?" Entropy is measured in bits, in the first
> place. Did you mean "entropy per symbol" or something?

Are you implying that a bit is not a symbol?
A bit *is* a symbol. So of course, I meant that.

> Entropy is measured in bits, in the first place.

According to IEC 80000-13, entropy is measured in shannons:
https://en.wikipedia.org/wiki/Shannon_%28unit%29

For historical reasons, "bits" is often used synonymously with "shannons".

> Maybe you could try using this "brain" to interact in a good-faith way.

Faith belongs in church.

> The "entropy" of a signal - as opposed to entropy rate - is not a
> well-defined quantity, generally speaking.

Its exact value is not "well-defined", yet it is *certain* to be non-zero.
(Unless you have only one particular signal with 100% probability.)

> The standard quantity of interest in the signal context is entropy rate

Another standard quantity of interest in the signal context is "entropy".

https://en.wikipedia.org/wiki/Entropy_%28information_theory%29

Quote:
"Entropy is a measure of unpredictability of information content."

> If you want to talk about "signal entropy," distinct from the entropy rate,
> then you need to do some additional work to specify what you mean by that.

Let me give you an example.

You think that a constant signal has no randomness, thus no entropy (zero bits).
Let's do a little thought experiment:

I have a constant signal that I want to transmit to you over some
noiseless discrete channel. Since you think a constant signal has zero
entropy, I send you _nothing_ (precisely zero bits).

Now try to reconstruct my constant signal from the "nothing" that you
received from me! Can you?

.
.
.

There's a very high chance you can't. Let me give you a hint: my
constant signal is 16-bit signed PCM, and its first sample is drawn
from uniform-distribution noise.

What is the 'entropy' of my constant signal?

Answer: since the first sample is drawn from uniform-distribution
noise, the probability of you successfully guessing my constant signal
is 1/65536. Hence, it has an entropy of log2(65536) = 16 bits. In
other words, I need to send you all 16 bits of the first sample for
you to be able to reconstruct my constant signal with 100% certainty.
Without receiving those 16 bits, you cannot reconstruct it with 100%
certainty. That's the measure of its "uncertainty" or
"unpredictability".

So you (falsely) thought a "constant signal" has zero randomness and
thus zero entropy, yet it turns out that because I sampled that
constant signal from the output of 16-bit uniform-distribution white
noise, my constant signal has 16 bits of entropy. And if I want to
transmit it to you, then I need to send you a minimum of 16 bits for
you to be able to reconstruct it, even though it's a "constant" signal.

It may have an asymptotic 'entropy rate' of zero, yet that doesn't
mean that the total entropy is zero. So the 'entropy rate' doesn't
tell you the entropy of the signal. The total entropy (uncertainty,
unpredictability, randomness) in this particular constant signal is 16
bits, hence nonzero. And if I want to send it to you in a message, I
need to send a minimum of 16 bits. The 'entropy rate' doesn't tell you
the full picture.
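
To make the numbers concrete, here is a minimal Python sketch of that
calculation (under exactly the assumptions above: one uniformly drawn
16-bit first sample, then a constant): the total entropy is
log2(65536) = 16 bits, while the entropy per sample, H(X^n)/n, goes to
zero as the signal gets longer.

    from math import log2

    # The only uncertainty in the constant signal is its first sample,
    # drawn uniformly from the 2^16 possible 16-bit values.
    levels = 2 ** 16
    p_guess = 1 / levels                # probability of guessing it in one try
    total_entropy = -log2(p_guess)      # = log2(65536) = 16 bits

    print("total entropy:", total_entropy, "bits")

    # Entropy per sample: the whole n-sample block still carries only
    # those 16 bits, so H(X^n)/n = 16/n -> 0 as n grows.
    for n in (16, 1024, 65536):
        print("n =", n, " H(X^n)/n =", total_entropy / n, "bits/sample")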

> Switching back and forth between talking about narrow parametric classes of
> signals (like rectangular waves) and general signals is going to lead to
> confusion, and it is on you to keep track of that and write clearly.

When I say "square wave", I mean a square wave.
When I say "arbitrary signal", I mean an arbitrary signal.
What's confusing about that? What makes you unable to follow?

> Moreover, the (marginal) entropy of (classes of) signals is generally not
> an interesting quantity in the first place [...]
> Hence the emphasis on entropy rate,
> which will distinguish between signals that require a lot of bandwidth to
> transmit and those that don't.

"Entropy" also tells you how much bandwith you require to transmit a
signal. It's _precisely_ what it tells you. It doesn't tell you "per
symbol", but rather tells you the "total" number of bits you minimally
need to transmit the signal fully. I don't understand why you just
focus on "entropy rate" (= asymptotic entropy per symbol), and forget
about the total entropy. Both measures tell you the same kind of
information, in a slightly different context.
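
As a small illustration (with a made-up distribution, purely for the
numbers): suppose the transmitter can send one of four parameter sets
with probabilities 1/2, 1/4, 1/8, 1/8. The total entropy is 1.75 bits,
and an optimal prefix code with codeword lengths 1, 2, 3, 3 needs
exactly that many bits on average to transmit the choice.

    from math import log2

    # Hypothetical distribution over four possible parameter sets.
    probs = [0.5, 0.25, 0.125, 0.125]

    H = -sum(p * log2(p) for p in probs)            # entropy in bits, = 1.75

    # An optimal prefix code for this distribution: 0, 10, 110, 111.
    code_lengths = [1, 2, 3, 3]
    avg_length = sum(p * n for p, n in zip(probs, code_lengths))  # = 1.75

    print("entropy:", H, "bits")
    print("average code length:", avg_length, "bits")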

>>But not zero _entropy_. The parameters (amplitude, phase, frequency,
>>waveform shape) themselves have some entropy - you *do* need to
>>transmit those parameters to be able to reconstruct the signal.
>
> Again, that all depends on assumptions on the distribution of the
> parameters. You haven't specified anything like that, so these assertions
> are not even wrong. They're simply begging the question of what is your
> signal space and what is the distribution on it.

I think you do not understand what I am saying. The distribution of
the parameters is entirely irrelevant. I only said that the entropy is
*nonzero*. Unless your model can only transmit 1 single waveform with
a particular set of parameters (= 100% probability, zero entropy), it
will have a *nonzero* entropy. I cannot say what that value is; I just
say that unless you have 1 single set of parameters with 100%
probability, the entropy of the parameters will certainly be
*nonzero*, without knowing or caring or needing to know what the
actual distribution is.

> Again, the important distinction is between estimators that don't hit the
> correct answer even theoretically (this indicates that the underlying
> algorithm is inadequate)

There is no practical algorithm that could "hit the correct answer
theoretically". If that is your measure of adequacy, then all
estimates will be 'inadequate'.

> and the inevitable imperfections of an actual
> numerical implementation (which can be made as small as one likes by
> throwing more computational resources at them).

Yet it will still be nonzero, however high the precision and however
many computational resources you throw at it (which will be
impractical, by the way).

>>Let's imagine you want to transmit a square wave with amplitude=100,
>>phase=30, frequency=1000, and duty cycle=75%.
>>Question: what is the 'entropy' of this square wave?
>
> What is the distribution over the space of those parameters? The entropy is
> a function of that distribution, so without specifying it there's no
> answer.

The actual distribution is entirely irrelevant here. What I am saying
is that, unless the probability distribution is such that a square
wave with these parameters has 100% probability, i.e. your model can
*only* transmit this single particular square wave and no other square
waves, the entropy will be *nonzero*. What it is exactly, I can't
tell, and I don't care. I'm just telling you that, other than in that
single particular corner case, it is *certain* to be different from
zero.

> Said another way: if that's the only signal I want to transmit, then I'll
> just build those parameters into the receiver and not bother transmitting
> anything. The entropy will be zero. The receiver will simply output the
> desired waveform without any help from the other end.

Exactly. That's precisely what I am telling you. What I am telling
you is that, without exception, in *all* other cases the entropy will
be *nonzero* (without knowing or caring what it actually is). As soon
as you want to build a transmitter that can transmit 2 or more sets of
possible parameters, each with nonzero probability, the total entropy
will be *nonzero*. If the total entropy is defined as

    H = -K * SUM p_i * log2(p_i)     (Ref.: [Shannon1948])

then whenever no single p_i equals 1 (i.e. at least two parameter sets
have nonzero probability), this sum produces a result that is
different from zero. Hence, *nonzero* entropy.

If a single parameter set has probability p = 1, then this formula
yields zero entropy. Now suppose a second set of parameters also has
nonzero probability q. Since all probabilities sum to 1, both p and q
lie strictly between 0 and 1, so the terms -p*log2(p) and -q*log2(q)
are strictly positive, and the above formula gives H != 0. I do not
know what H is exactly, and I do not care; I just know that whenever
at least two parameter sets have nonzero probability (i.e. no p_i
equals 1), H != 0.

Here is a plotted graph of -p*log2(p) as a function of p:

http://morpheus.spectralhead.com/img/log2_p.png

Trivially, -p*log2(p) is *nonzero* for any p strictly between 0 and 1.
Without knowing p, I don't know what it is *exactly*, I only know that
it is *certain* to be nonzero, which is trivially visible if you look
at the graph.
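
The same argument in a few lines of Python (taking K = 1 and a few
arbitrary example distributions): H is zero only for the degenerate
distribution, and nonzero as soon as a second outcome has nonzero
probability.

    from math import log2

    def H(probs):
        # Shannon entropy in bits: H = -SUM p_i * log2(p_i), with 0*log2(0) = 0.
        return -sum(p * log2(p) for p in probs if p > 0)

    print(H([1.0]))                     # 0.0  <- the single corner case
    print(H([0.999999, 0.000001]))      # tiny, but nonzero
    print(H([0.5, 0.5]))                # 1.0
    print(H([0.25, 0.25, 0.25, 0.25]))  # 2.0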

> Again, this is begging the question. "A square wave" is not a distribution,
> and so doesn't have "entropy." You need to specify what is the possible set
> of parameters, and then specify a distribution over that set, in order to
> talk about the entropy.

False. I do not need to know the "exact" probabilities to know that
the entropy will be *nonzero*. "Square wave" in this context meant the
set of parameters representing the square wave in this model, which
has some probability distribution. I do not need to know the _actual_
probability distribution to know that the entropy will be *nonzero*,
unless it's the very special corner case of 100% probability of this
particular square wave with this particular set of parameters, and 0%
probability of *all* other square waves. Which is a very, very
unlikely scenario (and your transmitter will be very dumb and quite
unusable, since it cannot transmit any square wave other than this
single one).

In other words, -p*log2(p) is certain to be nonzero for any p strictly
between 0 and 1 (p = 1 being the single corner case of 100%
probability of this particular square wave). Hence, the entropy
H = -K * SUM p_i * log2(p_i) is certain to be nonzero as soon as *any*
of its terms is nonzero, which happens whenever at least two parameter
sets have nonzero probability (i.e. no p_i equals 1). Said otherwise,
if your transmitter is able to transmit at least *two* sets of
parameters with nonzero probability, then the entropy of the
parameters is bound to be nonzero. I don't know what it actually is, I
just know that it's nonzero.

Think of it this way: if your receiver can distinguish only two
different sets of parameters, then you need to send at least *one* bit
to distinguish between them: '0' meaning square wave "A", and '1'
meaning square wave "B". Without sending at least a *single* bit, your
receiver cannot distinguish between square waves A and B. The same
goes for more sets of parameters: I don't care or know how many bits
you minimally need to send to uniquely distinguish between them, I
just know that it is *certain* to be more than zero bits (at least
*one*, hence a nonzero number of bits, hence nonzero entropy).
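
A toy version of that, assuming (hypothetically) that the receiver
already knows all candidate parameter sets and only needs to be told
which one was meant: with n >= 2 equally likely candidates, a plain
fixed-length code needs ceil(log2(n)) bits, and the entropy log2(n) is
never zero either.

    from math import ceil, log2

    # Hypothetical candidate square-wave parameter sets the receiver knows.
    candidates = [
        {"amplitude": 100, "phase": 30, "frequency": 1000, "duty": 0.75},  # "A"
        {"amplitude": 100, "phase": 30, "frequency": 2000, "duty": 0.50},  # "B"
    ]

    # Telling the receiver which candidate was meant takes at least one bit:
    # '0' selects A, '1' selects B.
    for n in (len(candidates), 4, 100):
        print(n, "candidates ->", ceil(log2(n)), "bit(s) at minimum")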

>>If you assume the entropy _rate_ to be the average entropy per bits
>
> What is "per bits?" You mean "per symbol" or something?

Of course. A bit is a symbol. If you prefer, call it "entropy per
symbol" instead.

> The definition of entropy rate is the limit of the conditional entropy
> H(X_n|X_{n-1},X_{n-2},...) as n goes to infinity.

This particular lecture says it has multiple definitions:
http://www2.isye.gatech.edu/~yxie77/ece587/Lecture6.pdf

Definition 1: average entropy per symbol
H(X) = lim [n->inf] H(X^n)/n

Reference:
Dr. Yao Xie, ECE587, Information Theory, Duke University

If you disagree, please discuss it with Dr. Xie.

Definition 2 is the same as yours.
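
For what it's worth, the two definitions agree for stationary sources.
A small sketch with a two-state Markov chain (transition probabilities
picked arbitrarily): the per-symbol block entropy H(X^n)/n from
Definition 1 converges to the conditional entropy from Definition 2.

    from math import log2

    def h(probs):
        # entropy of a distribution, in bits
        return -sum(p * log2(p) for p in probs if p > 0)

    # Two-state Markov chain: P[i][j] = probability of going from state i to j.
    P = [[0.9, 0.1],
         [0.4, 0.6]]

    # Stationary distribution (solves pi = pi * P for the 2-state case).
    pi0 = P[1][0] / (P[0][1] + P[1][0])     # = 0.8
    pi = [pi0, 1 - pi0]

    # Definition 2: conditional entropy H(X_n | X_{n-1}) in the stationary regime.
    H_cond = sum(pi[i] * h(P[i]) for i in range(2))

    # Definition 1: H(X^n)/n, using H(X^n) = H(X_1) + (n-1) * H(X_2 | X_1)
    # for a stationary Markov chain.
    for n in (1, 10, 1000):
        print("n =", n, " H(X^n)/n =", round((h(pi) + (n - 1) * H_cond) / n, 4),
              " H_cond =", round(H_cond, 4))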

> You were doing okay earlier in this thread but seem to be
> getting into muddier and muddier waters as it proceeds, and the resulting
> confusion seems to be provoking some unpleasantly combative behavior from
> you.

I successfully made an algorithm that gives ~1 for white noise and a
value approaching zero for periodic signals. And you're constantly
telling me that I am wrong (without even understanding what I am
saying). What behaviour did you expect? Look at the above formula, and
try to understand what it means.
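
For illustration, here is one well-known normalized measure with that
kind of behaviour (not necessarily the same as my estimator): spectral
entropy, which comes out close to 1 for white noise and close to 0 for
a periodic waveform such as a square wave.

    import numpy as np

    def normalized_spectral_entropy(x):
        # Entropy of the normalized power spectrum, divided by its maximum
        # possible value, so the result lies in [0, 1].
        spectrum = np.abs(np.fft.rfft(x)) ** 2
        spectrum = spectrum[1:]                  # drop the DC bin
        p = spectrum / spectrum.sum()            # treat the spectrum as a distribution
        p = p[p > 0]
        return -np.sum(p * np.log2(p)) / np.log2(len(spectrum))

    rng = np.random.default_rng(0)
    n = 65536
    noise = rng.uniform(-1.0, 1.0, n)                      # white noise
    square = np.where(np.arange(n) % 64 < 32, 1.0, -1.0)   # periodic square wave

    print(normalized_spectral_entropy(noise))    # close to 1
    print(normalized_spectral_entropy(square))   # close to 0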

-P

Ref.: [Shannon1948] Shannon, Claude E. (1948). "A Mathematical Theory of
Communication". Bell System Technical Journal 27: 379-423, 623-656.