>False. The better estimators give an estimate that approaches zero.
>
>% set pattern [randbits [randnum 20]]; puts pattern=$pattern; for {set i 1}
>{$i<=10} {incr i} {put "L=$i, "; measure [repeat $pattern 10000] $i}
>pattern=1000110011011010
>L=1, Estimated entropy per bit: 1.000000
>L=2, Estimated entropy per bit: 1.023263
>L=3, Estimated entropy per bit: 0.843542
>L=4, Estimated entropy per bit: 0.615876
>L=5, Estimated entropy per bit: 0.337507
>L=6, Estimated entropy per bit: 0.166667
>L=7, Estimated entropy per bit: 0.071429
>L=8, Estimated entropy per bit: 0.031250
>L=9, Estimated entropy per bit: 0.013889
>L=10, Estimated entropy per bit: 0.006250

What are these better estimators? It seems that you have several estimators
in mind but I can't keep track of what they all are, apart from the
bit-flip counter one. I urge you to slow down, collect your thoughts, and
spend a bit more time editing your posts for clarity (and length).

And what is "entropy per bit"? Entropy is measured in bits in the first
place. Did you mean "entropy per symbol" or something?

>I have an organ called "brain" that can extrapolate from data, and
>make predictions.

Maybe you could try using this "brain" to interact in a good-faith way.

>Entropy != entropy rate. The shape of the waveform itself has some entropy.
>Again, entropy != entropy rate. I wrote "entropy" not "entropy rate".

The "entropy" of a signal - as opposed to entropy rate - is not a
well-defined quantity, generally speaking. The standard quantity of
interest in the signal context is entropy rate, and so - in the interest of
good-faith engagement - I have assumed that is what you were referring to.
If you want to talk about "signal entropy," distinct from the entropy rate,
then you need to do some additional work to specify what you mean by that.
I.e., you need to set up a distribution over the space of possible signals.
Switching back and forth between talking about narrow parametric classes of
signals (like rectangular waves) and general signals is going to lead to
confusion, and it is on you to keep track of that and write clearly.

Moreover, the (marginal) entropy of (classes of) signals is generally not
an interesting quantity in the first place. Signals of practical interest
for telecommunications are all going to have infinite "entropy," generally
speaking, because they are infinitely long and keep producing new
surprises. Hence the emphasis on entropy rate, which distinguishes between
signals that require a lot of bandwidth to transmit and those that don't.

>But not zero _entropy_. The parameters (amplitude, phase, frequency,
>waveform shape) themselves have some entropy - you *do* need to
>transmit those parameters to be able to reconstruct the signal.

Again, that all depends on assumptions on the distribution of the
parameters. You haven't specified anything like that, so these assertions
are not even wrong. They simply beg the question of what your signal space
is and what distribution you're putting on it.

>I thought you have a problem with getting a nonzero estimate, as you
>said that an estimator that gives nonzero result for a periodic
>waveform is a "poor" estimator.

Again, the important distinction is between estimators that don't hit the
correct answer even theoretically (this indicates that the underlying
algorithm is inadequate) and the inevitable imperfections of an actual
numerical implementation (which can be made as small as one likes by
throwing more computational resources at them). These are quite different
issues - the latter is a matter of trading off resource constraints for
performance, the former is a fundamental algorithmic limitation that no
amount of implementation resources can ever address.
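
To make that distinction concrete, here's a toy sketch (my own illustration
in Python, not any of the estimators you've posted) of a periodicity-aware
description scheme for a 1-bit sequence: send one cycle verbatim plus the
total length. Its per-sample cost goes to zero on periodic input as the
record grows, so any residual you'd see from a real implementation of the
idea is a precision/resource matter, not an algorithmic one.

from math import log2

def smallest_period(bits):
    # Length of the shortest cycle that reproduces the whole sequence.
    n = len(bits)
    for p in range(1, n + 1):
        if all(bits[i] == bits[i % p] for i in range(n)):
            return p
    return n

def per_sample_cost(bits):
    # "One cycle verbatim plus the total length", in bits per sample.
    n = len(bits)
    return (smallest_period(bits) + log2(n)) / n

pattern = [1,0,0,0,1,1,0,0,1,1,0,1,1,0,1,0]   # the pattern from your test
for reps in (10, 100, 1000, 10000):
    print(reps, per_sample_cost(pattern * reps))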

>Let's imagine you want to transmit a square wave with amplitude=100,
>phase=30, frequency=1000, and duty cycle=75%.
>Question: what is the 'entropy' of this square wave?

What is the distribution over the space of those parameters? The entropy is
a function of that distribution, so without specifying it there's no
answer.

Said another way: if that's the only signal I want to transmit, then I'll
just build those parameters into the receiver and not bother transmitting
anything. The entropy will be zero. The receiver will simply output the
desired waveform without any help from the other end. The question of
entropy only comes up if there is some larger class of signals, containing
this one, that I want to handle.
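
For instance - and these grids are purely assumptions of mine, not anything
implied by the waveform itself - if the four parameters are independent and
each uniform over some finite set of values, the entropy of the parameter
set is just the sum of log2 of the set sizes, paid once:

from math import log2

# Assumed (arbitrary) parameter grids: 256 amplitude levels, 360 phase
# steps, 1000 frequencies, 100 duty cycles, independent and uniform.
grid_sizes = {"amplitude": 256, "phase": 360, "frequency": 1000, "duty": 100}

print(sum(log2(n) for n in grid_sizes.values()))   # about 33.1 bits, total

Assume a different distribution - say, frequencies clustered around 1 kHz -
and you get a different number. That's the sense in which "the entropy of
this square wave" is undefined until the distribution is pinned down.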

>a square wave has _nonzero_
>entropy, for all possible sets of parameters.

Again, this is begging the question. "A square wave" is not a distribution,
and so doesn't have "entropy." You need to specify what is the possible set
of parameters, and then specify a distribution over that set, in order to
talk about the entropy.

>If you assume the entropy _rate_ to be the average entropy per bits

What is "per bits?" You mean "per symbol" or something?

The definition of entropy rate is the limit of the conditional entropy
H(X_n|X_{n-1},X_{n-2},...) as n goes to infinity.

The average rate that some actual communication system uses to send some
particular signal is not the same thing as the entropy rate. The entropy
rate is a *lower bound* on that number. To calculate the entropy rate, you
need to figure out the conditional distribution of a given sample
conditioned on all previous samples, and then look at how the entropy of
that conditional distribution behaves asymptotically. In signal processing
terms, that corresponds to building an ideal signal predictor (with
potentially infinite memory, complexity, etc.) and then looking at the
entropy of the residual. The average rate produced by some actual coding
system is an *upper bound* on the entropy rate of the random process in
question.
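
If you want to see that definition in action on a 1-bit sequence, here's
another toy sketch (again mine, not one of the estimators in this thread)
that estimates H(X_n | previous L samples) from empirical block
frequencies, as the entropy of (L+1)-blocks minus the entropy of L-blocks.
On periodic input it falls toward zero once L is long enough to pin down
the phase; on coin flips it stays near 1 bit per sample.

from collections import Counter
from math import log2
import random

def block_entropy(bits, L):
    # Empirical entropy of length-L blocks, in bits.
    counts = Counter(tuple(bits[i:i+L]) for i in range(len(bits) - L + 1))
    total = sum(counts.values())
    return -sum(c/total * log2(c/total) for c in counts.values())

def conditional_entropy(bits, L):
    # Estimate of H(next sample | previous L samples).
    return block_entropy(bits, L + 1) - block_entropy(bits, L)

periodic = [1,0,0,0,1,1,0,0,1,1,0,1,1,0,1,0] * 10000
noise = [random.randint(0, 1) for _ in range(len(periodic))]
for L in range(1, 11):
    print(L, conditional_entropy(periodic, L), conditional_entropy(noise, L))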

Again, I encourage you to slow the pace of your replies and instead try to
write fewer, more concise posts with greater emphasis on clarity and
precision. You were doing okay earlier in this thread but seem to be
getting into muddier and muddier waters as it proceeds, and the resulting
confusion seems to be provoking some unpleasantly combative behavior from
you.

E

On Thu, Jul 16, 2015 at 12:50 PM, Peter S <peter.schoffhau...@gmail.com>
wrote:

> On 16/07/2015, Ethan Duni <ethan.d...@gmail.com> wrote:
> > But, it seems that it does *not* approach zero. If you fed an arbitrarily
> > long periodic waveform into this estimator, you won't see the estimate
> > approaching zero as you increase the length.
>
> False. The better estimators give an estimate that approaches zero.
>
> % set pattern [randbits [randnum 20]]; puts pattern=$pattern; for {set i 1}
> {$i<=10} {incr i} {put "L=$i, "; measure [repeat $pattern 10000] $i}
> pattern=1000110011011010
> L=1, Estimated entropy per bit: 1.000000
> L=2, Estimated entropy per bit: 1.023263
> L=3, Estimated entropy per bit: 0.843542
> L=4, Estimated entropy per bit: 0.615876
> L=5, Estimated entropy per bit: 0.337507
> L=6, Estimated entropy per bit: 0.166667
> L=7, Estimated entropy per bit: 0.071429
> L=8, Estimated entropy per bit: 0.031250
> L=9, Estimated entropy per bit: 0.013889
> L=10, Estimated entropy per bit: 0.006250
>
> It seems that the series approaches zero with increasing length.
> I can repeat it arbitrary times with an arbitrary repeated waveform:
> http://morpheus.spectralhead.com/entropy/random-pattern-tests.txt
>
> Longer patterns (up to cycle length 100):
> http://morpheus.spectralhead.com/entropy/random-pattern-tests-100.txt
>
> For comparison, same tests repeated on white noise:
> http://morpheus.spectralhead.com/entropy/white-noise-tests.txt
>
> The numbers do not lie.
>
> > Also you are only able to deal
> > with 1-bit waveforms, I don't see how you can make any claims about
> general
> > periodic waveforms with this.
>
> I have an organ called "brain" that can extrapolate from data, and
> make predictions.
>
> >>"Random" periodic shapes give somewhat higher entropy estimate than
> "pure"
> >>waves like square, so there's somewhat more entropy in it.
> >
> > No, all periodic signals have exactly zero entropy rate.
>
> Entropy != entropy rate. The shape of the waveform itself has some entropy.
>
> > The correct statement is that your estimator does an even worse job on
> > complicated periodic waveforms, than it does on ones like square waves.
> > This is because it's counting transitions, and not actually looking for
> > periodicity.
>
> The bitflip estimator, yes. The pattern matching estimator, no.
>
> > Again, periodic waveforms have exactly zero entropy rate.
>
> Again, entropy != entropy rate. I wrote "entropy" not "entropy rate".
> The shape of the waveform itself contains some information, doesn't
> it? To be able to transmit a waveform of arbitrary shape, you need to
> transmit the shape somehow at least *once*, don't you think? Otherwise
> you cannot reconstruct it... Hence, any periodic waveform has nonzero
> _entropy_. The more complicated the waveform shape is, the more
> entropy it has.
>
> > If there is no randomness, then there is no entropy.
>
> Entirely false. The entropy of the waveform shape is nonzero.
> Do not confuse "entropy" with "entropy rate".
>
> Even a constant signal has "entropy". Unless it's zero, you need to
> transmit the constant value at least *once* to be able to reconstruct
> it. Otherwise, how do you know what to reconstruct, if you transmit
> *nothing*?
>
> > This is the reason that parameter estimation is coming up - if a
> > signal can be described by a finite set of parameters (amplitude, phase
> and
> > frequency, say) then it immediately follows that it has zero entropy
> rate.
>
> But not zero _entropy_. The parameters (amplitude, phase, frequency,
> waveform shape) themselves have some entropy - you *do* need to
> transmit those parameters to be able to reconstruct the signal. Which
> definitely do have _nonzero_ entropy. If you do not transmit anything
> (= zero entropy), you cannot reconstruct the waveform. Hence, any
> waveform has nonzero entropy, and "entropy" != "entropy rate". If you
> did not receive anything at the receiver (zero entropy), what will you
> reconstruct?
>
> > Your estimator is itself a crude predictor:
>
> Which estimator? I made at least 4 different estimators so far, some
> more crude, some more sophisticated.
>
> > But since it is such a crude predictor it can't account for
> > obvious, deterministic behavior such as periodic square waves.
>
> The better estimators can.
>
> > Depends on the audio codec. High performance ones actually don't do that,
> > for a couple of reasons. One is that general audio signals don't really
> > have a reliable time structure to exploit, so you're actually better off
> > exploiting psychoacoustics.
>
> Are you speaking about lossy, or lossless codecs?
> Lossless codecs like FLAC and WavPack certainly do that.
>
> > Another is that introducing such
> > dependencies means that any packet loss/corruption results in error
> > propagation.
>
> And how is error propagation from "packet corruption" anyhow relevant
> to entropy? That is an entirely unrelated property of codecs. (And
> besides, they usually work on frames and include a CRC32 or MD5 hash
> for the frame, so if a frame is corrupted, you can trivially
> retransmit that frame.)
>
> > Signal predictors are a significant part of speech codecs
> > (where the fact that you're dealing with human speech signals gives you
> > some reliable assumptions to exploit) and in low performance audio codecs
> > (ADPCM)
>
> And high performance audio codecs as well, see FLAC, WavPack, etc.
>
> > where the overall system is simple enough that introducing some
> > adaptive prediction doesn't cause too many troubles.
>
> Actually these codecs use "simple" prediction systems.
> (Usually some simple adaptive linear predictor.)
>
> > Getting to *exactly* zero is kind of nit-picky. An estimation error on
> the
> > order of, say, 10^-5 bits/second is as good as zero, since it's saying
> you
> > only have to send 1 bit every few hours.
>
> I thought you have a problem with getting a nonzero estimate, as you
> said that an estimator that gives nonzero result for a periodic
> waveform is a "poor" estimator.
>
> But as I suggested, you can hard-code an arbitrary threshold into your
> algorithm that says "close enough". You can put that threshold at
> 10^-5 bits/second, if that's your personal preference.
>
> > The reason the non-zero estimates you're getting from your estimator are
> a
> > problem is that they are artifacts of the estimation strategy - not the
> > numerical precision
>
> Actually, both. You cannot construct an estimator that is free of
> noise from numerical and sampling artifacts. Unless the period length
> is integer, two cycles will never be bit-identical, however good
> estimation strategy you make. Hence, you cannot determine with 100%
> certainty if that's the *same* cycle repeated, or a slightly
> different cycle, containing new information (and thus, entropy). How
> would you determine?
>
> -P
--
dupswapdrop -- the music-dsp mailing list and website:
subscription info, FAQ, source code archive, list archive, book reviews, dsp 
links
http://music.columbia.edu/cmc/music-dsp
http://music.columbia.edu/mailman/listinfo/music-dsp
