G'day all.

Quoting Bulat Ziganshin <[EMAIL PROTECTED]>:

> it seems that you either misunderstood PPM or make too wide
> assumptions.

More likely I'm not explaining myself well.

> in classical fixed-order ppm each char probability
> found in context of previous N chars. i.e. when encoding abcdef..
> with order 3, 'd' encoded in context of 'abc', 'e' encoded in context
> of 'bcd' and so on

Let's suppose that "abcdef" is a context in PPM, and we're starting from
the "start" state (which usually isn't the case, of course).  Then to
encode it, you encode "a" with the probability of finding an "a" in the
start state, followed by "b" with the probability of finding a "b" in
the "a" state, followed by "c" with the probability of finding a "c" in
the "ab" state, and so on.

Encoding these symbols one after another has the same effect as
encoding a single symbol whose probability is the product of the
individual probabilities.  Or, if you like, it's the conditional
probability that a string encoded from the start state begins with
"abcdef", taken as a proportion of _all_ possible encoded strings.

    Pr("abcdef"|"") = Pr("a"|"") * Pr("b"|"a") * Pr("c"|"ab" ) * ...

LZW uses a simpler model, estimating Pr("abcdef"|S) as 1/N for any
state S, where N is the number of "states" (dictionary entries) seen
so far.
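
To make the cost concrete, a phrase coded at probability 1/N takes
log2 N bits.  Here's a rough sketch of the total output size under
that model, assuming (hypothetically) a dictionary seeded with 256
single-byte entries that grows by one entry per phrase:

    -- Total bits to code the given number of LZW phrases: phrase k
    -- is chosen from a dictionary of 256+k-1 entries, so it costs
    -- log2 (256+k-1) bits at probability 1/N.
    lzwBits :: Int -> Double
    lzwBits phrases =
        sum [ logBase 2 (fromIntegral n) | n <- take phrases [256..] ]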

Cheers,
Andrew Bromage
