Re: [Jprogramming] zlib huffman coding

bill lam Wed, 10 Sep 2014 20:10:43 -0700

I guess the description of the algorithm itself is correct, but
it does not mention how to get the bits of each symbol. The bit
lengths for classic and zlib huffman can be different.


for testing, the bits for A from libz.so are

   ":&>_10<\bits
10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10
10 10 10 11 11 10 10 10 10 11
11 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 10 2 0 0 0 0
0 0 0 0 0 1

this is the huffman dictionary stored in zlib stream and the actual 
huffman codes are computed using the algorithm mentioned during decoding.

Ср, 10 сен 2014, Raul Miller написал(а):
> I agree that there should not be any 2s.
> 
> However, I think this means that there's something wrong with the
> description of the algorithm, or at least my understanding of it.
> 
> I'll need some time to digest this.
> 
> Thanks,
> 
> -- 
> Raul
> 
> On Wed, Sep 10, 2014 at 9:39 PM, bill lam <[email protected]> wrote:
> > pardon me of forgetting telling another cavaet.
> > zlib huffman code is suboptimal so that bit length of code
> > for each symbol can be longer than that in classic huffman.
> > eg in my orignal data.
> >
> >    10{. ":@>F deflatecodes2 A
> > 1 1 0 0 0 0 0 0 0 0
> > 1 1 1 1 1 1 1 1 1 1 0
> > 1 1 1 1 1 1 1 1 1 1 1
> > 1 1 1 1 1 1 1 1 1 2 0
> > 1 1 1 1 1 1 1 1 1 2 1
> > 1 1 1 1 1 1 1 1 2 1 0
> > 1 1 0 0 0 0 0 0 0 1
> > 1 1 0 0 0 0 0 0 1 0
> > 1 1 0 0 0 0 0 0 1 1
> > 1 1 0 0 0 0 0 1 0 0
> >
> > there shouldn't be any 2's.  It needs to increase bit length of
> > some code.
> >
> > Ср, 10 сен 2014, Raul Miller написал(а):
> >> Oops, sorry about that.
> >>
> >> Try it this way:
> >>
> >> deflatecodes2=:4 :0
> >>   L=. #@> x hcodes y
> >>   U=. 0,~.L
> >>   R=. ;<@(({. >./@(>#])&U #1:)@{. <@:+"1-@{.{."1 #:@i.@#)/.~L
> >>   R/:;(</. i.@#)L
> >> )
> >>
> >>    ":@>F1 deflatecodes2 A1
> >> 0
> >> 1 1 0
> >> 1 1 1 1 1 1 0
> >> 1 1 1 1 1 1 1
> >> 1 1 1 0 0 0
> >> 1 1 1 0 0 1
> >> 1 1 1 0 1 0
> >> 1 1 1 0 1 1
> >> 1 1 1 1 0 0
> >> 1 1 1 1 0 1
> >> 1 1 1 1 1 0
> >> 1 0
> >>
> >> Thanks,
> >>
> >> --
> >> Raul
> >>
> >>
> >> On Wed, Sep 10, 2014 at 7:26 PM, bill lam <[email protected]> wrote:
> >> > for zlib, all huffman code of the same bit length are in sequential,
> >> > starting with a code an 0 bit in the end (even number), and no gaps in 
> >> > the
> >> > block (consecutive). the symbols of these block represented are 
> >> > sequential
> >> > but not consecutive (possibly gaps). So the zlib huffman code is slightly
> >> > less efficient than the original huffman code but the advantage is 
> >> > simpler
> >> > to store the table, just the bits used. section 3.2.6 gives an example of
> >> > such table.
> >> >
> >> > deflatecodes can satisfy the 2 rules _but_ its result is invalid because
> >> > huffman code is a prefix coding.
> >> >
> >> >    ,.F1 ( deflatecodes)  A1
> >> > ┌─────────────┐
> >> > │0            │
> >> > ├─────────────┤
> >> > │1 1 0        │
> >> > ├─────────────┤
> >> > │1 1 1 0 0 1 0│
> >> > ├─────────────┤
> >> > │1 1 1 0 0 1 1│
> >> > ├─────────────┤
> >> > │1 1 1 0 0 0  │
> >> > ├─────────────┤
> >> > │1 1 1 0 0 1  │
> >> > ├─────────────┤
> >> > │1 1 1 0 1 0  │
> >> > ├─────────────┤
> >> > │1 1 1 0 1 1  │
> >> > ├─────────────┤
> >> > │1 1 1 1 0 0  │
> >> > ├─────────────┤
> >> > │1 1 1 1 0 1  │
> >> > ├─────────────┤
> >> > │1 1 1 1 1 0  │
> >> > ├─────────────┤
> >> > │1 0          │
> >> > └─────────────┘
> >> >   7 bit 1110010 is a prefix of 6 bit 111001 and this is illegal.
> >> > Instead 7 bit code block should start with
> >> > #.inv 2b11100 +  7
> >> > 1 1 1 1 1 1 0
> >> > because there are 6 symbols in the 6 bit code.  (algorithm given in 
> >> > section
> >> > 3.2.2).
> >> >
> >> > Any idea to fix deflatecodes so that it can give valid huffman codes?
> >> > Thanks.
> >> > On Sep 11, 2014 1:01 AM, "Raul Miller" <[email protected]> wrote:
> >> >
> >> >> I think the use of the term "consecutive" rather than "sequential" is
> >> >> telling.
> >> >>
> >> >> The described algorithm is: compute the huffman code lengths:
> >> >>    #@>F1 hcodes A1
> >> >> 1 3 7 7 6 6 6 6 6 6 6 2
> >> >>
> >> >> Then assign ascending huffman codes first in length order and then
> >> >> within codes of the same length.
> >> >>
> >> >> Taken literally, that might be something like this:
> >> >>
> >> >> H=: 4 :0
> >> >>   L=.#@> x hcodes y
> >> >>   U=.~.L
> >> >>   ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L
> >> >> )
> >> >>
> >> >>    ":@>F1 H A1
> >> >> 0
> >> >> 1 1 0
> >> >> 1 1 1 0 0 1 0
> >> >> 1 1 1 0 0 1 1
> >> >> 1 1 1 0 0 0
> >> >> 1 1 1 0 0 1
> >> >> 1 1 1 0 1 0
> >> >> 1 1 1 0 1 1
> >> >> 1 1 1 1 0 0
> >> >> 1 1 1 1 0 1
> >> >> 1 1 1 1 1 0
> >> >> 1 0
> >> >>
> >> >> But is this correct? Is it actually safe to leave the results like
> >> >> this - with all codes of the same length being consecutive to each
> >> >> other?
> >> >>
> >> >>    F (hcodes -:&:(#@>) H) A
> >> >> 0
> >> >>
> >> >> No.
> >> >>
> >> >> So... "consecutive" must refer only to the values used and not their
> >> >> order within the result.
> >> >>
> >> >> Perhaps something like this:
> >> >>
> >> >> deflatecodes=:4 :0
> >> >>   L=.#@> x hcodes y
> >> >>   U=.~.L
> >> >>   R=. ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L
> >> >>   R/:;(</. i.@#)L
> >> >> )
> >> >>
> >> >>    F (hcodes -:&:(#@>) deflatecodes)  A
> >> >> 1
> >> >>
> >> >> There should be a better way of doing this, but this should at least
> >> >> get you started.
> >> >>
> >> >> Thanks,
> >> >>
> >> >> --
> >> >> Raul
> >> >>
> >> >>
> >> >> On Wed, Sep 10, 2014 at 10:45 AM, bill lam <[email protected]> wrote:
> >> >> > For huffman coding used in zlib:
> >> >> > https://www.ietf.org/rfc/rfc1951.txt section 3.2.2.
> >> >> >
> >> >> >  The Huffman codes used for each alphabet in the "deflate"
> >> >> >  format have two additional rules:
> >> >> >
> >> >> >   * All codes of a given bit length have lexicographically
> >> >> >   consecutive values, in the same order as the symbols
> >> >> >   they represent;
> >> >> >
> >> >> >   * Shorter codes lexicographically precede longer codes.
> >> >> > I tried jwiki hcodes in
> >> >> > I try Roger's essay
> >> >> > http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding
> >> >> >
> >> >> > hc=: 4 : 0
> >> >> > if. 1=#x do. y
> >> >> > else. ((i{x),+/j{x) hc (i{y),<j{y [ i=. (i.#x) -. j=. 2{./:x end.
> >> >> > )
> >> >> >
> >> >> > hcodes=: 4 : 0
> >> >> > assert. x -:&$ y           NB. weights and words have same shape
> >> >> > assert. (0<:x) *. 1=#$x    NB. weights are non-negative
> >> >> > assert. 1 >: L.y           NB. words are boxed not more than once
> >> >> > w=. ,&.> y                 NB. standardized words
> >> >> > assert. w -: ~.w           NB. words are unique
> >> >> > t=. 0 {:: x hc w           NB. minimal weight binary tree
> >> >> > ((< S: 0 t) i. w) { <@(1&=)@; S: 1 {:: t
> >> >> > )
> >> >> >
> >> >> > but the coding produced is malformed for zlib. eg,
> >> >> > this is what I ran into trouble
> >> >> >
> >> >> > f1=: 1 256 17 1 1 9 1
> >> >> > f2=: 2 1 0 1 255 0 1536
> >> >> > F=: ,/(f1#f2)
> >> >> > A=: i.286
> >> >> >
> >> >> > F hcodes A
> >> >> >
> >> >> > Or a shorter example
> >> >> >
> >> >> > A1=: i.12
> >> >> > F1=: 2 1 0 0 0 0 0 0 0 0 0 1
> >> >> >
> >> >> > F1 hcodes A1
> >> >> >
> >> >> > Any idea?
> >> >> >
> >> >> > --
> >> >> > regards,
> >> >> > ====================================================
> >> >> > GPG key 1024D/4434BAB3 2008-08-24
> >> >> > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> >> >> > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> >> >> > ----------------------------------------------------------------------
> >> >> > For information about J forums see http://www.jsoftware.com/forums.htm
> >> >> ----------------------------------------------------------------------
> >> >> For information about J forums see http://www.jsoftware.com/forums.htm
> >> >>
> >> > ----------------------------------------------------------------------
> >> > For information about J forums see http://www.jsoftware.com/forums.htm
> >> ----------------------------------------------------------------------
> >> For information about J forums see http://www.jsoftware.com/forums.htm
> >
> > --
> > regards,
> > ====================================================
> > GPG key 1024D/4434BAB3 2008-08-24
> > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm

-- 
regards,
====================================================
GPG key 1024D/4434BAB3 2008-08-24
gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] zlib huffman coding

Reply via email to