Ah, good point. Here's another thing:
+/ bits < #@> F hcodes A 31 In some cases the deflate codes have fewer bits than huffman coding. So the problem is finding the algorithm that was used to define bits - it's not the huffman coding algorithm. I guess I need to spend some time reading rfc 1951. Thanks, -- Raul On Wed, Sep 10, 2014 at 11:41 PM, bill lam <[email protected]> wrote: > It refered to A and F (not A1 and F1) in my first email. > > I sent you an email off-list that included attachments. > > > Ср, 10 сен 2014, Raul Miller написал(а): > > Something is wrong here: > > > > $bits > > 286 > > (0=bits)-:0=F1 > > 0 > > +/0=F1 > > 9 > > +/0=bits > > 26 > > > > 0 bits should mean that that symbol is not represented. > > > > But there are only 9 such symbols in F1. > > > > -- > > Raul > > > > On Wed, Sep 10, 2014 at 11:09 PM, bill lam <[email protected]> wrote: > > > I guess the description of the algorithm itself is correct, but > > > it does not mention how to get the bits of each symbol. The bit > > > lengths for classic and zlib huffman can be different. > > > > > > for testing, the bits for A from libz.so are > > > > > > ":&>_10<\bits > > > 10 10 10 10 10 10 10 10 10 10 > > > 10 10 10 10 10 10 10 10 10 10 > > > 10 10 10 10 10 10 10 10 10 10 > > > 10 10 10 10 10 10 10 10 10 10 > > > 10 10 10 10 10 10 10 10 10 10 > > > 10 10 10 10 10 10 10 10 10 10 > > > 10 10 10 10 10 10 10 10 10 10 > > > 10 10 10 10 10 10 10 10 10 10 > > > 10 10 10 10 10 10 10 10 10 10 > > > 10 10 10 10 10 10 10 10 10 10 > > > 10 10 10 10 10 10 10 10 10 10 > > > 10 10 10 10 10 10 10 10 10 10 > > > 10 10 10 10 10 10 10 10 10 10 > > > 10 10 10 10 10 10 10 10 10 10 > > > 10 10 10 11 11 10 10 10 10 11 > > > 11 10 10 10 10 10 10 10 10 10 > > > 10 10 10 10 10 10 10 10 10 10 > > > 10 10 10 10 10 10 10 10 10 10 > > > 10 10 10 10 10 10 10 10 10 10 > > > 10 10 10 10 10 10 10 10 10 10 > > > 10 10 10 10 10 10 10 10 10 10 > > > 10 10 10 10 10 10 10 10 10 10 > > > 10 10 10 10 10 10 10 10 10 10 > > > 10 10 10 10 10 10 10 10 10 10 > > > 10 10 10 10 10 10 10 10 10 10 > > > 10 10 10 10 10 10 10 0 0 0 > > > 0 0 0 0 0 0 0 0 0 0 > > > 0 0 0 0 10 2 0 0 0 0 > > > 0 0 0 0 0 1 > > > > > > this is the huffman dictionary stored in zlib stream and the actual > > > huffman codes are computed using the algorithm mentioned during > decoding. > > > > > > Ср, 10 сен 2014, Raul Miller написал(а): > > >> I agree that there should not be any 2s. > > >> > > >> However, I think this means that there's something wrong with the > > >> description of the algorithm, or at least my understanding of it. > > >> > > >> I'll need some time to digest this. > > >> > > >> Thanks, > > >> > > >> -- > > >> Raul > > >> > > >> On Wed, Sep 10, 2014 at 9:39 PM, bill lam <[email protected]> > wrote: > > >> > pardon me of forgetting telling another cavaet. > > >> > zlib huffman code is suboptimal so that bit length of code > > >> > for each symbol can be longer than that in classic huffman. > > >> > eg in my orignal data. > > >> > > > >> > 10{. ":@>F deflatecodes2 A > > >> > 1 1 0 0 0 0 0 0 0 0 > > >> > 1 1 1 1 1 1 1 1 1 1 0 > > >> > 1 1 1 1 1 1 1 1 1 1 1 > > >> > 1 1 1 1 1 1 1 1 1 2 0 > > >> > 1 1 1 1 1 1 1 1 1 2 1 > > >> > 1 1 1 1 1 1 1 1 2 1 0 > > >> > 1 1 0 0 0 0 0 0 0 1 > > >> > 1 1 0 0 0 0 0 0 1 0 > > >> > 1 1 0 0 0 0 0 0 1 1 > > >> > 1 1 0 0 0 0 0 1 0 0 > > >> > > > >> > there shouldn't be any 2's. It needs to increase bit length of > > >> > some code. > > >> > > > >> > Ср, 10 сен 2014, Raul Miller написал(а): > > >> >> Oops, sorry about that. > > >> >> > > >> >> Try it this way: > > >> >> > > >> >> deflatecodes2=:4 :0 > > >> >> L=. #@> x hcodes y > > >> >> U=. 0,~.L > > >> >> R=. ;<@(({. >./@(>#])&U #1:)@{. <@:+"1-@{.{."1 #:@i.@#)/.~L > > >> >> R/:;(</. i.@#)L > > >> >> ) > > >> >> > > >> >> ":@>F1 deflatecodes2 A1 > > >> >> 0 > > >> >> 1 1 0 > > >> >> 1 1 1 1 1 1 0 > > >> >> 1 1 1 1 1 1 1 > > >> >> 1 1 1 0 0 0 > > >> >> 1 1 1 0 0 1 > > >> >> 1 1 1 0 1 0 > > >> >> 1 1 1 0 1 1 > > >> >> 1 1 1 1 0 0 > > >> >> 1 1 1 1 0 1 > > >> >> 1 1 1 1 1 0 > > >> >> 1 0 > > >> >> > > >> >> Thanks, > > >> >> > > >> >> -- > > >> >> Raul > > >> >> > > >> >> > > >> >> On Wed, Sep 10, 2014 at 7:26 PM, bill lam <[email protected]> > wrote: > > >> >> > for zlib, all huffman code of the same bit length are in > sequential, > > >> >> > starting with a code an 0 bit in the end (even number), and no > gaps in the > > >> >> > block (consecutive). the symbols of these block represented are > sequential > > >> >> > but not consecutive (possibly gaps). So the zlib huffman code is > slightly > > >> >> > less efficient than the original huffman code but the advantage > is simpler > > >> >> > to store the table, just the bits used. section 3.2.6 gives an > example of > > >> >> > such table. > > >> >> > > > >> >> > deflatecodes can satisfy the 2 rules _but_ its result is invalid > because > > >> >> > huffman code is a prefix coding. > > >> >> > > > >> >> > ,.F1 ( deflatecodes) A1 > > >> >> > ┌─────────────┐ > > >> >> > │0 │ > > >> >> > ├─────────────┤ > > >> >> > │1 1 0 │ > > >> >> > ├─────────────┤ > > >> >> > │1 1 1 0 0 1 0│ > > >> >> > ├─────────────┤ > > >> >> > │1 1 1 0 0 1 1│ > > >> >> > ├─────────────┤ > > >> >> > │1 1 1 0 0 0 │ > > >> >> > ├─────────────┤ > > >> >> > │1 1 1 0 0 1 │ > > >> >> > ├─────────────┤ > > >> >> > │1 1 1 0 1 0 │ > > >> >> > ├─────────────┤ > > >> >> > │1 1 1 0 1 1 │ > > >> >> > ├─────────────┤ > > >> >> > │1 1 1 1 0 0 │ > > >> >> > ├─────────────┤ > > >> >> > │1 1 1 1 0 1 │ > > >> >> > ├─────────────┤ > > >> >> > │1 1 1 1 1 0 │ > > >> >> > ├─────────────┤ > > >> >> > │1 0 │ > > >> >> > └─────────────┘ > > >> >> > 7 bit 1110010 is a prefix of 6 bit 111001 and this is illegal. > > >> >> > Instead 7 bit code block should start with > > >> >> > #.inv 2b11100 + 7 > > >> >> > 1 1 1 1 1 1 0 > > >> >> > because there are 6 symbols in the 6 bit code. (algorithm given > in section > > >> >> > 3.2.2). > > >> >> > > > >> >> > Any idea to fix deflatecodes so that it can give valid huffman > codes? > > >> >> > Thanks. > > >> >> > On Sep 11, 2014 1:01 AM, "Raul Miller" <[email protected]> > wrote: > > >> >> > > > >> >> >> I think the use of the term "consecutive" rather than > "sequential" is > > >> >> >> telling. > > >> >> >> > > >> >> >> The described algorithm is: compute the huffman code lengths: > > >> >> >> #@>F1 hcodes A1 > > >> >> >> 1 3 7 7 6 6 6 6 6 6 6 2 > > >> >> >> > > >> >> >> Then assign ascending huffman codes first in length order and > then > > >> >> >> within codes of the same length. > > >> >> >> > > >> >> >> Taken literally, that might be something like this: > > >> >> >> > > >> >> >> H=: 4 :0 > > >> >> >> L=.#@> x hcodes y > > >> >> >> U=.~.L > > >> >> >> ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L > > >> >> >> ) > > >> >> >> > > >> >> >> ":@>F1 H A1 > > >> >> >> 0 > > >> >> >> 1 1 0 > > >> >> >> 1 1 1 0 0 1 0 > > >> >> >> 1 1 1 0 0 1 1 > > >> >> >> 1 1 1 0 0 0 > > >> >> >> 1 1 1 0 0 1 > > >> >> >> 1 1 1 0 1 0 > > >> >> >> 1 1 1 0 1 1 > > >> >> >> 1 1 1 1 0 0 > > >> >> >> 1 1 1 1 0 1 > > >> >> >> 1 1 1 1 1 0 > > >> >> >> 1 0 > > >> >> >> > > >> >> >> But is this correct? Is it actually safe to leave the results > like > > >> >> >> this - with all codes of the same length being consecutive to > each > > >> >> >> other? > > >> >> >> > > >> >> >> F (hcodes -:&:(#@>) H) A > > >> >> >> 0 > > >> >> >> > > >> >> >> No. > > >> >> >> > > >> >> >> So... "consecutive" must refer only to the values used and not > their > > >> >> >> order within the result. > > >> >> >> > > >> >> >> Perhaps something like this: > > >> >> >> > > >> >> >> deflatecodes=:4 :0 > > >> >> >> L=.#@> x hcodes y > > >> >> >> U=.~.L > > >> >> >> R=. ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L > > >> >> >> R/:;(</. i.@#)L > > >> >> >> ) > > >> >> >> > > >> >> >> F (hcodes -:&:(#@>) deflatecodes) A > > >> >> >> 1 > > >> >> >> > > >> >> >> There should be a better way of doing this, but this should at > least > > >> >> >> get you started. > > >> >> >> > > >> >> >> Thanks, > > >> >> >> > > >> >> >> -- > > >> >> >> Raul > > >> >> >> > > >> >> >> > > >> >> >> On Wed, Sep 10, 2014 at 10:45 AM, bill lam <[email protected]> > wrote: > > >> >> >> > For huffman coding used in zlib: > > >> >> >> > https://www.ietf.org/rfc/rfc1951.txt section 3.2.2. > > >> >> >> > > > >> >> >> > The Huffman codes used for each alphabet in the "deflate" > > >> >> >> > format have two additional rules: > > >> >> >> > > > >> >> >> > * All codes of a given bit length have lexicographically > > >> >> >> > consecutive values, in the same order as the symbols > > >> >> >> > they represent; > > >> >> >> > > > >> >> >> > * Shorter codes lexicographically precede longer codes. > > >> >> >> > I tried jwiki hcodes in > > >> >> >> > I try Roger's essay > > >> >> >> > http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding > > >> >> >> > > > >> >> >> > hc=: 4 : 0 > > >> >> >> > if. 1=#x do. y > > >> >> >> > else. ((i{x),+/j{x) hc (i{y),<j{y [ i=. (i.#x) -. j=. 2{./:x > end. > > >> >> >> > ) > > >> >> >> > > > >> >> >> > hcodes=: 4 : 0 > > >> >> >> > assert. x -:&$ y NB. weights and words have same > shape > > >> >> >> > assert. (0<:x) *. 1=#$x NB. weights are non-negative > > >> >> >> > assert. 1 >: L.y NB. words are boxed not more than > once > > >> >> >> > w=. ,&.> y NB. standardized words > > >> >> >> > assert. w -: ~.w NB. words are unique > > >> >> >> > t=. 0 {:: x hc w NB. minimal weight binary tree > > >> >> >> > ((< S: 0 t) i. w) { <@(1&=)@; S: 1 {:: t > > >> >> >> > ) > > >> >> >> > > > >> >> >> > but the coding produced is malformed for zlib. eg, > > >> >> >> > this is what I ran into trouble > > >> >> >> > > > >> >> >> > f1=: 1 256 17 1 1 9 1 > > >> >> >> > f2=: 2 1 0 1 255 0 1536 > > >> >> >> > F=: ,/(f1#f2) > > >> >> >> > A=: i.286 > > >> >> >> > > > >> >> >> > F hcodes A > > >> >> >> > > > >> >> >> > Or a shorter example > > >> >> >> > > > >> >> >> > A1=: i.12 > > >> >> >> > F1=: 2 1 0 0 0 0 0 0 0 0 0 1 > > >> >> >> > > > >> >> >> > F1 hcodes A1 > > >> >> >> > > > >> >> >> > Any idea? > > >> >> >> > > > >> >> >> > -- > > >> >> >> > regards, > > >> >> >> > ==================================================== > > >> >> >> > GPG key 1024D/4434BAB3 2008-08-24 > > >> >> >> > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 > > >> >> >> > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 > > >> >> >> > > ---------------------------------------------------------------------- > > >> >> >> > For information about J forums see > http://www.jsoftware.com/forums.htm > > >> >> >> > ---------------------------------------------------------------------- > > >> >> >> For information about J forums see > http://www.jsoftware.com/forums.htm > > >> >> >> > > >> >> > > ---------------------------------------------------------------------- > > >> >> > For information about J forums see > http://www.jsoftware.com/forums.htm > > >> >> > ---------------------------------------------------------------------- > > >> >> For information about J forums see > http://www.jsoftware.com/forums.htm > > >> > > > >> > -- > > >> > regards, > > >> > ==================================================== > > >> > GPG key 1024D/4434BAB3 2008-08-24 > > >> > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 > > >> > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 > > >> > > ---------------------------------------------------------------------- > > >> > For information about J forums see > http://www.jsoftware.com/forums.htm > > >> ---------------------------------------------------------------------- > > >> For information about J forums see > http://www.jsoftware.com/forums.htm > > > > > > -- > > > regards, > > > ==================================================== > > > GPG key 1024D/4434BAB3 2008-08-24 > > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 > > > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 > > > ---------------------------------------------------------------------- > > > For information about J forums see http://www.jsoftware.com/forums.htm > > ---------------------------------------------------------------------- > > For information about J forums see http://www.jsoftware.com/forums.htm > > -- > regards, > ==================================================== > GPG key 1024D/4434BAB3 2008-08-24 > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm > ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
