If we use hcodes to get bit lengths, and another mechanism (I think the last one I posted is adequate) to determine the bit values, there should be no reason to iterate further. The two extra constraints should never require that bit lengths be changed.
Thanks, -- Raul On Sat, Sep 13, 2014 at 12:43 PM, bill lam <[email protected]> wrote: > I try the putty fork, its seems fialed to uncompress it own > compressed data. Perhaps its compressed data is malformed. > > I also look at the trees.c of the the official zlib. I do not > understand the code, too complex for me. AFAICU it does not > generate huffman codes using any standard algorithm, but it > assigns bit lengths using some heuristics (trail and error). > I guess we can try using hcodes to get an initial guess of bit > lengths and then iterate to make it satisfy the 2 rules. > > Чт, 11 сен 2014, bill lam написал(а): > > the frequencies (guessing from bit lengths) should be something like 2 3 > 1 1 > > (2 3 1 1) hcodes 'ABCD' > > > > the hard part is the inverse problem: how to get the huffman code with > > prior knowing the bits for each symbol. Your pointer to the putty > > fork looks like helpful. The comment is in lines 861 to 914, the code > > itself in line 915 to 964. Do you know how to express it in J? > > Thanks. > > > > On Thu, Sep 11, 2014 at 2:57 PM, Joe Bogner <[email protected]> wrote: > > > Here a few other links ... after reading through the RFC. Not sure if > > > they help, but just sharing from my own research into assisting on > > > this topic > > > > > > https://github.com/evegard/pngview/blob/master/huffman.c#L54 > > > > > > And a fork of the putty version with dynamic huffman coding: > > > > http://rc.quest.com/viewvc/putty/trunk/halibut/deflate.c?diff_format=s&revision=2&view=markup > > > > > > Or just generally googling some of the code from the RFC: > > > > https://www.google.com/search?q=next_code%5Blen%5D%2B%2B%3B&oq=next_code%5Blen%5D%2B%2B%3B&aqs=chrome..69i57.387j0j7&sourceid=chrome&es_sm=93&ie=UTF-8#q=next_code%5Blen%5D%2B%2B%3B&start=20 > > > > > > > > > Using the code from > > > http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding, I got stuck > > > trying to match a simple example to the binary tree in the RFC: > > > > > > From the RFC: > > > > > > /\ Symbol Code > > > 0 1 ------ ---- > > > / \ A 00 > > > /\ B B 1 > > > 0 1 C 011 > > > / \ D 010 > > > A /\ > > > 0 1 > > > / \ > > > D C > > > > > > > > > > > > (4#1) hcodes 'ABCD' > > > ┌───┬───┬───┬───┐ > > > │0 0│0 1│1 0│1 1│ > > > └───┴───┴───┴───┘ > > > > > > Per the RFC, ideally that should match this? '00';'1';'011';'010' > > > > > > > > > From there, it seems like a pretty straightforward exercise to > > > transliterate the C code from the RFC into J code to recode the > > > example to: > > > > > > > > > Symbol Code > > > ------ ---- > > > A 10 > > > B 0 > > > C 110 > > > D 111 > > > > > > > > > I would probably start with a looping construct like what's in the RFC > > > and then figure out a more J way to do it, but first I would need to > > > figure out how to create the binary tree in that initial format. > > > > > > On Wed, Sep 10, 2014 at 7:41 PM, bill lam <[email protected]> wrote: > > >> Thanks Joe, > > >> putty only use zlib static huffman for encoding so that it does not > build > > >> any huffman dictionary table. > > >> > > >> The zlib static huffman code does not care about individual symbol's > > >> frequency, it just encode 0 to 286 into bits, see section 3.2.6. > > >> On Sep 11, 2014 1:26 AM, "Joe Bogner" <[email protected]> wrote: > > >> > > >>> You've already likely considered this, but if it were me I would > compare > > >>> results to a working implementation. The one from putty seems pretty > clean > > >>> and standalone: > > >>> > https://raw.githubusercontent.com/grumpydev/PortablePuTTY/master/SSHZLIB.C > > >>> . I was able to compile it on windows no problem and I assume it'd > be fine > > >>> on linux as well. > > >>> > > >>> On Wed, Sep 10, 2014 at 1:00 PM, Raul Miller <[email protected]> > > >>> wrote: > > >>> > > >>> > I think the use of the term "consecutive" rather than "sequential" > is > > >>> > telling. > > >>> > > > >>> > The described algorithm is: compute the huffman code lengths: > > >>> > #@>F1 hcodes A1 > > >>> > 1 3 7 7 6 6 6 6 6 6 6 2 > > >>> > > > >>> > Then assign ascending huffman codes first in length order and then > > >>> > within codes of the same length. > > >>> > > > >>> > Taken literally, that might be something like this: > > >>> > > > >>> > H=: 4 :0 > > >>> > L=.#@> x hcodes y > > >>> > U=.~.L > > >>> > ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L > > >>> > ) > > >>> > > > >>> > ":@>F1 H A1 > > >>> > 0 > > >>> > 1 1 0 > > >>> > 1 1 1 0 0 1 0 > > >>> > 1 1 1 0 0 1 1 > > >>> > 1 1 1 0 0 0 > > >>> > 1 1 1 0 0 1 > > >>> > 1 1 1 0 1 0 > > >>> > 1 1 1 0 1 1 > > >>> > 1 1 1 1 0 0 > > >>> > 1 1 1 1 0 1 > > >>> > 1 1 1 1 1 0 > > >>> > 1 0 > > >>> > > > >>> > But is this correct? Is it actually safe to leave the results like > > >>> > this - with all codes of the same length being consecutive to each > > >>> > other? > > >>> > > > >>> > F (hcodes -:&:(#@>) H) A > > >>> > 0 > > >>> > > > >>> > No. > > >>> > > > >>> > So... "consecutive" must refer only to the values used and not > their > > >>> > order within the result. > > >>> > > > >>> > Perhaps something like this: > > >>> > > > >>> > deflatecodes=:4 :0 > > >>> > L=.#@> x hcodes y > > >>> > U=.~.L > > >>> > R=. ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L > > >>> > R/:;(</. i.@#)L > > >>> > ) > > >>> > > > >>> > F (hcodes -:&:(#@>) deflatecodes) A > > >>> > 1 > > >>> > > > >>> > There should be a better way of doing this, but this should at > least > > >>> > get you started. > > >>> > > > >>> > Thanks, > > >>> > > > >>> > -- > > >>> > Raul > > >>> > > > >>> > > > >>> > On Wed, Sep 10, 2014 at 10:45 AM, bill lam <[email protected]> > wrote: > > >>> > > For huffman coding used in zlib: > > >>> > > https://www.ietf.org/rfc/rfc1951.txt section 3.2.2. > > >>> > > > > >>> > > The Huffman codes used for each alphabet in the "deflate" > > >>> > > format have two additional rules: > > >>> > > > > >>> > > * All codes of a given bit length have lexicographically > > >>> > > consecutive values, in the same order as the symbols > > >>> > > they represent; > > >>> > > > > >>> > > * Shorter codes lexicographically precede longer codes. > > >>> > > I tried jwiki hcodes in > > >>> > > I try Roger's essay > > >>> > > http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding > > >>> > > > > >>> > > hc=: 4 : 0 > > >>> > > if. 1=#x do. y > > >>> > > else. ((i{x),+/j{x) hc (i{y),<j{y [ i=. (i.#x) -. j=. 2{./:x end. > > >>> > > ) > > >>> > > > > >>> > > hcodes=: 4 : 0 > > >>> > > assert. x -:&$ y NB. weights and words have same shape > > >>> > > assert. (0<:x) *. 1=#$x NB. weights are non-negative > > >>> > > assert. 1 >: L.y NB. words are boxed not more than once > > >>> > > w=. ,&.> y NB. standardized words > > >>> > > assert. w -: ~.w NB. words are unique > > >>> > > t=. 0 {:: x hc w NB. minimal weight binary tree > > >>> > > ((< S: 0 t) i. w) { <@(1&=)@; S: 1 {:: t > > >>> > > ) > > >>> > > > > >>> > > but the coding produced is malformed for zlib. eg, > > >>> > > this is what I ran into trouble > > >>> > > > > >>> > > f1=: 1 256 17 1 1 9 1 > > >>> > > f2=: 2 1 0 1 255 0 1536 > > >>> > > F=: ,/(f1#f2) > > >>> > > A=: i.286 > > >>> > > > > >>> > > F hcodes A > > >>> > > > > >>> > > Or a shorter example > > >>> > > > > >>> > > A1=: i.12 > > >>> > > F1=: 2 1 0 0 0 0 0 0 0 0 0 1 > > >>> > > > > >>> > > F1 hcodes A1 > > >>> > > > > >>> > > Any idea? > > >>> > > > > >>> > > -- > > >>> > > regards, > > >>> > > ==================================================== > > >>> > > GPG key 1024D/4434BAB3 2008-08-24 > > >>> > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 > > >>> > > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 > > >>> > > > ---------------------------------------------------------------------- > > >>> > > For information about J forums see > http://www.jsoftware.com/forums.htm > > >>> > > ---------------------------------------------------------------------- > > >>> > For information about J forums see > http://www.jsoftware.com/forums.htm > > >>> > > > >>> > ---------------------------------------------------------------------- > > >>> For information about J forums see > http://www.jsoftware.com/forums.htm > > >>> > > >> ---------------------------------------------------------------------- > > >> For information about J forums see > http://www.jsoftware.com/forums.htm > > > ---------------------------------------------------------------------- > > > For information about J forums see http://www.jsoftware.com/forums.htm > > -- > regards, > ==================================================== > GPG key 1024D/4434BAB3 2008-08-24 > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm > ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
