Re: [Jprogramming] zlib huffman coding

Raul Miller Sat, 13 Sep 2014 11:53:33 -0700

If we use hcodes to get bit lengths, and another mechanism (I think the
last one I posted is adequate) to determine the bit values, there should be
no reason to iterate further. The two extra constraints should never
require that bit lengths be changed.


Thanks,

-- 
Raul

On Sat, Sep 13, 2014 at 12:43 PM, bill lam <[email protected]> wrote:

> I try the putty fork, its seems fialed to uncompress it own
> compressed data.  Perhaps its compressed data is malformed.
>
> I also look at the trees.c of the the official zlib. I do not
> understand the code, too complex for me.  AFAICU it does not
> generate huffman codes using any standard algorithm, but it
> assigns bit lengths using some heuristics (trail and error).
> I guess we can try using hcodes to get an initial guess of bit
> lengths and then iterate to make it satisfy the 2 rules.
>
> Чт, 11 сен 2014, bill lam написал(а):
> > the frequencies (guessing from bit lengths) should be something like 2 3
> 1 1
> >   (2 3 1 1) hcodes 'ABCD'
> >
> > the hard part is the inverse problem: how to get the huffman code with
> > prior knowing the bits for each symbol.  Your pointer to the putty
> > fork looks like helpful.  The comment is in lines 861 to 914, the code
> > itself in line 915 to 964. Do you know how to express it in J?
> > Thanks.
> >
> > On Thu, Sep 11, 2014 at 2:57 PM, Joe Bogner <[email protected]> wrote:
> > > Here a few other links ... after reading through the RFC. Not sure if
> > > they help, but just sharing from my own research into assisting on
> > > this topic
> > >
> > > https://github.com/evegard/pngview/blob/master/huffman.c#L54
> > >
> > > And a fork of the putty version with dynamic huffman coding:
> > >
> http://rc.quest.com/viewvc/putty/trunk/halibut/deflate.c?diff_format=s&revision=2&view=markup
> > >
> > > Or just generally googling some of the code from the RFC:
> > >
> https://www.google.com/search?q=next_code%5Blen%5D%2B%2B%3B&oq=next_code%5Blen%5D%2B%2B%3B&aqs=chrome..69i57.387j0j7&sourceid=chrome&es_sm=93&ie=UTF-8#q=next_code%5Blen%5D%2B%2B%3B&start=20
> > >
> > >
> > > Using the code from
> > > http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding, I got stuck
> > > trying to match a simple example to the binary tree in the RFC:
> > >
> > > From the RFC:
> > >
> > >                   /\              Symbol    Code
> > >                          0  1             ------    ----
> > >                         /    \                A      00
> > >                        /\     B               B       1
> > >                       0  1                    C     011
> > >                      /    \                   D     010
> > >                     A     /\
> > >                          0  1
> > >                         /    \
> > >                        D      C
> > >
> > >
> > >
> > >    (4#1) hcodes 'ABCD'
> > > ┌───┬───┬───┬───┐
> > > │0 0│0 1│1 0│1 1│
> > > └───┴───┴───┴───┘
> > >
> > > Per the RFC, ideally that should match this? '00';'1';'011';'010'
> > >
> > >
> > > From there, it seems like a pretty straightforward exercise to
> > > transliterate the C code from the RFC into J code to recode the
> > > example to:
> > >
> > >
> > >             Symbol  Code
> > >             ------  ----
> > >             A       10
> > >             B       0
> > >             C       110
> > >             D       111
> > >
> > >
> > > I would probably start with a looping construct like what's in the RFC
> > > and then figure out a more J way to do it, but first I would need to
> > > figure out how to create the binary tree in that initial format.
> > >
> > > On Wed, Sep 10, 2014 at 7:41 PM, bill lam <[email protected]> wrote:
> > >> Thanks Joe,
> > >> putty only use zlib static huffman for encoding so that it does not
> build
> > >> any huffman dictionary table.
> > >>
> > >> The zlib static huffman code does not care about individual symbol's
> > >> frequency, it just encode 0 to 286 into bits, see section 3.2.6.
> > >>  On Sep 11, 2014 1:26 AM, "Joe Bogner" <[email protected]> wrote:
> > >>
> > >>> You've already likely considered this, but if it were me I would
> compare
> > >>> results to a working implementation. The one from putty seems pretty
> clean
> > >>> and standalone:
> > >>>
> https://raw.githubusercontent.com/grumpydev/PortablePuTTY/master/SSHZLIB.C
> > >>> . I was able to compile it on windows no problem and I assume it'd
> be fine
> > >>> on linux as well.
> > >>>
> > >>> On Wed, Sep 10, 2014 at 1:00 PM, Raul Miller <[email protected]>
> > >>> wrote:
> > >>>
> > >>> > I think the use of the term "consecutive" rather than "sequential"
> is
> > >>> > telling.
> > >>> >
> > >>> > The described algorithm is: compute the huffman code lengths:
> > >>> >    #@>F1 hcodes A1
> > >>> > 1 3 7 7 6 6 6 6 6 6 6 2
> > >>> >
> > >>> > Then assign ascending huffman codes first in length order and then
> > >>> > within codes of the same length.
> > >>> >
> > >>> > Taken literally, that might be something like this:
> > >>> >
> > >>> > H=: 4 :0
> > >>> >   L=.#@> x hcodes y
> > >>> >   U=.~.L
> > >>> >   ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L
> > >>> > )
> > >>> >
> > >>> >    ":@>F1 H A1
> > >>> > 0
> > >>> > 1 1 0
> > >>> > 1 1 1 0 0 1 0
> > >>> > 1 1 1 0 0 1 1
> > >>> > 1 1 1 0 0 0
> > >>> > 1 1 1 0 0 1
> > >>> > 1 1 1 0 1 0
> > >>> > 1 1 1 0 1 1
> > >>> > 1 1 1 1 0 0
> > >>> > 1 1 1 1 0 1
> > >>> > 1 1 1 1 1 0
> > >>> > 1 0
> > >>> >
> > >>> > But is this correct? Is it actually safe to leave the results like
> > >>> > this - with all codes of the same length being consecutive to each
> > >>> > other?
> > >>> >
> > >>> >    F (hcodes -:&:(#@>) H) A
> > >>> > 0
> > >>> >
> > >>> > No.
> > >>> >
> > >>> > So... "consecutive" must refer only to the values used and not
> their
> > >>> > order within the result.
> > >>> >
> > >>> > Perhaps something like this:
> > >>> >
> > >>> > deflatecodes=:4 :0
> > >>> >   L=.#@> x hcodes y
> > >>> >   U=.~.L
> > >>> >   R=. ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L
> > >>> >   R/:;(</. i.@#)L
> > >>> > )
> > >>> >
> > >>> >    F (hcodes -:&:(#@>) deflatecodes)  A
> > >>> > 1
> > >>> >
> > >>> > There should be a better way of doing this, but this should at
> least
> > >>> > get you started.
> > >>> >
> > >>> > Thanks,
> > >>> >
> > >>> > --
> > >>> > Raul
> > >>> >
> > >>> >
> > >>> > On Wed, Sep 10, 2014 at 10:45 AM, bill lam <[email protected]>
> wrote:
> > >>> > > For huffman coding used in zlib:
> > >>> > > https://www.ietf.org/rfc/rfc1951.txt section 3.2.2.
> > >>> > >
> > >>> > >  The Huffman codes used for each alphabet in the "deflate"
> > >>> > >  format have two additional rules:
> > >>> > >
> > >>> > >   * All codes of a given bit length have lexicographically
> > >>> > >   consecutive values, in the same order as the symbols
> > >>> > >   they represent;
> > >>> > >
> > >>> > >   * Shorter codes lexicographically precede longer codes.
> > >>> > > I tried jwiki hcodes in
> > >>> > > I try Roger's essay
> > >>> > > http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding
> > >>> > >
> > >>> > > hc=: 4 : 0
> > >>> > > if. 1=#x do. y
> > >>> > > else. ((i{x),+/j{x) hc (i{y),<j{y [ i=. (i.#x) -. j=. 2{./:x end.
> > >>> > > )
> > >>> > >
> > >>> > > hcodes=: 4 : 0
> > >>> > > assert. x -:&$ y           NB. weights and words have same shape
> > >>> > > assert. (0<:x) *. 1=#$x    NB. weights are non-negative
> > >>> > > assert. 1 >: L.y           NB. words are boxed not more than once
> > >>> > > w=. ,&.> y                 NB. standardized words
> > >>> > > assert. w -: ~.w           NB. words are unique
> > >>> > > t=. 0 {:: x hc w           NB. minimal weight binary tree
> > >>> > > ((< S: 0 t) i. w) { <@(1&=)@; S: 1 {:: t
> > >>> > > )
> > >>> > >
> > >>> > > but the coding produced is malformed for zlib. eg,
> > >>> > > this is what I ran into trouble
> > >>> > >
> > >>> > > f1=: 1 256 17 1 1 9 1
> > >>> > > f2=: 2 1 0 1 255 0 1536
> > >>> > > F=: ,/(f1#f2)
> > >>> > > A=: i.286
> > >>> > >
> > >>> > > F hcodes A
> > >>> > >
> > >>> > > Or a shorter example
> > >>> > >
> > >>> > > A1=: i.12
> > >>> > > F1=: 2 1 0 0 0 0 0 0 0 0 0 1
> > >>> > >
> > >>> > > F1 hcodes A1
> > >>> > >
> > >>> > > Any idea?
> > >>> > >
> > >>> > > --
> > >>> > > regards,
> > >>> > > ====================================================
> > >>> > > GPG key 1024D/4434BAB3 2008-08-24
> > >>> > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> > >>> > > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> > >>> > >
> ----------------------------------------------------------------------
> > >>> > > For information about J forums see
> http://www.jsoftware.com/forums.htm
> > >>> >
> ----------------------------------------------------------------------
> > >>> > For information about J forums see
> http://www.jsoftware.com/forums.htm
> > >>> >
> > >>>
> ----------------------------------------------------------------------
> > >>> For information about J forums see
> http://www.jsoftware.com/forums.htm
> > >>>
> > >> ----------------------------------------------------------------------
> > >> For information about J forums see
> http://www.jsoftware.com/forums.htm
> > > ----------------------------------------------------------------------
> > > For information about J forums see http://www.jsoftware.com/forums.htm
>
> --
> regards,
> ====================================================
> GPG key 1024D/4434BAB3 2008-08-24
> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] zlib huffman coding

Reply via email to