Oops, yes, I should incorporate the zero test into bitlens.
bitlens=: 4 :0
t=. 0{:: x hc w=. ,&.> y
(*x)*((<S:0 t)i. w) { #S:1 {::t
)
I should also include
if. *./x=0 do. (#x){.1 return. end.
unless more depends on that special case test.
As for "data is well formed" I just mean that the asserts in hcodes would
not reject the data. For example, no negative values in x.
Thanks,
--
Raul
On Sun, Sep 14, 2014 at 6:28 AM, bill lam <[email protected]> wrote:
> Thanks Raul and Joe,
>
> The problem seemed now solved. It turns out that how to get bit length
> was a red herring. The main problem is huffman coding is in-applicable
> to zero bit of information, ie the universe has only 0 or 1
> symbol. eg, the result of bitlen 0 0 0 or bitlen 1 0 0 0 0 cannot
> be correct (GIGO). However such input actually happens in zlib
> compression and section 4 of rfc,
>
> ... If only one distance
> code is used, it is encoded using one bit, not zero bits; in
> this case there is a single code length of one, with one unused
> code. One distance code of zero bits means that there are no
> distance codes used at all ...
>
> Now arc/zlib addon enables dynamic huffman coding. Thank you
> both of you for patience and help.
>
> BTW I found the new bitlens has some problems, first it should
> zero out bit length of zero frequency on output.
>
> 1 0 1 bitlen i.3
> 1 0 1
>
> 1 0 1 bitlens i.3
> 2 2 1
>
> Actually 1 bit can encode 2 symbols, data encoded by it were
> rejected during decoding with zlib.so even after masked off
> the zero frequency entries.
>
> What did you mean by "data is well formed"?
>
>
> Вс, 14 сен 2014, Raul Miller написал(а):
> > Here's a perhaps slightly more efficient way of computing bit lengths:
> >
> > bitlens=: 4 :0
> > t=. 0{:: x hc w=. ,&.> y
> > ((<S:0 t)i. w) { #S:1 {::t
> > )
> >
> > This assumes that the data is well formed, but the big thing is that it
> > does not compute a boxed intermediate result and instead finds the
> lengths
> > earlier in the process.
> >
> > Thanks,
> >
> > --
> > Raul
> >
> > On Sat, Sep 13, 2014 at 3:41 PM, Joe Bogner <[email protected]> wrote:
> >
> > > Raul, I just tried your code and confirmed that it returned the same
> > > results as mine:
> > >
> > > find_codes (3, 3, 3, 3,3, 2, 4, 4)
> > > ┌─────┬─────┬─────┬─────┬─────┬───┬───────┬───────┐
> > > │0 1 0│0 1 1│1 0 0│1 0 1│1 1 0│0 0│1 1 1 0│1 1 1 1│
> > > └─────┴─────┴─────┴─────┴─────┴───┴───────┴───────┘
> > >
> > > #: inv every find_codes (3, 3, 3, 3,3, 2, 4, 4)
> > > 2 3 4 5 6 0 14 15
> > >
> > > tree=: makeFromLengths (3, 3, 3, 3,3, 2, 4, 4)
> > > tree
> > >
> > > 2 3 4 5 6 0 14 15
> > >
> > > #: tree
> > > 0 0 1 0
> > > 0 0 1 1
> > > 0 1 0 0
> > > 0 1 0 1
> > > 0 1 1 0
> > > 0 0 0 0
> > > 1 1 1 0
> > > 1 1 1 1
> > >
> > > I like the looks of yours more. Mine was a transliteration from the
> > > RFC. I will need to study yours as I don't understand it yet.
> > >
> > > I agree that using hcodes with either mine or yours should produce
> > > what's needed per the RFC, at least the steps we were looking at in
> > > the RFC. There is likely more solve still beyond that
> > >
> > >
> > > On Sat, Sep 13, 2014 at 2:52 PM, Raul Miller <[email protected]>
> > > wrote:
> > > > If we use hcodes to get bit lengths, and another mechanism (I think
> the
> > > > last one I posted is adequate) to determine the bit values, there
> should
> > > be
> > > > no reason to iterate further. The two extra constraints should never
> > > > require that bit lengths be changed.
> > > >
> > > > Thanks,
> > > >
> > > > --
> > > > Raul
> > > >
> > > > On Sat, Sep 13, 2014 at 12:43 PM, bill lam <[email protected]>
> wrote:
> > > >
> > > >> I try the putty fork, its seems fialed to uncompress it own
> > > >> compressed data. Perhaps its compressed data is malformed.
> > > >>
> > > >> I also look at the trees.c of the the official zlib. I do not
> > > >> understand the code, too complex for me. AFAICU it does not
> > > >> generate huffman codes using any standard algorithm, but it
> > > >> assigns bit lengths using some heuristics (trail and error).
> > > >> I guess we can try using hcodes to get an initial guess of bit
> > > >> lengths and then iterate to make it satisfy the 2 rules.
> > > >>
> > > >> Чт, 11 сен 2014, bill lam написал(а):
> > > >> > the frequencies (guessing from bit lengths) should be something
> like
> > > 2 3
> > > >> 1 1
> > > >> > (2 3 1 1) hcodes 'ABCD'
> > > >> >
> > > >> > the hard part is the inverse problem: how to get the huffman code
> with
> > > >> > prior knowing the bits for each symbol. Your pointer to the putty
> > > >> > fork looks like helpful. The comment is in lines 861 to 914, the
> code
> > > >> > itself in line 915 to 964. Do you know how to express it in J?
> > > >> > Thanks.
> > > >> >
> > > >> > On Thu, Sep 11, 2014 at 2:57 PM, Joe Bogner <[email protected]>
> > > wrote:
> > > >> > > Here a few other links ... after reading through the RFC. Not
> sure
> > > if
> > > >> > > they help, but just sharing from my own research into assisting
> on
> > > >> > > this topic
> > > >> > >
> > > >> > > https://github.com/evegard/pngview/blob/master/huffman.c#L54
> > > >> > >
> > > >> > > And a fork of the putty version with dynamic huffman coding:
> > > >> > >
> > > >>
> > >
> http://rc.quest.com/viewvc/putty/trunk/halibut/deflate.c?diff_format=s&revision=2&view=markup
> > > >> > >
> > > >> > > Or just generally googling some of the code from the RFC:
> > > >> > >
> > > >>
> > >
> https://www.google.com/search?q=next_code%5Blen%5D%2B%2B%3B&oq=next_code%5Blen%5D%2B%2B%3B&aqs=chrome..69i57.387j0j7&sourceid=chrome&es_sm=93&ie=UTF-8#q=next_code%5Blen%5D%2B%2B%3B&start=20
> > > >> > >
> > > >> > >
> > > >> > > Using the code from
> > > >> > > http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding, I got
> stuck
> > > >> > > trying to match a simple example to the binary tree in the RFC:
> > > >> > >
> > > >> > > From the RFC:
> > > >> > >
> > > >> > > /\ Symbol Code
> > > >> > > 0 1 ------ ----
> > > >> > > / \ A 00
> > > >> > > /\ B B 1
> > > >> > > 0 1 C 011
> > > >> > > / \ D 010
> > > >> > > A /\
> > > >> > > 0 1
> > > >> > > / \
> > > >> > > D C
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > (4#1) hcodes 'ABCD'
> > > >> > > ┌───┬───┬───┬───┐
> > > >> > > │0 0│0 1│1 0│1 1│
> > > >> > > └───┴───┴───┴───┘
> > > >> > >
> > > >> > > Per the RFC, ideally that should match this?
> '00';'1';'011';'010'
> > > >> > >
> > > >> > >
> > > >> > > From there, it seems like a pretty straightforward exercise to
> > > >> > > transliterate the C code from the RFC into J code to recode the
> > > >> > > example to:
> > > >> > >
> > > >> > >
> > > >> > > Symbol Code
> > > >> > > ------ ----
> > > >> > > A 10
> > > >> > > B 0
> > > >> > > C 110
> > > >> > > D 111
> > > >> > >
> > > >> > >
> > > >> > > I would probably start with a looping construct like what's in
> the
> > > RFC
> > > >> > > and then figure out a more J way to do it, but first I would
> need to
> > > >> > > figure out how to create the binary tree in that initial format.
> > > >> > >
> > > >> > > On Wed, Sep 10, 2014 at 7:41 PM, bill lam <[email protected]>
> > > wrote:
> > > >> > >> Thanks Joe,
> > > >> > >> putty only use zlib static huffman for encoding so that it
> does not
> > > >> build
> > > >> > >> any huffman dictionary table.
> > > >> > >>
> > > >> > >> The zlib static huffman code does not care about individual
> > > symbol's
> > > >> > >> frequency, it just encode 0 to 286 into bits, see section
> 3.2.6.
> > > >> > >> On Sep 11, 2014 1:26 AM, "Joe Bogner" <[email protected]>
> > > wrote:
> > > >> > >>
> > > >> > >>> You've already likely considered this, but if it were me I
> would
> > > >> compare
> > > >> > >>> results to a working implementation. The one from putty seems
> > > pretty
> > > >> clean
> > > >> > >>> and standalone:
> > > >> > >>>
> > > >>
> > >
> https://raw.githubusercontent.com/grumpydev/PortablePuTTY/master/SSHZLIB.C
> > > >> > >>> . I was able to compile it on windows no problem and I assume
> it'd
> > > >> be fine
> > > >> > >>> on linux as well.
> > > >> > >>>
> > > >> > >>> On Wed, Sep 10, 2014 at 1:00 PM, Raul Miller <
> > > [email protected]>
> > > >> > >>> wrote:
> > > >> > >>>
> > > >> > >>> > I think the use of the term "consecutive" rather than
> > > "sequential"
> > > >> is
> > > >> > >>> > telling.
> > > >> > >>> >
> > > >> > >>> > The described algorithm is: compute the huffman code
> lengths:
> > > >> > >>> > #@>F1 hcodes A1
> > > >> > >>> > 1 3 7 7 6 6 6 6 6 6 6 2
> > > >> > >>> >
> > > >> > >>> > Then assign ascending huffman codes first in length order
> and
> > > then
> > > >> > >>> > within codes of the same length.
> > > >> > >>> >
> > > >> > >>> > Taken literally, that might be something like this:
> > > >> > >>> >
> > > >> > >>> > H=: 4 :0
> > > >> > >>> > L=.#@> x hcodes y
> > > >> > >>> > U=.~.L
> > > >> > >>> > ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L
> > > >> > >>> > )
> > > >> > >>> >
> > > >> > >>> > ":@>F1 H A1
> > > >> > >>> > 0
> > > >> > >>> > 1 1 0
> > > >> > >>> > 1 1 1 0 0 1 0
> > > >> > >>> > 1 1 1 0 0 1 1
> > > >> > >>> > 1 1 1 0 0 0
> > > >> > >>> > 1 1 1 0 0 1
> > > >> > >>> > 1 1 1 0 1 0
> > > >> > >>> > 1 1 1 0 1 1
> > > >> > >>> > 1 1 1 1 0 0
> > > >> > >>> > 1 1 1 1 0 1
> > > >> > >>> > 1 1 1 1 1 0
> > > >> > >>> > 1 0
> > > >> > >>> >
> > > >> > >>> > But is this correct? Is it actually safe to leave the
> results
> > > like
> > > >> > >>> > this - with all codes of the same length being consecutive
> to
> > > each
> > > >> > >>> > other?
> > > >> > >>> >
> > > >> > >>> > F (hcodes -:&:(#@>) H) A
> > > >> > >>> > 0
> > > >> > >>> >
> > > >> > >>> > No.
> > > >> > >>> >
> > > >> > >>> > So... "consecutive" must refer only to the values used and
> not
> > > >> their
> > > >> > >>> > order within the result.
> > > >> > >>> >
> > > >> > >>> > Perhaps something like this:
> > > >> > >>> >
> > > >> > >>> > deflatecodes=:4 :0
> > > >> > >>> > L=.#@> x hcodes y
> > > >> > >>> > U=.~.L
> > > >> > >>> > R=. ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L
> > > >> > >>> > R/:;(</. i.@#)L
> > > >> > >>> > )
> > > >> > >>> >
> > > >> > >>> > F (hcodes -:&:(#@>) deflatecodes) A
> > > >> > >>> > 1
> > > >> > >>> >
> > > >> > >>> > There should be a better way of doing this, but this should
> at
> > > >> least
> > > >> > >>> > get you started.
> > > >> > >>> >
> > > >> > >>> > Thanks,
> > > >> > >>> >
> > > >> > >>> > --
> > > >> > >>> > Raul
> > > >> > >>> >
> > > >> > >>> >
> > > >> > >>> > On Wed, Sep 10, 2014 at 10:45 AM, bill lam <
> [email protected]
> > > >
> > > >> wrote:
> > > >> > >>> > > For huffman coding used in zlib:
> > > >> > >>> > > https://www.ietf.org/rfc/rfc1951.txt section 3.2.2.
> > > >> > >>> > >
> > > >> > >>> > > The Huffman codes used for each alphabet in the "deflate"
> > > >> > >>> > > format have two additional rules:
> > > >> > >>> > >
> > > >> > >>> > > * All codes of a given bit length have lexicographically
> > > >> > >>> > > consecutive values, in the same order as the symbols
> > > >> > >>> > > they represent;
> > > >> > >>> > >
> > > >> > >>> > > * Shorter codes lexicographically precede longer codes.
> > > >> > >>> > > I tried jwiki hcodes in
> > > >> > >>> > > I try Roger's essay
> > > >> > >>> > > http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding
> > > >> > >>> > >
> > > >> > >>> > > hc=: 4 : 0
> > > >> > >>> > > if. 1=#x do. y
> > > >> > >>> > > else. ((i{x),+/j{x) hc (i{y),<j{y [ i=. (i.#x) -. j=.
> 2{./:x
> > > end.
> > > >> > >>> > > )
> > > >> > >>> > >
> > > >> > >>> > > hcodes=: 4 : 0
> > > >> > >>> > > assert. x -:&$ y NB. weights and words have same
> > > shape
> > > >> > >>> > > assert. (0<:x) *. 1=#$x NB. weights are non-negative
> > > >> > >>> > > assert. 1 >: L.y NB. words are boxed not more
> than
> > > once
> > > >> > >>> > > w=. ,&.> y NB. standardized words
> > > >> > >>> > > assert. w -: ~.w NB. words are unique
> > > >> > >>> > > t=. 0 {:: x hc w NB. minimal weight binary tree
> > > >> > >>> > > ((< S: 0 t) i. w) { <@(1&=)@; S: 1 {:: t
> > > >> > >>> > > )
> > > >> > >>> > >
> > > >> > >>> > > but the coding produced is malformed for zlib. eg,
> > > >> > >>> > > this is what I ran into trouble
> > > >> > >>> > >
> > > >> > >>> > > f1=: 1 256 17 1 1 9 1
> > > >> > >>> > > f2=: 2 1 0 1 255 0 1536
> > > >> > >>> > > F=: ,/(f1#f2)
> > > >> > >>> > > A=: i.286
> > > >> > >>> > >
> > > >> > >>> > > F hcodes A
> > > >> > >>> > >
> > > >> > >>> > > Or a shorter example
> > > >> > >>> > >
> > > >> > >>> > > A1=: i.12
> > > >> > >>> > > F1=: 2 1 0 0 0 0 0 0 0 0 0 1
> > > >> > >>> > >
> > > >> > >>> > > F1 hcodes A1
> > > >> > >>> > >
> > > >> > >>> > > Any idea?
> > > >> > >>> > >
> > > >> > >>> > > --
> > > >> > >>> > > regards,
> > > >> > >>> > > ====================================================
> > > >> > >>> > > GPG key 1024D/4434BAB3 2008-08-24
> > > >> > >>> > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> > > >> > >>> > > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> > > >> > >>> > >
> > > >>
> ----------------------------------------------------------------------
> > > >> > >>> > > For information about J forums see
> > > >> http://www.jsoftware.com/forums.htm
> > > >> > >>> >
> > > >>
> ----------------------------------------------------------------------
> > > >> > >>> > For information about J forums see
> > > >> http://www.jsoftware.com/forums.htm
> > > >> > >>> >
> > > >> > >>>
> > > >>
> ----------------------------------------------------------------------
> > > >> > >>> For information about J forums see
> > > >> http://www.jsoftware.com/forums.htm
> > > >> > >>>
> > > >> > >>
> > > ----------------------------------------------------------------------
> > > >> > >> For information about J forums see
> > > >> http://www.jsoftware.com/forums.htm
> > > >> > >
> > > ----------------------------------------------------------------------
> > > >> > > For information about J forums see
> > > http://www.jsoftware.com/forums.htm
> > > >>
> > > >> --
> > > >> regards,
> > > >> ====================================================
> > > >> GPG key 1024D/4434BAB3 2008-08-24
> > > >> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> > > >> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> > > >>
> ----------------------------------------------------------------------
> > > >> For information about J forums see
> http://www.jsoftware.com/forums.htm
> > > >>
> > > >
> ----------------------------------------------------------------------
> > > > For information about J forums see
> http://www.jsoftware.com/forums.htm
> > > ----------------------------------------------------------------------
> > > For information about J forums see http://www.jsoftware.com/forums.htm
> > >
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
>
> --
> regards,
> ====================================================
> GPG key 1024D/4434BAB3 2008-08-24
> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm