I thought the bit lengths were encoded in the file, not recreated by an algorithm in the decoder?
Thanks, -- Raul On Fri, Sep 12, 2014 at 1:27 AM, bill lam <[email protected]> wrote: > I think this matters a lot because output of our encoder will > also be decoded by other zlib compliant decoders such as zlib.so > itself. Or I had mis-read your question? > > Пт, 12 сен 2014, Raul Miller написал(а): >> If we define >> >> bitlen=:4 :0 >> b=. 0~:x >> b #inv #@>(b#x) hcodes b#y >> ) >> >> Does it matter that these numbers are sometimes different from those >> used in another implementation? >> >> Thanks, >> >> -- >> Raul >> >> >> On Fri, Sep 12, 2014 at 12:59 AM, bill lam <[email protected]> wrote: >> > Consider an example >> > >> > 1 0 0 2 1 hcodes_jzlib_ i.5 >> > +-----+-------+-------+-+---+ >> > |1 1 1|1 1 0 0|1 1 0 1|0|1 0| >> > +-----+-------+-------+-+---+ >> > #&> 1 0 0 2 1 hcodes_jzlib_ i.5 >> > 3 4 4 1 2 >> > (0~:1 0 0 2 1)* #&> 1 0 0 2 1 hcodes_jzlib_ i.5 >> > 3 0 0 1 2 >> > >> > classic huffman did not expect a 0 frequency and if there were >> > it assigns the longest bit length to them. I think this is >> > reasonable because it must have a non-zero bit length to encode >> > an entity. For zlib, it seems that there is a third rule: >> > For the bit length vector, bit lengths for zero frequency symbols >> > are zero. >> > >> > From discussion here, I think using hcodes directly >> > or indirectly for zlib is a wrong direction. >> > >> > Пт, 12 сен 2014, Raul Miller написал(а): >> >> That's what I thought at first, also. >> >> >> >> But, let's look at the example at >> >> http://www.jsoftware.com/pipermail/programming/2014-September/039299.html >> >> >> >> and the bit widths given at >> >> http://www.jsoftware.com/pipermail/programming/2014-September/039327.html >> >> >> >> Here's how it looks to me: >> >> >> >> bits -:#@>F hcodes A >> >> 0 >> >> >> >> Now.. is this a problem? >> >> >> >> I think it is. Consider: >> >> >> >> #0 -.~#@>F hcodes A >> >> 286 >> >> #0 -.~bits >> >> 260 >> >> >> >> Incidentally, I found a bug in my code, while trying to understand and >> >> express this concept. >> >> >> >> Fixed version here: >> >> >> >> bl_count=:3 :0 NB. y is result of freqs >> >> 0,}.<:#/.~(,~ [: i. 1 + >./)y >> >> ) >> >> >> >> start_vals=: +:@+/\.&.|.@}:@,~&0 >> >> >> >> find_codes=:3 :0 NB. y is result of freqs >> >> b=. bl_count y >> >> v=. start_vals b >> >> n=. /:~ ~.y-.0 >> >> o=. ;({./.~ /:~ (</. i.@#)) y-.0 >> >> c=. ;<"1&.>n (([#2:) #: ])&.> (*b)#v+&.>i.&.>b >> >> c /: o >> >> ) >> >> >> >> An alternate version of the result from find_codes would be given by: >> >> >> >> def_code=:3 :0 >> >> b=. bl_count y >> >> v=. start_vals b >> >> n=. /:~ ~.y-.0 >> >> o=. ;({./.~ /:~ (</. i.@#)) y-.0 >> >> c=. ;n,.&.>(*b)#v+&.>i.&.>b >> >> (,. i.@#)c /: o >> >> ) >> >> >> >> Thanks, >> >> >> >> -- >> >> Raul >> >> >> >> >> >> On Thu, Sep 11, 2014 at 8:33 PM, Joe Bogner <[email protected]> wrote: >> >> > The bit widths are calculated from the huffman tree >> >> > >> >> > See >> >> > >> >> > http://stackoverflow.com/questions/759707/efficient-way-of-storing-huffman-tree >> >> > >> >> > http://www.siggraph.org/education/materials/HyperGraph/video/mpeg/mpegfaq/huffman_tutorial.html >> >> > >> >> > The timing is interesting considering we were talking about trees the >> >> > other >> >> > day: >> >> > http://jsoftware.2058.n7.nabble.com/Ragged-Array-Shapes-are-Trees-td63207.html >> >> > >> >> > I was thinking to myself then how I hadn't used trees more than a few >> >> > times >> >> > in 18 years of programming. >> >> > >> >> > I am not sure how to apply your code to the problem. I also am not >> >> > completely sure what problem we are solving. If it is creating a >> >> > standalone J deflate implementation or PNG compression it may be a tall >> >> > order. I would be curious why not just interface to a C library like >> >> > what >> >> > is done in the image3 addon: >> >> > http://www.jsoftware.com/jwiki/Addons/media/image3 >> >> > On Sep 11, 2014 6:27 PM, "Raul Miller" <[email protected]> wrote: >> >> > >> >> >> Here's the code I came up with, with Bill's help: >> >> >> >> >> >> bl_count=:3 :0 NB. y is result of freqs >> >> >> 0,}.<:#/.~(,~ [: i. 1 + >./)y >> >> >> ) >> >> >> >> >> >> start_vals=: +:@+/\.&.|.@}:@,~&0 >> >> >> >> >> >> find_codes=:3 :0 NB. y is result of freqs >> >> >> b=. bl_count y >> >> >> v=. start_vals b >> >> >> n=. /:~ ~.y-.0 >> >> >> o=. ;({./.~ /:~ (</. i.@#)) y >> >> >> c=. ;<"1&.>n (([#2:) #: ])&.> (*b)#v+&.>i.&.>b >> >> >> c /: o >> >> >> ) >> >> >> >> >> >> An alternate version of the result from find_codes would be given by: >> >> >> >> >> >> def_code=:3 :0 >> >> >> b=. bl_count y >> >> >> v=. start_vals b >> >> >> n=. /:~ ~.y-.0 >> >> >> o=. ;({./.~ /:~ (</. i.@#)) y >> >> >> c=. ;n,.&.>(*b)#v+&.>i.&.>b >> >> >> (,. i.@#)c /: o >> >> >> ) >> >> >> >> >> >> The argument to find_codes or def_code is the bit widths for each >> >> >> symbol. >> >> >> >> >> >> I have not been able to figure out, from rfc 1951, how the bit widths >> >> >> are calculated. >> >> >> >> >> >> Thanks, >> >> >> >> >> >> -- >> >> >> Raul >> >> >> >> >> >> >> >> >> >> >> >> On Thu, Sep 11, 2014 at 4:47 PM, Joe Bogner <[email protected]> >> >> >> wrote: >> >> >> > bill, I'd be interested in a solution but I don't think I can >> >> >> > contribute any more on this. I played with >> >> >> > https://code.google.com/p/miniz/ and became even more convinced of >> >> >> > the >> >> >> > complexity. It seems as though the compressor can decide whether to >> >> >> > include the dictionary code table or not -- likely based on the size >> >> >> > of the table. >> >> >> > >> >> >> > >> >> >> > http://tools.ietf.org/html/rfc1950 >> >> >> > >> >> >> > A preset dictionary is specially useful to compress short input >> >> >> > sequences. The compressor can take advantage of the dictionary >> >> >> > context to encode the input in a more compact manner. >> >> >> > >> >> >> > >> >> >> > More links for anyone who is following and cares to go down the >> >> >> > rabbit >> >> >> hole too: >> >> >> > >> >> >> > http://en.wikipedia.org/wiki/Canonical_Huffman_code >> >> >> > >> >> >> > >> >> >> http://stackoverflow.com/questions/759707/efficient-way-of-storing-huffman-tree >> >> >> > >> >> >> > >> >> >> > >> >> >> > On Thu, Sep 11, 2014 at 1:28 PM, bill lam <[email protected]> >> >> >> > wrote: >> >> >> >> This codes seemed invalid. >> >> >> >> >> >> >> >> 1 is a prefix of 11 which is a prefix of 111. Suppose there >> >> >> >> is a bit pattern of 1 1 , it is ambiguous to mean >> >> >> >> [68,'1'] [68,'1'] >> >> >> >> or [65,'11'] >> >> >> >> >> >> >> >> The huffman code in rfc is canonical meaning there is exactly one >> >> >> >> possible huffman codes for a given bit length vector. This is >> >> >> >> important because the huffman code table itself will not be >> >> >> >> stored inside the deflate stream. The decoder only gets the bit >> >> >> >> length vector, if encoder and decoder use different huffman code >> >> >> >> for the same bit length vectors, it will not work. >> >> >> >> >> >> >> >> Чт, 11 сен 2014, Joe Bogner написал(а): >> >> >> >>> Ignore the pako.js example output... It was just outputting the >> >> >> >>> binary >> >> >> >>> representation of A-Z, not the huffman code >> >> >> >>> >> >> >> >>> This is what I meant to send >> >> >> >>> >> >> >> >>> For ABCD: >> >> >> >>> >> >> >> >>> [65,'11'], >> >> >> >>> [66,'0'], >> >> >> >>> [67,'10'], >> >> >> >>> [68,'1'], >> >> >> >>> [256,'111'] >> >> >> >>> >> >> >> >>> It still doesn't seem to be sorting correctly lexographically, but >> >> >> >>> I'm >> >> >> >>> not really in my comfort zone of understanding: >> >> >> >>> >> >> >> >>> The RFC has this instead: >> >> >> >>> >> >> >> >>> Symbol Code >> >> >> >>> ------ ---- >> >> >> >>> A 10 >> >> >> >>> B 0 >> >> >> >>> C 110 >> >> >> >>> D 111 >> >> >> >>> >> >> >> >>> I don't really know if it has to match the RFC or if each >> >> >> >>> implementation is able to do its own thing as long since it >> >> >> >>> includes >> >> >> >>> the distance/reverse lookup table (whatever it's called). >> >> >> >>> >> >> >> >>> FYI >> >> >> >>> >> >> >> >>> >> >> >> >>> This is where I inserted my code: >> >> >> >>> >> >> >> >>> /* >> >> >> =========================================================================== >> >> >> >>> * Generate the codes for a given tree and bit counts (which need >> >> >> >>> not >> >> >> be >> >> >> >>> * optimal). >> >> >> >>> * IN assertion: the array bl_count contains the bit length >> >> >> >>> statistics >> >> >> for >> >> >> >>> * the given tree and the field len is set for all tree elements. >> >> >> >>> * OUT assertion: the field code is set for all tree elements of >> >> >> >>> non >> >> >> >>> * zero code length. >> >> >> >>> */ >> >> >> >>> function gen_codes(tree, max_code, bl_count) >> >> >> >>> // ct_data *tree; /* the tree to decorate */ >> >> >> >>> // int max_code; /* largest code with non zero >> >> >> frequency */ >> >> >> >>> // ushf *bl_count; /* number of codes at each bit >> >> >> >>> length >> >> >> */ >> >> >> >>> { >> >> >> >>> var next_code = new Array(MAX_BITS+1); /* next code value for >> >> >> >>> each >> >> >> >>> bit length */ >> >> >> >>> var code = 0; /* running code value */ >> >> >> >>> var bits; /* bit index */ >> >> >> >>> var n; /* code index */ >> >> >> >>> >> >> >> >>> /* The distribution counts are first used to generate the code >> >> >> >>> values >> >> >> >>> * without bit reversal. >> >> >> >>> */ >> >> >> >>> for (bits = 1; bits <= MAX_BITS; bits++) { >> >> >> >>> next_code[bits] = code = (code + bl_count[bits-1]) << 1; >> >> >> >>> } >> >> >> >>> /* Check that the bit counts in bl_count are consistent. The last >> >> >> code >> >> >> >>> * must be all ones. >> >> >> >>> */ >> >> >> >>> //Assert (code + bl_count[MAX_BITS]-1 == (1<<MAX_BITS)-1, >> >> >> >>> // "inconsistent bit counts"); >> >> >> >>> //Tracev((stderr,"\ngen_codes: max_code %d ", max_code)); >> >> >> >>> >> >> >> >>> for (n = 0; n <= max_code; n++) { >> >> >> >>> var len = tree[n*2 + 1]/*.Len*/; >> >> >> >>> if (len === 0) { continue; } >> >> >> >>> /* Now reverse the bits */ >> >> >> >>> tree[n*2]/*.Code*/ = bi_reverse(next_code[len]++, len); >> >> >> >>> >> >> >> >>> if (tree!=static_ltree) { >> >> >> >>> var v = tree[n*2]; >> >> >> >>> console.log('[' + n + ",'" + v.toString(2) + "'],"); >> >> >> >>> } >> >> >> >>> //Tracecv(tree != static_ltree, (stderr,"\nn %3d %c l %2d c %4x >> >> >> (%x) ", >> >> >> >>> // n, (isgraph(n) ? n : ' '), len, tree[n].Code, >> >> >> next_code[len]-1)); >> >> >> >>> } >> >> >> >>> >> >> >> >>> } >> >> >> >>> >> >> >> >>> On Thu, Sep 11, 2014 at 11:55 AM, Joe Bogner <[email protected]> >> >> >> wrote: >> >> >> >>> > I think the prefix coding looks OK, but the 2 rules does not: >> >> >> >>> > >> >> >> >>> > I modified the code[1] to allow passing in a string and >> >> >> >>> > outputting >> >> >> the codes >> >> >> >>> > >> >> >> >>> > C:\temp>deflate ABCDEFGHIJKLMONPQRSTUVWXYZ >> >> >> >>> > code 65 : 0 00000000000000000000000000000000 >> >> >> >>> > code 66 : 6 00000000000000000000000000000110 >> >> >> >>> > code 67 : 8 00000000000000000000000000001000 >> >> >> >>> > code 68 : 4 00000000000000000000000000000100 >> >> >> >>> > code 69 : 22 00000000000000000000000000010110 >> >> >> >>> > code 70 : 14 00000000000000000000000000001110 >> >> >> >>> > code 71 : 30 00000000000000000000000000011110 >> >> >> >>> > code 72 : 1 00000000000000000000000000000001 >> >> >> >>> > code 73 : 17 00000000000000000000000000010001 >> >> >> >>> > code 74 : 12 00000000000000000000000000001100 >> >> >> >>> > code 75 : 9 00000000000000000000000000001001 >> >> >> >>> > code 76 : 25 00000000000000000000000000011001 >> >> >> >>> > code 77 : 5 00000000000000000000000000000101 >> >> >> >>> > code 78 : 21 00000000000000000000000000010101 >> >> >> >>> > code 79 : 13 00000000000000000000000000001101 >> >> >> >>> > code 80 : 29 00000000000000000000000000011101 >> >> >> >>> > code 81 : 3 00000000000000000000000000000011 >> >> >> >>> > code 82 : 19 00000000000000000000000000010011 >> >> >> >>> > code 83 : 11 00000000000000000000000000001011 >> >> >> >>> > code 84 : 27 00000000000000000000000000011011 >> >> >> >>> > code 85 : 7 00000000000000000000000000000111 >> >> >> >>> > code 86 : 23 00000000000000000000000000010111 >> >> >> >>> > code 87 : 15 00000000000000000000000000001111 >> >> >> >>> > code 88 : 31 00000000000000000000000000011111 >> >> >> >>> > code 89 : 2 00000000000000000000000000000010 >> >> >> >>> > code 90 : 10 00000000000000000000000000001010 >> >> >> >>> > >> >> >> >>> > >> >> >> >>> > I think it violates the consecutive rule... Each letter has the >> >> >> >>> > same >> >> >> >>> > frequency. ABCD have the same bit length. The order is off: >> >> >> >>> > >> >> >> >>> > If I sort it lexographically using javascript: >> >> >> >>> > >> >> >> >>> > JSON.stringify([['a','00000000000000000000000000000000'], >> >> >> >>> > ['b','00000000000000000000000000000110'], >> >> >> >>> > ['c','00000000000000000000000000001000'], >> >> >> >>> > ['d','00000000000000000000000000000100']].sort(function(x,y) { >> >> >> >>> > return >> >> >> >>> > x[1] - y[1] })) >> >> >> >>> > >> >> >> >>> > >> >> >> "[["a","00000000000000000000000000000000"],["d","00000000000000000000000000000100"],["b","00000000000000000000000000000110"],["c","00000000000000000000000000001000"]]" >> >> >> >>> > >> >> >> >>> > As you can see, the order comes out a,d,b,c >> >> >> >>> > >> >> >> >>> > I played around with a javascript implementation, pako[2]. It >> >> >> >>> > seems >> >> >> to >> >> >> >>> > work correctly: >> >> >> >>> > >> >> >> >>> > As you can see, it sorts lexographically >> >> >> >>> > >> >> >> >>> > JSON.stringify([[65,'1000001'], >> >> >> >>> > [66,'1000010'], >> >> >> >>> > [67,'1000011'], >> >> >> >>> > [68,'1000100'], >> >> >> >>> > [69,'1000101'], >> >> >> >>> > [70,'1000110'], >> >> >> >>> > [71,'1000111'], >> >> >> >>> > [72,'1001000'], >> >> >> >>> > [73,'1001001'], >> >> >> >>> > [74,'1001010'], >> >> >> >>> > [75,'1001011'], >> >> >> >>> > [76,'1001100'], >> >> >> >>> > [77,'1001101'], >> >> >> >>> > [78,'1001110'], >> >> >> >>> > [79,'1001111'], >> >> >> >>> > [80,'1010000'], >> >> >> >>> > [81,'1010001'], >> >> >> >>> > [82,'1010010'], >> >> >> >>> > [83,'1010011'], >> >> >> >>> > [84,'1010100'], >> >> >> >>> > [85,'1010101'], >> >> >> >>> > [86,'1010110'], >> >> >> >>> > [87,'1010111'], >> >> >> >>> > [88,'1011000'], >> >> >> >>> > [89,'1011001'], >> >> >> >>> > [90,'1011010']].sort(function(x,y) { return x[1] - y[1] })) >> >> >> >>> > >> >> >> >>> > >> >> >> "[[65,"1000001"],[66,"1000010"],[67,"1000011"],[68,"1000100"],[69,"1000101"],[70,"1000110"],[71,"1000111"],[72,"1001000"],[73,"1001001"],[74,"1001010"],[75,"1001011"],[76,"1001100"],[77,"1001101"],[78,"1001110"],[79,"1001111"],[80,"1010000"],[81,"1010001"],[82,"1010010"],[83,"1010011"],[84,"1010100"],[85,"1010101"],[86,"1010110"],[87,"1010111"],[88,"1011000"],[89,"1011001"],[90,"1011010"]]" >> >> >> >>> > >> >> >> >>> > All the values are sorted correctly. >> >> >> >>> > >> >> >> >>> > Here it is with the same ABCD example: >> >> >> >>> > >> >> >> >>> > var pako = require('pako'); >> >> >> >>> > var binaryString = pako.deflate('ABCD', { to: 'string' }); >> >> >> >>> > console.log(binaryString); >> >> >> >>> > var restored = pako.inflate(binaryString, { to: 'string' }); >> >> >> >>> > console.log(restored); >> >> >> >>> > >> >> >> >>> > It successfully deflates and inflates itself >> >> >> >>> > >> >> >> >>> > x?♣A☺☺ ? mcÿ7♣A♫☻?☺♂ >> >> >> >>> > ABCD >> >> >> >>> > >> >> >> >>> > >> >> >> >>> > Hope this helps... >> >> >> >>> > >> >> >> >>> > [1] - >> >> >> https://gist.github.com/joebo/a3c2932f0e5a7a0c3f07#file-deflate-c-L2613 >> >> >> >>> > [2] - https://rawgit.com/nodeca/pako/master/dist/pako.js >> >> >> >>> > >> >> >> >>> > On Thu, Sep 11, 2014 at 11:33 AM, bill lam <[email protected]> >> >> >> wrote: >> >> >> >>> >> This is strange since every author must had decode its own >> >> >> >>> >> encoded >> >> >> >>> >> data as a smoke test. >> >> >> >>> >> >> >> >> >>> >> Did you test if huffman code or bit lengths it produced was >> >> >> >>> >> correct or not, ie it is a prefix coding and it satisfy the 2 >> >> >> >>> >> rules in rfc. >> >> >> >>> >> >> >> >> >>> >> Чт, 11 сен 2014, Joe Bogner написал(а): >> >> >> >>> >>> unfortunately the dynamic coding in the putty fork doesn't >> >> >> >>> >>> seem to >> >> >> work: >> >> >> >>> >>> >> >> >> >>> >>> deflate -c deflate.c > out >> >> >> >>> >>> deflate -d out >> >> >> >>> >>> >> >> >> >>> >>> decoding error: incorrect data checksum >> >> >> >>> >>> >> >> >> >>> >>> >> >> >> >>> >>> it works fine with static tables >> >> >> >>> >>> >> >> >> >>> >>> C:\temp>echo ABCD > ABCD >> >> >> >>> >>> >> >> >> >>> >>> C:\temp>deflate -c ABCD > out >> >> >> >>> >>> >> >> >> >>> >>> C:\temp>deflate -d out >> >> >> >>> >>> ABCD >> >> >> >>> >>> >> >> >> >>> >>> I added some debugging code to determine that deflating >> >> >> >>> >>> deflate.c >> >> >> >>> >>> would be a dynamic table... Assuming it's broke, I probably >> >> >> wouldn't >> >> >> >>> >>> use it as a reference implementation after all >> >> >> >>> >>> >> >> >> >>> >>> On Thu, Sep 11, 2014 at 3:45 AM, bill lam <[email protected]> >> >> >> wrote: >> >> >> >>> >>> > the frequencies (guessing from bit lengths) should be >> >> >> >>> >>> > something >> >> >> like 2 3 1 1 >> >> >> >>> >>> > (2 3 1 1) hcodes 'ABCD' >> >> >> >>> >>> > >> >> >> >>> >>> > the hard part is the inverse problem: how to get the huffman >> >> >> code with >> >> >> >>> >>> > prior knowing the bits for each symbol. Your pointer to the >> >> >> putty >> >> >> >>> >>> > fork looks like helpful. The comment is in lines 861 to 914, >> >> >> the code >> >> >> >>> >>> > itself in line 915 to 964. Do you know how to express it in >> >> >> >>> >>> > J? >> >> >> >>> >>> > Thanks. >> >> >> >>> >>> > >> >> >> >>> >>> > On Thu, Sep 11, 2014 at 2:57 PM, Joe Bogner >> >> >> >>> >>> > <[email protected]> >> >> >> wrote: >> >> >> >>> >>> >> Here a few other links ... after reading through the RFC. >> >> >> >>> >>> >> Not >> >> >> sure if >> >> >> >>> >>> >> they help, but just sharing from my own research into >> >> >> >>> >>> >> assisting >> >> >> on >> >> >> >>> >>> >> this topic >> >> >> >>> >>> >> >> >> >> >>> >>> >> https://github.com/evegard/pngview/blob/master/huffman.c#L54 >> >> >> >>> >>> >> >> >> >> >>> >>> >> And a fork of the putty version with dynamic huffman coding: >> >> >> >>> >>> >> >> >> >> http://rc.quest.com/viewvc/putty/trunk/halibut/deflate.c?diff_format=s&revision=2&view=markup >> >> >> >>> >>> >> >> >> >> >>> >>> >> Or just generally googling some of the code from the RFC: >> >> >> >>> >>> >> >> >> >> https://www.google.com/search?q=next_code%5Blen%5D%2B%2B%3B&oq=next_code%5Blen%5D%2B%2B%3B&aqs=chrome..69i57.387j0j7&sourceid=chrome&es_sm=93&ie=UTF-8#q=next_code%5Blen%5D%2B%2B%3B&start=20 >> >> >> >>> >>> >> >> >> >> >>> >>> >> >> >> >> >>> >>> >> Using the code from >> >> >> >>> >>> >> http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding, I >> >> >> >>> >>> >> got >> >> >> stuck >> >> >> >>> >>> >> trying to match a simple example to the binary tree in the >> >> >> >>> >>> >> RFC: >> >> >> >>> >>> >> >> >> >> >>> >>> >> From the RFC: >> >> >> >>> >>> >> >> >> >> >>> >>> >> /\ Symbol Code >> >> >> >>> >>> >> 0 1 ------ ---- >> >> >> >>> >>> >> / \ A 00 >> >> >> >>> >>> >> /\ B B 1 >> >> >> >>> >>> >> 0 1 C 011 >> >> >> >>> >>> >> / \ D 010 >> >> >> >>> >>> >> A /\ >> >> >> >>> >>> >> 0 1 >> >> >> >>> >>> >> / \ >> >> >> >>> >>> >> D C >> >> >> >>> >>> >> >> >> >> >>> >>> >> >> >> >> >>> >>> >> >> >> >> >>> >>> >> (4#1) hcodes 'ABCD' >> >> >> >>> >>> >> ┌───┬───┬───┬───┐ >> >> >> >>> >>> >> │0 0│0 1│1 0│1 1│ >> >> >> >>> >>> >> └───┴───┴───┴───┘ >> >> >> >>> >>> >> >> >> >> >>> >>> >> Per the RFC, ideally that should match this? >> >> >> '00';'1';'011';'010' >> >> >> >>> >>> >> >> >> >> >>> >>> >> >> >> >> >>> >>> >> From there, it seems like a pretty straightforward exercise >> >> >> >>> >>> >> to >> >> >> >>> >>> >> transliterate the C code from the RFC into J code to recode >> >> >> >>> >>> >> the >> >> >> >>> >>> >> example to: >> >> >> >>> >>> >> >> >> >> >>> >>> >> >> >> >> >>> >>> >> Symbol Code >> >> >> >>> >>> >> ------ ---- >> >> >> >>> >>> >> A 10 >> >> >> >>> >>> >> B 0 >> >> >> >>> >>> >> C 110 >> >> >> >>> >>> >> D 111 >> >> >> >>> >>> >> >> >> >> >>> >>> >> >> >> >> >>> >>> >> I would probably start with a looping construct like what's >> >> >> >>> >>> >> in >> >> >> the RFC >> >> >> >>> >>> >> and then figure out a more J way to do it, but first I would >> >> >> need to >> >> >> >>> >>> >> figure out how to create the binary tree in that initial >> >> >> >>> >>> >> format. >> >> >> >>> >>> >> >> >> >> >>> >>> >> On Wed, Sep 10, 2014 at 7:41 PM, bill lam >> >> >> >>> >>> >> <[email protected]> >> >> >> wrote: >> >> >> >>> >>> >>> Thanks Joe, >> >> >> >>> >>> >>> putty only use zlib static huffman for encoding so that it >> >> >> does not build >> >> >> >>> >>> >>> any huffman dictionary table. >> >> >> >>> >>> >>> >> >> >> >>> >>> >>> The zlib static huffman code does not care about individual >> >> >> symbol's >> >> >> >>> >>> >>> frequency, it just encode 0 to 286 into bits, see section >> >> >> 3.2.6. >> >> >> >>> >>> >>> On Sep 11, 2014 1:26 AM, "Joe Bogner" >> >> >> >>> >>> >>> <[email protected]> >> >> >> wrote: >> >> >> >>> >>> >>> >> >> >> >>> >>> >>>> You've already likely considered this, but if it were me I >> >> >> would compare >> >> >> >>> >>> >>>> results to a working implementation. The one from putty >> >> >> >>> >>> >>>> seems >> >> >> pretty clean >> >> >> >>> >>> >>>> and standalone: >> >> >> >>> >>> >>>> >> >> >> https://raw.githubusercontent.com/grumpydev/PortablePuTTY/master/SSHZLIB.C >> >> >> >>> >>> >>>> . I was able to compile it on windows no problem and I >> >> >> >>> >>> >>>> assume >> >> >> it'd be fine >> >> >> >>> >>> >>>> on linux as well. >> >> >> >>> >>> >>>> >> >> >> >>> >>> >>>> On Wed, Sep 10, 2014 at 1:00 PM, Raul Miller < >> >> >> [email protected]> >> >> >> >>> >>> >>>> wrote: >> >> >> >>> >>> >>>> >> >> >> >>> >>> >>>> > I think the use of the term "consecutive" rather than >> >> >> "sequential" is >> >> >> >>> >>> >>>> > telling. >> >> >> >>> >>> >>>> > >> >> >> >>> >>> >>>> > The described algorithm is: compute the huffman code >> >> >> lengths: >> >> >> >>> >>> >>>> > #@>F1 hcodes A1 >> >> >> >>> >>> >>>> > 1 3 7 7 6 6 6 6 6 6 6 2 >> >> >> >>> >>> >>>> > >> >> >> >>> >>> >>>> > Then assign ascending huffman codes first in length >> >> >> >>> >>> >>>> > order >> >> >> and then >> >> >> >>> >>> >>>> > within codes of the same length. >> >> >> >>> >>> >>>> > >> >> >> >>> >>> >>>> > Taken literally, that might be something like this: >> >> >> >>> >>> >>>> > >> >> >> >>> >>> >>>> > H=: 4 :0 >> >> >> >>> >>> >>>> > L=.#@> x hcodes y >> >> >> >>> >>> >>>> > U=.~.L >> >> >> >>> >>> >>>> > ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L >> >> >> >>> >>> >>>> > ) >> >> >> >>> >>> >>>> > >> >> >> >>> >>> >>>> > ":@>F1 H A1 >> >> >> >>> >>> >>>> > 0 >> >> >> >>> >>> >>>> > 1 1 0 >> >> >> >>> >>> >>>> > 1 1 1 0 0 1 0 >> >> >> >>> >>> >>>> > 1 1 1 0 0 1 1 >> >> >> >>> >>> >>>> > 1 1 1 0 0 0 >> >> >> >>> >>> >>>> > 1 1 1 0 0 1 >> >> >> >>> >>> >>>> > 1 1 1 0 1 0 >> >> >> >>> >>> >>>> > 1 1 1 0 1 1 >> >> >> >>> >>> >>>> > 1 1 1 1 0 0 >> >> >> >>> >>> >>>> > 1 1 1 1 0 1 >> >> >> >>> >>> >>>> > 1 1 1 1 1 0 >> >> >> >>> >>> >>>> > 1 0 >> >> >> >>> >>> >>>> > >> >> >> >>> >>> >>>> > But is this correct? Is it actually safe to leave the >> >> >> results like >> >> >> >>> >>> >>>> > this - with all codes of the same length being >> >> >> >>> >>> >>>> > consecutive >> >> >> to each >> >> >> >>> >>> >>>> > other? >> >> >> >>> >>> >>>> > >> >> >> >>> >>> >>>> > F (hcodes -:&:(#@>) H) A >> >> >> >>> >>> >>>> > 0 >> >> >> >>> >>> >>>> > >> >> >> >>> >>> >>>> > No. >> >> >> >>> >>> >>>> > >> >> >> >>> >>> >>>> > So... "consecutive" must refer only to the values used >> >> >> >>> >>> >>>> > and >> >> >> not their >> >> >> >>> >>> >>>> > order within the result. >> >> >> >>> >>> >>>> > >> >> >> >>> >>> >>>> > Perhaps something like this: >> >> >> >>> >>> >>>> > >> >> >> >>> >>> >>>> > deflatecodes=:4 :0 >> >> >> >>> >>> >>>> > L=.#@> x hcodes y >> >> >> >>> >>> >>>> > U=.~.L >> >> >> >>> >>> >>>> > R=. ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 >> >> >> >>> >>> >>>> > #:@i.@#)/.~L >> >> >> >>> >>> >>>> > R/:;(</. i.@#)L >> >> >> >>> >>> >>>> > ) >> >> >> >>> >>> >>>> > >> >> >> >>> >>> >>>> > F (hcodes -:&:(#@>) deflatecodes) A >> >> >> >>> >>> >>>> > 1 >> >> >> >>> >>> >>>> > >> >> >> >>> >>> >>>> > There should be a better way of doing this, but this >> >> >> >>> >>> >>>> > should >> >> >> at least >> >> >> >>> >>> >>>> > get you started. >> >> >> >>> >>> >>>> > >> >> >> >>> >>> >>>> > Thanks, >> >> >> >>> >>> >>>> > >> >> >> >>> >>> >>>> > -- >> >> >> >>> >>> >>>> > Raul >> >> >> >>> >>> >>>> > >> >> >> >>> >>> >>>> > >> >> >> >>> >>> >>>> > On Wed, Sep 10, 2014 at 10:45 AM, bill lam < >> >> >> [email protected]> wrote: >> >> >> >>> >>> >>>> > > For huffman coding used in zlib: >> >> >> >>> >>> >>>> > > https://www.ietf.org/rfc/rfc1951.txt section 3.2.2. >> >> >> >>> >>> >>>> > > >> >> >> >>> >>> >>>> > > The Huffman codes used for each alphabet in the >> >> >> >>> >>> >>>> > > "deflate" >> >> >> >>> >>> >>>> > > format have two additional rules: >> >> >> >>> >>> >>>> > > >> >> >> >>> >>> >>>> > > * All codes of a given bit length have >> >> >> >>> >>> >>>> > > lexicographically >> >> >> >>> >>> >>>> > > consecutive values, in the same order as the symbols >> >> >> >>> >>> >>>> > > they represent; >> >> >> >>> >>> >>>> > > >> >> >> >>> >>> >>>> > > * Shorter codes lexicographically precede longer >> >> >> >>> >>> >>>> > > codes. >> >> >> >>> >>> >>>> > > I tried jwiki hcodes in >> >> >> >>> >>> >>>> > > I try Roger's essay >> >> >> >>> >>> >>>> > > http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding >> >> >> >>> >>> >>>> > > >> >> >> >>> >>> >>>> > > hc=: 4 : 0 >> >> >> >>> >>> >>>> > > if. 1=#x do. y >> >> >> >>> >>> >>>> > > else. ((i{x),+/j{x) hc (i{y),<j{y [ i=. (i.#x) -. j=. >> >> >> 2{./:x end. >> >> >> >>> >>> >>>> > > ) >> >> >> >>> >>> >>>> > > >> >> >> >>> >>> >>>> > > hcodes=: 4 : 0 >> >> >> >>> >>> >>>> > > assert. x -:&$ y NB. weights and words have >> >> >> same shape >> >> >> >>> >>> >>>> > > assert. (0<:x) *. 1=#$x NB. weights are >> >> >> >>> >>> >>>> > > non-negative >> >> >> >>> >>> >>>> > > assert. 1 >: L.y NB. words are boxed not >> >> >> >>> >>> >>>> > > more >> >> >> than once >> >> >> >>> >>> >>>> > > w=. ,&.> y NB. standardized words >> >> >> >>> >>> >>>> > > assert. w -: ~.w NB. words are unique >> >> >> >>> >>> >>>> > > t=. 0 {:: x hc w NB. minimal weight binary >> >> >> >>> >>> >>>> > > tree >> >> >> >>> >>> >>>> > > ((< S: 0 t) i. w) { <@(1&=)@; S: 1 {:: t >> >> >> >>> >>> >>>> > > ) >> >> >> >>> >>> >>>> > > >> >> >> >>> >>> >>>> > > but the coding produced is malformed for zlib. eg, >> >> >> >>> >>> >>>> > > this is what I ran into trouble >> >> >> >>> >>> >>>> > > >> >> >> >>> >>> >>>> > > f1=: 1 256 17 1 1 9 1 >> >> >> >>> >>> >>>> > > f2=: 2 1 0 1 255 0 1536 >> >> >> >>> >>> >>>> > > F=: ,/(f1#f2) >> >> >> >>> >>> >>>> > > A=: i.286 >> >> >> >>> >>> >>>> > > >> >> >> >>> >>> >>>> > > F hcodes A >> >> >> >>> >>> >>>> > > >> >> >> >>> >>> >>>> > > Or a shorter example >> >> >> >>> >>> >>>> > > >> >> >> >>> >>> >>>> > > A1=: i.12 >> >> >> >>> >>> >>>> > > F1=: 2 1 0 0 0 0 0 0 0 0 0 1 >> >> >> >>> >>> >>>> > > >> >> >> >>> >>> >>>> > > F1 hcodes A1 >> >> >> >>> >>> >>>> > > >> >> >> >>> >>> >>>> > > Any idea? >> >> >> >>> >>> >>>> > > >> >> >> >>> >>> >>>> > > -- >> >> >> >>> >>> >>>> > > regards, >> >> >> >>> >>> >>>> > > ==================================================== >> >> >> >>> >>> >>>> > > GPG key 1024D/4434BAB3 2008-08-24 >> >> >> >>> >>> >>>> > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 >> >> >> >>> >>> >>>> > > gpg --keyserver subkeys.pgp.net --armor --export >> >> >> >>> >>> >>>> > > 4434BAB3 >> >> >> >>> >>> >>>> > > >> >> >> ---------------------------------------------------------------------- >> >> >> >>> >>> >>>> > > For information about J forums see >> >> >> http://www.jsoftware.com/forums.htm >> >> >> >>> >>> >>>> > >> >> >> ---------------------------------------------------------------------- >> >> >> >>> >>> >>>> > For information about J forums see >> >> >> http://www.jsoftware.com/forums.htm >> >> >> >>> >>> >>>> > >> >> >> >>> >>> >>>> >> >> >> ---------------------------------------------------------------------- >> >> >> >>> >>> >>>> For information about J forums see >> >> >> http://www.jsoftware.com/forums.htm >> >> >> >>> >>> >>>> >> >> >> >>> >>> >>> >> >> >> ---------------------------------------------------------------------- >> >> >> >>> >>> >>> For information about J forums see >> >> >> http://www.jsoftware.com/forums.htm >> >> >> >>> >>> >> >> >> >> ---------------------------------------------------------------------- >> >> >> >>> >>> >> For information about J forums see >> >> >> http://www.jsoftware.com/forums.htm >> >> >> >>> >>> > >> >> >> ---------------------------------------------------------------------- >> >> >> >>> >>> > For information about J forums see >> >> >> http://www.jsoftware.com/forums.htm >> >> >> >>> >>> >> >> >> ---------------------------------------------------------------------- >> >> >> >>> >>> For information about J forums see >> >> >> http://www.jsoftware.com/forums.htm >> >> >> >>> >> >> >> >> >>> >> -- >> >> >> >>> >> regards, >> >> >> >>> >> ==================================================== >> >> >> >>> >> GPG key 1024D/4434BAB3 2008-08-24 >> >> >> >>> >> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 >> >> >> >>> >> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 >> >> >> >>> >> >> >> >> ---------------------------------------------------------------------- >> >> >> >>> >> For information about J forums see >> >> >> http://www.jsoftware.com/forums.htm >> >> >> >>> ---------------------------------------------------------------------- >> >> >> >>> For information about J forums see >> >> >> >>> http://www.jsoftware.com/forums.htm >> >> >> >> >> >> >> >> -- >> >> >> >> regards, >> >> >> >> ==================================================== >> >> >> >> GPG key 1024D/4434BAB3 2008-08-24 >> >> >> >> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 >> >> >> >> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 >> >> >> >> ---------------------------------------------------------------------- >> >> >> >> For information about J forums see >> >> >> >> http://www.jsoftware.com/forums.htm >> >> >> > ---------------------------------------------------------------------- >> >> >> > For information about J forums see >> >> >> > http://www.jsoftware.com/forums.htm >> >> >> ---------------------------------------------------------------------- >> >> >> For information about J forums see http://www.jsoftware.com/forums.htm >> >> > ---------------------------------------------------------------------- >> >> > For information about J forums see http://www.jsoftware.com/forums.htm >> >> ---------------------------------------------------------------------- >> >> For information about J forums see http://www.jsoftware.com/forums.htm >> > >> > -- >> > regards, >> > ==================================================== >> > GPG key 1024D/4434BAB3 2008-08-24 >> > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 >> > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 >> > ---------------------------------------------------------------------- >> > For information about J forums see http://www.jsoftware.com/forums.htm >> ---------------------------------------------------------------------- >> For information about J forums see http://www.jsoftware.com/forums.htm > > -- > regards, > ==================================================== > GPG key 1024D/4434BAB3 2008-08-24 > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
