One more link: http://www.opensource.apple.com/source/zlib/zlib-43/zlib/contrib/puff/puff.c
* puff.c is a simple inflate written to be an unambiguous way to specify the * deflate format. It is not written for speed but rather simplicity. Not sure if it's any use On Thu, Sep 11, 2014 at 9:46 PM, Joe Bogner <[email protected]> wrote: > Thanks for the background. I don't know if I'm helping any but figured > I'd share one more thing (for now). I found a library, > https://github.com/lvandeve/lodepng, that has a permissive license and > successfully decoded and encoded a png file. > > As simple as: > > lodepng_decode32_file(&image, &width, &height, filename); > > lodepng_encode32_file("c:\\temp\\viewmatanalysis2.png", image, width, height); > > the new image opened fine in my image viewer. > > from: https://github.com/lvandeve/lodepng/blob/master/example_decode.c > > I stepped through the code in visual c and can see it doing the > dynamic huffman encoding. > > It might be a decent reference implementation to convert to J or at > least provide a test case against. > > On Thu, Sep 11, 2014 at 9:07 PM, bill lam <[email protected]> wrote: >> It seems everyone use huffman code in its own way, at least huffman coding >> in jpeg is different from that in deflate. >> >> The intent to is to replace bmp with png as the default image format in jal >> addons for j803. This is already almost fully done. With dynamic huffman >> coding, png images can achieve better compression ratio. But even without >> it, png files using fixed huffman coding can already compress about 90% to >> 99% for images output from viewmat. >> >> The addon will use stock zlib binaries when they are available, otherwise >> falls back to J implementation. >> >> This is independent of jqt so that it can work for jconsole and jhs. >> >> I don't know the details but I think jhs can also take advantage to support >> gzip/deflate encoding for transfer. >> On Sep 12, 2014 8:33 AM, "Joe Bogner" <[email protected]> wrote: >> >>> The bit widths are calculated from the huffman tree >>> >>> See >>> >>> >>> http://stackoverflow.com/questions/759707/efficient-way-of-storing-huffman-tree >>> >>> >>> http://www.siggraph.org/education/materials/HyperGraph/video/mpeg/mpegfaq/huffman_tutorial.html >>> >>> The timing is interesting considering we were talking about trees the other >>> day: >>> >>> http://jsoftware.2058.n7.nabble.com/Ragged-Array-Shapes-are-Trees-td63207.html >>> >>> I was thinking to myself then how I hadn't used trees more than a few times >>> in 18 years of programming. >>> >>> I am not sure how to apply your code to the problem. I also am not >>> completely sure what problem we are solving. If it is creating a >>> standalone J deflate implementation or PNG compression it may be a tall >>> order. I would be curious why not just interface to a C library like what >>> is done in the image3 addon: >>> http://www.jsoftware.com/jwiki/Addons/media/image3 >>> On Sep 11, 2014 6:27 PM, "Raul Miller" <[email protected]> wrote: >>> >>> > Here's the code I came up with, with Bill's help: >>> > >>> > bl_count=:3 :0 NB. y is result of freqs >>> > 0,}.<:#/.~(,~ [: i. 1 + >./)y >>> > ) >>> > >>> > start_vals=: +:@+/\.&.|.@}:@,~&0 >>> > >>> > find_codes=:3 :0 NB. y is result of freqs >>> > b=. bl_count y >>> > v=. start_vals b >>> > n=. /:~ ~.y-.0 >>> > o=. ;({./.~ /:~ (</. i.@#)) y >>> > c=. ;<"1&.>n (([#2:) #: ])&.> (*b)#v+&.>i.&.>b >>> > c /: o >>> > ) >>> > >>> > An alternate version of the result from find_codes would be given by: >>> > >>> > def_code=:3 :0 >>> > b=. bl_count y >>> > v=. start_vals b >>> > n=. /:~ ~.y-.0 >>> > o=. ;({./.~ /:~ (</. i.@#)) y >>> > c=. ;n,.&.>(*b)#v+&.>i.&.>b >>> > (,. i.@#)c /: o >>> > ) >>> > >>> > The argument to find_codes or def_code is the bit widths for each symbol. >>> > >>> > I have not been able to figure out, from rfc 1951, how the bit widths >>> > are calculated. >>> > >>> > Thanks, >>> > >>> > -- >>> > Raul >>> > >>> > >>> > >>> > On Thu, Sep 11, 2014 at 4:47 PM, Joe Bogner <[email protected]> wrote: >>> > > bill, I'd be interested in a solution but I don't think I can >>> > > contribute any more on this. I played with >>> > > https://code.google.com/p/miniz/ and became even more convinced of the >>> > > complexity. It seems as though the compressor can decide whether to >>> > > include the dictionary code table or not -- likely based on the size >>> > > of the table. >>> > > >>> > > >>> > > http://tools.ietf.org/html/rfc1950 >>> > > >>> > > A preset dictionary is specially useful to compress short input >>> > > sequences. The compressor can take advantage of the dictionary >>> > > context to encode the input in a more compact manner. >>> > > >>> > > >>> > > More links for anyone who is following and cares to go down the rabbit >>> > hole too: >>> > > >>> > > http://en.wikipedia.org/wiki/Canonical_Huffman_code >>> > > >>> > > >>> > >>> http://stackoverflow.com/questions/759707/efficient-way-of-storing-huffman-tree >>> > > >>> > > >>> > > >>> > > On Thu, Sep 11, 2014 at 1:28 PM, bill lam <[email protected]> wrote: >>> > >> This codes seemed invalid. >>> > >> >>> > >> 1 is a prefix of 11 which is a prefix of 111. Suppose there >>> > >> is a bit pattern of 1 1 , it is ambiguous to mean >>> > >> [68,'1'] [68,'1'] >>> > >> or [65,'11'] >>> > >> >>> > >> The huffman code in rfc is canonical meaning there is exactly one >>> > >> possible huffman codes for a given bit length vector. This is >>> > >> important because the huffman code table itself will not be >>> > >> stored inside the deflate stream. The decoder only gets the bit >>> > >> length vector, if encoder and decoder use different huffman code >>> > >> for the same bit length vectors, it will not work. >>> > >> >>> > >> Чт, 11 сен 2014, Joe Bogner написал(а): >>> > >>> Ignore the pako.js example output... It was just outputting the >>> binary >>> > >>> representation of A-Z, not the huffman code >>> > >>> >>> > >>> This is what I meant to send >>> > >>> >>> > >>> For ABCD: >>> > >>> >>> > >>> [65,'11'], >>> > >>> [66,'0'], >>> > >>> [67,'10'], >>> > >>> [68,'1'], >>> > >>> [256,'111'] >>> > >>> >>> > >>> It still doesn't seem to be sorting correctly lexographically, but >>> I'm >>> > >>> not really in my comfort zone of understanding: >>> > >>> >>> > >>> The RFC has this instead: >>> > >>> >>> > >>> Symbol Code >>> > >>> ------ ---- >>> > >>> A 10 >>> > >>> B 0 >>> > >>> C 110 >>> > >>> D 111 >>> > >>> >>> > >>> I don't really know if it has to match the RFC or if each >>> > >>> implementation is able to do its own thing as long since it includes >>> > >>> the distance/reverse lookup table (whatever it's called). >>> > >>> >>> > >>> FYI >>> > >>> >>> > >>> >>> > >>> This is where I inserted my code: >>> > >>> >>> > >>> /* >>> > >>> =========================================================================== >>> > >>> * Generate the codes for a given tree and bit counts (which need not >>> > be >>> > >>> * optimal). >>> > >>> * IN assertion: the array bl_count contains the bit length >>> statistics >>> > for >>> > >>> * the given tree and the field len is set for all tree elements. >>> > >>> * OUT assertion: the field code is set for all tree elements of non >>> > >>> * zero code length. >>> > >>> */ >>> > >>> function gen_codes(tree, max_code, bl_count) >>> > >>> // ct_data *tree; /* the tree to decorate */ >>> > >>> // int max_code; /* largest code with non zero >>> > frequency */ >>> > >>> // ushf *bl_count; /* number of codes at each bit >>> length >>> > */ >>> > >>> { >>> > >>> var next_code = new Array(MAX_BITS+1); /* next code value for each >>> > >>> bit length */ >>> > >>> var code = 0; /* running code value */ >>> > >>> var bits; /* bit index */ >>> > >>> var n; /* code index */ >>> > >>> >>> > >>> /* The distribution counts are first used to generate the code >>> values >>> > >>> * without bit reversal. >>> > >>> */ >>> > >>> for (bits = 1; bits <= MAX_BITS; bits++) { >>> > >>> next_code[bits] = code = (code + bl_count[bits-1]) << 1; >>> > >>> } >>> > >>> /* Check that the bit counts in bl_count are consistent. The last >>> > code >>> > >>> * must be all ones. >>> > >>> */ >>> > >>> //Assert (code + bl_count[MAX_BITS]-1 == (1<<MAX_BITS)-1, >>> > >>> // "inconsistent bit counts"); >>> > >>> //Tracev((stderr,"\ngen_codes: max_code %d ", max_code)); >>> > >>> >>> > >>> for (n = 0; n <= max_code; n++) { >>> > >>> var len = tree[n*2 + 1]/*.Len*/; >>> > >>> if (len === 0) { continue; } >>> > >>> /* Now reverse the bits */ >>> > >>> tree[n*2]/*.Code*/ = bi_reverse(next_code[len]++, len); >>> > >>> >>> > >>> if (tree!=static_ltree) { >>> > >>> var v = tree[n*2]; >>> > >>> console.log('[' + n + ",'" + v.toString(2) + "'],"); >>> > >>> } >>> > >>> //Tracecv(tree != static_ltree, (stderr,"\nn %3d %c l %2d c %4x >>> > (%x) ", >>> > >>> // n, (isgraph(n) ? n : ' '), len, tree[n].Code, >>> > next_code[len]-1)); >>> > >>> } >>> > >>> >>> > >>> } >>> > >>> >>> > >>> On Thu, Sep 11, 2014 at 11:55 AM, Joe Bogner <[email protected]> >>> > wrote: >>> > >>> > I think the prefix coding looks OK, but the 2 rules does not: >>> > >>> > >>> > >>> > I modified the code[1] to allow passing in a string and outputting >>> > the codes >>> > >>> > >>> > >>> > C:\temp>deflate ABCDEFGHIJKLMONPQRSTUVWXYZ >>> > >>> > code 65 : 0 00000000000000000000000000000000 >>> > >>> > code 66 : 6 00000000000000000000000000000110 >>> > >>> > code 67 : 8 00000000000000000000000000001000 >>> > >>> > code 68 : 4 00000000000000000000000000000100 >>> > >>> > code 69 : 22 00000000000000000000000000010110 >>> > >>> > code 70 : 14 00000000000000000000000000001110 >>> > >>> > code 71 : 30 00000000000000000000000000011110 >>> > >>> > code 72 : 1 00000000000000000000000000000001 >>> > >>> > code 73 : 17 00000000000000000000000000010001 >>> > >>> > code 74 : 12 00000000000000000000000000001100 >>> > >>> > code 75 : 9 00000000000000000000000000001001 >>> > >>> > code 76 : 25 00000000000000000000000000011001 >>> > >>> > code 77 : 5 00000000000000000000000000000101 >>> > >>> > code 78 : 21 00000000000000000000000000010101 >>> > >>> > code 79 : 13 00000000000000000000000000001101 >>> > >>> > code 80 : 29 00000000000000000000000000011101 >>> > >>> > code 81 : 3 00000000000000000000000000000011 >>> > >>> > code 82 : 19 00000000000000000000000000010011 >>> > >>> > code 83 : 11 00000000000000000000000000001011 >>> > >>> > code 84 : 27 00000000000000000000000000011011 >>> > >>> > code 85 : 7 00000000000000000000000000000111 >>> > >>> > code 86 : 23 00000000000000000000000000010111 >>> > >>> > code 87 : 15 00000000000000000000000000001111 >>> > >>> > code 88 : 31 00000000000000000000000000011111 >>> > >>> > code 89 : 2 00000000000000000000000000000010 >>> > >>> > code 90 : 10 00000000000000000000000000001010 >>> > >>> > >>> > >>> > >>> > >>> > I think it violates the consecutive rule... Each letter has the >>> same >>> > >>> > frequency. ABCD have the same bit length. The order is off: >>> > >>> > >>> > >>> > If I sort it lexographically using javascript: >>> > >>> > >>> > >>> > JSON.stringify([['a','00000000000000000000000000000000'], >>> > >>> > ['b','00000000000000000000000000000110'], >>> > >>> > ['c','00000000000000000000000000001000'], >>> > >>> > ['d','00000000000000000000000000000100']].sort(function(x,y) { >>> return >>> > >>> > x[1] - y[1] })) >>> > >>> > >>> > >>> > >>> > >>> "[["a","00000000000000000000000000000000"],["d","00000000000000000000000000000100"],["b","00000000000000000000000000000110"],["c","00000000000000000000000000001000"]]" >>> > >>> > >>> > >>> > As you can see, the order comes out a,d,b,c >>> > >>> > >>> > >>> > I played around with a javascript implementation, pako[2]. It seems >>> > to >>> > >>> > work correctly: >>> > >>> > >>> > >>> > As you can see, it sorts lexographically >>> > >>> > >>> > >>> > JSON.stringify([[65,'1000001'], >>> > >>> > [66,'1000010'], >>> > >>> > [67,'1000011'], >>> > >>> > [68,'1000100'], >>> > >>> > [69,'1000101'], >>> > >>> > [70,'1000110'], >>> > >>> > [71,'1000111'], >>> > >>> > [72,'1001000'], >>> > >>> > [73,'1001001'], >>> > >>> > [74,'1001010'], >>> > >>> > [75,'1001011'], >>> > >>> > [76,'1001100'], >>> > >>> > [77,'1001101'], >>> > >>> > [78,'1001110'], >>> > >>> > [79,'1001111'], >>> > >>> > [80,'1010000'], >>> > >>> > [81,'1010001'], >>> > >>> > [82,'1010010'], >>> > >>> > [83,'1010011'], >>> > >>> > [84,'1010100'], >>> > >>> > [85,'1010101'], >>> > >>> > [86,'1010110'], >>> > >>> > [87,'1010111'], >>> > >>> > [88,'1011000'], >>> > >>> > [89,'1011001'], >>> > >>> > [90,'1011010']].sort(function(x,y) { return x[1] - y[1] })) >>> > >>> > >>> > >>> > >>> > >>> "[[65,"1000001"],[66,"1000010"],[67,"1000011"],[68,"1000100"],[69,"1000101"],[70,"1000110"],[71,"1000111"],[72,"1001000"],[73,"1001001"],[74,"1001010"],[75,"1001011"],[76,"1001100"],[77,"1001101"],[78,"1001110"],[79,"1001111"],[80,"1010000"],[81,"1010001"],[82,"1010010"],[83,"1010011"],[84,"1010100"],[85,"1010101"],[86,"1010110"],[87,"1010111"],[88,"1011000"],[89,"1011001"],[90,"1011010"]]" >>> > >>> > >>> > >>> > All the values are sorted correctly. >>> > >>> > >>> > >>> > Here it is with the same ABCD example: >>> > >>> > >>> > >>> > var pako = require('pako'); >>> > >>> > var binaryString = pako.deflate('ABCD', { to: 'string' }); >>> > >>> > console.log(binaryString); >>> > >>> > var restored = pako.inflate(binaryString, { to: 'string' }); >>> > >>> > console.log(restored); >>> > >>> > >>> > >>> > It successfully deflates and inflates itself >>> > >>> > >>> > >>> > x?♣A☺☺ ? mcÿ7♣A♫☻?☺♂ >>> > >>> > ABCD >>> > >>> > >>> > >>> > >>> > >>> > Hope this helps... >>> > >>> > >>> > >>> > [1] - >>> > https://gist.github.com/joebo/a3c2932f0e5a7a0c3f07#file-deflate-c-L2613 >>> > >>> > [2] - https://rawgit.com/nodeca/pako/master/dist/pako.js >>> > >>> > >>> > >>> > On Thu, Sep 11, 2014 at 11:33 AM, bill lam <[email protected]> >>> > wrote: >>> > >>> >> This is strange since every author must had decode its own encoded >>> > >>> >> data as a smoke test. >>> > >>> >> >>> > >>> >> Did you test if huffman code or bit lengths it produced was >>> > >>> >> correct or not, ie it is a prefix coding and it satisfy the 2 >>> > >>> >> rules in rfc. >>> > >>> >> >>> > >>> >> Чт, 11 сен 2014, Joe Bogner написал(а): >>> > >>> >>> unfortunately the dynamic coding in the putty fork doesn't seem >>> to >>> > work: >>> > >>> >>> >>> > >>> >>> deflate -c deflate.c > out >>> > >>> >>> deflate -d out >>> > >>> >>> >>> > >>> >>> decoding error: incorrect data checksum >>> > >>> >>> >>> > >>> >>> >>> > >>> >>> it works fine with static tables >>> > >>> >>> >>> > >>> >>> C:\temp>echo ABCD > ABCD >>> > >>> >>> >>> > >>> >>> C:\temp>deflate -c ABCD > out >>> > >>> >>> >>> > >>> >>> C:\temp>deflate -d out >>> > >>> >>> ABCD >>> > >>> >>> >>> > >>> >>> I added some debugging code to determine that deflating deflate.c >>> > >>> >>> would be a dynamic table... Assuming it's broke, I probably >>> > wouldn't >>> > >>> >>> use it as a reference implementation after all >>> > >>> >>> >>> > >>> >>> On Thu, Sep 11, 2014 at 3:45 AM, bill lam <[email protected]> >>> > wrote: >>> > >>> >>> > the frequencies (guessing from bit lengths) should be something >>> > like 2 3 1 1 >>> > >>> >>> > (2 3 1 1) hcodes 'ABCD' >>> > >>> >>> > >>> > >>> >>> > the hard part is the inverse problem: how to get the huffman >>> > code with >>> > >>> >>> > prior knowing the bits for each symbol. Your pointer to the >>> > putty >>> > >>> >>> > fork looks like helpful. The comment is in lines 861 to 914, >>> > the code >>> > >>> >>> > itself in line 915 to 964. Do you know how to express it in J? >>> > >>> >>> > Thanks. >>> > >>> >>> > >>> > >>> >>> > On Thu, Sep 11, 2014 at 2:57 PM, Joe Bogner < >>> [email protected]> >>> > wrote: >>> > >>> >>> >> Here a few other links ... after reading through the RFC. Not >>> > sure if >>> > >>> >>> >> they help, but just sharing from my own research into >>> assisting >>> > on >>> > >>> >>> >> this topic >>> > >>> >>> >> >>> > >>> >>> >> https://github.com/evegard/pngview/blob/master/huffman.c#L54 >>> > >>> >>> >> >>> > >>> >>> >> And a fork of the putty version with dynamic huffman coding: >>> > >>> >>> >> >>> > >>> http://rc.quest.com/viewvc/putty/trunk/halibut/deflate.c?diff_format=s&revision=2&view=markup >>> > >>> >>> >> >>> > >>> >>> >> Or just generally googling some of the code from the RFC: >>> > >>> >>> >> >>> > >>> https://www.google.com/search?q=next_code%5Blen%5D%2B%2B%3B&oq=next_code%5Blen%5D%2B%2B%3B&aqs=chrome..69i57.387j0j7&sourceid=chrome&es_sm=93&ie=UTF-8#q=next_code%5Blen%5D%2B%2B%3B&start=20 >>> > >>> >>> >> >>> > >>> >>> >> >>> > >>> >>> >> Using the code from >>> > >>> >>> >> http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding, I got >>> > stuck >>> > >>> >>> >> trying to match a simple example to the binary tree in the >>> RFC: >>> > >>> >>> >> >>> > >>> >>> >> From the RFC: >>> > >>> >>> >> >>> > >>> >>> >> /\ Symbol Code >>> > >>> >>> >> 0 1 ------ ---- >>> > >>> >>> >> / \ A 00 >>> > >>> >>> >> /\ B B 1 >>> > >>> >>> >> 0 1 C 011 >>> > >>> >>> >> / \ D 010 >>> > >>> >>> >> A /\ >>> > >>> >>> >> 0 1 >>> > >>> >>> >> / \ >>> > >>> >>> >> D C >>> > >>> >>> >> >>> > >>> >>> >> >>> > >>> >>> >> >>> > >>> >>> >> (4#1) hcodes 'ABCD' >>> > >>> >>> >> ┌───┬───┬───┬───┐ >>> > >>> >>> >> │0 0│0 1│1 0│1 1│ >>> > >>> >>> >> └───┴───┴───┴───┘ >>> > >>> >>> >> >>> > >>> >>> >> Per the RFC, ideally that should match this? >>> > '00';'1';'011';'010' >>> > >>> >>> >> >>> > >>> >>> >> >>> > >>> >>> >> From there, it seems like a pretty straightforward exercise to >>> > >>> >>> >> transliterate the C code from the RFC into J code to recode >>> the >>> > >>> >>> >> example to: >>> > >>> >>> >> >>> > >>> >>> >> >>> > >>> >>> >> Symbol Code >>> > >>> >>> >> ------ ---- >>> > >>> >>> >> A 10 >>> > >>> >>> >> B 0 >>> > >>> >>> >> C 110 >>> > >>> >>> >> D 111 >>> > >>> >>> >> >>> > >>> >>> >> >>> > >>> >>> >> I would probably start with a looping construct like what's in >>> > the RFC >>> > >>> >>> >> and then figure out a more J way to do it, but first I would >>> > need to >>> > >>> >>> >> figure out how to create the binary tree in that initial >>> format. >>> > >>> >>> >> >>> > >>> >>> >> On Wed, Sep 10, 2014 at 7:41 PM, bill lam < >>> [email protected]> >>> > wrote: >>> > >>> >>> >>> Thanks Joe, >>> > >>> >>> >>> putty only use zlib static huffman for encoding so that it >>> > does not build >>> > >>> >>> >>> any huffman dictionary table. >>> > >>> >>> >>> >>> > >>> >>> >>> The zlib static huffman code does not care about individual >>> > symbol's >>> > >>> >>> >>> frequency, it just encode 0 to 286 into bits, see section >>> > 3.2.6. >>> > >>> >>> >>> On Sep 11, 2014 1:26 AM, "Joe Bogner" <[email protected]> >>> > wrote: >>> > >>> >>> >>> >>> > >>> >>> >>>> You've already likely considered this, but if it were me I >>> > would compare >>> > >>> >>> >>>> results to a working implementation. The one from putty >>> seems >>> > pretty clean >>> > >>> >>> >>>> and standalone: >>> > >>> >>> >>>> >>> > >>> https://raw.githubusercontent.com/grumpydev/PortablePuTTY/master/SSHZLIB.C >>> > >>> >>> >>>> . I was able to compile it on windows no problem and I >>> assume >>> > it'd be fine >>> > >>> >>> >>>> on linux as well. >>> > >>> >>> >>>> >>> > >>> >>> >>>> On Wed, Sep 10, 2014 at 1:00 PM, Raul Miller < >>> > [email protected]> >>> > >>> >>> >>>> wrote: >>> > >>> >>> >>>> >>> > >>> >>> >>>> > I think the use of the term "consecutive" rather than >>> > "sequential" is >>> > >>> >>> >>>> > telling. >>> > >>> >>> >>>> > >>> > >>> >>> >>>> > The described algorithm is: compute the huffman code >>> > lengths: >>> > >>> >>> >>>> > #@>F1 hcodes A1 >>> > >>> >>> >>>> > 1 3 7 7 6 6 6 6 6 6 6 2 >>> > >>> >>> >>>> > >>> > >>> >>> >>>> > Then assign ascending huffman codes first in length order >>> > and then >>> > >>> >>> >>>> > within codes of the same length. >>> > >>> >>> >>>> > >>> > >>> >>> >>>> > Taken literally, that might be something like this: >>> > >>> >>> >>>> > >>> > >>> >>> >>>> > H=: 4 :0 >>> > >>> >>> >>>> > L=.#@> x hcodes y >>> > >>> >>> >>>> > U=.~.L >>> > >>> >>> >>>> > ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L >>> > >>> >>> >>>> > ) >>> > >>> >>> >>>> > >>> > >>> >>> >>>> > ":@>F1 H A1 >>> > >>> >>> >>>> > 0 >>> > >>> >>> >>>> > 1 1 0 >>> > >>> >>> >>>> > 1 1 1 0 0 1 0 >>> > >>> >>> >>>> > 1 1 1 0 0 1 1 >>> > >>> >>> >>>> > 1 1 1 0 0 0 >>> > >>> >>> >>>> > 1 1 1 0 0 1 >>> > >>> >>> >>>> > 1 1 1 0 1 0 >>> > >>> >>> >>>> > 1 1 1 0 1 1 >>> > >>> >>> >>>> > 1 1 1 1 0 0 >>> > >>> >>> >>>> > 1 1 1 1 0 1 >>> > >>> >>> >>>> > 1 1 1 1 1 0 >>> > >>> >>> >>>> > 1 0 >>> > >>> >>> >>>> > >>> > >>> >>> >>>> > But is this correct? Is it actually safe to leave the >>> > results like >>> > >>> >>> >>>> > this - with all codes of the same length being consecutive >>> > to each >>> > >>> >>> >>>> > other? >>> > >>> >>> >>>> > >>> > >>> >>> >>>> > F (hcodes -:&:(#@>) H) A >>> > >>> >>> >>>> > 0 >>> > >>> >>> >>>> > >>> > >>> >>> >>>> > No. >>> > >>> >>> >>>> > >>> > >>> >>> >>>> > So... "consecutive" must refer only to the values used and >>> > not their >>> > >>> >>> >>>> > order within the result. >>> > >>> >>> >>>> > >>> > >>> >>> >>>> > Perhaps something like this: >>> > >>> >>> >>>> > >>> > >>> >>> >>>> > deflatecodes=:4 :0 >>> > >>> >>> >>>> > L=.#@> x hcodes y >>> > >>> >>> >>>> > U=.~.L >>> > >>> >>> >>>> > R=. ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L >>> > >>> >>> >>>> > R/:;(</. i.@#)L >>> > >>> >>> >>>> > ) >>> > >>> >>> >>>> > >>> > >>> >>> >>>> > F (hcodes -:&:(#@>) deflatecodes) A >>> > >>> >>> >>>> > 1 >>> > >>> >>> >>>> > >>> > >>> >>> >>>> > There should be a better way of doing this, but this >>> should >>> > at least >>> > >>> >>> >>>> > get you started. >>> > >>> >>> >>>> > >>> > >>> >>> >>>> > Thanks, >>> > >>> >>> >>>> > >>> > >>> >>> >>>> > -- >>> > >>> >>> >>>> > Raul >>> > >>> >>> >>>> > >>> > >>> >>> >>>> > >>> > >>> >>> >>>> > On Wed, Sep 10, 2014 at 10:45 AM, bill lam < >>> > [email protected]> wrote: >>> > >>> >>> >>>> > > For huffman coding used in zlib: >>> > >>> >>> >>>> > > https://www.ietf.org/rfc/rfc1951.txt section 3.2.2. >>> > >>> >>> >>>> > > >>> > >>> >>> >>>> > > The Huffman codes used for each alphabet in the >>> "deflate" >>> > >>> >>> >>>> > > format have two additional rules: >>> > >>> >>> >>>> > > >>> > >>> >>> >>>> > > * All codes of a given bit length have >>> lexicographically >>> > >>> >>> >>>> > > consecutive values, in the same order as the symbols >>> > >>> >>> >>>> > > they represent; >>> > >>> >>> >>>> > > >>> > >>> >>> >>>> > > * Shorter codes lexicographically precede longer >>> codes. >>> > >>> >>> >>>> > > I tried jwiki hcodes in >>> > >>> >>> >>>> > > I try Roger's essay >>> > >>> >>> >>>> > > http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding >>> > >>> >>> >>>> > > >>> > >>> >>> >>>> > > hc=: 4 : 0 >>> > >>> >>> >>>> > > if. 1=#x do. y >>> > >>> >>> >>>> > > else. ((i{x),+/j{x) hc (i{y),<j{y [ i=. (i.#x) -. j=. >>> > 2{./:x end. >>> > >>> >>> >>>> > > ) >>> > >>> >>> >>>> > > >>> > >>> >>> >>>> > > hcodes=: 4 : 0 >>> > >>> >>> >>>> > > assert. x -:&$ y NB. weights and words have >>> > same shape >>> > >>> >>> >>>> > > assert. (0<:x) *. 1=#$x NB. weights are non-negative >>> > >>> >>> >>>> > > assert. 1 >: L.y NB. words are boxed not more >>> > than once >>> > >>> >>> >>>> > > w=. ,&.> y NB. standardized words >>> > >>> >>> >>>> > > assert. w -: ~.w NB. words are unique >>> > >>> >>> >>>> > > t=. 0 {:: x hc w NB. minimal weight binary >>> tree >>> > >>> >>> >>>> > > ((< S: 0 t) i. w) { <@(1&=)@; S: 1 {:: t >>> > >>> >>> >>>> > > ) >>> > >>> >>> >>>> > > >>> > >>> >>> >>>> > > but the coding produced is malformed for zlib. eg, >>> > >>> >>> >>>> > > this is what I ran into trouble >>> > >>> >>> >>>> > > >>> > >>> >>> >>>> > > f1=: 1 256 17 1 1 9 1 >>> > >>> >>> >>>> > > f2=: 2 1 0 1 255 0 1536 >>> > >>> >>> >>>> > > F=: ,/(f1#f2) >>> > >>> >>> >>>> > > A=: i.286 >>> > >>> >>> >>>> > > >>> > >>> >>> >>>> > > F hcodes A >>> > >>> >>> >>>> > > >>> > >>> >>> >>>> > > Or a shorter example >>> > >>> >>> >>>> > > >>> > >>> >>> >>>> > > A1=: i.12 >>> > >>> >>> >>>> > > F1=: 2 1 0 0 0 0 0 0 0 0 0 1 >>> > >>> >>> >>>> > > >>> > >>> >>> >>>> > > F1 hcodes A1 >>> > >>> >>> >>>> > > >>> > >>> >>> >>>> > > Any idea? >>> > >>> >>> >>>> > > >>> > >>> >>> >>>> > > -- >>> > >>> >>> >>>> > > regards, >>> > >>> >>> >>>> > > ==================================================== >>> > >>> >>> >>>> > > GPG key 1024D/4434BAB3 2008-08-24 >>> > >>> >>> >>>> > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 >>> > >>> >>> >>>> > > gpg --keyserver subkeys.pgp.net --armor --export >>> 4434BAB3 >>> > >>> >>> >>>> > > >>> > ---------------------------------------------------------------------- >>> > >>> >>> >>>> > > For information about J forums see >>> > http://www.jsoftware.com/forums.htm >>> > >>> >>> >>>> > >>> > ---------------------------------------------------------------------- >>> > >>> >>> >>>> > For information about J forums see >>> > http://www.jsoftware.com/forums.htm >>> > >>> >>> >>>> > >>> > >>> >>> >>>> >>> > ---------------------------------------------------------------------- >>> > >>> >>> >>>> For information about J forums see >>> > http://www.jsoftware.com/forums.htm >>> > >>> >>> >>>> >>> > >>> >>> >>> >>> > ---------------------------------------------------------------------- >>> > >>> >>> >>> For information about J forums see >>> > http://www.jsoftware.com/forums.htm >>> > >>> >>> >> >>> > ---------------------------------------------------------------------- >>> > >>> >>> >> For information about J forums see >>> > http://www.jsoftware.com/forums.htm >>> > >>> >>> > >>> > ---------------------------------------------------------------------- >>> > >>> >>> > For information about J forums see >>> > http://www.jsoftware.com/forums.htm >>> > >>> >>> >>> > ---------------------------------------------------------------------- >>> > >>> >>> For information about J forums see >>> > http://www.jsoftware.com/forums.htm >>> > >>> >> >>> > >>> >> -- >>> > >>> >> regards, >>> > >>> >> ==================================================== >>> > >>> >> GPG key 1024D/4434BAB3 2008-08-24 >>> > >>> >> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 >>> > >>> >> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 >>> > >>> >> >>> > ---------------------------------------------------------------------- >>> > >>> >> For information about J forums see >>> > http://www.jsoftware.com/forums.htm >>> > >>> >>> ---------------------------------------------------------------------- >>> > >>> For information about J forums see >>> http://www.jsoftware.com/forums.htm >>> > >> >>> > >> -- >>> > >> regards, >>> > >> ==================================================== >>> > >> GPG key 1024D/4434BAB3 2008-08-24 >>> > >> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 >>> > >> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 >>> > >> ---------------------------------------------------------------------- >>> > >> For information about J forums see >>> http://www.jsoftware.com/forums.htm >>> > > ---------------------------------------------------------------------- >>> > > For information about J forums see http://www.jsoftware.com/forums.htm >>> > ---------------------------------------------------------------------- >>> > For information about J forums see http://www.jsoftware.com/forums.htm >>> ---------------------------------------------------------------------- >>> For information about J forums see http://www.jsoftware.com/forums.htm >> ---------------------------------------------------------------------- >> For information about J forums see http://www.jsoftware.com/forums.htm ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
