AFAIK whether zlib/deflate itself is patent-free is still an unknown, because it has not yet trial tested.
patent for LZW expired but other variations of LZ77 might not. Чт, 18 сен 2014, robert therriault написал(а): > Hi Bill > > Nice joke in your last two sentences. > > You wouldn't want to do this on the chance that J ends up with revenue and > could become a target (especially if your joke were produced as evidence). > Legal action ends up being very expensive to defend in time as well as money. > Also, things tend to work better if we play nice and it sounds like they have > been nice enough to provide a way to avoid the patent issue. > > Just my two bits, but your joke set off my program manager alarm bells. :-) > > cheers, bob > > On Sep 18, 2014, at 3:09 AM, bill lam <[email protected]> wrote: > > > Thanks. This is not urgent because zlib.so should usually > > available. > > > > Also as stated in rfc, many variation of LZ77 were patented so > > it suggested implementors to follow the guidelines (3 bytes at a > > time, hash etc) which is patent free. If the improvement in > > speed is signaficant then we may ignore its guideline. I guess > > there would not be any revenue in sueing Jsoftware for damage. > > > > Чт, 18 сен 2014, Raul Miller написал(а): > >> I can try taking a look at the code, one of these days, if you want me to. > >> > >> (Not right this second though.) > >> > >> Thanks, > >> > >> -- > >> Raul > >> > >> > >> On Thu, Sep 18, 2014 at 5:45 AM, bill lam <[email protected]> wrote: > >>> Thanks Raul, > >>> > >>> I followed you suggestion to use 2 generations of hash vector, > >>> Its speed is faster now although still far behind zlib.so. > >>> > >>> Ср, 17 сен 2014, Raul Miller написал(а): > >>>> I would ignore the stuff about "hashes" and "hash chains" and "singly > >>>> linked". Hashes are nice sometimes but other times they're just a > >>>> disaster. Sometimes that's better than the alternative, but their > >>>> marketing is sometimes significantly better than their actual > >>>> behavior. > >>>> > >>>> Anyways, what's really happening here is an exact match search and > >>>> some kind of constraint on the size of the table. > >>>> > >>>> I think the best way to approach that in J is to search in sorted > >>>> lists. The problem here is the incremental update. You don't want to > >>>> re-sort a large list every time you add something to the list. > >>>> > >>>> The way to approach that, I think, is to have several generations of > >>>> lists, with each larger than the last, and each slower to be update > >>>> than the last. > >>>> > >>>> So, for example, have on list which is unsorted. Use i. to look things > >>>> up in it, and merge it into the next list when it gets too long (maybe > >>>> when it has 30 items in it). > >>>> > >>>> The next list gets sorted every time it gets stuff added to it. Use I. > >>>> to look things up in it. Merge it into the next list when it gets too > >>>> long (maybe when it has 1000 items in it). > >>>> > >>>> Same for the next list, but this one, when it gets too long, you > >>>> arbitrarily discard items from it. > >>>> > >>>> Now the other thing that's happening, here, is that J doesn't really > >>>> do all that well with incremental update algorithms. You'll still be > >>>> slow if you do things "one group of three" at a time. But - > >>>> hypothetically at least - you don't have to. The ideal would be to > >>>> come up with an algorithm for a block of text which complies (at least > >>>> in terms of data structures) with the rfc and which behaves reasonably > >>>> close to the behavior of the rfc described behavior. > >>>> > >>>> But having a slow implementation that works is a really important > >>>> step. That (with some test cases - like the one you described here) > >>>> gives you a baseline to test against. > >>>> > >>>> Hopefully this helps? > >>>> > >>>> Thanks, > >>>> > >>>> -- > >>>> Raul > >>>> > >>>> > >>>> > >>>> > >>>> On Wed, Sep 17, 2014 at 5:23 AM, bill lam <[email protected]> wrote: > >>>>> The LZ77 variation used in deflate is at the bottom (copied from rfc > >>>>> 1951). I tried to translate it into J using a hash function which > >>>>> interprets XYZ as the lower 3 bytes of a 4 byte integer. My J > >>>>> implementation worked but efficient is rather poor. I think the > >>>>> inefficiency of my implementation came from the need to > >>>>> 1. build a vector of pre-computed hash of size equal to sliding > >>>>> window. (although the hash of all bytes had been pre-computed) > >>>>> 2. find the last index of the hash of current XYZ in the vector. > >>>>> (since the left argument changes, special code of the form m&i: does > >>>>> not apply) > >>>>> > >>>>> If data are truly random, the chance of matching would be very small, > >>>>> thus the time spent should roughly the sum of the above 1 & 2 for > >>>>> every byte, however it took a very long time even for data size of > >>>>> only about 500KB. Any idea for a better J implementation. > >>>>> > >>>>> NB. for deflate, minimum length for match is 3, max length is 285, > >>>>> maximum distance is 32K. > >>>>> > >>>>> eg. LZ77 compression of the input 500#'abc' > >>>>> is > >>>>> aa(285,1)(213,1)b(285,1)(214,1)c(285,1)(214,1) > >>>>> > >>>>> inside each bracket is (length,distance) pair applied to the history > >>>>> buffer. Untested but should be something like this. > >>>>> > >>>>> (text from rfc) > >>>>> The compressor uses a chained hash table to find duplicated strings, > >>>>> using a hash function that operates on 3-byte sequences. At any > >>>>> given point during compression, let XYZ be the next 3 input bytes to > >>>>> be examined (not necessarily all different, of course). First, the > >>>>> compressor examines the hash chain for XYZ. If the chain is empty, > >>>>> the compressor simply writes out X as a literal byte and advances one > >>>>> byte in the input. If the hash chain is not empty, indicating that > >>>>> the sequence XYZ (or, if we are unlucky, some other 3 bytes with the > >>>>> same hash function value) has occurred recently, the compressor > >>>>> compares all strings on the XYZ hash chain with the actual input data > >>>>> sequence starting at the current point, and selects the longest > >>>>> match. > >>>>> > >>>>> The compressor searches the hash chains starting with the most recent > >>>>> strings, to favor small distances and thus take advantage of the > >>>>> Huffman encoding. The hash chains are singly linked. There are no > >>>>> deletions from the hash chains; the algorithm simply discards matches > >>>>> that are too old. To avoid a worst-case situation, very long hash > >>>>> chains are arbitrarily truncated at a certain length, determined by a > >>>>> run-time parameter. > >>>>> ---------------------------------------------------------------------- > >>>>> For information about J forums see http://www.jsoftware.com/forums.htm > >>>> ---------------------------------------------------------------------- > >>>> For information about J forums see http://www.jsoftware.com/forums.htm > >>> > >>> -- > >>> regards, > >>> ==================================================== > >>> GPG key 1024D/4434BAB3 2008-08-24 > >>> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 > >>> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 > >>> ---------------------------------------------------------------------- > >>> For information about J forums see http://www.jsoftware.com/forums.htm > >> ---------------------------------------------------------------------- > >> For information about J forums see http://www.jsoftware.com/forums.htm > > > > -- > > regards, > > ==================================================== > > GPG key 1024D/4434BAB3 2008-08-24 > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 > > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 > > ---------------------------------------------------------------------- > > For information about J forums see http://www.jsoftware.com/forums.htm > > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm -- regards, ==================================================== GPG key 1024D/4434BAB3 2008-08-24 gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
