Re: [Jprogramming] LZ77

robert therriault Thu, 18 Sep 2014 08:30:57 -0700

The best form of self defence in a bar fight is to have left the bar 10 minutes 
before the fight starts. I learned that in Tai Chi and I think software patent 
law is a lot like a bar fight. ;-)


cheers, bob

On Sep 18, 2014, at 8:11 AM, Raul Miller <[email protected]> wrote:

> It's unlikely that most of those patents have merit.
> 
> First, there's prior art (the LZ77 patent, for example - which is now
> expired). And second, patents do not cover variations which would have
> been obvious to someone with ordinary skill in the art before the
> filing date of the patent.
> 
> In the context of a decompression routine written in J, anything
> described by the LZ77 patent must be obvious in the context of patents
> filed later. And, since J is a variant on APL, any approaches that
> were obvious to APL programmers back in the day would also be
> considered obvious (for example), but also anything published in prior
> centuries could also be considered evidence for obviousness.
> 
> Changing the width of a window from 3 to 4? Obvious. Many grade school
> kids are capable of doing that. Using an array instead of a linked
> list? Obvious - that's what APL programmers do. Using an APL primitive
> rather than some elaborate system of code? Obvious.
> 
> Anyways, if someone tries to sue us on patent grounds we have a lot of
> options for how to respond. We could, for example, start a kickstarter
> project to document sufficient prior art to refute the patent.
> Probably the first step, though, would be to read the claims of the
> asserted patents and see what we can find for prior art for any of the
> relevant claims.
> 
> Generally speaking the claims will either be so narrow that they do
> not apply, or so broad that they do apply but also apply to prior art.
> 
> And, generally speaking, the claims will be asserted by lawyers with
> no background in coding and no understanding of what the claims mean.
> They'll typically have "experts" who back them up, who are far less
> competent - at least in the context of what's obvious to a J
> programmer - than the typical J programmer.
> 
> Honestly, the biggest risk is probably falling asleep while reading
> the legal "threats".
> 
> Thanks,
> 
> -- 
> Raul
> 
> 
> 
> On Thu, Sep 18, 2014 at 6:09 AM, bill lam <[email protected]> wrote:
>> Thanks. This is not urgent because zlib.so should usually
>> available.
>> 
>> Also as stated in rfc, many variation of LZ77 were patented so
>> it suggested implementors to follow the guidelines (3 bytes at a
>> time, hash etc) which is patent free.  If the improvement in
>> speed is signaficant then we may ignore its guideline.  I guess
>> there would not be any revenue in sueing Jsoftware for damage.
>> 
>> Чт, 18 сен 2014, Raul Miller написал(а):
>>> I can try taking a look at the code, one of these days, if you want me to.
>>> 
>>> (Not right this second though.)
>>> 
>>> Thanks,
>>> 
>>> --
>>> Raul
>>> 
>>> 
>>> On Thu, Sep 18, 2014 at 5:45 AM, bill lam <[email protected]> wrote:
>>>> Thanks Raul,
>>>> 
>>>> I followed you suggestion to use 2 generations of hash vector,
>>>> Its speed is faster now although still far behind zlib.so.
>>>> 
>>>> Ср, 17 сен 2014, Raul Miller написал(а):
>>>>> I would ignore the stuff about "hashes" and "hash chains" and "singly
>>>>> linked". Hashes are nice sometimes but other times they're just a
>>>>> disaster. Sometimes that's better than the alternative, but their
>>>>> marketing is sometimes significantly better than their actual
>>>>> behavior.
>>>>> 
>>>>> Anyways, what's really happening here is an exact match search and
>>>>> some kind of constraint on the size of the table.
>>>>> 
>>>>> I think the best way to approach that in J is to search in sorted
>>>>> lists. The problem here is the incremental update. You don't want to
>>>>> re-sort a large list every time you add something to the list.
>>>>> 
>>>>> The way to approach that, I think, is to have several generations of
>>>>> lists, with each larger than the last, and each slower to be update
>>>>> than the last.
>>>>> 
>>>>> So, for example, have on list which is unsorted. Use i. to look things
>>>>> up in it, and merge it into the next list when it gets too long (maybe
>>>>> when it has 30 items in it).
>>>>> 
>>>>> The next list gets sorted every time it gets stuff added to it. Use I.
>>>>> to look things up in it. Merge it into the next list when it gets too
>>>>> long (maybe when it has 1000 items in it).
>>>>> 
>>>>> Same for the next list, but this one, when it gets too long, you
>>>>> arbitrarily discard items from it.
>>>>> 
>>>>> Now the other thing that's happening, here, is that J doesn't really
>>>>> do all that well with incremental update algorithms. You'll still be
>>>>> slow if you do things "one group of three" at a time. But -
>>>>> hypothetically at least - you don't have to. The ideal would be to
>>>>> come up with an algorithm for a block of text which complies (at least
>>>>> in terms of data structures) with the rfc and which behaves reasonably
>>>>> close to the behavior of the rfc described behavior.
>>>>> 
>>>>> But having a slow implementation that works is a really important
>>>>> step. That (with some test cases - like the one you described here)
>>>>> gives you a baseline to test against.
>>>>> 
>>>>> Hopefully this helps?
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> --
>>>>> Raul
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, Sep 17, 2014 at 5:23 AM, bill lam <[email protected]> wrote:
>>>>>> The LZ77 variation used in deflate is at the bottom (copied from rfc
>>>>>> 1951).  I tried to translate it into J using a hash function which
>>>>>> interprets XYZ as the lower 3 bytes of a 4 byte integer.  My J
>>>>>> implementation worked but efficient is rather poor.  I think the
>>>>>> inefficiency of my implementation came from the need to
>>>>>> 1. build a vector of pre-computed hash of size equal to sliding
>>>>>> window. (although the hash of all bytes had been pre-computed)
>>>>>> 2. find the last index of the hash of current XYZ in the vector.
>>>>>> (since the left argument changes, special code of the form m&i: does not 
>>>>>> apply)
>>>>>> 
>>>>>> If data are truly random, the chance of matching would be very small,
>>>>>> thus the time spent should roughly the sum of the above 1 & 2 for
>>>>>> every byte, however it took a very long time even for data size of
>>>>>> only about 500KB.  Any idea for a better J implementation.
>>>>>> 
>>>>>> NB. for deflate, minimum length for match is 3, max length is 285,
>>>>>> maximum distance is 32K.
>>>>>> 
>>>>>> eg. LZ77 compression of the input 500#'abc'
>>>>>> is
>>>>>> aa(285,1)(213,1)b(285,1)(214,1)c(285,1)(214,1)
>>>>>> 
>>>>>> inside each bracket is (length,distance) pair applied to the  history
>>>>>> buffer. Untested but should be something like this.
>>>>>> 
>>>>>> (text from rfc)
>>>>>> The compressor uses a chained hash table to find duplicated strings,
>>>>>>   using a hash function that operates on 3-byte sequences.  At any
>>>>>>   given point during compression, let XYZ be the next 3 input bytes to
>>>>>>   be examined (not necessarily all different, of course).  First, the
>>>>>>   compressor examines the hash chain for XYZ.  If the chain is empty,
>>>>>>   the compressor simply writes out X as a literal byte and advances one
>>>>>>   byte in the input.  If the hash chain is not empty, indicating that
>>>>>>   the sequence XYZ (or, if we are unlucky, some other 3 bytes with the
>>>>>>   same hash function value) has occurred recently, the compressor
>>>>>>   compares all strings on the XYZ hash chain with the actual input data
>>>>>>   sequence starting at the current point, and selects the longest
>>>>>>   match.
>>>>>> 
>>>>>>   The compressor searches the hash chains starting with the most recent
>>>>>>   strings, to favor small distances and thus take advantage of the
>>>>>>   Huffman encoding.  The hash chains are singly linked. There are no
>>>>>>   deletions from the hash chains; the algorithm simply discards matches
>>>>>>   that are too old.  To avoid a worst-case situation, very long hash
>>>>>>   chains are arbitrarily truncated at a certain length, determined by a
>>>>>>   run-time parameter.
>>>>>> ----------------------------------------------------------------------
>>>>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>>>> ----------------------------------------------------------------------
>>>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>>> 
>>>> --
>>>> regards,
>>>> ====================================================
>>>> GPG key 1024D/4434BAB3 2008-08-24
>>>> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
>>>> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
>>>> ----------------------------------------------------------------------
>>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm
>> 
>> --
>> regards,
>> ====================================================
>> GPG key 1024D/4434BAB3 2008-08-24
>> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
>> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] LZ77

Reply via email to