Re: [Jprogramming] LZ77

robert therriault Thu, 18 Sep 2014 08:40:19 -0700

Unintended but not unrelated. :-)  

And now I think that patent law has already sucked up enough space in this 
forum on this fine morning, I now return to my regular 'programming' 
(intended). :-)


cheers, bob

On Sep 18, 2014, at 8:31 AM, Raul Miller <[email protected]> wrote:

> Pun intended?
> 
> http://www.americanbar.org/aba.html
> 
> Thanks,
> 
> -- 
> Raul
> 
> 
> On Thu, Sep 18, 2014 at 11:22 AM, robert therriault
> <[email protected]> wrote:
>> The best form of self defence in a bar fight is to have left the bar 10 
>> minutes before the fight starts. I learned that in Tai Chi and I think 
>> software patent law is a lot like a bar fight. ;-)
>> 
>> cheers, bob
>> 
>> On Sep 18, 2014, at 8:11 AM, Raul Miller <[email protected]> wrote:
>> 
>>> It's unlikely that most of those patents have merit.
>>> 
>>> First, there's prior art (the LZ77 patent, for example - which is now
>>> expired). And second, patents do not cover variations which would have
>>> been obvious to someone with ordinary skill in the art before the
>>> filing date of the patent.
>>> 
>>> In the context of a decompression routine written in J, anything
>>> described by the LZ77 patent must be obvious in the context of patents
>>> filed later. And, since J is a variant on APL, any approaches that
>>> were obvious to APL programmers back in the day would also be
>>> considered obvious (for example), but also anything published in prior
>>> centuries could also be considered evidence for obviousness.
>>> 
>>> Changing the width of a window from 3 to 4? Obvious. Many grade school
>>> kids are capable of doing that. Using an array instead of a linked
>>> list? Obvious - that's what APL programmers do. Using an APL primitive
>>> rather than some elaborate system of code? Obvious.
>>> 
>>> Anyways, if someone tries to sue us on patent grounds we have a lot of
>>> options for how to respond. We could, for example, start a kickstarter
>>> project to document sufficient prior art to refute the patent.
>>> Probably the first step, though, would be to read the claims of the
>>> asserted patents and see what we can find for prior art for any of the
>>> relevant claims.
>>> 
>>> Generally speaking the claims will either be so narrow that they do
>>> not apply, or so broad that they do apply but also apply to prior art.
>>> 
>>> And, generally speaking, the claims will be asserted by lawyers with
>>> no background in coding and no understanding of what the claims mean.
>>> They'll typically have "experts" who back them up, who are far less
>>> competent - at least in the context of what's obvious to a J
>>> programmer - than the typical J programmer.
>>> 
>>> Honestly, the biggest risk is probably falling asleep while reading
>>> the legal "threats".
>>> 
>>> Thanks,
>>> 
>>> --
>>> Raul
>>> 
>>> 
>>> 
>>> On Thu, Sep 18, 2014 at 6:09 AM, bill lam <[email protected]> wrote:
>>>> Thanks. This is not urgent because zlib.so should usually
>>>> available.
>>>> 
>>>> Also as stated in rfc, many variation of LZ77 were patented so
>>>> it suggested implementors to follow the guidelines (3 bytes at a
>>>> time, hash etc) which is patent free.  If the improvement in
>>>> speed is signaficant then we may ignore its guideline.  I guess
>>>> there would not be any revenue in sueing Jsoftware for damage.
>>>> 
>>>> Чт, 18 сен 2014, Raul Miller написал(а):
>>>>> I can try taking a look at the code, one of these days, if you want me to.
>>>>> 
>>>>> (Not right this second though.)
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> --
>>>>> Raul
>>>>> 
>>>>> 
>>>>> On Thu, Sep 18, 2014 at 5:45 AM, bill lam <[email protected]> wrote:
>>>>>> Thanks Raul,
>>>>>> 
>>>>>> I followed you suggestion to use 2 generations of hash vector,
>>>>>> Its speed is faster now although still far behind zlib.so.
>>>>>> 
>>>>>> Ср, 17 сен 2014, Raul Miller написал(а):
>>>>>>> I would ignore the stuff about "hashes" and "hash chains" and "singly
>>>>>>> linked". Hashes are nice sometimes but other times they're just a
>>>>>>> disaster. Sometimes that's better than the alternative, but their
>>>>>>> marketing is sometimes significantly better than their actual
>>>>>>> behavior.
>>>>>>> 
>>>>>>> Anyways, what's really happening here is an exact match search and
>>>>>>> some kind of constraint on the size of the table.
>>>>>>> 
>>>>>>> I think the best way to approach that in J is to search in sorted
>>>>>>> lists. The problem here is the incremental update. You don't want to
>>>>>>> re-sort a large list every time you add something to the list.
>>>>>>> 
>>>>>>> The way to approach that, I think, is to have several generations of
>>>>>>> lists, with each larger than the last, and each slower to be update
>>>>>>> than the last.
>>>>>>> 
>>>>>>> So, for example, have on list which is unsorted. Use i. to look things
>>>>>>> up in it, and merge it into the next list when it gets too long (maybe
>>>>>>> when it has 30 items in it).
>>>>>>> 
>>>>>>> The next list gets sorted every time it gets stuff added to it. Use I.
>>>>>>> to look things up in it. Merge it into the next list when it gets too
>>>>>>> long (maybe when it has 1000 items in it).
>>>>>>> 
>>>>>>> Same for the next list, but this one, when it gets too long, you
>>>>>>> arbitrarily discard items from it.
>>>>>>> 
>>>>>>> Now the other thing that's happening, here, is that J doesn't really
>>>>>>> do all that well with incremental update algorithms. You'll still be
>>>>>>> slow if you do things "one group of three" at a time. But -
>>>>>>> hypothetically at least - you don't have to. The ideal would be to
>>>>>>> come up with an algorithm for a block of text which complies (at least
>>>>>>> in terms of data structures) with the rfc and which behaves reasonably
>>>>>>> close to the behavior of the rfc described behavior.
>>>>>>> 
>>>>>>> But having a slow implementation that works is a really important
>>>>>>> step. That (with some test cases - like the one you described here)
>>>>>>> gives you a baseline to test against.
>>>>>>> 
>>>>>>> Hopefully this helps?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> --
>>>>>>> Raul
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Sep 17, 2014 at 5:23 AM, bill lam <[email protected]> wrote:
>>>>>>>> The LZ77 variation used in deflate is at the bottom (copied from rfc
>>>>>>>> 1951).  I tried to translate it into J using a hash function which
>>>>>>>> interprets XYZ as the lower 3 bytes of a 4 byte integer.  My J
>>>>>>>> implementation worked but efficient is rather poor.  I think the
>>>>>>>> inefficiency of my implementation came from the need to
>>>>>>>> 1. build a vector of pre-computed hash of size equal to sliding
>>>>>>>> window. (although the hash of all bytes had been pre-computed)
>>>>>>>> 2. find the last index of the hash of current XYZ in the vector.
>>>>>>>> (since the left argument changes, special code of the form m&i: does 
>>>>>>>> not apply)
>>>>>>>> 
>>>>>>>> If data are truly random, the chance of matching would be very small,
>>>>>>>> thus the time spent should roughly the sum of the above 1 & 2 for
>>>>>>>> every byte, however it took a very long time even for data size of
>>>>>>>> only about 500KB.  Any idea for a better J implementation.
>>>>>>>> 
>>>>>>>> NB. for deflate, minimum length for match is 3, max length is 285,
>>>>>>>> maximum distance is 32K.
>>>>>>>> 
>>>>>>>> eg. LZ77 compression of the input 500#'abc'
>>>>>>>> is
>>>>>>>> aa(285,1)(213,1)b(285,1)(214,1)c(285,1)(214,1)
>>>>>>>> 
>>>>>>>> inside each bracket is (length,distance) pair applied to the  history
>>>>>>>> buffer. Untested but should be something like this.
>>>>>>>> 
>>>>>>>> (text from rfc)
>>>>>>>> The compressor uses a chained hash table to find duplicated strings,
>>>>>>>>  using a hash function that operates on 3-byte sequences.  At any
>>>>>>>>  given point during compression, let XYZ be the next 3 input bytes to
>>>>>>>>  be examined (not necessarily all different, of course).  First, the
>>>>>>>>  compressor examines the hash chain for XYZ.  If the chain is empty,
>>>>>>>>  the compressor simply writes out X as a literal byte and advances one
>>>>>>>>  byte in the input.  If the hash chain is not empty, indicating that
>>>>>>>>  the sequence XYZ (or, if we are unlucky, some other 3 bytes with the
>>>>>>>>  same hash function value) has occurred recently, the compressor
>>>>>>>>  compares all strings on the XYZ hash chain with the actual input data
>>>>>>>>  sequence starting at the current point, and selects the longest
>>>>>>>>  match.
>>>>>>>> 
>>>>>>>>  The compressor searches the hash chains starting with the most recent
>>>>>>>>  strings, to favor small distances and thus take advantage of the
>>>>>>>>  Huffman encoding.  The hash chains are singly linked. There are no
>>>>>>>>  deletions from the hash chains; the algorithm simply discards matches
>>>>>>>>  that are too old.  To avoid a worst-case situation, very long hash
>>>>>>>>  chains are arbitrarily truncated at a certain length, determined by a
>>>>>>>>  run-time parameter.
>>>>>>>> ----------------------------------------------------------------------
>>>>>>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>>>>>> ----------------------------------------------------------------------
>>>>>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>>>>> 
>>>>>> --
>>>>>> regards,
>>>>>> ====================================================
>>>>>> GPG key 1024D/4434BAB3 2008-08-24
>>>>>> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
>>>>>> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
>>>>>> ----------------------------------------------------------------------
>>>>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>>>> ----------------------------------------------------------------------
>>>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>>> 
>>>> --
>>>> regards,
>>>> ====================================================
>>>> GPG key 1024D/4434BAB3 2008-08-24
>>>> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
>>>> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
>>>> ----------------------------------------------------------------------
>>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm
>> 
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] LZ77

Reply via email to