Pun intended? http://www.americanbar.org/aba.html
Thanks,

--
Raul

On Thu, Sep 18, 2014 at 11:22 AM, robert therriault <[email protected]> wrote:
> The best form of self defence in a bar fight is to have left the bar 10
> minutes before the fight starts. I learned that in Tai Chi, and I think
> software patent law is a lot like a bar fight. ;-)
>
> cheers, bob
>
> On Sep 18, 2014, at 8:11 AM, Raul Miller <[email protected]> wrote:
>
>> It's unlikely that most of those patents have merit.
>>
>> First, there's prior art (the LZ77 patent, for example - which is now
>> expired). And second, patents do not cover variations which would have
>> been obvious to someone with ordinary skill in the art before the
>> filing date of the patent.
>>
>> In the context of a decompression routine written in J, anything
>> described by the LZ77 patent must be considered obvious in the context
>> of patents filed later. And, since J is a variant of APL, any approach
>> that was obvious to APL programmers back in the day would also be
>> considered obvious; anything published in prior centuries could also
>> be considered evidence of obviousness.
>>
>> Changing the width of a window from 3 to 4? Obvious. Many grade school
>> kids are capable of doing that. Using an array instead of a linked
>> list? Obvious - that's what APL programmers do. Using an APL primitive
>> rather than some elaborate system of code? Obvious.
>>
>> Anyways, if someone tries to sue us on patent grounds we have a lot of
>> options for how to respond. We could, for example, start a Kickstarter
>> project to document sufficient prior art to refute the patent.
>> Probably the first step, though, would be to read the claims of the
>> asserted patents and see what we can find as prior art for any of the
>> relevant claims.
>>
>> Generally speaking, the claims will either be so narrow that they do
>> not apply, or so broad that they do apply but also apply to prior art.
>>
>> And, generally speaking, the claims will be asserted by lawyers with
>> no background in coding and no understanding of what the claims mean.
>> They'll typically have "experts" who back them up, who are far less
>> competent - at least in the context of what's obvious to a J
>> programmer - than the typical J programmer.
>>
>> Honestly, the biggest risk is probably falling asleep while reading
>> the legal "threats".
>>
>> Thanks,
>>
>> --
>> Raul
>>
>> On Thu, Sep 18, 2014 at 6:09 AM, bill lam <[email protected]> wrote:
>>> Thanks. This is not urgent because zlib.so should usually be
>>> available.
>>>
>>> Also, as stated in the RFC, many variations of LZ77 were patented, so
>>> it suggests that implementors follow its guidelines (3 bytes at a
>>> time, hashing, etc.), which are patent-free. If the improvement in
>>> speed is significant then we may ignore its guidelines. I guess
>>> there would not be any revenue in suing Jsoftware for damages.
>>>
>>> Thu, 18 Sep 2014, Raul Miller wrote:
>>>> I can try taking a look at the code, one of these days, if you want me to.
>>>>
>>>> (Not right this second though.)
>>>>
>>>> Thanks,
>>>>
>>>> --
>>>> Raul
>>>>
>>>> On Thu, Sep 18, 2014 at 5:45 AM, bill lam <[email protected]> wrote:
>>>>> Thanks Raul,
>>>>>
>>>>> I followed your suggestion to use 2 generations of hash vectors.
>>>>> Its speed is faster now, although still far behind zlib.so.
>>>>>
>>>>> Wed, 17 Sep 2014, Raul Miller wrote:
>>>>>> I would ignore the stuff about "hashes" and "hash chains" and "singly
>>>>>> linked". Hashes are nice sometimes, but other times they're just a
>>>>>> disaster. Sometimes they're better than the alternative, but their
>>>>>> marketing is sometimes significantly better than their actual
>>>>>> behavior.
>>>>>>
>>>>>> Anyways, what's really happening here is an exact match search and
>>>>>> some kind of constraint on the size of the table.
>>>>>>
>>>>>> I think the best way to approach that in J is to search in sorted
>>>>>> lists. The problem here is the incremental update. You don't want to
>>>>>> re-sort a large list every time you add something to it.
>>>>>>
>>>>>> The way to approach that, I think, is to have several generations of
>>>>>> lists, each larger than the last, and each slower to update than the
>>>>>> last.
>>>>>>
>>>>>> So, for example, have one list which is unsorted. Use i. to look things
>>>>>> up in it, and merge it into the next list when it gets too long (maybe
>>>>>> when it has 30 items in it).
>>>>>>
>>>>>> The next list gets sorted every time stuff is added to it. Use I.
>>>>>> to look things up in it. Merge it into the next list when it gets too
>>>>>> long (maybe when it has 1000 items in it).
>>>>>>
>>>>>> Same for the next list, except that when this one gets too long, you
>>>>>> arbitrarily discard items from it.
>>>>>>
>>>>>> Now, the other thing that's happening here is that J doesn't really
>>>>>> do all that well with incremental update algorithms. You'll still be
>>>>>> slow if you do things "one group of three" at a time. But -
>>>>>> hypothetically at least - you don't have to. The ideal would be to
>>>>>> come up with an algorithm for a block of text which complies (at least
>>>>>> in terms of data structures) with the RFC and which behaves reasonably
>>>>>> close to the behavior the RFC describes.
>>>>>>
>>>>>> But having a slow implementation that works is a really important
>>>>>> step. That (with some test cases - like the one you described here)
>>>>>> gives you a baseline to test against.
>>>>>>
>>>>>> Hopefully this helps?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> --
>>>>>> Raul
>>>>>>
>>>>>> On Wed, Sep 17, 2014 at 5:23 AM, bill lam <[email protected]> wrote:
>>>>>>> The LZ77 variation used in deflate is at the bottom (copied from RFC
>>>>>>> 1951).
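[Raul's generations-of-lists suggestion above can be sketched concretely. The following is an illustrative Python version rather than J - the class name and thresholds are invented - simplified to two generations: a small unsorted list scanned linearly (like J's i.) and a sorted list searched by bisection (like J's I.), with the oldest generation arbitrarily truncated instead of ever being re-sorted incrementally.]

```python
import bisect

class GenerationalIndex:
    """Two generations: a small unsorted list scanned linearly (like J's i.),
    periodically merged into a sorted list searched by bisection (like J's I.)."""

    def __init__(self, small_max=30, big_max=1000):
        self.small = []          # newest generation: unsorted, linear scan
        self.big = []            # older generation: kept sorted
        self.small_max = small_max
        self.big_max = big_max

    def add(self, key):
        self.small.append(key)
        if len(self.small) > self.small_max:
            # merge the unsorted generation into the sorted one
            self.big = sorted(self.big + self.small)
            self.small = []
            if len(self.big) > self.big_max:
                # last generation: arbitrarily discard items to cap its size
                self.big = self.big[:self.big_max]

    def contains(self, key):
        if key in self.small:    # cheap scan of the short unsorted list
            return True
        i = bisect.bisect_left(self.big, key)
        return i < len(self.big) and self.big[i] == key
```

The point of the design is that expensive sorting happens rarely, on batches, while every lookup stays cheap: a short linear scan plus one binary search.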
>>>>>>> I tried to translate it into J using a hash function which
>>>>>>> interprets XYZ as the lower 3 bytes of a 4-byte integer. My J
>>>>>>> implementation worked, but its efficiency is rather poor. I think the
>>>>>>> inefficiency of my implementation came from the need to
>>>>>>> 1. build a vector of pre-computed hashes of size equal to the sliding
>>>>>>> window (although the hashes of all bytes had been pre-computed), and
>>>>>>> 2. find the last index of the hash of the current XYZ in that vector
>>>>>>> (since the left argument changes, special code of the form m&i: does
>>>>>>> not apply).
>>>>>>>
>>>>>>> If the data are truly random, the chance of matching would be very
>>>>>>> small, so the time spent should be roughly the sum of the above 1 & 2
>>>>>>> for every byte; however, it took a very long time even for a data size
>>>>>>> of only about 500KB. Any ideas for a better J implementation?
>>>>>>>
>>>>>>> NB. for deflate, the minimum length for a match is 3, the max length
>>>>>>> is 285, and the maximum distance is 32K.
>>>>>>>
>>>>>>> e.g. the LZ77 compression of the input 500#'abc'
>>>>>>> is
>>>>>>> aa(285,1)(213,1)b(285,1)(214,1)c(285,1)(214,1)
>>>>>>>
>>>>>>> Inside each bracket is a (length,distance) pair applied to the history
>>>>>>> buffer. Untested, but it should be something like this.
>>>>>>>
>>>>>>> (text from the RFC)
>>>>>>> The compressor uses a chained hash table to find duplicated strings,
>>>>>>> using a hash function that operates on 3-byte sequences. At any
>>>>>>> given point during compression, let XYZ be the next 3 input bytes to
>>>>>>> be examined (not necessarily all different, of course). First, the
>>>>>>> compressor examines the hash chain for XYZ. If the chain is empty,
>>>>>>> the compressor simply writes out X as a literal byte and advances one
>>>>>>> byte in the input.
>>>>>>> If the hash chain is not empty, indicating that
>>>>>>> the sequence XYZ (or, if we are unlucky, some other 3 bytes with the
>>>>>>> same hash function value) has occurred recently, the compressor
>>>>>>> compares all strings on the XYZ hash chain with the actual input data
>>>>>>> sequence starting at the current point, and selects the longest
>>>>>>> match.
>>>>>>>
>>>>>>> The compressor searches the hash chains starting with the most recent
>>>>>>> strings, to favor small distances and thus take advantage of the
>>>>>>> Huffman encoding. The hash chains are singly linked. There are no
>>>>>>> deletions from the hash chains; the algorithm simply discards matches
>>>>>>> that are too old. To avoid a worst-case situation, very long hash
>>>>>>> chains are arbitrarily truncated at a certain length, determined by a
>>>>>>> run-time parameter.
>>>>>>> ----------------------------------------------------------------------
>>>>>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>>>>
>>>>> --
>>>>> regards,
>>>>> ====================================================
>>>>> GPG key 1024D/4434BAB3 2008-08-24
>>>>> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
>>>>> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
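[For reference, the RFC 1951 matching procedure quoted in the thread can be sketched as follows. This is an illustrative Python sketch, not the thread's J code: the 3 bytes themselves serve as the hash key (so no collisions), MAX_CHAIN and WINDOW are invented parameters, and real deflate additionally caps match length at 258 and does lazy matching, both omitted here.]

```python
from collections import defaultdict

MIN_MATCH = 3
MAX_CHAIN = 32        # arbitrarily truncate very long hash chains
WINDOW = 32 * 1024    # maximum match distance (32K)

def lz77_tokens(data):
    chains = defaultdict(list)   # 3-byte prefix -> positions, oldest first
    out = []                     # literal bytes and (length, distance) pairs
    i = 0
    while i < len(data):
        best_len = best_dist = 0
        if i + MIN_MATCH <= len(data):
            # search the chain newest-first, so ties favor small distances
            for j in reversed(chains[data[i:i + MIN_MATCH]][-MAX_CHAIN:]):
                if i - j > WINDOW:
                    break        # older entries are only farther away
                length, max_len = 0, len(data) - i
                while length < max_len and data[j + length] == data[i + length]:
                    length += 1  # overlapping matches (dist < len) work too
                if length > best_len:
                    best_len, best_dist = length, i - j
        if best_len >= MIN_MATCH:
            out.append((best_len, best_dist))
            step = best_len
        else:
            out.append(data[i:i + 1])   # emit X as a literal, advance 1 byte
            step = 1
        for p in range(i, i + step):    # chain every position we pass over
            if p + MIN_MATCH <= len(data):
                chains[data[p:p + MIN_MATCH]].append(p)
        i += step
    return out
```

On highly repetitive input such as 500#'abc', this emits a few literals and then long overlapping back-references with distance smaller than length, which is the effect Bill's worked example above illustrates.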
