Re: [Jprogramming] LZ77

Raul Miller Thu, 18 Sep 2014 08:13:03 -0700

It's unlikely that most of those patents have merit.

First, there's prior art (the LZ77 patent, for example - which is now
expired). And second, patents do not cover variations which would have
been obvious to someone with ordinary skill in the art before the
filing date of the patent.


In the context of a decompression routine written in J, anything
described by the LZ77 patent must be obvious in the context of patents
filed later. And, since J is a variant on APL, any approaches that
were obvious to APL programmers back in the day would also be
considered obvious (for example), but also anything published in prior
centuries could also be considered evidence for obviousness.

Changing the width of a window from 3 to 4? Obvious. Many grade school
kids are capable of doing that. Using an array instead of a linked
list? Obvious - that's what APL programmers do. Using an APL primitive
rather than some elaborate system of code? Obvious.

Anyways, if someone tries to sue us on patent grounds we have a lot of
options for how to respond. We could, for example, start a kickstarter
project to document sufficient prior art to refute the patent.
Probably the first step, though, would be to read the claims of the
asserted patents and see what we can find for prior art for any of the
relevant claims.

Generally speaking the claims will either be so narrow that they do
not apply, or so broad that they do apply but also apply to prior art.

And, generally speaking, the claims will be asserted by lawyers with
no background in coding and no understanding of what the claims mean.
They'll typically have "experts" who back them up, who are far less
competent - at least in the context of what's obvious to a J
programmer - than the typical J programmer.

Honestly, the biggest risk is probably falling asleep while reading
the legal "threats".

Thanks,

-- 
Raul



On Thu, Sep 18, 2014 at 6:09 AM, bill lam <[email protected]> wrote:
> Thanks. This is not urgent because zlib.so should usually
> available.
>
> Also as stated in rfc, many variation of LZ77 were patented so
> it suggested implementors to follow the guidelines (3 bytes at a
> time, hash etc) which is patent free.  If the improvement in
> speed is signaficant then we may ignore its guideline.  I guess
> there would not be any revenue in sueing Jsoftware for damage.
>
> Чт, 18 сен 2014, Raul Miller написал(а):
>> I can try taking a look at the code, one of these days, if you want me to.
>>
>> (Not right this second though.)
>>
>> Thanks,
>>
>> --
>> Raul
>>
>>
>> On Thu, Sep 18, 2014 at 5:45 AM, bill lam <[email protected]> wrote:
>> > Thanks Raul,
>> >
>> > I followed you suggestion to use 2 generations of hash vector,
>> > Its speed is faster now although still far behind zlib.so.
>> >
>> > Ср, 17 сен 2014, Raul Miller написал(а):
>> >> I would ignore the stuff about "hashes" and "hash chains" and "singly
>> >> linked". Hashes are nice sometimes but other times they're just a
>> >> disaster. Sometimes that's better than the alternative, but their
>> >> marketing is sometimes significantly better than their actual
>> >> behavior.
>> >>
>> >> Anyways, what's really happening here is an exact match search and
>> >> some kind of constraint on the size of the table.
>> >>
>> >> I think the best way to approach that in J is to search in sorted
>> >> lists. The problem here is the incremental update. You don't want to
>> >> re-sort a large list every time you add something to the list.
>> >>
>> >> The way to approach that, I think, is to have several generations of
>> >> lists, with each larger than the last, and each slower to be update
>> >> than the last.
>> >>
>> >> So, for example, have on list which is unsorted. Use i. to look things
>> >> up in it, and merge it into the next list when it gets too long (maybe
>> >> when it has 30 items in it).
>> >>
>> >> The next list gets sorted every time it gets stuff added to it. Use I.
>> >> to look things up in it. Merge it into the next list when it gets too
>> >> long (maybe when it has 1000 items in it).
>> >>
>> >> Same for the next list, but this one, when it gets too long, you
>> >> arbitrarily discard items from it.
>> >>
>> >> Now the other thing that's happening, here, is that J doesn't really
>> >> do all that well with incremental update algorithms. You'll still be
>> >> slow if you do things "one group of three" at a time. But -
>> >> hypothetically at least - you don't have to. The ideal would be to
>> >> come up with an algorithm for a block of text which complies (at least
>> >> in terms of data structures) with the rfc and which behaves reasonably
>> >> close to the behavior of the rfc described behavior.
>> >>
>> >> But having a slow implementation that works is a really important
>> >> step. That (with some test cases - like the one you described here)
>> >> gives you a baseline to test against.
>> >>
>> >> Hopefully this helps?
>> >>
>> >> Thanks,
>> >>
>> >> --
>> >> Raul
>> >>
>> >>
>> >>
>> >>
>> >> On Wed, Sep 17, 2014 at 5:23 AM, bill lam <[email protected]> wrote:
>> >> > The LZ77 variation used in deflate is at the bottom (copied from rfc
>> >> > 1951).  I tried to translate it into J using a hash function which
>> >> > interprets XYZ as the lower 3 bytes of a 4 byte integer.  My J
>> >> > implementation worked but efficient is rather poor.  I think the
>> >> > inefficiency of my implementation came from the need to
>> >> > 1. build a vector of pre-computed hash of size equal to sliding
>> >> > window. (although the hash of all bytes had been pre-computed)
>> >> > 2. find the last index of the hash of current XYZ in the vector.
>> >> > (since the left argument changes, special code of the form m&i: does 
>> >> > not apply)
>> >> >
>> >> > If data are truly random, the chance of matching would be very small,
>> >> > thus the time spent should roughly the sum of the above 1 & 2 for
>> >> > every byte, however it took a very long time even for data size of
>> >> > only about 500KB.  Any idea for a better J implementation.
>> >> >
>> >> > NB. for deflate, minimum length for match is 3, max length is 285,
>> >> > maximum distance is 32K.
>> >> >
>> >> > eg. LZ77 compression of the input 500#'abc'
>> >> > is
>> >> > aa(285,1)(213,1)b(285,1)(214,1)c(285,1)(214,1)
>> >> >
>> >> > inside each bracket is (length,distance) pair applied to the  history
>> >> > buffer. Untested but should be something like this.
>> >> >
>> >> > (text from rfc)
>> >> > The compressor uses a chained hash table to find duplicated strings,
>> >> >    using a hash function that operates on 3-byte sequences.  At any
>> >> >    given point during compression, let XYZ be the next 3 input bytes to
>> >> >    be examined (not necessarily all different, of course).  First, the
>> >> >    compressor examines the hash chain for XYZ.  If the chain is empty,
>> >> >    the compressor simply writes out X as a literal byte and advances one
>> >> >    byte in the input.  If the hash chain is not empty, indicating that
>> >> >    the sequence XYZ (or, if we are unlucky, some other 3 bytes with the
>> >> >    same hash function value) has occurred recently, the compressor
>> >> >    compares all strings on the XYZ hash chain with the actual input data
>> >> >    sequence starting at the current point, and selects the longest
>> >> >    match.
>> >> >
>> >> >    The compressor searches the hash chains starting with the most recent
>> >> >    strings, to favor small distances and thus take advantage of the
>> >> >    Huffman encoding.  The hash chains are singly linked. There are no
>> >> >    deletions from the hash chains; the algorithm simply discards matches
>> >> >    that are too old.  To avoid a worst-case situation, very long hash
>> >> >    chains are arbitrarily truncated at a certain length, determined by a
>> >> >    run-time parameter.
>> >> > ----------------------------------------------------------------------
>> >> > For information about J forums see http://www.jsoftware.com/forums.htm
>> >> ----------------------------------------------------------------------
>> >> For information about J forums see http://www.jsoftware.com/forums.htm
>> >
>> > --
>> > regards,
>> > ====================================================
>> > GPG key 1024D/4434BAB3 2008-08-24
>> > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
>> > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
>> > ----------------------------------------------------------------------
>> > For information about J forums see http://www.jsoftware.com/forums.htm
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>
> --
> regards,
> ====================================================
> GPG key 1024D/4434BAB3 2008-08-24
> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] LZ77

Reply via email to