On Tue, Mar 19, 2013 at 6:26 PM, Steve Richfield
<[email protected]> wrote:
>> The test results show that converting words to ordinals is indeed an order 
>> of magnitude faster than working with strings. Of course this is without 
>> collision detection. To detect collisions, you still have to compare the 
>> original strings.
>
> NO!!!!!!
>
> Just compare the 64-bit hash, either as an integer or as a double precision 
> floating point value. Even in floating point, (integer is better here) the 
> chance of having ANY collisions, where different words have exactly the same 
> hash, is about one in a million. Even if there is a collision, it will only 
> make synonymous two unusual and unrelated words. It isn't perfect, but it IS 
> definitely good enough - the criteria I used for lots of this.

Yes, that's right, if rare errors are acceptable.
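
As a rough check on the "one in a million" figure, here is a minimal
sketch of the birthday bound. It assumes an ideal hash that spreads
words uniformly over 2^64 values, which a real hash only approximates.
The function names are mine, for illustration only.

```python
import math

def collision_probability(n_words, hash_bits=64):
    """Approximate chance that at least two of n_words share a hash,
    assuming hashes are uniform and independent (an idealization)."""
    pairs = n_words * (n_words - 1) / 2
    # 1 - exp(-pairs / 2^bits), computed stably for tiny probabilities
    return -math.expm1(-pairs / 2.0 ** hash_bits)

def words_for_probability(p, hash_bits=64):
    """Approximate vocabulary size at which the collision chance reaches p."""
    return math.sqrt(2.0 * 2.0 ** hash_bits * -math.log1p(-p))

if __name__ == "__main__":
    print(collision_probability(10 ** 6))   # a million distinct words
    print(words_for_probability(1e-6))      # vocabulary giving 1-in-a-million odds
```

Under that assumption, a million distinct words collide with probability
of roughly 3e-8, and a one-in-a-million chance corresponds to about
6 million distinct words.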

>> The poor distribution of the floating point hash could probably be fixed by 
>> multiplying by a larger number (it doesn't need to be prime) and using only 
>> the low bits.
>
> I don't understand. What is the problem multiplying by a really large number 
> in floating point?

There isn't. I just did it the way your patent application suggested
to show that the distribution is not uniform if you include the high
bits of the hash.
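
For comparison, the integer alternative can be sketched as follows.
This is only an illustration, not the hash from the patent application:
the names word_hash and bucket and the multiplier are assumptions of
mine. It multiplies by a large odd constant, wraps at 64 bits, and uses
only the low bits as the table index.

```python
MASK64 = (1 << 64) - 1
# Arbitrary large odd constant chosen for illustration; it need not be prime.
MULTIPLIER = 6364136223846793005

def word_hash(word):
    """Hypothetical 64-bit integer hash of a word (illustrative only)."""
    h = 0
    for byte in word.encode("utf-8"):
        # Multiply, add the next byte (+1 so a zero byte still changes h),
        # and keep only the low 64 bits.
        h = (h * MULTIPLIER + byte + 1) & MASK64
    return h

def bucket(word, bits=20):
    """Table index taken from the low bits of the hash only."""
    return word_hash(word) & ((1 << bits) - 1)
```

Comparing two words is then a single integer comparison,
word_hash(a) == word_hash(b), with the caveat that a rare collision
makes two different words look identical.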

But I'm not sure what you are trying to patent. The floating point
hash probably has prior art so it is not patentable by itself. If it
is one step in a longer process, then anyone could work around it by
substituting an integer hash, which is technically superior anyway. If
your claim is not specific about the type of hash function, then I
don't see what you are patenting because nothing else is specified in
enough detail to be considered a disclosure. I understand the idea is
to search the web and do unspecified processing of the text in order
to mine personal data and send targeted ads, but people are already
doing that.

One problem I see in the application is a lack of references. Your
invention has to be novel and not obvious. The reason for citing
related work (such as other patents) is to show that at least you made
a feeble attempt to see if it's been done before.

> To cover our butts (since the entire patent system changed 4 days ago, and 
> the old rules are MUCH better than the new rules) there are LOTS of claims to 
> amend. As filed, it has 5 independent claims and 30 total claims.

My understanding is that the rule changed from "first to invent" to
"first inventor to file", to bring the U.S. closer to the rest of the
world. The U.S. still differs in that you have 1 year after your own
public disclosure to file. Most other jurisdictions, Europe for
example, have no such general grace period.

--
-- Matt Mahoney, [email protected]

