On Mon, Apr 1, 2013 at 5:35 AM, Steve Richfield
<[email protected]> wrote:
> Is Summly's algorithm described somewhere?

Not really. http://summly.com/technology.html

But the general idea is to remove the parts of the text that compress
easily, so you are left with words and phrases that are rare in the
language as a whole, favoring those that appear near the beginning of
the document or that occur frequently within it.
You can also use supervised learning to train the language model to
select the right features. How well all of this works depends on the
quality of the language model. Generally it would be an ensemble of
n-gram models, with parsing playing little or no role.
http://en.wikipedia.org/wiki/Automatic_summarization
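
To make the idea concrete, here is a minimal sketch in Python. It is
not Summly's algorithm, which is not published. It assumes a single
unigram background model (a plain word-count table) in place of an
ensemble of n-gram models, and the names summarize, background, and
the scoring weights are all made up for illustration. Each sentence is
scored by how many bits per word it costs under the background model,
boosted for words that repeat in the document and for sentences near
the beginning.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(s):
    return re.findall(r"[a-z']+", s.lower())


def summarize(text, background, n=1):
    """Return the n highest-scoring sentences, in document order.

    background is a Counter of word counts from some general corpus.
    It is a hypothetical stand-in for a real language model.
    """
    sentences = split_sentences(text)
    doc_counts = Counter(words(text))
    bg_total = sum(background.values())
    vocab = len(background) + 1  # one extra slot for unseen words

    def score(idx):
        toks = words(sentences[idx])
        if not toks:
            return 0.0
        bits = 0.0
        for w in toks:
            # Add-one smoothed probability under the background model.
            p = (background[w] + 1) / (bg_total + vocab)
            # Words that compress poorly (high cost in bits) and that
            # recur in this document contribute the most.
            bits += -math.log2(p) * math.log2(1 + doc_counts[w])
        # Assumed heuristic: earlier sentences get a larger bonus.
        position_bonus = 1.0 / (1 + idx)
        return (bits / len(toks)) * (1 + position_bonus)

    ranked = sorted(range(len(sentences)), key=score, reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]


background = Counter({"the": 1000, "a": 800, "is": 700, "of": 600,
                      "and": 600, "it": 500, "was": 500, "in": 500,
                      "very": 300, "nice": 100, "day": 100})
text = ("Yahoo acquired Summly. It was a very nice day. "
        "Summly makes summaries of news.")
print(summarize(text, background, n=2))
```

With this toy background table, the filler sentence made of common
words is the one that gets dropped. A real system would need a far
better language model, which is where most of the quality comes from.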

> Note a quirk of law: It is conceivable that Summly had adopted my algorithm 
> but kept it proprietary. As such, Yahoo would have NO claim on the 
> technology, and their work would NOT count as prior art. It happens all the 
> time - people validly patent things that it turns out someone else has 
> already developed. These patents are fully enforceable.

Source code does not have much value unless you hire the developer. I
doubt that patents matter in this case, although I'm sure that Yahoo
will pursue them.

> However, my invention was NOT what to do, but how to do such things faster. 
> The combinatorial explosion from failed tests hangs over the head of all NL 
> "understanding" efforts. From what I can see, my method is the ONLY presently 
> known way of prospectively running fast enough, once the rules/tables/DB are 
> populated with all the information needed to process everyday English (or 
> other natural language).

Your bold claims would be more credible if you had an actual working
system. Populating a knowledge base with rules is not as easy as you
think. Ask Doug Lenat.

--
-- Matt Mahoney, [email protected]

