On 11/2/07, Matt Mahoney <[EMAIL PROTECTED]> wrote:
> Well, one alternative is to deduce that aluminum is a mass noun by the low
> frequency of phrases like "an aluminum is" from a large corpus of text (or
> count Google hits).  You could also deduce that aluminum is an adjective from
> phrases like "an aluminum chair", etc.  More generally, you would cluster
> words in the high dimensional vector space of their immediate context, then
> derive rules for moving from cluster to cluster.
>
> However, the fact that this method is not used in the best language models
> suggests it may exceed the computational limits of your PC.  This might
> explain why we keep wading into the swamp.
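
The clustering idea in the quoted paragraph can be sketched in a few lines. This is only an illustrative toy (the corpus, the window size of one word, and cosine similarity are all my assumptions standing in for "a large corpus of text" and a real clustering step):

```python
from collections import Counter, defaultdict
import math

# Toy corpus standing in for a large corpus of text (assumption:
# real use would need web-scale counts, e.g. Google hits).
corpus = ("the aluminum chair was light . an iron gate stood open . "
          "the wooden chair broke . aluminum is a light metal . "
          "iron is a heavy metal . the iron chair was heavy .").split()

# Build context vectors: each word is represented by counts of its
# immediate left and right neighbors.
contexts = defaultdict(Counter)
for i, w in enumerate(corpus):
    if i > 0:
        contexts[w]["L:" + corpus[i - 1]] += 1
    if i + 1 < len(corpus):
        contexts[w]["R:" + corpus[i + 1]] += 1

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Words with similar immediate contexts (the two metals) score higher
# together than words from different clusters (a metal vs. "chair").
print(cosine(contexts["aluminum"], contexts["iron"]))
print(cosine(contexts["aluminum"], contexts["chair"]))
```

On even this tiny corpus, "aluminum" and "iron" end up closer to each other than to "chair", because they share contexts like "the _ chair" and "_ is". Deriving rules for moving between clusters would be a further step on top of these vectors.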

It is doubtful that this kind of analysis will run at conversational-language
speed on PC hardware for a while.  What do you think about the
feasibility of using this method for a research request instead?
ex:  Find interesting information about: aluminum - to which the
program responds by building a structure of information that it keeps
refining and expanding until I return to check on it several hours
later.  If it seems to be on the right track for my definition of
interesting, I could let it continue researching for days.  At the end
of several days' work, it would have a body of 'knowledge' that was
costly to compile, which makes it a local authority on the subject.
If someone else later requested information about the same topic, my
local knowledge store could be included in their preliminary findings.
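
The loop I have in mind looks roughly like this skeleton. Every name here is a hypothetical placeholder (there is no such API); the point is only the shape: a request starts a store, refinement runs unattended for as long as I allow, and the result is a shareable local store:

```python
# Hypothetical skeleton of the "research request" loop described above.
# KnowledgeStore, refine(), and research() are illustrative assumptions,
# not an existing system.
class KnowledgeStore:
    def __init__(self, topic):
        self.topic = topic
        self.facts = []

    def refine(self):
        # Placeholder: a real node would fetch new text here and fold it
        # into the growing structure (e.g. via context clustering).
        self.facts.append(f"fact #{len(self.facts) + 1} about {self.topic}")

def research(topic, rounds):
    # 'rounds' stands in for "hours" or "days" of unattended work.
    store = KnowledgeStore(topic)
    for _ in range(rounds):
        store.refine()
    return store  # a shareable local authority on the topic

store = research("aluminum", rounds=3)
print(len(store.facts))
```

Sharing would then amount to other nodes querying my store as one source among their own preliminary findings.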

Clearly a distributed network of nodes will never match the
brute-force speed of having all knowledge in one place.  But I don't
usually need to know all things at once, just a useful number of
things about a limited topic.  That might be good enough to make the
effort worthwhile.

-----
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?member_id=8660244&id_secret=60638592-961890
