On 11/2/07, Matt Mahoney <[EMAIL PROTECTED]> wrote:
> Well, one alternative is to deduce that aluminum is a mass noun by the low
> frequency of phrases like "an aluminum is" from a large corpus of text (or
> count Google hits). You could also deduce that aluminum is an adjective from
> phrases like "an aluminum chair", etc. More generally, you would cluster
> words in the high dimensional vector space of their immediate context, then
> derive rules for moving from cluster to cluster.
>
> However, the fact that this method is not used in the best language models
> suggests it may exceed the computational limits of your PC. This might
> explain why we keep wading into the swamp.
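The frequency-counting idea in the quoted paragraph can be sketched in a few lines. This is a minimal illustration, assuming a toy in-memory corpus and a made-up scoring function (`mass_noun_score` is hypothetical, not anything from an existing library); real use would need web-scale counts, as Matt notes.

```python
from collections import Counter

# Toy corpus standing in for a large text collection (assumption:
# real results would require a much larger corpus or search-engine hit counts).
corpus = (
    "an aluminum chair sat by the door . "
    "the aluminum was shiny . "
    "a dog is a loyal animal . a dog barked . "
    "the dog ran . an apple is red . the apple fell ."
).split()

unigrams = Counter(corpus)
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))

def mass_noun_score(word):
    """Frequency of 'a/an WORD is' relative to the word's own frequency.
    A score near zero suggests a mass noun (you rarely say 'an aluminum is')."""
    phrase_count = trigrams[("a", word, "is")] + trigrams[("an", word, "is")]
    return phrase_count / unigrams[word]

print(mass_noun_score("aluminum"))  # 0.0 -> behaves like a mass noun here
print(mass_noun_score("dog"))       # > 0 -> behaves like a countable noun
```

The same trigram table could feed the clustering step: each word's vector is its distribution of immediate left/right contexts, and words with similar vectors land in the same grammatical cluster.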
It is doubtful this kind of examination of information can run as 'conversational language' on PC computation for a while. What do you think about the feasibility of a research request using this method?

ex: Find interesting information about: aluminum

The program would build a structure of information that it can continue refining and expanding until I return to check on it several hours later. If I think it's on the right track for my definition of interesting, I could let it continue researching for days. At the end of several days' work, it would have a body of 'knowledge' that represents a cost to compile, which makes it a local authority on this subject. Assuming someone else might request information about the same topic, my local knowledge store could be included in preliminary findings.

Clearly a distributed network of nodes is never going to be capable of the brute-force speed of knowing all things in one place. But I don't usually seek to know all things at once, just a useful number of things about a limited topic. That might be good enough to make the effort worthwhile.

-----
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?member_id=8660244&id_secret=60638592-961890