I'll answer you point by point; others who find this tedious can just scroll down to "conclusions" at the bottom.
On 4/27/07, Mark Waser <[EMAIL PROTECTED]> wrote:
> I am NOT suggesting a rule-based system at this level. First I figure out a good representation for the minimal Basic English grammar that fundamentally has the simplest grammatical rules embedded into its structure rather than expressed as rules (i.e. I *AM* hand-crafting the design of the initial structure -- nouns, verbs, adjectives, adverbs, prepositions, noun clauses, verb clauses, prepositional phrases, simple SV sentences, simple SVO sentences, etc. I am also setting up different types of inheritance and analogy links between words/terms/structures). But I am definitely not going to be hand-coding grammar "rules" per se. After the fact, you obviously could say that certain rules define the structure (and you could obviously design an analogous rule-based system -- with *a lot* more effort) but, at the lowest level, my version of the system is not going to be run as a rule-based system.

So you are indeed trying to hand-code certain aspects of English grammar into your KR structure. But clearly you cannot embed *full* English grammar into your KR. So what about the *rest* of the grammar knowledge? You must learn it via machine learning. Then what do you call this machine-learned stuff? I call it "grammar rules", but you may call it "declarative linguistic knowledge" or whatever. It's the same thing.

Can you explain more about "inheritance and analogy links"? It seems that you're trying to represent grammar structures using these links. It seems to me that they are just computationally equivalent to production rules.
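
To make the "computationally equivalent" remark concrete, here is a toy sketch. It assumes the simplest possible case, plain is-a links, and every name in it is made up for illustration; it is not a description of your system or mine, and analogy links may well need more than this.

```python
# Toy sketch only: hypothetical names, simple is-a links, nothing more.

# Link-based view: each term points to its parent category.
ISA_LINKS = {"dog": "noun", "run": "verb", "noun": "word", "verb": "word"}

def categories_via_links(term):
    """Follow is-a links upward and collect every category reached."""
    found = []
    while term in ISA_LINKS:
        term = ISA_LINKS[term]
        found.append(term)
    return found

# Rule-based view: the same knowledge written as productions
# "X is-a child  =>  X is-a parent".
RULES = list(ISA_LINKS.items())

def categories_via_rules(term):
    """Forward-chain over the productions until nothing new is derived."""
    derived = [term]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            if lhs in derived and rhs not in derived:
                derived.append(rhs)
                changed = True
    return derived[1:]  # drop the starting term itself
```

Both functions derive the same categories for a given term; one walks links and the other fires rules.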

> You're also missing the fact that language is both grammar and vocabulary. The system needs to be able to "understand" from the beginning (Understand, in this case, meaning being able to translate anything to "seed-only" form -- and thus, to be able, via the built in structure, to know how to transform between alternative forms and to know what aspects and data attach where).

I agree and I am well aware of this.

>> 3. You then need to add more complex grammar rules to handle "real" English, but such rules are difficult to hand-craft and thus will probably require machine learning.

> At this point, the tools I mentioned come into play to extend the structure until it can handle "real" English. This extension is clearly in the realm of machine learning but, I believe, is structured and limited enough to be feasible -- particularly if I start by pointing it at "well-behaved/well-defined" sources like dictionaries, encyclopedias, etc.

Yes, it is 100% feasible, but it requires complex machine learning. All I'm saying is that I prefer a route where we collect commonsense knowledge with Basic English *before* we go into the realm of full English.

>> 4. Only at this stage can you digest the web or newspapers.

> Actually, it can probably start *attempting* to digest such sources fairly early. All it really has to do is to be able to tell when it's pretty sure that it's correct and when it needs to try again after it's learned more (and discard the data until then). In particular though, it can always go to a dictionary when it runs across a new word (or when it seems that a known word has another, unknown definition) and it can also go to a trusted human if it's really not sure about how to parse a sentence (at which point it gives that human a list of alternatives to choose from -- which doesn't require extensive training to handle).

The vocab is only part of the problem. The problem I'm pointing out is how you handle full English grammar. If your system doesn't have *comprehensive* grammatical knowledge, it will have problems *interpreting* complex or "irregular" English sentences, to the point that the "facts" you acquire will be incorrect.
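
As I read the proposal, the control loop would look roughly like the sketch below. The `Parse` type, the confidence score and both thresholds are my own assumptions for illustration, not anything you specified.

```python
from dataclasses import dataclass

@dataclass
class Parse:
    reading: str       # one candidate interpretation of the sentence
    confidence: float  # hypothetical score in [0, 1]

def triage(candidates, accept_at=0.9, ask_at=0.5):
    """Decide what to do with a sentence, given its candidate parses.

    Returns (action, readings): accept the best reading, hand a ranked
    list of alternatives to a trusted human, or defer the sentence
    until the system has learned more.
    """
    if not candidates:
        return ("defer", [])
    ranked = sorted(candidates, key=lambda p: p.confidence, reverse=True)
    best = ranked[0]
    if best.confidence >= accept_at:
        return ("accept", [best.reading])
    if best.confidence >= ask_at:
        return ("ask_human", [p.reading for p in ranked])
    return ("defer", [])
```

Everything here hinges on the confidence score, and a trustworthy score for a complex sentence is exactly what requires the grammatical knowledge in question.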
>> I guess (3) and (4) won't happen immediately. And after (2) we can
start collecting commonsense facts via Basic English. So it seems to me that a viable "first product" could be a commonsense engine using Basic English, without going to 3 & 4.
> I disagree. Building the extension/learning tools is a fundamental part of the initial design. If you start collecting commonsense facts without a good data structure and the "understanding" described above, you end up with Cyc (the same facts encoded multiple ways, almost all facts inaccessible unless you access them in almost *exactly* the form in which they were encoded, etc., etc.). I don't find *any* value in that at all. Facts are only useful if you can access and use them.

I see your point, but you misunderstand my approach. My KR scheme is minimalistic. Grammar is represented as additional rules (whereas you embed some linguistic structure into your KR). I plan to hand-code enough grammar for Basic English first, and the system will later use machine learning to learn more complex grammar rules, until it knows full English.

You assume that my system doesn't have true "understanding" because it cannot inter-relate the same idea expressed in different sentences (i.e., paraphrasing). That's a false accusation, because my system can do this by logical reasoning.

Conclusions:

1. The only difference between your approach and mine is that you hand-code *some* linguistic knowledge directly into your KR while I leave that to grammar rules.

2. You have not answered the question of how you can easily digest the web/newspapers without adult-level grammar knowledge, which is not easy to acquire/learn.

3. The issue really is which route is easier:
   (A) build the system to the point of understanding Basic English, then start collecting commonsense facts;
   (B) build the system to understand real English, and crawl the web/newspapers.
   Don't forget that (A) can eventually grow into a full AGI; the issue here is the learning pathway. (B) is harder because it requires adult-level grammar AND a lot of commonsense knowledge. If you do not follow an INCREMENTAL LEARNING PATHWAY then your learning speed will be infeasibly slow. That's my theory.

YKY