Re: [agi] Language modeling
----- Original Message -----
From: Richard Loosemore [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Tuesday, October 24, 2006 12:37:16 PM
Subject: Re: [agi] Language modeling

Matt Mahoney wrote:
>> Converting natural language to a formal representation requires
>> language modeling at the highest level. The levels from lowest to
>> highest are: phonemes, word segmentation rules, semantics, simple
>> sentences, compound sentences. Regardless of whether your child
>> learned to read at age 3 or not at all, children always learn
>> language in this order.
>
> And the evidence for this would be what?

Um, any textbook on psycholinguistics or developmental psychology; also the paper by Jusczyk I cited earlier. Ben pointed me to a book by Tomasello which I haven't read, but here is a good summary of his work on language acquisition in children:
http://email.eva.mpg.de/~tomas/pdf/Mussen_chap_proofs.pdf

I realize that the stages of language learning overlap, but they do not all start at the same time. It is a simple fact that children learn words with semantic content like "ball" or "milk" before they learn function words like "the" or "of", in spite of the higher frequency of the latter. Likewise, successful language models used for information retrieval ignore function words and word order. Furthermore, children learn word segmentation rules before they learn words, again consistent with statistical language models. (The fact that children can learn sign language at 6 months is not inconsistent with these models; sign language does not have the word segmentation problem.)

We can learn from these observations. One conclusion that I draw is that you can't build an AGI and tack on language modeling later. You have to integrate language modeling and train it in parallel with nonverbal skills such as vision and motor control, similar to training a child. We don't know today whether this will turn out to be true.

Another important question is: how much will this cost?
How much CPU, memory, and training data do you need? Again we can use cognitive models to help answer these questions. According to Tomasello, children are exposed to about 5000 to 7000 utterances per day, or about 20,000 words. This is equivalent to about 100 MB of text in 3 years.

Children learn to use simple sentences of the form subject-verb-object and recognize word order in these sentences at about 22-24 months. For example, they respond correctly to "make the bunny push the horse". However, such models are word specific. At about age 3 1/2, children are able to generalize novel words used in context as a verb to other syntactic constructs, e.g. to construct transitive sentences given examples where the verb is used only intransitively. This is about the state of the art with statistical models trained on hundreds of megabytes of text. Such experiments suggest that adult-level modeling, which will be needed to interface with structured knowledge bases, will require about a gigabyte of training data.

-- Matt Mahoney, [EMAIL PROTECTED]

-----
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/[EMAIL PROTECTED]
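The 100 MB figure follows from Tomasello's daily word count by simple arithmetic. A quick sanity check (the ~5 bytes per word, counting the trailing space, is an assumed average, not a figure from the thread):

```python
# Rough check of the ~100 MB / 3 years estimate above.
# Assumption: an average English word plus its space is about 5 bytes.
words_per_day = 20_000
bytes_per_word = 5
days = 3 * 365

total_mb = words_per_day * bytes_per_word * days / 1e6
print(total_mb)  # ~110 MB, the same order as the 100 MB figure
```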
Re: [agi] Language modeling
Hi.

> The state of the art in language modeling is at the level of simple
> sentences, modeling syntax using n-grams (usually trigrams) or hidden
> Markov models ...

Just a remark: Google recently made their up-to-5-grams available through the LDC:
http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html

- lk
Re: [agi] Language modeling
On 10/22/06, Matt Mahoney [EMAIL PROTECTED] wrote:

> Also to Novamente, if I understand correctly. Terms are linked by a
> probability and confidence. This seems to me to be an optimization of
> a neural network or connectionist model, which is restricted to one
> number per link, representing probability.

I'm afraid the difference between these two types of system is too large to be compared in this way. In general, the weight of a link in a NN is not a probability.

> To model confidence you would have to make redundant copies of the
> input and output units and their connections. This would be
> inefficient, of course.

I guess we use the word "confidence" differently. For what I mean, see
http://pages.google.com/edit/NARS.Wang/wang.confidence.pdf

> One aspect of NARS and many other structured or semi-structured
> knowledge representations that concerns me is the direct
> representation of concepts such as is-a, equivalence, logic (if-then,
> and, or, not), quantifiers (all, some), time (before and after), etc.
> These things seem fundamental to knowledge but are very hard to
> represent in a neural network, so it seems expedient to add them
> directly. My concern is that the direct encoding of such knowledge
> greatly complicates attempts to use natural language, which is still
> an unsolved problem. Language is the only aspect of intelligence that
> separates humans from other animals. Without language, you do not
> have AGI (IMHO).

I agree that the distinction between innate knowledge and acquired knowledge is a major design decision. However, I believe it is necessary to make the notions you mentioned innate, though in different forms than how they are usually handled in symbolic AI.

> My concern is that structured knowledge is inconsistent with the
> development of language in children.

First, I'm not so sure about the above conclusion.
For example, to me, is-a (which is called "inheritance" in NARS) is nothing but the relation between special patterns and general patterns, which needs to be there for many types of learning to happen. Second, even if it is indeed the case in children, it still doesn't mean that AGI must be developed in the same way. If these notions could be easily developed from more basic ones, we could make them learned. However, that has not been the case so far.

> As I mentioned earlier, natural language has a structure that allows
> direct training in neural networks using fast, online algorithms such
> as perceptron learning, rather than slow algorithms with hidden units
> such as back propagation. Each feature is a linear combination of
> previously learned features followed by a nonlinear clamping or
> threshold operation. Working in this fashion, we can represent
> arbitrarily complex concepts.

It depends on your model of "concept". For mine, the NN mechanism is not enough to learn a concept. See
http://nars.wang.googlepages.com/wang.categorization.pdf

> Children also learn language as a progression toward increasingly
> complex patterns.

Sure, I have no problem with that.

> - phonemes beginning at 2-4 weeks
> - phonological rules for segmenting continuous speech at 7-10 months [1]
> - words (semantics) beginning at 12 months
> - simple sentences (syntax) at 2-3 years
> - compound sentences around 5-6 years

Since I don't think AGI should accurately duplicate human intelligence, I make no attempt to follow the same process.

> Attempts to change the modeling order are generally unsuccessful.

It depends. For example, of course an AGI also needs to learn simple sentences before compound sentences, but I don't think it is necessary for it to start at phonemes.

> For example, attempting to parse a sentence first and then extract
> its meaning does not work. You cannot parse a sentence without
> semantics. For example, the correct parse of "I ate pizza with NP"
> depends on whether NP is "pepperoni", "a fork", or "Sam".

Fully agree.
See http://nars.wang.googlepages.com/wang.roadmap.pdf , Section 3(2).

> Now when we hard code knowledge about logic, quantifiers, time, and
> other concepts and then try to retrofit NLP to it, we are modeling
> language in the worst possible order. Such concepts, needed to form
> compound sentences, are learned at the last stage of language
> development. In fact, some tribal languages such as Piraha [2] do not
> ever reach this stage, even for adults.

It depends on what you mean by "logic" and so on. Of course things like propositional logic and predicate logic are not innate, but learned at a very late age. However, I believe there is an innate logic, a general-purpose reasoning-learning mechanism, which must be coded in the initial structure of the system. See http://nars.wang.googlepages.com/wang.roadmap.pdf , Section 4(3). I don't think anyone is arguing that learning can come from nowhere. The difference is in what should be included in this innate logic. For example, I argued that inheritance should be included in it in my book, Section 10.2 (sorry, no on-line material).

My caution is that any
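For readers unfamiliar with the "confidence" Pei refers to: in NARS a statement carries both a frequency and a confidence derived from the amount of evidence behind it. A minimal sketch based on Wang's published definitions (the variable names are mine, not from the thread):

```python
def nars_truth(positive_evidence, total_evidence, k=1):
    """NARS-style truth value: frequency is the proportion of positive
    evidence; confidence grows toward 1 as total evidence accumulates.
    k is the 'evidential horizon' personality parameter (usually 1)."""
    frequency = positive_evidence / total_evidence
    confidence = total_evidence / (total_evidence + k)
    return frequency, confidence

# 3 positive observations out of 4 total:
nars_truth(3, 4)  # (0.75, 0.8)
```

Two numbers per statement is exactly what a single scalar link weight cannot carry, which is the point of contention in the exchange above.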
Re: [agi] Language modeling
Hi Matt,

Regarding logic-based knowledge representation and language/perceptual/action learning -- I understand the nature of your confusion, because the point you are confused on is exactly the biggest point of confusion for new members of the Novamente AI team.

A very careful distinction needs to be drawn between:

1) the distinction between
1a) using probabilistic and formal-logical operators for representing knowledge
1b) using neural-net type operators (or other purely quantitative, non-logic-related operators) for representing knowledge

2) the distinction between
2a) using ungrounded formal symbols to pretend to represent knowledge, e.g. an explicit labeled internal symbol for "cat", one for "give", etc.
2b) having an AI system recognize patterns in its perception and action experience, and build up its own concepts (including symbolic ones) via learning; which means that concepts like "cat" and "give" will generally be represented as complex, distributed structures in the knowledge base, not as individual tokens

From the history of mainstream AI, one might conclude that 1a and 2a inevitably cluster together, so that the only hope for 2b lies in 1b. However, this is not the case. Novamente combines 1a and 2b, and I believe NARS is intended to also.

My contention is that probabilistic logic can be a suitable knowledge representation for raw perceptions and actions, and that logical inference (combined with pattern mining, evolutionary learning and other cognitive operations) can be used to build up abstract concepts grounded in perception and actions, where these concepts are experientially learned and richly complex and yet expressed internally in probabilistic logic formalism. For instance, this means that the "cat" concept may well not be expressed by a single "cat" term, but perhaps by a complex learned (probabilistic) logical predicate.

How this relates to what the human mind does is a whole other question.
I have my hypotheses about the mapping between the human brain's KR and probabilistic logic. But my point for now is simply that all logic-based systems should not be damned based on the fact that historically a bunch of famous AI researchers have used logic-based KR in a cognitively unworkable way. Probabilistic logic is a general formalism that can express anything, and furthermore it can express anything in a whole lot of different ways. The trick in using it properly for AGI is to integrate it fully with an experiential learning system.

Pei's distinction between experiential semantics and model-theoretic semantics is also of interest here.

-- Ben G

On 10/22/06, Matt Mahoney [EMAIL PROTECTED] wrote:

> ----- Original Message -----
> From: Pei Wang [EMAIL PROTECTED]
> To: agi@v2.listbox.com
> Sent: Saturday, October 21, 2006 7:03:39 PM
> Subject: Re: [agi] SOTA
>
>> Well, in that sense NARS also has some resemblance to a neural
>> network, as well as many other AI systems.
>
> Also to Novamente, if I understand correctly. Terms are linked by a
> probability and confidence. This seems to me to be an optimization of
> a neural network or connectionist model, which is restricted to one
> number per link, representing probability. To model confidence you
> would have to make redundant copies of the input and output units and
> their connections. This would be inefficient, of course.
> One aspect of NARS and many other structured or semi-structured
> knowledge representations that concerns me is the direct
> representation of concepts such as is-a, equivalence, logic (if-then,
> and, or, not), quantifiers (all, some), time (before and after), etc.
> These things seem fundamental to knowledge but are very hard to
> represent in a neural network, so it seems expedient to add them
> directly. My concern is that the direct encoding of such knowledge
> greatly complicates attempts to use natural language, which is still
> an unsolved problem. Language is the only aspect of intelligence that
> separates humans from other animals. Without language, you do not
> have AGI (IMHO).
>
> [snip]
>
> Children also learn language as a progression toward increasingly
> complex patterns.
>
> - phonemes beginning at 2-4 weeks
> - phonological rules for segmenting continuous speech at 7-10 months [1]
> - words (semantics) beginning at 12 months
> - simple sentences (syntax) at 2-3 years
> - compound sentences around 5-6 years
>
> Attempts to change the modeling order are generally
Re: [agi] Language modeling
Matt Mahoney wrote:

> My concern is that structured knowledge is inconsistent with the
> development of language in children. As I mentioned earlier, natural
> language has a structure that allows direct training in neural
> networks using fast, online algorithms such as perceptron learning,
> rather than slow algorithms with hidden units such as back
> propagation. Each feature is a linear combination of previously
> learned features followed by a nonlinear clamping or threshold
> operation. Working in this fashion, we can represent arbitrarily
> complex concepts. In a connectionist model, we have, for example:

Pei has already addressed some of the other problems with what you have said, so I will confine my comments to this part.

Perceptron learning has been known for four decades to have limitations that make it a ludicrous choice for learning: see Minsky and Papert's book "Perceptrons". And in the sequence of items that follow, some things can be done with conventional NNs like backprop, but others, like phrases and sentences, are completely impossible unless you add something to them.

Your comment that "attempting to parse a sentence first and then extract its meaning does not work" is naive. Humans clearly do partial parsing simultaneously with semantic decoding (there is a *huge* literature on this, but for one choice example, see Frazier, L., Clifton, C., & Randall, J. (1983). Filling gaps: Decision principles and structure in sentence comprehension. Cognition, 13, 187-222).

[snip]

> Children also learn language as a progression toward increasingly
> complex patterns.
>
> - phonemes beginning at 2-4 weeks
> - phonological rules for segmenting continuous speech at 7-10 months [1]
> - words (semantics) beginning at 12 months
> - simple sentences (syntax) at 2-3 years
> - compound sentences around 5-6 years

ARR! Please don't do this. My son (like many other kids) had finished about fifty small books by the time he was 5, and at least one of the Harry Potter books when he was 6.
You are talking about these issues at a pre-undergraduate level of comprehension.

Richard Loosemore
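Minsky and Papert's point is easy to demonstrate: a single-layer perceptron cannot learn XOR, because no linear threshold separates its classes. A minimal illustrative sketch (not code from any post in this thread):

```python
def train_perceptron(data, epochs=100, lr=0.1):
    """Classic perceptron learning rule on 2-input binary data."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x0, x1), target in data:
            out = 1 if w[0] * x0 + w[1] * x1 + b > 0 else 0
            err = target - out
            w[0] += lr * err * x0
            w[1] += lr * err * x1
            b += lr * err
    return w, b

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
w, b = train_perceptron(XOR)
correct = sum((1 if w[0] * x0 + w[1] * x1 + b > 0 else 0) == t
              for (x0, x1), t in XOR)
# 'correct' can never reach 4: XOR is not linearly separable,
# so at least one of the four cases is always misclassified.
```

Adding a hidden layer (and a slower algorithm like backprop) removes the limitation, which is precisely the trade-off discussed in this thread.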
Re: [agi] Language modeling
On 23 Oct 2006 at 10:06, Ben Goertzel wrote:

> A very careful distinction needs to be drawn between:
>
> 1) the distinction between
> 1a) using probabilistic and formal-logical operators for representing knowledge
> 1b) using neural-net type operators (or other purely quantitative, non-logic-related operators) for representing knowledge
>
> 2) the distinction between
> 2a) using ungrounded formal symbols to pretend to represent knowledge, e.g. an explicit labeled internal symbol for "cat", one for "give", etc.
> 2b) having an AI system recognize patterns in its perception and action experience, and build up its own concepts (including symbolic ones) via learning; which means that concepts like "cat" and "give" will generally be represented as complex, distributed structures in the knowledge base, not as individual tokens
>
> From the history of mainstream AI, one might conclude that 1a and 2a
> inevitably cluster together, so that the only hope for 2b lies in 1b.
> However, this is not the case. Novamente combines 1a and 2b, and I
> believe NARS is intended to also.

I agree that combining probabilistic logic (with a reasonable amount of consistency enforcement) with 'bottom-up' learning is crucial. However I would suggest that '2a' is often worthwhile as a soft, context-dependent index into '2b', particularly as an inference tool when you can do a lossy simplification to symbolic logic, do some fast inference on that, then pop back into managed-consistency-scope probabilistic logic with some conclusions that are conditional on the estimated probability that the assumptions behind the simplification hold. Most human usage of scientific theories and engineering rules looks like this. Some applications (e.g. FAI self-modification) demand complete rigour and more complicated techniques to get it, but those are relatively rare. Similarly it's ok to use embedded chunks of 1b when their inputs and outputs are tightly scoped and you know what they're doing.
Though the kind of connectionist/informal learning algorithms I'd advocate using (fully-custom, non-general and tightly-integrated algorithms generated by the AI using an optimisation pressure model) don't look much like (in my experience to date) the currently popular plausible-seeming-to-humans algorithms.

> My contention is that probabilistic logic can be a suitable knowledge
> representation for raw perceptions and actions, and that logical
> inference (combined with pattern mining, evolutionary learning and
> other cognitive operations) can be used to build up abstract concepts
> grounded in perception and actions,

Agree, with the proviso that my idea of 'adequate grounding' is different from yours (I'd characterise mine as 'explicit grounding' and yours as 'implicit grounding').

> For instance, this means that the cat concept may well not be
> expressed by a single cat term, but perhaps by a complex learned
> (probabilistic) logical predicate.

I don't think it's really useful to discuss representing word meanings without a sufficiently powerful notion of context (which is really hard).

> But my point for now is simply that all logic-based systems should
> not be damned based on the fact that historically a bunch of famous
> AI researchers have used logic-based KR in a cognitively unworkable
> way.

I certainly agree with that, as long as 'logic-based' means 'probabilistic logic with bottom-up modelling and no unitary concepts or simple word-symbol mappings'. Unfortunately many people would read 'logic based' as 'looks like Cyc'.

> Probabilistic logic is a general formalism that can express anything,
> and furthermore it can express anything in a whole lot of different
> ways.

That isn't a point in its favour. Expressive scope allows people to say 'oh, our system could do that, it just needs the right rules/network/whatever' whenever you ask them 'so how would your system implement cognitive ability X?'.
The limited expressive scope of classic ANNs was actually essential for getting relatively naïve and simplistic learning algorithms (e.g. backprop, Hebbian learning) to produce useful solutions to an interesting (if still fairly narrow) class of problems. OTOH, if you disallow 'please wait for me to program that into the KR' and 'we just need a bigger computer!' excuses, using a very expressive substrate (at the limit, TM-equivalent code) actually forces people to design powerful learning algorithms, so in that sense maybe it is a good thing.

Michael Wilson
Director of Research and Development
Bitphase AI Ltd - http://www.bitphase.com
Re: [agi] Language modeling
Hi,

>> For instance, this means that the cat concept may well not be
>> expressed by a single cat term, but perhaps by a complex learned
>> (probabilistic) logical predicate.
>
> I don't think it's really useful to discuss representing word
> meanings without a sufficiently powerful notion of context (which is
> really hard).

Agreed. Most meanings in Novamente are context-relative, in fact...

>> But my point for now is simply that all logic-based systems should
>> not be damned based on the fact that historically a bunch of famous
>> AI researchers have used logic-based KR in a cognitively unworkable
>> way.
>
> I certainly agree with that, as long as 'logic-based' means
> 'probabilistic logic with bottom-up modelling and no unitary concepts
> or simple word-symbol mappings'. Unfortunately many people would read
> 'logic based' as 'looks like Cyc'.

Thanks -- you summarized one of my main points very effectively.

>> Probabilistic logic is a general formalism that can express
>> anything, and furthermore it can express anything in a whole lot of
>> different ways.
>
> That isn't a point in its favour. Expressive scope allows people to
> say 'oh, our system could do that, it just needs the right
> rules/network/whatever' whenever you ask them 'so how would your
> system implement cognitive ability X?'. The limited expressive scope
> of classic ANNs was actually essential for getting relatively naïve
> and simplistic learning algorithms (e.g. backprop, Hebbian learning)
> to produce useful solutions to an interesting (if still fairly
> narrow) class of problems.

Well, recurrent NNs also have universal applicability, just like probabilistic logic systems. And this means that any general endorsement or condemnation of logic-based OR NN-based methods is pretty silly. These are just very general tools, which may be used in many different ways.

Ben
Re: [agi] Language modeling
On 10/23/06, Matt Mahoney [EMAIL PROTECTED] wrote:

> [...] One aspect of NARS and many other structured or semi-structured
> knowledge representations that concerns me is the direct
> representation of concepts such as is-a, equivalence, logic (if-then,
> and, or, not), quantifiers (all, some), time (before and after), etc.
> These things seem fundamental to knowledge but are very hard to
> represent in a neural network, so it seems expedient to add them
> directly. My concern is that the direct encoding of such knowledge
> greatly complicates attempts to use natural language, which is still
> an unsolved problem. Language is the only aspect of intelligence that
> separates humans from other animals. Without language, you do not
> have AGI (IMHO).
>
> [snip]
>
> In a connectionist model, we have, for example:

It is not obligatory that AGI designers should make their AGIs as ignorant as babies at the beginning. Many people have this predilection because they think that the AGI should be able to learn *anything*. Therefore, the point is whether our AGI can learn/express/reason with anything and everything; it's not whether we should equip the AI with an advanced KR structure initially. I choose the latter as a shortcut. I think it is a good thing if the KR has a good design.

A belated point is that classical logic usually cannot talk about syncategorematic constructs. For example, in predicate logic you cannot say "AND is a very useful thingy". I think my term-logic enhanced version can do this. =)

YKY
Re: [agi] Language modeling
Ben Goertzel wrote:

>> The limited expressive scope of classic ANNs was actually essential
>> for getting relatively naïve and simplistic learning algorithms
>> (e.g. backprop, Hebbian learning) to produce useful solutions to an
>> interesting (if still fairly narrow) class of problems.
>
> Well, recurrent NN's also have universal applicability, just like
> probabilistic logic systems.

And not coincidentally, designing learning algorithms that work well on recurrent networks is much harder than for non-recurrent ones, though many of the more extreme ANN fans seem to be in denial of this (or of the fact that fine-grained recurrency is actually important).

In general I am more in favour of designing powerful learning algorithms that work on rough fitness landscapes than I am of designing a substrate that flattens the apparent fitness landscape for relevant classes of problem. The former approach scales better, forces you to understand what you're doing better, and is usually more compatible with reflection and a causally clean goal system. The latter approach is more compatible with the zero-foresight and incremental-dev-path restrictions of evolution, but humans shouldn't be hobbled by those.

Michael Wilson
Director of Research and Development
Bitphase AI Ltd - http://www.bitphase.com
Re: [agi] Language modeling
YKY,

Of course there is no a priori difference between a set of nodes and links and a set of logical relationships... The question with your DB of facts about love and so forth is whether it captures the subtler uncertain patterns regarding love that we learn via experience. My strong suspicion is that the patterns (uncertain logical relationships) that are easily articulable in compact form by the conscious human mind (when building a DB) are only a small subset of the critical ones. The subtler patterns that we acquire via experience and that exist in our unconscious are probably the vast majority of the critical patterns for really understanding something like love...

-- Ben

On 10/23/06, YKY (Yan King Yin) [EMAIL PROTECTED] wrote:

> On 10/23/06, Ben Goertzel [EMAIL PROTECTED] wrote:
>> 2) the distinction between
>> 2a) using ungrounded formal symbols to pretend to represent
>> knowledge, e.g. an explicit labeled internal symbol for "cat", one
>> for "give", etc.
>> 2b) having an AI system recognize patterns in its perception and
>> action experience, and build up its own concepts (including symbolic
>> ones) via learning; which means that concepts like "cat" and "give"
>> will generally be represented as complex, distributed structures in
>> the knowledge base, not as individual tokens
>
> I think in G0, symbols are grounded and they exist in complex
> relations with other symbols. What may be misleading is that you see
> I talk about a symbol like "love" or "3" in isolation and you think
> that is very not-AGI to do so. But I have a KB of facts about love,
> 3, etc, even augmented with probabilities. There is no real
> difference between this and your graphical representation. Any graph
> can be completely described by listing all its nodes and edges.
> YKY
Re: [agi] Language modeling
I don't exactly have the same reaction, but I have some things to add to the following exchange.

On 10/23/06, Richard Loosemore [EMAIL PROTECTED] wrote:

> Matt Mahoney wrote:
>> Children also learn language as a progression toward increasingly
>> complex patterns.
>> - phonemes beginning at 2-4 weeks
>> - phonological rules for segmenting continuous speech at 7-10 months [1]
>> - words (semantics) beginning at 12 months
>> - simple sentences (syntax) at 2-3 years
>> - compound sentences around 5-6 years
>
> ARR! Please don't do this. My son (like many other kids) had finished
> about fifty small books by the time he was 5, and at least one of the
> Harry Potter books when he was 6. You are talking about these issues
> at a pre-undergraduate level of comprehension.

Anecdotal evidence is always bad, but I will note that I myself was reading Tolkien (badly) by 1st grade, and when I was five was scared badly by a cold war children's book, "Nobody Wants a Nuclear War".

There are also other problems with neat progressions like this. One glaring one is that much younger children can learn sign language (which is physically much easier) and communicate fairly complicated concepts far in advance of speech, so much so that many parenting courses now suggest and support learning and teaching baby sign language so as to be able to communicate desires, needs, and explanations with the child much earlier.

-- Justin Corwin
[EMAIL PROTECTED]
http://outlawpoet.blogspot.com
http://www.adaptiveai.com
Re: [agi] Language modeling
In child development, understanding seems to considerably precede the ability to articulate that understanding. Also, development seems to generally move from highly abstract representations (stick men, smiley suns) to more concrete adult-like ones.

On 23/10/06, justin corwin [EMAIL PROTECTED] wrote:

> Anecdotal evidence is always bad, but I will note that I myself was
> reading Tolkien (badly) by 1st grade, and when I was five was scared
> badly by a cold war children's book, "Nobody Wants a Nuclear War".
Re: [agi] Language modeling
I am interested in identifying barriers to language modeling and how to overcome them.

I have no doubt that probabilistic models such as NARS and Novamente can adequately represent human knowledge. Also, I have no doubt they can learn relations such as "all frogs are green" from examples of green frogs. My question relates to solving the language problem: how to convert natural language statements like "frogs are green" and equivalent variants into the formal internal representation without the need for humans to encode stuff like (for all X, frog(X) -> green(X)). This problem is hard because there might not be terms that exactly correspond to "frog" or "green", and also because interpreting natural language statements is not always straightforward, e.g. "I know it was either a frog or a leaf because it was green."

Converting natural language to a formal representation requires language modeling at the highest level. The levels from lowest to highest are: phonemes, word segmentation rules, semantics, simple sentences, compound sentences. Regardless of whether your child learned to read at age 3 or not at all, children always learn language in this order.

The state of the art in language modeling is at the level of simple sentences: modeling syntax using n-grams (usually trigrams) or hidden Markov models, generally without recursion (flat), and modeling semantics as word associations, possibly generalizing via LSA or clustering to exploit the transitive property (if A means B and B means C, then A means C). This is the level of modeling of the top text compressors on the large text benchmark and the lowest-perplexity models used in speech recognition.

I gave an example of a Google translation of English to Arabic and back. You may have noticed that strings of up to about 6 words looked grammatically correct, but that longer sequences contained errors. This is a characteristic of trigram models.
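The "locally fluent, globally wrong" behaviour of trigram models falls directly out of their structure: each prediction sees only a 3-word window. A toy illustrative sketch (nothing to do with Google's actual system):

```python
from collections import defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count which word follows each pair of adjacent words.
trigrams = defaultdict(lambda: defaultdict(int))
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    trigrams[(a, b)][c] += 1

def predict(a, b):
    """Most likely next word given ONLY the two preceding words --
    the model has no memory beyond this 3-word window."""
    following = trigrams.get((a, b))
    return max(following, key=following.get) if following else None

predict("sat", "on")  # -> "the": locally sensible, but nothing in the
                      # model enforces consistency across longer spans
```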
Shannon noted in 1949 that random sequences that fit the n-gram (letter or word) statistics of English appear correct over spans of up to about 2n.

All of these models have the property that they are trained in the same order that children learn language. For example, parsing sentences without semantics is difficult, but extracting semantics without parsing (text search) is easy. As a second example, it is possible to build a lexicon from text only if you know the rules for word segmentation. However, the reverse is not true: it is not necessary to have a lexicon to segment continuous text (spaces removed). The segmentation rules can be derived from n-gram statistics, analogous to learning the phonological rules for segmenting continuous speech. This was first demonstrated in text by Hutchens and Alder, which I improved on in 1999:
http://cs.fit.edu/~mmahoney/dissertation/lex1.html

With this observation, it seems that hard coding rules for inheritance, equivalence, logical, temporal, etc. relations into a knowledge representation will not help in learning these relations from text. The language model still has to learn these relations from previously learned, simpler concepts. In other words, the model has to learn the meanings of "is", "and", "not", "if-then", "all", "before", etc. without any help from the structure of the knowledge representation or explicit encoding. The model has to first learn how to convert compound sentences into a formal representation and back, and only then can it start using or adding to the knowledge base.

So my question is: what is needed to extend language models to the level of compound sentences? More training data? Different training data? A new theory of language acquisition? More hardware? How much?

-- Matt Mahoney, [EMAIL PROTECTED]
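The segmentation-without-a-lexicon claim can be illustrated in a few lines: in a toy character stream built from a two-word vocabulary, word boundaries are exactly the places where the next character becomes hard to predict. This is a drastically simplified sketch in the spirit of Hutchens and Alder, not the algorithm from the linked dissertation; the 0.9 predictability threshold is an arbitrary assumption:

```python
from collections import defaultdict

# Toy "speech stream": the words "ab" and "cd" concatenated with no
# spaces. The segmenter sees only the raw character stream.
stream = "abcdcdababcdabcdcdab" * 5

# Character bigram counts from the unsegmented stream itself.
counts = defaultdict(lambda: defaultdict(int))
for c1, c2 in zip(stream, stream[1:]):
    counts[c1][c2] += 1

def boundary_after(c):
    """High uncertainty about the next character suggests a word end:
    inside a word the next character is nearly deterministic."""
    total = sum(counts[c].values())
    return max(counts[c].values()) / total < 0.9

words, start = [], 0
for i, c in enumerate(stream):
    if boundary_after(c):
        words.append(stream[start:i + 1])
        start = i + 1

set(words)  # recovers the lexicon {"ab", "cd"} with no dictionary
```

Real text needs longer n-grams and smarter statistics, but the principle is the same: segmentation falls out of predictability dips, before any lexicon exists.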
Re: [agi] Language modeling
> So my question is: what is needed to extend language models to the
> level of compound sentences? More training data? Different training
> data? A new theory of language acquisition? More hardware? How much?

What is needed is:

- A better training approach, involving presentation of compound sentences in conjunction with real-world (or sim-world) situations ...

- A better theory of language acquisition, more fully explaining the impact of semantics and pragmatics on syntax learning.

I like Tomasello's language acquisition theory, BTW (see his book "Constructing a Language"), but connecting his ideas with pragmatic AI algorithms and structures is a lot of work (as I know, for I have done it in the context of Novamente). Also Calvin and Bickerton, in "Lingua ex Machina", have some interesting things to say, though they don't dig as deep as Tomasello.

-- Ben G