YKY (Yan King Yin) <[EMAIL PROTECTED]> wrote:
> Statements like "all X are Y" involve variables, which are like placeholders for other words / phrases. Are you sure your neurally-based model can handle variables? I'm under the impression that conventional ANNs cannot handle variables.
Not easily. It is hard for humans to reason at this abstract level, and I expect it to be hard for a neural model that accurately models human cognition. But I think it is possible to learn substitutions such as "all X are Y" means "every X is Y" from many examples, e.g. "all birds have wings" means the same as "every bird has wings", then forming syntactic categories for words that appear in contexts like "all X have" and "have Y". Then for a novel sentence like "all plants are green" you learn the associations (plants ~ X) and (green ~ Y) and can infer "every plant is green" by the substitutions (bird ~ X ~ plant) and (wings ~ Y ~ green). The abstract patterns X and Y are learned in the same manner as learning that novel words are nouns or verbs by their context. They need not be labeled to be learned. People don't need to know what a noun or verb is to form sentences.

I think algebra is learned in a similar way. You can learn to make substitutions such as "3 + 5" for "5 + 3" or "X + Y" for "Y + X" by being given lots of examples. Just being told the rule is not enough to learn it. But this is at a fairly high level of abstraction. Not everybody learns algebra or logic. (A toy sketch of this kind of substitution learning appears below, after the timing estimates.)

> I have thought of, and tried to implement, something like your idea; but I abandoned that approach in favor of a logic-based one because logic is more expressive. I believe that encoding structural knowledge at the logical level is much more effective than starting from the bottom-up without any structural knowledge.

Logic is certainly more efficient, but I don't think you can solve the natural language interface problem that way.

I did some rough calculations on the speed of the neural approach. I have estimated the size of a language model to be about 10^9 bits, roughly the information content of 1 GB of text. (A couple of people posted that they have read 1000 books, a few hundred MB, so I think this is about right. I don't think anyone has read 10,000 books.) To store 10^9 bits, a neural network needs 10^9 connections. To train this network on 1 GB takes about 10^9 training cycles, for a total of 10^18 operations. Modern processors can compute about 10^10 multiply-add operations per second. (I have written neural network code (paq7asm.asm) in MMX assembler, which is 8 times faster than optimized C code. You can multiply-accumulate 4 16-bit signed integers in one clock cycle.) So training the network will take about 10^8 seconds, or 3 years on a PC. (This would still be an order of magnitude faster than humans learn language.)

I could speed this up by a factor of 100 by reducing the data set size to 10^8 bits, but I don't think this could demonstrate learning abstract concepts such as algebra or logic. Children are exposed to about 20,000 words per day, so 10^8 bits would be equivalent to about age 3. This is about the current state of the art in statistical language modeling.

I could also speed it up by running it in parallel on a network. The neural network would have to be densely connected to allow learning arbitrarily complex patterns. I estimate about 10^5 neurons with 10^4 connections each, allowing roughly 10^5 learned patterns or concepts in 10 layers, although it would not be a strictly layered feedforward architecture. In a distributed computation, the 10^5 neuron states would have to be broadcast to the other processors each training/test cycle. I think this could be done over an Ethernet with hundreds or thousands of PCs. But I don't have access to that kind of hardware.
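To make the substitution idea concrete, here is a toy Python sketch. It only illustrates learning a rewrite rule from example pairs rather than being told the rule; the commutativity example and the names induce_rule/apply_rule are mine, and this is not the neural model itself, which would have to learn the same thing statistically from noisy text.

# Toy: induce a substitution rule like "X + Y" -> "Y + X" purely from
# example pairs, then apply it to novel symbols.  Assumes each example
# has distinct tokens; morphology and noise are ignored.

def induce_rule(pairs):
    # Propose a reordering from the first (before, after) pair, then
    # verify that every other pair fits the same reordering.
    src, dst = pairs[0]
    mapping = [src.index(tok) for tok in dst]   # where each output token comes from
    for src2, dst2 in pairs[1:]:
        if [src2[i] for i in mapping] != dst2:
            raise ValueError("examples do not fit one reordering rule")
    return mapping

def apply_rule(mapping, tokens):
    return [tokens[i] for i in mapping]

examples = [("5 + 3".split(), "3 + 5".split()),
            ("2 + 7".split(), "7 + 2".split()),
            ("9 + 4".split(), "4 + 9".split())]

rule = induce_rule(examples)                        # learned: swap the two operands
print(" ".join(apply_rule(rule, "a + b".split())))  # prints "b + a"

The same bind-then-substitute step is what getting "every plant is green" from "all plants are green" would need, except that the slots X and Y would be soft clusters of context features rather than exact token positions.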
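For reference, here is the same back-of-the-envelope arithmetic written out. The constants are the rough estimates above, not measurements.

# Rough training-time estimate from the numbers above.
model_bits   = 1e9                  # information content of ~1 GB of text
connections  = model_bits           # ~1 connection per stored bit
train_cycles = 1e9                  # ~1 training cycle per byte of 1 GB
total_ops    = connections * train_cycles   # ~1e18 multiply-adds
ops_per_sec  = 1e10                 # one PC with SIMD multiply-accumulate

seconds = total_ops / ops_per_sec           # ~1e8 s
print(seconds / (3600 * 24 * 365))          # ~3.2 years on one PC

# Shrinking the data set to 1e8 bits shrinks both factors:
print(total_ops / (1e8 * 1e8))              # 100x speedup, i.e. days rather than years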
The other thing I can do is optimize the architecture to reduce unneeded connections and unnecessary computation. For instance, since I am simulating an asynchronous network, some neurons would have slow response times and would not need to be updated every cycle. Of course this requires lots of experimentation, and the experiments take a long time.
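As a rough sketch of what skipping slow neurons might look like (the periods, the 1000-neuron toy network, and the placeholder update function are made up for illustration; this is not code from paq or any existing implementation):

import random

class Neuron:
    def __init__(self, period):
        self.period = period        # recompute only every `period` cycles
        self.output = 0.0

def update(n):
    n.output = random.random()      # placeholder for the real weighted-sum update

# Mostly fast neurons plus some slow ones that are skipped most cycles.
neurons = [Neuron(random.choice([1, 1, 2, 4, 8])) for _ in range(1000)]

for cycle in range(100):
    for n in neurons:
        if cycle % n.period == 0:   # slow (large-period) neurons are updated rarely
            update(n)

With periods drawn as above, the average work per cycle is roughly 60% of updating everything, and finding the right mix of periods is exactly the kind of thing that would take experimentation.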
-- Matt Mahoney, [EMAIL PROTECTED]

----- Original Message ----
From: YKY (Yan King Yin) <[EMAIL PROTECTED]>
To: agi@v2.listbox.com
Sent: Thursday, November 16, 2006 11:09:36 PM
Subject: Re: [agi] One grammar parser URL

On 11/17/06, Matt Mahoney <[EMAIL PROTECTED]> wrote:
> Learning logic is similar to learning grammar. A statistical model can classify words into syntactic categories by context, e.g. "the X is" tells you that X is a noun, and that it can be used in novel contexts where other nouns have been observed, like "a X was". At a somewhat higher level, you can teach logical inference by giving examples such as:
>
> All men are mortal. Socrates is a man. Therefore Socrates is mortal.
> All birds have wings. Tweety is a bird. Therefore Tweety has wings.
>
> which fit a pattern allowing you to complete the paragraph:
>
> All X are Y. Z is a X. Therefore...
>
> And likewise for other patterns that are taught in a logic class, e.g. "If X then Y. Y is false. Therefore..."
>
> Finally you give examples in formal notation and their English equivalents, "(X => Y) ^ ~Y", and again use statistical modeling to learn the substitution rules to do these conversions.

Statements like "all X are Y" involve variables, which are like placeholders for other words / phrases. Are you sure your neurally-based model can handle variables? I'm under the impression that conventional ANNs cannot handle variables.

> To get to this point I think you will first need to train the language model to detect higher level grammatical structures such as phrases and sentences, not just word categories.
>
> I believe this can be done using a neural model. This has been attempted using connectionist models, where neurons represent features at different levels of abstraction, such as letters, words, parts of speech, phrases, and sentence structures, in addition to time delayed copies of these. A problem with connectionist models is that each word or concept is assigned to a single neuron, so there is no biologically plausible mechanism for learning new words. A more accurate model is one in which each concept is correlated with many neurons to varying degrees, and each neuron is correlated with many concepts. Then we have a mechanism, which is to shift a large number of neurons slightly toward a new concept. Except for this process, we can still use the connectionist model as an approximation to help us understand the true model, with the understanding that a single weight in the model actually represents a large number of connections.
>
> I believe the language learning algorithm is essentially Hebb's model of classical conditioning, plus some stability constraints in the form of lateral inhibition and fatigue. Right now this is still research. I have no experimental results to show that this model would work. It is far from developed. I hope to test it eventually by putting it into a text compressor. If it does work, I don't know if it will train to a high enough level to solve logical inference, at least not without some hand written training data or a textbook on logic. If it does reach this point, we would show that examples of correct inference compress smaller than incorrect examples. To have it answer questions I would need to add a model of discourse, but that is a long way off. Most training text is not interactive, and I would need about 1 GB.
>
> Maybe you have some ideas?

I have thought of, and tried to implement, something like your idea; but I abandoned that approach in favor of a logic-based one because logic is more expressive. I believe that encoding structural knowledge at the logical level is much more effective than starting from the bottom-up without any structural knowledge.

Learning 10^9 bits --- but all bits are not created equal. In a hierarchically structured knowledgebase, the bits at the top of the hierarchy take much more time to learn than the bits at the base levels. Therefore, encoding some structural knowledge at the mid-levels helps a great deal to shorten the required learning time. That's why I switched to the logic-based camp =)

YY

-----
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?list_id=303