YKY (Yan King Yin) <[EMAIL PROTECTED]>
wrote:

> Statements like "all X are Y" involve variables, which are like
> placeholders for other words / phrases.  Are you sure your
> neurally-based model can handle variables?  I'm under the impression
> that conventional ANNs cannot handle variables.

Not easily.  It is hard for humans to reason at this abstract level, and I 
expect it to be hard for a neural model that accurately models human cognition. 
 But I think it is possible to learn substitutions such as "all X are Y" means 
"every X is Y" from many examples, e.g. "all birds have wings" means the same 
as "every bird has wings", and then to form syntactic categories for words that 
appear in contexts like "all X have" and "have Y".  Then for a novel sentence 
like "all plants are green" you learn the associations (plants ~ X) and 
(green ~ Y) and can infer "every plant is green" by the substitutions 
(bird ~ X ~ plant) and (wings ~ Y ~ green).  The abstract patterns X and Y are 
learned in the same manner as learning that novel words are nouns or verbs by 
their context.  They need not be labeled to be learned.  People don't need to 
know what a noun or verb is to form sentences.
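
A toy sketch of what I mean (the example sentences, the rigid 4-word pattern, 
the small verb table, and the crude plural stripping below are just 
illustrations, not the actual learning mechanism):

from collections import defaultdict

examples = ["all birds have wings", "all dogs have tails"]

# Learn, from context alone, which words fill the X and Y slots of
# "all X <verb> Y".  The categories are never explicitly labeled.
slots = defaultdict(set)
verb_map = {"have": "has", "are": "is"}   # agreement table, given here for brevity
for s in examples:
    w = s.split()
    if len(w) == 4 and w[0] == "all" and w[2] in verb_map:
        slots["X"].add(w[1])
        slots["Y"].add(w[3])

def singular(word):
    # Crude stand-in for learned morphology (birds -> bird, plants -> plant).
    return word[:-1] if word.endswith("s") else word

def substitute(sentence):
    # Apply the learned substitution "all X <verb> Y" -> "every X <verb'> Y".
    w = sentence.split()
    if len(w) == 4 and w[0] == "all" and w[2] in verb_map:
        return " ".join(["every", singular(w[1]), verb_map[w[2]], w[3]])
    return None

print(sorted(slots["X"]))                  # ['birds', 'dogs']
print(substitute("all plants are green"))  # every plant is green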

I think algebra is learned in a similar way.  You can learn to make 
substitutions such as "3 + 5" for "5 + 3" or "X + Y" for "Y + X" by being given 
lots of examples.  Just being told the rule is not enough to learn it.  But 
this is at a fairly high level of abstraction.  Not everybody learns algebra or 
logic.

> I have thought of, and tried to implement, something like your idea;
> but I abandoned that approach in favor of a logic-based one because
> logic is more expressive.  I believe that encoding structural knowledge
> at the logical level is much more effective than starting from the
> bottom-up without any structural knowledge.

Logic is certainly more efficient, but I don't think you can solve the natural 
language interface problem that way.  I did some rough calculations on the 
speed of the neural approach.  I have estimated the size of a language model to 
be about 10^9 bits, equivalent to 1 GB of text.  (A couple of people posted 
that they have read 1000 books, a few hundred MB, so I think this is about 
right.  I don't think anyone has read 10,000 books.)  To store 10^9 bits, a 
neural network needs 10^9 connections.  To train this network on 1 GB takes 
about 10^9 training cycles, for a total of 10^18 operations.  Modern 
processors can compute about 10^10 multiply-add operations per second.  (I have 
written neural network code (paq7asm.asm) in MMX assembler, which is 8 times 
faster than optimized C code.  You can multiply-accumulate 4 16-bit signed 
integers in one clock cycle.)  So training the network would take about 10^8 
seconds, or 3 years on a PC.  (This would still be an order of magnitude faster 
than humans learn language.)
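
Spelled out (the constants are just the rough estimates above, not 
measurements):

# Back-of-envelope training time, using the figures quoted above.
model_bits      = 1e9                  # estimated language model size in bits
connections     = model_bits           # one connection per stored bit
training_cycles = 1e9                  # roughly one cycle per character of 1 GB
total_ops       = connections * training_cycles   # 1e18 multiply-adds
ops_per_second  = 1e10                 # one PC with SIMD multiply-accumulate
seconds = total_ops / ops_per_second   # 1e8 seconds
print(seconds / (3600 * 24 * 365))     # about 3.2 years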

I could speed this up by a factor of 100 by reducing the data set size to 10^8, 
but I don't think this could demonstrate learning abstract concepts such as 
algebra or logic.  Children are exposed to about 20,000 words per day, so 10^8 
would be equivalent to about age 3.  This is about the current state of the art 
in statistical language modeling.
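
Checking the arithmetic on the "age 3" figure (the 5 characters per word is my 
own round number, not something measured):

words_per_day = 20000
chars_by_age_3 = words_per_day * 5 * 365 * 3
print(chars_by_age_3)   # about 1.1e8, i.e. roughly 10^8 bytes of text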

I could also speed it up by running it in parallel on a network.  The neural 
network would have to be densely connected to allow learning arbitrarily 
complex patterns.  I estimate about 10^5 neurons with 10^4 connections each, 
allowing roughly 10^5 learned patterns or concepts in 10 layers, although it 
would not be a strictly layered feedforward architecture.  In a distributed 
computation, the 10^5 neuron states would have to be broadcast to the other 
processors each training/test cycle.  I think this could be done over an 
Ethernet with hundreds or thousands of PCs.  But I don't have access to that 
kind of hardware.
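
Some rough figures for the distributed version (the 1000-PC count and 1 byte 
per neuron state are assumptions for illustration; the rest are the estimates 
above):

neurons     = 1e5
connections = 1e9          # about 10^4 connections per neuron
cycles      = 1e9
pcs         = 1000
ops_per_sec = 1e10         # per PC, as above

arithmetic_seconds_per_pc = connections * cycles / pcs / ops_per_sec
broadcast_bytes_per_cycle = neurons * 1   # one byte per neuron state
print(arithmetic_seconds_per_pc)   # 1e5 s, about a day of arithmetic per PC
print(broadcast_bytes_per_cycle)   # about 100 KB broadcast per training cycle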

The other thing I can do is optimize the architecture to reduce unneeded 
connections and unnecessary computation.  For instance, since I am simulating 
an asynchronous network, some neurons would have slow response times and not 
need to be updated every cycle.  Of course this requires lots of 
experimentation, and the experiments take a long time.
 

-- Matt Mahoney, [EMAIL PROTECTED]



----- Original Message ----
From: YKY (Yan King Yin) <[EMAIL PROTECTED]>
To: agi@v2.listbox.com
Sent: Thursday, November 16, 2006 11:09:36 PM
Subject: Re: [agi] One grammar parser URL

On 11/17/06, Matt Mahoney <[EMAIL PROTECTED]> wrote:

> Learning logic is similar to learning grammar.  A statistical model can 
> classify words into syntactic categories by context,  e.g. "the X is" tells 
> you that X is a noun, and that it can be used in novel contexts where other 
> nouns have been observed, like "a X was".  At a somewhat higher level, you 
> can teach logical inference by giving examples such as: 

> 

> All men are mortal.  Socrates is a man.  Therefore Socrates is mortal.

> All birds have wings.  Tweety is a bird.  Therefore Tweety has wings.

> 

> which fit a pattern allowing you to complete the paragraph: 

> 

> All X are Y.  Z is an X.  Therefore...

> 

> And likewise for other patterns that are taught in a logic class, e.g. "If X 
> then Y.  Y is false.  Therefore..."

> 

> Finally you give examples in formal notation and their English equivalents, 
> "(X => Y) ^  ~Y", and again use statistical modeling to learn the 
> substitution rules to do these conversions. 



  

 Statements like "all X are Y" involve variables, which are like placeholders 
for other words / phrases.  Are you sure your neurally-based model can handle 
variables?  I'm under the impression that conventional ANNs cannot handle 
variables. 

  

  

 > To get to this point I think you will first need to train the language model 
 > to detect higher level grammatical structures such as phrases and sentences, 
 > not just word categories.

>  

> I believe this can be done using a neural model.  This has been attempted 
> using connectionist models, where neurons represent features at different 
> levels of abstraction, such as letters, words, parts of speech, phrases, and 
> sentence structures, in addition to time delayed copies of these.  A problem 
> with connectionist models is that each word or concept is assigned to a 
> single neuron, so there is no biologically plausible mechanism for learning 
> new words.  A more accurate model is one in which each concept is correlated 
> with many neurons to varying degrees, and each neuron is correlated with many 
> concepts.  Then we have a mechanism, which is to shift a large number of 
> neurons slightly toward a new concept.  Except for this process, we can still 
> use the connectionist model as an approximation to help us understand the 
> true model, with the understanding that a single weight in the model actually 
> represents a large number of connections. 

> 

> I believe the language learning algorithm is essentially Hebb's model of 
> classical conditioning, plus some stability constraints in the form of 
> lateral inhibition and fatigue.  Right now this is still research.  I have no 
> experimental results to show that this model would work.  It is far from 
> developed.  I hope to test it eventually by putting it into a text 
> compressor.  If it does work, I don't know if it will train to a high enough 
> level to solve logical inference, at least not without some hand written 
> training data or a textbook on logic.  If it does reach this point, we would 
> show that examples of correct inference compress smaller than incorrect 
> examples.  To have it answer questions I would need to add a model of 
> discourse, but that is a long way off.  Most training text is not 
> interactive, and I would need about 1 GB. 

> 

> Maybe you have some ideas?



  

 I have thought of, and tried to implement, something like your idea; but I 
abandoned that approach in favor of a logic-based one because logic is more 
expressive.  I believe that encoding structural knowledge at the logical level 
is much more effective than starting from the bottom-up without any structural 
knowledge. 

  

 Learning 10^9 bits --- but all bits are not created equal.  In a 
hierarchically structured knowledgebase, the bits at the top of the hierarchy 
take much more time to learn than the bits at the base levels.  Therefore, 
encoding some structural knowledge at the mid-levels helps a great deal to 
shorten the required learning time. 

  

 That's why I switched to the logic-based camp =)

  

 YY

-----
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?list_id=303
