Re: [agi] Language modeling

2006-10-25 Thread Matt Mahoney
- Original Message 
From: Richard Loosemore [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Tuesday, October 24, 2006 12:37:16 PM
Subject: Re: [agi] Language modeling

Matt Mahoney wrote:
 Converting natural language to a formal representation requires language 
 modeling at the highest level.  The levels from lowest to highest are: 
 phonemes, word segmentation rules, semantics, simple sentences, compound 
 sentences.  Regardless of whether your child learned to read at age 3 or 
 not at all, children always learn language in this order.

And the evidence for this would be what?

Um, any textbook on psycholinguistics or developmental psychology, also the 
paper by Jusczyk I cited earlier.  Ben pointed me to a book by Tomasello which 
I haven't read, but here is a good summary of his work on language acquisition 
in children.
http://email.eva.mpg.de/~tomas/pdf/Mussen_chap_proofs.pdf

I realize that the stages of language learning overlap, but they do not all 
start at the same time.  It is a simple fact that children learn words with 
semantic content like "ball" or "milk" before they learn function words like 
"the" or "of", in spite of the higher frequency of the latter.  Likewise, 
successful language models used for information retrieval ignore function words 
and word order.  Furthermore, children learn word segmentation rules before they 
learn words, again consistent with statistical language models.  (The fact that 
children can learn sign language at 6 months is not inconsistent with these 
models; sign language does not have the word segmentation problem.)
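
As a concrete illustration of the kind of retrieval model referred to above, here is a minimal bag-of-words sketch in Python that discards function words and word order; the stopword list and the overlap score are illustrative, not taken from any particular IR system:

    from collections import Counter

    STOPWORDS = {"the", "of", "a", "an", "and", "or", "to", "in", "is"}

    def bag_of_words(text):
        """Reduce text to content-word counts, ignoring function words and order."""
        return Counter(w for w in text.lower().split() if w not in STOPWORDS)

    def score(query, document):
        """Crude relevance score: overlap of content-word counts."""
        q, d = bag_of_words(query), bag_of_words(document)
        return sum(min(q[w], d[w]) for w in q)

    # "the", "of", "a", "in" and word order are all ignored; only "ball" and "milk" match.
    print(score("the ball of milk", "milk in a ball"))  # -> 2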

We can learn from these observations.  One conclusion that I draw is that you 
can't build an AGI and tack on language modeling later.  You have to integrate 
language modeling and train it in parallel with nonverbal skills such as vision 
and motor control, similar to training a child.  We don't know today whether 
this will turn out to be true.

Another important question is: how much will this cost?  How much CPU, memory, 
and training data do you need?  Again we can use cognitive models to help 
answer these questions.  According to Tomasello, children are exposed to about 
5000 to 7000 utterances per day, or about 20,000 words.  This is equivalent to 
about 100 MB of text in 3 years.  Children learn to use simple sentences of the 
form (subject-verb-object) and recognize word order in these sentences at about 
22-24 months.  For example, they respond correctly to "make the bunny push the 
horse."  However, such models are word-specific.  At about age 3 1/2, children 
are able to generalize novel words used in context as a verb to other syntactic 
constructs, e.g. to construct transitive sentences given examples where the 
verb is used only intransitively.  This is about the state of the art with 
statistical models trained on hundreds of megabytes of text.  Such experiments 
suggest that adult-level modeling, which will be needed to interface with 
structured knowledge bases, will require about a gigabyte of training data.
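
The arithmetic behind these estimates is easy to make explicit.  Here is a quick sketch using the figures quoted above; the bytes-per-word value and the age-25 cutoff are assumptions introduced only to show how the order-of-magnitude numbers arise:

    # Rough training-data estimate from the exposure figures quoted above.
    words_per_day = 20_000    # ~5000-7000 utterances per day
    bytes_per_word = 6        # assume ~5 letters plus a space
    days_to_age_3 = 3 * 365

    child_bytes = words_per_day * bytes_per_word * days_to_age_3
    print(f"~{child_bytes / 1e6:.0f} MB of text heard by age 3")   # ~131 MB

    # Extending the same exposure rate to adulthood (say age 25) lands at the
    # gigabyte order of magnitude cited for adult-level modeling.
    adult_bytes = words_per_day * bytes_per_word * 25 * 365
    print(f"~{adult_bytes / 1e9:.1f} GB by age 25")                # ~1.1 GB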

-- Matt Mahoney, [EMAIL PROTECTED]






Re: [agi] Language modeling

2006-10-24 Thread Lukasz Kaiser

Hi.


The state of the art in language modeling is at the level of simple sentences,
modeling syntax using n-grams (usually trigrams) or hidden Markov models ...


Just a remark: Google recently made their up-to-5-grams available through LDC:
http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html

- lk



Re: [agi] Language modeling

2006-10-23 Thread Pei Wang

On 10/22/06, Matt Mahoney [EMAIL PROTECTED] wrote:


Also to Novamente, if I understand correctly.  Terms are linked by a 
probability and confidence.  This seems to me to be an optimization of a neural 
network or connectionist model, which is restricted to one number per link, 
representing probability.


I'm afraid the difference between these two types of systems is too large
for them to be compared in this way. In general, the weight of a link in a
NN is not a probability.


To model confidence you would have to make redundant copies of the input and 
output units and their connections.  This would be inefficient, of course.


I guess we use the word "confidence" differently. For what I mean, see
http://pages.google.com/edit/NARS.Wang/wang.confidence.pdf
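
For readers following the link, here is a minimal sketch of the frequency/confidence pair as defined in Wang's NARS papers, assuming the usual evidence-based definitions and the personality constant k = 1; the function and variable names are illustrative:

    def nars_truth(positive_evidence, total_evidence, k=1):
        """NARS-style truth value: frequency is the proportion of positive
        evidence; confidence grows with total evidence but never reaches 1."""
        frequency = positive_evidence / total_evidence
        confidence = total_evidence / (total_evidence + k)
        return frequency, confidence

    # Two observations, both positive: frequency 1.0, but only modest confidence.
    print(nars_truth(2, 2))     # (1.0, 0.666...)
    # 98 of 100 observations positive: lower frequency, much higher confidence.
    print(nars_truth(98, 100))  # (0.98, 0.990...)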


One aspect of NARS and many other structured or semi-structured knowledge 
representations that concerns me is the direct representation of concepts such as 
"is-a", equivalence, logic ("if-then", "and", "or", "not"), quantifiers ("all", 
"some"), time ("before" and "after"), etc.  These things seem fundamental to 
knowledge but are very hard to represent in a neural network, so it seems 
expedient to add them directly.  My concern is that the direct encoding of such 
knowledge greatly complicates attempts to use natural language, which is still an 
unsolved problem.  Language is the only aspect of intelligence that separates 
humans from other animals.  Without language, you do not have AGI (IMHO).


I agree that the distinction between innate knowledge and acquired
knowledge is a major design decision. However, I believe it is
necessary to make the notions you mentioned innate, though in
different forms from how they are usually handled in symbolic AI.


My concern is that structured knowledge is inconsistent with the development of 
language in children.


First, I'm not so sure about the above conclusion. For example, to me,
"is-a" (which is called "inheritance" in NARS) is nothing but the
relation between special patterns and general patterns, which needs to
be there for many types of learning to happen.

Second, even if this is indeed the case in children, it still doesn't mean
that an AGI must be developed in the same way.

If these notions could be easily developed from more basic ones, we could
make them learned rather than innate. However, that has not been the case
so far.


As I mentioned earlier, natural language has a structure that allows direct 
training in neural networks using fast, online algorithms such as perceptron 
learning, rather than slow algorithms with hidden units such as back 
propagation.  Each feature is a linear combination of previously learned 
features followed by a nonlinear clamping or threshold operation.  Working in 
this fashion, we can represent arbitrarily complex concepts.
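
A minimal sketch of the kind of feature unit described in the quoted paragraph -- a thresholded linear combination of lower-level features trained with the online perceptron rule; the toy data and all names are illustrative, not taken from any of the systems under discussion:

    import random

    def threshold(x):
        """Nonlinear clamping: the feature fires if the weighted sum exceeds zero."""
        return 1.0 if x > 0 else 0.0

    def perceptron_train(examples, n_features, epochs=20, lr=0.1):
        """Online perceptron learning of one feature from lower-level features.
        examples: list of (feature_vector, target) pairs with binary targets."""
        w = [0.0] * n_features
        b = 0.0
        for _ in range(epochs):
            random.shuffle(examples)
            for x, target in examples:
                out = threshold(sum(wi * xi for wi, xi in zip(w, x)) + b)
                err = target - out
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
        return w, b

    # Toy example: learn an OR-like feature over two lower-level features.
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
    weights, bias = perceptron_train(data, n_features=2)
    print(weights, bias)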


It depends on your model of concepts. In mine, the NN mechanism is not
enough to learn a concept. See
http://nars.wang.googlepages.com/wang.categorization.pdf


Children also learn language as a progression toward increasingly complex 
patterns.


Sure, I have no problem about that.


- phonemes beginning at 2-4 weeks

- phonological rules for segmenting continuous speech at 7-10 months [1]

- words (semantics) beginning at 12 months

- simple sentences (syntax) at 2-3 years

- compound sentences around 5-6 years


Since I don't think AGI should accurately duplicate human
intelligence, I make no attempt to follow the same process.


Attempts to change the modeling order are generally unsuccessful.


It depends. For example, of course an AGI also needs to learn simple
sentences before compound sentences, but I don't think it is necessary
for it to start at phonemes.


For example, attempting to parse a sentence first and then extract its meaning does not work.  You cannot parse a 
sentence without semantics.  For example, the correct parse of "I ate pizza with NP" depends on whether NP is 
"pepperoni", "a fork", or "Sam".
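
A toy illustration of how association statistics, rather than syntax alone, could settle the attachment in the quoted example; the co-occurrence counts are invented purely for illustration:

    # Hypothetical co-occurrence counts a statistical model might have learned
    # from a corpus; the numbers here are made up for the example.
    cooccurrence = {
        ("pizza", "pepperoni"): 50, ("ate", "pepperoni"): 2,
        ("pizza", "fork"): 1,       ("ate", "fork"): 40,
        ("pizza", "Sam"): 1,        ("ate", "Sam"): 30,
    }

    def attach(np):
        """Attach 'with NP' to whichever head it associates with more strongly."""
        noun_score = cooccurrence.get(("pizza", np), 0)
        verb_score = cooccurrence.get(("ate", np), 0)
        return "noun attachment (pizza with %s)" % np if noun_score > verb_score \
            else "verb attachment (ate with %s)" % np

    for np in ("pepperoni", "fork", "Sam"):
        print(np, "->", attach(np))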


Fully agree. See http://nars.wang.googlepages.com/wang.roadmap.pdf ,
Section 3(2).


Now when we hard-code knowledge about logic, quantifiers, time, and other 
concepts and then try to retrofit NLP to it, we are modeling language in the 
worst possible order.  Such concepts, needed to form compound sentences, are 
learned at the last stage of language development.  In fact, some tribal 
languages such as Piraha [2] never reach this stage, even for adults.


It depends on what you mean by "logic" and so on. Of course things
like propositional logic and predicate logic are not innate, but are
learned at a very late age. However, I believe there is an "innate
logic", a general-purpose reasoning-learning mechanism, which must be
coded into the initial structure of the system. See
http://nars.wang.googlepages.com/wang.roadmap.pdf , Section 4(3).

I don't think anyone is arguing that learning can come from nowhere.
The difference is in what should be included in this innate logic. For
example, I argued that inheritance should be included in it in my
book, Section 10.2 (sorry, no on-line material).


My caution is that any 

Re: [agi] Language modeling

2006-10-23 Thread Ben Goertzel
Hi Matt,

Regarding logic-based knowledge representation and language/perceptual/action learning -- I understand the nature of your confusion, because the point you are confused on is exactly the biggest point of confusion for new members of the Novamente AI team.

A very careful distinction needs to be drawn between:

1) the distinction between
1a) using probabilistic and formal-logical operators for representing knowledge
1b) using neural-net type operators (or other purely quantitative, non-logic-related operators) for representing knowledge

2) the distinction between
2a) using ungrounded formal symbols to pretend to represent knowledge, e.g. an explicit labeled internal symbol for "cat", one for "give", etc.
2b) having an AI system recognize patterns in its perception and action experience, and build up its own concepts (including symbolic ones) via learning; which means that concepts like "cat" and "give" will generally be represented as complex, distributed structures in the knowledge base, not as individual tokens

From the history of mainstream AI, one might conclude that 1a and 2a inevitably cluster together, so that the only hope for 2b lies in 1b.  However, this is not the case.  Novamente combines 1a and 2b, and I believe NARS is intended to also.

My contention is that probabilistic logic can be a suitable knowledge representation for raw perceptions and actions, and that logical inference (combined with pattern mining, evolutionary learning and other cognitive operations) can be used to build up abstract concepts grounded in perception and actions, where these concepts are experientially learned and richly complex and yet expressed internally in a probabilistic logic formalism.  For instance, this means that the "cat" concept may well not be expressed by a single "cat" term, but perhaps by a complex learned (probabilistic) logical predicate.

How this relates to what the human mind does is a whole other question.  I have my hypotheses about the mapping between the human brain's KR and probabilistic logic.  But my point for now is simply that all logic-based systems should not be damned based on the fact that historically a bunch of famous AI researchers have used logic-based KR in a cognitively unworkable way.

Probabilistic logic is a general formalism that can express anything, and furthermore it can express any thing in a whole lot of different ways.  The trick in using it properly for AGI is to integrate it fully with an experiential learning system.  Pei's distinction between experiential semantics and model-theoretic semantics is also of interest here.

-- Ben G

On 10/22/06, Matt Mahoney [EMAIL PROTECTED] wrote:

- Original Message 
From: Pei Wang [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Saturday, October 21, 2006 7:03:39 PM
Subject: Re: [agi] SOTA

Well, in that sense NARS also has some resemblance to a neural network, as well as many other AI systems.

Also to Novamente, if I understand correctly.  Terms are linked by a probability and confidence.  This seems to me to be an optimization of a neural network or connectionist model, which is restricted to one number per link, representing probability.  To model confidence you would have to make redundant copies of the input and output units and their connections.  This would be inefficient, of course.

One aspect of NARS and many other structured or semi-structured knowledge representations that concerns me is the direct representation of concepts such as "is-a", equivalence, logic ("if-then", "and", "or", "not"), quantifiers ("all", "some"), time ("before" and "after"), etc.  These things seem fundamental to knowledge but are very hard to represent in a neural network, so it seems expedient to add them directly.  My concern is that the direct encoding of such knowledge greatly complicates attempts to use natural language, which is still an unsolved problem.  Language is the only aspect of intelligence that separates humans from other animals.  Without language, you do not have AGI (IMHO).

My concern is that structured knowledge is inconsistent with the development of language in children.  As I mentioned earlier, natural language has a structure that allows direct training in neural networks using fast, online algorithms such as perceptron learning, rather than slow algorithms with hidden units such as back propagation.  Each feature is a linear combination of previously learned features followed by a nonlinear clamping or threshold operation.  Working in this fashion, we can represent arbitrarily complex concepts.  In a connectionist model, we have, for example:

- pixels
- line segments
- letters
- words
- phrases, parts of speech
- sentences
etc.

Children also learn language as a progression toward increasingly complex patterns.

- phonemes beginning at 2-4 weeks
- phonological rules for segmenting continuous speech at 7-10 months [1]
- words (semantics) beginning at 12 months
- simple sentences (syntax) at 2-3 years
- compound sentences around 5-6 years

Attempts to change the modeling order are generally 

Re: [agi] Language modeling

2006-10-23 Thread Richard Loosemore

Matt Mahoney wrote:

My concern is that structured knowledge is inconsistent with the development of language in children.  As I mentioned earlier, natural language has a structure that allows direct training in neural networks using fast, online algorithms such as perceptron learning, rather than slow algorithms with hidden units such as back propagation.  Each feature is a linear combination of previously learned features followed by a nonlinear clamping or threshold operation.  Working in this fashion, we can represent arbitrarily complex concepts.  In a connectionist model, we have, for example: 



Pei has already addressed some of the other problems with what you have 
said, so I will confine my comments to this part.  Perceptron learning 
has been known for four decades to have limitations that make it a 
ludicrous choice for learning.  See Minsky and Papert (the book 
"Perceptrons").


And in the sequence of items that follow, some things can be done with 
conventional NNs like backprop, but others like phrases and sentences 
are completely impossible unless you add something to them.


Your comment that attempting to parse a sentence first and then 
extract its meaning does not work is naive.  Humans clearly do partial 
parsing simultaneously with semantic decoding (there is a *huge* 
literature on this, but for one choice example, see Frazier, L., 
Clifton, C., & Randall, J. (1983).  Filling gaps: Decision principles 
and structure in sentence comprehension.  Cognition, 13, 187-222.).


[snip]

Children also learn language as a progression toward increasingly complex patterns. 

- phonemes beginning at 2-4 weeks 

- phonological rules for segmenting continuous speech at 7-10 months [1] 

- words (semantics) beginning at 12 months 

- simple sentences (syntax) at 2-3 years 

- compound sentences around 5-6 years 


ARR!

Please don't do this.  My son (like many other kids) had finished about 
fifty small books by the time he was 5, and at least one of the Harry 
Potter books when he was 6.


You are talking about these issues at a pre-undergraduate level of 
comprehension.




Richard Loosemore.



Re: [agi] Language modeling

2006-10-23 Thread Starglider
On 23 Oct 2006 at 10:06, Ben Goertzel wrote:
 A very careful distinction needs to be drawn between:
 
 1) the distinction between
 1a) using probabilistic and formal-logical operators for representing 
 knowledge
 1b) using neural-net type operators (or other purely quantitative, non-
 logic-related operators) for representing knowledge 
 
 2) the distinction between
 2a) using ungrounded formal symbols to pretend to represent knowledge, 
 e.g. an explicit labeled internal symbol for cat, one for give, etc.
 2b) having an AI system recognize patterns in its perception and action 
 experience, and build up its own concepts (including symbolic ones) via 
 learning; which means that concepts like cat and give will generally be 
 represented as complex, distributed structures in the knowledge base, not 
 as individual tokens 
 
 From the history of mainstream AI, one might conclude that 1a and 2a 
 inevitably cluster together, so that the only hope for 2b lies in 1b.  
 However, this is not the case. Novamente combines 1a and 2b, and I believe 
 NARS is intended to also 

I agree that combining probabilistic logic (with a reasonable amount of
consistency enforcement) with 'bottom-up' learning is crucial. However,
I would suggest that '2a' is often worthwhile as a soft, context-dependent
index into '2b', particularly as an inference tool: when you can do a lossy
simplification to symbolic logic, do some fast inference on that, then pop
back into managed-consistency-scope probabilistic logic with conclusions
that are conditional on the estimated probability that the assumptions
behind the simplification hold. Most human usage of scientific theories
and engineering rules looks like this. Some applications (e.g. FAI
self-modification) demand complete rigour and more complicated techniques
to get it, but those are relatively rare.

Similarly it's ok to use embedded chunks of 1b when their inputs and
outputs are tightly scoped and you know what they're doing. Though the
kind of connectionist/informal learning algorithms I'd advocate using
(fully-custom, non-general and tightly-integrated algorithms generated by
the AI using an optimisation pressure model) don't look much like (in my
experience to date) the currently popular plausible-seeming-to-humans
algorithms.

 My contention is that probabilistic logic can be a suitable knowledge 
 representation for raw perceptions and actions, and that logical inference 
 (combined with pattern mining, evolutionary learning and other cognitive 
 operations) can be used to build up abstract concepts grounded in 
 perception and actions,

Agree, with the proviso that my idea of 'adequate grounding' is different
from yours (I'd characterise mine as 'explicit grounding' and yours as
'implicit grounding').

 For instance, this means that the cat concept may well not be 
 expressed by a single cat term, but perhaps by a complex learned 
 (probabilistic) logical predicate. 

I don't think it's really useful to discuss representing word meanings
without a sufficiently powerful notion of context (which is really hard).
 
 But my point for now is simply that all logic-based systems should not be
 damned based on the fact that historically a bunch of famous AI
 researchers have used logic-based KR in a cognitively unworkable way. 

I certainly agree with that, as long as 'logic-based' means 'probabilistic
logic with bottom-up modelling and no unitary concepts or simple
word-symbol mappings'. Unfortunately many people would read 'logic
based' as 'looks like Cyc'.

 Probabilistic logic is a general formalism that can express anything, and 
 furthermore it can express any thing in a whole lot of different ways.

That isn't a point in its favour. Expressive scope allows people to say
'oh, our system could do that, it just needs the right
rules/network/whatever' whenever you ask them 'so how would your system
implement cognitive ability X?'. The limited expressive scope of classic
ANNs was actually essential for getting relatively naïve and simplistic 
learning algorithms (e.g. backprop, Hebbian learning) to produce useful
solutions to an interesting (if still fairly narrow) class of problems.
OTOH, if you disallow 'please wait for me to program that into the KR' and
'we just need a bigger computer!' excuses, using a very expressive
substrate (at the limit, TM-equivalent code) actually forces people to
design powerful learning algorithms, so in that sense maybe it is a good
thing.

Michael Wilson
Director of Research and Development
Bitphase AI Ltd - http://www.bitphase.com




Re: [agi] Language modeling

2006-10-23 Thread Ben Goertzel
Hi,

 For instance, this means that the cat concept may well not be
 expressed by a single cat term, but perhaps by a complex learned
 (probabilistic) logical predicate.

 I don't think it's really useful to discuss representing word meanings
 without a sufficiently powerful notion of context (which is really hard).

Agreed.  Most meanings in Novamente are context-relative, in fact...

 But my point for now is simply that all logic-based systems should not be
 damned based on the fact that historically a bunch of famous AI
 researchers have used logic-based KR in a cognitively unworkable way.

 I certainly agree with that, as long as 'logic-based' means 'probabilistic
 logic with bottom-up modelling and no unitary concepts or simple
 word-symbol mappings'. Unfortunately many people would read 'logic
 based' as 'looks like Cyc'.

Thanks -- you summarized one of my main points very effectively.

 Probabilistic logic is a general formalism that can express anything, and
 furthermore it can express any thing in a whole lot of different ways.

 That isn't a point in its favour. Expressive scope allows people to say
 'oh, our system could do that, it just needs the right
 rules/network/whatever' whenever you ask them 'so how would your system
 implement cognitive ability X?'. The limited expressive scope of classic
 ANNs was actually essential for getting relatively naïve and simplistic
 learning algorithms (e.g. backprop, Hebbian learning) to produce useful
 solutions to an interesting (if still fairly narrow) class of problems.

Well, recurrent NN's also have universal applicability, just like probabilistic logic systems.  And this means that any general endorsement or condemnation of logic-based OR NN-based methods is pretty silly...  These are just very general tools, which may be used in many different ways.
Ben



Re: [agi] Language modeling

2006-10-23 Thread YKY (Yan King Yin)

On 10/23/06, Matt Mahoney [EMAIL PROTECTED] wrote:  [...]
 One aspect of NARS and many other structured or semi-structured knowledge representations that concerns me is the direct representation of concepts such as "is-a", equivalence, logic ("if-then", "and", "or", "not"), quantifiers ("all", "some"), time ("before" and "after"), etc.  These things seem fundamental to knowledge but are very hard to represent in a neural network, so it seems expedient to add them directly.  My concern is that the direct encoding of such knowledge greatly complicates attempts to use natural language, which is still an unsolved problem.  Language is the only aspect of intelligence that separates humans from other animals.  Without language, you do not have AGI (IMHO).

 My concern is that structured knowledge is inconsistent with the development of language in children.  As I mentioned earlier, natural language has a structure that allows direct training in neural networks using fast, online algorithms such as perceptron learning, rather than slow algorithms with hidden units such as back propagation.  Each feature is a linear combination of previously learned features followed by a nonlinear clamping or threshold operation.  Working in this fashion, we can represent arbitrarily complex concepts.  In a connectionist model, we have, for example:


It is not obligatory for AGI designers to make their AGIs as ignorant as babies at the beginning.  Many people have this predilection because they think that the AGI should be able to learn *anything*.


Therefore, the point is whether our AGI can learn/express/reason with anything and everything, not whether we should equip the AI with an advanced KR structure initially.  I choose the latter as a shortcut.  I think it is a good thing if the KR has a good design.


A belated point is that classical logic usually cannot talk about syncategorematic constructs.  For example, in predicate logic you cannot say "AND is a very useful thingy."  I think my term-logic enhanced version can do this. =)


YKY



Re: [agi] Language modeling

2006-10-23 Thread Starglider
Ben Goertzel wrote:
 The limited expressive scope of classic ANNs was actually essential
 for getting relatively naïve and simplistic learning algorithms (e.g.
 backprop, Hebbian learning) to produce useful solutions to an
 interesting (if still fairly narrow) class of problems.
 
 Well, recurrent NN's also have universal applicability, just like 
 probabilistic logic systems.

And, not coincidentally, designing learning algorithms that work well
on recurrent networks is much harder than for non-recurrent ones,
though many of the more extreme ANN fans seem to be in denial of this
(or of the fact that fine-grained recurrency is actually important).

In general I am more in favour of designing powerful learning algorithms
that work on rough fitness landscapes than I am of designing a
substrate that flattens the apparent fitness landscape for relevant
classes of problem. The former approach scales better, forces you to
understand what you're doing better and is usually more compatible
with reflection and a causally clean goal system. The latter approach
is more compatible with the zero-foresight and incremental-dev-path 
restrictions of evolution, but humans shouldn't be hobbled by those.

Michael Wilson
Director of Research and Development
Bitphase AI Ltd - http://www.bitphase.com




Re: [agi] Language modeling

2006-10-23 Thread Ben Goertzel
YKY,

Of course there is no a priori difference between a set of nodes and links and a set of logical relationships...  The question with your DB of facts about love and so forth is whether it captures the subtler uncertain patterns regarding love that we learn via experience.  My strong suspicion is that the patterns (uncertain logical relationships) that are easily articulable in compact form by the conscious human mind (when building a DB) are only a small subset of the critical ones.  The subtler patterns that we acquire via experience and that exist in our unconscious are probably the vast majority of the critical patterns for really understanding something like love...

-- Ben

On 10/23/06, YKY (Yan King Yin) [EMAIL PROTECTED] wrote:

On 10/23/06, Ben Goertzel [EMAIL PROTECTED] wrote:

 2) the distinction between
 2a) using ungrounded formal symbols to pretend to represent knowledge, e.g. an explicit labeled internal symbol for "cat", one for "give", etc.
 2b) having an AI system recognize patterns in its perception and action experience, and build up its own concepts (including symbolic ones) via learning; which means that concepts like "cat" and "give" will generally be represented as complex, distributed structures in the knowledge base, not as individual tokens


I think in G0, symbols are grounded and they exist in complex relations with other symbols.  What may be misleading is that you see me talk about a symbol like "love" or "3" in isolation and you think that is a very not-AGI thing to do.  But I have a KB of facts about "love", "3", etc., even augmented with probabilities.  There is no real difference between this and your graphical representation.  Any graph can be completely described by listing all its nodes and edges.


YKY



Re: [agi] Language modeling

2006-10-23 Thread justin corwin

I don't exactly have the same reaction, but I have some things to add
to the following exchange.

On 10/23/06, Richard Loosemore [EMAIL PROTECTED] wrote:

Matt Mahoney wrote:
 Children also learn language as a progression toward increasingly complex 
patterns.
 - phonemes beginning at 2-4 weeks
 - phonological rules for segmenting continuous speech at 7-10 months [1]
 - words (semantics) beginning at 12 months
 - simple sentences (syntax) at 2-3 years
 - compound sentences around 5-6 years

ARR!

Please don't do this.  My son (like many other kids) had finished about
fifty small books by the time he was 5, and at least one of the Harry
Potter books when he was 6.

You are talking about these issues at a pre-undergraduate level of
comprehension.


Anecdotal evidence is always bad, but I will note that I myself was
reading Tolkien (badly) by 1st grade, and when I was five I was scared
badly by a cold war children's book, "Nobody Wants a Nuclear War."

There are also other problems with neat progressions like this. One
glaring one is that much younger children can learn sign language
(which is physically much easier) and communicate fairly complicated
concepts far in advance of speech, so much so that many parenting
courses now suggest and support learning and teaching baby sign
language so as to be able to communicate desires, needs, and
explanations with the child much earlier.


--
Justin Corwin
[EMAIL PROTECTED]
http://outlawpoet.blogspot.com
http://www.adaptiveai.com



Re: [agi] Language modeling

2006-10-23 Thread Bob Mottram
In child development, understanding seems to considerably precede the ability to articulate that understanding.  Also, development seems to generally move from highly abstract representations (stick men, smiley suns) to more concrete adult-like ones.
On 23/10/06, justin corwin [EMAIL PROTECTED] wrote:
Anecdotal evidence is always bad, but I will note that I myself was reading Tolkien (badly) by 1st grade, and when I was five I was scared badly by a cold war children's book, "Nobody Wants a Nuclear War."




Re: [agi] Language modeling

2006-10-23 Thread Matt Mahoney
I am interested in identifying barriers to language modeling and how to 
overcome them.

I have no doubt that probabilistic models such as NARS and Novamente can 
adequately represent human knowledge.  Also, I have no doubt they can learn, 
e.g., relations such as "all frogs are green" from examples of green frogs.  My 
question relates to solving the language problem: how to convert natural 
language statements like "frogs are green" and equivalent variants into the 
formal internal representation without the need for humans to encode stuff like 
(for all X, frog(X) => green(X)).  This problem is hard because there might not 
be terms that exactly correspond to "frog" or "green", and also because 
interpreting natural language statements is not always straightforward, e.g. "I 
know it was either a frog or a leaf because it was green."
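
To make the target of that conversion concrete, here is a toy pattern-based converter for the simplest case.  It shows exactly the kind of hand coding to be avoided: it covers one trivial pattern and fails on every equivalent variant; the function name and regular expression are illustrative only:

    import re

    def naive_to_logic(sentence):
        """Toy converter for sentences of the form 'Xs are Y'.
        Anything else is rejected, which is precisely the problem."""
        m = re.fullmatch(r"(\w+)s are (\w+)", sentence.strip().lower())
        if not m:
            return None
        noun, adj = m.groups()
        return f"for all X, {noun}(X) => {adj}(X)"

    print(naive_to_logic("Frogs are green"))
    # -> for all X, frog(X) => green(X)
    print(naive_to_logic("I know it was either a frog or a leaf because it was green"))
    # -> None: the hand-coded pattern covers none of the equivalent variants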

Converting natural language to a formal representation requires language 
modeling at the highest level.  The levels from lowest to highest are: 
phonemes, word segmentation rules, semantics, simple sentences, compound 
sentences.  Regardless of whether your child learned to read at age 3 or not at 
all, children always learn language in this order.

The state of the art in language modeling is at the level of simple sentences, 
modeling syntax using n-grams (usually trigrams) or hidden Markov models 
generally without recursion (flat), and modeling semantics as word 
associations, possibly generalizing via LSA or clustering to exploit the 
transitive property (if A means B and B means C, then A means C).  This is the 
level of modeling of the top text compressors on the large text benchmark and 
the lowest perplexity models used in speech recognition.  I gave an example of 
a Google translation of English to Arabic and back.  You may have noticed that 
strings of up to about 6 words looked grammatically correct, but that longer 
sequences contained errors.  This is a characteristic of trigram models.  
Shannon noted in 1949 that random sequences that fit the n-gram (letter or 
word) statistics of English appear correct over spans of up to about 2n letters 
or words, respectively.
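
A minimal sketch of the kind of trigram model being described; trained on a real corpus it produces locally fluent output that degrades over longer spans, consistent with Shannon's observation.  The toy corpus here is only for illustration:

    import random
    from collections import defaultdict

    def train_trigrams(words):
        """Count which words follow each pair of preceding words."""
        model = defaultdict(list)
        for a, b, c in zip(words, words[1:], words[2:]):
            model[(a, b)].append(c)
        return model

    def generate(model, seed, n=20):
        """Sample a continuation: each word depends only on the previous two."""
        out = list(seed)
        for _ in range(n):
            continuations = model.get((out[-2], out[-1]))
            if not continuations:
                break
            out.append(random.choice(continuations))
        return " ".join(out)

    corpus = "the frog is green and the leaf is green and the frog sat on the leaf".split()
    model = train_trigrams(corpus)
    print(generate(model, ("the", "frog")))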

All of these models have the property that they are trained in the same order 
that children learn language.  For example, parsing sentences without semantics 
is difficult, but extracting semantics without parsing (text search) is easy.  
As a second example, it is possible to build a lexicon from text only if you 
know the rules for word segmentation.  However, the reverse is not true.  It is 
not necessary to have a lexicon to segment continuous text (spaces removed).  
The segmentation rules can be derived from n-gram statistics, analogous to 
learning the phonological rules for segmenting continuous speech.  This was 
first demonstrated for text by Hutchens and Alder, whose work I improved on in 1999.  
http://cs.fit.edu/~mmahoney/dissertation/lex1.html
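
This is not the algorithm from the link above, but a minimal sketch of the general idea that word boundaries can be recovered from character n-gram statistics alone: place a split wherever the next character is unusually unpredictable.  The corpus, the threshold, and the names are illustrative:

    from collections import Counter

    def char_bigram_probs(text):
        """Estimate P(next char | current char) from unsegmented text."""
        pairs = Counter(zip(text, text[1:]))
        singles = Counter(text[:-1])
        return {(a, b): n / singles[a] for (a, b), n in pairs.items()}

    def segment(text, probs, threshold=0.3):
        """Insert a boundary wherever the character transition is unusually
        unpredictable -- a crude stand-in for learned segmentation rules."""
        out = [text[0]]
        for a, b in zip(text, text[1:]):
            if probs.get((a, b), 0) < threshold:
                out.append(" ")
            out.append(b)
        return "".join(out)

    training = "thecatsatonthemat" + "thecatatethemat" + "thecatsatonthecat"
    probs = char_bigram_probs(training)
    print(segment("thecatsatonthemat", probs))
    # -> 'thecat sat onthe mat': some boundaries found, some missed on this tiny corpus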

With this observation, it seems that hard coding rules for inheritance, 
equivalence, logical, temporal, and other relations into a knowledge representation 
will not help in learning these relations from text.  The language model still 
has to learn these relations from previously learned, simpler concepts.  In 
other words, the model has to learn the meanings of "is", "and", "not", 
"if-then", "all", "before", etc. without any help from the structure of the 
knowledge representation or explicit encoding.  The model has to first learn how 
to convert compound sentences into a formal representation and back, and only 
then can it start using or adding to the knowledge base.

So my question is: what is needed to extend language models to the level of 
compound sentences?  More training data?  Different training data?  A new 
theory of language acquisition?  More hardware?  How much?

-- Matt Mahoney, [EMAIL PROTECTED]




Re: [agi] Language modeling

2006-10-23 Thread Ben Goertzel

So my question is: what is needed to extend language models to the level of 
compound sentences?  More training data?  Different training data?  A new 
theory of language acquisition?  More hardware?  How much?


What is needed is:

A better training approach, involving presentation of compound
sentences in conjunction with real-world (or sim-world) situations ...

A better theory of language acquisition, more fully explaining the
impact of semantics and pragmatics on syntax learning.

I like Tomasello's language acquisition theory BTW (see his book
"Constructing a Language"), but connecting his ideas with pragmatic AI
algorithms and structures is a lot of work (as I know, for I have done
it in the context of Novamente).

Also Calvin and Bickerton, in "Lingua ex Machina", have some interesting
things to say, though they don't dig as deeply as Tomasello.

-- Ben G
