Re: Re: [agi] Understanding Natural Language

Andrii (lOkadin) Zvorygin Sun, 26 Nov 2006 13:37:28 -0800

On 11/25/06, Matt Mahoney <[EMAIL PROTECTED]> wrote:

Andrii (lOkadin) Zvorygin <[EMAIL PROTECTED]> wrote:
>> Even if we were able to constrain the grammar, you still have the
>> problem that people will still make ungrammatical statements, misspell
>> words, omit words, and so on.
> Amazing you should mention such valid points against natural languages.


> This misses the point.  Where are you going to get 1 GB of Lojban text to train
> your language model?

Well, A: I could just get IRC logs and mailing lists of the current
Lojban community.
B: The point is to translate English into Lojban.
C: I'm not training a language model. I'm creating a parser, then a
translator, then other things. The translator will have some elements
of an AI, and Bayesian probability will probably be involved, but it's
too early to say. I may be on the wrong list for discussing this.

> If you require that all text pass through a syntax checker for
> errors, you will greatly increase the cost of generating your training
> data.

Well, A: There are rarely any errors -- unlike in a natural language
like, say, English.
B: Addressed above.

> This is not a trivial problem.

Which one? Maybe as a whole it's not trivial, but when you break it
down the little pieces are all individually trivial.

> It is a big part of why programmers can only write 10 lines of code
> per day on projects 1/1000 the size of a language model.

Monolithic programming is the paradigm of the past, which is one of the
reasons I'm creating this new development model.

> Then when you have built the model, you will still have a system that
> is intolerant of errors and hard to use.

Because the development model is designed after functional programming
languages, it will be possible to add functions anywhere in the process
without interrupting the rest of the functions, as doing so won't change
the input other functions receive (unless that is the intent).
Hard to use? Well, we'll see when I have a basic implementation. The
whole point is that it will be easy to use. Maybe it won't work out,
though I can't see how. .iacu'i(skepticism)
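The idea of adding a function anywhere in the process without disturbing the others can be sketched roughly as follows. This is only an illustration in Python, not the framework described here; the stage names (normalize_whitespace, lowercase, log_length) and run_pipeline are made up for the example.

```python
from typing import Callable, List

# A stage is any function that takes text and returns text.
Stage = Callable[[str], str]

# Hypothetical side channel that the pass-through stage writes to.
seen_lengths: List[int] = []


def run_pipeline(stages: List[Stage], text: str) -> str:
    """Feed the text through each stage in order."""
    for stage in stages:
        text = stage(text)
    return text


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return " ".join(text.split())


def lowercase(text: str) -> str:
    return text.lower()


def log_length(text: str) -> str:
    """Pass-through stage: observes the text but returns it unchanged,
    so every later stage receives exactly the input it received before."""
    seen_lengths.append(len(text))
    return text


original_stages = [normalize_whitespace, lowercase]
extended_stages = [normalize_whitespace, log_length, lowercase]

before = run_pipeline(original_stages, "  COI   rodo ")
after = run_pipeline(extended_stages, "  COI   rodo ")
print(before == after)  # the inserted stage did not change the result
```

A stage that is meant to change what later functions receive would simply return different text; that is the "unless that is the intent" case.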

> Your language model needs to have a better way to deal with
> inconsistency than to report errors and make more work for the user.

It can simply check how this user, or someone else who made a similar
error, responded when corrected previously.
Trivial once we get the implementation going.
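A rough sketch of that lookup, assuming corrections are stored as (user, erroneous input, accepted fix) triples. The class name CorrectionMemory and its methods are hypothetical, and a real system would need fuzzier matching than exact string equality.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


class CorrectionMemory:
    """Remembers which fix each user accepted for a given erroneous input."""

    def __init__(self) -> None:
        # user -> {erroneous input -> fix that user accepted last time}
        self._by_user: Dict[str, Dict[str, str]] = defaultdict(dict)
        # erroneous input -> how often each fix was accepted by anyone
        self._by_anyone: Dict[str, Counter] = defaultdict(Counter)

    def record(self, user: str, bad: str, fixed: str) -> None:
        self._by_user[user][bad] = fixed
        self._by_anyone[bad][fixed] += 1

    def suggest(self, user: str, bad: str) -> Optional[str]:
        # Prefer what this user chose before.
        own = self._by_user.get(user, {}).get(bad)
        if own is not None:
            return own
        # Otherwise fall back to the most common fix other users chose.
        fixes = self._by_anyone.get(bad)
        if fixes:
            return fixes.most_common(1)[0][0]
        return None


memory = CorrectionMemory()
memory.record("alice", "klma", "klama")
print(memory.suggest("alice", "klma"))  # alice's own earlier correction
print(memory.suggest("bob", "klma"))    # borrowed from another user
print(memory.suggest("bob", "xyz"))     # nothing known yet
```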


>> Lojban already exceeds many natural languages in its ability to express.

> How so?  In English I can use mathematical notation understood by others to
> express complex ideas.  I could even invent a new branch of mathematics,
> introduce appropriate notation, and express ideas in it.


It stops being English when it starts being "mathematical notation". In
English, mathematical notation usually has either a non-standard or an
ungrammatical formulation, .i.e. "left bracket one plus two right
bracket times four equals x?"
as opposed to the same sentence in Lojban: "li vei pa su'i re ve'o pi'i
vo du da", where each word corresponds to a math character (li declares
it a number).
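The word-for-symbol correspondence can be shown with a small lookup table. This is a sketch covering only the words in the example sentence, not a Lojban parser; the names WORD_TO_SYMBOL and to_math are made up here, and rendering "da" as "x" follows the English gloss given above.

```python
# Only the words used in the example sentence are listed.
WORD_TO_SYMBOL = {
    "vei": "(",
    "ve'o": ")",
    "su'i": "+",
    "pi'i": "*",
    "du": "=",
    "pa": "1",
    "re": "2",
    "vo": "4",
    "da": "x",
}


def to_math(lojban: str) -> str:
    """Map each Lojban word to its math symbol, one word at a time."""
    symbols = []
    for word in lojban.split():
        if word == "li":
            # "li" only announces that a number follows; it has no symbol.
            continue
        symbols.append(WORD_TO_SYMBOL[word])
    return " ".join(symbols)


print(to_math("li vei pa su'i re ve'o pi'i vo du da"))
# prints: ( 1 + 2 ) * 4 = x
```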

There is no such distinction in Lojban; cmavo and gismu for math can
be used anywhere in the language. I can and do add any mathematical
feature and use it while speaking, .i.e. "xu do du mi", meaning "are
you the identity of me?" The du is the same as the English "=". "Are
you equal to me" leaves it a little ambiguous and vague.

Please note that I am forced to use the limitations of the English
language while conveying all this, so the translations into English do
tend to be vague/ambiguous in comparison to the Lojban.

Oh, and English doesn't have attitudinals. They are two- to three-letter
emotional indicators, extremely valuable once you know how to use
them. .io.o'a.u'i (respect for the previous statement in a proud, amused
kind of way).


> There are cognitive limits to what natural language can express, such as the
> inability to describe a person's face (as well as a picture would), or to
> describe a novel odor, or to convey learned physical skills such as swimming or
> riding a bicycle.

I agree.

> One could conceivably introduce notation to describe such things in
> any natural or artificial language, but that does not solve the
> problem.  Your neural circuitry has limits; it allows you to connect a
> face to a name but not to a description.  Any such notation might be
> usable by machines but not by humans.

What about adding notation for URIs?
la'o.ubu. http://community.livejournal.com/lojban/17618.html .ubu.PIXrami
the foreign named start quote URI end quote is a picture of me.

Recreating a scenario, say, may however be more difficult. We'll figure
that out in the MMORPG that will be created to further the advancement
of the AI.

I'm thinking of all of this as not some little project, or training a
bunch of algorithms. I am creating a new development framework, though
that's too specific a term to describe it.

My initial reasoning was that right now many programs don't use AI,
because programmers don't know, and the ones that do can't easily add
code. I was thinking of making an AI library, but realized you'd have
to make one for every language. Then I thought about how we could unify
every language. Then I realized that there were non-standard conventions
everywhere. It was mentioned to me that Lojban existed; I found it,
learned it, and am now making a parser. This makes standards compliance
easier, as it is intuitive and "natural" to comply with standards when
you are simply extending the language that you speak by making new
sentences from words you already know. Sentences which are no
different from programs.

A general AI would emerge out of many hundreds of people using the
development framework and extending it to understand them. Eventually
it will understand its users and be able to do what they ask of it.
What has been explained once doesn't have to be explained again --
assuming, of course, that the small functions are connected to a
distributed network that redistributes them to other computers looking
for them, without any need for user interference (the MMORPG will
probably run on the same network).


-- Matt Mahoney, [EMAIL PROTECTED]

----- Original Message ----
From: Andrii (lOkadin) Zvorygin <[EMAIL PROTECTED]>
To: agi@v2.listbox.com
Sent: Saturday, November 25, 2006 5:01:04 AM
Subject: Re: Re: [agi] Understanding Natural Language

On 11/24/06, Matt Mahoney <[EMAIL PROTECTED]> wrote:
> Andrii (lOkadin) Zvorygin <[EMAIL PROTECTED]> wrote:
> >I  personally don't understand why everyone seems to insist on using
> >ambiguous illogical languages to express things when there are viable
> >alternatives available.
>
> I think because an AGI needs to communicate in languages that people already
> know. I don't understand how artificial languages like Lojban contribute to
> this goal. We should focus our efforts instead on learning and modeling
> existing languages.
>
> I understand that artificial languages like Lojban and Esperanto and Attempto
> have simple grammars. I don't believe they would stay that way if they were
> widely used for person to person communication (as opposed to machine
> interfaces).
Lojban grammar is easily extensible and forwards compatible.
You can add features to the language through CMAvo and GISmu.
Lojban already exceeds many natural languages in its ability to
express.  There are very crucial parts of communication that English
lacks such as logical connectives and attitudinals.

> Languages evolve over time, both in individuals, and more slowly in
> social groups.
Are you implying languages evolve faster in individuals?
>A language model is not a simple set of rules.
A natural language model is not.
An artificial language is constructed with rules that were also
created by individual humans, as opposed to groups of them. Lojban was
especially designed to be logical, unlike Esperanto.
That makes such languages recreatable by individual humans and,
depending on your definition, "simple".
> It is a probability distribution described by a large set of patterns
> such as words, word associations, grammatical structures and
> sentences.
The approach of a world of the blind to seeing is to feel at things.
Sometimes they wonder if there is not another way.
> Each time you read or hear a message, the probabilities for the
> observed patterns are increased a little and new patterns are added.
> In a social setting, these probabilities tend to converge by
> consensus as this knowledge is shared.
I agree this is a wonderful solution to predicting what the vocabulary
of a language group is.
> Formal definitions of artificial languages do not capture this type
> of knowledge, the thousands or millions of new words, idioms, shared
> knowledge and habits of usage.
sa'u(simply speaking) Artificial languages lack a historic/cultural user base.
Do I even need to reply to that? zo'o.ui.u'i (last statement said humorously
while happy in an amused kind of way)
>
> Even if we were able to constrain the grammar, you still have the problem
> that people will still make ungrammatical statements, misspell words, omit
> words, and so on.
Amazing you should mention such valid points against natural languages.

* ungrammatical statements:
     If they were ungrammatical they wouldn't parse in the universal Lojban
     parser (all Lojban parsers can be universal Lojban parsers as long as
     they follow the few simple grammar rules).
* misspell words:
     In Lojban, words have a very strict formation,
     mu'a(for example): GISmu are in either CCVCV or CVCCV formation;
     all others are also syntactically unambiguous.
     Additionally, words in Lojban are specifically designed not to
     sound similar to each other, so chances are a word still looks/sounds
     just like the original word even when misspelled.
     If a parse error occurs (rare for Lojban users, usually typos) the
     user can always be notified.
* omit words:
     I gave an example of some GISmu before. Basically they have
     predefined places, so you can always ask a specific question about
     omitted information by simply putting a "ma" for the SUMti(argument)
     which you wish to know, or "mo" for the SELbri(function).
* and so on.
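The strict word formation mentioned in the list above is easy to check mechanically. The sketch below tests only the CCVCV / CVCCV letter pattern; it does not check which consonant pairs Lojban permits or whether the word is on the official gismu list, and the name has_gismu_shape is made up for the example.

```python
import re

# Lojban's consonant and vowel letters (the apostrophe, y and the
# period are left out because they do not occur in gismu).
C = "[bcdfgjklmnprstvxz]"
V = "[aeiou]"

# A gismu has exactly five letters in one of two shapes.
GISMU_SHAPE = re.compile(f"{C}{C}{V}{C}{V}|{C}{V}{C}{C}{V}")


def has_gismu_shape(word: str) -> bool:
    """True if the word is CCVCV or CVCCV; a shape check only."""
    return GISMU_SHAPE.fullmatch(word) is not None


for word in ["klama", "gismu", "sumti", "klma", "lojban", "mi"]:
    print(word, has_gismu_shape(word))
```

A misspelling such as "klma" fails the shape test at once, which is the kind of parse error the user can then be notified about.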
>A language model must be equipped to deal with this.
go'i.ui(repetition of your statement as confirmation and happiness)
> It means evaluating lots of soft constraints from a huge database for
> error correction, just like we do to resolve ambiguity in natural
> language.
If "It" can be substituted as "Resolving ambiguity in natural
languages" OR(logical connective) "Resolve ambiguity in ambiguous
languages", I agree.
> -- Matt Mahoney, [EMAIL PROTECTED]
mu'omi'eLOKadin(Over to you, my name is Lokadin.)

-----
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?list_id=303



--
ta'o(by the way)  We With You Network at: http://lokiworld.org .i(and)
more on Lojban: http://lojban.org
mu'oimi'e lOkadin (Over, my name is lOkadin)
