Ben and List,

I wanted to leave this. I'm glad I didn't.

I hadn't previously been paying attention and missed this thread. It's
actually very good. Thanks to Dorian Aur for dragging it back up.

I agree with almost all of it, as with everything else I've been commenting
on here. There is just one small gap I would like to close, one small loose
thread I would like to pull, but I think it is an important one.

Where I see the gap might be summarized with one statement by Linas:

LV, Feb 9: "If you are picky, you can have several different kinds of
common nouns, but however picky you are, you need fewer classes than there
are words."

"...however picky you are, you need fewer classes than there are words..."

"Fewer classes"?

How do we know that is true?

The number of ways a set of objects can be formed into classes is far
greater than the number of objects.
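
To put toy numbers on that (a back-of-envelope illustration of my own, not
anyone's linguistic data): N objects admit 2^N possible classes if a class
is just a subset, and a Bell number B(N) of ways of partitioning them into
classes, and both outrun N almost immediately.

    # Toy illustration with made-up N; not anyone's linguistic data.
    # 2**n counts the possible classes (subsets) of n objects;
    # bell(n) counts the ways of partitioning n objects into classes.
    def bell(n):
        """Bell number B(n), computed via the Bell triangle."""
        row = [1]
        for _ in range(n - 1):
            nxt = [row[-1]]
            for x in row:
                nxt.append(nxt[-1] + x)
            row = nxt
        return row[-1]

    for n in (5, 10, 20):
        print(n, 2**n, bell(n))
    # 5   32        52
    # 10  1024      115975
    # 20  1048576   51724158235372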

I think an analysis of why this is so -- why we can meaningfully create
more classes than we have examples, and why we need to -- will be, if not
AGI, then the basis of the leap from narrow AI to AGI.

On one level it may reduce to the assertion that gauge, as observed by
Linas in his symbolism and connectionism paper, is not trivial for language:

Note 10, p. 52, "Neural-Net vs. Symbolic Machine Learning":

"In physics, such an ambiguity of factorization is known as a global gauge
symmetry; fixing a gauge removes the ambiguity. In natural language, the
ambiguity is spontaneously broken: words have only a few senses, and are
often synonymous, making the left and right factors sparse. For this
reason, the analogy to physics is entertaining but mostly pointless."

It's not pointless, Linas.

Experimentally, the fact that this language "gauge" is not pointless was
observed in the results for distributional learning of phonemes I cited,
results dating back 60 years, to the time Chomsky used it to destroy
distributional learning as the central paradigm for linguistics.

If Linas takes his formalism and adds just this one insight, I think we
have the basis to move beyond the wall at which symbolic methods have left
us for 60 years.

In concrete terms it means we don't need to "learn" 10K to 1M rules. We
need to be able to "permute" the elements of (causally related) data in
many more ways than that.

Sergio Pissanetzky comes close to this, with his analysis in terms of
permutations, also resulting in a network:

"Structural Emergence in Partially Ordered Sets is the Key to Intelligence"
http://sergio.pissanetzky.com/Publications/AGI2011.pdf

Though Pissanetzky may not see the significance of gauge either. We need a
mechanism to cluster these networks on the fly. Nothing exists until it is
observed.

Moving on. Fascinating insights into the state of OpenCog:

LV: 'as of about 8-9 months ago, the language-learning project has been
handed over to a team. They are still getting up to speed; they don't have
any background in linguistics or machine learning, and so have been trying
to learn both at the same time, and getting tangled up and stalled, as a
result. Currently they are stumbling at the "clustering" step'

Sure they will!

Of course, if you try to "learn" something which is subjective, you'll run
your processor hot chasing your tail, and still end up breaking the wrong
symmetry for an appreciable fraction of your problems. That's been the
history of distributional language learning since the '90s. I interviewed
for a PhD on this with David Powers and Chris Manning in 1998.

The only solution is a root-and-branch change away from "learning" and
towards resolving the "gauge": finding contextually relevant, low-energy
permutations or factorizations of sets at run time.

I have been looking at what you are doing with your "MST" parser. You
start by clustering links, beginning with mutual information.
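
For reference, here is the kind of statistic I understand that first step
to be, sketched in Python. This is not the OpenCog code; the toy corpus,
the sentence-window co-occurrence counting, and every name in it are my
own assumptions, just to fix ideas about where the link weights come from.

    # Generic pointwise-mutual-information sketch over word pairs.
    # NOT the OpenCog pipeline: corpus, window, and names are mine.
    import math
    from collections import Counter
    from itertools import combinations

    corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]  # toy sentences

    word_count = Counter()
    pair_count = Counter()
    for sentence in corpus:
        word_count.update(sentence)
        # unordered co-occurrence within the sentence window
        pair_count.update(frozenset(p) for p in combinations(sentence, 2))

    n_words = sum(word_count.values())
    n_pairs = sum(pair_count.values())

    def pmi(w1, w2):
        """Pointwise mutual information of a word pair, in bits."""
        p_pair = pair_count[frozenset((w1, w2))] / n_pairs
        p_w1 = word_count[w1] / n_words
        p_w2 = word_count[w2] / n_words
        return math.log2(p_pair / (p_w1 * p_w2))

    print(pmi("the", "sat"), pmi("cat", "sat"))

My point is not this statistic itself, it is what gets done with it
afterwards.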

One way to see the change which needs to take place might be to stop
thinking about the problem in terms of the character of links, and start
thinking about it in terms of permutations of links. I think Pissanetzky's
causal sets can be thought of as links. The point is that he focuses on
permuting them.

Think about it. Another way to find a "minimum spanning tree" is to set
your network oscillating. And that can be done in real time (on parallel
hardware).

Look at this paper:

A Network of Integrate and Fire Neurons for Community Detection in Complex
Networks
Marcos G. Quiles, Liang Zhao, Fabricio A. Breve, Roseli A. F. Romero
http://www.sbmac.org.br/dincon/trabalhos/PDF/invited/69194.pdf
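
To be clear, the sketch below is not their integrate-and-fire model; it is
only a minimal Kuramoto-style stand-in for the same idea, with a toy graph
and coupling constants I invented: phases attract along links and weakly
repel otherwise, so densely linked groups lock together and the
communities can be read off from which phases synchronize.

    # Toy synchronization-based community detection; NOT Quiles et al.'s
    # integrate-and-fire model. Graph, couplings, and step counts are mine.
    import math, random

    random.seed(0)

    edges = {(0, 1), (1, 2), (0, 2),   # cluster A
             (3, 4), (4, 5), (3, 5),   # cluster B
             (2, 3)}                   # bridge link
    n = 6
    linked = lambda i, j: (i, j) in edges or (j, i) in edges

    theta = [random.uniform(0, 2 * math.pi) for _ in range(n)]
    k_pos, k_neg, dt = 1.0, -0.3, 0.05  # attract linked pairs, weakly repel the rest

    for _ in range(2000):               # let the network "oscillate"
        new = []
        for i in range(n):
            pull = sum((k_pos if linked(i, j) else k_neg) * math.sin(theta[j] - theta[i])
                       for j in range(n) if j != i)
            new.append((theta[i] + dt * pull) % (2 * math.pi))
        theta = new

    # phases should settle into two locked groups, {0,1,2} and {3,4,5}
    print([round(t, 2) for t in theta])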

You have all the pieces, Linas. Put them together.

1) We don't know that there are fewer meaningful classes of words than
words; in theory the number of combinations (sets) of words is far greater
than the number of words.
2) Even in theory, by the properties of your own formalism, language
networks also have gauge.
3) In practice, as insisted on by Chomsky 60 years ago, they are observed
to have gauge.
4) Only a context of use will break the symmetry and determine the correct
factorization. We need to factorize, permute, the links at run time, in
context.
5) A good way to cluster links at run time, efficient on parallel hardware,
is to set a network oscillating and observe which groupings of links
synchronize.

Or if you have no power...

Ben, stop spending money trying to "learn" all N!/((N-R)!R!) possible
combinations of R-member sets drawn from N words. The system has a gauge.
And many factorizations contradict each other, so they can't be "learned"
simultaneously anyway. You won't be able to break the symmetry and know
which of the many contradictory sets you need until run time.
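
To put rough numbers on that (back-of-envelope figures of my own, not the
project's): even for a modest hypothetical vocabulary, the count of
R-member sets is far beyond anything you could enumerate and store
offline.

    # Back-of-envelope only; N and R are made-up illustrative values.
    from math import comb

    N = 100_000                 # hypothetical vocabulary size
    for R in (2, 3, 5):
        print(R, comb(N, R))    # N! / ((N-R)! R!)
    # 2 -> 4999950000              (~5 x 10^9)
    # 3 -> 166661666700000         (~1.7 x 10^14)
    # 5 -> roughly 8.3 x 10^22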

OK, you may be right that BERT models have an advantage that Linas doesn't
see, because deep nets do allow some recombination of parts, within a
network layer, at run time.

But the key to carrying us past the wall is forming sets at run time.

It will be easy to do that just by setting the network oscillating about
the words in a given context.

-Rob

On Sun, Feb 10, 2019 at 8:58 AM Linas Vepstas <linasveps...@gmail.com>
wrote:

>
>
> On Sat, Feb 9, 2019 at 4:22 AM Ben Goertzel <b...@goertzel.org> wrote:
>
>>
>> We are now playing with hybridizing these symbolic-ish grammar
>> induction methods with neural net language models, basically using the
>> predictive models produced by models in the BERT lineage (but more
>> sophisticated than vanilla BERT) in place of simple mutual information
>> values to produce more broadly-context-sensitive parse choices in
>> Linas's MST parser...
>>
>
> This last sentence suggests that the near-total confusion about MST
> continues to persist in the team. I keep telling them to collect the
> statistics, and then discard the MST parse **immediately**. Trying to
> "improve" MST is a total waste of time.
>
> Seriously: Instead, try skipping the MST step entirely.  Just do not even
> do it, AT ALL. Rip it out. It is NOT a step that the algorithm even needs.
> I'll bet you that if you skip the MST step completely, the quality of your
> results will be more-or-less unchanged.  The results might even get
> better!
>
> If your results don't change, by skipping MST, or if your results get
> better, by skipping MST, then that should be a clear indicator that trying
> to "improve" MST is a waste of time!
>
> -- Linas
>
> --
> cassette tapes - analog TV - film cameras - you
>
