Re: [agi] Can symbolic approach entirely replace NN approach?

2024-06-24 Thread Rob Freeman
Yes, I seem to have missed where you showed how principal component
analysis would apply to my simple example of ordering sets.

Pity. I would have liked to have seen PCA applied to the simple task
of ordering a set of people alternately by height or age.

Anyway, the good thing is that you can present no coherent objection
to what I suggest.

Your initial objection, that my "contradiction" was just your
"variance", has fallen to your own admission that what I mean by
"contradiction" is "not what people usually mean." And your second
objection, that it is just PCA, has been forced to lapse into the
obscurity of a missed explanation.

On Mon, Jun 24, 2024 at 5:18 PM Boris Kazachenko  wrote:
>
> Rob, I already explained how it applies to your example, you're just "unable" 
> to comprehend it. Because your talk / think ratio is way too high.



Re: [agi] Can symbolic approach entirely replace NN approach?

2024-06-24 Thread Rob Freeman
If you mean my focus is on encoding or representation as the key
unsolved problem holding us back from AGI, then yes, you're probably
right.

On Mon, Jun 24, 2024 at 4:27 PM Quan Tesla  wrote:
>
> Rob
>
> I applied my SSM method to your output here, like a broad AI might've done. 
> The resultant context diagram was enlightening. You talk about many things, but 
> seemingly only evidence two primary things. One of those has to do with 
> constructors and embedding (the text-to-number transformations). I read up on 
> it. It reminded me of numbering schemas for telecom systems.



Re: [agi] Can symbolic approach entirely replace NN approach?

2024-06-23 Thread Rob Freeman
Quan,

Lots of words. None of which mean anything to me...

OK "soft-systems ontology" turns up something:

https://en.wikipedia.org/wiki/Soft_systems_methodology

This British guy Checkland wrote some books on management techniques.
Some kind of "seven step" process:

1) Enter the situation in which a problem situation has been identified
2) Address the issue at hand
3) Formulate root definitions of relevant systems of purposeful activity
4) Build conceptual models of the systems named in the root
definitions: the methodology comes into play by raising concerns /
capturing problems within an organisation and looking into ways they
can be solved. Defining the root definition also describes the root
purpose of a system.
5) The comparison stage: the systems thinker compares the perceived
conceptual models against an intuitive perception of a real-world
situation or scenario. Checkland defines this stage as the comparison
of Stage 4 with Stage 2, formally "Comparison of 4 with 2". Parts of
the problem situation analysed in Stage 2 are examined alongside the
conceptual model(s) created in Stage 4; this helps to achieve a
"complete" comparison.
6) Problems identified should now be accompanied by feasible and
desirable changes that will distinctly help the problem situation in
the given system. Human activity systems and other aspects of the
system should be considered so that soft systems thinking, and
Mumford's needs, can be achieved with the potential changes. These
potential changes should not be acted on until step 7, but they should
be feasible enough to act upon to improve the problem situation.
7) Take action to improve the problem situation

CATWOE: Customers, Actors, Transformation process, Weltanschauung,
Owner, Environmental constraints.

I'm reminded of Edward de Bono. Trying to break pre-conceptions and
being open to seeing a problem from different perspectives.

Look, Quan, in the most general way these kinds of ideas might be
relevant. But only in an incredibly general sense.

Would we all benefit from taking a moment to reflect on "how to place
LLMs in the context of such developments"? Maybe.

At this point I'm on board with Boris. You need to try and write some
code. Simply talking about how management theory has some recent
threads encouraging people to brainstorm together and be open to
different conceptions of problems, is not a "ready to ship"
implementation of AGI.

On Mon, Jun 24, 2024 at 5:32 AM Quan Tesla  wrote:
>
> Rob. I'm referring to contextualization as general context management within 
> complex systems management. ...



Re: [agi] Can symbolic approach entirely replace NN approach?

2024-06-23 Thread Rob Freeman
No, I don't believe I am talking about PCA. But anyway, you are unable
to demonstrate how you implement PCA or anything else, because your
algorithm is "far from complete".

You are unable to apply your conception of the problem to my simple
example of re-ordering a set.

How about PCA itself? If that's what you think I am suggesting, can
you show how PCA would apply to my simple example?
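
For reference, a minimal sketch (my own toy numbers, nothing to do with
your algorithm) of what PCA itself gives on such data: one projection,
hence one ordering, rather than the two alternate orderings of the
example.

import numpy as np

# Made-up (height_cm, age_years) data for a small group of people.
people = np.array([
    [150, 70],
    [160, 30],
    [170, 55],
    [180, 25],
    [190, 40],
], dtype=float)

# Standardise the columns and take the first principal component.
z = (people - people.mean(axis=0)) / people.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(z, rowvar=False))
pc1 = eigvecs[:, np.argmax(eigvals)]   # direction of maximum variance

print(np.argsort(z @ pc1))        # the single ordering PCA yields
print(np.argsort(people[:, 0]))   # ordering by height
print(np.argsort(people[:, 1]))   # ordering by age: a different permutation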

On Mon, Jun 24, 2024 at 9:14 AM Boris Kazachenko  wrote:
>
> What I mean by contradiction is different orderings of an entire set
> of data, not points of contrast within a set of data
>
> That's not what people usually mean by contradiction, definitely not in a 
> general sense.
>
> You are talking about reframing a dataset (subset) of multivariate items along 
> the spectrum of one or several of the most predictive variables in the items. 
> This is basically PCA, closely related to the Spectral Clustering I mentioned 
> in the first section of my readme: "Initial frame of reference here is 
> space-time, but higher levels will reorder the input along all sufficiently 
> predictive derived dimensions, similar to spectral clustering."
>
...
> I can't give you any demonstration because my algo is far from complete. It 
> would be like demonstrating how an ANN works before you figure out how a 
> single-node perceptron works. Except that my scheme is hundreds of times more 
> complex than a perceptron. You just have to decide for yourself if it makes 
> sense from first principles.



Re: [agi] Can symbolic approach entirely replace NN approach?

2024-06-23 Thread Rob Freeman
On Sun, Jun 23, 2024 at 11:05 PM Boris Kazachenko  wrote:
>
> There can be variance on any level of abstraction, be that between pixels or 
> between philosophical categories. And it could be in terms of any property / 
> attribute of compared elements / clusters / concepts: all these are derived 
> by lower-order comparisons.

I'm quite willing to believe there can be variance between anything at
all. But can you give me a concrete example of a "variance" that you
actually implement? One which can demonstrate its equivalence to my
sense of "contradiction" as alternate orderings of a set.

Or are you telling me you have conceptually accounted for my sense of
contradiction, simply by saying the word "variance", which includes
everything, but in practice do not implement it?

If so, since saying the word "variance" will trivially solve any
implementation, can you sketch an implementation for "variance" in my
sense of "contradiction"?

> None of that falls from the sky, other than pixels or equivalents: sensory 
> data at the limit of resolution. The rest is acquired; what we need to define 
> is the acquisition process itself: cross-comp (derivation) and clustering 
> (aggregation).

You think "cross-comp (derivation) and clustering (aggregation)" will do it?

Good. Yes. Please show me how to use "cross-comp (derivation) and
clustering (aggregation)" to implement "variance" in my sense of
"contradiction", as alternate orderings of sets.



Re: [agi] Can symbolic approach entirely replace NN approach?

2024-06-23 Thread Rob Freeman
There were 5 or 6 total mis-interpretations of my words in there,
Boris. Mis-interpretations of my words were almost the whole content of
your argument. I'll limit myself to the most important
mis-interpretation below.

On Sun, Jun 23, 2024 at 7:10 PM Boris Kazachenko  wrote:
> ...
> Starting from your "contradiction": that's simply a linguistic equivalent of 
> my variance.

Is it?

What I mean by contradiction is different orderings of an entire set
of data, not points of contrast within a set of data.

E.g. if you take a group of people and order them by height you will
generally disorder them by age. But if you order them by age, you will
generally disorder them by height. The orders are (depending on
correlation of age and height) contradictory. It's impossible to order
them both at the same time.

Is that really what you mean by "variance"?

I understood your use of "variance" to mean something like edge
detection in CNNs. You write:

"... CNN ... selects for variance"

Are you now saying your "variance" is my re-ordering of sets, and not
"edge" detection or points of contrast within sets?

To clarify what you mean by "variance", can you give me a concrete
example, one which is comparable to my example of ordering people by
height or age, alternately?
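
To make my example concrete, here is a minimal sketch, with made-up
heights and ages, of what I mean by the two orderings contradicting:

# Toy data: (name, height_cm, age_years).
people = [
    ("Ann", 150, 70),
    ("Bob", 160, 30),
    ("Cai", 170, 55),
    ("Dee", 180, 25),
    ("Eve", 190, 40),
]

by_height = sorted(people, key=lambda p: p[1])
by_age = sorted(people, key=lambda p: p[2])

print([p[0] for p in by_height])  # ['Ann', 'Bob', 'Cai', 'Dee', 'Eve']
print([p[0] for p in by_age])     # ['Dee', 'Bob', 'Eve', 'Cai', 'Ann']

# Unless height and age happen to be perfectly correlated, the two
# orderings differ: the set cannot be held in both orders at once.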



Re: [agi] Can symbolic approach entirely replace NN approach?

2024-06-22 Thread Rob Freeman
On Sat, Jun 22, 2024 at 7:50 PM Boris Kazachenko  wrote:
>
> On Saturday, June 22, 2024, at 7:18 AM, Rob Freeman wrote:
>
> But I'm not sure that just sticking to some idea of learned hierarchy, which 
> is all I remember of your work, without exposing it to criticism, is 
> necessarily going to get you any further.
>
> It's perfectly exposed: https://github.com/boris-kz/CogAlg

I see. The readme seems concise. Quite a good way to expose it.

Trivial typo in that first paragraph BTW, "pattern recognition (is?) a
main focus in ML"?

So what's your idea now? I remember we talked long ago and while you
were early into networks, I couldn't convince you that the key problem
was that meaning could not be fully learned, because meaningful
patterns contradict. You were sure all that was needed was learning
patterns in hierarchy.

Where have you arrived now?

You say, "The problem I have with current ML is conceptual consistency."

By "conceptual consistency" you mean a distinction between searching
for "similarity" with "attention" in transformers, "similarity" being
co-occurrence(?), vs "variance" or edges, in CNNs.

The solution you are working on is to cluster for both "similarity"
and "variance".

FWIW I think it is a mistake to equate transformers with "attention".
Yeah, "attention" was the immediate trigger of the transformer
revolution. But it's a hack to compensate for lack of structure. The
real power of transformers is the "embedding". Embedding was just
waiting for something to liberate it from a lack of structure.
"Attention" did that partially. But it's a hack. The lack of structure
is because the interesting encoding, which is the "embedding",
attempts global optimization, when instead globally it contradicts in
context, and needs to be generated at run-time.

If you do "embedding" at run time, it can naturally involve token
sequences of different length, and embedding sequences of different
length generates a hierarchy, and gives you structure. The structure
pulls context together naturally, and "attention" as some crude dot
product for relevance, won't be necessary.

Ah... Reading on I see you do address embeddings... But you see the
problem there as being back-prop causing information loss over many
layers. So you think the solution is "lateral" clustering first. You
say "This cross-comp and clustering is recursively hierarchical". Yes,
that fits with what I'm saying. You get hierarchy from sequence
embedding.

So what's the problem with these embedding hierarchies in your model?
In mine it is that they contradict and must be found at run time. You
don't have that. Instead you go back to the combination of "similarity
and variance" idea. And you imagine there are some highly complex
"nested derivatives"...

So the contrast between us is still that you don't see that
contradictory patterns prohibit global learning.

Compared to what I remember you are addressing "lateral" patterns now.
Perhaps as a consequence of the success of transformers?

But instead of addressing the historical failure of "lateral",
embedding hierarchies as a consequence of contradictory patterns, as I
do, you imagine that the solution is some mix of this combination of
"similarity" and "variance", combined with some kind of complex
"nested derivatives".

There's a lot of complexity after that. One sentence jumps out: "I see
no way evolution could produce (the) proposed algorithm". With which
I agree. I'm not sure why you don't think that's an argument against
it. Compare that with my hypothesis, which sees nested hierarchies
appearing naturally, in real time, as synchronized oscillations over
predictive symmetries in a sequence network.

Can I ask you, have you considered instead my argument, that these
"lateral" hierarchical embeddings might be context dependent, and
contradict globally, so that they can't be globally learned, and must
be generated at run time? Do you have any argument to exclude that?



Re: [agi] Can symbolic approach entirely replace NN approach?

2024-06-22 Thread Rob Freeman
On Sat, Jun 22, 2024 at 6:05 PM Boris Kazachenko  wrote:
> ...
> You both talk too much to get anything done...

Ah well, you may be getting lots done, Boris. The difference is
perhaps, I don't know everything yet.

Though, after 35 years, it can be surprising what other people don't
know. I like to help where I can. Some people just have no clue. Even
LeCun. Vision guy. He's probably been thinking about language only 7
years or so, since transformers. He only knows the mental ruts his
vision, back-prop, career, has led him to. You can be deep in one
problem and shallow in another.

But I don't know everything. Trying to explain keeps me thinking about
it. And here and there you get good new information.

For instance, that paper James introduced me to, perhaps for the wrong
reasons, was excellent new information:

A logical re-conception of neural networks: Hamiltonian bitwise
part-whole architecture
E. F. W. Bowen, R. Granger, A. Rodriguez
https://openreview.net/pdf?id=hP4dxXvvNc8

Very nice. The only other mention I recall for the open endedness of
"position-based"/"embedding" type encoding, as a key to creativity. A
nice vindication for me. Helps give me confidence I'm on the right
track. And they have some ideas for extensions to vision, etc. Though
I don't think they see the contradiction angle.

And, another example, commenting on that LeCun post (the one
mentioning the "puzzle" of transformer world models which get less
coverage as you increase resolution... A puzzle. Ha. Nice vindication
in itself...) Twitter prompted me to a guy in Australia who it turns
out has just published a paper showing that sequence networks with a
lot of shared "walk" end points, tend to synchronize locally.

Wow. A true wow. Shared endpoints constrain local synchronization. I
was wondering about that!

How shared end points could constrain sub-net synchrony in a feed
forward network was something I was struggling with. I think I need
it. So a paper explaining that they do is well cool. New information.
It gives me confidence to move forward looking for the right kind of
feed forward net to try and get local synchronizations corresponding
to substitution groupings/embeddings. Those substitution "embeddings"
would be "walks" between such shared end points, and I want them to
synchronize.
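
As a toy illustration of what I mean by shared "walk" end points (my
own sketch, not Lizier's analysis, which is about synchronizability,
not this simple counting):

from collections import defaultdict

# A tiny word-sequence network built from bigrams.
corpus = "the cat sat on the mat the dog sat on the rug".split()
succ = defaultdict(set)
for a, b in zip(corpus, corpus[1:]):
    succ[a].add(b)

def endpoints(node, length):
    # Tokens reachable from `node` by walks of exactly `length` steps.
    frontier = {node}
    for _ in range(length):
        nxt = set()
        for n in frontier:
            nxt |= succ[n]
        frontier = nxt
    return frontier

# "cat" and "dog" start different walks that converge on the same end
# points: the kind of shared end point I want to constrain synchronization.
print(endpoints("cat", 2) & endpoints("dog", 2))   # {'on'}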

Paper here:

Analytic relationship of relative synchronizability to network
structure and motifs
Joseph T. Lizier, Frank Bauer, Fatihcan M. Atay, and Jürgen Jost
https://openreview.net/pdf?id=hP4dxXvvNc8

More populist presentation shared on my FB group here:

https://www.facebook.com/share/absxV8ij9rio2j9a/
He has a github:

https://github.com/jlizier/linsync

And that guy Lizier appears to be part of a hitherto unsuspected
sub-field of neuro-computational research attempting to reconcile
synchronized oscillations to some kind of processing breakdown. None
of it from my point of view though, I think. I need to explore where
there might be points of connection there.

So, lots of opportunity to waste time, sure. But I'm not sure that
just sticking to some idea of learned hierarchy, which is all I
remember of your work, without exposing it to criticism, is
necessarily going to get you any further.



Re: [agi] Can symbolic approach entirely replace NN approach?

2024-06-22 Thread Rob Freeman
Twenkid,

Wow. Lots of words. I don't mind detail, but words are slippery.

If you actually want to do stuff, it's better to keep the words to a
minimum and start with concrete examples. At least until some minimum
of consensus is agreed.

Trying to focus on your concrete questions...

On Sat, Jun 22, 2024 at 10:16 AM twenkid  wrote:
>
> Regarding the contradictions - I guess you mean ambiguity

Yeah, I guess so. I was thinking more abstractly in terms of
grammatical classes. You can never learn grammar, because any rule you
make always ends up being violated: AB->C, **except** in certain
cases... etc. But probably ambiguity at the meaning level resolves to
the same kinds of issues, sure.

> * BTW, what is "not to contradict"? How it would look like in a particular 
> case, example?

Oh, I suppose any formal language is an example of a system that
doesn't contradict. Programming languages... Maths, once axioms are
fixed, would be another example of non-contradiction (by definition?
Of course the thing with maths is that different sets of possible
axioms contradict, and that contradiction of possible axiomatizations
is the whole deal.)

> What do you mean by "the language problem"?

"Grammar", text compression, text prediction...

> The language models lead to such an advance: compared to what else, other 
> (non-language?) models.

Advance, compared to everything before transformers.

In terms of company market cap, if you want to quibble.

> Rob: >Do you have any comments on that idea, that patterns of meaning which 
> can be learned contradict, and so have to be generated in real time?
>
> I am not sure about the proper interpretation of your use of "to contradict"; 
> words/texts have multiple meanings, and language and text are lower resolution 
> than thought; if they are supposed to represent reality "exactly", lower 
> level, higher precision representations are needed as well

In terms of my analogy to maths, this reads to me like saying: the
fact there are multiple axiomatizations for maths, means maths axioms
are somehow "lower resolution", and the solution for maths is to have
"higher precision representations" for maths... :-b

If you can appreciate how nonsensical that analysis would be within
the context of maths, then you may get a read on what it sounds like
to me from the way I'm looking at language. Instead, I think different
grammaticalizations of language are like different axiomatizations of
maths (inherently random and infinite?)

You're not the only one doing that, of course. Just the other day
LeCun was tweeting something comparable in response to a study which
revealed transformer world models seem to... contradict! Contradict?!
Who'd 've thought it?! The more resolution you get, the less coverage
you get. Wow. Surprise. Gee, that must mean that we need to find a
"higher precision representation" somewhere else!

LeCun's post here:

https://x.com/ylecun/status/1803677519314407752



Re: [agi] Can symbolic approach entirely replace NN approach?

2024-06-17 Thread Rob Freeman
On Mon, Jun 17, 2024 at 3:22 PM Quan Tesla  wrote:
>
> Rob, basically you're reiterating what I've been saying here all along. To 
> increase contextualization and instill robustness in the LLM systemic 
> hierarchies. Further, that it seems to be critically lacking within current 
> approaches.
>
> However, I think this is fast changing, and soon enough, I expect 
> breakthroughs in this regard. Neural linking could be one of those solutions.
>
> While it may not be exactly the same as your hypothesis (?), is it because 
> it's part of your PhD that you're not willing to acknowledge that this 
> theoretical work may have been completed by another researcher more than 17 
> years ago, even submitted for review and subsequently approved? The market, 
> especially Japan, grabbed this research as fast as they could. It's the West 
> that turned out to be all "snooty" about its meaningfulness, yet, it was the 
> West that reviewed and approved of it. Instead of serious collaboration, is 
> research not perhaps being hamstrung by the NIH (Not Invented Here) syndrome, 
> acting like a stuck handbrake?

You intrigue me. "Contextualization ... in LLM systemic hierarchies"
was completed and approved 17 years ago?

"Contextualization" is a pretty broad word. I think the fact that
Bengio retreated to distributed representation with "Neural Language
Models" around... 2003(?) might be seen as one acceptance of... if not
contextualization, at least indeterminacy (I see Bengio refers to "the
curse of dimensionality".) But I see nothing about structure until
Coecke and co. around 2007. And even they (and antecedents going back
to the early '90s with Smolensky?), I'm increasingly coming to
appreciate, seem trapped in their tensor formalisms.

The Bengio thread, if it went anywhere, stayed stuck on structure
until deep learning rescued it with LSTM. And then "attention".

Anyway, the influence of Coecke seems to be tiny. And basically
mis-construed. I think Linas Vepstas followed it, but only saw
encouragement to seek other mathematical abstractions of grammar. And
OpenCog wasted a decade trying to learn those grammars.

Otherwise, I've been pretty clear that I think there are hints to what
I'm arguing in linguistics and maths going back decades, and in
philosophy going back centuries. The linguistics ones specifically
ignored by machine learning.

But that any of this, or anything like it was "grabbed ... as fast as
they could" by the market in Japan, is a puzzle to me (17 years ago?
Specifically 17?)

As is the idea that the West failed to use it, even having "reviewed
and approved it", because it was "snooty" about... Japan's market
having grabbed it first?

Sadly Japanese research in AI, to my knowledge, has been dead since
their big push in the 1980s. Dead, right through their "lost" economic
decades. I met the same team I knew working on symbolic machine
translation grammars 1989-91, at a conference in China in 2002, and as
far as I know they were still working on refinements to the same
symbolic grammar. 10 more years. Same team. Same tech. Just one of the
係長 (assistant section chiefs) had become a 課長 (section chief).

What is this event from 17 years ago?



Re: [agi] Can symbolic approach entirely replace NN approach?

2024-06-14 Thread Rob Freeman
On Sat, Jun 15, 2024 at 1:29 AM twenkid  wrote:
>
> ...
> 2. Yes, the tokenization in current LLMs is usually "wrong", ... it  should 
> be on concepts and world models: ... it should predict the *physical* future 
> of the virtual worlds

Thanks for comments. I can see you've done a lot of thinking, and see
similarities in many places, not least Jeff Hawkins, HTM, and
Friston's Active Inference.

But I read what you are suggesting as a solution to the current
"token" problem for LLMs, like that of a lot of people currently,
LeCun prominently, to be that we need to ground representation more
deeply in the real world.

I find this immediate retreat to other sources of data kind of funny,
actually. It's like... studying the language problem has worked really
well, so the solution to move forward is to stop studying the language
problem!

We completely ignore why studying the language problem has caused such
an advance. And blindly, immediately throw away our success and look
elsewhere.

I say look more closely at the language problem. Understand why it has
caused such an advance before you look elsewhere.

I think the reason language models have led us to such an advance is
that the patterns language prompts us to learn are inherently better.
"Embeddings", gap fillers, substitution groupings, are just closer to
the way the brain works. And language has led us to them.

So OK, if "embeddings" have been the advance, replacing both fixed
labeled objects in supervised learning, and fixed objects based on
internal similarities in "unsupervised" learning, instead leading us
to open ended categories based on external relations, why do we still
have problems? Why can't we structure better than "tokens"? Why does
it seem like they've led us the other way, to no structure at all?

My thesis is actually pretty simple. It is that these open ended
categories of "embeddings" are good, but they contradict. These "open"
categories can have a whole new level of "open". They can change all
the time. That's why it seems like they've led us to no structure at
all. Actually we can have structure. It is just we have to generate it
in real time, not try to learn it all at once.

That's really all I'm saying, and my solution to the "token" problem.
It means you can start with "letter" tokens, and build "word" tokens,
and also "phrases", whole hierarchies. But you have to do it in real
time, because the "tokens", "words", "something", "anything", "any
thing", two "words", one "word"... whatever, can contradict and have
to be found always only in their relevant context.
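
A minimal sketch of the kind of thing I mean, on a toy corpus (my own
illustration, not a claim about any existing system): the substitution
grouping is computed for the context you are in, at the moment you need
it, and the grouping found in one context need not agree with the
grouping found in another.

from collections import defaultdict

corpus = [
    "the cat sat on the mat",
    "the dog sat on the mat",
    "the cat ran on the road",
    "a dog ran on the road",
]

# Index every (left, right) context each token has been seen in.
contexts = defaultdict(set)
for sentence in corpus:
    toks = sentence.split()
    for i, tok in enumerate(toks):
        left = toks[i - 1] if i > 0 else "<s>"
        right = toks[i + 1] if i + 1 < len(toks) else "</s>"
        contexts[tok].add((left, right))

def substitution_group(left, right):
    # Tokens seen in exactly this gap: a grouping that exists only
    # relative to the queried context.
    return {t for t, ctxs in contexts.items() if (left, right) in ctxs}

print(substitution_group("the", "sat"))  # {'cat', 'dog'}
print(substitution_group("the", "ran"))  # {'cat'}  (not the same grouping)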

Do you have any comments on that idea, that patterns of meaning which
can be learned contradict, and so have to be generated in real time?

I still basically see nobody addressing it in the machine learning community.

It's a little like Matt's "modeling both words and letters" comment.
But it gets beneath both. It doesn't only use letters and words, it
creates both "letters" and "words" as "fuzzy", or contradictory,
constructs in themselves. And then goes on to create higher level
structures, hierarchies, phrases, sentences, as higher "tokens",
facilitating logic, symbolism, and all those other artifacts of higher
structure which are currently eluding LLMs. All levels of structure
become accessible if we just accept they may contradict, and so have
to be generated in context, at run time.

It's also not unrelated to James' definition of "a thing as anything
that can be distinguished from something else." Though that is more at
the level of equating definition with relationship, or "embedding",
and doesn't get into the missing, contradictory, or "fuzzy" aspect.
Though it allows that fuzzy aspect to exist, and leads to it once you
imagine it might, because it decouples the definition of a thing from
any single internal structure of the thing itself.



Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-29 Thread Rob Freeman
On Wed, May 29, 2024 at 9:37 AM Matt Mahoney  wrote:
>
> On Tue, May 28, 2024 at 7:46 AM Rob Freeman  
> wrote:
>
> > Now, let's try to get some more detail. How do compressors handle the
> > case where you get {A,C} on the basis of AB, CB, but you don't get,
> > say AX, CX? Which is to say, the rules contradict.
>
> Compressors handle contradictory predictions by averaging them

That's what I thought.
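
Which, concretely, presumably looks something like this (a toy mixture
with made-up numbers, not any particular compressor):

# Two contexts make contradictory predictions about the next symbol.
pred_ctx1 = {"B": 0.9, "X": 0.1}
pred_ctx2 = {"B": 0.1, "X": 0.9}
n1, n2 = 30, 10   # how often each context has been seen

# Average them, weighted by the evidence for each context.
w1, w2 = n1 / (n1 + n2), n2 / (n1 + n2)
mixed = {s: w1 * pred_ctx1[s] + w2 * pred_ctx2[s] for s in pred_ctx1}
print(mixed)   # roughly {'B': 0.7, 'X': 0.3}: neither rule survives intact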

> > "Halle (1959, 1962) and especially Chomsky (1964) subjected
> > Bloomfieldian phonemics to a devastating critique."
> >
> > Generative Phonology
> > Michael Kenstowicz
> > http://lingphil.mit.edu/papers/kenstowicz/generative_phonology.pdf
> >
> > But really it's totally ignored. Machine learning does not address
> > this to my knowledge. I'd welcome references to anyone talking about
> > its relevance for machine learning.
>
> Phonology is mostly irrelevant to text prediction.

The point was it invalidated the method of learning linguistic
structure by distributional analysis at any level. If your rules for
phonemes contradict, what doesn't contradict?

Which is a pity. Because we still don't have a clue what governs
language structure. The best we've been able to come up with is crude
hacks like dragging a chunk of important context behind like a ball
and chain in LSTM, or multiplexing pre-guessed "tokens" together in a
big matrix, with "self-attention".

Anyway, your disinterest doesn't invalidate my claim that this result,
pointing to contradiction produced by distributional analysis learning
procedures for natural language, is totally ignored by current machine
learning, which implicitly or otherwise uses those distributional
analysis learning procedures.

> Language evolved to be learnable on neural networks faster than our
> brains evolved to learn language. So understanding our algorithm is
> important.
>
> Hutter prize entrants have to prebuild a lot of the model because
> computation is severely constrained (50 hours in a single thread with
> 10 GB memory). That includes a prebuilt dictionary. The human brain
> takes 20 years to learn language on a 10 petaflop, 1 petabyte neural
> network. So we are asking quite a bit.

Neural networks may have finally gained close to human performance at
prediction. A problem where you can cover a multitude of sins with raw
memory. Something at which computers trivially exceed humans by as
many orders of magnitude as you can stack server farms. You can just
remember each contradiction including the context which selects it. No
superior algorithm required, and certainly none in evidence. (Chinese
makes similar trade-offs, swapping internal mnemonic sound structure
within tokens, with prodigious memory requirements for the tokens
themselves.) Comparing 10 GB with 1 petabyte seems ingenuous. I
strongly doubt any human can recall as much as 10GB of text. (All of
Wikipedia currently ~22GB compressed, without media? Even to read it
all is estimated at 47 years, including 8hrs sleep a night
https://www.reddit.com/r/theydidthemath/comments/80fi3w/self_how_long_would_it_take_to_read_all_of/.
So forget 20 years to learn it, it would take 20 years to read all the
memory you give Prize entrants.) But I would argue our prediction
algorithms totally fail to do any sort of job with language structure.
Whereas you say babies start to structure language before they can
walk? (Walking being something else computers still have problems
with.) And far from stopping at word segmentation, babies go on to
build quite complex structures, including new ones all the time.

Current models do nothing with structure, not at human "data years"
8-10 months, not 77 years (680k hours of audio to train "Whisper" ~77
years? 
https://www.thealgorithmicbridge.com/p/8-features-make-openais-whisper-the.
Perhaps some phoneme structure might help there...) The only structure
is "tokens". I don't even think current algorithms do max entropy to
find words. They just start out with "tokens". Guessed at
pre-training. Here's Karpathy and LeCun talking about it:

Yann LeCun
@ylecun·Feb 21
Text tokenization is almost as much of an abomination for text as it
is for images. Not mentioning video.
...
Replying to @karpathy
We will see that a lot of weird behaviors and problems of LLMs
actually trace back to tokenization. We'll go through a number of
these issues, discuss why tokenization is at fault, and why someone
out there ideally finds a way to delete this stage entirely.

https://x.com/ylecun/status/1760315812345176343

By the way, talking about words. That's another thing which seems to
have contradictory structure in humans, e.g. native Chinese speakers
agree what constitutes a "word" less than 70% of the time:

"Sproat et. al. (1996) give empirical results showing that native
speaker

Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-28 Thread Rob Freeman
Matt,

Nice break down. You've actually worked with language models, which
makes it easier to bring it back to concrete examples.

On Tue, May 28, 2024 at 2:36 AM Matt Mahoney  wrote:
>
> ...For grammar, AB predicts AB (n-grams),

Yes, this looks like what we call "words". Repeated structure. No
novelty. And nothing internal we can equate to "meaning" either. Only
meaning by association.

> and AB, CB, CD, predicts AD (learning the rule
> {A,C}{B,D}).

This is the interesting one. It actually kind of creates new meaning.
You can think of "meaning" as a way of grouping things which makes
good predictions. And, indeed, those gap filler sets {A,C} do pull
together sets of words that we intuitively associate with similar
meaning. These are also the sets that the HNet paper identifies as
having "meaning" independent of any fixed pattern. A pattern can be
new, and so long as it makes similar predictions {B,D}, for any set
{B,D...}, {X,Y...}..., we can think of it as having "meaning", based
on the fact that arranging the world that way, makes those shared
predictions. (Even moving beyond language, you can say the atoms of a
ball, share the meaning of a "ball", based on the fact they fly
through the air together, and bounce off walls together. It's a way of
defining what it "means" to be a "ball".)

Now, let's try to get some more detail. How do compressors handle the
case where you get {A,C} on the basis of AB, CB, but you don't get,
say AX, CX? Which is to say, the rules contradict. Sometimes A and C
are the same, but not other times. You want to trigger the "rule" so
you can capture the symmetries. But you can't make a fixed "rule",
saying {A,C}, because the symmetries only apply to particular sub-sets
of contexts.
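
As a toy sketch of that tension (my own illustration, not how any
actual compressor is written):

from collections import defaultdict

# Observed bigrams. Note AX is seen but CX is not.
pairs = ["AB", "CB", "CD", "AX"]
follows = defaultdict(set)
for a, b in pairs:
    follows[a].add(b)

# A and C share the context B, so a learner abduces the class {A, C}...
shared = follows["A"] & follows["C"]             # {'B'}

# ...and generalizes: whatever follows one should follow the other.
predicted_after_C = follows["C"] | follows["A"]  # includes 'X'

# But CX was never observed: the class {A, C} holds for the B/D
# contexts and breaks for X. A single fixed rule over-generates;
# which grouping is right depends on the context you ask in.
print(shared, predicted_after_C)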

You get a lot of this in natural language. There are many such shared
context symmetries in language, but they contradict. Or they're
"entangled". You get one by ordering contexts one way, and another by
ordering contexts another way, but you can't get both at once, because
you can't order contexts both ways at once.

I later learned these contradictions were observed even at the level
of phonemes, and this was crucial to Chomsky's argument that grammar
could not be "learned", back in the '50s. That this essentially broke
consensus in the field of linguistics. Which remains in squabbling
sub-fields over this result, to this day. That's why theoretical
linguistics contributes essentially nothing to contemporary machine
learning. Has anyone ever wondered? Why don't linguists tell us how to
build language models? Even the Chomsky hierarchy cited by James'
DeepMind paper from the "learning" point of view is essentially a
misapprehension of what Chomsky concluded (that observable grammar
contradicts, so formal grammar can't be learned.)

A reference available on the Web I've been able to find is this one:

"Halle (1959, 1962) and especially Chomsky (1964) subjected
Bloomfieldian phonemics to a devastating critique."

Generative Phonology
Michael Kenstowicz
http://lingphil.mit.edu/papers/kenstowicz/generative_phonology.pdf

But really it's totally ignored. Machine learning does not address
this to my knowledge. I'd welcome references to anyone talking about
its relevance for machine learning.

I'm sure all the compression algorithms submitted to the Hutter Prize
ignore this. Maybe I'm wrong. Have any addressed it? They probably
just regress to some optimal compromise, and don't think about it too
much.

If we choose not to ignore this, what do we do? Well, we might try to
"learn" all these contradictions, indexed on context. I think this is
what LLMs do. By accident. That was the big jump, right, "attention",
to index context. Then they just enumerate vast numbers of (an
essentially infinite number of?) predictive patterns in one enormous
training time. That's why they get so large.

No-one knows, or wonders, why neural nets work for this, and symbols
don't, viz. the topic post of this thread. But this will be the
reason.

In practice LLMs learn predictive patterns, and index them on context
using "attention", and it turns out there are a lot of those different
predictive "embeddings", indexed on context. There is no theory.
Everything is a surprise. But if you go back in the literature, there
are these results about contradictions to suggest why it might be so.
And the conclusion is still either Chomsky's one, that language can't
be learned, consistent rules exist, but must be innate. Or, what
Chomsky didn't consider, that complexity of novel patterns defying
abstraction, might be part of the solution. It was before the
discovery of chaos when Chomsky was looking at this, so perhaps it's
not fair to blame him for not considering it.

But then it becomes a complexity issue. Just how many unique orderings
of contexts with useful predictive symmetries are there? Are you ever
at an end of finding different orderings of contexts, which specify
some useful new predictive symmetry or other? The example of

Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-27 Thread Rob Freeman
James,

I think you're saying:

1) Grammatical abstractions may not be real, but they can still be
useful abstractions to parameterize "learning".

2) Even if after that there are "rules of thumb" which actually govern
everything.

Well, you might say why not just learn the "rules of thumb".

But the best counter against the usefulness of the Chomsky hierarchy
for parameterizing machine learning, might be that Chomsky himself
dismissed the idea it might be learned. And his most damaging
argument? That learned categories contradict. "Objects" behave
differently in one context, from how they behave in another context.

I see it a bit like our friend the Road Runner. You can figure out a
physics for him. But sometimes that just goes haywire and contradicts
itself - bodies make holes in rocks, fly high in the sky, or stretch
wide.

All the juice is in these weird "rules of thumb".

Chomsky too failed to find consistent objects. He was supposed to push
past the highly successful learning of phoneme "objects", and find
"objects" for syntax. And he failed. And the most important reason
I've found, was that even for phonemes, learned category contradicted.

That hierarchy stuff, that wasn't supposed to appear in the data. That
could only be in our heads. Innate. Why? Well for one thing, because
the data contradicted. The "learning procedures" of the time generated
contradictory objects. This is a forgotten result. Machine learning is
still ignoring this old result from the '50s. (Fair to say the
DeepMind paper ignores it?) Chomsky insisted these contradictions
meant the "objects" must be innate. The idea cognitive objects might
be new all the time (and particularly the idea they might contradict!)
is completely orthogonal to his hierarchy (well, it might be
compatible with context sensitivity, if you accept that the real juice
is in the mechanism to implement the context sensitivity?)

If categories contradict, that is represented on the Chomsky hierarchy
how? I don't know. How would you represent contradictory categories on
the Chomsky hierarchy? A form of context sensitivity?

Actually, I think, probably, using entangled objects like quantum. Or
relation and variance based objects as in category theory.

I believe Coecke's team has been working on "learning" exactly this:

From Conceptual Spaces to Quantum Concepts: Formalising and Learning
Structured Conceptual Models
Sean Tull, Razin A. Shaikh, Sara Sabrina Zemljič and Stephen Clark
Quantinuum
https://browse.arxiv.org/pdf/2401.08585

I'm not sure. I think the symbolica.ai people may be working on
something similar: find some level of abstraction which applies even
across varying objects (contradictions?)

For myself, in contrast to Bob Coecke, and the category theory folks,
I think it's pointless, and maybe unduly limiting, to learn this
indeterminate object formalism from data, and then collapse it into
one or other contradictory observable form, each time you observe it.
(Or seek some way you can reason with it even in indeterminate object
formulation, as with the category theory folks?) I think you might as
well collapse observable objects directly from the data.

I believe this collapse "rule of thumb", is the whole game, one shot,
no real "learning" involved.

All the Chomsky hierarchy limitations identified in the DeepMind paper
would disappear too. They are all limitations of not identifying
objects. Context coding hacks like LSTM, or "attention", introduced in
lieu of actual objects, and grammars over those objects, stemming from
the fact grammars of contradictory objects are not "learnable."

On Sun, May 26, 2024 at 11:24 PM James Bowery  wrote:
>
> It's also worth reiterating a point I made before about the confusion between 
> abstract grammar as a prior (heuristic) for grammar induction and the 
> incorporation of so-induced grammars as priors, such as in "physics informed 
> machine learning".
>
> In the case of physics informed machine learning, the language of physics is 
> incorporated into the learning algorithm.  This helps the machine learning 
> algorithm learn things about the physical world without having to re-derive 
> the body of physics knowledge.
>
> Don't confuse the two levels here:
>
> 1) My suspicion that natural language learning may benefit from prioritizing 
> HOPDA as an abstract grammar to learn something about natural languages -- 
> such as their grammars.
>
> 2) My suspicion (supported by "X informed machine learning" exemplified by 
> the aforelinked work) that there may be prior knowledge about natural 
> language more specific than the level of abstract grammar -- such as specific 
> rules of thumb for, say, the English language that may greatly speed training 
> time on English corpora.
>
> On Sun, May 26, 2024 at 9:40 AM James Bowery  wrote:
>>
>> See the recent DeepMind paper "Neural Networks and the Chomsky Hierarchy" 
>> for the sense of "grammar" I'm using when talking about the HNet paper's 
>> connection to 

Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-25 Thread Rob Freeman
Thanks Matt.

The funny thing is though, as I recall, finding semantic primitives
was the stated goal of Marcus Hutter when he instigated his prize.

That's fine. A negative experimental result is still a result.

I really want to emphasize that this is a solution, not a problem, though.

As the HNet paper argued, using relational categories, like language
embeddings, decouples category from pattern. It means we can have
categories, grammar "objects" even, it is just that they may
constantly be new. And being constantly new, they can't be finitely
"learned".

LLMs may have been failing to reveal structure, because there is too
much of it, an infinity, and it's all tangled up together.

We might pick it apart, and have language models which expose rational
structure, the Holy Grail of a neuro-symbolic reconciliation, if we
just embrace the constant novelty, and seek it as some kind of
instantaneous energy collapse in the relational structure of the data.
Either using a formal "Hamiltonian", or, as I suggest, finding
prediction symmetries in a network of language sequences, by
synchronizing oscillations or spikes.

On Sat, May 25, 2024 at 11:33 PM Matt Mahoney  wrote:
>
> I agree. The top ranked text compressors don't model grammar at all.



Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-24 Thread Rob Freeman
sing on relations defining objects
in ways which allow their internal "pattern" to vary.

That's what I see being presented in the HNet paper. Maybe I'm getting
ahead of its authors. Because that is the solution I'm presenting
myself. But I interpret the HNet paper to present that option also.
Cognitive objects, including "grammar", can emerge with a freedom
which resembles the LLM freedom of totally ignoring "objects" (which
seems to be necessary, both by the success of LLMs at generating text,
and by the observed failure of formal grammars historically) if you
specify them in terms of external relations.

Maybe the paper authors don't see it. But the way they talk about
generating grammars based on external relations, opens the door to it.

On Fri, May 24, 2024 at 10:12 PM James Bowery  wrote:
>
>
>
> On Thu, May 23, 2024 at 9:19 PM Rob Freeman  
> wrote:
>>
>> ...(Regarding the HNet paper)
>> The ideas of relational category in that paper might really shift the
>> needle for current language models.
>>
>> That as distinct from the older "grammar of mammalian brain capacity"
>> paper, which I frankly think is likely a dead end.
>
>
> Quoting the HNet paper:
>>
>> We conjecture that ongoing hierarchical construction of such entities
>> can enable increasingly “symbol-like” representations, arising from
>> lower-level “statistic-like” representations. Figure 9 illustrates
>> construction of simple “face” configuration representations, from
>> exemplars constructed within the CLEVR system consisting of very simple
>> eyes, nose, mouth features. Categories (¢) and sequential relations ($)
>> exhibit full compositionality into sequential relations of categories
>> of sequential relations, etc.; these define formal grammars (Rodriguez
>> & Granger 2016; Granger 2020). Exemplars (a,b) and near misses (c,d)
>> are presented, initially yielding just instances, which are then
>> greatly reduced via abductive steps (see Supplemental Figure 13).
>



Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-23 Thread Rob Freeman
James,

Not sure whether all that means you think category theory might be
useful for AI or not.

Anyway, I was moved to post those examples by Rich Hickey and Bartosz
Milewski in my first post to this thread, by your comment that ideas
of indeterminate categories might annoy what you called 'the risible
tradition of so-called "type theories" in both mathematics and
programming languages'. I see the Hickey and Milewski refs as examples
of ideas of indeterminate category entering computer programming
theory too.

Whether posted on the basis of a spurious connection or not, thanks
for the Granger HNet paper. That's maybe the most interesting paper
I've seen this year. As I say, it's the only reference I've seen other
than my own presenting the idea that relational categories liberate
category from any given pattern instantiating it. Which I see as
distinct from regression.

The ideas of relational category in that paper might really shift the
needle for current language models.

That as distinct from the older "grammar of mammalian brain capacity"
paper, which I frankly think is likely a dead end.

Real time "energy relaxation" finding new relational categories, as in
the Hamiltonian Net paper, is what I am pushing for. I see current
LLMs as incorporating a lot of that power by accident. But because
they still concentrate on the patterns, and not the relational
generating procedure, they do it only by becoming "large". We need to
understand the (relational) theory behind it in order to jump out of
the current LLM "local minimum".

On Thu, May 23, 2024 at 11:47 PM James Bowery  wrote:
>
>
> On Wed, May 22, 2024 at 10:34 PM Rob Freeman  
> wrote:
>>
>> On Wed, May 22, 2024 at 10:02 PM James Bowery  wrote:
>> > ...
>> > You correctly perceive that the symbolic regression presentation is not to 
>> > the point regarding the HNet paper.  A big failing of the symbolic 
>> > regression world is the same as it is in the rest of computerdom:  Failure 
>> > to recognize that functions are degenerate relations and you had damn well 
>> > better have thought about why you are degenerating when you do so.  But 
>> > likewise, when you are speaking about second-order theories (as opposed to 
>> > first-order theories), such as Category Theory, you had damn well have 
>> > thought about why you are specializing second-order predicate calculus 
>> > when you do so.
>> >
>> > Not being familiar with Category Theory I'm in no position to critique 
>> > this decision to specialize second-order predicate calculus.  I just 
>> > haven't seen Category Theory presented as a second-order theory.  Perhaps 
>> > I could understand Category Theory thence where the enthusiasm for 
>> > Category Theory comes from if someone did so.
>> >
>> > This is very much like my problem with the enthusiasm for type theories in 
>> > general.
>>
>> You seem to have an objection to second order predicate calculus.
>
>
> On the contrary; I see second order predicate calculus as foundational to any 
> attempt to deal with process which, in the classical case, is computation.
>
>> Dismissing category theory because you equate it to that. On what
>> basis do you equate them? Why do you reject second order predicate
>> calculus?
>
>
> I don't "dismiss" category theory.  It's just that I've never seen a category 
> theorist describe it as a second order theory.   Even in type theories 
> covering computation one finds such phenomena as the Wikipedia article on 
> "Type theory as a logic" lacking any reference to "second order".
>
> If I appear to "equate" category theory and second order predicate calculus 
> it is because category theory is a second order theory.  But beyond that, I 
> have an agenda related to Tom Etter's attempt to flesh out his theory of 
> "mind and matter" which I touched on in my first response to this thread 
> about fixing quantum logic.  An aspect of this project is the proof that 
> identity theory belongs to logic in the form of relative identity theory.  My 
> conjecture is that it ends up belonging to second order logic (predicate 
> calculus), which is why I resorted to Isabelle (HOL proof assistant).
>
>> What I like about category theory (as well as quantum formulations) is
>> that I see it as a movement away from definitions in terms of what
>> things are, and towards definitions in terms of how things are
>> related. Which fits with my observations of variation in objects
>> (grammar initially) defying definition, but being accessible to
>> definition in terms of relations.
>
>
> On 

Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-22 Thread Rob Freeman
On Wed, May 22, 2024 at 10:02 PM James Bowery  wrote:
> ...
> You correctly perceive that the symbolic regression presentation is not to 
> the point regarding the HNet paper.  A big failing of the symbolic regression 
> world is the same as it is in the rest of computerdom:  Failure to recognize 
> that functions are degenerate relations and you had damn well better have 
> thought about why you are degenerating when you do so.  But likewise, when 
> you are speaking about second-order theories (as opposed to first-order 
> theories), such as Category Theory, you had damn well have thought about why 
> you are specializing second-order predicate calculus when you do so.
>
> Not being familiar with Category Theory I'm in no position to critique this 
> decision to specialize second-order predicate calculus.  I just haven't seen 
> Category Theory presented as a second-order theory.  Perhaps I could 
> understand Category Theory thence where the enthusiasm for Category Theory 
> comes from if someone did so.
>
> This is very much like my problem with the enthusiasm for type theories in 
> general.

You seem to have an objection to second order predicate calculus.
Dismissing category theory because you equate it to that. On what
basis do you equate them? Why do you reject second order predicate
calculus?

What I like about category theory (as well as quantum formulations) is
that I see it as a movement away from definitions in terms of what
things are, and towards definitions in terms of how things are
related. Which fits with my observations of variation in objects
(grammar initially) defying definition, but being accessible to
definition in terms of relations.

> But I should also state that my motivation for investigating Granger et al's 
> approach to ML is based not on the fact that it focuses on abduced relations -- 
> but on its basis in "The grammar of mammalian brain capacity" being a 
> neglected order of grammar in the Chomsky Hierarchy: High Order Push Down 
> Automata.  The fact that the HNet paper is about abduced relations was one of 
> those serendipities that the prospector in me sees as a of gold in them thar 
> HOPDAs.

Where does the Granger Hamiltonian net paper mention "The grammar of
mammalian brain capacity"? If it's not mentioned, how do you think
they imply it?

> To wrap up, your definition of "regression" seems to differ from mine in the 
> sense that, to me, "regression" is synonymous with data-driven modeling which 
> is that aspect of learning, including machine learning, concerned with what 
> IS as opposed to what OUGHT to be the case.

The only time that paper mentions regression seems to indicate that
they are also making a distinction between their relational encoding
and regression:

'LLMs ... introduce sequential information supplementing the standard
classification-based “isa” relation, although much of the information
is learned via regression, and remains difficult to inspect or
explain'

How do you relate their relational encoding to regression?



Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-22 Thread Rob Freeman
On Thu, May 23, 2024 at 10:10 AM Quan Tesla  wrote:
>
> The paper is specific to a novel and quantitative approach and method for 
> association in general and specifically.

John was talking about the presentation James linked, not the paper,
Quan. He may be right that in that presentation they use morphisms etc
to map learned knowledge from one domain to another.

He's not criticising the paper though. Only the presentation. And the
two were discussing different techniques. John isn't criticising the
Granger et al. "relational encoding" paper at all.

> The persistence that pattern should be somehow decoupled doesn't make much 
> sense to me. Information itself is as a result of pattern. Pattern is 
> everything. Light itself is a pattern, so are the four forces. Ergo.  I 
> suppose, it depends on how you view it.

If you're questioning my point, it is that definition in terms of
relations means the pattern can vary. It's like the gap filler example
in the paper:

"If John kissed Mary, Bill kissed Mary, and Hal kissed Mary, etc.,
then a novel category ¢X can be abduced such that ¢X kissed Mary.
Importantly, the new entity ¢X is not a category based on the features
of the members of the category, let alone the similarity of such
features. I.e., it is not a statistical cluster in any usual sense.
Rather, it is a “position-based category,” signifying entities that
stand in a fixed relation with other entities. John, Bill, Hal may not
resemble each other in any way, other than being entities that all
kissed Mary. Position based categories (PBCs) thus fundamentally
differ from “isa” categories, which can be similarity-based (in
unsupervised systems) or outcome-based (in supervised systems)."

If you define your category on the basis of kissing Mary, then who's
to say that you might not find other people who have kissed Mary, and
change your category from moment to moment? As you discover clusters
of former lovers by fits and starts, the actual pattern of your
"category" might change dramatically. But it would still be defined by
the same relation: having kissed Mary.

That might also speak to the "regression" distinction, or to
characterizing the system, or indeed all cognition, as "learning" at
all. It elides both "similarity-based" unsupervised "learning" and
supervised "learning". The category can in fact grow as you "learn"
of new lovers, a process which I also have difficulty equating with
regression.
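
To make that concrete, here is a toy sketch in Python of a
position-based category in the paper's sense, as I read it. The facts
and names are invented for illustration; the only point is that the
category is defined by the relational slot it fills, not by any feature
similarity, and its membership can shift as new relations are observed:

# Toy sketch of a "position-based category": the category is defined by
# the relation it stands in, not by the features of its members.
# All facts below are invented for illustration.

facts = [
    ("John", "kissed", "Mary"),
    ("Bill", "kissed", "Mary"),
    ("Hal",  "kissed", "Mary"),
]

def abduce_category(facts, relation, obj):
    """The abduced category ¢X: everything standing in (relation, obj)."""
    return {subj for (subj, rel, o) in facts if rel == relation and o == obj}

print(sorted(abduce_category(facts, "kissed", "Mary")))  # ['Bill', 'Hal', 'John']

# Membership shifts the moment a new relation instance is observed,
# while the defining relation (kissed Mary) stays the same.
facts.append(("Ahab", "kissed", "Mary"))
print(sorted(abduce_category(facts, "kissed", "Mary")))  # ['Ahab', 'Bill', 'Hal', 'John']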

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M8c58bf8eb0a279da79ea34eb
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-21 Thread Rob Freeman
James,

The Hamiltonian paper was nice for identifying gap filler tasks as
decoupling meaning from pattern: "not a category based on the features
of the members of the category, let alone the similarity of such
features".

Here, for anyone else:

A logical re-conception of neural networks: Hamiltonian bitwise
part-whole architecture
E.F.W.Bowen,1 R.Granger,2* A.Rodriguez
https://openreview.net/pdf?id=hP4dxXvvNc8

"Part-whole architecture". A new thing. Though they 'share some
characteristics with “embeddings” in transformer architectures'.

So it's a possible alternate reason for the surprise success of
transformers. That's good. The field blunders about, surprising itself.
But there's no theory behind it. Transformers just stumbled into
embedding representations because they looked at language. We need to
start thinking about why these things work. Instead of just blithely
talking about the miracle of more data. Disingenuously scaring the
world with idiotic fears about "more data" becoming conscious by
accident. Or insisting like LeCun that the secret is different data.

But I think you're missing the point of that Hamiltonian paper if you
think this decoupling of meaning from pattern is regression. I think
the point of this, and also the category theoretic representations of
Symbolica, and also quantum mechanical formalizations, is
indeterminate symbolization, even novelty.

Yeah, maybe regression will work for some things. But that ain't
language. And it ain't cognition. They are more aligned with a
different "New Kind of Science", that touted by Wolfram, new
structure, all the time. Not regression, going backward, but novelty,
creativity.

In my understanding the point with the Hamiltonian paper is that a
"position-based encoding" decouples meaning from any given pattern
which instantiates it.

Whereas the NN presentation is talking about NNs regressing to fixed
encodings. Not about an operator which "calculates energies" in real
time.

Unless I've missed something in that presentation. Is there anywhere
in the hour long presentation where they address a decoupling of
category from pattern, and the implications of this for novelty of
structure?

On Tue, May 21, 2024 at 11:36 PM James Bowery  wrote:
>
> Symbolic Regression is starting to catch on but, as usual, people aren't 
> using the Algorithmic Information Criterion so they end up with unprincipled 
> choices on the Pareto frontier between residuals and model complexity if not 
> unprincipled choices about how to weight the complexity of various "nodes" in 
> the model's "expression".
>
> https://youtu.be/fk2r8y5TfNY
>
> A node's complexity is how much machine language code it takes to implement 
> it on a CPU-only implementation.  Error residuals are program literals aka 
> "constants".
>
> I don't know how many times I'm going to have to point this out to people 
> before it gets through to them (probably well beyond the time maggots have 
> forgotten what I tasted like) .

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M8418e9bd5e49f7ca08dfb816
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-20 Thread Rob Freeman
"Importantly, the new entity ¢X is not a category based on the
features of the members of the category, let alone the similarity of
such features"

Oh, nice. I hadn't seen anyone else making that point. This paper 2023?

That's what I was saying. Nice. A vindication. Such categories
decouple the pattern itself from the category.

But I'm astonished they don't cite Coecke, as the obvious quantum
formulation precedent (though I noticed it for language in the '90s.)

I wonder how their formulation relates to what Symbolica are doing
with their category theoretic formulations:

https://youtu.be/rie-9AEhYdY?si=9RUB3O_8WeFSU3ni

I haven't read closely enough to know if they make that decoupling of
category from pattern a sense for "creativity" the way I'm suggesting.
Perhaps that's because a Hamiltonian formulation is still too trapped
in symbolism. We need to remain trapped in the symbolism for physics.
Because for physics we don't have access to an underlying reality.
That's where AI, and particularly language, has an advantage. Because,
especially for language, the underlying reality of text is the only
reality we do have access to (though Chomsky tried to swap that
around, and insist we only access our cognitive insight.)

For AI, and especially for language, we have the opportunity to get
under even a quantum formalism. It will be there implicitly, but
instead of laboriously formulating it, and then collapsing it at run
time, we can simply "collapse" structure directly from observation.
But that "collapse" must be flexible, and allow different structures
to arise from different symmetries found in the data from moment to
moment. So it requires the abandonment of back-prop.

In theory it is easy though. Everything can remain much as it is for
LLMs. Only, instead of trying to "learn" stable patterns using
back-prop, we must "collapse" different symmetries in the data in
response to a different "prompt", at run time.
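
A crude toy sketch, in Python, of the sort of thing I mean by
"collapsing" a symmetry at run time rather than learning it in advance
(the corpus and the grouping criterion are mine, invented purely for
illustration): nothing is trained; a substitution grouping is built
only when, and for, the word the "prompt" makes relevant.

# Toy sketch: a substitution grouping "collapsed" at run time from raw
# sequences, with no training step. The corpus is invented.
from collections import defaultdict

corpus = [
    "the cat sat on the mat",
    "the dog sat on the mat",
    "the cat chased a bird",
]

def contexts(corpus):
    """Map each word to the set of (previous, next) contexts it occurs in."""
    ctx = defaultdict(set)
    for line in corpus:
        words = ["<s>"] + line.split() + ["</s>"]
        for i in range(1, len(words) - 1):
            ctx[words[i]].add((words[i - 1], words[i + 1]))
    return ctx

def collapse(word, corpus):
    """Group, right now, the words sharing at least one context with `word`."""
    ctx = contexts(corpus)
    return {w for w, c in ctx.items() if w != word and c & ctx[word]}

print(collapse("cat", corpus))  # {'dog'}: grouped only because this data says so

The grouping is not stored anywhere: ask with a different corpus, or a
different word, and a different grouping "collapses".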

On Tue, May 21, 2024 at 5:01 AM James Bowery  wrote:
>
> From A logical re-conception of neural networks: Hamiltonian bitwise 
> part-whole architecture
>
>> From hierarchical statistics to abduced symbols
>> It is perhaps useful to envision some of the ongoing devel-
>> opments that are arising from enlarging and elaborating the
>> Hamiltonian logic net architecture. As yet, no large-scale
>> training whatsoever has gone into the present minimal HNet
>> model; thus far it is solely implemented at a small, introduc-
>> tory scale, as an experimental new approach to representa-
>> tions. It is conjectured that with large-scale training, hierar-
>> chical constructs would be accreted as in large deep network
>> systems, with the key difference that, in HNets, such con-
>> structs would have relational properties beyond the “isa”
>> (category) relation, as discussed earlier.
>> Such relational representations lend themselves to abduc-
>> tive steps (McDermott 1987) (or “retroductive” (Pierce
>> 1883)); i.e., inferential generalization steps that go beyond
>> warranted statistical information. If John kissed Mary, Bill
>> kissed Mary, and Hal kissed Mary, etc., then a novel cate-
>> gory ¢X can be abduced such that ¢X kissed Mary.
>> Importantly, the new entity ¢X is not a category based on
>> the features of the members of the category, let alone the
>> similarity of such features. I.e., it is not a statistical cluster
>> in any usual sense. Rather, it is a “position-based category,”
>> signifying entities that stand in a fixed relation with other
>> entities. John, Bill, Hal may not resemble each other in any
>> way, other than being entities that all kissed Mary. Position-
>> based categories (PBCs) thus fundamentally differ from
>> “isa” categories, which can be similarity-based (in unsuper-
>> vised systems) or outcome-based (in supervised systems).
>> PBCs share some characteristics with “embeddings” in
>> transformer architectures.
>> Abducing a category of this kind often entails overgener-
>> alization, and subsequent learning may require learned ex-
>> ceptions to the overgeneralization. (Verb past tenses typi-
>> cally are formed by appending “-ed”, and a language learner
>> may initially overgeneralize to “runned” and “gived,” neces-
>> sitating subsequent exception learning of “ran” and “gave”.)
>
>
> The abduced "category" ¢X bears some resemblance to the way Currying (as in 
> combinator calculus) binds a parameter of a symbol to define a new symbol.  
> In practice it only makes sense to bother creating this new symbol if it, in 
> concert with all other symbols, compresses the data in evidence.  (As for 
> "overgeneralization", that applies to any error in prediction encountered 
> during learning and, in the ideal compressor, increases the algorithm's 
> length even if only by appending the exceptional data in a conditional -- NOT 
> "falsifying" anything as would that rascal Popper).
>
> This is "related" to quantum-logic in the sense that Tom Etter calls out in 
> the linked 

Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-20 Thread Rob Freeman
Well, I don't know number theory well, but what axiomatization of
maths are you basing the predictions in your series on?

I have a hunch the distinction I am making is similar to a distinction
about the choice of axiomatization. Which will be random. (The
randomness demonstrated by Goedel's diagonalization lemma? "True" but
not provable/predictable within the system?)

On Mon, May 20, 2024 at 9:09 PM James Bowery  wrote:
>
>
>
> On Sun, May 19, 2024 at 11:32 PM Rob Freeman  
> wrote:
>>
>> James,
>>
>> My working definition of "truth" is a pattern that predicts. And I'm
>> tending away from compression for that.
>
>
> 2, 4, 6, 8
>
> does it mean
> 2n?
>
> or does it mean
> 10?
>
>
>
>> Related to your sense of "meaning" in (Algorithmic Information)
>> randomness. But perhaps not quite the same thing.
>
>
> or does it mean a probability distribution of formulae that all produce 2, 4, 
> 6, 8 whatever they may subsequently produce?
>
> or does it mean a probability distribution of sequences
> 10, 12?
> 10, 12, 14?
> 10, 13, 14?
> ...

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M086013ed4b196bdfe9a874c8
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-19 Thread Rob Freeman
James,

My working definition of "truth" is a pattern that predicts. And I'm
tending away from compression for that.

Related to your sense of "meaning" in (Algorithmic Information)
randomness. But perhaps not quite the same thing.

I want to emphasise a sense in which "meaning" is an expansion of the
world, not a compression. By expansion I mean more than one,
contradictory, predictive pattern from a single set of data.
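
A toy numerical illustration of what I mean, in Python (my own
example): one set of observations, two predictive patterns. Both
reproduce every observation, yet they contradict each other about what
comes next.

# One data set, two contradictory predictive patterns.
# Both fit the observations 2, 4, 6, 8 exactly, but disagree about term 5.

def pattern_a(n):
    return 2 * n                                           # a_n = 2n

def pattern_b(n):
    return 2 * n + (n - 1) * (n - 2) * (n - 3) * (n - 4)   # agrees on n = 1..4

observations = [2, 4, 6, 8]
assert [pattern_a(n) for n in range(1, 5)] == observations
assert [pattern_b(n) for n in range(1, 5)] == observations

print(pattern_a(5), pattern_b(5))  # 10 versus 34: same data, different predictions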

Note I'm saying a predictive pattern, not a predictable pattern.
(Perhaps as a random distribution of billiard balls might predict the
evolution of the table, without being predictable itself?)

There's randomness at the heart of that. Contradictory patterns
require randomness: a single, predictable pattern could not have
contradictory predictive patterns. But I see the meaning coming
from the prediction, not from any random pattern that may be making the
prediction.

Making meaning about prediction, and not any specific pattern itself,
opens the door to patterns which are meaningful even though new. Which
can be a sense for creativity.

Anyway, the "creative" aspect of it would explain why LLMs get so big,
and don't show any interpretable structure.

With a nod to the topic of this thread, it would also explain why
symbolic systems would never be adequate. It would undermine the idea
of stable symbols, anyway.

So, not consensus through a single, stable, maximally compressed
(Algorithmic Information) pattern, as I understand you to be suggesting
(the most compressed pattern not being knowable anyway?). Though it is
dependent on randomness, and consistent with your statement that "truth"
should be "relative to a given set of observations".

On Sat, May 18, 2024 at 11:57 PM James Bowery  wrote:
>
> Rob, the problem I have with things like "type theory" and "category theory" 
> is that they almost always elide their foundation in HOL (high order logic) 
> which means they don't really admit that they are syntactic sugars for 
> second-order predicate calculus.  The reason I describe this as "risible" is 
> the same reason I rather insist on the Algorithmic Information Criterion for 
> model selection in the natural sciences:
>
> Reduce the argument surface that has us all going into hysterics over "truth" 
> aka "the science" aka what IS the case as opposed to what OUGHT to be the 
> case.
>
> Note I said "reduce" rather than "eliminate" the argument surface.  All I'm 
> trying to do is get people to recognize that relative to a given set of 
> observations the Algorithmic Information Criterion is the best operational 
> definition of the truth.
>
> It's really hard for people to take even this baby step toward standing down 
> from killing each other in a rhyme with The Thirty Years War, given that 
> social policy is so centralized that everyone must become a de facto 
> theocratic supremacist as a matter of self defence.  It's really obvious that 
> the trend is toward capturing us in a control system, e.g. a Valley-Girl 
> flirtation friendly interface to Silicon Chutulu that can only be fought at 
> the physical level such as sniper bullets through the cooling systems of data 
> centers.  This would probably take down civilization itself given the 
> over-emphasis on efficiency vs resilience in civilization's dependence on 
> information systems infrastructure.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M8a84fef3037323602ea7dcca
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-16 Thread Rob Freeman
James,

For relevance to type theories in programming I like Bartosz
Milewski's take on it here. An entire lecture series, but the part
that resonates with me is in the introductory lecture:

"maybe composability is not a property of nature"

Cued up here:

Category Theory 1.1: Motivation and Philosophy
Bartosz Milewski
https://youtu.be/I8LbkfSSR58?si=nAPc1f0unpj8i2JT=2734

Also Rich Hickey, the creator of the Clojure language, had some nice
interpretations in some of his lectures, where he argued for the
advantages of functional languages over object oriented languages.
Basically because, in my interpretation, the "objects" can only ever
be partially "true".

Maybe summarized well here:

https://twobithistory.org/2019/01/31/simula.html

Or here:

https://www.flyingmachinestudios.com/programming/the-unofficial-guide-to-rich-hickeys-brain/

Anyway, the code guys are starting to notice it too.

-Rob

On Fri, May 17, 2024 at 7:25 AM James Bowery  wrote:
>
> First, fix quantum logic:
>
> https://web.archive.org/web/20061030044246/http://www.boundaryinstitute.org/articles/Dynamical_Markov.pdf
>
> Then realize that empirically true cases can occur not only in multiplicity 
> (OR), but with structure that includes the simultaneous (AND) measurement 
> dimensions of those cases.
>
> But don't tell anyone because it might obviate the risible tradition of 
> so-called "type theories" in both mathematics and programming languages 
> (including SQL and all those "fuzzy logic" kludges) and people would get 
> really pissy at you.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-Mea3f554271a532a282d58fa0
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] α, αGproton, Combinatorial Hierarchy, Computational Irreducibility and other things that just don't matter to reaching AGI

2024-05-10 Thread Rob Freeman
In the corporate training domain, you must have come across Edward de
Bono? I recall he also focuses on discontinuous change and novelty.

Certainly I would say there is broad scope for the application of,
broadly quantum flavoured, AI based insights about meaning in broader
society. Not just project management. But not knowing how your
"Essence" works, I can't comment how much that coincides with what I
see.

There's a lot of woo woo which surrounds quantum, so I try to use
analogies sparingly. But for ways to present it, you might look at Bob
Coecke's books. I believe he has invented a whole visual,
diagrammatic, system for talking about quantum systems. He is proud of
having used it to teach high school students. The best reference for
that might be his book "Picturing Quantum Processes".

Thanks for your interest in reading more about the solutions I see. I
guess I've been lazy in not putting out more formal presentations.
Most of what I have written has been fairly technical, and directed at
language modeling.

The best non-technical summary might be an essay I posted on substack, end '22:

https://robertjohnfreeman.substack.com/p/essay-response-to-question-which

That touches briefly on the broader social implications of subjective
truth, and how a subjective truth which is emergent of objective
structural principles, might provide a new objective social consensus.

On quantum indeterminacy emerging from the complexity of combinations
of perfectly classical and observable elements, I tried to present
myself in contrast to Bob Coecke's top-down quantum grammar approach,
on the Entangled Things podcast:

https://www.entangledthings.com/entangled-things-rob-freeman

You could look at my Facebook group, Oscillating Networks for AI.
Check out my Twitter, @rob_freeman.

Technically, the best summary is probably still my AGI-21
presentation. Here's the workshop version of that, with discussion at
the end:

https://www.youtube.com/watch?v=YiVet-b-NM8

On Fri, May 10, 2024 at 9:18 PM Quan Tesla  wrote:
>
> Rob.
>
> Thank you for being candid. My verbage isn't deliberate. I don't seek 
> traction, or funding for what I do. There's no real justification for your 
> mistrust.
>
> Perhaps, let me provide some professional background instead. As an 
> independent researcher, I follow scientific developments among multiple 
> domains, seeking coherence and sense-making for my own scientific endeavor, 
> spanning 25 years. AGI has been a keen interest of mine since 2013. For AGI, 
> I advocate pure machine consciousness, shying away from biotech approaches.
>
> My field of research interest stems from a previous career in cross-cultural 
> training, and the many challenges it presented in the 80's. As 
> designer/administrator/manager and trainer, one could say I fell in love with 
> optimal learning methodologies and associated technologies.
>
> Changing careers, I started in mainframe operating to advance to programming, 
> systems analysis and design, information and business engineering and 
> ultimately contracting consultant. My one, consistent research area remained 
> knowledge engineering, especially tacit-knowledge engineering. Today, I 
> promote the idea for a campus specializing in quantum systems engineering. 
> I'm generally regarded as being a pracademic of sorts.
>
> Like many of us practitioners here, I too was fortunate to learn with a 
> number of founders and world-class methodologists.
>
> In 1998, my job in banking was researcher/architect to the board of a 5-bank 
> merger, today part of the Barclays Group. As futurist architect and peer 
> reviewer, I was introduced to quantum physics. Specifically, in context of 
> the discovery of the quark.
>
> I realized that future, exponential complexity was approaching, especially 
> for knowledge organizations. I researched possible solutions worldwide, but 
> found none at that time, which concerned me deeply.
>
> Industries seemed to be rushing into the digital revolution without a 
> reliable, methodological management foundation in place. As architect, I had 
> nothing to offer as a useful, 10-year futures outlook either. I didn't feel 
> competent to be the person to address that apparent gap.
>
> A good colleague of mine was a proven IE methodologist and consultant to IBM 
> Head Office. I approached him twice with my concerns, asking him to adapt his 
> proven IE methodology to address the advancing future. He didn't take my 
> concerns seriously at all.
>
> For the next year, the future seemed ever-more clearer to me, yet I couldn't 
> find anyone to develop a future aid for enterprises as a roadmap toolkit, or 
> a coping mechanism for a complex-adaptive reality.  The world was hung up on 
> UML and Object oriented technologies.
>
> In desperation, I decided h

Re: [agi] α, αGproton, Combinatorial Hierarchy, Computational Irreducibility and other things that just don't matter to reaching AGI

2024-05-09 Thread Rob Freeman
Quan. You may be talking sense, but you've got to tone down the
buzzwords by a whole bunch. It's suspicious when you jam so many in
together.

If you think there's a solution there, what are you doing about it in practice?

Be more specific. For instance, within the span of what I understand
here I might guess at relevance for Coecke's "Togetherness":

From quantum foundations via natural language meaning to a theory of everything
https://arxiv.org/pdf/1602.07618.pdf

Or Tomas Mikolov's (key instigator of word2vec?) attempts to get
funding to explore evolutionary computational automata.

Tomas Mikolov - "We can design systems where complexity seems to be
growing" (Another one from AGI-21. It can be hard to motivate yourself
to listen to a whole conference, but when you pay attention, there can
be interesting stuff on the margins.)
https://youtu.be/CnsqHSCBgX0?t=10859

There's also an Artificial Life, ALife, community. Which seems to be
quite big in Japan. A group down in Okinawa under Tom Froese, anyway.
(Though they seem to go right off the edge and focus on some kind of
community consciousness.) But also in the ALife category I think of
Bert Chan, recently moved to Google(?).

https://biofish.medium.com/lenia-beyond-the-game-of-life-344847b10a72

All of that. And what Dreyfus called Heideggerian AI. Associated with
Rodney Brooks, and his "Fast, Cheap, and Out of Control", Artificial
Organism bots. It had a time in Europe especially, Luc Steels, Rolf
Pfeifer? The recently lost Daniel Dennett.

Why Heideggerian AI failed and how fixing it would require making it
more Heideggerian☆
Hubert L.Dreyfus
https://cid.nada.kth.se/en/HeideggerianAI.pdf

How would you relate what you are saying to all of these?

I'm sympathetic to them all. Though I think they miss the insight of
predictive symmetries. Which language drives you to. And what LLMs
stumbled on too. And that's held them up. Held them up for 30 years or
more.

ALife had a spike around 1995. Likely influencing Ben and his Chaotic
Logic book, too. They had the complex system idea back then, they just
didn't have a generative principle to bring it all together.

Meanwhile LLMs have kind of stumbled on the generative principle.
Though they remain stuck in the back-prop paradigm, and unable to
fully embrace the complexity.

I put myself in the context of all those threads. Though I kind of
worked back to them, starting with the language problem, and finding
the complexity as I went. As I say, language drives you to deal with
predictive symmetries. I think ALife has stalled for 30 years because
it hasn't had a central generative principle. What James might call a
"prior". Language offers a "prior" (predictive symmetries.) Combine
that with ALife complex systems, and you start to get something.

But that's to go off on my own tangent again.

Anyway, if you can be more specific, or put what you're saying in the
context of something someone else is doing, you might get more
traction.

On Thu, May 9, 2024 at 3:10 PM Quan Tesla  wrote:
>
> Rob, not butting in, but rather adding to what you said (see quotation below).
>
> The conviction across industries that hierarchy (systems robustness) persists 
> only in descending and/or ascending structures, though true, can be proven to 
> be somewhat incomplete.
>
> There's another computational way to derive systems-control hierarchy(ies) 
> from. This is the quantum-engineering way (referred to before), where 
> hierarchy lies hidden within contextual abstraction, identified via case-based 
> decision making and represented via compound functionality outcomes. 
> Hierarchy as a centre-outwards, in the sense of emergent, essential 
> characteristic of a scalable system. Not deterministically specified.
>
> In an evolutionary sense, hierarchies are N-nestable and self discoverable. 
> With the addition of integrated vectors, knowledge graphs may also be 
> derived, instead of crafted.
>
> Here, I'm referring to 2 systems hierarchies in particular. 'A', a hierarchy 
> of criticality (aka constraints) and 'B', a hierarchy of priority (aka 
> systemic order).
>
> Over the lifecycles of a growing system, as it mutates and evolves in 
> relevance (optimal semantics), hierarchy would start resembling - without 
> compromising - NNs and LLMs.
>
> Yes, a more-holistic envelope then, a new, quantum reality, where 
> fully-recursive functionality wasn't only guaranteed, but correlation and 
> association became foundational, architectural principles.
>
> This is the future of quantum systems engineering, which I believe quantum 
> computing would eventually lead all researchers to. Frankly, without it, 
> we'll remain stuck in the quagmire of early 1990s+ functional 
> analysis-paralysis, by any name.
>
> I'll hold out hope for that one, enlightened developer to make that quant

Re: [agi] α, αGproton, Combinatorial Hierarchy, Computational Irreducibility and other things that just don't matter to reaching AGI

2024-05-09 Thread Rob Freeman
On Thu, May 9, 2024 at 6:15 AM James Bowery  wrote:
>
> Shifting this thread to a more appropriate topic.
>
> -- Forwarded message -
>>
>> From: Rob Freeman 
>> Date: Tue, May 7, 2024 at 8:33 PM
>> Subject: Re: [agi] Hey, looks like the goertzel is hiring...
>> To: AGI 
>
>
>> I'm disappointed you don't address my points James. You just double
>> down that there needs to be some framework for learning, and that
>> nested stacks might be one such constraint.
> ...
>> Well, maybe for language a) we can't find top down heuristics which
>> work well enough and b) we don't need to, because for language a
>> combinatorial basis is actually sitting right there for us, manifest,
>> in (sequences of) text.
>
>
> The origin of the Combinatorial Hierarchy thence ANPA was the Cambridge 
> Language Research Unit.

Interesting tip about the Cambridge Language Research Unit. Inspired
by Wittgenstein?

But this history means what?

> PS:  I know I've disappointed you yet again for not engaging directly your 
> line of inquiry.  Just be assured that my failure to do so is not because I 
> in any way discount what you are doing -- hence I'm not "doubling down" on 
> some opposing line of thought -- I'm just not prepared to defend Granger's 
> work as much as I am prepared to encourage you to take up your line of 
> thought directly with him and his school of thought.

Well, yes.

Thanks for the link to Granger's work. It looks like he did a lot on
brain biology, and developed a hypothesis that the biology of the
brain, split into different regions, is consistent with aspects of
language suggesting limits on nested hierarchy.

But I don't see that it engages in any way with the original point I made
(in response to Matt's synopsis of OpenCog language understanding.)
That OpenCog language processing didn't fail because it didn't do
language learning (or even because it didn't attempt "semantic"
learning first.) That it was somewhat the opposite. That OpenCog
language failed because it did attempt to find an abstract grammar.
And LLMs succeed to the extent they do because they abandon a search
for abstract grammar, and just focus on prediction.

That's just my take on the OpenCog (and LLM) language situation.
People can take it or leave it.

Criticisms are welcome. But just saying, oh, but hey look at my idea
instead... Well, it might be good for people who are really puzzled
and looking for new ideas.

I guess it's a problem for AI research in general that people rarely
attempt to engage with other people's ideas. They all just assert
their own ideas. Like Matt's reply to the above... "Oh no, the real
problem was they didn't try to learn semantics..."

If you think OpenCog language failed instead because it didn't attempt
to learn grammar as nested stacks, OK, that's your idea. Good luck
trying to learn abstract grammar as nested stacks.

Actual progress in the field stumbles along by fits and starts. What's
happened in 30 years? Nothing much. A retreat to statistical
uncertainty about grammar in the '90s with HMMs? A first retreat to
indeterminacy. Then, what, 8 years ago the surprise success of
transformers, a cross-product of embedding vectors which ignores
structure and focuses on prediction. Why did it succeed? You, because
transformers somehow advance the nested stack idea? Matt, because
transformers somehow advance the semantics first idea?

My idea is that they advance the idea that a search for an abstract
grammar is flawed (in practice if not in theory.)

My idea is consistent with the ongoing success of LLMs. Which get
bigger and bigger, and don't appear to have any consistent structure.
But also their failures. That they still try to learn that structure
as a fixed artifact.

Actually, as far as I know, the first model in the LLM style of
indeterminate grammar as a cross-product of embedding vectors was
mine.

***If anyone can point to an earlier precedent I'd love to see it.***

So LLMs feel like a nice vindication of those early ideas to me.
Without embracing the full extent of them. They still don't grasp the
full point. I don't see reason to be discouraged in it.

And it seems by chance that the idea seems consistent with the
emergent structure theme of this thread. With the difference that with
language, we have access to the emergent system, bottom-up, instead of
top down, the way we do with physics, maths.

But everyone is working on their own thing. I just got drawn in by
Matt's comment that OpenCog didn't do language learning.

-Rob

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Teaac2c1a9c4f4ce3-Mc80863f9a44a6d34f3ba12a6
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Hey, looks like the goertzel is hiring...

2024-05-08 Thread Rob Freeman
Is a quantum basis fractal?

To the extent you're suggesting some kind of quantum computation might
be a good implementation for the structures I'm suggesting, though,
yes. At least, Bob Coecke thinks quantum computation will be a good
fit for his quantum style grammar formalisms, which kind of parallel
what I'm suggesting in some ways. That's what they are working on with
their Quantinuum, Honeywell and Cambridge Quantum spin-off (recently
another 300 million from JP Morgan.) Here's a recent paper from their
language formalism team (Stephen Clark a recent hire from DeepMind, I
think, though I think Coecke did the original quantum and category
theoretic combinatorial semantics papers with him when they were
together at Oxford back from 2008 or so.)

From Conceptual Spaces to Quantum Concepts:
Formalising and Learning Structured Conceptual Models
Sean Tull, Razin A. Shaikh, Sara Sabrina Zemljiˇc and Stephen Clark
https://browse.arxiv.org/pdf/2401.08585

Personally I think they've gone off on the wrong tangent with that. I
like the fact that Coecke has recognized a quantum indeterminacy to
natural language grammar. But I think it is pointless to try to
actually apply a quantum formalization to it. If it emerges, just let
it emerge. You don't need to formalize it at all. It's pointless to
bust a gut pushing the data into a formalism. And then bust a gut
picking the formalism apart again to "collapse" it into something
observable at run time.

But these maths guys love their formalisms. That's the approach they
are taking. And they think they need the power of quantum computation
to pull it apart again once they do it. So there's quantum computation
as a natural platform for that, yes.

For the rest of what you've written, I don't understand well what you
are saying. But if you're talking about the interpretability of the
kind of self-structuring sequence networks I'm talking about, then
paradoxically, allowing the symmetry groupings to emerge chaotically
should result in more "visible" and "manageable" structure, not less.
It should give us nice, interpretable, cognitive hierarchies, objects,
concepts, etc., that you can use to do logic and reasoning, much like
the nodes of one of OpenCog's hypergraphs. (It's just that you need an
on-the-fly structuring system like I'm talking about to get adequate
representation for the nodes of an OpenCog hypergraph. They don't
exist as "primitives". Though Ben's probably right that they could
emerge on top of whatever nodes he does have. But he's never had either
the computing power, or, actually, the LLM-like relational parameters,
to even start doing that.)

So I see it as the answer for interpretability, logic, "truthiness",
and all the problems we have now with LLMs (as well as the novelty,
creativity, new "ideas" bit associated with the complex system side.)
You only get the quantum-like woo woo when you insist on squeezing the
variability into a single global formalism.

Practically, the whole system should resolve from moment to moment as
clearly as the alternative perspectives of an Escher sketch appear to
us. One or the other. Clear in and of themselves. Just that they would
be able to flip to another state discontinuously depending on context
(and essentially both be there at the same time until they are
resolved.)

On Wed, May 8, 2024 at 1:00 PM Quan Tesla  wrote:
>
> If I understood your assertions correctly, then I'd think that a 
> quantum-based (fractal), evolutionary (chemistry-like) model would be 
> suitable for extending the cohesive cognition to x levels.
>
>  If the boundaried result emerges as synonymous with an LLM, or NN, then it 
> would be useful. However, if it emerges as an as-of-yet unnamed, 
> recombinatory lattice, it would be groundbreaking.
>
> My primary thought here relates to inherent constraints in visualizing 
> quantum systems. Once the debate between simple and complex systems end 
> (e.g., when an absolutist perspective is advanced), then the observational 
> system stops learning. Volume does not equate real progress.
>
> With an evolutionary model, "brainsnaps in time" may be possible. This 
> suggests  that scaling would be managable within relativistic and relevance 
> boundaries/targets.
>
> In the least, trackable evolutionary pathways and predictability of the 
> pattern of tendency of a system should become visible, and manageability 
> would be increased.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tb63883dd9d6b59cc-M210f900801eb7251599971d1
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Hey, looks like the goertzel is hiring...

2024-05-07 Thread Rob Freeman
I'm disappointed you don't address my points James. You just double
down that there needs to be some framework for learning, and that
nested stacks might be one such constraint.

I replied that nested stacks might be emergent on dependency length.
So not a constraint based on actual nested stacks in the brain, but a
"soft" constraint based on the effect of dependency
 length on groups/stacks generated/learned from sequence networks.

BTW just noticed your "Combinatorial Hierarchy, Computational
Irreducibility and other things that just don't matter..." thread.
Perhaps that thread is a better location to discuss this. Were you
positing in that thread that all of maths and physics might be
emergent on combinatorial hierarchies? Were you saying yes, but it
doesn't matter to the practice of AGI, because for physics we can't
find the combinatorial basis, and in practice we can find top down
heuristics which work well enough?

Well, maybe for language a) we can't find top down heuristics which
work well enough and b) we don't need to, because for language a
combinatorial basis is actually sitting right there for us, manifest,
in (sequences of) text.

With language we don't just have the top-down perception of structure
like we do with physics (or maths.) Language is different to other
perceptual phenomena that way. Because language is the brain's attempt
to generate a perception in others. So with language we're also privy
to what the system looks like bottom up. We also have the, bottom up,
"word" tokens which are the combinatorial basis which generates a
perception.

Anyway, it seems like my point is similar to your point: language
structure, and cognition, might be emergent on combinatorial
hierarchies.

LLMs go part way to implementing that emergent structure. They succeed
to the extent they abandon an explicit search for top-down structure,
and just allow the emergent structure to balloon. Seemingly endlessly.
But they are a backwards implementation of emergent structure.
Succeeding by allowing the structure to grow. But failing because
back-prop assumes the structure will somehow not grow too. That there
will be an end to growth. Which will somehow be a compression of the
growth it hasn't captured yet... Actually, if it grows, you can't
capture it all. And in particular, back-prop can't capture all of the
emergent structure, because, like physics, that emergent structure
manifests some entanglement, and chaos.

In this thesis, LLMs are on the right track. We just need to replace
back-prop with some other way of finding emergent hierarchies of
predictive symmetries, and do it generatively, on the fly.

In practical terms, maybe, as I said earlier, the variational
estimation with heat of Extropic. Or maybe some kind of distributed
reservoir computer like LiquidAI are proposing. Otherwise just
straight out spiking NNs should be a good fit. If we focus on actively
seeking new variational symmetries using the spikes, and not
attempting to (mis)fit them to back-propagation.

On Tue, May 7, 2024 at 11:32 PM James Bowery  wrote:
>...
>
> At all levels of abstraction where natural science is applicable, people 
> adopt its unspoken presumption which is that mathematics is useful.  This is 
> what makes Solomonoff's proof relevant despite the intractability of proving 
> that one has found the ideal mathematical model.  The hard sciences are 
> merely the most obvious level of abstraction in which one may recognize this.
>...
>
> Any constraint on the program search (aka search for the ultimate algorithmic 
> encoding of all data in evidence at any given level of abstraction) is a 
> prior.  The thing that makes the high order push down automata (such as 
> nested stacks) interesting is that it may provide a constraint on program 
> search that evolution has found useful enough to hard wire into the structure 
> of the human brain -- specifically in the ratio of "capital investment" 
> between sub-modules of brain tissue.  This is a constraint, the usefulness of 
> which, may be suspected as generally applicable to the extent that human 
> cognition is generally applicable.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tb63883dd9d6b59cc-M321384a83da19a33df5ba986
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Hey, looks like the goertzel is hiring...

2024-05-06 Thread Rob Freeman
Addendum: more candidates for this variational model for finding
distributions to replace back-prop (and consequently with the
potential to capture predictive structure which is chaotic attractors,
though they don't appreciate the need yet). There's Extropic, which is
proposing using heat noise. And another, LiquidAI. If it's true that
LiquidAI have nodes which are little reservoir computers, potentially
that might work on a similar variational estimation/generation of
distributions basis. Joscha Bach is involved with that, though I don't
know in what capacity.

James: "Physics Informed Machine Learning". "Building models from data
using optimization and regression techniques".

Fine. If you have a physics to constrain it to. We don't have that
"physics" for language.

Richard Granger you say? The brain is constrained to be a "nested stack"?

https://www.researchgate.net/publication/343648662_Toward_the_quantification_of_cognition

Language is a nested stack? Possibly. Certainly you get a (softish)
ceiling of recursion starting level 3. The famous, level 2: "The rat
the cat chased escaped" (OK) vs. level 3: "The rat the cat the dog bit
chased escaped." (Borderline not OK.)

How does that contradict my assertion that such nested structures must
be formed on the fly, because they are chaotic attractors of
predictive symmetry on a sequence network?

On the other hand, can fixed, pre-structured, nested stacks explain
contradictory (semantic) categories, like "strong tea" (OK) vs
"powerful tea" (not OK)?

Unless stacks form on the fly, and can contradict, how can we explain
that "strong" can be a synonym (fit in the stack?) for "powerful" in
some contexts, but not others?
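
As a toy sketch of what I mean (Python, with an invented mini-corpus):
build the substitution group for a context only at the moment you ask
for it, and "strong" and "powerful" group together in one context but
not in another. The groups contradict, and neither is wrong.

# Substitution groups formed on the fly, per context, from raw sequences.
# Whether "strong" and "powerful" group together depends on the context.
# The mini-corpus below is invented for illustration.

corpus = [
    "she made a strong argument",
    "she made a powerful argument",
    "he drinks strong tea",
    "she drives a powerful car",
]

def fillers(corpus, prev, nxt):
    """Words attested between `prev` and `nxt`: a substitution group
    for that one context, built at the moment it is asked for."""
    group = set()
    for line in corpus:
        words = line.split()
        for i in range(1, len(words) - 1):
            if words[i - 1] == prev and words[i + 1] == nxt:
                group.add(words[i])
    return group

print(fillers(corpus, "a", "argument"))  # {'strong', 'powerful'}: synonyms here
print(fillers(corpus, "drinks", "tea"))  # {'strong'}: "powerful tea" unattested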

On the other hand, a constraint like an observation of limitations on
nesting, might be a side effect of the other famous soft restriction,
the one on dependency length. A restriction on dependency length is an
easier explanation for nesting limits, and fits with the model that
language is just a sequence network, which gets structured (into
substitution groups/stacks?) on the fly.

On Mon, May 6, 2024 at 11:06 PM James Bowery  wrote:
>
> Let's give the symbolists their due:
>
> https://youtu.be/JoFW2uSd3Uo?list=PLMrJAkhIeNNQ0BaKuBKY43k4xMo6NSbBa
>
> The problem isn't that symbolists have nothing to offer, it's just that 
> they're offering it at the wrong level of abstraction.
>
> Even in the extreme case of LLM's having "proven" that language modeling 
> needs no priors beyond the Transformer model and some hyperparameter 
> tweaking, there are language-specific priors acquired over the decades if not 
> centuries that are intractable to learn.
>
> The most important, if not conspicuous, one is Richard Granger's discovery 
> that Chomsky's hierarchy elides the one grammar category that human cognition 
> seems to use.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tb63883dd9d6b59cc-Me078486d3e7a407326e33a8a
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Hey, looks like the goertzel is hiring...

2024-05-05 Thread Rob Freeman
On Sat, May 4, 2024 at 4:53 AM Matt Mahoney  wrote:
>
> ... OpenCog was a hodgepodge of a hand coded structured natural language 
> parser, a toy neural vision system, and a hybrid fuzzy logic knowledge 
> representation data structure that was supposed to integrate it all together 
> but never did after years of effort. There was never any knowledge base or 
> language learning algorithm.

Good summary of the OpenCog system Matt.

But there was a language learning algorithm. Actually there was more
of a language learning algorithm in OpenCog than there is now in LLMs.
That's been the problem with OpenCog. By contrast LLMs don't try to
learn grammar. They just try to learn to predict words.

Rather than the mistake being that they had no language learning
algorithm, the mistake was OpenCog _did_ try to implement a language
learning algorithm.

By contrast the success, with LLMs, came to those who just tried to
predict words. Using a kind of vector cross product across word
embedding vectors, as it turns out.
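
By "a kind of vector cross product" I mean something loose; the core
operation is more like dot products between a context vector and
candidate word vectors, scored and normalised into a prediction. A toy
Python sketch, with made-up numbers, nothing like the scale or
machinery of a real transformer:

# Minimal sketch of prediction as products of embedding vectors:
# score every candidate next word by the dot product of its vector
# with a context vector, then normalise. Numbers are made up.
import math

embedding = {                       # tiny made-up word vectors
    "tea":      [0.9, 0.1, 0.0],
    "argument": [0.1, 0.9, 0.2],
    "mat":      [0.2, 0.1, 0.9],
}
context = [0.8, 0.2, 0.1]           # stands in for "he drinks strong ..."

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

scores = {w: dot(context, vec) for w, vec in embedding.items()}
z = sum(math.exp(s) for s in scores.values())
probs = {w: math.exp(s) / z for w, s in scores.items()}

print(max(probs, key=probs.get))    # "tea" gets the highest probability

No grammar anywhere in that: just products of embedding vectors, in
the service of prediction.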

Trying to learn grammar was linguistic naivety. You could have seen it
back then. Hardly anyone in the AI field has any experience with
language, actually, that's the problem. Even now with LLMs. They're
all linguistic naifs. A tragedy for wasted effort for OpenCog. Formal
grammars for natural language are unlearnable. I was telling Linas
that since 2011. I posted about it here numerous times. They spent a
decade, and millions(?) trying to learn a formal grammar.

Meanwhile vector language models which don't coalesce into formal
grammars, swooped in and scooped the pool.

That was NLP. But more broadly in OpenCog too, the problem seems to be
that Ben is still convinced AI needs some kind of symbolic
representation to build chaos on top of. A similar kind of error.

I tried to convince Ben otherwise the last time he addressed the
subject of semantic primitives in this AGI Discussion Forum session
two years ago, here:

March 18, 2022, 7AM-8:30AM Pacific time: Ben Goertzel leading
discussion on semantic primitives
https://singularitynet.zoom.us/rec/share/qwLpQuc_4UjESPQyHbNTg5TBo9_U7TSyZJ8vjzudHyNuF9O59pJzZhOYoH5ekhQV.2QxARBxV5DZxtqHQ?startTime=164761312

Starting timestamp 1:24:48, Ben says, disarmingly:

"For f'ing decades, which is ridiculous, it's been like, OK, I want to
explore these chaotic dynamics and emergent strange attractors, but I
want to explore them in a very fleshed out system, with a rich
representational capability, interacting with a complex world, and
then we still haven't gotten to that system ... Of course, an
alternative approach could be taken as you've been attempting, of ...
starting with the chaotic dynamics but in a simpler setting. ... But I
think we have agreed over the decades that to get to human level AGI
you need structure emerging from chaos. You need a system with complex
chaotic dynamics, you need structured strange attractors there, you
need the system's own pattern recognition to be recognizing the
patterns in these structured strange attractors, and then you have
that virtuous cycle."

So he embraces the idea cognitive structure is going to be chaotic
attractors, as he did when he wrote his "Chaotic Logic" book back in
1994. But he's still convinced the chaos needs to emerge on top of
some kind of symbolic representation.

I think there's a sunken cost fallacy at work. So much is invested in
the paradigm of chaos appearing on top of a "rich" symbolic
representation. He can't try anything else.

As I understand it, Hyperon is a re-jig of the software for this
symbol based "atom" network representation, to make it easier to
spread the processing load over networks.

As a network representation, the potential is there to merge insights
of no formal symbolic representation which has worked for LLMs, with
chaos on top which was Ben's earlier insight.

I presented on that potential at a later AGI Discussion Forum session.
But mysteriously the current devs failed to upload the recording for
that session.

> Maybe Hyperon will go better. But I suspect that LLMs on GPU clusters will 
> make it irrelevant.

Here I disagree with you. LLMs are at their own dead-end. What they
got right was to abandon formal symbolic representation. They likely
generate their own version of chaos, but they are unaware of it. They
are still trapped in their own version of the "learning" idea. Any
chaos generated is frozen and tangled in their enormous
back-propagated networks. That's why they exhibit no structure,
hallucinate, and their processing of novelty is limited to rough
mapping to previous knowledge. The solution will require a different
way of identifying chaotic attractors in networks of sequences.

A Hyperon style network might be a better basis to make that advance.
It would have to abandon the search for a symbolic representation.
LLMs can show the way there. Make prediction not representation the
focus. Just start with any old (sequential) tokens. But in contrast to
LLMs, 

Re: [agi] Re: deepmind-co-founder-suggests-new-turing-test-ai-chatbots-report-2023-6

2023-07-06 Thread Rob Freeman
On Thu, Jul 6, 2023 at 7:58 PM Matt Mahoney  wrote:
> ...
> The LTCB and Hutter prize entries model grammar and semantics to some extent 
> but never developed to the point of constructing world models enabling them 
> to reason about physics or psychology or solve novel math and coding 
> problems. We now know this is possible in larger models without grounding in 
> nonverbal sensory data, even though we don't understand how it happened.

We don't understand how it happened. No. We don't really understand
much at all. It's all been a process of hacking at some very old ideas
about training to fixed categories. Speeded up by GPUs developed for
the game market. And most recently enhanced by the accidental
discovery that context, through "attention", seems to be central.

That distributional analysis of language might result in categories
useful for broader reasoning was always plausible to me, so I find no
shock in it with LLMs. I just think LLMs are limited by not being able
to find novelty. Their categories are fixed at the time of training.
In reality I think the categories can shift. And be novel. That they
are chaotic attractors.

Anyway, I encourage those who are inclined to think about theory to
focus on the fact that simply allowing the size of the models to
increase seems to be of central importance.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T42db51de471cbcb9-M156f326bfc2335867341f308
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: deepmind-co-founder-suggests-new-turing-test-ai-chatbots-report-2023-6

2023-07-06 Thread Rob Freeman
On Thu, Jul 6, 2023 at 7:54 PM James Bowery  wrote:
> On Thu, Jul 6, 2023 at 1:09 AM Rob Freeman  wrote:
>>
>> I just always believed the goal of compression was wrong.
>
> You're really confused.

I'm confused? Maybe. But I have examples. You don't address my
examples. You just enumerate a list of definitions.

Argument by definition is a common technique. Almost universal in
online forums. I tend to think all truth has a subjective aspect. So
definitions are intrinsically qualifiable. It's possible to argue
indefinitely about them, anyway.

And, a higher bar, your definitions extend to defining science. That
reveals an even deeper insecurity.

I recall Kuhn observed that people only argue about what principles
should underlie science at times of paradigm tension, when things are
not working properly. If things are useful, no-one worries about
principles. When things are working "science" remains just a body of
practice. Which like all bodies of practice retains contradictions and
inconsistencies. So your need to define principles of science is
indicative to me that your body of practice may be showing holes.

The hole in this case is probably that you have no argument to my
examples, other than definitions.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T42db51de471cbcb9-M606824999ba36ea90306ddf9
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: deepmind-co-founder-suggests-new-turing-test-ai-chatbots-report-2023-6

2023-07-06 Thread Rob Freeman
On Thu, Jul 6, 2023 at 11:30 AM  wrote:
>...
> Hold on. The Lossless Compression evaluation tests not just compression, but 
> expansion!

It's easy to get lost in word definitions.

It sounds like you're using "expansion" in a sense of recovering an
original from a compression.

I'm using "expansion" in the sense of chaos. So more like generating a
hurricane from butterfly wings.

Seen in that light, the current obsession with "learning", or
compression, might be the equivalent of starting from a hurricane and
trying to squash it down into a butterfly's wings. "Compress" the
hurricane.

Which would be great if a hurricane were not a non-linear system and
basically organic to the entire planet. So, organic to not just one
butterfly, but all of them.

In practice it means current LLMs might be something like trying to
enumerate every hurricane ever seen. And maybe generalize across them.
So, say, have "deep" hierarchy stacking all the sub-eddies within
every hurricane, or an "attention" mechanism back along the storm
track, to compare them. That might be how it places current tech in
the context of explaining hurricanes.

People do that with movement too. Transformer motion models? They
capture zillions of frames, and reproduce anime simulations of motion.

Of course, every new hurricane will actually be different. To really
capture the relevant parameters of hurricane structure you would need
to keep every butterfly on the planet, and every air molecule, and...

I'm saying this kind of top down "learning", or compression, is a dumb
way to explain hurricanes. Models which are just LARGE, is the wrong
simplicity. It's the wrong way of dealing with what is actually chaos
in the generation of hurricanes. Just allowing your data-bases to get
bigger and bigger might reproduce hurricane like images. It might even
predict the behaviour of a good number of them within statistical
limits (LLM weather forecasting anyone?) With the very size which
enumeration permits, it might be a way to reflect their puzzling
variety. But it won't actually have anything to do with hurricanes.
That you won't actually understand hurricanes until you accept that
they "expand", from tiny causes. You can't capture the essence of
hurricanes just by enumerating them and comparing them. You'll get an
enormous data-base, and it will only ever get bigger, but it won't
actually have anything to do with hurricanes. Any more than anime
simulations of motion have anything to do with motion.

All you will have done is capture something of the enormous variety of
storms which can be generated, the sheer SIZE of the historical
record, and ever expanding. Without capturing what is actually
generating them.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T42db51de471cbcb9-Mfd1bd907b0ca63f5a2fde0d7
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: deepmind-co-founder-suggests-new-turing-test-ai-chatbots-report-2023-6

2023-07-06 Thread Rob Freeman
On Thu, Jul 6, 2023 at 3:51 AM Matt Mahoney  wrote:
>
> I am still on the Hutter prize committee and just recently helped evaluate a 
> submission. It uses 1 GB of text because that is how much a human can process 
> over a lifetime. We have much larger LLMs, of course. Their knowledge is 
> equivalent to thousands or millions of humans, which makes them much more 
> useful.
>
> I believe that the Hutter prize, and the Large Text benchmark on which it is 
> based, helped establish text prediction using neural networks and massive 
> computing power as the path to AGI. The idea was still controversial when I 
> started the benchmark in 2006. Most people on this list and elsewhere were 
> still pursuing symbolic approaches.

Still pursuing symbolic approaches? Notably OpenCog. Yes, that's been
a disaster. The whole Link Grammar thing wasted 10 years.

Did the Hutter Prize move the field? Well, I was drawn to it as a rare
data based benchmark.

I just always believed the goal of compression was wrong.

I still think that's the case. As a benchmark it will be fine. But the
eventual solution will be something which expands new structure.
Actually chaotically (with implications for consciousness, and
"uploading", actually, but not worth going there for now.) So the
benchmark will remain. Just everything we imagine happening under the
hood, will change.

I recall Hutter's original goal was to find underlying semantic
variables. I wonder how he views that now.

As far as driving change... In the 00s everyone was focused on
statistics. The neural network boom was over. Statistics dominated.
Marcus Hutter formulated a statistical definition of intelligence. The
Hutter Prize was a statistical goal. That's what I remember.

If you think it drove a renewed "neural" revolution, well it did work
with data. But I would be surprised if many people back then
characterized their entries as "neural".

A "neural" model is in many ways the opposite of a statistical model.
To be "neural" is to retain data in unsummarized form.

What happened was not smaller models, but "deep" models.

Not that anyone thought much about it. The renewed "neural" revolution
(rechristened as "deep"; the term "neural network" never really
recovered from the stigma of being old tech in the '00s, so now it's
"deep", or "transformer", or just plain LARGE...) is paradoxical,
because they retain the statistical idea that they are summarizing,
while what actually works repeatedly seems to be the opposite of
summarizing. What is "deep"? It is more structure. Things constantly
get better when those seeking to summarize, summarize less. It's always
the pattern. Things get better when we allow more structure.

So was it the goal of summarizing better as per the Hutter Prize which
drove the field?

I would say what drove the field was (accidentally) embracing more
structure again. Getting bigger. LARGE. That's also how I would
characterize "attention". Enumerating more structure. So first more
structure with "deep", and then more structure with "attention".

But maybe those submitting entries to the Hutter Prize have seamlessly
transitioned away from thinking of themselves as seeking small models,
and now somehow keep in mind the idea of "deep" or "large", and
distributed, while somehow also imagining they are retaining the
original goal of small.

Possibly they don't think much about it at all. In many ways language
models are a theory vacuum. It's all tools. There's very little
imaginative conception of why things should be the way they are, or
why one tool works better than another.

Constructively, the question which interests me is whether anyone sees
any evidence of a ceiling in the number of trained "parameters"
(distinct and meaningful patterns) which can be squeezed out of a set
of data in a LLM. Chinchilla fascinated me, because it seemed the
improvement with more training was in exact proportion to the
improvement with simply adding more data, more structure. It seemed
like more data and more training were in many ways acting like the
same thing. I'm looking for evidence there is any ceiling to that. So,
not just the entropy benchmark of the Hutter Prize as such, but
evidence what is happening under the hood in the entrants which work
best. Evidence that what works under the hood more resembles an
endless expansion of more structure, not less.
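
For reference, the Chinchilla analysis itself fitted a parametric loss
of the form L(N, D) = E + A/N^a + B/D^b, in which parameters N and
training tokens D enter through separate terms with similar exponents.
That is one way of making the "more training acts like more data"
observation concrete. A toy sketch, with coefficients roughly as
published (Hoffmann et al. 2022); treat the numbers as indicative only:

# Chinchilla-style parametric loss fit: L(N, D) = E + A/N^a + B/D^b.
# Coefficients are approximately the published fit -- indicative only.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params, n_tokens):
    return E + A / n_params**alpha + B / n_tokens**beta

# Parameters and tokens trade off against each other:
print(loss(70e9, 1.4e12))   # Chinchilla-like budget: 70B params, 1.4T tokens
print(loss(280e9, 300e9))   # Gopher-like budget: 280B params, 300B tokens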

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T42db51de471cbcb9-M52c08b01e361f07555f5f1ee
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: deepmind-co-founder-suggests-new-turing-test-ai-chatbots-report-2023-6

2023-07-05 Thread Rob Freeman
On Wed, Jul 5, 2023 at 7:05 PM Matt Mahoney  wrote:
>...
> LLMs do have something to say about consciousness. If a machine passes the 
> Turing test, then it is conscious as far as you can tell.

I see no reason to accept the Turing test as a definition of
consciousness. Who ever suggested that? Even Turing never suggested
that to my knowledge. And the Turing test is even a weak,
non-explanatory, definition of intelligence. It doesn't say much to me
about intelligence. It says nothing to me about consciousness. I don't
even know if you're conscious.

What does amuse me about LLMs is how large they become. Especially
amusing in the context of the Hutter Prize. Which I recall you
administered for a time.

I recall the goal of the Hutter Prize was to compress text, on the
assumption that the compression would be an abstraction of meaning.

I argued it was the wrong goal. That meaning would turn out to be an
expansion of data.

And now, what do we find? We find that LLMs just seem to get bigger
all the time. That more training, far from compressing more
efficiently, just keeps on generating new parameters. In fact training
for longer generates a number of parameters roughly equivalent to just
adding more data.

I asked about this online. Is there any evidence for a ceiling? The
best evidence I was able to find was for a model called Chinchilla. It
seems that Chinchilla, at 70B parameters trained on 1.4T tokens,
slightly outperformed a 280B model trained on roughly 4.7x fewer
tokens (300B vs 1.4T).

So 4x the training gave a result much the same as 4x the data!
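
A back-of-the-envelope check, using the standard rough approximation
that training cost is about 6 x parameters x tokens, suggests the two
runs sat at close to the same compute budget, which is why trading
parameters for tokens comes out "much the same":

# Rough training compute for each run, FLOPs ~= 6 * N * D (standard
# approximation, so only indicative).
chinchilla = 6 * 70e9 * 1.4e12    # 70B params, 1.4T tokens
gopher     = 6 * 280e9 * 300e9    # 280B params, 300B tokens
print(f"{chinchilla:.2e} vs {gopher:.2e}")   # ~5.9e23 vs ~5.0e23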

Is training compressing the data, or expanding it?

In practice it seems people are going with more data. Much easier than
doing more training. But it seems they are much the same thing. It
says nothing about what would happen if you just kept training for
ever. Eternally better performance with eternally more "parameters"?
Nobody knows.

Anyway, the whole success of the, enormously BIG, LARGE language
models, with no ceiling yet in sight, seems to knock into a cocked hat
once and for all the whole conception of the Hutter Prize, that
intelligence is a compression, and the model with the smallest number
of parameters would turn out to be the best.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T42db51de471cbcb9-Md910373c6afaf37948f6942d
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: deepmind-co-founder-suggests-new-turing-test-ai-chatbots-report-2023-6

2023-07-05 Thread Rob Freeman
Off topic, and I haven't followed this thread, but...

On Tue, Jul 4, 2023 at 10:21 PM Matt Mahoney  wrote:
>...
>
> We are not close to reversing human aging. The global rate of increase in 
> life expectancy has dropped slightly after peaking at 0.2 years per year in 
> the 1990s. We have 0 drugs or interventions proven to slow aging because it 
> takes decades to do experiments. Calorie restriction might work but nobody is 
> doing it.

That seems unnecessarily pessimistic.

To nitpick, "0 drugs or interventions proven to slow aging" may be the
case for humans, but there are any number of interventions which have
been proven for animals with lifespans short enough to test. Going in
both directions.

In short lived animal models it's become almost routine to produce
accelerated aging phenotypes, and then reverse them.

So "drugs or interventions proven to slow aging" do exist, in numbers,
only not yet for humans.

Even for humans, we may not have had the time or inclination to do a
double blind placebo study from birth to death. But for the biomarkers
that have been identified, David Sinclair for one claims his have been
reversed by some 10 years.

mTOR and SIRTx activating interventions (from memory) have been
demonstrated on a wide enough range of animal models, that it would be
surprising if they don't add the same 10-30% for humans (yeast
(Saccharomyces cerevisiae), worms (Caenorhabditis elegans), fruit
flies (Drosophila melanogaster), rodents and, most recently, rhesus
monkeys (Macaca mulatta), says this paper:
https://www.nature.com/articles/nrm2944)

The most interesting current interventions are on the epigenetic
(methylation) clock, using variations of "Yamanaka factors", which
demonstrably can reverse phenotypes right back to undifferentiated
stem cells, and the telomere clock, which produces aged phenotypes
when artificially shortened, and then reverses them when lengthened
again.

If you've got a spare $100k or so, Liz Parrish can swap in a patch to
keep your telomeres long, as she did to herself some 7-8 years ago.

Likely Parrish's epigenetic clock is still ticking, and nobody
seems to have wanted to try any kind of Yamanaka factor intervention
on themselves yet. But Sinclair has demonstrated it in rodent models:

"Sinclair (Table 1), have shown that partial reprogramming can
dramatically reverse age-related phenotypes in the eye, muscle and
other tissues in cultured mammalian cells and even rodent models by
countering epigenetic changes associated with aging."
https://www.nature.com/articles/d41587-022-2-4

Parrish is arguing that given aging is a case where "do no harm" no
longer has the best odds, the existing medical intervention "safety"
bar is out of date. They are proposing a new "Best Choice Medicine"
model, which says if you're about to die, you should be allowed to try
some things (Trump actually signed a similar bill into law. "Right to
Try"? But not including aging, I think):

https://www.bestchoicemedicine.com/general-1

So, no, no 100 year, double blind placebo "proven" interventions in
humans yet. But "not close to reversing" seems like an overstatement
to me. Not close to completing a 100 year double blind placebo trial
on humans yet, no. But if I were 90 years old and facing Russian
Roulette odds (~1/6 chance of dying each year at age 90?), I think the
existing tech is close enough to "starting to reverse aging" that I
would be willing to take some risks.

> Maybe we can achieve immortality by uploading, if you believe that a robot 
> trained on your data to predict your actions and carry them out in real time 
> is you.

This one I think is further away. As you hint, it really holds LLM
tech to the fire, and reminds us that LLMs still have nothing to say
about any kind of theory for replicating human consciousness. (Well,
not quite nothing, I would argue that the size these things reach is
telling us something, but that's more a hint on the level of say the
Michelson-Morley experiment was for physics, which is to say a puzzle
which might lead eventually to insight, but which in terms of the
mainstream is still just a puzzle for now.)

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T42db51de471cbcb9-Mb681cbc4f7239c48059327fa
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] The next advance over transformer models

2022-07-01 Thread Rob Freeman
On Fri, Jul 1, 2022 at 3:34 PM Brett N Martensen 
wrote:

> If you are looking for a hierarchical structure which reuses simpler parts
> (letters, words, phrases) in compositions that include overlaps ... you
> might want to have a look at binons.
> http://www.adaptroninc.com/BasicPage/presentations-and-slides
>

How does your use of the word "overlaps" correspond to Howarth's use Brett?

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T5d6fde768988cb74-M89b7759f63f93262f99dca30
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] The next advance over transformer models

2022-07-01 Thread Rob Freeman
On Fri, Jul 1, 2022 at 12:47 AM Boris Kazachenko  wrote:

> ...
> Do you mean two similar input-inputs that are not in the same input?
>

I'd prefer to phrase it in terms of Howarth's data for natural language.

I mean what Howarth calls "blends".

Howarth contrasts "blends" with what he calls "overlaps".

An example of what Howarth calls an "overlap" is.

e.g: "Those learners usually _pay_ more _efforts_ in adopting a new
language..."

*pay effort
PAY attention/a call
MAKE a call/an effort

Trying to express that "overlap" as a network (if my ascii art survives
posting):

      attention
     /
  pay
 /   \
(?)   a call
 \   /
  make
     \
      an effort

They "overlap". The networks of usage are connected over the "overlapping"
sequence/prediction of "a call". This is my definition of shared
connectivity. The speaker appears to have synthesized a(n ad-hoc?) category
on the basis of an observed shared sequence/prediction, and generalized
usage with it.
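
As a toy sketch of that mechanism (my own made-up collocation table,
not Howarth's data): group verbs into an ad-hoc category because they
share an observed prediction, generalize usage across the group, and
the over-generalization "*pay an effort" falls out.

# Toy sketch of the "overlap" mechanism: verbs form an ad-hoc category
# because they share an observed prediction ("a call"), and usage then
# generalizes across the category -- producing "*pay an effort".
observed = {
    "pay":  {"attention", "a call"},
    "make": {"a call", "an effort"},
}

def adhoc_category(verb):
    # every verb sharing at least one observed continuation with `verb`
    return {v for v, preds in observed.items() if preds & observed[verb]}

group = adhoc_category("pay")                        # {'pay', 'make'}
generalized = set().union(*(observed[v] for v in group))
novel = generalized - observed["pay"]                # {'an effort'}
print(group, novel)                                  # -> "*pay an effort"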

These "overlaps" are contrasted with what Howarth calls "blends":

"Blends" are groupings which don't share observed network "overlaps", but
which share similar "meaning".

e.g: '*appropriate _policy_ to be _taken_ with regard to inspections'

TAKE steps
ADOPT a policy

"take" and "adopt" are quite similar in meaning. You might say they may
have similar networks of internal connectivity. So they may be similar
according to your definition of shared connectivity. And, yes, here they
are observed to have been grouped, and interchanged (in a situation where a
native speaker would not have interchanged them. It's a disfluency,
revealing the underlying mechanisms. That's what makes all these examples
interesting.)

But Howarth finds that these "blends" happen less often than "overlaps".

It would seem to indicate that groupings based on shared internal
connectivity (blends) do occur. You're right. But they occur less
dominantly than groupings based on shared observed sequences or predictions
(overlaps.)

Or that this is at least the case for natural language.

For natural language, according to Howarth's evidence, it seems to be
shared connectivity of the type I am talking about, observed shared
sequence, which dominates. Or at the very least, also occurs.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T5d6fde768988cb74-Mfee174e4c9499e7aec243593
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] The next advance over transformer models

2022-06-30 Thread Rob Freeman
On Thu, Jun 30, 2022 at 2:18 PM Boris Kazachenko  wrote:

> On Thursday, June 30, 2022, at 6:10 AM, Rob Freeman wrote:
>
> what method do you use to do the "connectivity clustering" over it?
>
>
> I design from the scratch, that's the only way to conceptual integrity in
> the algorithm: http://www.cognitivealgorithm.info.
>

I see. So "shared connectivity" not so much in the sense of being connected
together. But in the sense of having the same internal connectivity within
two groups which are not directly connected together.

OK, good. It's good to be clear you are thinking of "connectivity" in
another sense.

That's a valid sense. I'm focused on the shared observed prediction sense.
So your sense didn't occur to me. But that could be a sense of shared
connectivity too.

So, how about this. Could we use both? If two separate clusters share a
prediction, I don't see why you could not then connect them through such
shared predictions which occur in a data set, without doing a direct
comparison of their respective internal connectivity?

You might think of it as somewhat the reverse of what you are doing. I
understand you to be comparing connectivity, and predicting based on that.
I'm suggesting that if two clusters will tend to share predictions, that
might be revealed more directly by such predictions as are observed already.

Now, where my idea might come unstuck, is where there are two clusters
which might be used to predict the same things on the basis of their
internal similarity, as you suggest, but actually they have never been
observed to share any predictions. In which case, yes, you would need to
directly check any similarity in their clustering, and yours would be the
only mechanism.

In language that might equate to two words which "mean" the same thing, but
which have never been observed to be used in the same sequence.

Actually, dredging back... I think we have some evidence for what happens
in that case.

Taking from what I wrote somewhere else:
<<<
I think the evidence from language learning is somewhat the opposite. It
starts with the particular, and only generalizes later. I always found
examples in this study by Peter Howarth some years ago a striking example
of this (all appear to be paywalled these days, unfortunately):

Phraseology and Second Language Proficiency. Howarth, Peter. Applied
Linguistics , v19 n1 p24-44 Mar 1998
...
What interested me was his analysis of two types of collocational
disfluencies he characterized as "blends" and "overlaps".

By "overlaps" he meant an awkward construction which was nevertheless
directly motivated by the existence of an overlapping collocation:

e.g.

"Those learners usually _pay_ more _efforts_ in adopting a new language..."

*pay effort
PAY attention/a call
MAKE a call/an effort

So "*pay efforts" might be motivated by analogy with "pay attention" and
"make an effort" (because of the overlapping collocations "pay a call" and
"make a call".)

In Howarth's words (at least in my pre-print):

"Blends, on the other hand, seem to occur among more restricted
collocations, where the verbs involved are more obviously figurative or
delexical in meaning and the nouns are semantically related, though there
are no existing overlapping collocations.

'*appropriate _policy_ to be _taken_ with regard to inspections'

TAKE steps
ADOPT a policy
...

It is remarkable, firstly, that NNS writers produce many fewer blends than
overlaps and, secondly, that it is the more proficient (by informal
assessment) who produce them."

What I understand Howarth to be saying is that "overlaps" tend to be
produced first, and conceptual "blends" only later. It is the opposite of
what we would expect if language learning started by combining words
according to general concepts.
>>>

So on that evidence, yes, the internal similarity mechanism you suggest
might be a valid one. It might be used. But the mechanism I'm
suggesting, similarity based on observed shared context/prediction, at
least appears to exist, based on what we observe from Howarth's
"overlaps", and might be a stronger mechanism. For natural language,
anyway.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T5d6fde768988cb74-M8c23a2e9598669e33bc3f173
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] The next advance over transformer models

2022-06-30 Thread Rob Freeman
On Thu, Jun 30, 2022 at 1:51 PM Rob Freeman 
wrote:

> On Thu, Jun 30, 2022 at 1:33 PM Boris Kazachenko 
> wrote:
>
>> ...
>> My alternative is to directly search for shared properties: lateral
>> cross-comparison and connectivity clustering.
>>
By the way, independently of what shared properties your connectivity
signifies, what method do you use to do the "connectivity clustering" over
it?

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T5d6fde768988cb74-M2477347af5a8d6e5b9d4b471
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] The next advance over transformer models

2022-06-30 Thread Rob Freeman
On Thu, Jun 30, 2022 at 1:33 PM Boris Kazachenko  wrote:

> On Thursday, June 30, 2022, at 3:00 AM, Rob Freeman wrote:
>
> I'm interested to hear what other mechanisms people might come up with to
> replace back-prop, and do this on the fly..
>
>
> For shared predictions, I don't see much of an alternative to backprop, it
> would have to be feedback-driven anyway. Which means coarse and wasteful.
>
> My alternative is to directly search for shared properties: lateral
> cross-comparison and connectivity clustering. It's a lot more complex, but
> that complexity is intelligent, it's not just throwing more brute force on
> the same dumb algorithm.
>

Well, I agree with you on the second part: "directly search for shared
properties: lateral cross-comparison and connectivity clustering".

But I don't see why that could not be used to group for shared predictions.

If you made prediction the "shared property", doing a "connectivity
clustering" of shared predictions, which is to say making "prediction" or
sequence the connectivity, then that would group for shared
predictions, wouldn't it?
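
As a toy sketch of what I mean (made-up data, my own illustration):
treat observed predictions as the connectivity itself, and cluster
words that are linked through shared continuations, with no comparison
of internal structure at all.

# Toy sketch: "connectivity clustering" where the connectivity IS the
# shared prediction. Words linked through an observed shared
# continuation land in the same cluster.
from collections import defaultdict

predictions = {               # word -> observed next words (toy data)
    "big":   {"dog", "house"},
    "large": {"house", "margin"},
    "red":   {"wine"},
}

parent = {w: w for w in predictions}
def find(w):                  # union-find root
    while parent[w] != w:
        w = parent[w]
    return w

by_pred = defaultdict(list)
for word, preds in predictions.items():
    for p in preds:
        by_pred[p].append(word)
for words in by_pred.values():            # union words sharing a prediction
    for other in words[1:]:
        parent[find(other)] = find(words[0])

clusters = defaultdict(set)
for w in predictions:
    clusters[find(w)].add(w)
print(list(clusters.values()))            # e.g. [{'big', 'large'}, {'red'}]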

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T5d6fde768988cb74-M84a50c3756ea7ab8b62ca816
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] The next advance over transformer models

2022-06-30 Thread Rob Freeman
On Thu, Jun 30, 2022 at 10:40 AM Ben Goertzel  wrote:

> "what method could directly group hierarchies of elements in language
> which share predictions?"
>
> First gut reaction is, some form of evolutionary learning where the
> genomes are element-groups
>
> Thinking in terms of NN-ish. models, this might mean some Neural
> Darwinism type approach for evolving the groupings
>

Neural Darwinism? I noticed in your "Chaotic Logic" book from 1994 that you
were a fan of Edelman, Ben.

I don't know if Edelman's mechanism could flip structures from sentence to
sentence, though. Could it?

Actually, you already know what mechanism I think evolution stumbled on for
grouping elements which share predictions, on the fly. I wanted to keep
some distance between my speculated mechanism and the general idea that
transformers might be generating a different grammar for each sentence. I'm
interested to hear what other mechanisms people might come up with to
replace back-prop, and do this on the fly..

But as you know, I do already have a candidate mechanism in mind.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T5d6fde768988cb74-M5631520a5068e4217c45c2a3
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] The next advance over transformer models

2022-06-29 Thread Rob Freeman
On Wed, Jun 29, 2022 at 11:14 PM James Bowery  wrote:

> To the extent that grammar entails meaning, it can be considered a way of
> defining equivalence classes of sentence meanings.  In this sense, the
> choice of which sentence is to convey the intended meaning from its
> equivalence class is a "special rule" for that particular sentence.  Is
> that what you're getting at?
>

Possibly. There are lots of "special rules" in language for sure: "ham and
eggs", not "eggs and ham", that kind of thing, or "strong tea" not
"powerful tea" etc. It's a randomness. And looked at on the flipside, I
suggest possibly the kind of "sensitivity to initial conditions" which is
such a powerful well of structure for chaotic systems.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T5d6fde768988cb74-M83d1d7bb79bea941bf536868
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: The next advance over transformer models

2022-06-29 Thread Rob Freeman
On Wed, Jun 29, 2022 at 11:11 PM Boris Kazachenko 
wrote:

> On Wednesday, June 29, 2022, at 10:29 AM, Rob Freeman wrote:
>
> You would start with the relational principle those dot products learn, by
> which I mean grouping things according to shared predictions, make it
> instead a foundational principle, and then just generate groupings with
> them.
>
>
> Isn't that what backprop does anyway?
>

They may use a pre-learned relational principle in a sense. Pre-training,
you say?

But they then revert to back-prop again?

That's fine. I guess back-prop would go on to learn hierarchy, just as it
learned the grouped predictions at any "pre-training" level.

But that is not quite the application as a foundational principle I
was talking about. If you actually, actively, substitute things which share
predictions, that's a little more foundational than just using initial
groupings as a basis for more back-prop.

You could do either. I'm suggesting that if you end up getting a different
grammar for each sentence, the second way, just actively substituting
things which share predictions, and not doing more back-prop on initial
groupings, is a more efficient way to do it. Because going the active
substitution way just generates groupings for each sentence as you go.
Back-prop is trying to optimize over the whole data-set. If there are not
actually that many global optimizations, if there are actually an infinite
number of chaotically expanding, global optimization attractors, it becomes
horribly inefficient.

It's a hypothesis anyway. I don't know if anyone has looked at any
hierarchy generated by a transformer, as in the paper I linked, and checked
to see if it is a different hierarchy for each sentence.

Personally, the fact these things are just churning out more and more,
billions, of parameters, is a hint to me.

But I don't know if anyone has checked.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T5d6fde768988cb74-Md02add195b2316d212728693
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: The next advance over transformer models

2022-06-29 Thread Rob Freeman
On Wed, Jun 29, 2022 at 2:19 PM John Rose  wrote:

> ...Bob Coecke’s spidering and togetherness goes along with how I think
> about these things. The spidering though is a simplicity, a visual
> dimension reduction itself for symbolic communication coincidentally like a
> re-grammaring of representation. But I like it a lot, it's great, the
> ZX-calculus, etc..
>

I like Coecke's work. It's putting a finger on this structural
indeterminacy which I think has been preventing us from finding adequate
representations for cognition, since forever.

But Coecke is starting top down. That makes the formal representation
implications clear, but complicates the computational problem.

I think it is more powerful to start from the learning procedure
perspective. Then these things are generated organically, and you don't
have to go to special maths to represent them.

Given the learning procedures, there is no reason to sweat bricks to group
general representation formally, and then sweat bricks using quantum
computing to collapse it down to specific cases again. You can just
generate the one you want, when you want it.

I hypothesize that transformers are generating structures which might well
be grouped formally in the ways Coecke does. It's just hidden because we
don't pay attention to the internal structures at all. So transformers
might actually be a representation for Coecke's formalisms.

The problem with transformers in that case, is only that they are trying to
collapse all possible observations beforehand.

You might use the same learning procedure that the transformers use, but do
it only at need when presented with a particular problem.

Well, you wouldn't use exactly the learning procedures transformers use.
You wouldn't use back-prop on a dot product to learn the relational
principle implicitly from the task. You would start with the relational
principle those dot products learn, by which I mean grouping things
according to shared predictions, make it instead a foundational
principle, and then just generate groupings with them.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T5d6fde768988cb74-Mcba94316fd2639f8b7d5d45a
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: The next advance over transformer models

2022-06-29 Thread Rob Freeman
On Wed, Jun 29, 2022 at 2:19 PM John Rose  wrote:

> ...
> Sorry, I meant that it sounds like an “intuition” mechanism that would be
> grouping hierarchies of elements in language which share predictions,
>

You might call our sense of what structures are "correct" in language an
intuition, I guess.

It's sort of central to the "subjective primitives" idea that word meanings
will be variable (meaning/grammar, two sides of the same coin.) So it's not
going to make a lot of sense to stress over exact word meanings.

If it suits you to call it intuition, it probably is (and isn't!) in
different ways.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T5d6fde768988cb74-M290b1626804f207a49acb6c5
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: The next advance over transformer models

2022-06-27 Thread Rob Freeman
On Tue, Jun 28, 2022 at 6:25 AM John Rose  wrote:

> ...
> On Saturday, June 25, 2022, at 6:58 AM, Rob Freeman wrote:
>
> If all the above is true, the key question should be: what method could
> directly group hierarchies of elements in language which share predictions?
>
>
> Is this just intuition?
>

If you're asking whether it is just intuition that the grammar learned
might be different for each sentence, I would say it's more of a
hypothesis. I think there is evidence formal grammars for natural language
cannot be complete. So for me that motivates the hypothesis beyond the
point I would call it intuition.

As support for the hypothesis I cite the history of linguistics going back
to:

1) Chomsky's rejection of learning procedures and assertion any formal
system for natural language grammar must be innate, not least because what
is observed, contradicts:

Part of the discussion of phonology in ’LBLT’ is directed towards showing
that the conditions that were supposed to define a phonemic representation
... were inconsistent or incoherent in some cases and led to (or at least
allowed) absurd analyses in others." Frederick J. Newmeyer, Generative
Linguistics a historical perspective, Routledge 1996.

2) Sydney Lamb's counter that an alternative explanation was that the
problem was "the criterion of linearity".

3) More recent work, including that done by OpenCog's own Linas Vepstas,
that:

Vepstas, “Mereology”, 2020: "In the remaining chapters, the sheaf
construction will be used as a tool to create A(G)I representations of
reality. Whether the constructed network is an accurate representation of
reality is undecidable, and this is true even in a narrow, formal, sense."

4) Bob Coecke's work on "togetherness", and a "quantum" quality to grammar:

Coecke: "we argue for a paradigmatic shift from `reductionism' to
`togetherness'. In particular, we show how interaction between systems in
quantum theory naturally carries over to modelling how word meanings
interact in natural language."

And more simply, my own experience, that when learning grammar
(by distributional analysis), there is typically more than one,
contradictory, way to do it (which is just a power of sets, and to be
excluded, more than justified.)

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T5d6fde768988cb74-M385b2f935978c6bbd0e8d353
Delivery options: https://agi.topicbox.com/groups/agi/subscription


[agi] The next advance over transformer models

2022-06-25 Thread Rob Freeman
I've been taking a closer look at transformers. The big advance over LSTM
was that they relate prediction to long distance dependencies directly,
rather than passing long distance dependencies down a long recurrence
chain. That's the whole "attention" shtick. I knew that. Nice.
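
For anyone who wants the mechanics: the core of "attention" is just a
scaled dot-product taken over all positions at once,
softmax(QK^T/sqrt(d))V, so a dependency 200 tokens back is one lookup
away rather than 200 recurrence steps. A minimal numpy sketch of the
standard formulation, nothing specific to any particular model:

import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)    # every position scores every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V               # weighted mix of values, any distance away

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16))         # 8 toy token vectors, 16 dims each
print(attention(X, X, X).shape)      # (8, 16)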

But something I was less aware of was that having broken long distance
dependencies from the recurrence mechanism seems to have liberated them to
go wild with directly representing dependencies. And with multi layers it
seems they are building hierarchies over what they are "attending" to. So
they are basically building grammars.

This paper makes that clear:

Piotr Nawrot, Hierarchical Transformers are More Efficient Language Models.
https://youtu.be/soqWNyrdjkw

They show that middle layers of language transformers explicitly generalize
to reduce dimensions. That's a grammar.
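
My gloss on the shortening idea, worth checking against the paper: the
middle layers pool adjacent positions, so the sequence itself gets
shorter, a generalization over positions. Roughly:

import numpy as np

# Rough sketch of "shortening": a middle layer pools pairs of adjacent
# token vectors, halving sequence length (my gloss, not the paper's code).
X = np.random.default_rng(0).normal(size=(8, 16))   # 8 tokens, 16 dims
shortened = X.reshape(4, 2, 16).mean(axis=1)        # 4 merged positions
print(shortened.shape)                               # (4, 16)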

The question is, whether these grammars are different for each sentence in
their data. If they are different they might reduce the dimensions of
representation each time, but not in any way which can be abstracted
universally.

If the grammars generated are different for each sentence, then the
advantage of transformers over attempts to learn grammar, like OpenCog's,
will be that ignoring the hierarchies created and focusing solely on the
prediction task, frees them from the expectation of universal primitives.
They can generate a different hierarchy for each data sentence, and no-body
notices. Ignorance is bliss.

Set against that advantage, the disadvantage will be that ignoring the
actual hierarchies created means we can't access those hierarchies for
higher reasoning and constraint using world knowledge. Which is indeed the
problem we face with transformers.

And another disadvantage will be the equally known one that generating
billions of subjective hierarchies in advance is enormously costly. And the
less known one dependent on the subjective hierarchy insight, that
generating hierarchies in advance is enormously wasteful of effort, and
limiting. Because there will always be a limit to the number of subjective
hierarchies you can generate in advance.

If all this is true, the next stage to the advance of transformers will be
to find a way to generate only relevant subjective hierarchies at run time.

Transformers learn their hierarchies using back-prop to minimize predictive
error over dot products. These dot products will converge on groupings of
elements which share predictions. If there were a way to directly find
these groupings of elements which share predictions, we might not have to
rely on back-prop over dot products. And we might be able to find only
relevant hierarchies at run time.

So the key to improving over transformers would seem to be to leverage
their (implicit) discovery that hierarchy is subjective to each sentence,
and minimize the burden of generating that infinity of subjective
hierarchies in advance, by finding a method to directly group elements
which share predictions, without using back-prop over dot products. And
applying that method to generate hierarchies which are subjective to each
sentence presented to a system, only at the time each sentence is presented.

If all the above is true, the key question should be: what method could
directly group hierarchies of elements in language which share predictions?
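
One naive reading of that question, offered only as a sketch of the
kind of thing meant, not as an answer: take each element's observed
prediction distribution from a toy corpus and group elements directly
by the similarity of those distributions, with no back-prop and no
trained dot products.

# Naive sketch: group words directly by the similarity of their observed
# next-word distributions -- no back-prop, no trained dot products.
from collections import Counter, defaultdict
from math import sqrt

corpus = ("the cat sat on the mat the dog sat on the rug "
          "a cat ate a dog ate").split()

nexts = defaultdict(Counter)          # word -> distribution over next words
for a, b in zip(corpus, corpus[1:]):
    nexts[a][b] += 1

def cosine(p, q):
    shared = set(p) & set(q)
    num = sum(p[w] * q[w] for w in shared)
    den = sqrt(sum(v * v for v in p.values()))
    den *= sqrt(sum(v * v for v in q.values()))
    return num / den if den else 0.0

def group(word, threshold=0.5):
    # ad-hoc grouping for one presented word: everything sharing enough
    # of its predictions, computed at the moment it is needed
    return {w for w in nexts if cosine(nexts[word], nexts[w]) >= threshold}

print(group("cat"))                   # e.g. {'cat', 'dog'}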

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T5d6fde768988cb74-Mcc9c079782e1c06676c055ea
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] AGi Discussion Forum sessions -- semantic primitives (Mar 18) and formalization of MeTTa (April 8)

2022-03-16 Thread Rob Freeman
Jean-Paul,

On Tue, Mar 15, 2022 at 1:42 PM Jean-Paul VanBelle via AGI <
agi@agi.topicbox.com> wrote:

> Strange that you didn't reference Schank and conceptual dependency theory
> (1975) which appeared to be quite successful at representing huge amounts
> of human knowledge with a very small number of semantic primitives - and
> his was an AI effort, not a linguistic approach.
>

If I attempted a comprehensive summary of the search for semantic
primitives I'd need not a book but a library!

The search within linguistics alone is voluminous.

And I actually think the search within linguistics is arguably more
interesting than a specifically AI directed effort. That's because I would
argue language has a unique relation to thought.  In language alone are the
processes of thought arguably made physical.

Mathematics perhaps makes products of thought physical. But not necessarily
the processes.

So language offers us a unique opportunity and artifact.

But the search for primitives within language alone is voluminous.

And even the break into a specific search for semantic primitives (as
opposed to structural or functional primitives...) has resulted in at least
4 or 5 different branches. (That break being called the Linguistic Wars
BTW, and has indeed merited at least one book.)

To digress, 'cos just looking at the forms of these things is fascinating.
Like a kaleidoscope. Just to sketch some of the menagerie people have
resolved for semantic shapes within linguistics. For curiosity's sake, I
quite like Leonard Talmy's branch. I've always been fascinated by his
categorization of languages into satellite-framed vs. verb-framed languages
for a distinct mental difference between languages in terms of perceived
"primitives'', exemplified by English which refers to actions mostly in
terms of the manner of acting, specified as an actual action only by the
addition of a "satellite". So, in English you can "roll down" a hill, where
in French you must "descend" and specify "rolling" separately. And his
analysis of an emphasis in Native American languages on an equation between
action and actor, so an action always actually IS its actor, with rare
examples in English like "rain" (What does rain do? Rain.)

On a more substantive note with particular relevance to Talmy, and showing
how without Chomsky the field might have branched into exploring the
dynamical systems implications of problems with learning procedures, you
can read Wolfgang Wildgen's "Dynamic Turn" which specifically relates
aspects of a specific relating of Talmy's Force Dynamic search for
primitives, to a need for a re-appraisal of meaning in terms of dynamical
systems:

Wildgen: The "dynamic turn" in cognitive linguistics
https://varieng.helsinki.fi/series/volumes/03/wildgen/index.html

There are all sorts of fascinating "primitive" patterns you can resolve
depending on the lens you use.

Lakoff is another branch of that search for semantic primitives. While he's
built beautiful analyses of hierarchies of meaning in metaphor, to my last
knowledge he was seeking semantic primitives at the level of the neuron. So
his only "primitives" end up being actual embodiments. Something like the
idea that the "primitive" is the world.

There's others: cognitive, frame, embodied...

And that's just the branch of linguistics which sought primitives for
meaning specifically. The search for primitives in linguistics has ranged
far and wide. You have an entire branch of linguistics which sought
primitives in function, not action: the why, not the what.

Chomsky himself sallied on looking for primitives in structure, not
meaning, and not function. A search which had many "primitive" iterations:
transformational, principles & parameters, minimalist.

You might say Chomsky was really THE great PRIMITIVES guy in linguistics.

And this is again the interesting conversation. Because without Chomsky it
is interesting to imagine how things might have gone. Better maybe. Because
what is interesting about Chomsky is that he pointed out that IF primitives
exist, they MUST be innate, BECAUSE those that are observed CONTRADICT.

That's a big thing. He doesn't get enough credit for that. We've forgotten
it. After a while machine learning came back. But forgot it.

In practice it's forgotten. Though Chomsky does pop up from time to time
insisting that MACHINE LEARNING CANNOT WORK. By which he still means
machine learning of primitives, of course, because that is all anyone ever
expects to find.

But it's poignant. I don't know whether to laud him or lament him for it.
Because before him the path of linguistics was, essentially, machine
learning. This is the 1950s. Then Chomsky comes along with this observation
that machine learning of language structure results in contradictions.

Faced with the observation that machine learning led to contradictions, the
field might have gone either way. Had we been more familiar with complex
systems at that time, people might have simply 

Re: [agi] AGi Discussion Forum sessions -- semantic primitives (Mar 18) and formalization of MeTTa (April 8)

2022-03-15 Thread Rob Freeman
On Mon, Mar 14, 2022 at 11:48 PM Ben Goertzel  wrote:

> The dynamically, contextually-generated pattern-families you describe
> are still patterns according to the math definitions of pattern I've
> given ...
>

Good.

Then your definition can embrace my hypothesis that cognition is an
expansion of the world, not a compression?

It seems a pity nobody has tried it.


> And yeah Coecke's category-theoretic explorations closely relate to my
> comments on paraconsistent logic etc.


Good again. It is nice that I don't see you actually contesting any of my
points.

I'm beginning to see how you can continue to seek "primitives" despite the
evidence that meaning, even what appear to be meaning "primitives", are
subjective, contradict, and are actually constructed.

If you insist on resolutely finding Goedel incompleteness "not shocking",
by the "not shocking" expedient of redefining logic to be "paraconsistent
logic", and negation to be "intuitionistic negation"i, then perhaps
superficially much formalism can remain the same. Equally, by
paraconsistency we might expect paraconsistent primitives to be not quite
the same as each other, and reach the equally "not shocking" conclusion
that "primitives" too will contain a plurality of structure.

Perhaps even an infinity of structure. Which is a sub-class of my point
that there might be an infinity of meaning.

Anyway, your formulation too seems to concede that "primitives" will have
internal structure.

"Primitives" which have internal structure sounds a bit to me like the flat
earther who continues to be a flat earther by redefining flat to be curved.
But OK.

You have a point that such mountain peaks of semantic complexity which this
definition of "primitives" implies, do commonly have a broad consistency.
But once you've conceded they have internal structure, why not construct
them instead? What is the true "primitive", the true simplicity of the
system? Is it the structure, or the principle of construction?

But yeah, possibly you can keep your top-down formalism. The infinity might
be only within a finite number of "primitive" super classes. Though you
might have to perform further violence to your definitions by allowing
contradiction to become equated with consistency. Or have you done that
already? Anyway, some kind of top down formalism may still be appropriate.

The problem remains how to find those highly structured "primitives".

If you agree your move to paraconsistent logic is closely related to
Coecke's category-theoretic explorations, what do you think of Coecke's
comments that he is recently moving away from such elaborated top down
structure, and seeking to derive everything from observations?

Coecke: "We are adapting everything now to learn structure from
observations, and we abstract away quantum theory."
https://twitter.com/coecke/status/1450393276918468611

Which I interpret to be something like the content of this talk:

(QNLP20) Bob Coecke and Vincent Wang: Redrawing grammar as compositional
circuits
https://www.youtube.com/watch?v=XFR14CdsLp4

And to the extent I understand that, they appear to reduce quantum
formalism to a circuit, which is to say a network, and then potentially
group together the network on an ad-hoc basis, without the need for fixed
formalism, and indeed fixed "primitives", top down.

Which sounds to me similar to my suggestion for ad-hoc clusterings of
networks. Though as far as I can see, still lacking an actual principle of
clustering. But at least conceding the clusterings can be, indeed must
be, ad-hoc, and maybe growing, expanding.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T0f3dcf7070b3a18e-Md7c6778830275a91bee89de7
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] AGi Discussion Forum sessions -- semantic primitives (Mar 18) and formalization of MeTTa (April 8)

2022-03-14 Thread Rob Freeman
On Mon, Mar 14, 2022 at 9:18 PM Ben Goertzel  wrote:

> ...
> Well I am working pragmatically with the notion that the meaning of
> concept C to mind M is the set of patterns associated with C in M.


I like your pattern based conception of meaning. Always have. It's a great
improvement on symbolic meaning.

But I came to conclude that patterns don't tell the whole story. The fuller
story is that patterns can grow. Glider guns, and all that.
In comparison with you, then, you might say I've moved to a notion that:

The meaning of concept C to mind M is... the grouping of a set of patterns in
a certain way, which you can choose to call C if you wish, but it doesn't
necessarily need a label, because you can generate it at will, and the
exact set you get may differ according to the context M finds itself in.

(That "way" of grouping typically in linguistics has been according to
shared contexts. Transformers work basically the same way. But they try to
label all the "C" too.)

One way to see the contrast between the static set model and the actively
grouping set model is the one made by Romain Brette, who contrasts
"representational" vs. "structural" models. Roughly "representational"
models are sets, while "structural" models are assembled according to
grouping principles:

http://romainbrette.fr/perceptual-invariants-representational-vs-structural-theories/

Brette's contrast is also in line with some views of the Active Inference
line of thought, for whatever that's worth, e.g. Maxwell Ramstead:

"The traditional view of cognitive processing ... is that the brain is
essentially an aggregative, bottom up, feature detector ... the more
sensory areas which are supposed to be lower down on the processing
hierarchy ... essentially detect features ... "

"What we think is that this is maybe just the wrong way to look at the
problem. ... The predictive processing view or the active inference view
flips this on its head. And says OK well maybe this top down thing is what
the brain is mainly engaged in."

http://www.youtube.com/watch?v=WzFQzFZiwzk=17m58s

There are other versions of this approach elsewhere. How many do you want?

Encapsulate Goedel incompleteness by creating an entirely new sense for
logic and negation if you like. Sounds like a basis transformation to me.

Sounds to me similar to the way Category Theory embraces Goedel's theorem
by building a basis for mathematics in variability. Which is the line I see
Coecke taking. So Coecke's "Togetherness" might equate to your
"paraconsistent logic".

Ought to be any number of ways you can formalize them top-down.

But if the sets are generated, it will always be easier to generate them
bottom up. We're already doing it. We just need to stop being surprised
when we get an infinite number, and that they contradict.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T0f3dcf7070b3a18e-Mb6da0dcf0b676d2eb2588841
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] AGi Discussion Forum sessions -- semantic primitives (Mar 18) and formalization of MeTTa (April 8)

2022-03-14 Thread Rob Freeman
On Mon, Mar 14, 2022 at 4:47 PM Ben Goertzel  wrote:

> Whether and in what sense semantic primitives can be found depends
> wholly on the definitions involved right?
>
> Crudely, define ps(p,e) as the number of primitives that is needed to
> generate p% of human concepts within error e
>

That's one way you could define the problem.

But what assumptions are you baking into that?

Are you assuming that the set of human concepts is finite? Are you assuming
that human concepts do not contradict? Are you assuming it is meaningful to
measure an "error" of concept e? Are you assuming you can even count
concepts? Doesn't counting incorporate an assumption of equivalence, and
error?

Is it really useful to make such abstract assertions about meanings, and
then work backwards to decide what they have to be to fulfil those
assertions?

Let's look instead at definitions of meaning which have proven useful. Then
we don't have to make abstract assertions about them.

For a mathematical sense of meaning we have Goedel's proof that any
sufficiently powerful system will be incomplete.

That's maths.

My history comes from a linguistic sense of meaning. In the linguistic
domain I find evidence that primitives derived for linguistic meaning,
highly successful ones, constructing phonemes etc, contradict.

So that's two senses of "meaning", actual useful senses, where I say there
is evidence that semantic primitives do not exist: mathematical, and
(structural) linguistic.

I think Coecke is broadly speaking coming at linguistic meaning from a
Category Theoretic sense. So that might be seen as a marriage of the two.
Though I personally don't think the mathematical perspective is the most
useful one.

The linguistic perspective is bottom up. And I think far more suggestive of
practical solutions.

I don't know what other evidence you want that semantic primitives can't be
found. I would argue that, given eyes to see it, the insight that semantic
primitives don't exist has already been the single largest assertion of
philosophy, for centuries now.

Stephen Hicks traces it to Kant:

"In the history of philosophy, Kant marks a fundamental shift from
objectivity as the standard to subjectivity as the standard."
https://www.stephenhicks.org/2010/01/19/why-kant-is-the-turning-point-ep/

Though philosophers are kind of lost. They are full of it, Heidegger,
Wittgenstein, Hegel(?)... Basically all philosophers really. Russell came
across it while trying to find primitives for maths... But philosophers,
because they are trapped within thinking itself, can't actually find a
solution to this problem. So they tie themselves in knots (and society,
while they're about it...)

We in AI have the advantage. We can express our theories in physicality. We
can show how meaning can emerge, even if it contradicts. We can offer them
a solution.

But we have to accept the problem before we can offer the solution!

Actually the "problem" is a solution! It's a feature not a bug. It actually
explains a lot. It gives us an explanation for things like consciousness
(expansion: larger than itself), freewill (larger than what it starts
with), creativity (getting larger again.)

If we wanted semantic primitives to exist we'd have to give up all these
things! We'd have to still be puzzled by consciousness, and freewill, and
creativity... Oh wait. We are...

It's not even that hard to find this expanding structure. They're actually
falling out of the woodwork all around us. We get too much structure. GPT-3
175B parameters. And it contradicts. WTF?!!

Potentially all we need to do is accept this profusion of contradictory
structure our "learning" procedures give us is the expansion that it seems
to be. We can go on finding meaningful structure the same way. Only not be
puzzled it doesn't reduce to primitives anymore, which it never did!

But, by all means. Barge on. Seek semantic primitives. Start with a
mathematical assertion and work backwards to the way the world ought to be.
Who needs a solution for freewill, consciousness, creativity, one-shot
learning... If semantic primitives suit your mathematical conception of how
the problem ought to be formulated, then I'm sure that's the way the world
should be, even if it's not!

Just quietly in the wings here, amused to see the topic of semantic
primitives come up again.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T0f3dcf7070b3a18e-Md6f20b5b9dfba408d4d0e91c
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] AGi Discussion Forum sessions -- semantic primitives (Mar 18) and formalization of MeTTa (April 8)

2022-03-14 Thread Rob Freeman
In my presentation at AGI-21 last year I argued that semantic primitives
could not be found. That in fact "meaning", most evidently by the
historical best metrics from linguistics, appears to display a kind of
quantum indeterminacy:

Vector Parser - Cognition a compression or expansion of the world? - AGI-21
Contributed Talks
https://youtu.be/0FmOblTl26Q

I said I was glad that this now appeared to no longer be an entirely cracked
suggestion, and that finally others were commenting along similar lines.
For example, Bob Coecke:

From quantum foundations via natural language meaning to a theory of
everything
Bob Coecke
https://arxiv.org/abs/1602.07618

I even cited comments by Linas Vepstas within OpenCog as finally
recognizing issues along these lines:

Vepstas, “Mereology”, 2020: "In the remaining chapters, the sheaf
construction will be used as a tool to create A(G)I representations of
reality. Whether the constructed network is an accurate representation
of reality is undecidable, and this is true even in a narrow, formal,
sense."

On Mon, Mar 14, 2022 at 2:37 PM Ben Goertzel  wrote:

> The next couple AGI Discussion Forum sessions:
> 
> https://wiki.opencog.org/w/AGI_Discussion_Forum#Sessions
> 
> March 18, 2022, 7AM-8:30AM Pacific time: Ben Goertzel leading
> discussion on semantic primitives ,
> https://singularitynet.zoom.us/my/benbot . Background:
> https://bengoertzel.substack.com/p/can-all-human-concepts-be-reduced?s=w
> 
> April 8, 2022, 7AM-8:30AM Pacific time: Jonathan Warrell on "A
> meta-probabilistic-programming language for bisimulation of
> probabilistic and non-well-founded type systems" (aka an elegant
> general math formulation underlying MeTTa language) ...
> https://singularitynet.zoom.us/my/benbot . Background material: To be
> posted
> 
> -- ben
> 
> --
> Ben Goertzel, PhD
> b...@goertzel.org
> 
> "My humanity is a constant self-overcoming" -- Friedrich Nietzsche

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T0f3dcf7070b3a18e-Mbfa90201f0501ef181ceaf09
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] All Compression is Lossy, More or Less

2021-11-13 Thread Rob Freeman
Erratum: *"Even OpenAI has embraced this idea to an extent. As I cite in my
talk"

Sorry, that should read OpenCog. I don't think OpenAI has embraced it. It
would be nice if they did.

On Sun, Nov 14, 2021 at 7:52 AM Rob Freeman 
wrote:

> Hi John,
>
> I probably should have read this thread earlier.
>
> I agree with your insight. I have been pushing this idea that cognition,
> or at least specifically natural language grammar, is lossy, for some time
> now. Matt Mahoney may remember me pushing it re. the Hutter Prize to
> compress language, when that came out.
>
> And yes, this relates to the idea that "true and false don't purely exist
> as crisp booleans". Which actually has become a big theme in philosophy,
> and is tearing society apart right now.
>
> But I suggest a re-brand. More recently I've started expressing it not so
> much as the idea that cognition is lossy, but more that cognition is an
> expansion.
>
> If you think of cognition as an expansion I think you'll get most of the
> lossy compression insight you are seeing. In short, if cognition is an
> expansion, details matter.
>
> There is now a handful of work which I think you can interpret this way:
>
> Tomas Mikolov - "We can design systems where complexity seems to be
> growing".
> Bob Coecke - "Togetherness". And a thread of quantum cognition emphasizing
> subjectivity of category. Which is maybe not quite expansion, but it has
> the rejection of abstraction aspect.
>
> And of course I have such a model which I presented most recently at
> AGI-21:
>
> Vector Parser - Cognition a compression or expansion of the world? -
> AGI-21 Contributed Talks
> https://youtu.be/0FmOblTl26Q
>
> Even OpenAI has embraced this idea to an extent. As I cite in my talk:
>
> Vepstas, “Mereology”, 2020: "In the remaining chapters, the sheaf
> construction will be used as a tool to create A(G)I representations of
> reality. Whether the constructed network is an accurate representation of
> reality is undecidable, and this is true even in a narrow, formal, sense."
>
> Technically, to avoid arguments about what is lossless and what not, I
> suggest you focus on the decidability result.
>
> Personally, as I describe in my talk, I think it simplifies AI
> tremendously. Roughly comparable to taking all the stuff we have now, but
> turning it upside down. At which point it ceases to be a lot of confusing
> detail, but becomes instead some rather nice, compact, productive
> principles.
>
> Which is nice and inclusive. Because it means that nothing which has been
> done in AI up to this point is really wrong. We've just been interpreting
> it wrong. We can use most of it. And don't need to do a lot of work
> starting from scratch.
>
> But it does mean we need to change the way we think about the problem.
>
> -Rob
>
> On Thu, Nov 4, 2021 at 11:50 PM John Rose  wrote:
>
>> While performing thought experiments on an AGI model I realized that
>> there is no purely lossless compression. Something is always lost. For most
>> practical purposes yes lossless exists. This might sound trivially obvious
>> and non-obvious but it does impact the theory in the model.
>>
>> In other words, I could not imagine any purely lossless compression, it
>> might physically exist I just can't imagine it as I'm not a physicist. So
>> maybe it does exist? or perhaps we just prefer it to be so... I suppose
>> it's the same as saying true and false don't purely exist as crisp
>> booleans. And, exists doesn’t purely exist…so everything is relative. But
>> the implications are enormous when dealing with chaotic and complex systems
>> models. Thus it being trivially obvious and trivially non-obvious or,
>> non-trivially non-obvious... or...
>>
>> Net effect? Zero. Oh wait zero doesn't fully exist now does it. WTH?
>>
>> https://www.youtube.com/watch?v=JwZwkk7q25I
>>
>>
>> *Artificial General Intelligence List <https://agi.topicbox.com/latest>*
>> / AGI / see discussions <https://agi.topicbox.com/groups/agi> +
>> participants <https://agi.topicbox.com/groups/agi/members> +
>> delivery options <https://agi.topicbox.com/groups/agi/subscription>
>> Permalink
>> <https://agi.topicbox.com/groups/agi/T5ff6237e11d945fb-Mce05fa9f1ab04ee9cd87e46f>
>>

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T5ff6237e11d945fb-M7ee24374cc77dabf9bd52fdf
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] All Compression is Lossy, More or Less

2021-11-13 Thread Rob Freeman
Hi John,

I probably should have read this thread earlier.

I agree with your insight. I have been pushing this idea that cognition, or
at least specifically natural language grammar, is lossy, for some time
now. Matt Mahoney may remember me pushing it re. the Hutter Prize to
compress language, when that came out.

And yes, this relates to the idea that "true and false don't purely exist
as crisp booleans". Which actually has become a big theme in philosophy,
and is tearing society apart right now.

But I suggest a re-brand. More recently I've started expressing it not so
much as the idea that cognition is lossy, but more that cognition is an
expansion.

If you think of cognition as an expansion I think you'll get most of the
lossy compression insight you are seeing. In short, if cognition is an
expansion, details matter.

There is now a handful of work which I think you can interpret this way:

Tomas Mikolov - "We can design systems where complexity seems to be
growing".
Bob Coecke - "Togetherness". And a thread of quantum cognition emphasizing
subjectivity of category. Which is maybe not quite expansion, but it has
the rejection of abstraction aspect.

And of course I have such a model which I presented most recently at AGI-21:

Vector Parser - Cognition a compression or expansion of the world? - AGI-21
Contributed Talks
https://youtu.be/0FmOblTl26Q

Even OpenCog has embraced this idea to an extent. As I cite in my talk:

Vepstas, “Mereology”, 2020: "In the remaining chapters, the sheaf
construction will be used as a tool to create A(G)I representations of
reality. Whether the constructed network is an accurate representation of
reality is undecidable, and this is true even in a narrow, formal, sense."

Technically, to avoid arguments about what is lossless and what not, I
suggest you focus on the decidability result.

Personally, as I describe in my talk, I think it simplifies AI
tremendously. Roughly comparable to taking all the stuff we have now, but
turning it upside down. At which point it ceases to be a lot of confusing
detail, but becomes instead some rather nice, compact, productive
principles.

Which is nice and inclusive. Because it means that nothing which has been
done in AI up to this point is really wrong. We've just been interpreting
it wrong. We can use most of it. And don't need to do a lot of work
starting from scratch.

But it does mean we need to change the way we think about the problem.

-Rob

On Thu, Nov 4, 2021 at 11:50 PM John Rose  wrote:

> While performing thought experiments on an AGI model I realized that there
> is no purely lossless compression. Something is always lost. For most
> practical purposes yes lossless exists. This might sound trivially obvious
> and non-obvious but it does impact the theory in the model.
>
> In other words, I could not imagine any purely lossless compression, it
> might physically exist I just can't imagine it as I'm not a physicist. So
> maybe it does exist? or perhaps we just prefer it to be so... I suppose
> it's the same as saying true and false don't purely exist as crisp
> booleans. And, exists doesn’t purely exist…so everything is relative. But
> the implications are enormous when dealing with chaotic and complex systems
> models. Thus it being trivially obvious and trivially non-obvious or,
> non-trivially non-obvious... or...
>
> Net effect? Zero. Oh wait zero doesn't fully exist now does it. WTH?
>
> https://www.youtube.com/watch?v=JwZwkk7q25I
>
>
> *Artificial General Intelligence List *
> / AGI / see discussions  +
> participants  +
> delivery options 
> Permalink
> 
>

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T5ff6237e11d945fb-M830a1567208dc742087a400d
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Meta Type Talk (Hyperon) language description online, w/ talk coming Fri at AGI-21

2021-10-14 Thread Rob Freeman
On Fri, Oct 15, 2021 at 5:19 AM Ben Goertzel  wrote:

> ...
> ... Metta is also a Pali
> word for lovingkindness, which has some AGI ethics resonance.
>

You've led me an etymological dance, Ben:

(https://www.wisdomlib.org/definition/metta)
"Metta (मेत्त) in the Prakrit language is related to the Sanskrit word:
Mātra.
"Prakrit is an ancient language closely associated with both Pali and
Sanskrit."

(https://www.wisdomlib.org/definition/matra#sanskrit)
Sanskrit dictionary:
Mātrā (मात्रा).—1 A measure; see मात्रम् (mātram) above.
...
2) A standard of measure, standard, rule.

3) The correct measure; तस्य मात्रा न विद्यते (tasya mātrā na vidyate)
Mb.13.93.45.

4) A unit of measure, a foot.

5) A moment.

6) A particle, an atom
...

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tfea23e306539da49-M2a8054190c60f6f919f7e300
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] UNDERSTANDING -- Part I -- the Survey, online discussion: Sunday 10 a.m. Pacific Time, evening in Europe, you are invited

2021-09-11 Thread Rob Freeman
On Sun, Sep 12, 2021 at 12:31 PM Mike Archbold  wrote:

> here's a few
>
> https://understand.ai/
>
>
> https://www.forbes.com/sites/cognitiveworld/2020/06/28/machines-that-can-understand-human-speech-the-conversational-pattern-of-ai/
>
>
> https://www.engadget.com/2019-12-04-mit-adept-ai-understands-physics-intuitively.html


The MIT one is interesting. It is mapping a visual scene to some kind of
world model, with laws of physics governing how objects interact.

Still a mapping. But a mapping to rules/laws.

As I recall Dileep George's company, Vicarious, also had a world model... I
think it came down to a model for the way the world can distort. Also a
mapping to rules/laws. I think it helped their object recognition by
generalizing. Two images could be distortions of the same thing.

I believe they labeled that "imagination"!

Both of those are also mappings to categories or simple operations as a
sense of "understanding". But they are slightly more general in the sense
they map to rules/laws instead of fixed patterns.

The closest thing I've found to "understanding" in the sense I use, is not
in AI, but in the reverse engineering field. Finding malicious stuff in
code for the govt:

Christopher Domas The future of RE Dynamic Binary Visualization
https://www.youtube.com/watch?v=4bM3Gut1hIk

I don't think he talks about "understanding". But he is talking about the
need to move from a fixed set of interpretations, even a fixed set of
rules, to something which generates new patterns. What he calls "Dynamic
... Visualization".

That's the tweak on "understanding" I am pushing, too. (Hidden in my
definition by a dynamic interpretation of the word "organization".)

Maybe I should specify the novelty in mine as "A re-organization of
information."

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T2ee04a3eb9a964b5-Me87b21c1dcd9ade2fbe66bef
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] UNDERSTANDING -- Part I -- the Survey, online discussion: Sunday 10 a.m. Pacific Time, evening in Europe, you are invited

2021-09-11 Thread Rob Freeman
On Sun, Sep 12, 2021 at 7:37 AM Mike Archbold  wrote:

> ...
> The reality is that nobody claims their machine is conscious  -- but
> regularly people claim their machine understands, but they don't say
> what that means


Got any examples of people saying their machine understands Mike? I don't
doubt you are right. But I'm curious for concrete examples.

In more ambitious discussion groups like this it may be common.

For state of the art vision I would guess people use the word "recognize"
more often.

Maybe some smart speaker type systems say their system "understands".

In the deep learning context it wouldn't be hard to trace such a claim of
"understanding" back to an assumption that "understanding" assigns a
category, or maps to simple operations, like operations of rules. For
instance when you say "OK Google, remind me..."

Just because it is easy to guess that this is what is meant, does not mean
the question you are asking is not a good question. It makes the assumption
explicit, and causes us to speculate if "a mapping to fixed forms or rules"
is the only possible assumption.

But there may be other examples of people claiming their system
"understands". Any more concrete examples?

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T2ee04a3eb9a964b5-Mf982c04f9eb9faae061467a0
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] UNDERSTANDING -- Part I -- the Survey, online discussion: Sunday 10 a.m. Pacific Time, evening in Europe, you are invited

2021-09-10 Thread Rob Freeman
On Sat, Sep 11, 2021 at 2:25 PM  wrote:

> I can pack all my AI mechanisms down to 1 word, all like 16 of them. Never
> seen anyone do much of this.
>

What's the word?

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T2ee04a3eb9a964b5-M1b52a2c130ed8bb320a06503
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] UNDERSTANDING -- Part I -- the Survey, online discussion: Sunday 10 a.m. Pacific Time, evening in Europe, you are invited

2021-09-10 Thread Rob Freeman
On Sat, Sep 11, 2021 at 12:39 PM Matt Mahoney 
wrote:

> I don't understand why we are so hung up on the definition of
> understanding. I think this is like the old debate over whether machines
> could think. Can submarines swim?
>

It's just shorthand for the continued failure of machines at any number of
human tasks, Matt. Like last mile self-driving, or responding to language
more usefully than Alexa or Siri's brittle one liners.

When we figure out why, you can use any word you like for it.

I agree about the length of definitions though. If you can't pack something
useful down to one or two sentences, there is probably nothing there.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T2ee04a3eb9a964b5-Md7fd01b2c1bef0243c72ddf6
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: AGI discussion group, Sep 10 7AM Pacific: Characterizing and Implementing Human-Like Consciousness

2021-09-10 Thread Rob Freeman
On Fri, Sep 10, 2021 at 2:59 PM Ben Goertzel via AGI 
wrote:

> ah yes these are very familiar. materials!  ;)
>
> Linas Vepstas and I have been batting around Coecke's papers for an
> awfully long time now...


Good. I know I mentioned it to Linas in 2019, and possibly even 2010, but I
didn't know if he was keeping up.

I like Coecke's work because it is pushing awareness of indeterminacy of
symbolic representation into the mainstream.

Though I believe any formalization of the symbolic dynamics is the wrong
way to do it, of course. I think you know that.

I'm sure Coecke's exact tech too will prove pointless in the end. Why
squeeze enormous indeterminacy in by overloading symbols, and then sweat
enormous computing power to try and pick that same indeterminacy apart
again? What is the use of such symbols if they are emergent on the data
anyway? But if you insist on a symbolic representation, quantum mechanical
or category theoretic is probably the form you need.

No point replicating work which is already being done.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T5e30c339c3bfa713-Me1c5062c860c06505c3c21a4
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: AGI discussion group, Sep 10 7AM Pacific: Characterizing and Implementing Human-Like Consciousness

2021-09-09 Thread Rob Freeman
On Fri, Sep 10, 2021 at 2:36 PM Ben Goertzel via AGI 
wrote:

> ...
> Working out the specifics of the Curry-Howard mapping from MeTTa to
> intuitionistic logics, and from there to categorial semantics, is one
> of the things on our plate for the next couple months


Ah, if that is to be worked out then there are a couple of threads of work
which might be useful. I don't recall if I've mentioned them to you
anywhere:

From quantum foundations via natural language meaning to a theory of
everything
Bob Coecke
https://arxiv.org/abs/1602.07618

Bartosz Milewski
Category Theory 1.1: Motivation and Philosophy
https://www.youtube.com/watch?v=I8LbkfSSR58

The second appears to be the start of a lecture series on the relevance of
CT for the design of programming languages.

The first proposes specifically category theoretic formal formulations for
AI/concepts. Starting with natural language. Coecke wraps meaning up in a
kind of CT formalism, and then expects they will need quantum computing to
disambiguate it all. Fold it in using CT, fold it out using quantum
computing.

Coecke's group appears to be getting good funding around Oxford.

-Rob

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T5e30c339c3bfa713-M950a2b7cfd1b28a60b3739df
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: AGI discussion group, Sep 10 7AM Pacific: Characterizing and Implementing Human-Like Consciousness

2021-09-09 Thread Rob Freeman
On Fri, Sep 10, 2021 at 1:49 PM Ben Goertzel via AGI 
wrote:

> ...
> Our OpenCog/SNet team is spending a lot of time on down-to-earth
> stuff, some of which we'll talk about in some future AGI Discussion
> sessions
>
> Mainly
>
> -- design of a new programming language (MeTTA = Meta Type Talk)
> designed to serve as the type system and formalism behind the new
> version of OpenCog Atomspace.  A lot of math/CS work has been done in
> the last 9 months leading up to this design...
>

Meta Type Talk...

Is it a fair guess that there will be an element of Category Theory,
sheaves, and the like, in MTT?

-Rob

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T5e30c339c3bfa713-M1c3c945a62a083fdbefa6e18
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: Advanced robots

2021-08-27 Thread Rob Freeman
On Sat, Aug 28, 2021 at 4:22 AM Ben Goertzel  wrote:

> Matt, "Quantum Associative Memory" is an active research area...
>
> So are reversible NNs, e.g. https://arxiv.org/abs/2108.05862
>
> I think your current view that "learning means writing bits into
> memory." is overly limited...


And that's just a memory model?

There is also Bob Coecke's group around Oxford who are looking at quantum
perspectives which change the conceptualization of the intelligence problem
from the current one of "learning" parameters, entirely, to something he
contrasts as "togetherness". So more active, dynamic, something which
embraces contradictions inherently within itself, e.g:

From quantum foundations via natural language meaning to a theory of
everything
Bob Coecke
https://arxiv.org/abs/1602.07618

So moving right away from the idea of intelligence as some kind of
"learned" weight adjustment system, in the first place.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tf6dddebe1e89183a-M6b44cf935242b95cd7a2270b
Delivery options: https://agi.topicbox.com/groups/agi/subscription


[agi] An uncertainty principle for deep learning?

2020-09-01 Thread Rob Freeman
This came up on Twitter:

Deep Learning’s Uncertainty Principle
Carlos E. Perez
https://medium.com/intuitionmachine/deep-learnings-uncertainty-principle-13f3ffdd15ce

An uncertainty principle for grammar. What I've been arguing for 20 years!

Posting it here now, because to me it appears to be the argument I was
making here last year with Linas Vepstas about the learnability of grammar.
Linas had written a paper trying to explain how networks were different to
grammar learning. He observed in his own paper his representations had
"gauge", but dismissed it as unimportant. He concluded learning grammar had
equivalent power to network representation, so work should proceed by
learning grammar alone.

I argue the success of DL has been because it captures part of this
"uncertainty principle" duality.

And that where DL fails, and what we are still missing, is that to fully
capture such duality we need to make the network generative/dynamic. Which
is what I have been arguing here again more recently: the parameters
contradict, and the number of them expands.

-Rob

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T1b8c0c3b7933a51f-M8a7fd5da725aae0d02980c40
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-08-01 Thread Rob Freeman
On Sun, Aug 2, 2020 at 1:58 AM Ben Goertzel  wrote:

> ...
> ...I also think that the search for concise
> abstract models is another part of what's needed...
>

It depends how you define "concise abstract model". Even maths has an
aspect of contradiction. What does Chaitin call his measure of randomness
in maths..? Chaitin's Omega. And he sees this random character correctly as
a power. A feature not a bug.

Chaitin: "Incompleteness is the first step toward a mathematical theory of
creativity, a bridge between pure mathematics and biology."

http://inference-review.com/article/an-algorithmic-god

So maths has this contradictory truth aspect too. But once you resolve that
with axioms, maths is concise and abstract, yes. You can build a concise
abstract maths on top of that. And sure, it's desirable to do that.


> And I don't think GPT3 is doing this "constantly unpacking structure
> by combining observations, billions of features of it." in the right
> way for AGI ...


No, they're not doing it in the right way. And yes, they will need to
relate it to sensory data to ground words in their turn, in the sensory
world.
That might be closer to the grounding you want.

In particular I don't think string transformations will translate to a
dynamical model. There might be some trick. But likely they'll need to
change their relational principles again. Difficult to find energy minima
of string transformations in real time.

I think the right structuring model will turn out to be causal invariants,
revealed by setting a sequence network oscillating. That should be dead
simple. And tantalisingly suggesting a role for oscillations to match the
observation of "binding by synchrony."

But nobody has ears for that. String transformations are currently where
it's at.


> honestly Rob your own ideas feel way closer to the
> mark...


Thanks. And yes I don't think OpenAI are theoretically close to what I'm
suggesting. You're right.

At best I think they are backing into it, blindly, while looking the other
way. By accident, without theory. They don't have a theory of meaning. They
stumbled into string transformations while trying to enhance RNNs. Now
they're stumbling into more and more parameters, without knowing why they
need to.

But if that's where they are, that's the context I have to talk to. I have
to find what I can of value in that and try to show how I believe it can be
better.

What I can find of value is evidence that the more parameters you generate
the better your model becomes - nobody knows why, I say resolution of
contradiction in context. And secondly that those parameters can be easily
generated from principled relationships between parts - even though they
think you have to list them all in advance.

My own theory is that meaning contradicts. That grasping this has been the
problem for 50 years. (Caused Chomsky to reject learned grammar in the '50s
for instance, and redirect linguistics in the direction of innate
parameters for 60 years!) And to properly capture this contradictory
meaning in context we must constantly generate meaning/parameters from
simple meaningful relationships between parts: causal invariants, string
permutations, transformations... But constantly generating it. A dynamical
system.

Unfortunately the gap between that and the current state-of-the-art is too
great. Unless you can find 9/10 overlap with what people are already doing,
they won't listen. So my best bet is to suggest how folks working on stuff
like GPT-3 might improve what they've done.

My answer is that they might think of calculating their "parameters" in
real time. Not trying to enumerate all the endless billions of them
beforehand.

But what they are doing is not "right". It's lame that they blow $12M trying
to list every possible meaningful construction in advance. But currently
GPT-3, with its 175 billion parameters and growing, is the best evidence I
have that we need to embrace contradiction, and calculate
meaningful parameters all the time, in a "symbolic network dynamics", as
we've discussed before.

-Rob

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-M3eb3c66fee749efaf83a3994
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-08-01 Thread Rob Freeman
On Sat, Aug 1, 2020 at 7:08 PM Matt Mahoney  wrote:

>
> On Fri, Jul 31, 2020, 10:00 PM Ben Goertzel  wrote:
>
>> I think "mechanisms for how to predict the next word" is the wrong
>> level at which to think about the problem, if AGI is your interest...
>>
>
> Exactly. The problem is to predict the next bit. I mean my interest is in
> compression, but you still have to solve high level language modeling.
> Compression is not well suited to solving other aspects of AGI like
> robotics and vision where the signal is dominated by noise.
>

It doesn't matter if predicting the next word is the right level to think
about a given problem Matt. What matters is that this is the first time the
symbol grounding problem has been solved for any subset of cognition. For
any problem. This is the first.

I'm not sure Ben thinks it has been solved. He seems to think words are
still detached from their meaning in some important way. I disagree. I
think these GPT-x features are attaching words to meaning.

Perhaps we need a more powerful representation for that meaning. Something
like a hypergraph no doubt. Something that will be populated by relating
text to richer sensory experience, surely. But the grounding is being done,
and this shows us how it can be done. How symbols can be related to
observation.

That's a big thing. And it is also a big thing that the way it has been
solved is by using billions of parameters calculated from simple relational
principles. So not solved by finding some small Holy Grail set of
parameters in one-to-one correspondence with the world in some way, but by
billions of simple parameters formed by combining observations. And
seemingly no limit to how many you need. It matters that it turned out
there appears to be no limit on the number of useful parameters. And it
matters that these limitless numbers of parameters can be calculated from
simple relational principles.

This suggests that the solution to the grounding problem is firstly through
limitless numbers of parameters which can resolve contradictions through
context. But importantly that these limitless numbers of parameters can be
calculated from simple relational principles.

Given this insight it can open the door to symbol grounding for all kinds
of cognitive structures. Personally I think causal invariance will be a big
one. The solution for language it would seem. Grammar, anyway. I think for
vision too. But there may be others. Different forms of analogy I don't
doubt. But all grounded in limitless numbers of parameters which can
resolve contradictions through context. And those limitless numbers of
parameters all calculated from simple relational principles.

Another way to look at this is to say it suggests to us the solution to the
symbol grounding problem turned out to be an expansion on observation, not
a compression.

You can go on thinking the solution is to find some sanctified Holy Grail
small set of parameters. A God given kernel of cognition. But meanwhile
what is working is just constantly unpacking structure by combining
observations, billions of features of it. The number is the thing. More
than we imagined. And contradicting but resolved in context. Moving first
to networks, then to more and more parameters over networks. That is what
is actually working. Allowing the network to blow out and generate more and
more billions of parameters, which can resolve contradiction with context.

-Rob

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-Md4d2a1a723ce7c2afad4db23
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-08-01 Thread Rob Freeman
How many billion parameters do PLN and TLCG have?

Applications of category theory by Coecke, Sadrzadeh, Clark and others in
the '00s are probably also formally correct.

As were applications of the maths of quantum mechanics. Formally. Does
Dominic Widdows still have that conference? Indeterminacy formalized in
different ways.

But saying stuff is fuzzy, top down, is not the same as resolving that
fuzziness, bottom up.

Natural language grammars drifted to probabilistic frameworks in the '90s.
Never worked worth a damn. That's why Bengio moved to NLM's in the first
place (2003?). Networks, better because... more parameters?

Now, only 20 years later... GPT-3 suggests learning parameters even in the
billions seems to improve things even more. Insight! It seems like there's
no limit to how many parameters are useful. The only problem is the
processing time to list them all beforehand... It is obvious that you need
to list them all beforehand... Right?

You can imagine how OpenAI, or some other team, may finally stumble on the
answer. They'll be pinched for budget, and finally decide it is stupid to
spend $12M calculating 175 billion parameters in advance. When most of them
will never be used. At some point some bright spark will suggest they
calculate relevant parameters at run time.

When they do that they may find they stop worrying about how many
parameters they have. Because they will just be focused on finding
the parameters they need, when they need them.

The actual parameters will be found by some form of existing known
meaningful relationship: permutation, transform, causal invariance... Easy
stuff. Done now. Just the insight will be to do it at run time so they
don't need to worry about how many there are (they may even be surprised to
find they need to be able to contradict.)
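
To be concrete about what "at run time" could mean, here is a toy sketch in
Python (the scoring function and the cache policy are my own illustration,
nothing to do with anybody's actual code):

from collections import defaultdict
from functools import lru_cache

# Toy (word, context) observations standing in for a corpus.
observations = [("cat", "ran"), ("cat", "jumped"), ("dog", "ran"),
                ("dog", "barked"), ("car", "stopped")]

contexts = defaultdict(set)
for w, c in observations:
    contexts[w].add(c)

@lru_cache(maxsize=None)
def relatedness(a, b):
    # A "parameter" computed only when a prediction actually needs it,
    # then cached -- rather than enumerated for every pair in advance.
    shared = contexts[a] & contexts[b]
    union = contexts[a] | contexts[b]
    return len(shared) / len(union) if union else 0.0

# Only the pairs the task touches ever get computed, instead of all
# O(V^2) pairs up front.
print(relatedness("cat", "dog"))  # 0.333...
print(relatedness("cat", "car"))  # 0.0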

From that point symbol binding will be solved, and they'll be free to build
any kind of logic they want on top.

-R

On Sat, Aug 1, 2020 at 3:38 PM Ben Goertzel  wrote:

> Contradictions are an interesting and important topic...
>
> PLN logic is paraconsistent, which Curry-Howard-corresponds to a sort
> of gradual typing
>
> Intuitionistic logic  maps into Type Logical Categorial Grammar (TLCG)
> and such; paraconsistent logic would map into a variant of TLCG in
> which there could be statements with multiple contradictory
> parses/interpretations
>
> In short formal grammar is not antithetical to contradictions at the
> level of syntax, semantics or pragmatics
>
> It is true that GPT3 can capture contradictory and ambiguous aspects
> of language.  However, capturing these without properly drawing
> connections btw abstract patterns and concrete instances, doesn't get
> you very far and isn't particularly a great direction IMO
>
> ben

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-M6fd5ed33f971d3ffc5e24cf9
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-07-31 Thread Rob Freeman
Ben,

By examples do you mean like array reversal in your article?

I agree. This problem may not be addressed by their learning paradigm at
all.

But I disagree this has been the biggest problem for symbol grounding.

I think the biggest problem for symbol grounding has been ambiguity.
Manifest in language.

So I agree GPT-3 may not be capturing necessary patterns for the kind of
reason used in array reversal etc. But I disagree that this kind of
reasoning has been the biggest problem for symbol grounding.

Where GPT-3 may point the way is by demonstrating a solution to the
ambiguity problem.

That solution may be hidden. They may have stumbled on to the solution
simply by virtue of the fact that they have no theory at all! No
preconceptions.

I would contrast this with traditional grammar learning. Which does have
preconceptions. Traditional grammar learning starts with the preconception
that grammar will not contradict. The GPT-x algorithm may not have this
expectation. So they may be capturing contradictions and indexing them on
context, by accident.

So that's my thesis. The fundamental problem which has been holding us back
for symbol grounding is that meaning can contradict. A solution to this,
even by accident (just because they had no theory at all?) may still point
the way.

And the way it points in my opinion is towards infinite parameters.
"Parameters" constantly being generated (and contradiction is necessary for
that, because you need to be able to interpret data multiple ways in order
to have your parameters constantly grow in number 2^2^2^2)

Grok that problem - contradictions inherent in human meaning - and it will
be a piece of cake to build the particular patterns you need for abstract
reasoning on top of that. Eliza did it decades ago. The problem was it
couldn't handle ambiguity.

-Rob

On Sat, Aug 1, 2020 at 9:40 AM Ben Goertzel  wrote:

> Rob, have you looked at the examples cited in my article, that I
> linked here?   Seeing this particular sort of stupidity from them,
> it's hard to see how these networks would be learning the same sorts
> of "causal invariants" as humans are...
>
> Transformers clearly ARE a full grammar learning architecture, but in
> a non-AGI-ish sense.  They are learning the grammar of the language
> underlying their training corpus, but mixed up in a weird and
> non-human-like way with so many particulars of the corpus.
>
> Humans also learn the grammar of their natural languages mixed up with
> the particulars of the linguistic constructs they've encountered --
> but the "subtle" point (which obviously you are extremely capable to
> grok) is that the mixing-up of abstract grammatical patterns with
> concrete usage patterns in human minds is of a different nature than
> the mixing-up of abstract grammatical patterns with concrete usage
> patterns in GPT3 and other transformer networks.   The human form of
> mixing-up is more amenable to appropriate generalization.
>
> ben

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-M89e52548cccd0a556249a9e8
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-07-31 Thread Rob Freeman
On Sat, Aug 1, 2020 at 3:52 AM  wrote:

> ...
> Semantics:
> If 'cat' and 'dog' both share 50% of the same contexts, then maybe the
> ones they don't share are shared as well. So you see cat ate, cat ran, cat
> ran, cat jumped, cat jumped, cat licked..and dog ate, dog ran, dog ran.
> Therefore, probably the predictions not shared could be shared as well, so
> maybe 'dog jumped' is a good prediction.
>

Agree with you on this. This is the basis of all grammar learning.

I now believe it may equate to "causal invariants" (same contexts), and so
possibly learned by position "transforms" or permutations.
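
A minimal sketch of that shared-context generalization, in Python (the toy
corpus, the overlap threshold and the function name are purely my own
illustration, not anyone's actual system):

from collections import defaultdict

# Toy observations of (word, context) pairs, echoing the quoted example.
corpus = [("cat", "ate"), ("cat", "ran"), ("cat", "jumped"),
          ("cat", "licked"), ("dog", "ate"), ("dog", "ran")]

contexts = defaultdict(set)
for word, ctx in corpus:
    contexts[word].add(ctx)

def predicted_extensions(a, b, overlap=0.5):
    # If b shares at least `overlap` of a's contexts, guess that a's
    # remaining contexts transfer to b as well ("dog jumped").
    shared = contexts[a] & contexts[b]
    if len(shared) / max(len(contexts[a]), 1) >= overlap:
        return {(b, ctx) for ctx in contexts[a] - contexts[b]}
    return set()

print(predicted_extensions("cat", "dog"))
# {('dog', 'jumped'), ('dog', 'licked')}

Nothing more than shared contexts licensing substitution, which is all the
quoted paragraph needs.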

So with "transformers" the RNN guys may have stumbled on a true sense of
meaning.

Added to that, the fact that abandoning the original RNN model made it
possible to learn hierarchy, may mean GPT-3 is now learning "grammar", with
hierarchies and all.

So transformers may equate to a full grammar learning architecture.

It's even possible that because they have no guiding theory, they may be
allowing contradictions in their parameters in some way. That would be a
big thing from my point of view. I think the doctrinal rejection of
contradiction is what is holding back those who formally attempt to learn
grammar. Like OpenCog's own historical grammar learning projects.

If GPT-3 is learning grammatical forms which contradict according to
context, the only remaining problem from the point of view of my dogma
would be that it is trying to learn, not generate, grammar using this
(causal invariant, transformer) principle. I think it needs to generate
these 175+ parameters on the fly, not try to enumerate them all beforehand
at a cost of $12M (Geoff Hinton suggests end the search at 4.398 trillion =
2^42 :-)

-Rob

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-Mee5f7150dd41c23eb94f5620
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-07-31 Thread Rob Freeman
I was interested to learn that transformers have now completely abandoned
the RNN aspect, and model everything as sequence "transforms" or
re-orderings.

That makes me wonder if some of the theory does not converge on work I like
by Sergio Pissanetzky, which uses permutations of strings to derive
meaningful objects:

"Structural Emergence in Partially Ordered Sets is the Key to Intelligence"
http://sergio.pissanetzky.com/Publications/AGI2011.pdf

Also interesting because Pissanetzky's original motivation was refactoring
code, and one of the most impressive demonstrations to come out of GPT-3
has been the demo which was created to express the "meaning" of natural
language in javascript.

This could give a sense in which transformers are actually stumbling on
true meaning representations.

-Rob

On Sat, Aug 1, 2020 at 3:45 AM Ben Goertzel  wrote:

> What is your justification/reasoning behind saying
>
> "However GPT-3 definitely is close-ish to AGI, many of the mechanisms
> under the illusive hood are AGI mechanisms."
>
> ?
>
> I don't see it that way at all...

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-Mc9f1fbb6be9850b9bab3d990
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-04 Thread Rob Freeman
On Sat, Jul 4, 2020 at 2:04 PM Ben Goertzel  wrote:

> ...

> I believe we discussed some time ago what sort of chaotic dynamical
> model I think would be most interesting to explore in a language
> learning context, and my thoughts were a little different than what
> you're doing, but I haven't had time/resources to pursue that
> direction yet...
>

That's great.

If I may say, I think the fact that you want to pursue this angle is little
known. Given that I know of no-one pursuing it currently. Nada, zero. That
you yourself have not had the time or resources to move down this path at
all. I think it bears repeating that this is something which you think
might be worth checking out.

It would surprise me if anyone on this list even realized you thought this
was a good direction to go.

A chaotic dynamical model for language structure and meaning. With meaning
emerging as chaotic attractors in some kind of network.

Who is doing anything with this? Who has ever looked at it?

I have a candidate model. And it is extremely simple. However it looks like
it will require massive network parallelism. I can't see how you can get
around that. Ideally a spiking network. The problem is naturally parallel.
The lack of such hardware is what has held me back. Something like Intel's
Loihi might be ideal. If anyone has access to that hardware. SpiNNaker
might work too, but the public interfaces are very poor/bureaucratic.

Any other massively parallel, spiking network platforms anyone knows of
with public access?

-Rob

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T100f708e32ae7327-Me68a702b9df34bfa3139e85e
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-03 Thread Rob Freeman
On Sat, Jul 4, 2020 at 3:28 AM Ben Goertzel  wrote:

> We have indeed found some simple grammars emerging from the attractor
> structure of the dynamics of computer networks, with the grammatical
> forms correlating with network anomalies.   Currently are wondering if
> looking at data from more complex computer networks will yield more
> interesting and abstract grammars... and waiting for a more complex
> dataset from Cisco...
>

So you found grammars which adequately summarize a symbolic dynamics for
Cisco networks, and are still happy with the idea such generalizations will
capture all the important behaviour? You don't think there are some
behaviours of Cisco networks which are only explained at the network level?

A pity. I was hoping a new, starker, contrast between networks and
grammars would give you new insight. Because Cisco networks are not
fundamentally generated by grammars. We know that. So we would not be
surprised to find aspects which do not obey grammar rules. Or perhaps
aspects where they flip from one grammatical abstraction to another.

But by contrast we don't know this for natural language. So we are
constantly surprised when we find aspects of natural language grammar do
not obey grammar rules, or flip from one grammatical abstraction to another.

So the parallel exists there for an insight that natural language structure
might also need to fundamentally be resolved at a network level. That the
network level might be the best level to model its symbolic dynamics too.
So we could flip from one grammatical abstraction to another, as necessary.

If we gained this insight, it would no longer surprise us that the number
of parameters necessary for an adequate grammar continues to explode. Of
course it will. If natural language structure needs to be fundamentally
modeled as a network, then the network will be the smallest representation
for it. "Grammar" will be a production, which might generate essentially
infinite "parameters". GPTx will continue to just get bigger and bigger and
bigger, or resolve poorly, or both.

-Rob

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T100f708e32ae7327-Mf357565fd2859b4971b66226
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] What's With the Anti-AIT Hysteria In Language Modeling?

2020-07-02 Thread Rob Freeman
Ben,

How did the network, symbolic dynamics, work you planned last year work
out? Specifically you said (July 17, 2019):

"...applying grammar induction to languages derived from nonlinear dynamics
of complex systems via symbolic dynamics, is not exactly about artificial
languages, it's about a different sort of "natural language" than the ones
used by humans for communication..."

You wanted to do it in the context of diagnosing problems in conventional
computer networks.

I was interested in it because, then as always, I am advocating that
natural language symbolism must be looked at as a dynamical system, and
can't be abstracted into a grammar, let alone the limited parameters of
GPTx or other.

Did you have any success with symbolic dynamics emerging on its own terms,
without assuming it might be summarized in a grammar?

-Rob

On Thu, Jul 2, 2020 at 1:49 PM Ben Goertzel  wrote:

> ...
> FWIW my preferred approach to grammar induction at present is building
> on this work we did earlier this year,
>
> https://arxiv.org/abs/2005.12533
>
> which is using symbolic grammar induction but guided by a transformer
> NN model used as a sentence probability oracle.   To the extent this
> works well, it will yield a much more concise model than the
> transformer NN model itself, with equal or better predictive power.
>
> However, concision is just one interesting heuristic to me, in this
> context, not the end-all.   The main reason I want a fairly concise
> symbolic grammar is because I know how to do interesting reasoning
> with it ... which is partly because it's a nice symbolic grammar with
> familiar properties, not just because it's concise.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T100f708e32ae7327-M051fc8f54eb0a503d9453c35
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: Google - quantum computers are getting close

2019-10-28 Thread Rob Freeman
On Mon, Oct 28, 2019 at 11:11 AM  wrote:

> Do you mean, instead of feeding the net data and learning, to instead
> request new output data/solutions?
>

You could put it like that.

Without seeing an exact formalization it is hard to say.

You make the example of zebra, horse, dog, mouse, cat. You group them
heterarchically based on sets of shared contexts. (Your innovation seems to
be a more efficient representation for the shared contexts??)

That's OK. But perhaps I can distinguish myself by saying what I do is not
limited to groupings. I don't only group words heterarchically based on
sets of shared contexts. I use the shared contexts to chain words in
different ways.

Saying to look at the way these things chain might capture what is
different in what I'm suggesting.

Because the groupings are a heterarchy, the patterns possible when you
chain them expand much faster. Perhaps something like the way the number of
representations possible with qubits expands, because each element can be
multiple, so their combination can be exponentially more multiple etc.
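
A rough toy of that grouping-versus-chaining distinction, in Python (the
corpus, the grouping rule and the chaining rule are all my own illustration):

from collections import defaultdict

# Toy (word, context) observations.
pairs = [("zebra", "runs"), ("horse", "runs"), ("horse", "neighs"),
         ("dog", "runs"), ("dog", "barks"), ("cat", "barks"),
         ("cat", "purrs"), ("mouse", "runs"), ("mouse", "squeaks")]

contexts = defaultdict(set)  # word -> the contexts it was seen in
groups = defaultdict(set)    # context -> the words seen in it
for w, c in pairs:
    contexts[w].add(c)
    groups[c].add(w)
# The groups form a heterarchy: one word sits in several groups at once.

def chain(word, steps):
    # Chain through the heterarchy: words reachable by repeatedly
    # substituting within any group the current words belong to.
    frontier = {word}
    for _ in range(steps):
        frontier = {v for w in frontier for c in contexts[w] for v in groups[c]}
    return frontier

print(chain("cat", 1))  # {'cat', 'dog'}
print(chain("cat", 2))  # {'cat', 'dog', 'zebra', 'horse', 'mouse'}

Because the groups overlap rather than nest, the patterns reachable by
chaining grow very quickly, which is the expansion I mean here.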

Traditionally language has been looked upon as a hierarchy, so we miss this
complexity. That has been the historical failure of linguistics. Deep
learning also looks for hierarchy. There is the potential for heterarchy in
their representations, but as soon as they try to "learn" structure, that
crystallizes just one of them, and the heterarchy is gone or at least
reduced. Such crystallization of a single hierarchy gives us the "deep"
part of deep learning, and it is also the failure of deep learning.

So, I guess I'm saying, yeah, heterarchy, but chain as a heterarchy too.
Which means abandoning the singular, "learned", hierarchies of deep
learning.

-Rob

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Ta664aad057469d5c-Mfc729479fb75cb9e16f09856
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: Google - quantum computers are getting close

2019-10-27 Thread Rob Freeman
On Mon, Oct 28, 2019 at 1:48 AM  wrote:

> No I meant Word2Vec / Glove. They use a ex. 500 dimensional space to
> relate words to each other. If we look at just 3 dimensions with 10 dots
> (words) we can visualize how a word is in 3 superpositions entangled with
> other dots.
>

Pity. I thought you might be working on it yourself.

The ability to see, if not entanglement, then quantum superposition, as a
property of multi-dimensional word meaning representations, is something
that struck me about 1996. I didn't mention it much because it felt a bit
crackpot, but it was a pleasure to later find others working with the same
insight. I guess you know about the Quantum Interaction conference series
and associated work, e.g:

http://www.newscientist.com/article/mg21128285.900-quantum-minds-why-we-think-like-quarks.html

My current position is that it suggests there may be a more fundamental
description of physics in terms of distributed representations. QM looks
like the observation of some kind of distributed properties at a lower
level. (I later found support for this too. You might look at Robert
Laughlin's work for hints of this in contemporary physics.)

Given that, yes, maybe quantum computers can help with the computation of
language seen as a distributed representation problem. Maybe they can
implicitly use this lower level distributed representation to crunch
language/cognition problems, faster. But if they do I think it will be
because language/cognition is fundamentally a distributed representation,
network, problem, and we need to formulate cognition properly that way
first.

As a first step in that direction, yes, word2vec and GloVe are current
iterations of something which dates back 30 years or more (earlier Latent
Semantic Analysis...)

The world has more closely embraced this distributed representation idea
the last 10 years with the network AI (deep learning) revolution. But as
everyone knows that is mostly 30 year old ideas, made practical by GPU's.

More closely embracing this old network tradition has been good, but there
is still a trick missing.

Since that same 1996 I have thought the missing trick is to treat the
language/cognition problem as one of generating new patterns, rather than
limiting ourselves to learning patterns, in the style of current deep
learning, or indeed word2vec and GloVe. So instead of trying to find
factors in distributed representations the way word2vec and GloVe do,
simply resolving against dimensions as a kind of dot or scalar product (the
basis of deep learning), I suggested we might formulate cognition as
something productive, like a cross or vector product.
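
To make that contrast concrete, a toy numpy sketch (the three-dimensional
vectors are purely illustrative, nothing like real learned representations):

import numpy as np

# Two toy word vectors (illustrative values only).
cat = np.array([0.9, 0.1, 0.3])
dog = np.array([0.8, 0.2, 0.4])

# "Resolving against dimensions": a scalar (dot) product collapses the
# pair into a single similarity score -- the move underlying word2vec,
# GloVe and most deep learning matching.
similarity = cat @ dog / (np.linalg.norm(cat) * np.linalg.norm(dog))

# "Something productive": a cross (vector) product instead generates a
# new vector from the pair -- a new object, not just a score.
combination = np.cross(cat, dog)

print(similarity)   # one number
print(combination)  # a new 3-d vector

The point is only the shape of the two operations: one collapses a pair of
representations to a number, the other expands a pair into something new.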

I haven't published much on this formally, but I do have a couple of
papers. One 20 years old, and the other a more technical update:

Freeman R. J., Example-based Complexity--Syntax and Semantics as the
Production of Ad-hoc Arrangements of Examples, Proceedings of the ANLP/NAACL
2000 Workshop on Syntactic and Semantic Complexity in Natural Language
Processing Systems, pp. 47-50.
http://www.aclweb.org/anthology/W00-0108

Parsing using a grammar of word association vectors
http://arxiv.org/abs/1403.2152

It works. You get meaningful structure from new patterns. But it seems we
need another level of computing power to make it practicable. Like the
GPU's which made the 30 year old, static, distributed representation ideas
practicable in the deep learning revolution. Quantum computing might
provide that power. But probably just massive parallelism is enough. I'm
waiting for some of these spiking network architectures, like Intel Loihi,
to escape into the wild to try it.

Anyway, yeah, I think you're right. There are strong parallels between
dimensional representations of language/cognition and QM.

Word2vec is a 30+ year old idea. But with a tweak, this kind of
distributed/network formulation may provide our answer.

I believe providing that tweak/trick will be the crucial step though. Maybe
understanding how that "trick", missing in 30+ year old word2vec style
formulations, relates to the power of cognition, will unlock the puzzle of
how to usefully program quantum computers too.

-Rob

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Ta664aad057469d5c-Mf9ea805562ffbcd12295f174
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: Google - quantum computers are getting close

2019-10-26 Thread Rob Freeman
On Sun, Oct 27, 2019 at 12:13 PM  wrote:

> Better put, a qubit/dot in my 3D cube can be in 3 dimension (or more)
> (superposition)
>

What do you mean you "my 3D cube"?

Perhaps I've missed another post where you talk about your work. Have you
done something using 3D network representations for word meaning?

-Rob

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Ta664aad057469d5c-M928cbb2472a11cccef5c902d
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] The future of AGI

2019-09-23 Thread Rob Freeman
On Tue, Sep 24, 2019 at 9:34 AM James Bowery  wrote:

> The use of perplexity as model selection criterion seems misguided to me.
> See my Quora answer to the question "What is the relationship between
> perplexity and Kolmogorov complexity?
> "
> I'd say "antiquated" rather than "misguided" but AFAIK Solomonoff's papers
> on the use of Kolmogorov Complexity as universal model selection criterion
> predated perplexity's use as language model selection criterion.
>

Are you addressing my argument here James?

Are you saying Pissanetzky selects models according to perplexity?

-Rob

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tbc419b4c00dd690d-M6835333a43779b6fa9229f26
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] While you were working on AGI...

2019-07-14 Thread Rob Freeman
You undervalue the degree to which research is an ideas market, Matt.

This entire current AI boom is the result of the one simple, universal
breakthrough. Progress was flat for years before that (winter), and has
been since.

Of course "flat" is relative. The old, single, universal breakthrough of
HMM's were applied with more power, and gradually crept forward in
performance, successively touted as "solved". But it wasn't solved. It
needed a new idea. As it still needs other ideas.

Most development is a matter of having the resources to push ideas forward,
it is true. But that doesn't mean ideas don't count.

Google's sudden embrace of neural nets around 2012 was a vindication for
those of us who had argued distributed representation for years. But Google
didn't make it happen. It's only to their credit that they were the first
to grab what others had done once its potential became really impossible to
ignore.

There must be better ways to allocate resources to research. It's certainly
a disgrace that the potential of back-propagation was finally realized only
as something of a side effect, because people were playing a lot of
computer games.

Google were tooling around with Bayesian stats. It's not only cranks
without resources who are wrong all the time.

Google invested resources poorly until the hard work on back-prop had been
done. I well remember the first AI MOOC (2011?) when distributed
representation was not mentioned (for as long as I attended.)

Since video games accidentally opened that door, corporate resources have
dominated again, sure.

But the problem isn't solved. Back-prop has been just one idea. We're
waiting for the next accidental confluence of ideas and resources, to give
those with resources their next cat video moment.

It's a pity we can't move that confluence of ideas and resources forward
more rapidly than the accidental. Or at least optimize the accidental by
making more bets. But as the example of video games and GPU's show, the
right resources are still mostly allocated only by accident. Poor
allocation of resources to ideas is holding up all research, not only AI.
Ideas matter.

-Rob

On Mon, Jul 15, 2019 at 9:00 AM Matt Mahoney 
wrote:

> On Sat, Jul 13, 2019, 6:43 PM Basile Starynkevitch <
> bas...@starynkevitch.net> wrote:
>
>> But you forgot the difference between AI & AGI.
>>
> AGI is lots of narrow AI working together. It's not the simple, universal
> breakthrough you would like to have. It's the one we have to have because
> Legg proved that powerful predictors are necessarily complex.
> https://arxiv.org/abs/cs/0606070
>
> Google OCR and language translation makes mistakes, but it works better
> than last year and will work better next year. There isn't a point in time
> when we will have AGI because you can't compare human and machine
> intelligence.
>
>> *Artificial General Intelligence List *
> / AGI / see discussions  +
> participants  + delivery
> options  Permalink
> 
>

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tbee4f4d703cc71ad-Maa5e42d54f508fed92f03332
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Steel-manning 101

2019-07-02 Thread Rob Freeman
On Tue, Jul 2, 2019 at 9:28 PM Colin Hales  wrote:

>
> -- Forwarded message -
> From: Rob Freeman ...
> As far as my position, I think the answer is a chaos, or a complex system
> element to meaningful patterns. And that's why they elude us. Chaos is also
> embodied.
>
> C: OK let's keep it simple and focus on this. How does the following
> statement sit with you as a statement of part of your perspective on
> creating an artificial brain:
>
> "Brain tissue is the only thing current empirically known to deliver
> natural general intelligence. Brains are situated in a body. The body is
> situated in a host environment. However natural brains deliver intelligence
> in this context, it is to be expected that an artificial brain, in the
> first instance (and until it is empirically proved optional)  must at least
> replicate this context (embodied, embedded, situated like a natural brain).
>

Well, if I have to steel-man that, I would say we agree embodiment is
important to cognition.

But steel-man is not only agreement. It's actually important in steel-man
to represent where we disagree. But do it in a way you agree with!!

To attempt to steel-man the above on the points we disagree, I would say
you are using embodiment to argue for a given body.

Additionally,  there is chaotic physics measureable in the natural brain
> when it is operating normally in an awake, alert subject. Therefore, if one
> is to create an artificial general intelligence, then at least initially,
> the artificial brain should be based on brain tissue physics and if it is
> delivering intelligence, chaotic behaviour of a similar kind should be
> observed in it, and when the chaotic behaviour changes or ceases,
> intelligent behaviour should be observed degraded or changed or lost in a
> measurable way. Chaos is thereby prima facie, necessary in an artificial
> brain, but on its own, insufficient. It is the natural brain physics-basis
> of the chaos that has to be conserved until empirically proved optional."
>
> Is this statement clashing with you in any way? What does it get
> wrong/right?
>

To steel-man this, my analysis of it has to be something you agree with
again...

I think the same thing. You are using embodiment to argue for a given body.

-Rob

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tafcd787c73d24a40-Mb8e80def4e7cbb56b62bf968
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Steel-manning 101

2019-07-01 Thread Rob Freeman
I'm not sure there is any point in critiquing steel-man Colin. If you don't
agree, it's not steel-man. The idea is to stop talking about yourself, and
try to understand other points of view.

I might just comment my last message originally ended with the following
words, which I edited off as overly negative (though true!):

"In my experience no-one ever accepts any restatement of their position. So
steel-man never applies! But that doesn't mean that attempting it is not a
good exercise for yourself."

As far as my position, I think the answer is a chaos, or a complex system
element to meaningful patterns. And that's why they elude us. Chaos is also
embodied.

Your EM interactions might be embodied. That's what I hear you saying at
root: they can't be copied. But if they are, it would be because they are
chaotic. Probably all embodiment is chaotic at root. That would be common
ground.

But for me it applies at every level. I don't pick out EM interactions and
say, "That's where the chaos is, we need only that bit." I think Lego
bricks might generate the required patterns, if we gave them enough energy.

Even if I didn't see the same essential process of chaos happening in all
kinds of different mediums, I wouldn't think a multi-million dollar effort
to test the hypothesis that a particular level of electromagnetic
interactions in the brain is special and essential would be the first place
I would
spend my money.

It's something to test though, if nothing else works. Maybe all
intelligence does depend on one specific type of electromagnetic
interaction only, in one kind of jelly brain. But I see chaos in all kinds
of mediums. I'm guessing the medium is not special.

We had some email a while back where I attempted some common ground in
aspects of embodiment. That's why I know this. You might want to wonder why
you remember nothing of my argument yourself.

-Rob

On Tue, Jul 2, 2019 at 10:51 AM Colin Hales  wrote:

> Rob!
>
> Ported to a new thread for this. The ARGHH! thread has a long way to go
> and best not clutter it up with steel man.
>
> Can I take the trouble to critique your depiction of my position?
>
> Alas, I'm unable to say anything well-informed on your position, so I am
> open to you educating me.
>
> regards
> Colin
>
>
>
>
> On Tue, Jul 2, 2019 at 12:20 PM Rob Freeman 
> wrote:
>
>> On Tue, Jul 2, 2019 at 7:57 AM Colin Hales  wrote:
>>
>>> ...I'd like to do something different this time. We're part of the 'old
>>> guard' and it's up to us to demonstrate how an intellectual discussion can
>>> be fruitfully conducted to advance the topic in question. So I'd like to
>>> run an experiment. I'd like us to 'steel-man' each other. This is where:
>>>
>>> 1) I do my best to express your perspective to you.
>>> 2) You do your best to express my perspective back to me.
>>>
>>> This is the way for differences to be understood in a manner that can be
>>> fruitfully discussed. For what this means, see this video at exactly
>>> 1:57:15 to 1:58:30. It is an answer to a query from the audience at the end
>>> of sam harris's first 'book club'.
>>>
>>> https://www.youtube.com/watch?v=H_5N0N-61Tg
>>>
>>> I think it would be very instructive. Would you like to try?
>>>
>>
>> I think that's a great idea Colin.
>>
>> I think I could do that for most everyone regularly corresponding here:
>>
>> Colin Hales: Cognition is analogue, not digital. The answer is in the
>> physical electromagnetic field effects between elements in the brain.
>> Steve Richfield: The answer is in the detail of neuron behaviour.
>> Peter Voss: "Integrated" symbolism. Symbolism is OK. The answer is we
>> need to build a representation for the meaning of an entire situation.
>> Matt Mahoney: Problem solved. Current neural nets work. We just need to
>> build them bigger.
>> Ben Goertzel: Graphs will do anything.
>>
>> Anyone else wants one, let me know. Mostly variations on the "symbolism
>> was OK, I too am 50% of the way there already", position.
>>
>> I hope that may be "steel-manned" in the sense of "restate the other
>> person's position in a way they would accept".
>>
>> Though perhaps those are not fully "steel-manned". To fully steel-man you
>> might need to leave out too much middle ground. A full steel-man might look
>> more like this:
>>
>> Colin Hales: Obviously there are enormous differences between the brain
>> and a von Neumann computer. We need to explore this
>> NN people: Neural nets find meaningful patterns.
>> Symbolic people: There is a symbolic element to cognition.
>

Re: [agi] ARGH!!!

2019-07-01 Thread Rob Freeman
On Tue, Jul 2, 2019 at 7:57 AM Colin Hales  wrote:

> ...I'd like to do something different this time. We're part of the 'old
> guard' and it's up to us to demonstrate how an intellectual discussion can
> be fruitfully conducted to advance the topic in question. So I'd like to
> run an experiment. I'd like us to 'steel-man' each other. This is where:
>
> 1) I do my best to express your perspective to you.
> 2) You do your best to express my perspective back to me.
>
> This is the way for differences to be understood in a manner that can be
> fruitfully discussed. For what this means, see this video at exactly
> 1:57:15 to 1:58:30. It is an answer to a query from the audience at the end
> of sam harris's first 'book club'.
>
> https://www.youtube.com/watch?v=H_5N0N-61Tg
>
> I think it would be very instructive. Would you like to try?
>

I think that's a great idea Colin.

I think I could do that for most everyone regularly corresponding here:

Colin Hales: Cognition is analogue, not digital. The answer is in the
physical electromagnetic field effects between elements in the brain.
Steve Richfield: The answer is in the detail of neuron behaviour.
Peter Voss: "Integrated" symbolism. Symbolism is OK. The answer is we need
to build a representation for the meaning of an entire situation.
Matt Mahoney: Problem solved. Current neural nets work. We just need to
build them bigger.
Ben Goertzel: Graphs will do anything.

Anyone else wants one, let me know. Mostly variations on the "symbolism was
OK, I too am 50% of the way there already", position.

I hope that may be "steel-manned" in the sense of "restate the other
person's position in a way they would accept".

Though perhaps those are not fully "steel-manned". To fully steel-man you
might need to leave out too much middle ground. A full steel-man might look
more like this:

Colin Hales: Obviously there are enormous differences between the brain and
a von Neumann computer. We need to explore this
NN people: Neural nets find meaningful patterns.
Symbolic people: There is a symbolic element to cognition.

But that's no good because everyone agrees and goes home! You need a little
friction to gain traction and make progress. The idea might be somewhere
between straw-man and steel-man.

-Rob

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T87761d322a3126b1-M69c11b96522f9ae7878c725f
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] test

2019-06-30 Thread Rob Freeman
Korrelan,

Good. Interested to talk to you about this. A lot I agree with. But let me
just pick some specific points.

On Sun, Jun 30, 2019 at 5:00 PM korrelan  wrote:

> ...
>
> The external sensory cortex re-encodes incoming sensory streams by
> applying spatiotemporal compression
>

OK. Compression. That corresponds to my "repeated structure" suspicion.

I have something to say about compression. Long story short, I think we
need to think about cognitive patterns as an expansion not a compression.

It invalidates nothing I've seen with your current work. You talk about
patterns. I talk about patterns. I just may perhaps broaden what you think
of as a pattern. Move away from the idea a pattern is always a compression.

But there is nothing like an example. Rather than talk in the abstract, maybe I
can link a very nice talk, which expresses similar ideas in a different
domain.

Domas is talking about reverse engineering, not cognition. But finding
meaningful structure in computer code is a similar problem to cognition
when you think about it. I very much like what he does with binary for his
reverse engineering task. He doesn't compress. He expands:

Christopher Domas The future of RE Dynamic Binary Visualization
https://www.youtube.com/watch?v=4bM3Gut1hIk
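
To make the expansion idea concrete, here is a toy sketch of the trick as I
remember it from the talk (my own reconstruction, not his code): every pair of
consecutive bytes in a file becomes an (x, y) point in a 256 x 256 image.
Nothing is summarized away; the file is expanded into a picture, and text,
machine code and compressed data each show up as a different texture.

import numpy as np

def digraph_image(path):
    # count how often each consecutive byte pair (a, b) occurs in the file
    data = np.fromfile(path, dtype=np.uint8)
    img = np.zeros((256, 256), dtype=np.int64)
    np.add.at(img, (data[:-1], data[1:]), 1)
    return img

# any handy binary will do; this path is just an example
img = digraph_image("/bin/ls")
print(img.sum(), "byte pairs;", (img > 0).sum(), "distinct pairs")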

The correspondence may not be obvious. If so there will be little point
arguing about it in the abstract. Hopefully I can interest you in a
concrete example. But I'm just throwing his talk in on the off chance it
strikes a chord with you: this idea that cognition may be performing an
expansion of sensory input, by contrast with the idea that the search for
meaning must be a compression of sensory input.

For a concrete example maybe phonemes might be a place to start.

Phoneme model
>
> ...At this stage the phonemes will require a kind of wrapper; a program to
> interpret the connectomes output and then trigger the phonemes.
>

Right. So the problem is to distinguish your "parallel spatiotemporal spike
trains" so that the right phoneme is output at the right time.

In your words how to implement the "wrapper".

I'm not trying to challenge you on this. It just so happens I've been
thinking about ways to implement such a "wrapper" as exactly a "parallel
spatiotemporal spike train". It's a very close fit for what you say.

We can think about the wrapper as a compression of the spatiotemporal spike
train. I'm going to suggest we think about it as an expansion (for phonemes
the expansion is trivial; it just allows our phonemes to shift a bit. The
expansion idea really comes into its own when you move from phonemes to
words, and particularly words to sentences. It's very difficult, so far
proven impossible, to build a "wrapper" for sentences in the form of a
compression.)

Actually it is better than just a wrapper to interface phonemes (and
finally sentences.) It is actually a wrapper which generates phonemes from
first principles. So it does not require any assumption of phonemes. It
predicts them. No more "botch".

It just so happens I've been looking for a platform to try my theory
(schema?) for how to generate phonemes, using exactly the kind of parallel
spatiotemporal spike train you are talking about. Your platform seems
ideal. So if you don't want to try my schema, maybe you might be willing to
let me use your platform to try it myself.

-Rob


--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tf97c751029c2e4db-M7af197cc81c296036ddf1528
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] The future of AGI

2019-02-25 Thread Rob Freeman
On Tue, Feb 26, 2019 at 7:00 PM Nanograte Knowledge Technologies <
nano...@live.com> wrote:

> ...
>
> If I may suggest, perhaps stepping back to consider if the personal tone
> of your conversation is justified on the hand of the topic and content, or
> perhaps your personal frustration alone.
>

Tone? What's wrong with my tone? I've said nothing personal.

Linas is ignoring my points.

I say he is ignoring my points.

He agrees:

"...ignore it and move on?", "Yep. Life is short."

OK, message received. So long as we have a clear understanding.

I'm just trying to help. And learn a thing or two myself. We all have the
same goal.


> I hope to see more constructive interaction.
>

What's not constructive? It's constructive for me. I've learned a lot about
what OpenCog is up to.

Then as a bonus Linas's "Neural-Net vs. Symbolic Machine Learning" is a
gold mine of formalism. He has nice arguments for linearity of vector
representations. And there he is explicitly mentioning gauge (amusingly not
long after warning me when I say category theory reminds me of a gauge
theory.)

Very nice. Very close to me.

I'll probably use it as a reference for arguments to the necessity of an
unstructured network over vectors, and maybe some other things.

It is very close. It just needs a flip of perspective to make it a
generative framework (cf. permutations) rather than a learning framework.

This together with Pissanetzky's work which I just came across recently
too, both are encouraging. We may jump out of the deep "learning" hole
soon. It won't require much change of insight to do so. The formal
machinery is mostly already there. Linas has done a beautiful job of
mathematical exposition. We just need to jump from "learning" to generative.

Someone is going to click soon. Many people, probably.

Hardware, however, we probably have to wait for.

I'm eyeing Intel's new Loihi spiking chip. Brainchip (though it is not
spiking.) But some exploratory architectures are appearing. The signs in
the hardware area are positive too. Some innovative parallel architectures
starting to emerge.

-Rob

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Ta6fce6a7b640886a-M431ffa7a85cdacfa51884260
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] The future of AGI

2019-02-25 Thread Rob Freeman
On Tue, Feb 26, 2019 at 5:19 PM Linas Vepstas 
wrote:

>
> On Mon, Feb 25, 2019 at 9:04 PM Rob Freeman 
> wrote:
>
>> ...
>> You mean you have no knowledge of attempts at distributional learning of
>> grammar from the '90s?
>>
>
> Sure. Yes, I suppose I mean that. Is there something you can recommend?
> Should I just google  David Powers and Hinrich Schuetze and start reading
> randomly, or is there something particularly juicey that I should look at?
>

I can't imagine you haven't googled already, so I can only guess you intend
to make some distinction.

Hinrich Schuetze was the first lead I found when I started searching in
1994: "Dimensions of Meaning", "Distributional Part-of-Speech Tagging".
Steven Finch was another. He googles up in a proceedings: "Grammatical
Inference: Learning Syntax from Sentences: Third International Colloquium,
ICGI-96..., Volume 3".

There is more about distributional models of meaning. I'm trying to
remember what the big sub-field was called... Latent Semantic Analysis.
Very mainstream. That carried on to Dominic Widdows and "Geometry and
Meaning" in 2004. Widdows went on to be central in the organization of the
"Quantum Interaction" conference, I believe. That relates to some of the
work by Coecke et al, with quantum field theory formalisms.

But focusing on grammar... Of course I know Powers from a particular
project. I wrote him off in 1998 because he was still trying to learn
categories, and I had decided it was impossible. Googling now, I didn't
realize his work went back quite so far. All came to NIL I guess... This is
of historical interest:

"ITK Research Memo December 1991 SHOE: Tiie Extraction of Hierarchical
Structure for Machine Learning of Natural Language." D. Powers 8i W.
Daelemans.

https://pdfs.semanticscholar.org/596a/713366155d907c8340a44fa0d80489e4491e.pdf


"Powers (1984, 1989) has already shown that word classes and associated
rules may be learned either by statistical clustering techniques or
self-organizing neural models, using untagged data, thus achieving
completely unsupervised learning"

Browsing the bibliography at random, this is quite interesting. Another
chaos and language lead to follow up:

"Nicolis, John S. (1991b) "Chaotic Dynamics of Linguistic Processes at the
syntactical, semantic and Pragmatic Levels: Zipf's law, Pattern Recognition
and Decision Making under Uncertainty and Conflict", Proceedings of
QUALICO`91, University of Trier, September 23-27, 1991.

In the late '90s everything was about finding the right set of "features".
I think it came to a head with ever more complex statistical models, first
Hidden Markov Models, then probabilistic context free grammars.

I wasn't attending closely, because I already believed clear symbolic
categories were impossible.

Of more interest to me there was a separate thread of analogy based
grammar, a form of distributed representation, which took off separately,
because of course no-one was finding nice clean symbolic categories.
Daelemans "Memory-based learning", Skousen "Analogical Model of Language".
Rens Bod took off with another tangent of that, statistical combination of
trees(?) which seemed to get a lot of funding for a while.

Oh, Menno van Zaanen had something he called "Alignment Based Learning".
Which he took broadly back to Zellig Harris, Chomsky's teacher. I had quite
a lot to do with him.

You can see the symbolic learning work terminating with Bengio, retreating
from learned symbolic categories to vector representations around 2003 with
his Neural Language Model, which has been the most successful.

It's fun to reminisce.

There's a bunch of other threads to it too.

But you must be making some narrow distinction, which makes your attempt to
learn rules unique??

-Rob

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Ta6fce6a7b640886a-Ma31c62b3691456d066298ff0
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] The future of AGI

2019-02-25 Thread Rob Freeman
Linas,

The thing to take away from Pissanetzky's formulation is the idea of
finding structure of cognitive relevance by permuting observations.

Permutation, not patterns. An active principle. It's a subtle difference.
You can analyse them with the same maths, but that is to miss the point.

The idea that perception is permuting observed elements to emphasize causal
symmetries is very powerful. Causal symmetries can be independent of a
particular form. The way the parts are put together can be quite different
each time. You can have "one shot learning", even "creativity". Compare
this to the state-of-the-art which is to "learn" patterns. That's more
Platonic. Permutation is a more active process, more Heraclitus. There is a
change in perspective. It does reduce to the same maths as yours Linas.
Maths is maths. Your element, I know. What matters is how you apply and
interpret it. All this stuff may be the same. Except for your assumption:

LV, Feb 9: "If you are picky, you can have several different kinds of
common nouns, but however picky you are, you need fewer classes than there
are words."

You don't address this at all in your reply, except to imply there is
context which justifies it. What is the context which justifies it? Are you
saying there are fewer classes than words, or more? This is a crucial point.

Perhaps even more crucial is the related point about whether these classes,
these factorizations, have gauge?

Note 10, p.g. 52, "Neural-Net vs. Symbolic Machine Learning":

"In physics, such an ambiguity of factorization is known as a global gauge
symmetry; fixing a gauge removes the ambiguity. In natural language, the
ambiguity is spontaneously broken: words have only a few senses, and are
often synonymous, making the left and right factors sparse. For this
reason, the analogy to physics is entertaining but mostly pointless."

You don't address that at all either. Only to say:

'I doubt that the use of the word "gauge" there has anything at all to do
with the word "gauge" of "gauge theory".   In physics, "gauge" has a very
specific and precise meaning; I suspect that there's an abuse of that word,
here.'

"Doubt", "suspect"? You don't know? Which use of "gauge" do you suspect is
an abuse? My use is reference to your use.

Do you suspect your use of "gauge" is an abuse?

Why do you dismiss it, your use, as "mostly pointless"? For a meticulous
guy you are remarkably quiet on this. I'm surprised you would leave
something at doubt and suspicion, or even "mostly", and go no further.

Do you assert that such "ambiguity of factorization" is insignificant in
natural language (even only "mostly", which is still non-zero and might be
significant)? On what evidence? On the evidence your grammar learning teams
find nice clean factors and grammar learning is demonstrably solved? Or on
the evidence everybody ends up 'stumbling at the "clustering" step'.
Certainly everybody was stumbling at this step in the '90s, by the likes of
David Powers and Hinrich Schuetze trying to learn category, and if you dig
a bit deeper, going right back to the destruction of American Structuralist
linguistics by Chomsky in the '50s.

You keep ignoring the evidence Chomsky put his finger on 60 years ago that
even distributional analyses of the phoneme '...were inconsistent or
incoherent in some cases and led to (or at least allowed) absurd analyses
in others.' Your only comment was:

"Wow. I did not know that. Interesting, I suppose. Its, well, beats me.
Science is littered with misunderstandings by brilliant people. Time
passes. Debates are forgotten.  I don't know what to do with this."

You "don't know what to do with this" so you ignore it and move on?

You see gauge in natural language, but you find reason to assume, to
"doubt", to "suspect" it is "mostly" not important, and move on?

These are points you keep ignoring as you remain in your comfort zone
citing maths.

How are you applying the maths?

1) Are there more classes (permutations?) than examples?

2) Is observed gauge in language (your observation!) "pointless", "mostly
pointless" or not pointless at all, simply not explored further?

3) How do you explain Chomsky's observation 60 years ago that
distributional analysis was '...inconsistent or incoherent in some cases
and led to (or at least allowed) absurd analyses in others.'

-Rob

On Mon, Feb 25, 2019 at 6:30 PM Linas Vepstas 
wrote:

>
>
> On Sun, Feb 24, 2019 at 6:34 PM Rob Freeman 
> wrote:
>
>>
>> Where I see the gap might be summarized with one statement by Linas:
>>
>> LV, Feb 9: "If you are picky, you can have several different kinds of
>> common nouns, but however picky you are, you need fewer classe

Re: [agi] The future of AGI

2019-02-24 Thread Rob Freeman
Ben and List,

I wanted to leave this. I'm glad I didn't.

I hadn't previously been paying attention and missed this thread. It's
actually very good. Thanks to Dorian Aur for dragging it back up.

I agree with almost all of it. As with all I've been commenting on here.
There is just one small gap I would like to close, one small loose thread I
would like to pull. But I think it is an important one.

Where I see the gap might be summarized with one statement by Linas:

LV, Feb 9: "If you are picky, you can have several different kinds of
common nouns, but however picky you are, you need fewer classes than there
are words."

"...however picky you are, you need fewer classes than there are words..."

"Fewer classes"?

How do we know that is true?

The number of ways a set of objects can be formed into classes is far
greater than the number of objects.

I think an analysis of why this is so -- why we can meaningfully create
more classes than we have examples, and why we need to -- will be, if not
AGI, then the basis of the leap from narrow AI to AGI.
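
To put a number on that (my own illustration, nothing from Linas): the number
of ways to partition a set of n items into classes is the Bell number B(n),
and it leaves n behind almost immediately.

def bell_numbers(n):
    # Bell triangle: each row starts with the last entry of the previous row,
    # and each entry is the previous entry plus the one above it
    row = [1]
    bells = [1]                      # B(0)
    for _ in range(n):
        new_row = [row[-1]]
        for x in row:
            new_row.append(new_row[-1] + x)
        row = new_row
        bells.append(row[0])
    return bells

print(bell_numbers(10))
# [1, 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975]
# ten "words" already admit 115,975 distinct ways of being grouped into classes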

On one level it may reduce to the assertion that gauge, as observed by
Linas in his symbolism and connectionism paper, is not trivial for language:

Note 10, p.g. 52, "Neural-Net vs. Symbolic Machine Learning":

"In physics, such an ambiguity of factorization is known as a global gauge
symmetry; fixing a gauge removes the ambiguity. In natural language, the
ambiguity is spontaneously broken: words have only a few senses, and are
often synonymous, making the left and right factors sparse. For this
reason, the analogy to physics is entertaining but mostly pointless."

It's not pointless Linas.

Experimentally, that language "gauge" is not pointless, was observed in the
results for the distributional learning of phonemes I cited, dating back 60
years, to the time Chomsky used it to destroy distributional learning as
the central paradigm for linguistics.

If Linas takes his formalism and adds just this one insight, I think we
have the basis to move beyond the wall which we been left at by symbolic
methods for 60 years.

In concrete terms it means we don't need to "learn" 10K to 1M rules. We
need to be able to "permute" the elements of (causally related) data many
more ways than that.

Sergio Pissanetzky comes close to this, with his analysis in terms of
permutations, also resulting in a network:

"Structural Emergence in Partially Ordered Sets is the Key to Intelligence"
http://sergio.pissanetzky.com/Publications/AGI2011.pdf

Though Pissanetzky may not see the significance of gauge either. We need a
mechanism to cluster these networks on the fly. Nothing exists until it is
observed.

Going on. Fascinating insights into the state of OpenCog:

LV: 'as of about 8-9 months ago, the language-learning project has been
handed over to a team. They are still getting up to speed; they don't have
any background in linguistics or machine learning, and so have been trying
to learn both at the same time, and getting tangled up and stalled, as a
result. Currently they are stumbling at the "clustering" step'

Sure they will!

Of course if you try and "learn" something which is subjective you'll run
your processor hot chasing your tail, but just end up breaking the wrong
symmetry for an appreciable fraction of your problems. That's been the
history of distributional language learning since the '90s. I interviewed
for a PhD on this with David Powers and Chris Manning in 1998.

The only solution is root and branch change away from "learning" and
towards resolving the "gauge" by finding contextually relevant low energy
permutations or factorizations of sets at run time.

Looking up what you are doing with your "MST" parser. You start by
clustering links, starting with mutual information.

One way to see the change which needs to take place might be to move away
from thinking about the problem as the character of links, and focus on the
problem as permutations of links. I think Pissanetzky's causal sets can be
thought of as links. The thing is he focuses on permuting them.

Think about it. Another way to find a "minimum spanning tree" is to set
your network oscillating. And that can be done in real time (for parallel
hardware).

Look at this paper:

A Network of Integrate and Fire Neurons for Community Detection in Complex
Networks
Marcos G. Quiles, Liang Zhao, Fabricio A. Breve, Roseli A. F. Romero
http://www.sbmac.org.br/dincon/trabalhos/PDF/invited/69194.pdf
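
To be clear, the sketch below is my own toy, not the algorithm in that paper:
simple phase oscillators coupled positively along edges and weakly negatively
elsewhere. Nodes inside a densely connected community pull each other into
phase, so the communities can be read straight off the final phases.

import numpy as np

def detect_communities(adj, steps=2000, dt=0.05, coupling=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    phase = rng.uniform(0, 2 * np.pi, n)            # random starting phases
    attract = adj                                   # pull neighbours together
    repel = 1.0 - adj - np.eye(n)                   # gently push non-neighbours apart
    for _ in range(steps):
        diff = np.sin(phase[None, :] - phase[:, None])   # diff[i, j] = sin(theta_j - theta_i)
        phase += dt * (1.0
                       + coupling * (attract * diff).sum(axis=1)
                       - 0.3 * coupling * (repel * diff).sum(axis=1))
    # crude readout: nodes whose final phases sit within 0.5 rad share a community
    labels = -np.ones(n, dtype=int)
    current = 0
    for i in np.argsort(phase % (2 * np.pi)):
        if labels[i] == -1:
            close = np.abs(np.angle(np.exp(1j * (phase - phase[i])))) < 0.5
            labels[close & (labels == -1)] = current
            current += 1
    return labels

# two triangles joined by a single edge; typically prints a two-way split,
# e.g. [0 0 0 1 1 1] or [1 1 1 0 0 0]
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
print(detect_communities(A))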

You have all the pieces Linas. Put them together.

1) We don't know there are fewer meaningful classes of words than words, in
theory the number of combinations, sets, of words is far greater than the
number of words.
2) Even in theory, your own theory, by the properties of the formalism,
language networks also have gauge.
3) In practice, as insisted on by Chomsky 60 years ago, they are observed
to have gauge.
4) Only a context of use will break the symmetry and determine the correct
factorization. We need to factorize, permute, 

Re: [agi] openAI's AI advances and PR stunt...

2019-02-22 Thread Rob Freeman
On Sat, Feb 23, 2019 at 11:48 AM Linas Vepstas 
wrote:

>
>
> On Fri, Feb 22, 2019 at 4:34 PM Rob Freeman 
> wrote:
>
>>
>> Can you summarize it in a line?
>>
>
> There's a graph. Here's where it is and what it looks like. Here's how
> neural nets factor it. Here are other ways of factoring it.
>

Here? Where?

Somewhere in your "Neural-Net vs. Symbolic Machine Learning" pdf?

This is explaining why distributed representation works better than
symbolism? Why neural networks have dominated in the last 6-8 years and
improved speech recognition by 20%+ etc?

This might be a candidate:

p.g. 34 "The key driver behind the deep-learning models is the replacement
of intractable probabilistic models by those that are computationally
efficient."

You might be saying distributed representation has dominated these last few
years because it is more computationally efficient.

Is it Fig. 5, p.g. 51?

I don't understand this sentence on that figure:

"Due to the fact that words are combinations of word-senses,
hard-clustering in this fashion is undesirable; by contrast, it seems that
wordsenses could be validly hard-clustered."

Doesn't that contradict?

Oh, I like this, Note 10, p.g. 52.:

"In physics, such an ambiguity of factorization is known as a global gauge
symmetry; fixing a gauge removes the ambiguity. In natural language, the
ambiguity is spontaneously broken: words have only a few senses, and are
often sysnonymous, making the left and right factors sparse. For this
reason, the analogy to physics is entertaining but mostly pointless." (BTW
typo there for you Linas: "sysnonymous" :-)

Gauge symmetry. What I mentioned earlier I see in category theory. But you
are dismissing it. This might be a good point of contrast. I'm making it
significant, and even central to my analysis of our problems.

I don't see anything hard and fast, as an argument why distributed
representation works better. Unless it is because it can represent such
gauge symmetry...

-R

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T581199cf280badd7-M7528245deeb1f0e4dc9cfed2
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] openAI's AI advances and PR stunt...

2019-02-22 Thread Rob Freeman
Sorry. That was an addendum.

On Sat, Feb 23, 2019 at 11:21 AM Linas Vepstas 
wrote:

> ...
>
>> Meanwhile linguistics is still split, structuralism is still destroyed.
>> No-one knows why distributed representation works better, and equally
>> no-one knows why we can't "learn" adequate representations.
>>
>
> Well, I tried to explain exactly why in the PDF that I posted. That was
> kind-of the whole point of the PDF. I don't know why you say "nobody",
> because what I'm saying may be technical, but its not that technical, and
> it should not be that hard to follow.
>
>
>>
>> So we flounder around, struggling with stuff which either "works" better
>> or doesn't "work" better, for what reason, nobody knows quite why.
>>
>
> I think I know and I think  I explained and here I get frustrated because
> I've been saying this for over five years now, and despite writing hundreds
> of pages on the topic, its a thud, with no echo. Is anyone out there?  Can
> anyone hear me?
>

I must have missed it. I got excited when I saw category theory, which
hinted at my solution. I must have missed your solution completely.

Can you summarize it in a line?

-R

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T581199cf280badd7-Mb2040b7000c9846d3a00df66
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] openAI's AI advances and PR stunt...

2019-02-22 Thread Rob Freeman
LV> '...what's the diff?  Yes, I'm using the "observed words", just like
everyone else. And doing something with them, just like everyone else.'

Yup.

Except Chomsky won't use observed words. The entire field of Generative
Grammar that he created won't use observed words. Chomsky realized you
can't learn a consistent representation from observed words. Not a
consistent representation. He held linguistics to the fire over it and
created a stink. In the end he chose the wrong solution. He chose to
abandon observation to retain consistency. He should have abandoned
consistency, linearity, and kept observation. As Lamb argued. But
non-linear systems were not so well known in his day. At least Chomsky saw
the problem clearly.

But as you say:

"Time passes. Debates are forgotten."

Unfortunately reality stays the same. Those who forget history are doomed to
live in a loop, forever repeating the mistakes of the past, etc.

-Rob

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T581199cf280badd7-Ma596e154ab9bc5ef455e3771
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] openAI's AI advances and PR stunt...

2019-02-22 Thread Rob Freeman
Linas,

So you do not believe finite sets can contain contradictory meanings?
Because the proof that sets can have contradictory meanings uses infinities?

Despite the evidence of contradictory meanings in real sets, from real
language, going back 60 years.

Meanwhile linguistics is still split, structuralism is still destroyed.
No-one knows why distributed representation works better, and equally
no-one knows why we can't "learn" adequate representations.

So we flounder around, struggling with stuff which either "works" better or
doesn't "work" better, for what reason, nobody knows quite why.

Your solution is a return to symbolism?

OK. I got excited for a minute when I saw your reference to category
theory. But actually it turns out you don't see any special significance to
it. It is just your path back to symbolism.

So I'm back with my original assessment of OpenCog: it is stuck back in a
prior, symbolic, conception of computational linguistics.

Just as deep nets are stuck trying to "learn" representations. And a tweak
to add ad-hoc layers, to put a "spotlight" on fragments of context and
"learn" fragments of observed agreement and dependency, seems like an
"important step".

Whew, that's a relief. No advance in AI in the last 60 years after all.
Except distributed representation this last few years. But nobody knows
why. Back in our comfort zone.

Anyway, to round off, I don't think the OpenAI stuff is an important
advance. Incrementally they may be including fragments of context, but
everything has to be learned, from bigger and bigger data sets. There's
still no principle of combination of elements to create new meaning.

OpenCog may be capable of everything in principle. A network is a fully
flexible data structure. Ben saw that similarity with what I propose. But
you are not doing anything new in practice. Far from seeing the need for
it, you are moving away from distributed representation again. Stuck
thinking in terms of symbolic era theory.

But using the state-of-the-art from deep nets in practice.

LV> '...what's the diff?  Yes, I'm using the "observed words", just like
everyone else. And doing something with them, just like everyone else.'

Yup.

-Rob

On Sat, Feb 23, 2019 at 9:12 AM Linas Vepstas 
wrote:

> Oh foo. If you stopped engaging me in conversation, I could get some real
> work done that I need to do. However, lacking in willpower, I respond:
>
> On Fri, Feb 22, 2019 at 1:18 AM Rob Freeman 
> wrote:
>
>>
>>
>> So this is just a property of sets.
>>
>
> This is a property of infinite sets.  Finite sets don't have such
> problems.  Much or most of math is about dealing the infinite.  Examples:
>
> * You cannot count to infinity. But you can just say "the set of all
> natural numbers" and claim it exists (as an axiom).
>
> * Every real number has an infinite number of digits. You cannot write
> them down, but you can give some of them a name - "pi", "sqrt 2" so that
> others can know what you are talking about.
>
> * The complex exponential function exp(z) is "entire" on the complex plane
> z: it has no poles. ... except at infinity, where it has an "essential
> singularity": its totally tangled up there, in such a way that you cannot
> compactify or close or complete. The value of exp(z) as z \to \infty
> depends on the direction you go in.
>
> * Limits. Function spaces are tame when they have limits e.g. Banach
> spaces.  The tame ones are work-horses for practical applications. The
> whack ones are weird, and are objects of current study.
>
> * Complicated examples, e.g. the Hauptvermutung about triangulation as an
> approximation.
>
> All I'm saying is that similar tensions about completeness/incompleteness
> when something goes to infinity happens in logic as well. One simple
> example, maybe:
>
> * A normal, finite state machine, as commonly understood, works on finite
> sets.  However, there is a way to define them so that they also work on
> infinite, smooth spaces: euclidean spaces R^n, probability spaces
> (simplexes), on "homogeneous spaces" (spheres, quotients of continuous
> groups, etc.)  These have the name of "geometric finite state machines".
> When the homogeneous space is U(n), then ts called a "quantum finite
> automata" (as in "quantum computing").
>
> * These "geometric finite automata" (GFA) are a lot like.. the ordinary
> ones, but they have subtle differences, involving the languages they can
> recognize...
>
> * Turing machines are a kind of "extension" of finite state machines. I
> have never seen any exposition showing a formulation of Turing machines
> acting on  homogeneous spaces. I assume th

Re: [agi] Some thoughts about Symbols and Symbol Nets

2019-02-21 Thread Rob Freeman
Jim,

I haven't been following this thread closely. But if you look at what we've
been talking about in the OpenAI PR stunt thread, at one level I think it
comes to much what you are talking about.

My old vector parser demo linked in that thread does something like this.
You can see it happen. The meaning of a word is selected by its context.

More interesting, and more relevant to the rest of that thread, is if you
then substitute bits of a sentence into each other on the same basis. A new
combination of words causes a new exact set of substitutions, with a new
exact set of contexts, and thus new meaning.

But you don't want to do it with vectors. My mistake back in the day.
Vectors throw away some of the context information. You want to leave the
words together with their context in a network. Finding shared contexts,
for meaning or substitution, will then correspond to a kind of clique in
the network. You can imagine a... sheaf(?) of links fanning out from shared
contexts to the words that share them.
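
A bare-bones sketch of the kind of structure I mean (toy code of my own,
nothing like a full system): keep every word attached to the contexts it was
observed in, and read meaning off as the fan of other words sharing those
contexts. No vectors, so nothing gets averaged away.

from collections import defaultdict

def context_network(sentences):
    contexts = defaultdict(set)              # word -> set of (left, right) contexts
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        for i in range(1, len(toks) - 1):
            contexts[toks[i]].add((toks[i - 1], toks[i + 1]))
    return contexts

def shares_context_with(word, contexts):
    # the words at the other end of the contexts this word appears in
    return {w for w, cs in contexts.items() if w != word and cs & contexts[word]}

net = context_network(["the cat sat", "the dog sat", "the dog ran"])
print(shares_context_with("cat", net))       # {'dog'} -- they share the (the, sat) context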

If meaning were context free it would all just reduce to a grammar and
GOFAI would work.

So, viewed through this lens, the problem historically has been that we've
assumed the system is context free, linear, when it is not.

-Rob

On Fri, Feb 22, 2019 at 12:19 PM Jim Bromer  wrote:

> One more thing. I would not try to limit the meaning of a symbol within a
> context. I would like to be able to find the best meaning for the symbol or
> the best referential utilization for that symbol during interpretation or
> understanding, but this is not the same as trying to limit the meaning of
> symbols before hand. Well, we would like to use previous learning to
> interpret a symbol sub-net (such as a string), but even here it is not true
> that I want to limit the meaning of the symbol. For example, we want to be
> able to interpret new applications of a symbol (like a word) in sentences
> which we have never seen before. Someone might say that this definition of
> how I would like to use symbols is only semantically different than what
> nano was saying, that he actually meant the same kind of thing that I am
> talking about. But I don't agree. The subtlety comes from the many
> variations of possible meanings that we intend with our words, but that
> does not necessarily indicate that we were saying the same thing.
> Jim Bromer
>

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tcc0e554e7141c02f-M4cee8199c3edc864691b8026
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] openAI's AI advances and PR stunt...

2019-02-20 Thread Rob Freeman
OK, that makes sense Ben. So long as you have a clear picture of how to
progress the theory beyond temporary expediency, temporarily using the
state-of-the-art may be strategic.

So long as you are moving forward with some strong theoretical candidates
too. If we get trapped without theory, we're blind. There are too few
people with any broad theoretical vision for how to move forward. Too many
script kiddies just tweaking blindly, viz, the "important step" this thread
began with.

I'm encouraged that it now appears you are deconstructing grammar and
resolving it to a raw network level. That Linas is seeing the relevance of
maths like category theory, which is motivated by formal incompleteness,
speaks to this realization. (Though he may not be aware of the full import.)

Deep learning does not realize this. It does not realize that formal
description above the network level will be incomplete. I'm sure that is
the key theoretical failure holding it back. I wish there were more people
talking about it. If deep learning realized this they wouldn't still be
trying to "learn" representations, whether in intermediate layers or other.
(What was that article recently about the representation "bottle neck" idea
in deep learning needing to be revised?)

It's actually ironic that deep learning does not realize this idea that
formal description (above the network) must always be incomplete, because
it is also the key to the success of deep learning! The whole success of
distributed representation is due to this. The field moved to distributed
representation blindly, without theory, just because things started working
better that way! But you still see articles where people say no-one knows
why distributed representation works better! The failure of theoretical
vision is extraordinary.

But if you've deconstructed your dictionaries (throwing out your hand coded
dictionaries?) and arrived back at the level of observation in a sequence
network. And done it because of the theoretical realization that complete
representation above the network level is impossible (or was it just an
accident, trying to deconstruct symbolism to connectionism, and then
accidentally noticing the relevance to variational theories of maths?) Then
your group would be the only ones I've come across who have done so (I think
the Oxford thread of variational formalization, around Coecke et al. and
Grefenstette, were also seduced away by the short-term effectiveness of
deep learning on GPUs.)

We need to keep (or get!) the theoretical vision.

Even given a vision of formal incompleteness, you (and Pissanetzky?) may
still be lacking a totally clear conception that the key problem is
assembling elements in new ways all the time.

Still, some focus on assembling elements in different ways (from a sequence
network) is encouraging. There is scope to move forward.

As a concrete, immediate, idea to explore moving forward, I hope you'll
look at the idea of using oscillations to structure your sequence network
representations. For it to be meaningful your networks will need to be
connected in ways which directly reflect the ideas behind embedding vectors
(without their linearities.) I don't know if that is true for your
networks. But given that, implementation should be simple, if practically
slow without parallel hardware.

-Rob

On Thu, Feb 21, 2019 at 12:03 AM Ben Goertzel  wrote:

> It's not that it's hard to feed data into OpenCog, whose
> representation capability is very flexible
>
> It's simply that deep NNs running on multi-GPU clusters can process
> massive amounts of text very very fast, and OpenCog's processing is
> much slower than that currently...

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T581199cf280badd7-M8a9e4f757c63064e69ab356b
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] openAI's AI advances and PR stunt...

2019-02-19 Thread Rob Freeman
No problem Linas.

>From my point of view I'm encouraged that OpenCog is closer to an
interesting language model than I thought.

I was surprised to see you discussing category theory in the context of a
language model. Category theory is motivated by formal incompleteness. To
see this applied to language is something I argued for long and hard. I
remember a thread on U. Bergen's "Corpora" list in 2007 with very little
traction on exactly this point. People could not see the relevance of
formal incompleteness for language. To see you, and others, embracing this
is progress.

I'm glad you are deconstructing the grammar. You are probably forced to it
by the success of distributed representation these last few years. But at
least you are doing it. I feared some ghastly fixed Link Grammar with
neural nets just disambiguating.

Instead I see Ben is right. My basic data formulation of the problem may
well be compatible with what OpenCog are doing. That's good.

Though I am still confused by Ben's statement that "we can't currently feed
as much data into our OpenCog self-adapting graph as we can into a BERT
type model".

What does an OpenCog network look like that it is hard to feed data into
it. Can you give an example?

What does an OpenCog network with newly input raw language data look like?

-Rob

On Wed, Feb 20, 2019 at 4:21 PM Linas Vepstas 
wrote:

>
>
> On Tue, Feb 19, 2019 at 5:33 PM Rob Freeman 
> wrote:
>
>> Linas,
>>
>> OK. I'll take that to be saying, "No, I was not influenced by Coecke et
>> al.
>>
> Note to self: do not write long emails. (I was hoping it would serve some
> educational purpose)
>
> I knew the basics of cat theory before I knew any linguistics. I skimmed
> the Coecke papers, I did not see anything surprising/unusual that made me
> want to study them closely. Perhaps there are some golden nuggets in those
> papers? What might they be?
>
>  So, no, I was not influenced by it.
>
> For all that, I can't figure out if you are contrasting yourself with
>> their treatment or if you like their treatment.
>>
>
> I don't know what thier treatment is. After a skim, It seemed like
> word2vec with some minor twist. Maybe I missed something.
>
>>
>> I quite liked their work when I came across it. In fact I had been
>> thinking for some time that category theory has something the flavour of a
>> gauge theory.
>>
>
> Yellow flag. Caution. I wouldn't go around saying things like that, if I
> were you. The problem is that I've got a PhD in theoretical particle
> physics and these kinds of remarks don't hold water.
>
> I have no problem with the substance of it. I just don't think it is
>> necessary. At least for the perceptual problem. The network is a perfectly
>> good representation for itself.
>>
>
> To paraphrase: "I know that the earth goes around the sun. I don't think
> it's necessary to understand Kepler's law".  For most people, that's a
> perfectly fine statement.  Just don't mention black holes in the same
> breath.
>
> > I say you can't resolve above the network. Simple enough for you?
>
> Too simple. No clue what that sentence means.
>
> > '"fixed"? What is being "lost"?  What are you "learning"? What do you
> mean by "training"? What do you mean by "representation"? What do you mean
> by "contradiction"?'...
> >  But if you haven't understood them, it will probably be easier to use
> your words than argue about them endlessly.
>
> ???
>
> > Anyway, in substance, you just don't understand what I am proposing. Is
> that right?
>
> I don't recall seeing a proposal. Perhaps I hopped in at the wrong end of
> an earlier conversation.
>
> I'm sorry, this conversation went upside down really fast. I've hit dead
> end.
>
> --linas
>

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T581199cf280badd7-M931e84b52e289fb3c776903b
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] openAI's AI advances and PR stunt...

2019-02-19 Thread Rob Freeman
Linas,

OK. I'll take that to be saying, "No, I was not influenced by Coecke et al."

For all that, I can't figure out if you are contrasting yourself with their
treatment or if you like their treatment.

I quite liked their work when I came across it. In fact I had been thinking
for some time that category theory has something of the flavour of a gauge
theory. So this was by way of a confirmation for me too. They also wrote
some about the applicability of math from quantum field theory to the
problem.

I have no problem with the substance of it. I just don't think it is
necessary. At least for the perceptual problem. The network is a perfectly
good representation for itself.

Other than that. Fine. Jigsaw pieces. OK. You're looking for a better
jigsaw piece. We can see it like that.

I say you can't resolve above the network. Simple enough for you?

I'd like to answer your questions about the meaning of words:

'"fixed"? What is being "lost"?  What are you "learning"? What do you mean
by "training"? What do you mean by "representation"? What do you mean by
"contradiction"?'...

But if you haven't understood them, it will probably be easier to use your
words than argue about them endlessly.

Anyway, in substance, you just don't understand what I am proposing. Is
that right?

-Rob

On Wed, Feb 20, 2019 at 8:52 AM Linas Vepstas 
wrote:

> Hi Rob,
>
> On Tue, Feb 19, 2019 at 3:23 AM Rob Freeman 
> wrote:
>
>>
>> An aside. You mention sheaf theory as a way to get around the linearity
>> of vector spaces. Is this influenced in any way by what Coecke, Sadrzadeh,
>> and Clark proposed for compositional distributional models in the '00s?
>>
>
> No. But... How can I explain it quickly?  Bob Coecke & I both have PhD's
> in theoretical physics. We both know category theory. The difference is
> that he has published papers and edited books on the topic; whereas I have
> only read books and papers.
>
> In a nutshell:  category theory arose when mathematicians observed
> recurring patterns (of how symbols are arranged on a page) in different,
> unrelated branches of math.  For this discussion, there are two pattens
> that are relevant. One is "tensoring" (or multiplying, or simply "writing
> one symbol next to another symbol, in such a way that they are understood
> to go together with each-other") The other is "contracting" (or applying,
> or inserting into, or plugging in, or reducing, for example, plugging in
> "x" into "f(x)" to get "y", which can be written with an arrow: "x" next to
> "f(x)" --> y )
>
> These two operations "go together" and "work with one-another" in a very
> large number of settings, ranging from linear algebra to Hilbert spaces
> (quantum mechanics) to lambda calculus to the theory of computation. And
> also natural language.
>
> The wikipedia article about currying gives a flavor of the broadness of
> the concept. https://en.wikipedia.org/wiki/Currying  It is well-worth
> reading because it is both a simple concept, almost "trivial", and at the
> same time "deep and insightful".  In that article, the "times" symbol is
> the "tensor or multiplication", and the arrow is the applying/plugging-in.
>
> Next, one thinks like so: "great, I've got two operations, 'tensor' and
> 'arrow'. What is the set of all possible legal ways in which these two can
> be combined into an expression?" That is, "what are the legal expressions?"
>
> So, whenever one asks this kind of question: "I have some symbols, what
> are the legal ways of arranging them on a page?" the answer is "you have a
> 'language' and that 'language' has a 'syntax' (i.e. rules for legal
> arrangements).  Well, it turns out that the 'language' of 'tensor' and
> 'arrow' is exactly (simply-typed) lambda calculus. Wow. Because, of course,
> everyone knows that lambda calculus has something to do with computation -
> something important, even.
>
> When you get done studying and pondering everything I wrote above, you
> eventually come to realize that the legal arrangements of 'tensor' and
> 'arrow' look like graphs with lines connecting things together. There are
> some rules: you can only connect a plug into a socket of the correct shape.
> You can only plug one plug into only one socket, never many-to-one. In
> general, plugging to the left is different than plugging to the right.
> When you force left and right to be symmetric, you get tensor algebras,
> Hilbert spaces, and quantum mechanics. When you don't force that symmetry,
> you get natural language.
>
> In pictures, from Bob Coecke:
> http:

Re: [agi] openAI's AI advances and PR stunt...

2019-02-19 Thread Rob Freeman
Ben,

On Wed, Feb 20, 2019 at 2:39 AM Ben Goertzel  wrote:

> ...
> The unfortunate fact is we can't currently feed as much data into our
> OpenCog self-adapting graph as we can into a BERT type model, given
> available resources... thus using the latter to help tweak weights in
> the former may have significant tactical advantage...
>

You can't feed as much data into your graph as you can into a BERT type
model??

How are you feeding data into your graph? Shouldn't this just be
observation?

Isn't -> it -> as -> simple -> as -> this -> ?
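
If it really is just observation, the data structure is tiny. Here is a
hedged sketch of what I imagine "feeding data in" amounts to: nothing but a
directed graph of which token followed which, with counts on the edges.

from collections import defaultdict

class SequenceNet:
    def __init__(self):
        self.edges = defaultdict(int)            # (token, next_token) -> count

    def observe(self, text):
        toks = text.split()
        for a, b in zip(toks, toks[1:]):
            self.edges[(a, b)] += 1

    def followers(self, token):
        return {b: c for (a, b), c in self.edges.items() if a == token}

net = SequenceNet()
net.observe("isn't it as simple as this ?")
net.observe("it is as simple as that")
print(net.followers("as"))                       # {'simple': 2, 'this': 1, 'that': 1}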

This stuff is very close. First there is Linas's observation that vector
representations too have linearities and thus are inadequate. This would
map to the insight I noted before re. my vector model, that starting with
vectors I had already thrown a lot of contextual variation away, that I
needed a graph representation.

And I like what seems to be a link to the need for any formalization to be
category theoretic, or at least some kind of gauge/invariant theory, even
QM maths at one point, a la Coecke et al.

But I still fear there is some idea of learning which is trapping you.

Looking at what Linas has written in the other thread:

LV: "Use symbolic systems, but use fractional values, not 0/1 relations.
Find a good way of updating the weights. So, deep-learning is a very
effective weight-update algorithm. But there are other ways of updating
weights too (that are probably just as good or better.  Next, clarify the
vector-space-vs-graph-algebra issue, and then you can clearly articulate
how to update weights on symbolic systems, as well."

For sure there are other ways of updating the weights which are just as
good or better! How much better for the weights to be virtual,
corresponding to clusters of observed links. The "update" mechanism can
just be a clustering.

Deep learning is not a great update mechanism. Firstly because it does not
have the formal power of the graph you are trying to inform. Right? We just
agreed vectors have linearities didn't we?? (LV: "Although vector spaces
are linear, semantics isn't; not really.") So their power will never be
enough. Using DL you are crippling the power of the full graph you have
just decided you need. And there are other things too. Deep nets need
linearities in the way their weights can be updated so information can
propagate down through the layers. And they impose a structure on your
graph quite apart from linearities in the connectivity and layers. All
those carefully crafted "attention" layers etc. are a hack on a full
connectivity. To use them is to throw away so much of the power of a full
graph. And for what? So you have a weight update mechanism? A weight update
mechanism which makes assumptions you want to throw away. And they don't
even "update weights" to find new structure all the time, in real time
(which is really what the distinction between symbolism and distributed
representation should be, I worry that may be becoming lost -- even my
vector model had that, by substituting vectors into other vectors. That was
the point of it, to portray the cognition problem as one of
creating/generating patterns, not learning them. That gets lost if you tie
everything to a deep learning weight estimation.)

Rather the weight update mechanism should just be a clustering. Probably
just oscillations on the network.

As regards formalization. We can sweat blood to formalize groupings.
Visualize patterns of connectivity as symbols. As Coecke etc theorized,
that formalism will probably need to be category theoretic, or using
quantum mechanical maths in another thread of literature. But the
vector-space-vs-graph-algebra issue and the complex maths goes away if you
are not worried about formalization, but only want a functioning system.
It's backwards to insist on formalization, so you can formulate the problem
in terms of updating weights on symbols, when the network is already a
perfectly good representation in itself. The groupings are easier to
generate than to describe. (You can formalize them, but you will just get a
formalism which is indeterminate, like QM. You will need the network to
resolve the indeterminacy anyway -- embodiment.)

Perhaps we will need to crystallize out a formalization to move to
reasoning systems. But for raw perception, the network will be enough. And
raw perception is the big failure at the moment, self-driving cars etc.

LV: "the path forward is at the intersection of the two: a net of symbols,
a net with weights, a net with gradient-descent properties, a net with
probabilities and probability update formulas."

It can be seen this way. But both the symbols and the weights should be
virtual, corresponding to clusters, with the clusters projected out at real
time. Deep nets are a really bad way to do this.

I must be missing something in the data format of your network. I don't see
an argument why it can't be as simple as:

1) Establish a network of sequential observations. (Super easy for 

Re: [agi] openAI's AI advances and PR stunt...

2019-02-19 Thread Rob Freeman
Linas,

Ooh, Nice.

This is different to what I saw in the links Ben posted. If you are really
deconstructing your grammar like this then Ben could be right, it might be
a good fit with me. Everything can reduce to graphs. If you are arriving
there from Link Grammar rather than embedding vectors, which was my path,
that does not matter. So long as you travel fully along to the destination
of a raw network we can get the same power.

An aside. You mention sheaf theory as a way to get around the linearity of
vector spaces. Is this influenced in any way by what Coecke, Sadrzadeh, and
Clark proposed for compositional distributional models in the '00s?

E.g.
Category-Theoretic Quantitative Compositional Distributional Models of
Natural Language Semantics
Edward Grefenstette
https://arxiv.org/abs/1311.1539

I see you cite Coecke in your 2017 "Sheaves: A Topological Approach to Big
Data" paper.

Personally I followed their work when I came across it in the '00s. It was
the first other work in a compositional distributional vein I had come
across so I was delighted to find it. There was precious little about
distributed models in the '00s, let alone compositional distributional. But
I decided that the formalisms of both category theory as a response to the
subjectivity of maths, and QM as a model for the subjectivity of physics,
may well apply, but that in practice it will be easier to build structures
which manifest these properties, rather than to formally describe them.

Anyway, perhaps Ben is right, you may be doing the first two steps of my
suggested solution: 1) coding only a sequence net of observed sequences,
and 2) projecting out latent "invariants" by clustering according to shared
contexts.
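
A toy of those two steps together, just my own sketch of what I mean: (1)
record the contexts each token is observed in, (2) project out latent classes
by merging tokens whose context sets overlap.

from collections import defaultdict

def observed_contexts(sentences):
    ctx = defaultdict(set)                       # token -> set of (left, right) contexts
    for s in sentences:
        toks = s.split()
        for i, w in enumerate(toks):
            left = toks[i - 1] if i > 0 else "<s>"
            right = toks[i + 1] if i + 1 < len(toks) else "</s>"
            ctx[w].add((left, right))
    return ctx

def project_classes(ctx):
    classes = []
    for w, cs in ctx.items():
        for cls in classes:
            if any(cs & ctx[m] for m in cls):    # shares a context with the class
                cls.add(w)
                break
        else:
            classes.append({w})
    return classes

ctx = observed_contexts(["she walks home", "she runs home", "he runs away"])
print(project_classes(ctx))
# [{'she', 'he'}, {'walks', 'runs'}, {'home', 'away'}] (order within sets may vary)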

But then if you are doing all this, why are you using BERT-type training
"to guide the numerical weightings of symbolic language-patterns"? That
will still trap you in the limitations of learned representations. The
whole point of a network is that, like a distributed representation, it can
handle multiplicity of interpretation. Once you fix it by "learning", you
have lost this. Perhaps the currently high state of development of these
learning algorithms may help in the short term, but it seems like a
misstep.

The solution I came to is to forget all thought of training or "learning"
representations. Not least because you get contradictions.

And I believe the best way to do that will be to set the network
oscillating and vary inhibition, to get the resolution of groupings we
want dynamically.
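
For what it's worth, here is a crude, static stand-in for that idea. The
real proposal is dynamic -- the resolution set by oscillation and
inhibition in the running network -- but this toy at least shows one knob
moving you between coarse and fine groupings. Treating "inhibition" as a
similarity threshold is my own loose reading; every name and number below
is invented for illustration:

from collections import defaultdict
from itertools import combinations

contexts = {
    "cat": {("the", "sat"), ("a", "ran")},
    "dog": {("the", "sat"), ("a", "barked")},
    "mat": {("the", ".")},
    "rug": {("the", ".")},
}

def groupings(inhibition):
    """Higher inhibition -> fewer links survive -> finer groupings."""
    parent = {w: w for w in contexts}
    def find(w):
        while parent[w] != w:
            w = parent[w]
        return w
    for a, b in combinations(contexts, 2):
        shared = len(contexts[a] & contexts[b])
        smaller = min(len(contexts[a]), len(contexts[b]))
        if shared / smaller > inhibition:
            parent[find(a)] = find(b)
    clusters = defaultdict(set)
    for w in contexts:
        clusters[find(w)].add(w)
    return list(clusters.values())

print(groupings(inhibition=0.3))  # coarse: cat/dog group, mat/rug group
print(groupings(inhibition=0.9))  # fine: only identical-context words group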

-Rob

On Tue, Feb 19, 2019 at 6:45 PM Linas Vepstas 
wrote:

> Hi Rob,
>
> On Mon, Feb 18, 2019 at 4:40 PM Rob Freeman 
> wrote:
>
>> Ben,
>>
>> That's what I thought. You're still working with Link Grammar.
>>
>> But since last year working on informing your links with stats from
>> deep-NN type, learned, embedding vector based predictive models? You're
>> trying to span the weakness of each formalism with the strengths of the
>> other??
>>
>
> Yes but no. I've been trying to explain what exactly is good, and what,
> exactly is bad with NN vector-space models. There is a long tract written
> on this here.
> https://github.com/opencog/opencog/raw/master/opencog/nlp/learn/learn-lang-diary/skippy.pdf
>
>
>
>>
>> There's a lot to say about all of that.
>>
>> Your grammar will be learned, with only the resolution you bake in from
>> the beginning.
>>
> No.
>
>
>> Your embedding vectors will be learned,
>>
>
> The point of the long PDF is to explain why NN-vectors are bad. It
> attempts to first explain *why* neural nets work for language, and why
> vectors are *almost* the right thing, and then it tries to explain why NN
> vectors don't actually do everything you actually want.  I've noticed that,
> in the middle of all these explanations, I lose my audience; haven't
> figured out how to keep them, yet.
>
>
>> and the dependency decisions they can inform on learned, and thus finite,
>> too. Plus you need to keep two formalisms and marry them together... Large
>> teams for all of that...
>>
>
> No. I've already got 75% of it coded up. It actually works, I've got long
> diary entries and notes with detailed stats on it all.  Unfortunately, I
> have not been able to carve out the time to finish the work, its been
> stalled since the fall of last year.
>
> It would be wonderful if I could get someone else interested in this work.
>
> --linas
>

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T581199cf280badd7-Mf8d91ef7fb9013cf13f130c7
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] openAI's AI advances and PR stunt...

2019-02-18 Thread Rob Freeman
Ben,

That's what I thought. You're still working with Link Grammar.

But since last year working on informing your links with stats from deep-NN
type, learned, embedding vector based predictive models? You're trying to
span the weakness of each formalism with the strengths of the other??

There's a lot to say about all of that.

Your grammar will be learned, with only the resolution you bake in from the
beginning. Your embedding vectors will be learned, and the dependency
decisions they can inform will be learned, and thus finite, too. Plus you
need to keep two formalisms and marry them together... Large teams for all
of that...

On the plus side, large teams have already been working on those
formalisms for decades. An asymptote plots the final failure of
learning-based methods, but after decades of development they are already
far along that asymptote, so you start right at the top of the tallest
tree. No-one will complain about your performance, because it is all
anyone achieves.

So, state-of-the-art. But complex, and doomed to asymptotic failure as
ever more comprehensive learning ever more definitively fails to capture
every Zipf long tail.

Me, it's simple.

You make the embedding vectors generative by substituting them into each
other. Infinite patterns. But the patterns are all meaningful, because the
substitution is meaningful (it's the very basis of all embedding vectors).
You get hierarchy, with all that implies about dependency, grammar, for
free, as a natural consequence of the substitution process (it's
non-associative).
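
A toy illustration of what I mean, with a throwaway two-sentence corpus of
my own (this is not code from my system): words which share an observed
context can be substituted for one another, and every substitution yields
a sequence that was never observed but is still meaningful. Substitute
phrases rather than single words and the order in which the substitutions
nest starts to matter -- which is where the non-associativity, and the
hierarchy, come from.

from collections import defaultdict

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
sentences = [s.split() for s in corpus]

# Contexts: the (previous, next) pairs each word was observed in.
contexts = defaultdict(set)
for sent in sentences:
    padded = ["<s>"] + sent + ["</s>"]
    for i in range(1, len(padded) - 1):
        contexts[padded[i]].add((padded[i - 1], padded[i + 1]))

def substitutable(a, b):
    """Crude test: do the two words share any observed context?"""
    return a != b and contexts[a] & contexts[b]

def generate(sent):
    """Substitute substitutable words into an observed sentence."""
    for i, word in enumerate(sent):
        for other in contexts:
            if substitutable(word, other):
                yield sent[:i] + [other] + sent[i + 1:]

for new in generate(sentences[0]):
    print(" ".join(new))   # e.g. "the dog sat on the mat" -- never observed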

And I now think the "substituting them into each other" step may be as
simple as setting a network of observed sequences oscillating.

As you say 2013 is a long time ago. When I was pitching embedded vector
models in 2013 (let alone 2000), they were not the mainstream. Now they are.

If you ask me whether I feel vindicated, the answer is yes.

But vindication is hollow. We still don't have what I was also pitching
back then: vector recombination to generate new, meaningful patterns,
rather than learn patterns.

No large teams working on this yet, so it is still crude. In particular it
probably requires parallel hardware.

Anyway, if you don't want to try this pattern creation idea for language, I
suggest you look at what Pissanetzky has done. That is more readily
interpretable in terms of vision. For vision the generative aspect is not
so obvious. I'm not sure Pissanetzky realizes his permutation "invariants"
will need to be constantly generated too. But by using permutation as his
base, the machinery is all there. Permutation is a generative process.

-Rob

On Mon, Feb 18, 2019 at 9:26 PM Ben Goertzel  wrote:

> 2013 seems an insanely long time ago ;) ...  we started with these ideas
>
> https://arxiv.org/abs/1401.3372
>
> https://arxiv.org/abs/1703.04368
>
> but have gone some way since... last summer's partial update was
>
> https://www.youtube.com/watch?v=ABvopAfc3jY
>
>
> http://agi-conf.org/2018/wp-content/uploads/2018/08/UnsupervisedLanguageLearningAGI2018.pdf
>
> But since last summer we have onboarded a new team that does deep-NN
> language modeling and we are experimenting with using the output of
> deep-NN predictive models to guide syntactic parsing and semantic
> interpretation in OpenCog...
>
> -- Ben

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T581199cf280badd7-Mb51fe3bc6cac5a7076ab3244
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] openAI's AI advances and PR stunt...

2019-02-17 Thread Rob Freeman
On Mon, Feb 18, 2019 at 4:01 PM Ben Goertzel  wrote:

> ***
> ...
> And likely the way to do this is to set the network oscillating, and
> vary inhibition to get the resolution of "invariants" you want.
> ***
>
> But we are not doing that.  Interesting...


Cool. Maybe there could be a match. I want hardware to try this. I've been
playing with the open SpiNNaker interface with the European Human Brain
Project. But it's batch jobs, and lots of fiddling with Python interfaces
for neurosimulators. I'd love to collaborate on it.

I've always thought the set-based representations OpenCog was using could
be fitted to what I was doing.

But I didn't realize you were doing anything like clustering raw sequence
networks on the fly for language. As I recall when I talked to Ruiting in
2013 she was using rules??

I only started looking in detail at networks after 2013. I was trying to
fit it to Jeff Hawkins' sequence networks. I finally figured out that the
whole thing, even the "cross-product" in my original formulation, would
reduce to something as simple as diamond-shaped "cliques" in the network.
But I didn't know how to isolate them. Then I came across this paper:

A Network of Integrate and Fire Neurons for Community Detection in Complex
Networks
Marcos G. Quiles, Liang Zhao, Fabricio A. Breve, Roseli A. F. Romero
http://www.sbmac.org.br/dincon/trabalhos/PDF/invited/69194.pdf

I immediately googled to see if there was any evidence this mapped to
perception experimentally. And found "binding by synchrony" has been
observed since the '80s, but no-one knew why!

Nice for me, because I'm working from the other direction. I have the
network, and the network predicts binding by synchrony.
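
If you want a feel for how that can work, here is a toy sketch. It uses
Kuramoto-style phase oscillators, NOT the Quiles et al. integrate-and-fire
model, and it is not code from my own system; the graph and constants are
invented for illustration. Two densely linked cliques joined by a single
bridge stand in for the "diamond" groupings. Coupled along the links, each
clique pulls into synchrony well before the bridge can drag the two groups
together, so the groupings show up as sets of nodes firing in step.

import math
import random

random.seed(1)

# Two triangles (communities A and B) joined by one weak bridge link.
edges = [(0, 1), (1, 2), (0, 2),    # community A
         (3, 4), (4, 5), (3, 5),    # community B
         (2, 3)]                    # bridge
n = 6
neighbours = {i: [] for i in range(n)}
for a, b in edges:
    neighbours[a].append(b)
    neighbours[b].append(a)

phase = [random.uniform(0, 2 * math.pi) for _ in range(n)]
coupling, dt = 0.4, 0.05

def coherence(nodes):
    """Kuramoto order parameter: 1.0 means the nodes fire in unison."""
    x = sum(math.cos(phase[i]) for i in nodes) / len(nodes)
    y = sum(math.sin(phase[i]) for i in nodes) / len(nodes)
    return math.hypot(x, y)

for step in range(401):
    if step % 100 == 0:
        print(step,
              round(coherence([0, 1, 2]), 2),   # within community A
              round(coherence([3, 4, 5]), 2),   # within community B
              round(coherence(range(6)), 2))    # whole network
    updates = [coupling * sum(math.sin(phase[j] - phase[i])
                              for j in neighbours[i]) for i in range(n)]
    phase = [(p + dt * u) % (2 * math.pi) for p, u in zip(phase, updates)]

Typically the within-community coherences approach 1.0 well before the
whole-network value does: the communities announce themselves as groups
oscillating in step before the bridge drags everything together. Lower the
coupling (my crude stand-in for raising inhibition) and the groups stay
apart longer; raise it and they merge sooner -- the resolution knob again.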

If you want to compare notes and see if there are any cross insights which
could inform, we can talk over email if you like.

-Rob

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T581199cf280badd7-M11ab0b41875929fa3b7d0bee
Delivery options: https://agi.topicbox.com/groups/agi/subscription

