Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-29 Thread Matt Mahoney
Natural language is ambiguous at every level including tokens. Is "someone"
one word or two? Language models handle this by mixing the predictions
given by the contexts "some", "one", and "someone".

Using fixed dictionaries is a compromise that trades accuracy for reduced
computation, like all tradeoffs in data compressors.
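
As a toy sketch of that kind of mixing (my own illustration; the candidate
tokens, probabilities and weights below are made up, and a real mixer would
also weight by confidence and each model's past success rate):

def mix(distributions, weights):
    """Weighted average of several {token: probability} predictions."""
    total = sum(weights)
    mixed = {}
    for dist, w in zip(distributions, weights):
        for token, p in dist.items():
            mixed[token] = mixed.get(token, 0.0) + (w / total) * p
    return mixed

p_some    = {"one": 0.4, "thing": 0.3, "times": 0.3}   # context "some"
p_one     = {"of": 0.5, "day": 0.5}                    # context "one"
p_someone = {"who": 0.6, "else": 0.4}                  # context "someone"

print(mix([p_some, p_one, p_someone], [0.2, 0.2, 0.6]))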



Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-29 Thread Rob Freeman
On Wed, May 29, 2024 at 9:37 AM Matt Mahoney  wrote:
>
> On Tue, May 28, 2024 at 7:46 AM Rob Freeman  
> wrote:
>
> > Now, let's try to get some more detail. How do compressors handle the
> > case where you get {A,C} on the basis of AB, CB, but you don't get,
> > say AX, CX? Which is to say, the rules contradict.
>
> Compressors handle contradictory predictions by averaging them

That's what I thought.

> > "Halle (1959, 1962) and especially Chomsky (1964) subjected
> > Bloomfieldian phonemics to a devastating critique."
> >
> > Generative Phonology
> > Michael Kenstowicz
> > http://lingphil.mit.edu/papers/kenstowicz/generative_phonology.pdf
> >
> > But really it's totally ignored. Machine learning does not address
> > this to my knowledge. I'd welcome references to anyone talking about
> > its relevance for machine learning.
>
> Phonology is mostly irrelevant to text prediction.

The point was it invalidated the method of learning linguistic
structure by distributional analysis at any level. If your rules for
phonemes contradict, what doesn't contradict?

Which is a pity. Because we still don't have a clue what governs
language structure. The best we've been able to come up with is crude
hacks like dragging a chunk of important context behind like a ball
and chain in LSTM, or multiplexing pre-guessed "tokens" together in a
big matrix, with "self-attention".

Anyway, your disinterest doesn't invalidate my claim that this result,
pointing to contradiction produced by distributional analysis learning
procedures for natural language, is totally ignored by current machine
learning, which implicitly or otherwise uses those distributional
analysis learning procedures.

> Language evolved to be learnable on neural networks faster than our
> brains evolved to learn language. So understanding our algorithm is
> important.
>
> Hutter prize entrants have to prebuild a lot of the model because
> computation is severely constrained (50 hours in a single thread with
> 10 GB memory). That includes a prebuilt dictionary. The human brain
> takes 20 years to learn language on a 10 petaflop, 1 petabyte neural
> network. So we are asking quite a bit.

Neural networks may have finally gained close to human performance at
prediction. A problem where you can cover a multitude of sins with raw
memory. Something at which computers trivially exceed humans by as
many orders of magnitude as you can stack server farms. You can just
remember each contradiction including the context which selects it. No
superior algorithm required, and certainly none in evidence. (Chinese
makes similar trade-offs, swapping internal mnemonic sound structure
within tokens, with prodigious memory requirements for the tokens
themselves.) Comparing 10 GB with 1 petabyte seems disingenuous. I
strongly doubt any human can recall as much as 10GB of text. (All of
Wikipedia currently ~22GB compressed, without media? Even to read it
all is estimated at 47 years, including 8hrs sleep a night
https://www.reddit.com/r/theydidthemath/comments/80fi3w/self_how_long_would_it_take_to_read_all_of/.
So forget 20 years to learn it, it would take 20 years to read all the
memory you give Prize entrants.) But I would argue our prediction
algorithms totally fail to do any sort of job with language structure.
Whereas you say babies start to structure language before they can
walk? (Walking being something else computers still have problems
with.) And far from stopping at word segmentation, babies go on to
build quite complex structures, including new ones all the time.

Current models do nothing with structure, not at the human "data years" of
8-10 months, not at 77 years (680k hours of audio to train "Whisper" is ~77
years?
https://www.thealgorithmicbridge.com/p/8-features-make-openais-whisper-the.
Perhaps some phoneme structure might help there...) The only structure
is "tokens". I don't even think current algorithms do max entropy to
find words. They just start out with "tokens". Guessed at
pre-training. Here's Karpathy and LeCun talking about it:

Yann LeCun (@ylecun, Feb 21, replying to @karpathy):
"Text tokenization is almost as much of an abomination for text as it
is for images. Not mentioning video."

And the Karpathy tweet he was replying to:
"We will see that a lot of weird behaviors and problems of LLMs
actually trace back to tokenization. We'll go through a number of
these issues, discuss why tokenization is at fault, and why someone
out there ideally finds a way to delete this stage entirely."

https://x.com/ylecun/status/1760315812345176343

By the way, talking about words. That's another thing which seems to
have contradictory structure in humans, e.g. native Chinese speakers
agree on what constitutes a "word" less than 70% of the time:

"Sproat et. al. (1996) give empirical results showing that native
speakers of Chinese frequently agree on the correct segmentation in
less than 70% of the cases."
https://s3.amazonaws.com/tm-town-nlp-resources/ch2.pdf

I guess that will be:

Sproat, Richard W., Chilin Shih, William Gale, and 

Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-28 Thread Matt Mahoney
On Tue, May 28, 2024 at 7:46 AM Rob Freeman  wrote:

> Now, let's try to get some more detail. How do compressors handle the
> case where you get {A,C} on the basis of AB, CB, but you don't get,
> say AX, CX? Which is to say, the rules contradict.

Compressors handle contradictory predictions by averaging them,
weighted both by the implied confidence of predictions near 0 or 1,
and the model's historical success rate. Although transformer based
LLMs predict a vector of word probabilities, PAQ based compressors
like CMIX predict one bit at a time, which is equivalent but has a
simpler implementation. You could have hundreds of context models
based on the last n bytes or word (the lexical model), short term
memory or sparse models (semantics), and learned word categories
(grammar). The context includes the already predicted bits of the
current word, like when you guess the next word one letter at a time.

The context model predictions are mixed using a simple neural network
with no hidden weights:

p = squash(w · stretch(x))

where x is the vector of input predictions in (0,1), w is the weight
vector, stretch(x) = ln(x/(1-x)), squash is the inverse = 1/(1 +
e^-x), and p is the final bit prediction. The effect of stretch() and
squash() is to favor predictions near 0 or 1. For example, if one
model guesses 0.5 and another is 0.99, the average would be about 0.9.
The weights are then adjusted to favor whichever models were closest:

w := w + L stretch(x) (y - p)

where y is the actual bit (0 or 1), y - p is the prediction error, and
L is the learning rate, typically around 0.001.
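
To make that concrete, here is a minimal sketch of the mixing and update
step (my own illustration; the two example predictions and the starting
weights are made up, and real PAQ/CMIX mixers have many more details):

import math

def stretch(x):
    return math.log(x / (1.0 - x))

def squash(x):
    return 1.0 / (1.0 + math.exp(-x))

def mix_and_update(x, w, y, L=0.001):
    """x: per-model bit predictions in (0,1); w: weights; y: actual bit (0/1).
    Returns the mixed prediction p and adjusts w in place."""
    s = [stretch(xi) for xi in x]
    p = squash(sum(wi * si for wi, si in zip(w, s)))
    for i, si in enumerate(s):
        w[i] += L * si * (y - p)      # favor whichever models were closest
    return p

w = [0.5, 0.5]
print(mix_and_update([0.5, 0.99], w, y=1))   # about 0.9, as in the example above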

> "Halle (1959, 1962) and especially Chomsky (1964) subjected
> Bloomfieldian phonemics to a devastating critique."
>
> Generative Phonology
> Michael Kenstowicz
> http://lingphil.mit.edu/papers/kenstowicz/generative_phonology.pdf
>
> But really it's totally ignored. Machine learning does not address
> this to my knowledge. I'd welcome references to anyone talking about
> its relevance for machine learning.

Phonology is mostly irrelevant to text prediction. But an important
lesson is how infants learn to segment continuous speech around 8-10
months, before they learn their first word around 12 months. This is
important for learning languages without spaces like Chinese (a word
is 1 to 4 symbols, each representing a syllable). The solution is
simple. Word boundaries occur when the next symbol is less
predictable, reading either forward or backwards. I did this research
in 2000. https://cs.fit.edu/~mmahoney/dissertation/lex1.html
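
A rough, forward-only sketch of that boundary idea (my own illustration, not
the actual lex1 code; the corpus and threshold are arbitrary, and the real
method also reads backwards):

from collections import Counter

def segment(text, threshold=0.1):
    pairs = Counter(zip(text, text[1:]))     # character bigram counts
    singles = Counter(text)                  # character counts
    out = [text[0]]
    for a, b in zip(text, text[1:]):
        p = pairs[(a, b)] / singles[a]       # P(next char | current char)
        if p < threshold:                    # poorly predicted -> boundary
            out.append(" ")
        out.append(b)
    return "".join(out)

print(segment("thecatsatonthematthecatonthemat"))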

Language evolved to be learnable on neural networks faster than our
brains evolved to learn language. So understanding our algorithm is
important.

Hutter prize entrants have to prebuild a lot of the model because
computation is severely constrained (50 hours in a single thread with
10 GB memory). That includes a prebuilt dictionary. The human brain
takes 20 years to learn language on a 10 petaflop, 1 petabyte neural
network. So we are asking quite a bit.

-- 
-- Matt Mahoney, mattmahone...@gmail.com



Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-28 Thread Rob Freeman
Matt,

Nice breakdown. You've actually worked with language models, which
makes it easier to bring it back to concrete examples.

On Tue, May 28, 2024 at 2:36 AM Matt Mahoney  wrote:
>
> ...For grammar, AB predicts AB (n-grams),

Yes, this looks like what we call "words". Repeated structure. No
novelty. And nothing internal we can equate to "meaning" either. Only
meaning by association.

> and AB, CB, CD, predicts AD (learning the rule
> {A,C}{B,D}).

This is the interesting one. It actually kind of creates new meaning.
You can think of "meaning" as a way of grouping things which makes
good predictions. And, indeed, those gap filler sets {A,C} do pull
together sets of words that we intuitively associate with similar
meaning. These are also the sets that the HNet paper identifies as
having "meaning" independent of any fixed pattern. A pattern can be
new, and so long as it makes similar predictions {B,D}, for any set
{B,D...}, {X,Y...}..., we can think of it as having "meaning", based
on the fact that arranging the world that way, makes those shared
predictions. (Even moving beyond language, you can say the atoms of a
ball, share the meaning of a "ball", based on the fact they fly
through the air together, and bounce off walls together. It's a way of
defining what it "means" to be a "ball".)

Now, let's try to get some more detail. How do compressors handle the
case where you get {A,C} on the basis of AB, CB, but you don't get,
say AX, CX? Which is to say, the rules contradict. Sometimes A and C
are the same, but not other times. You want to trigger the "rule" so
you can capture the symmetries. But you can't make a fixed "rule",
saying {A,C}, because the symmetries only apply to particular sub-sets
of contexts.

You get a lot of this in natural language. There are many such shared
context symmetries in language, but they contradict. Or they're
"entangled". You get one by ordering contexts one way, and another by
ordering contexts another way, but you can't get both at once, because
you can't order contexts both ways at once.
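
A toy rendering of that situation (my own, just to pin down the notation):

from collections import defaultdict

pairs = [("A", "B"), ("C", "B"), ("A", "X")]        # attested; ("C", "X") is not
right_contexts = defaultdict(set)
for left, right in pairs:
    right_contexts[left].add(right)

# A and C share the context B, so distributional analysis groups them...
print(right_contexts["A"] & right_contexts["C"])    # {'B'} -> category {A, C}

# ...but the category over-generates: it predicts "C X", which never occurs
print("X" in right_contexts["A"])                   # True
print("X" in right_contexts["C"])                   # False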

I later learned these contradictions were observed even at the level
of phonemes, and this was crucial to Chomsky's argument that grammar
could not be "learned", back in the '50s. That this essentially broke
consensus in the field of linguistics. Which remains in squabbling
sub-fields over this result, to this day. That's why theoretical
linguistics contributes essentially nothing to contemporary machine
learning. Has anyone ever wondered? Why don't linguists tell us how to
build language models? Even the Chomsky hierarchy cited by James'
DeepMind paper from the "learning" point of view is essentially a
misapprehension of what Chomsky concluded (that observable grammar
contradicts, so formal grammar can't be learned.)

A reference available on the Web I've been able to find is this one:

"Halle (1959, 1962) and especially Chomsky (1964) subjected
Bloomfieldian phonemics to a devastating critique."

Generative Phonology
Michael Kenstowicz
http://lingphil.mit.edu/papers/kenstowicz/generative_phonology.pdf

But really it's totally ignored. Machine learning does not address
this to my knowledge. I'd welcome references to anyone talking about
its relevance for machine learning.

I'm sure all the compression algorithms submitted to the Hutter Prize
ignore this. Maybe I'm wrong. Have any addressed it? They probably
just regress to some optimal compromise, and don't think about it too
much.

If we choose not to ignore this, what do we do? Well, we might try to
"learn" all these contradictions, indexed on context. I think this is
what LLMs do. By accident. That was the big jump, right, "attention",
to index context. Then they just enumerate vast numbers of (an
essentially infinite number of?) predictive patterns in one enormous
training time. That's why they get so large.

No-one knows, or wonders, why neural nets work for this, and symbols
don't, viz. the topic post of this thread. But this will be the
reason.

In practice LLMs learn predictive patterns, and index them on context
using "attention", and it turns out there are a lot of those different
predictive "embeddings", indexed on context. There is no theory.
Everything is a surprise. But if you go back in the literature, there
are these results about contradictions to suggest why it might be so.
And the conclusion is still either Chomsky's one, that language can't
be learned, consistent rules exist, but must be innate. Or, what
Chomsky didn't consider, that complexity of novel patterns defying
abstraction, might be part of the solution. It was before the
discovery of chaos when Chomsky was looking at this, so perhaps it's
not fair to blame him for not considering it.

But then it becomes a complexity issue. Just how many unique orderings
of contexts with useful predictive symmetries are there? Are you ever
at an end of finding different orderings of contexts, which specify
some useful new predictive symmetry or other? The example of

Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-27 Thread James Bowery
On Mon, May 27, 2024 at 9:34 AM Rob Freeman 
wrote:

> James,
>
> I think you're saying:
>
> 1) Grammatical abstractions may not be real, but they can still be
> useful abstractions to parameterize "learning".
>

And more generally, people pragmatically adopt different fictional abstract
grammars as convenient: fictions, except perhaps the lowest-power ones like
regular/finite-state automata, which are discrete and finite dynamical
systems, *real*izable in physical phenomena.



> 2) Even if after that there are "rules of thumb" which actually govern
> everything.
>

Well, there are abstract grammars (see #1) and there are instances of those
grammars (#2), which may be induced given the
fictional-abstract-grammar-of-convenience.  BUT we would prefer not to have
to _learn_ these maximum-parsimony induced grammars if it can be avoided.
Some fragments of physics may be learnable from the raw data, but we may
not wish to do so depending on the task at hand -- hence "physics informed
machine learning" makes sense in many cases.

> Well, you might say why not just learn the "rules of thumb".

Hopefully you meant "one" rather than "you" since in both #1 and #2 I've
repeatedly asserted that these are rules of thumb/heuristics/ML biases
(whether biasing toward a particular abstract grammar (#1) or toward a
particular grammar (#2)) we should be taking more seriously.


> But the best counter against the usefulness of the Chomsky hierarchy
> for parameterizing machine learning, might be that Chomsky himself
> dismissed the idea it might be learned.


A machine learning algorithm is like the phenotype of the genetic code
Chomsky has hypothesized for innate capacity to learn grammar.  It
"parameterizes" the ML algorithm prior to any learning taking place by that
ML algorithm.  Both #1 and #2 are examples of such "genetic code" hardwired
into the ML algorithm -- to "bias" learning in such a manner as to speed up
convergence on lossless compression of the training data (i.e. all prior
observations, in the ideal case of the Algorithmic Information Criterion for
model selection, since "test" and "validation" data are available in
subsequent observations and you can't do any better than the best lossless
compression as the basis for decision).


> And his most damaging
> argument? That learned categories contradict. "Objects" behave
> differently in one context, from how they behave in another context.


I guess the best thing to do at this stage is stop talking about "Chomsky"
and even about "grammar" and instead talk only about which level of
"automata" one wishes to focus on in the "domain specific computer
programming language" one wishes
focus on in the "domain specific computer programming language" one wishes
to use to achieve the shortest algorithm that outputs all prior
observations.  Some domain specific languages are not Turing complete but
may nevertheless be interpreted by a meta-interpreter that is.

This gets away from anything specific to "Chomsky" and lets us focus on the
more well-grounded notion of Algorithmic Information.


> I see it a bit like our friend the Road Runner. You can figure out a
> physics for him. But sometimes that just goes haywire and contradicts
> itself - bodies make holes in rocks, fly high in the sky, or stretch
> wide.
>
> All the juice is in these weird "rules of thumb".
>
> Chomsky too failed to find consistent objects. He was supposed to push
> past the highly successful learning of phoneme "objects", and find
> "objects" for syntax. And he failed. And the most important reason
> I've found, was that even for phonemes, learned category contradicted.
>
> That hierarchy stuff, that wasn't supposed to appear in the data. That
> could only be in our heads. Innate. Why? Well for one thing, because
> the data contradicted. The "learning procedures" of the time generated
> contradictory objects. This is a forgotten result. Machine learning is
> still ignoring this old result from the '50s. (Fair to say the
> DeepMind paper ignores it?) Chomsky insisted these contradictions
> meant the "objects" must be innate. The idea cognitive objects might
> be new all the time (and particularly the idea they might contradict!)
> is completely orthogonal to his hierarchy (well, it might be
> compatible with context sensitivity, if you accept that the real juice
> is in the mechanism to implement the context sensitivity?)
>
> If categories contradict, that is represented on the Chomsky hierarchy
> how? I don't know. How would you represent contradictory categories on
> the Chomsky hierarchy? A form of context sensitivity?
>
> Actually, I think, probably, using entangled objects like quantum. Or
> relation and variance based objects as in category theory.
>
> I believe Coecke's team has been working on "learning" exactly this:
>
> From Conceptual Spaces to Quantum Concepts: Formalising and Learning
> Structured Conceptual Models
> Sean Tull, Razin A. Shaikh, Sara Sabrina Zemljič and Stephen Clark
> Quantinuum
> 

Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-27 Thread Matt Mahoney
The top text compressors use simple models of semantics and grammar
that group words into categories as fuzzy equivalence relations. For
semantics, the rules are reflexive, A predicts A (but not too close.
Probability peaks 50-100 bytes away), symmetric, A..B predicts A..B
and B..A, and transitive, A..B, B..C predicts A..C. For grammar, AB
predicts AB (n-grams), and AB, CB, CD, predicts AD (learning the rule
{A,C}{B,D}). Even the simplest compressors like zip model n-grams. The
top compressors learn groupings. For example, "white house", "white
car", "red house" predicts the novel "red car". For cmix variants, the
dictionary would be "white red...house car" and take whole groups as
contexts. The dictionary can be built automatically by clustering in
context space.
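
As a rough sketch of that grouping (my own illustration, not cmix's actual
mechanism; the bigrams are the ones from the example above):

from collections import defaultdict

bigrams = [("white", "house"), ("white", "car"), ("red", "house")]

left_of = defaultdict(set)    # for each adjective, the nouns it precedes
right_of = defaultdict(set)   # for each noun, the adjectives it follows
for a, b in bigrams:
    left_of[a].add(b)
    right_of[b].add(a)

# words sharing contexts fall into one group
adjectives = {a for a in left_of if left_of[a] & left_of["white"]}   # {'white', 'red'}
nouns = {n for n in right_of if right_of[n] & right_of["house"]}     # {'house', 'car'}

# the learned rule {A,C}{B,D} licenses every pairing, including the novel one
predicted = {(a, n) for a in adjectives for n in nouns}
print(("red", "car") in predicted, ("red", "car") in bigrams)        # True False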

Compressors model semantics using sparse contexts. To get the reverse
prediction (A..B predicts B..A) and transitive prediction (A..B, B..C
predicts A..C) you use a short term memory like LSTM both for
learning associations and as context for prediction.
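
And a similarly rough sketch of sparse-context association (again mine; the
corpus and window size are arbitrary): co-occurrence within a window gives
the symmetric association, and one extra step gives the transitive one.

from collections import Counter

text = "the cat chased the mouse . the mouse ate the cheese".split()
window = 4
assoc = Counter()
for i, a in enumerate(text):
    for b in text[i + 1:i + window]:
        if a != b:
            assoc[(a, b)] += 1
            assoc[(b, a)] += 1          # symmetric: A..B also predicts B..A

# transitive step: cat..mouse and mouse..cheese suggest cat..cheese
transitive = {(a, c) for (a, b1) in assoc for (b2, c) in assoc
              if b1 == b2 and a != c}
print(("cat", "cheese") in transitive)  # True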

Humans use lexical, semantic, and grammar induction to predict text.
For example, how do you predict, "The flob ate the glork. What do
flobs eat?"

Your semantic model learned to associate "flob" with "glork", "eat"
with "glork" and "eat" with "ate". Your grammar model learned that
"the" is usually followed by a noun and that nouns are sometimes
followed by the plural "s". Your lexical model tells you that there is
no space before the "s". Thus, you and a good language model predict
the novel word "glorks".

All of this has a straightforward implementation with neural networks.
It takes a lot of computation because you need on the order of as many
parameters as you have bits of training data, around 10^9 for human
level. Current LLMs are far beyond that with 10^13 bits or so. The
basic operations are prediction, y = Wx, and training, W += xy^t,
where x is the input word vector, y is the output word probability
vector, W is the weight matrix, and ^t means transpose. Both
operations require similar computation (the number of parameters,
|W|), but training requires more hardware because you are compressing
a million years worth of text in a few days. Prediction for chatbots
only has to be real time, about 10 bits per second.
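
A minimal numpy sketch of those two operations (my own, with toy sizes and
one-hot word vectors; Matt writes the update as W += xy^t, and I've oriented
the outer product as y x^T so the shapes agree with y = Wx):

import numpy as np

vocab = 5
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(vocab, vocab))    # weight matrix

x = np.zeros(vocab); x[2] = 1.0                    # input word (one-hot)
y_observed = np.zeros(vocab); y_observed[4] = 1.0  # observed next word

y = W @ x                                          # prediction: y = Wx
W += np.outer(y_observed, x)                       # training (Hebbian-style update)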

And as I have been saying since 2006, text prediction (measured by
compression) is all you need to pass the Turing test, and therefore
all you need to appear conscious or sentient.

-- 
-- Matt Mahoney, mattmahone...@gmail.com



Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-27 Thread Rob Freeman
James,

I think you're saying:

1) Grammatical abstractions may not be real, but they can still be
useful abstractions to parameterize "learning".

2) Even if after that there are "rules of thumb" which actually govern
everything.

Well, you might say why not just learn the "rules of thumb".

But the best counter against the usefulness of the Chomsky hierarchy
for parameterizing machine learning, might be that Chomsky himself
dismissed the idea it might be learned. And his most damaging
argument? That learned categories contradict. "Objects" behave
differently in one context, from how they behave in another context.

I see it a bit like our friend the Road Runner. You can figure out a
physics for him. But sometimes that just goes haywire and contradicts
itself - bodies make holes in rocks, fly high in the sky, or stretch
wide.

All the juice is in these weird "rules of thumb".

Chomsky too failed to find consistent objects. He was supposed to push
past the highly successful learning of phoneme "objects", and find
"objects" for syntax. And he failed. And the most important reason
I've found, was that even for phonemes, learned category contradicted.

That hierarchy stuff, that wasn't supposed to appear in the data. That
could only be in our heads. Innate. Why? Well for one thing, because
the data contradicted. The "learning procedures" of the time generated
contradictory objects. This is a forgotten result. Machine learning is
still ignoring this old result from the '50s. (Fair to say the
DeepMind paper ignores it?) Chomsky insisted these contradictions
meant the "objects" must be innate. The idea cognitive objects might
be new all the time (and particularly the idea they might contradict!)
is completely orthogonal to his hierarchy (well, it might be
compatible with context sensitivity, if you accept that the real juice
is in the mechanism to implement the context sensitivity?)

If categories contradict, that is represented on the Chomsky hierarchy
how? I don't know. How would you represent contradictory categories on
the Chomsky hierarchy? A form of context sensitivity?

Actually, I think, probably, using entangled objects like quantum. Or
relation and variance based objects as in category theory.

I believe Coecke's team has been working on "learning" exactly this:

From Conceptual Spaces to Quantum Concepts: Formalising and Learning
Structured Conceptual Models
Sean Tull, Razin A. Shaikh, Sara Sabrina Zemljič and Stephen Clark
Quantinuum
https://browse.arxiv.org/pdf/2401.08585

I'm not sure. I think the symbolica.ai people may be working on
something similar: find some level of abstraction which applies even
across varying objects (contradictions?)

For myself, in contrast to Bob Coecke, and the category theory folks,
I think it's pointless, and maybe unduly limiting, to learn this
indeterminate object formalism from data, and then collapse it into
one or other contradictory observable form, each time you observe it.
(Or seek some way you can reason with it even in indeterminate object
formulation, as with the category theory folks?) I think you might as
well collapse observable objects directly from the data.

I believe this collapse "rule of thumb", is the whole game, one shot,
no real "learning" involved.

All the Chomsky hierarchy limitations identified in the DeepMind paper
would disappear too. They are all limitations of not identifying
objects. Context coding hacks like LSTM, or "attention", introduced in
lieu of actual objects, and grammars over those objects, stemming from
the fact grammars of contradictory objects are not "learnable."

On Sun, May 26, 2024 at 11:24 PM James Bowery  wrote:
>
> It's also worth reiterating a point I made before about the confusion between 
> abstract grammar as a prior (heuristic) for grammar induction and the 
> incorporation of so-induced grammars as priors, such as in "physics informed 
> machine learning".
>
> In the case of physics informed machine learning, the language of physics is 
> incorporated into the learning algorithm.  This helps the machine learning 
> algorithm learn things about the physical world without having to re-derive 
> the body of physics knowledge.
>
> Don't confuse the two levels here:
>
> 1) My suspicion that natural language learning may benefit from prioritizing 
> HOPDA as an abstract grammar to learn something about natural languages -- 
> such as their grammars.
>
> 2) My suspicion (supported by "X informed machine learning" exemplified by 
> the aforelinked work) that there may be prior knowledge about natural 
> language more specific than the level of abstract grammar -- such as specific 
> rules of thumb for, say, the English language that may greatly speed training 
> time on English corpora.
>
> On Sun, May 26, 2024 at 9:40 AM James Bowery  wrote:
>>
>> See the recent DeepMind paper "Neural Networks and the Chomsky Hierarchy" 
>> for the sense of "grammar" I'm using when talking about the HNet paper's 
>> connection to 

Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-26 Thread James Bowery
It's also worth reiterating a point I made before about the confusion
between abstract grammar as a prior (heuristic) for grammar induction and
the incorporation of so-induced grammars as priors, such as in "physics
informed machine learning".

In the case of physics informed machine learning, the language of physics
is incorporated into the learning algorithm.  This helps the machine
learning algorithm learn things about the physical world without having to
re-derive the body of physics knowledge.

Don't confuse the two levels here:

1) My suspicion that natural language learning may benefit from
prioritizing HOPDA as an *abstract* grammar to learn something about
natural languages -- such as their grammars.

2) My suspicion (supported by "X informed machine learning" exemplified by
the aforelinked work) that there may be prior knowledge about natural
language more specific than the level of *abstract* grammar -- such as
specific rules of thumb for, say, the English language that may greatly
speed training time on English corpora.

On Sun, May 26, 2024 at 9:40 AM James Bowery  wrote:

> See the recent DeepMind paper "Neural Networks and the Chomsky Hierarchy"
> for the sense of "grammar" I'm using when talking about the HNet paper's
> connection to Granger's prior papers about "grammar", the most recent
> being "Toward the quantification of cognition".  Although the DeepMind
> paper doesn't refer to Granger's work on HOPDAs, it does at least
> illustrate a fact, long-recognized in the theory of computation:
>
> Grammar                  Computation
> Regular                  Finite-state automaton
> Context-free             Non-deterministic pushdown automaton
> Context-sensitive        Linear-bounded non-deterministic Turing machine
> Recursively enumerable   Turing machine
>
> Moreover, the DeepMind paper's empirical results support the corresponding
> hierarchy of computational power.
>
> Having said that, it is critical to recognize that everything in a finite
> universe reduces to finite-state automata in hardware -- it is only in our
> descriptive languages that the hierarchy exists.  We don't describe all
> computer programs in terms of finite-state automata aka regular grammar
> languages.  We don't describe all computer programs even in terms of Turing
> complete automata aka recursively enumerable grammar languages.
>
> And I *have* stated before (which I first linked to the HNet paper)
> HOPDAs are interesting as a heuristic because they *may* point the way to
> a prioritization if not restriction on the program search space that
> evolution has found useful in creating world models during an individual
> organism's lifetime.
>
> The choice of language, hence the level of grammar, depends on its utility
> in terms of the Algorithmic Information Criterion for model selection.
>
> I suppose one could assert that none of that matters so long as there is
> any portion of the "instruction set" that requires the Turing complete
> fiction, but that's a rather ham-handed critique of my nuanced point.
>
>
>
> On Sat, May 25, 2024 at 9:37 PM Rob Freeman 
> wrote:
>
>> Thanks Matt.
>>
>> The funny thing is though, as I recall, finding semantic primitives
>> was the stated goal of Marcus Hutter when he instigated his prize.
>>
>> That's fine. A negative experimental result is still a result.
>>
>> I really want to emphasize that this is a solution, not a problem, though.
>>
>> As the HNet paper argued, using relational categories, like language
>> embeddings, decouples category from pattern. It means we can have
>> categories, grammar "objects" even, it is just that they may
>> constantly be new. And being constantly new, they can't be finitely
>> "learned".
>>
>> LLMs may have been failing to reveal structure, because there is too
>> much of it, an infinity, and it's all tangled up together.
>>
>> We might pick it apart, and have language models which expose rational
>> structure, the Holy Grail of a neuro-symbolic reconciliation, if we
>> just embrace the constant novelty, and seek it as some kind of
>> instantaneous energy collapse in the relational structure of the data.
>> Either using a formal "Hamiltonian", or, as I suggest, finding
>> prediction symmetries in a network of language sequences, by
>> synchronizing oscillations or spikes.
>>
>> On Sat, May 25, 2024 at 11:33 PM Matt Mahoney 
>> wrote:
>> >
>> > I agree. The top ranked text compressors don't model grammar at all.


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-26 Thread James Bowery
See the recent DeepMind paper "Neural Networks and the Chomsky Hierarchy"
for the sense of "grammar" I'm using when talking about the HNet paper's
connection to Granger's prior papers about "grammar", the most recent being
"Toward the quantification of cognition".  Although the DeepMind paper
doesn't refer to Granger's work on HOPDAs, it does at least illustrate a
fact, long-recognized in the theory of computation:

Grammar                  Computation
Regular                  Finite-state automaton
Context-free             Non-deterministic pushdown automaton
Context-sensitive        Linear-bounded non-deterministic Turing machine
Recursively enumerable   Turing machine

Moreover, the DeepMind paper's empirical results support the corresponding
hierarchy of computational power.

Having said that, it is critical to recognize that everything in a finite
universe reduces to finite-state automata in hardware -- it is only in our
descriptive languages that the hierarchy exists.  We don't describe all
computer programs in terms of finite-state automata aka regular grammar
languages.  We don't describe all computer programs even in terms of Turing
complete automata aka recursively enumerable grammar languages.

And I *have* stated before (which I first linked to the HNet paper) HOPDAs
are interesting as a heuristic because they *may* point the way to a
prioritization if not restriction on the program search space that
evolution has found useful in creating world models during an individual
organism's lifetime.

The choice of language, hence the level of grammar, depends on its utility
in terms of the Algorithmic Information Criterion for model selection.

I suppose one could assert that none of that matters so long as there is
any portion of the "instruction set" that requires the Turing complete
fiction, but that's a rather ham-handed critique of my nuanced point.



On Sat, May 25, 2024 at 9:37 PM Rob Freeman 
wrote:

> Thanks Matt.
>
> The funny thing is though, as I recall, finding semantic primitives
> was the stated goal of Marcus Hutter when he instigated his prize.
>
> That's fine. A negative experimental result is still a result.
>
> I really want to emphasize that this is a solution, not a problem, though.
>
> As the HNet paper argued, using relational categories, like language
> embeddings, decouples category from pattern. It means we can have
> categories, grammar "objects" even, it is just that they may
> constantly be new. And being constantly new, they can't be finitely
> "learned".
>
> LLMs may have been failing to reveal structure, because there is too
> much of it, an infinity, and it's all tangled up together.
>
> We might pick it apart, and have language models which expose rational
> structure, the Holy Grail of a neuro-symbolic reconciliation, if we
> just embrace the constant novelty, and seek it as some kind of
> instantaneous energy collapse in the relational structure of the data.
> Either using a formal "Hamiltonian", or, as I suggest, finding
> prediction symmetries in a network of language sequences, by
> synchronizing oscillations or spikes.
>
> On Sat, May 25, 2024 at 11:33 PM Matt Mahoney 
> wrote:
> >
> > I agree. The top ranked text compressors don't model grammar at all.



Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-25 Thread Rob Freeman
Thanks Matt.

The funny thing is though, as I recall, finding semantic primitives
was the stated goal of Marcus Hutter when he instigated his prize.

That's fine. A negative experimental result is still a result.

I really want to emphasize that this is a solution, not a problem, though.

As the HNet paper argued, using relational categories, like language
embeddings, decouples category from pattern. It means we can have
categories, grammar "objects" even, it is just that they may
constantly be new. And being constantly new, they can't be finitely
"learned".

LLMs may have been failing to reveal structure, because there is too
much of it, an infinity, and it's all tangled up together.

We might pick it apart, and have language models which expose rational
structure, the Holy Grail of a neuro-symbolic reconciliation, if we
just embrace the constant novelty, and seek it as some kind of
instantaneous energy collapse in the relational structure of the data.
Either using a formal "Hamiltonian", or, as I suggest, finding
prediction symmetries in a network of language sequences, by
synchronizing oscillations or spikes.

On Sat, May 25, 2024 at 11:33 PM Matt Mahoney  wrote:
>
> I agree. The top ranked text compressors don't model grammar at all.



Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-25 Thread Matt Mahoney
I agree. The top ranked text compressors don't model grammar at all.

On Fri, May 24, 2024, 11:47 PM Rob Freeman 
wrote:

> Ah, I see. Yes, I saw that reference. But I interpreted it only to
> mean the general forms of a grammar. Do you think he means the
> mechanism must actually be a grammar?
>
> In the earlier papers I interpret him to be saying, if language is a
> grammar, what kind of a grammar must it be? And, yes, it seemed he was
> toying with actual physical mechanisms relating to levels of brain
> structure. Thalamo-cortical loops?
>
> The problem with that is, language doesn't actually seem to be any
> kind of grammar at all.
>
> It's like saying if the brain had to be an internal combustion engine,
> it might be a Mazda rotary. BFD. It's not an engine at all.
>
> I don't know if the authors realized that. But surely that's the point
> of the HNet paper. That something can generate the general forms of a
> grammar, without actually being a grammar.
>
> I guess this goes back to your assertion in our prior thread that
> "learning" needs to be constrained by "physical priors" of some kind
> (was it?) Are there physical "objects" constraining the "learning", or
> does the "learning" vaguely resolve as physical objects, but not
> quite?
>
> I don't think vague resemblance to objects means the objects must exist,
> at all.
>
> Take Kepler and the planets. If the orbits of planets are epicycles,
> which epicycles would they be? The trouble is, it turns out they are
> not epicycles.
>
> And at least epicycles work! That's the thing for natural language.
> Formal grammar doesn't even work. None of them. Nested stacks, context
> free, Chomsky hierarchy up, down, and sideways. They don't work. So
> figuring out which formal grammar is best, is a pointless exercise.
> None of them work.
>
> Yes, broadly human language seems to resolve itself into forms which
> resemble formal grammar (it's probably designed to do that, so that it
> can usefully represent the world.) And it might be generally useful to
> decide which formal grammar it best (vaguely) resembles.
>
> But in detail it turns out human language does not obey the rules of
> any formal grammar at all.
>
> It seems to be a bit like the way the output of a TV screen looks like
> objects moving around in space. Yes, it looks like objects moving in
> space. You might even generate a physics based on the objects which
> appear to be there. It might work quite well until you came to Road
> Runner cartoons. That doesn't mean the output of a TV screen is
> actually objects moving around in space. If you insist on implementing
> a TV screen as objects moving around in space, well, it might be a
> puppet show similar enough to amuse the kids. But you won't make a TV
> screen. You will always fail. And fail in ways very reminiscent of the
> way formal grammars almost succeed... but fail, to represent human
> language.
>
> Same thing with a movie. Also looks a lot like objects moving around
> on a screen. But is it objects moving on a screen? Different again.
>
> Superficial forms do not always equate to mechanisms.
>
> That's what's good about the HNet paper for me. It discusses how those
> general forms might emerge from something else.
>
> The history of AI in general, and natural language processing in
> particular, has been a search for those elusive "grammars" we see
> chasing around on the TV screens of our minds. And they all failed.
> What has succeeded has been breaking the world into bits (pixels?) and
> allowing them to come together in different ways. Then the game became
> how to bring them together. Supervised "learning" spoon fed the
> "objects" and bound the pixels together explicitly. Unsupervised
> learning tried to resolve "objects" as some kind of similarity between
> pixels. AI got a bump when, by surprise, letting the "objects" go
> entirely turned out to generate text that was more natural than ever!
> Who'd a thunk it? Letting "objects" go entirely works best! If it
> hadn't been for the particular circumstances of language, pushing you
> to a "prediction" conception of the problem, how long would it have
> taken us to stumble on that? The downside to that was, letting
> "objects" go entirely also doesn't totally fit with what we
> experience. We do experience the world as "objects". And without those
> "objects" at all, LLMs are kind of unhinged babblers.
>
> So where's the right balance? Is the solution as LeCun, and perhaps
> you, suggest (or Ben, looking for "semantic primitives" two years
> ago...), to forget about the success LLMs had by letting go of objects
> entirely. To repeat our earlier failures and seek the "objects"
> elsewhere. Some other data. Physics? I see the objects, dammit! Look!
> There's a coyote, and there's a road runner, and... Oh, my physics
> didn't allow for that...
>
> Or could it be the right balance is, yes, to ignore the exact
> structure of the objects as LLMs have done, but no, not to do it as
> LLMs 

Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-24 Thread Rob Freeman
Ah, I see. Yes, I saw that reference. But I interpreted it only to
mean the general forms of a grammar. Do you think he means the
mechanism must actually be a grammar?

In the earlier papers I interpret him to be saying, if language is a
grammar, what kind of a grammar must it be? And, yes, it seemed he was
toying with actual physical mechanisms relating to levels of brain
structure. Thalamo-cortical loops?

The problem with that is, language doesn't actually seem to be any
kind of grammar at all.

It's like saying if the brain had to be an internal combustion engine,
it might be a Mazda rotary. BFD. It's not an engine at all.

I don't know if the authors realized that. But surely that's the point
of the HNet paper. That something can generate the general forms of a
grammar, without actually being a grammar.

I guess this goes back to your assertion in our prior thread that
"learning" needs to be constrained by "physical priors" of some kind
(was it?) Are there physical "objects" constraining the "learning", or
does the "learning" vaguely resolve as physical objects, but not
quite?

I don't think vague resemblance to objects means the objects must exist, at all.

Take Kepler and the planets. If the orbits of planets are epicycles,
which epicycles would they be? The trouble is, it turns out they are
not epicycles.

And at least epicycles work! That's the thing for natural language.
Formal grammar doesn't even work. None of them. Nested stacks, context
free, Chomsky hierarchy up, down, and sideways. They don't work. So
figuring out which formal grammar is best, is a pointless exercise.
None of them work.

Yes, broadly human language seems to resolve itself into forms which
resemble formal grammar (it's probably designed to do that, so that it
can usefully represent the world.) And it might be generally useful to
decide which formal grammar it best (vaguely) resembles.

But in detail it turns out human language does not obey the rules of
any formal grammar at all.

It seems to be a bit like the way the output of a TV screen looks like
objects moving around in space. Yes, it looks like objects moving in
space. You might even generate a physics based on the objects which
appear to be there. It might work quite well until you came to Road
Runner cartoons. That doesn't mean the output of a TV screen is
actually objects moving around in space. If you insist on implementing
a TV screen as objects moving around in space, well, it might be a
puppet show similar enough to amuse the kids. But you won't make a TV
screen. You will always fail. And fail in ways very reminiscent of the
way formal grammars almost succeed... but fail, to represent human
language.

Same thing with a movie. Also looks a lot like objects moving around
on a screen. But is it objects moving on a screen? Different again.

Superficial forms do not always equate to mechanisms.

That's what's good about the HNet paper for me. It discusses how those
general forms might emerge from something else.

The history of AI in general, and natural language processing in
particular, has been a search for those elusive "grammars" we see
chasing around on the TV screens of our minds. And they all failed.
What has succeeded has been breaking the world into bits (pixels?) and
allowing them to come together in different ways. Then the game became
how to bring them together. Supervised "learning" spoon fed the
"objects" and bound the pixels together explicitly. Unsupervised
learning tried to resolve "objects" as some kind of similarity between
pixels. AI got a bump when, by surprise, letting the "objects" go
entirely turned out to generate text that was more natural than ever!
Who'd a thunk it? Letting "objects" go entirely works best! If it
hadn't been for the particular circumstances of language, pushing you
to a "prediction" conception of the problem, how long would it have
taken us to stumble on that? The downside to that was, letting
"objects" go entirely also doesn't totally fit with what we
experience. We do experience the world as "objects". And without those
"objects" at all, LLMs are kind of unhinged babblers.

So where's the right balance? Is the solution as LeCun, and perhaps
you, suggest (or Ben, looking for "semantic primitives" two years
ago...), to forget about the success LLMs had by letting go of objects
entirely. To repeat our earlier failures and seek the "objects"
elsewhere. Some other data. Physics? I see the objects, dammit! Look!
There's a coyote, and there's a road runner, and... Oh, my physics
didn't allow for that...

Or could it be the right balance is, yes, to ignore the exact
structure of the objects as LLMs have done, but no, not to do it as
LLMs do by totally ignoring "objects", but to ignore only the internal
structure of the "objects", by focusing on relations defining objects
in ways which allow their internal "pattern" to vary.

That's what I see being presented in the HNet paper. Maybe I'm getting
ahead of its authors. Because 

Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-24 Thread James Bowery
On Thu, May 23, 2024 at 9:19 PM Rob Freeman 
wrote:

> ...(Regarding the HNet paper)
> The ideas of relational category in that paper might really shift the
> needle for current language models.
>
> That as distinct from the older "grammar of mammalian brain capacity"
> paper, which I frankly think is likely a dead end.
>

Quoting the HNet paper:

> We conjecture that ongoing hierarchical construction of such entities
> can enable increasingly “symbol-like” representations, arising from
> lower-level “statistic-like” representations. Figure 9 illustrates
> construction of simple “face” configuration representations, from
> exemplars constructed within the CLEVR system consisting of very simple
> eyes, nose, mouth features. Categories (¢) and sequential relations ($)
> exhibit full compositionality into sequential relations of categories of
> sequential relations, etc.; these define formal grammars (Rodriguez &
> Granger 2016; Granger 2020). Exemplars (a,b) and near misses (c,d) are
> presented, initially yielding just instances, which are then greatly
> reduced via abductive steps (see Supplemental Figure 13).



Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-23 Thread Rob Freeman
James,

Not sure whether all that means you think category theory might be
useful for AI or not.

Anyway, I was moved to post those examples by Rich Hickey and Bartoz
Milewsky in my first post to this thread, by your comment that ideas
of indeterminate categories might annoy what you called 'the risible
tradition of so-called "type theories" in both mathematics and
programming languages'. I see the Hickey and Milewsky refs as examples
of ideas of indeterminate category entering computer programming
theory too.

Whether posted on the basis of a spurious connection or not, thanks
for the Granger HNet paper. That's maybe the most interesting paper
I've seen this year. As I say, it's the only reference I've seen other
than my own presenting the idea that relational categories liberate
category from any given pattern instantiating it. Which I see as
distinct from regression.

The ideas of relational category in that paper might really shift the
needle for current language models.

That as distinct from the older "grammar of mammalian brain capacity"
paper, which I frankly think is likely a dead end.

Real time "energy relaxation" finding new relational categories, as in
the Hamiltonian Net paper, is what I am pushing for. I see current
LLMs as incorporating a lot of that power by accident. But because
they still concentrate on the patterns, and not the relational
generating procedure, they do it only by becoming "large". We need to
understand the (relational) theory behind it in order to jump out of
the current LLM "local minimum".

On Thu, May 23, 2024 at 11:47 PM James Bowery  wrote:
>
>
> On Wed, May 22, 2024 at 10:34 PM Rob Freeman  
> wrote:
>>
>> On Wed, May 22, 2024 at 10:02 PM James Bowery  wrote:
>> > ...
>> > You correctly perceive that the symbolic regression presentation is not to 
>> > the point regarding the HNet paper.  A big failing of the symbolic 
>> > regression world is the same as it is in the rest of computerdom:  Failure 
>> > to recognize that functions are degenerate relations and you had damn well 
>> > better have thought about why you are degenerating when you do so.  But 
>> > likewise, when you are speaking about second-order theories (as opposed to 
>> > first-order theories), such as Category Theory, you had damn well have 
>> > thought about why you are specializing second-order predicate calculus 
>> > when you do so.
>> >
>> > Not being familiar with Category Theory I'm in no position to critique 
>> > this decision to specialize second-order predicate calculus.  I just 
>> > haven't seen Category Theory presented as a second-order theory.  Perhaps 
>> > I could understand Category Theory thence where the enthusiasm for 
>> > Category Theory comes from if someone did so.
>> >
>> > This is very much like my problem with the enthusiasm for type theories in 
>> > general.
>>
>> You seem to have an objection to second order predicate calculus.
>
>
> On the contrary; I see second order predicate calculus as foundational to any 
> attempt to deal with process which, in the classical case, is computation.
>
>> Dismissing category theory because you equate it to that. On what
>> basis do you equate them? Why do you reject second order predicate
>> calculus?
>
>
> I don't "dismiss" category theory.  It's just that I've never seen a category 
> theorist describe it as a second order theory.   Even in type theories 
> covering computation one finds such phenomena as the Wikipedia article on 
> "Type theory as a logic" lacking any reference to "second order".
>
> If I appear to "equate" category theory and second order predicate calculus 
> it is because category theory is a second order theory.  But beyond that, I 
> have an agenda related to Tom Etter's attempt to flesh out his theory of 
> "mind and matter" which I touched on in my first response to this thread 
> about fixing quantum logic.  An aspect of this project is the proof that 
> identity theory belongs to logic in the form of relative identity theory.  My 
> conjecture is that it ends up belonging to second order logic (predicate 
> calculus), which is why I resorted to Isabelle (HOL proof assistant).
>
>> What I like about category theory (as well as quantum formulations) is
>> that I see it as a movement away from definitions in terms of what
>> things are, and towards definitions in terms of how things are
>> related. Which fits with my observations of variation in objects
>> (grammar initially) defying definition, but being accessible to
>> definition in terms of relations.
>
>
> On this we heartily agree.  Why do you think first-order predicate calculus 
> is foundational to Codd's so-called "relational algebra"?  Why do you think 
> that "updates" aka "transactions" aka "atomic actions" are so problematic 
> within that first order theory?
>
>> > But I should also state that my motivation for investigating Granger et 
>> > al's approach to ML is based not the fact that it focuses on abduced 
>> > relations -- but on 

Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-23 Thread Quan Tesla
Rob

Yes, I understand the difference between a video and a paper. I did not
think John criticized it either. If he had, it's helathy to render
critique. You referenced the paper, and it remains relevant to the topic,
regardless of the medium accessed.

Indeed, I was referring to your decoupling point Rob. As you are well
aware, technically, there's a significant difference between a relation and
an association. Relation implies functional dependency, whereas the paper
seems to indicate a method for relating an object to itself, and purely by
abstracted value, comparing it to any other object within its universe.
There are a significant number of inherent patterns right there.

I value your point on position-based categories. However, if these
positions are accurately plotted on a spacetime continuum, then we're
looking at a very interesting approach to enabling category theory (and
like you I'm still learning) with accepted, quantum referencing within
specific, topological structures. Some of us think that's exactly how the
known universe achieves equilibrium.

No, I'm not equating 2nd order predicate calculus, or logic to category
theory. My point on "order" relates to the notion of emerging such from the
calculated values of objects (points) relative to other points, within x
space. This may well offer a useful workaround for getting stuck
in Heisenberg's Uncertainty.  Once the positions of data-bearing objects
are known, in a sense it would negate the need to use globalization and
rather move to specifics. In my view, this would provide an environment in
which deabstraction and optimization would functionally fit into.

Again, I understand your preference for clarity on relationships, but
relationships change. My view would be more to get a valid and reliable fix
on compound relationships (in the sense of associations), where all
possible changes between objects are accommodated within hierarchies of
control. I think the paper made that point, probably using other terms. A
change in a historical relationship wouldn't necessarily have to mean
destroying all associated historical predictions (statistically calculated).
The approach in the paper seemingly allows for rapid reintegration without
loss of any data. I see pure data objects. Probably, because the robustness
of the system wouldn't get compromised if a point pseudo-randomly moved
from one position to another. Agreed, I also like category theory, but
probably for slightly different reasons to you. I'm biased towards
contexts.

There's more to "language" as I put it, than grammar. The paper did not
mention it as such.

Last, you asked:  "How do you relate their relational encoding to
regression?"
This is an excellent question. I don't quite know how they do their
relational encoding. As for regression, if I understand it correctly, it
relies on functional dependencies to emerge most-probable results, as
implied statistically.  What I think is that, with their way of setting up
each object as its own part/entity/element, they could probably relate
objects statistically by the degree of overall and core similarity. If so,
this would enable a fractal-relational principle for data cohesiveness. I
believe cohesiveness has again become all the rage.

To conclude, my excitement at what the paper contains is not for, or
against, any theory. Theories are great. We read and think about them. Heck,
I even have a theory or two. Even so, theory must be tempered by practical
results. I'd put the results in the paper as having significant empirical
value. And exactly for that reason, I could find practical value in it
beyond what was stated. Why not discuss it even further?

I concede that the conversation among you is quite theoretical. Even so,
we see what our eyes see. I see the paradigm shift in specifying a
practical data approach to systematically converge all of engineering
towards quantum fundamentals. I remain an advocate for quantum engineering
methodologies and practices. It not only gives me hope that the road to
purely machine-based AGI applications could be shortened significantly; it
also gives me hope for commercialized products, such as "Engineering on a
chip".



On Thu, May 23, 2024 at 7:27 AM Rob Freeman 
wrote:

> On Thu, May 23, 2024 at 10:10 AM Quan Tesla  wrote:
> >
> > The paper is specific to a novel and quantitative approach and method
> for association in general and specifically.
>
> John was talking about the presentation James linked, not the paper,
> Quan. He may be right that in that presentation they use morphisms etc
> to map learned knowledge from one domain to another.
>
> He's not criticising the paper though. Only the presentation. And the
> two were discussing different techniques. John isn't criticising the
> Granger et al. "relational encoding" paper at all.
>
> > The persistence that pattern should be somehow decoupled doesn't make
> much sense to me. Information itself is as a result of pattern. Pattern is
> 

Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-23 Thread James Bowery
On Wed, May 22, 2024 at 10:34 PM Rob Freeman 
wrote:

> On Wed, May 22, 2024 at 10:02 PM James Bowery  wrote:
> > ...
> > You correctly perceive that the symbolic regression presentation is not
> to the point regarding the HNet paper.  A big failing of the symbolic
> regression world is the same as it is in the rest of computerdom:  Failure
> to recognize that functions are degenerate relations and you had damn well
> better have thought about why you are degenerating when you do so.  But
> likewise, when you are speaking about second-order theories (as opposed to
> first-order theories), such as Category Theory, you had damn well have
> thought about why you are specializing second-order predicate calculus when
> you do so.
> >
> > Not being familiar with Category Theory I'm in no position to critique
> this decision to specialize second-order predicate calculus.  I just
> haven't seen Category Theory presented as a second-order theory.  Perhaps I
> could understand Category Theory thence where the enthusiasm for Category
> Theory comes from if someone did so.
> >
> > This is very much like my problem with the enthusiasm for type theories
> in general.
>
> You seem to have an objection to second order predicate calculus.
>

On the contrary; I see second order predicate calculus as foundational to
any attempt to deal with process which, in the classical case, is
computation.

Dismissing category theory because you equate it to that. On what
> basis do you equate them? Why do you reject second order predicate
> calculus?
>

I don't "dismiss" category theory.  It's just that I've never seen a
category theorist describe it as a second order theory. Even in type
theories covering computation one finds such phenomena as the Wikipedia
article on "Type theory as a logic" lacking any reference to "second order".

If I appear to "equate" category theory and second order predicate calculus
it is because category theory is a second order theory. But beyond
that, I have an agenda related to Tom Etter's attempt to flesh out his
theory of "mind and matter" which I touched on in my first response to this
thread about fixing quantum logic.

An aspect of this project is the proof that identity theory belongs to
logic in the form of relative identity theory. My conjecture is that it
ends up belonging to second order logic (predicate calculus), which is why
I resorted to Isabelle (HOL proof assistant).

What I like about category theory (as well as quantum formulations) is
> that I see it as a movement away from definitions in terms of what
> things are, and towards definitions in terms of how things are
> related. Which fits with my observations of variation in objects
> (grammar initially) defying definition, but being accessible to
> definition in terms of relations.
>

On this we heartily agree.  Why do you think first-order predicate calculus
is foundational to Codd's so-called "relational algebra"?  Why do you think
that "updates" aka "transactions" aka "atomic actions" are so problematic
within that *first* order theory?

> But I should also state that my motivation for investigating Granger et
> al's approach to ML is based not the fact that it focuses on abduced
> relations -- but on its basis in "The grammar of mammalian brain capacity"
> being a neglected order of grammar in the Chomsky Hierarchy: High Order
> Push Down Automata.  The fact that the HNet paper is about abduced
> relations was one of those serendipities that the prospector in me sees as
> a of gold in them thar HOPDAs.
>
> Where does the Granger Hamiltonian net paper mention "The grammar of
> mammalian brain capacity"? If it's not mentioned, how do you think
> they imply it?
>

My apologies for not providing the link to the paper by Granger and
Rodriguez:

https://arxiv.org/abs/1612.01150

> To wrap up, your definition of "regression" seems to differ from mine in
> the sense that, to me, "regression" is synonymous with data-driven modeling
> which is that aspect of learning, including machine learning, concerned
> with what IS as opposed to what OUGHT to be the case.
>
> The only time that paper mentions regression seems to indicate that
> they are also making a distinction between their relational encoding
> and regression:
>
> 'LLMs ... introduce sequential information supplementing the standard
> classification-based “isa” relation, although much of the information
> is learned via regression, and remains difficult to inspect or
> explain'
>
> How do you relate their relational encoding to regression?


Consider the 

Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-22 Thread Rob Freeman
On Wed, May 22, 2024 at 10:02 PM James Bowery  wrote:
> ...
> You correctly perceive that the symbolic regression presentation is not to 
> the point regarding the HNet paper.  A big failing of the symbolic regression 
> world is the same as it is in the rest of computerdom:  Failure to recognize 
> that functions are degenerate relations and you had damn well better have 
> thought about why you are degenerating when you do so.  But likewise, when 
> you are speaking about second-order theories (as opposed to first-order 
> theories), such as Category Theory, you had damn well have thought about why 
> you are specializing second-order predicate calculus when you do so.
>
> Not being familiar with Category Theory I'm in no position to critique this 
> decision to specialize second-order predicate calculus.  I just haven't seen 
> Category Theory presented as a second-order theory.  Perhaps I could 
> understand Category Theory thence where the enthusiasm for Category Theory 
> comes from if someone did so.
>
> This is very much like my problem with the enthusiasm for type theories in 
> general.

You seem to have an objection to second order predicate calculus.
Dismissing category theory because you equate it to that. On what
basis do you equate them? Why do you reject second order predicate
calculus?

What I like about category theory (as well as quantum formulations) is
that I see it as a movement away from definitions in terms of what
things are, and towards definitions in terms of how things are
related. Which fits with my observations of variation in objects
(grammar initially) defying definition, but being accessible to
definition in terms of relations.

> But I should also state that my motivation for investigating Granger et al's 
> approach to ML is based not the fact that it focuses on abduced relations -- 
> but on its basis in "The grammar of mammalian brain capacity" being a 
> neglected order of grammar in the Chomsky Hierarchy: High Order Push Down 
> Automata.  The fact that the HNet paper is about abduced relations was one of 
> those serendipities that the prospector in me sees as a of gold in them thar 
> HOPDAs.

Where does the Granger Hamiltonian net paper mention "The grammar of
mammalian brain capacity"? If it's not mentioned, how do you think
they imply it?

> To wrap up, your definition of "regression" seems to differ from mine in the 
> sense that, to me, "regression" is synonymous with data-driven modeling which 
> is that aspect of learning, including machine learning, concerned with what 
> IS as opposed to what OUGHT to be the case.

The only time that paper mentions regression seems to indicate that
they are also making a distinction between their relational encoding
and regression:

'LLMs ... introduce sequential information supplementing the standard
classification-based “isa” relation, although much of the information
is learned via regression, and remains difficult to inspect or
explain'

How do you relate their relational encoding to regression?

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M2f9210fa34834e5bb8e46d0c
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-22 Thread Rob Freeman
On Thu, May 23, 2024 at 10:10 AM Quan Tesla  wrote:
>
> The paper is specific to a novel and quantitative approach and method for 
> association in general and specifically.

John was talking about the presentation James linked, not the paper,
Quan. He may be right that in that presentation they use morphisms etc
to map learned knowledge from one domain to another.

He's not criticising the paper though. Only the presentation. And the
two were discussing different techniques. John isn't criticising the
Granger et al. "relational encoding" paper at all.

> The persistence that pattern should be somehow decoupled doesn't make much 
> sense to me. Information itself is as a result of pattern. Pattern is 
> everything. Light itself is a pattern, so are the four forces. Ergo.  I 
> suppose, it depends on how you view it.

If you're questioning my point, it is that definition in terms of
relations means the pattern can vary. It's like the gap filler example
in the paper:

"If John kissed Mary, Bill kissed Mary, and Hal kissed Mary, etc.,
then a novel category ¢X can be abduced such that ¢X kissed Mary.
Importantly, the new entity ¢X is not a category based on the features
of the members of the category, let alone the similarity of such
features. I.e., it is not a statistical cluster in any usual sense.
Rather, it is a “position-based category,” signifying entities that
stand in a fixed relation with other entities. John, Bill, Hal may not
resemble each other in any way, other than being entities that all
kissed Mary. Position based categories (PBCs) thus fundamentally
differ from “isa” categories, which can be similarity-based (in
unsupervised systems) or outcome-based (in supervised systems)."

If you define your category on the basis of kissing Mary, then who's
to say that you might not find other people who have kissed Mary, and
change your category from moment to moment. As you discovered clusters
of former lovers by fits and starts, the actual pattern of your
"category" might change dramatically. But it would still be defined by
its defining relation of having kissed Mary.
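To make that concrete, here is a minimal sketch in Python (the facts, the
triple format, and the function name are invented for illustration, not
taken from the HNet paper). The category is defined by the relational slot
it fills, so its membership can grow or change without the definition
changing:

# Sketch of a "position-based category": membership is defined by standing
# in a fixed relation to another entity, not by shared features.
facts = [
    ("John", "kissed", "Mary"),
    ("Bill", "kissed", "Mary"),
    ("Hal",  "kissed", "Mary"),
]

def abduce_pbc(facts, relation, obj):
    """Entities X such that (X, relation, obj) has been observed."""
    return {s for (s, r, o) in facts if r == relation and o == obj}

print(abduce_pbc(facts, "kissed", "Mary"))   # {'John', 'Bill', 'Hal'}

# Discovering a new lover just grows the membership; the defining relation,
# "kissed Mary", stays put.
facts.append(("Sue", "kissed", "Mary"))
print(abduce_pbc(facts, "kissed", "Mary"))   # now includes 'Sue'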

That might also talk to the "regression" distinction. Or
characterizing the system, or indeed all cognition, as "learning"
period. It elides both "similarity-based" unsupervised, and
supervised, "learning". The category can in fact grow as you "learn"
of new lovers. A process which I also have difficulty equating with
regression.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M8c58bf8eb0a279da79ea34eb
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-22 Thread Quan Tesla
The paper is specific to a novel and quantitative approach and method for
association in general and specifically.

 It emerges possible and statistical (most correct)  relationships. This
stands in stark contrast to  the deterministic commitment to construct
functional relationships. Hence, a polymorphic feature is enabled.
Constructors are implied in the design. How else?

 Further, this paper opens the door to computational entanglement and auto
optimization. Mathematically, a control hierarchy could relatively simply
be set at any order of logic. Thus, it has a scalar feature (deabstraction
readiness).

Rather than being classification-dependent, categorization may become fully
enabled. Probably, information and semantics would be contextually enabled
across any number of universes. This paper doesn't venture into all the
implications, which in all fairness justifies scepticism.

The persistence that pattern should be somehow decoupled doesn't make much
sense to me. Information itself is a result of pattern. Pattern is
everything. Light itself is a pattern, and so are the four forces. Ergo, I
suppose, it depends on how you view it.

Here, we have a "language" in which to emerge and initiate any pattern, to
bring form2function2form (circular, yet  progressive chain reactions). I
think it qualifies the design as having the potential to become fully
recursive. We'll have to wait and see.

For now, I'll contend that 'Design' (as pattern application/architectural
principles) remains key.







On Wed, May 22, 2024, 15:01 John Rose  wrote:

> On Tuesday, May 21, 2024, at 10:34 PM, Rob Freeman wrote:
>
> Unless I've missed something in that presentation. Is there anywhere in
> the hour long presentation where they address a decoupling of category from
> pattern, and the implications of this for novelty of structure?
>
>
> I didn’t watch the video but isn’t this just morphisms and functors so you
> can map ML between knowledge domains. Some may need to be fuzzy and the
> best structure I’ve found is Smarandache’s neutrosphic...So a generalized
> intelligence will manage sets of various morphisms across N domains. For
> example, if an AI that knows how to drive a car attempts to build a
> birdhouse it takes a small subset of morphisms between the two but grows
> more towards the birdhouse. As it attempts to build the birdhouse there
> actually may be some morphismic structure that apply to driving a car but
> most will be utilized and grow one way… N morphisms for example epi, mono,
> homo, homeo, endo, auto, zero, etc. and most obvious iso. Another mapping
> from car driving to motorcycle driving would have more utilizable
> morphisms… like steering wheel to handlebars… there is some symmetry
> mapping between group operations but they are not full iso. The pattern
> recognition is morphism recognition and novelty is created from
> mathematical structure manipulation across knowledge domains. This works
> very well when building new molecules since there are tight, almost
> lossless IOW iso morphismic relationships.
>
> *Artificial General Intelligence List *
> / AGI / see discussions  +
> participants  +
> delivery options 
> Permalink
> 
>

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M34b70925a493b96ae5ccdf6f
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-22 Thread James Bowery
On Tue, May 21, 2024 at 9:35 PM Rob Freeman 
wrote:

> 
>
> Whereas the NN presentation is talking about NNs regressing to fixed
> encodings. Not about an operator which "calculates energies" in real
> time.
>
> Unless I've missed something in that presentation. Is there anywhere
> in the hour long presentation where they address a decoupling of
> category from pattern, and the implications of this for novelty of
> structure?
>

You correctly perceive that the symbolic regression presentation is not to
the point regarding the HNet paper.  A big failing of the symbolic
regression world is the same as it is in the rest of computerdom:  Failure
to recognize that functions are degenerate relations and you had damn well
better have thought about why you are degenerating when you do so.  But
likewise, when you are speaking about second-order theories (as opposed to
first-order theories), such as Category Theory, you had damn well better
have thought about why you are *specializing* second-order predicate
calculus when you do so.

Not being familiar with Category Theory I'm in no position to critique this
decision to specialize second-order predicate calculus.  I just haven't
seen Category Theory presented *as* a second-order theory.  Perhaps I could
understand Category Theory thence where the enthusiasm for Category Theory
comes from if someone did so.

This is very much like my problem with the enthusiasm for type theories in
general.

But I should also state that my motivation for investigating Granger et
al's approach to ML is based *not* on the fact that it focuses on abduced
*relations* -- but on its basis in "The grammar of mammalian brain
capacity" being a neglected order of grammar in the Chomsky Hierarchy: High
Order Push Down Automata.  The fact that the HNet paper is about abduced
*relations* was one of those serendipities that the prospector in me sees
as a of gold in them thar HOPDAs.

To wrap up, your definition of "regression" seems to differ from mine in
the sense that, to me, "regression" is synonymous with data-driven modeling
which is that aspect of learning, including machine learning, concerned
with what IS as opposed to what OUGHT to be the case.


>
> On Tue, May 21, 2024 at 11:36 PM James Bowery  wrote:
> >
> > Symbolic Regression is starting to catch on but, as usual, people aren't
> using the Algorithmic Information Criterion so they end up with
> unprincipled choices on the Pareto frontier between residuals and model
> complexity if not unprincipled choices about how to weight the complexity
> of various "nodes" in the model's "expression".
> >
> > https://youtu.be/fk2r8y5TfNY
> >
> > A node's complexity is how much machine language code it takes to
> implement it on a CPU-only implementation.  Error residuals are program
> literals aka "constants".
> >
> > I don't know how many times I'm going to have to point this out to
> people before it gets through to them (probably well beyond the time
> maggots have forgotten what I tasted like) .

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-Mac2ae2959e680fe509d66197
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-22 Thread John Rose
On Tuesday, May 21, 2024, at 10:34 PM, Rob Freeman wrote:
> Unless I've missed something in that presentation. Is there anywhere
in the hour long presentation where they address a decoupling of
category from pattern, and the implications of this for novelty of
structure?

I didn’t watch the video but isn’t this just morphisms and functors, so you can
map ML between knowledge domains? Some may need to be fuzzy, and the best
structure I’ve found is Smarandache’s neutrosophic... So a generalized
intelligence will manage sets of various morphisms across N domains. For
example, if an AI that knows how to drive a car attempts to build a birdhouse,
it takes a small subset of morphisms between the two but grows more towards the
birdhouse. As it attempts to build the birdhouse there actually may be some
morphismic structure that applies to driving a car, but most will be utilized and
grow one way… N morphisms, for example epi, mono, homo, homeo, endo, auto, zero,
etc., and most obviously iso. Another mapping from car driving to motorcycle
driving would have more utilizable morphisms… like steering wheel to
handlebars… there is some symmetry mapping between group operations, but they
are not full iso. The pattern recognition is morphism recognition, and novelty
is created from mathematical structure manipulation across knowledge domains.
This works very well when building new molecules since there are tight, almost
lossless (IOW iso) morphismic relationships.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-Me455a509be8e5e3671c3b5e0
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-21 Thread Quan Tesla
Thanks for sharing this paper.

Positively brilliant! I think this is in line with quantum thinking and
holds great promise for quantum computing. It relates to a concept advanced
by my mentor and myself, namely, gestalt management. Penultimately, we
endeavor to most-correctly represent relativistic, multiversal realities.

This work increases the probability of success, of significant value to my
Po1 theory. The day will come when emergent requirements for locating
"needles in data haystacks" near-instantaneously will place an
unrelenting demand on these types of networks. I think this type of
architecture - when fully matured - would be perfectly suited for that.

On Wed, May 22, 2024 at 6:35 AM Rob Freeman 
wrote:

> James,
>
> The Hamiltonian paper was nice for identifying gap filler tasks as
> decoupling meaning from pattern: "not a category based on the features
> of the members of the category, let alone the similarity of such
> features".
>
> Here, for anyone else:
>
> A logical re-conception of neural networks: Hamiltonian bitwise
> part-whole architecture
> E.F.W.Bowen,1 R.Granger,2* A.Rodriguez
> https://openreview.net/pdf?id=hP4dxXvvNc8
>
> "Part-whole architecture". A new thing. Though they 'share some
> characteristics with “embeddings” in transformer architectures'.
>
> So it's a possible alternate reason for the surprise success of
> transformers. That's good. The field blunders about surprising itself.
> But there's no theory behind it. Transformers just stumbled into
> embedding representations because they looked at language. We need to
> start thinking about why these things work. Instead of just blithely
> talking about the miracle of more data. Disingenuously scaring the
> world with idiotic fears about "more data" becoming conscious by
> accident. Or insisting like LeCun that the secret is different data.
>
> But I think you're missing the point of that Hamiltonian paper if you
> think this decoupling of meaning from pattern is regression. I think
> the point of this, and also the category theoretic representations of
> Symbolica, and also quantum mechanical formalizations, is
> indeterminate symbolization, even novelty.
>
> Yeah, maybe regression will work for some things. But that ain't
> language. And it ain't cognition. They are more aligned with a
> different "New Kind of Science", that touted by Wolfram, new
> structure, all the time. Not regression, going backward, but novelty,
> creativity.
>
> In my understanding the point with the Hamiltonian paper is that a
> "position-based encoding" decouples meaning from any given pattern
> which instantiates it.
>
> Whereas the NN presentation is talking about NNs regressing to fixed
> encodings. Not about an operator which "calculates energies" in real
> time.
>
> Unless I've missed something in that presentation. Is there anywhere
> in the hour long presentation where they address a decoupling of
> category from pattern, and the implications of this for novelty of
> structure?
>
> On Tue, May 21, 2024 at 11:36 PM James Bowery  wrote:
> >
> > Symbolic Regression is starting to catch on but, as usual, people aren't
> using the Algorithmic Information Criterion so they end up with
> unprincipled choices on the Pareto frontier between residuals and model
> complexity if not unprincipled choices about how to weight the complexity
> of various "nodes" in the model's "expression".
> >
> > https://youtu.be/fk2r8y5TfNY
> >
> > A node's complexity is how much machine language code it takes to
> implement it on a CPU-only implementation.  Error residuals are program
> literals aka "constants".
> >
> > I don't know how many times I'm going to have to point this out to
> people before it gets through to them (probably well beyond the time
> maggots have forgotten what I tasted like) .

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M24e8a3387f6852d9e8287be3
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-21 Thread Rob Freeman
James,

The Hamiltonian paper was nice for identifying gap filler tasks as
decoupling meaning from pattern: "not a category based on the features
of the members of the category, let alone the similarity of such
features".

Here, for anyone else:

A logical re-conception of neural networks: Hamiltonian bitwise
part-whole architecture
E.F.W.Bowen,1 R.Granger,2* A.Rodriguez
https://openreview.net/pdf?id=hP4dxXvvNc8

"Part-whole architecture". A new thing. Though they 'share some
characteristics with “embeddings” in transformer architectures'.

So it's a possible alternate reason for the surprise success of
transformers. That's good. The field blunders about surprising itself.
But there's no theory behind it. Transformers just stumbled into
embedding representations because they looked at language. We need to
start thinking about why these things work. Instead of just blithely
talking about the miracle of more data. Disingenuously scaring the
world with idiotic fears about "more data" becoming conscious by
accident. Or insisting like LeCun that the secret is different data.

But I think you're missing the point of that Hamiltonian paper if you
think this decoupling of meaning from pattern is regression. I think
the point of this, and also the category theoretic representations of
Symbolica, and also quantum mechanical formalizations, is
indeterminate symbolization, even novelty.

Yeah, maybe regression will work for some things. But that ain't
language. And it ain't cognition. They are more aligned with a
different "New Kind of Science", that touted by Wolfram, new
structure, all the time. Not regression, going backward, but novelty,
creativity.

In my understanding the point with the Hamiltonian paper is that a
"position-based encoding" decouples meaning from any given pattern
which instantiates it.

Whereas the NN presentation is talking about NNs regressing to fixed
encodings. Not about an operator which "calculates energies" in real
time.

Unless I've missed something in that presentation. Is there anywhere
in the hour long presentation where they address a decoupling of
category from pattern, and the implications of this for novelty of
structure?

On Tue, May 21, 2024 at 11:36 PM James Bowery  wrote:
>
> Symbolic Regression is starting to catch on but, as usual, people aren't 
> using the Algorithmic Information Criterion so they end up with unprincipled 
> choices on the Pareto frontier between residuals and model complexity if not 
> unprincipled choices about how to weight the complexity of various "nodes" in 
> the model's "expression".
>
> https://youtu.be/fk2r8y5TfNY
>
> A node's complexity is how much machine language code it takes to implement 
> it on a CPU-only implementation.  Error residuals are program literals aka 
> "constants".
>
> I don't know how many times I'm going to have to point this out to people 
> before it gets through to them (probably well beyond the time maggots have 
> forgotten what I tasted like) .

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M8418e9bd5e49f7ca08dfb816
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-21 Thread James Bowery
Symbolic Regression is starting to catch on but, as usual, people aren't
using the Algorithmic Information Criterion so they end up with
unprincipled choices on the Pareto frontier between residuals and model
complexity if not unprincipled choices about how to weight the complexity
of various "nodes" in the model's "expression".

https://youtu.be/fk2r8y5TfNY

A node's complexity is how much machine language code it takes to implement
it on a CPU-only implementation.  Error residuals are program literals aka
"constants".

I don't know how many times I'm going to have to point this out to people
before it gets through to them (probably well beyond the time maggots have
forgotten what I tasted like) .

On Mon, May 20, 2024 at 10:23 PM Rob Freeman 
wrote:

> "Importantly, the new entity ¢X is not a category based on the
> features of the members of the category, let alone the similarity of
> such features"
>
> Oh, nice. I hadn't seen anyone else making that point. This paper 2023?
>
> That's what I was saying. Nice. A vindication. Such categories
> decouple the pattern itself from the category.
>
> But I'm astonished they don't cite Coecke, as the obvious quantum
> formulation precedent (though I noticed it for language in the '90s.)
>
> I wonder how their formulation relates to what Symbolica are doing
> with their category theoretic formulations:
>
> https://youtu.be/rie-9AEhYdY?si=9RUB3O_8WeFSU3ni
>
> I haven't read closely enough to know if they make that decoupling of
> category from pattern a sense for "creativity" the way I'm suggesting.
> Perhaps that's because a Hamiltonian formulation is still too trapped
> in symbolism. We need to remain trapped in the symbolism for physics.
> Because for physics we don't have access to an underlying reality.
> That's where AI, and particularly language, has an advantage. Because,
> especially for language, the underlying reality of text is the only
> reality we do have access to (though Chomsky tried to swap that
> around, and insist we only access our cognitive insight.)
>
> For AI, and especially for language, we have the opportunity to get
> under even a quantum formalism. It will be there implicitly, but
> instead of laboriously formulating it, and then collapsing it at run
> time, we can simply "collapse" structure directly from observation.
> But that "collapse" must be flexible, and allow different structures
> to arise from different symmetries found in the data from moment to
> moment. So it requires the abandonment of back-prop.
>
> In theory it is easy though. Everything can remain much as it is for
> LLMs. Only, instead of trying to "learn" stable patterns using
> back-prop, we must "collapse" different symmetries in the data in
> response to a different "prompt", at run time.
>
> On Tue, May 21, 2024 at 5:01 AM James Bowery  wrote:
> >
> > From A logical re-conception of neural networks: Hamiltonian bitwise
> part-whole architecture
> >> From hierarchical statistics to abduced symbols
> >> It is perhaps useful to envision some of the ongoing devel-
> >> opments that are arising from enlarging and elaborating the
> >> Hamiltonian logic net architecture. As yet, no large-scale
> >> training whatsoever has gone into the present minimal HNet
> >> model; thus far it is solely implemented at a small, introduc-
> >> tory scale, as an experimental new approach to representa-
> >> tions. It is conjectured that with large-scale training, hierar-
> >> chical constructs would be accreted as in large deep network
> >> systems, with the key difference that, in HNets, such con-
> >> structs would have relational properties beyond the “isa”
> >> (category) relation, as discussed earlier.
> >> Such relational representations lend themselves to abduc-
> >> tive steps (McDermott 1987) (or “retroductive” (Pierce
> >> 1883)); i.e., inferential generalization steps that go beyond
> >> warranted statistical information. If John kissed Mary, Bill
> >> kissed Mary, and Hal kissed Mary, etc., then a novel cate-
> >> gory ¢X can be abduced such that ¢X kissed Mary.
> >> Importantly, the new entity ¢X is not a category based on
> >> the features of the members of the category, let alone the
> >> similarity of such features. I.e., it is not a statistical cluster
> >> in any usual sense. Rather, it is a “position-based category,”
> >> signifying entities that stand in a fixed relation with other
> >> entities. John, Bill, Hal may not resemble each other in any
> >> way, other than being entities that all kissed Mary. Position-
> >> based categories (PBCs) thus fundamentally differ from
> >> “isa” categories, which can be similarity-based (in unsuper-
> >> vised systems) or outcome-based (in supervised systems).
> >> PBCs share some characteristics with “embeddings” in
> >> transformer architectures.
> >> Abducing a category of this kind often entails overgener-
> >> alization, and subsequent learning may require learned ex-
> >> ceptions to the overgeneralization. (Verb past 

Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-20 Thread Rob Freeman
"Importantly, the new entity ¢X is not a category based on the
features of the members of the category, let alone the similarity of
such features"

Oh, nice. I hadn't seen anyone else making that point. This paper 2023?

That's what I was saying. Nice. A vindication. Such categories
decouple the pattern itself from the category.

But I'm astonished they don't cite Coecke, as the obvious quantum
formulation precedent (though I noticed it for language in the '90s.)

I wonder how their formulation relates to what Symbolica are doing
with their category theoretic formulations:

https://youtu.be/rie-9AEhYdY?si=9RUB3O_8WeFSU3ni

I haven't read closely enough to know if they make that decoupling of
category from pattern a sense for "creativity" the way I'm suggesting.
Perhaps that's because a Hamiltonian formulation is still too trapped
in symbolism. We need to remain trapped in the symbolism for physics.
Because for physics we don't have access to an underlying reality.
That's where AI, and particularly language, has an advantage. Because,
especially for language, the underlying reality of text is the only
reality we do have access to (though Chomsky tried to swap that
around, and insist we only access our cognitive insight.)

For AI, and especially for language, we have the opportunity to get
under even a quantum formalism. It will be there implicitly, but
instead of laboriously formulating it, and then collapsing it at run
time, we can simply "collapse" structure directly from observation.
But that "collapse" must be flexible, and allow different structures
to arise from different symmetries found in the data from moment to
moment. So it requires the abandonment of back-prop.

In theory it is easy though. Everything can remain much as it is for
LLMs. Only, instead of trying to "learn" stable patterns using
back-prop, we must "collapse" different symmetries in the data in
response to a different "prompt", at run time.

On Tue, May 21, 2024 at 5:01 AM James Bowery  wrote:
>
> From A logical re-conception of neural networks: Hamiltonian bitwise 
> part-whole architecture
>
>> From hierarchical statistics to abduced symbols
>> It is perhaps useful to envision some of the ongoing devel-
>> opments that are arising from enlarging and elaborating the
>> Hamiltonian logic net architecture. As yet, no large-scale
>> training whatsoever has gone into the present minimal HNet
>> model; thus far it is solely implemented at a small, introduc-
>> tory scale, as an experimental new approach to representa-
>> tions. It is conjectured that with large-scale training, hierar-
>> chical constructs would be accreted as in large deep network
>> systems, with the key difference that, in HNets, such con-
>> structs would have relational properties beyond the “isa”
>> (category) relation, as discussed earlier.
>> Such relational representations lend themselves to abduc-
>> tive steps (McDermott 1987) (or “retroductive” (Pierce
>> 1883)); i.e., inferential generalization steps that go beyond
>> warranted statistical information. If John kissed Mary, Bill
>> kissed Mary, and Hal kissed Mary, etc., then a novel cate-
>> gory ¢X can be abduced such that ¢X kissed Mary.
>> Importantly, the new entity ¢X is not a category based on
>> the features of the members of the category, let alone the
>> similarity of such features. I.e., it is not a statistical cluster
>> in any usual sense. Rather, it is a “position-based category,”
>> signifying entities that stand in a fixed relation with other
>> entities. John, Bill, Hal may not resemble each other in any
>> way, other than being entities that all kissed Mary. Position-
>> based categories (PBCs) thus fundamentally differ from
>> “isa” categories, which can be similarity-based (in unsuper-
>> vised systems) or outcome-based (in supervised systems).
>> PBCs share some characteristics with “embeddings” in
>> transformer architectures.
>> Abducing a category of this kind often entails overgener-
>> alization, and subsequent learning may require learned ex-
>> ceptions to the overgeneralization. (Verb past tenses typi-
>> cally are formed by appending “-ed”, and a language learner
>> may initially overgeneralize to “runned” and “gived,” neces-
>> sitating subsequent exception learning of “ran” and “gave”.)
>
>
> The abduced "category" ¢X bears some resemblance to the way Currying (as in 
> combinator calculus) binds a parameter of a symbol to define a new symbol.  
> In practice it only makes sense to bother creating this new symbol if it, in 
> concert with all other symbols, compresses the data in evidence.  (As for 
> "overgeneralization", that applies to any error in prediction encountered 
> during learning and, in the ideal compressor, increases the algorithm's 
> length even if only by appending the exceptional data in a conditional -- NOT 
> "falsifying" anything as would that rascal Popper).
>
> This is "related" to quantum-logic in the sense that Tom Etter calls out in 
> the linked 

Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-20 Thread keghnfeem
Tokens inside transformers are supervised internal symbols.
--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M102516027fd65ca8c1f90b8b
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-20 Thread James Bowery
From
*A logical re-conception of neural networks: Hamiltonian bitwise part-whole
architecture*


> *From hierarchical statistics to abduced symbols* It is perhaps useful to
> envision some of the ongoing devel-
> opments that are arising from enlarging and elaborating the
> Hamiltonian logic net architecture. As yet, no large-scale
> training whatsoever has gone into the present minimal HNet
> model; thus far it is solely implemented at a small, introduc-
> tory scale, as an experimental new approach to representa-
> tions. It is conjectured that with large-scale training, hierar-
> chical constructs would be accreted as in large deep network
> systems, with
> *the key difference that, in HNets, such con-structs would have relational
> properties* beyond the “isa”
> (category) relation, as discussed earlier.
> Such relational representations lend themselves to abduc-
> tive steps (McDermott 1987) (or “retroductive” (Pierce
> 1883)); i.e., inferential generalization steps that go beyond
> warranted statistical information. If John kissed Mary, Bill
> kissed Mary, and Hal kissed Mary, etc., then a novel cate-
> gory ¢X can be abduced such that ¢X kissed Mary.
> Importantly, the new entity ¢X is not a category based on
> the features of the members of the category, let alone the
> similarity of such features. I.e., it is not a statistical cluster
> in any usual sense. Rather, it is a “position-based category,”
> signifying entities that stand in a fixed relation with other
> entities. John, Bill, Hal may not resemble each other in any
> way, other than being entities that all kissed Mary. Position-
> based categories (PBCs) thus fundamentally differ from
> “isa” categories, which can be similarity-based (in unsuper-
> vised systems) or outcome-based (in supervised systems).
> PBCs share some characteristics with “embeddings” in
> transformer architectures.
> Abducing a category of this kind often entails overgener-
> alization, and subsequent learning may require learned ex-
> ceptions to the overgeneralization. (Verb past tenses typi-
> cally are formed by appending “-ed”, and a language learner
> may initially overgeneralize to “runned” and “gived,” neces-
> sitating subsequent exception learning of “ran” and “gave”.)


The abduced "category" ¢X bears some resemblance to the way Currying
(as in combinator calculus) binds a parameter of a symbol to define a new
symbol.  In practice it only makes
sense to bother creating this new symbol if it, in concert with all other
symbols, compresses the data in evidence.  (As for "overgeneralization",
that applies to any error in prediction encountered during learning and, in
the ideal compressor, increases the algorithm's length even if only by
appending the exceptional data in a conditional -- *NOT* "falsifying"
anything as would that rascal Popper).
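A rough sketch of that analogy in Python, with an invented corpus and an
invented per-token cost model: bind one argument of the relation to define
the candidate symbol, and keep it only if re-expressing the evidence with
it shortens the total description.

from functools import partial

def kissed(subject, obj):
    return (subject, "kissed", obj)

# Currying: bind one parameter of the relation to define a candidate symbol.
kissed_mary = partial(kissed, obj="Mary")
print(kissed_mary("Sue"))   # ('Sue', 'kissed', 'Mary')

corpus = [kissed("John", "Mary"), kissed("Bill", "Mary"), kissed("Hal", "Mary")]

def cost_without_symbol(corpus, bits_per_token=8):
    return sum(3 * bits_per_token for _ in corpus)        # subject, verb, object

def cost_with_symbol(corpus, bits_per_token=8, definition_bits=24):
    return definition_bits + sum(bits_per_token for _ in corpus)   # subject only

# Keep the abduced symbol only if it compresses the data in evidence.
print(cost_with_symbol(corpus) < cost_without_symbol(corpus))  # True: 48 < 72 bits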

This is "related" to quantum-logic in the sense that Tom Etter calls out in
the linked presentation:

Digram box linking, which is based on the *mathematics of relations
> rather than of functions*, is a more general operation than the
> composition of transition matrices.
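A small sketch of the relations-versus-functions point, in Python with
invented example relations: a function is just the single-valued special
case of a relation, and relational composition generalizes function
composition (and, with weights attached, the composition of transition
matrices).

def compose_relations(r, s):
    """(a, c) is in the composite iff some b has (a, b) in r and (b, c) in s."""
    return {(a, c) for (a, b1) in r for (b2, c) in s if b1 == b2}

f = {("x", 1), ("y", 2)}            # a function, written as a relation
g = {(1, "p"), (1, "q"), (2, "r")}  # a genuine relation: 1 maps to two outputs

print(compose_relations(f, g))      # {('x', 'p'), ('x', 'q'), ('y', 'r')}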


On Thu, May 16, 2024 at 7:24 PM James Bowery  wrote:

> First, fix quantum logic:
>
>
> https://web.archive.org/web/20061030044246/http://www.boundaryinstitute.org/articles/Dynamical_Markov.pdf
>
> Then realize that empirically true cases can occur not only in
> multiplicity (OR), but with structure that includes the simultaneous (AND)
> measurement dimensions of those cases.
>
> But don't tell anyone because it might obviate the risible tradition of
> so-called "type theories" in both mathematics and programming languages
> (including SQL and all those "fuzzy logic" kludges) and people would get
> *really* pissy at you.
>
>
> On Thu, May 16, 2024 at 10:27 AM  wrote:
>
>> What should symbolic approach include to entirely replace neural networks
>> approach in creating true AI? Is that task even possible? What benefits and
>> drawbacks we could expect or hope for if it is possible? If it is not
>> possible, what would be the reasons?
>>
>> Thank you all for your time.
>> *Artificial General Intelligence List *
>> / AGI / see discussions  +
>> participants  +
>> delivery options 
>> Permalink
>> 
>>

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-Ma9215f03be1998269e14f977
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-20 Thread James Bowery
On Mon, May 20, 2024 at 9:49 AM Rob Freeman 
wrote:

> Well, I don't know number theory well, but what axiomatization of
> maths are you basing the predictions in your series on?
>
> I have a hunch the distinction I am making is similar to a distinction
> about the choice of axiomatization. Which will be random. (The
> randomness demonstrated by Goedel's diagonalization lemma? "True" but
> not provable/predictable within the system?)
>

Here's how I tend to think about it:

Solomonoff addressed this "random" choice of axioms by introducing a random
bit string (the axioms of the theory) interpreted as an algorithm (rules of
inference) which, itself, produces another bit string (theorems).

However, this leaves undefined the "rules of inference" which, in my way of
thinking, is like leaving undefined the choice of UTM within Algorithmic
Information Theory.

I've addressed this before in terms of the axioms of arithmetic by saying
that the choice of UTM is no more "random" than is the choice of axioms of
arithmetic which must, itself, incorporate the rules of inference else you
have no theory.

Marcus Hutter has addressed this "philosophical nuisance" in terms of no
post hoc (after observing the dataset) choice of UTM being permitted by the
principles of prediction.

I've further addressed this philosophical nuisance by permitting the
sophist to examine the dataset prior to "choosing the UTM", but restricted
to NiNOR Complexity, which further reduces the argument surface available
to sophists.
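A toy illustration of the UTM-dependence being pinned down here, with both
"machines" invented for the example: the same string gets a very different
shortest-code length depending on which reference interpreter you are
allowed to pick, which is exactly the argument surface the no-post-hoc-UTM
rule (and NiNOR-style restrictions) are meant to shrink.

data = "ababababababab"

def shortest_code_machine_a(s):
    # Invented machine A has a built-in repeat primitive: code = unit + count.
    unit = s[:2]
    if unit and unit * (len(s) // len(unit)) == s:
        return len(unit) + 1
    return len(s)                    # fall back to a literal

def shortest_code_machine_b(s):
    # Invented machine B has only literals.
    return len(s)

print(shortest_code_machine_a(data))  # 3  -- highly compressible under A
print(shortest_code_machine_b(data))  # 14 -- incompressible under B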


> On Mon, May 20, 2024 at 9:09 PM James Bowery  wrote:
> >
> >
> >
> > On Sun, May 19, 2024 at 11:32 PM Rob Freeman 
> wrote:
> >>
> >> James,
> >>
> >> My working definition of "truth" is a pattern that predicts. And I'm
> >> tending away from compression for that.
> >
> >
> > 2, 4, 6, 8
> >
> > does it mean
> > 2n?
> >
> > or does it mean
> > 10?
> >
> >
> >
> >> Related to your sense of "meaning" in (Algorithmic Information)
> >> randomness. But perhaps not quite the same thing.
> >
> >
> > or does it mean a probability distribution of formulae that all produce
> 2, 4, 6, 8 whatever they may subsequently produce?
> >
> > or does it mean a probability distribution of sequences
> > 10, 12?
> > 10, 12, 14?
> > 10, 13, 14?
> > ...

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M1ce471d20cc6a3bfdec9f397
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-20 Thread Rob Freeman
Well, I don't know number theory well, but what axiomatization of
maths are you basing the predictions in your series on?

I have a hunch the distinction I am making is similar to a distinction
about the choice of axiomatization. Which will be random. (The
randomness demonstrated by Goedel's diagonalization lemma? "True" but
not provable/predictable within the system?)

On Mon, May 20, 2024 at 9:09 PM James Bowery  wrote:
>
>
>
> On Sun, May 19, 2024 at 11:32 PM Rob Freeman  
> wrote:
>>
>> James,
>>
>> My working definition of "truth" is a pattern that predicts. And I'm
>> tending away from compression for that.
>
>
> 2, 4, 6, 8
>
> does it mean
> 2n?
>
> or does it mean
> 10?
>
>
>
>> Related to your sense of "meaning" in (Algorithmic Information)
>> randomness. But perhaps not quite the same thing.
>
>
> or does it mean a probability distribution of formulae that all produce 2, 4, 
> 6, 8 whatever they may subsequently produce?
>
> or does it mean a probability distribution of sequences
> 10, 12?
> 10, 12, 14?
> 10, 13, 14?
> ...

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M086013ed4b196bdfe9a874c8
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-20 Thread James Bowery
On Sun, May 19, 2024 at 11:32 PM Rob Freeman 
wrote:

> James,
>
> My working definition of "truth" is a pattern that predicts. And I'm
> tending away from compression for that.
>

2, 4, 6, 8

does it mean
2n?

or does it mean
10?



Related to your sense of "meaning" in (Algorithmic Information)
> randomness. But perhaps not quite the same thing.
>

or does it mean a probability distribution of formulae that all produce 2,
4, 6, 8 whatever they may subsequently produce?

or does it mean a probability distribution of sequences
10, 12?
10, 12, 14?
10, 13, 14?
...



> I want to emphasise a sense in which "meaning" is an expansion of the
> world, not a compression. By expansion I mean more than one,
> contradictory, predictive pattern from a single set of data.
>

I hope you can see from the above questions that we are talking about
probability distributions.  What is the difference between the probability
distribution of algorithms (aka formulae) and the probability distribution
of the strings they generate?
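One way to make those questions concrete, as a toy Python sketch (the
candidate formulae and their complexity scores are invented): weight every
formula that reproduces 2, 4, 6, 8 by 2^-complexity, then marginalize over
formulae to get a predictive distribution over the next term. The
distribution over formulae and the distribution over continuations are
different objects, but the second is induced by the first.

observed = [2, 4, 6, 8]

candidates = [
    # (description, n-th term as a function of n, complexity in bits)
    ("2n",                      lambda n: 2 * n,                          10),
    ("2n up to n=4, then 2n+1", lambda n: 2 * n if n <= 4 else 2 * n + 1, 18),
    ("repeat 2,4,6,8 forever",  lambda n: observed[(n - 1) % 4],          16),
]

weights = {desc: 2.0 ** -bits for desc, _, bits in candidates}   # prior 2^-K
Z = sum(weights.values())

next_term = {}
for desc, fn, _ in candidates:
    nxt = fn(5)                                   # predict the 5th term
    next_term[nxt] = next_term.get(nxt, 0.0) + weights[desc] / Z

print(next_term)   # roughly {10: 0.98, 11: 0.004, 2: 0.015} with these toy weights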


> Note I'm saying a predictive pattern, not a predictable pattern.
> (Perhaps as a random distribution of billiard balls might predict the
> evolution of the table, without being predictable itself?)
>
> There's randomness at the heart of that. Contradictory patterns
> require randomness. A single, predictable, pattern, could not have
> contradictory predictive patterns either? But I see the meaning coming
> from the prediction, not any random pattern that may be making the
> prediction.
>
> Making meaning about prediction, and not any specific pattern itself,
> opens the door to patterns which are meaningful even though new. Which
> can be a sense for creativity.
>
> Anyway, the "creative" aspect of it would explain why LLMs get so big,
> and don't show any interpretable structure.
>
> With a nod to the topic of this thread, it would also explain why
> symbolic systems would never be adequate. It would undermine the idea
> of stable symbols, anyway.
>
> So, not consensus through a single, stable, Algorithmic Information
> most compressed pattern, as I understand you are suggesting (the most
> compressed pattern not knowable anyway?) Though dependent on
> randomness, and consistent with your statement that "truth" should be
> "relative to a given set of observations".
>
> On Sat, May 18, 2024 at 11:57 PM James Bowery  wrote:
> >
> > Rob, the problem I have with things like "type theory" and "category
> theory" is that they almost always elide their foundation in HOL (high
> order logic) which means they don't really admit that they are syntactic
> sugars for second-order predicate calculus.  The reason I describe this as
> "risible" is the same reason I rather insist on the Algorithmic Information
> Criterion for model selection in the natural sciences:
> >
> > Reduce the argument surface that has us all going into hysterics over
> "truth" aka "the science" aka what IS the case as opposed to what OUGHT to
> be the case.
> >
> > Note I said "reduce" rather than "eliminate" the argument surface.  All
> I'm trying to do is get people to recognize that relative to a given set of
> observations the Algorithmic Information Criterion is the best operational
> definition of the truth.
> >
> > It's really hard for people to take even this baby step toward standing
> down from killing each other in a rhyme with The Thirty Years War, given
> that social policy is so centralized that everyone must become a de facto
> theocratic supremacist as a matter of self defence.  It's really obvious
> that the trend is toward capturing us in a control system, e.g. a
> Valley-Girl flirtation friendly interface to Silicon Chutulu that can only
> be fought at the physical level such as sniper bullets through the cooling
> systems of data centers.  This would probably take down civilization itself
> given the over-emphasis on efficiency vs resilience in civilization's
> dependence on information systems infrastructure.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-Me2c000d7572de5b0a5769775
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-20 Thread John Rose
On Saturday, May 18, 2024, at 6:53 PM, Matt Mahoney wrote:
> Surely you are aware of the 100% failure rate of symbolic AI over the last 70 
> years? It should work in theory, but we have a long history of 
> underestimating the cost, lured by the early false success of covering half 
> of the cases with just a few hundred rules.
> 

I view LLMs as systems within symbolic systems. Why? Simply that we exist in a
spacetime environment and ALL COMMUNICATION is symbolic. And sub-symbolic
representation is required for computation. All bits are symbols based on
probabilities. Then, as LLMs become more intelligent, the physical power
consumption required to produce similar results will decrease as their symbolic
networks grow and optimize.

Could be wrong, but it makes sense to me… saying everything is symbolic
eliminates the argument. I know it's lazy, but that's often how developers look
at things in order to code them up :) Laziness is a form of optimization...

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M2252941b1c7cca5b59b32c1f
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-19 Thread Rob Freeman
James,

My working definition of "truth" is a pattern that predicts. And I'm
tending away from compression for that.

Related to your sense of "meaning" in (Algorithmic Information)
randomness. But perhaps not quite the same thing.

I want to emphasise a sense in which "meaning" is an expansion of the
world, not a compression. By expansion I mean more than one,
contradictory, predictive pattern from a single set of data.

Note I'm saying a predictive pattern, not a predictable pattern.
(Perhaps as a random distribution of billiard balls might predict the
evolution of the table, without being predictable itself?)

There's randomness at the heart of that. Contradictory patterns
require randomness. A single, predictable pattern could not have
contradictory predictive patterns either? But I see the meaning coming
from the prediction, not any random pattern that may be making the
prediction.

Making meaning about prediction, and not any specific pattern itself,
opens the door to patterns which are meaningful even though new. Which
can be a sense for creativity.

Anyway, the "creative" aspect of it would explain why LLMs get so big,
and don't show any interpretable structure.

With a nod to the topic of this thread, it would also explain why
symbolic systems would never be adequate. It would undermine the idea
of stable symbols, anyway.

So, not consensus through a single, stable, Algorithmic Information
most compressed pattern, as I understand you are suggesting (the most
compressed pattern not knowable anyway?) Though dependent on
randomness, and consistent with your statement that "truth" should be
"relative to a given set of observations".

On Sat, May 18, 2024 at 11:57 PM James Bowery  wrote:
>
> Rob, the problem I have with things like "type theory" and "category theory" 
> is that they almost always elide their foundation in HOL (high order logic) 
> which means they don't really admit that they are syntactic sugars for 
> second-order predicate calculus.  The reason I describe this as "risible" is 
> the same reason I rather insist on the Algorithmic Information Criterion for 
> model selection in the natural sciences:
>
> Reduce the argument surface that has us all going into hysterics over "truth" 
> aka "the science" aka what IS the case as opposed to what OUGHT to be the 
> case.
>
> Note I said "reduce" rather than "eliminate" the argument surface.  All I'm 
> trying to do is get people to recognize that relative to a given set of 
> observations the Algorithmic Information Criterion is the best operational 
> definition of the truth.
>
> It's really hard for people to take even this baby step toward standing down 
> from killing each other in a rhyme with The Thirty Years War, given that 
> social policy is so centralized that everyone must become a de facto 
> theocratic supremacist as a matter of self defence.  It's really obvious that 
> the trend is toward capturing us in a control system, e.g. a Valley-Girl 
> flirtation friendly interface to Silicon Chutulu that can only be fought at 
> the physical level such as sniper bullets through the cooling systems of data 
> centers.  This would probably take down civilization itself given the 
> over-emphasis on efficiency vs resilience in civilization's dependence on 
> information systems infrastructure.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M8a84fef3037323602ea7dcca
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-18 Thread Quan Tesla
It's not about who wins the battle of models, but rather whether the models
employed would theoretically (symbolically) be a true representation of an
AGI with potential for ASI.

I think that LLMs on their own simply won't hack it. You may be satisfied
with the tradeoffs in commercialized value, but there are researchers who
are capable of predicting the quantum limits of current "solutions".

The profiteers profit while old-school scientists slog away, ever waiting
to pounce on their insights. Human behaviour, as self-interested as it is,
suffers from the Icarus complex.

In your argument, most likely you'll have to aggregate all those costs,
placing most-correct AGI beyond the reach of the lifetimes of all the
players of the day.

Where does it leave this symbol of human ambition? In a negative value, in
an infinite loop.

Simply because we have proven, and persist in, a model which shows that we
have no respect for - or perhaps insufficient understanding of - the cosmic
perfection in the conservation of energy.

No one model will do, not unless, in its holism, it would generate a net
positive value. The number of that value would always be equivalent to 1.

Scientists are beginning to understand the simplicity of this thought. It's
about pattern languages, not brute force.

Is an LLM a pattern language? If so, is it sufficient to express all
aspects of a "language" for describing, specifying and managing AGI
evolution holistically?

If not, what is lacking and how can it be realized?

I think nature is pragmatic. It adds and it subtracts. If AGI is a symbol
of a natural system, then do the sum.

On Sun, May 19, 2024, 02:54 Matt Mahoney  wrote:

> On Thu, May 16, 2024, 11:27 AM  wrote:
>
>> What should symbolic approach include to entirely replace neural networks
>> approach in creating true AI? Is that task even possible? What benefits and
>> drawbacks we could expect or hope for if it is possible? If it is not
>> possible, what would be the reasons?
>>
>
> Surely you are aware of the 100% failure rate of symbolic AI over the last
> 70 years? It should work in theory, but we have a long history of
> underestimating the cost, lured by the early false success of covering half
> of the cases with just a few hundred rules.
>
> A human level language model is 10^9 bits, equivalent to 60M lines of code
> according to my compression tests, which yield 16 bits per line. A line of
> code costs $100, so your development cost is $6 billion, far beyond the
> budgets of the most ambitious attempts like Cyc or OpenCog.
>
> Or you can train a LLM with 100 to 1000 times as much knowledge for a few
> million at $2 per GPU hour.
>
>
>

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M892519d3918783ea7007180d
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-18 Thread Matt Mahoney
On Thu, May 16, 2024, 11:27 AM  wrote:

> What should symbolic approach include to entirely replace neural networks
> approach in creating true AI? Is that task even possible? What benefits and
> drawbacks we could expect or hope for if it is possible? If it is not
> possible, what would be the reasons?
>

Surely you are aware of the 100% failure rate of symbolic AI over the last
70 years? It should work in theory, but we have a long history of
underestimating the cost, lured by the early false success of covering half
of the cases with just a few hundred rules.

A human level language model is 10^9 bits, equivalent to 60M lines of code
according to my compression tests, which yield 16 bits per line. A line of
code costs $100, so your development cost is $6 billion, far beyond the
budgets of the most ambitious attempts like Cyc or OpenCog.

Or you can train a LLM with 100 to 1000 times as much knowledge for a few
million at $2 per GPU hour.
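
For what it's worth, the arithmetic behind that comparison can be
spelled out (a rough sketch; only the figures quoted above are from the
post, the GPU-hour count is an assumed order of magnitude):

# Rough arithmetic for the cost comparison; the 10^9 bits, 16 bits/line,
# $100/line and $2/GPU-hour figures come from the post, the GPU-hour
# count is an assumed order of magnitude.
model_bits = 1e9
bits_per_line = 16
cost_per_line = 100

lines_of_code = model_bits / bits_per_line          # ~62.5 million lines
handwritten_cost = lines_of_code * cost_per_line    # ~$6.25 billion

gpu_hours = 1_000_000                               # assumption
training_cost = gpu_hours * 2                       # a few million dollars

print(f"{lines_of_code:,.0f} lines -> ${handwritten_cost:,.0f} hand-written "
      f"vs ${training_cost:,.0f} trained")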

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M5d7336a46b79663a410d119c
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-18 Thread James Bowery
Rob, the problem I have with things like "type theory" and "category
theory" is that they almost always elide their foundation in HOL (high
order logic) which means they don't *really* admit that they are syntactic
sugars for second-order predicate calculus.  The reason I describe this as
"risible" is the same reason I rather insist on the Algorithmic Information
Criterion for model selection in the natural sciences:

Reduce the argument surface that has us all going into hysterics over
"truth" aka "the science" aka what IS the case as opposed to what OUGHT to
be the case.

Note I said "reduce" rather than "eliminate" the argument surface.  All I'm
trying to do is get people to recognize that *relative to a given set of
observations* the Algorithmic Information Criterion is the best operational
definition of the truth.
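
One way to read this operationally is as two-part code-length (MDL)
model selection, a computable stand-in for the Algorithmic Information
Criterion. A toy sketch, with invented models and data, not anything
from the post:

import math

data = "ababababababababab"   # the "given set of observations", invented

def bits_literal(s):
    # Model 1: no structure; spell out each of the two symbols at 1 bit each.
    return len(s)

def bits_repeat(s):
    # Model 2: "repeat the unit 'ab' n times"; pay 1 bit per unit symbol
    # plus log2 bits for the count, but only if the model actually fits.
    unit = "ab"
    n = len(s) // len(unit)
    if s != unit * n:
        return float("inf")
    return len(unit) + math.ceil(math.log2(n + 1))

candidates = {"literal": bits_literal(data), "repeat-ab": bits_repeat(data)}
best = min(candidates, key=candidates.get)
print(candidates, "->", best)
# The shortest total description wins, i.e. is "true" relative to these
# observations; a different set of observations could pick differently.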

It's really hard for people to take even this *baby* step toward standing
down from killing each other in a rhyme with The Thirty Years War, given
that social policy is so centralized that everyone must become a de facto
theocratic supremacist as a matter of self defence.  It's really obvious
that the trend is toward capturing us in a control system, e.g. a
Valley-Girl flirtation friendly interface to Silicon Chutulu that can only
be fought at the physical level such as sniper bullets through the cooling
systems of data centers.  This would probably take down civilization itself
given the over-emphasis on efficiency vs resilience in civilization's
dependence on information systems infrastructure.

On Thu, May 16, 2024 at 10:36 PM Rob Freeman 
wrote:

> James,
>
> For relevance to type theories in programming I like Bartosz
> Milewski's take on it here. An entire lecture series, but the part
> that resonates with me is in the introductory lecture:
>
> "maybe composability is not a property of nature"
>
> Cued up here:
>
> Category Theory 1.1: Motivation and Philosophy
> Bartosz Milewski
> https://youtu.be/I8LbkfSSR58?si=nAPc1f0unpj8i2JT=2734
>
> Also Rich Hickey, the creator of Clojure language, had some nice
> interpretations in some of his lectures, where he argued for the
> advantages of functional languages over object oriented languages.
> Basically because, in my interpretation, the "objects" can only ever
> be partially "true".
>
> Maybe summarized well here:
>
> https://twobithistory.org/2019/01/31/simula.html
>
> Or here:
>
>
> https://www.flyingmachinestudios.com/programming/the-unofficial-guide-to-rich-hickeys-brain/
>
> Anyway, the code guys are starting to notice it too.
>
> -Rob
>
> On Fri, May 17, 2024 at 7:25 AM James Bowery  wrote:
> >
> > First, fix quantum logic:
> >
> >
> https://web.archive.org/web/20061030044246/http://www.boundaryinstitute.org/articles/Dynamical_Markov.pdf
> >
> > Then realize that empirically true cases can occur not only in
> multiplicity (OR), but with structure that includes the simultaneous (AND)
> measurement dimensions of those cases.
> >
> > But don't tell anyone because it might obviate the risible tradition of
> so-called "type theories" in both mathematics and programming languages
> (including SQL and all those "fuzzy logic" kludges) and people would get
> really pissy at you.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M2f546f083c9091e4e39fabc8
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-17 Thread Nanograte Knowledge Technologies
Mostly agreed, but it depends on your definition of NN. NN is equivalent to 
mutation (supposed to be). If we applied it in that sense, then NN could 
support other schemas of mutation, not diminish in functional value. 
Ultimately, I think we're heading towards a biochemical model for AGI, even if 
it is a synthetic one.

Synthetic means not naturally made. It doesn't mean that a synthetic machine 
cannot function as a fully-recursive machine which demonstrates, of its own 
intuition, an ability to make conscious decisions of the highest order.

The concern with AGI has often been in the region of autonomous decision 
making. Who would predict exactly which moral, or 
strategic-tactical-operational, or "necessary" decision a powerful, autonomous 
machine could come to?

Which tribe would it conclude it belonged to and where would it position its 
sense of fealty? Would it be as fickle as humans on belonging and issues of 
loyalty to greater society? Altruism, would it get it? Would it develop a good 
and bad inclination, and structure society to favor either one of those 
"instincts" it may deem most logically indicated?


Mostly, would it be inclined towards "criminal" behavior, or even "terrorism" 
by any name? And if it decided to turn to rage in a relationship, would it feel 
justified in overpowering a weaker sex?

In that sense, success! We would have duplicated the complications of humanity!

From: John Rose 
Sent: Friday, 17 May 2024 13:48
To: AGI 
Subject: Re: [agi] Can symbolic approach entirely replace NN approach?

On Thursday, May 16, 2024, at 11:26 AM, ivan.moony wrote:
What should symbolic approach include to entirely replace neural networks 
approach in creating true AI?

Symbology will compress NN monstrosities… right?  Or, should we say, increasing 
efficiency via emerging symbolic activity for complexity reduction. Then less 
NN will be required, since the “intelligence” will have been formed. But we 
still need sensory…

There is much room for innovation in mathematics… some of us have been working 
on that for a while.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M844f85d23b2020dafbaecc77
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-17 Thread John Rose
On Thursday, May 16, 2024, at 11:26 AM, ivan.moony wrote:
> What should symbolic approach include to entirely replace neural networks 
> approach in creating true AI?

Symbology will compress NN monstrosities… right?  Or, should we say, increasing 
efficiency via emerging symbolic activity for complexity reduction. Then less 
NN will be required, since the “intelligence” will have been formed. But we 
still need sensory…

There is much room for innovation in mathematics… some of us have been working 
on that for a while.
--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M5b45da5fff085a720d8ea765
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-16 Thread Rob Freeman
James,

For relevance to type theories in programming I like Bartosz
Milewski's take on it here. An entire lecture series, but the part
that resonates with me is in the introductory lecture:

"maybe composability is not a property of nature"

Cued up here:

Category Theory 1.1: Motivation and Philosophy
Bartosz Milewski
https://youtu.be/I8LbkfSSR58?si=nAPc1f0unpj8i2JT=2734

Also Rich Hickey, the creator of Clojure language, had some nice
interpretations in some of his lectures, where he argued for the
advantages of functional languages over object oriented languages.
Basically because, in my interpretation, the "objects" can only ever
be partially "true".

Maybe summarized well here:

https://twobithistory.org/2019/01/31/simula.html

Or here:

https://www.flyingmachinestudios.com/programming/the-unofficial-guide-to-rich-hickeys-brain/

Anyway, the code guys are starting to notice it too.

-Rob

On Fri, May 17, 2024 at 7:25 AM James Bowery  wrote:
>
> First, fix quantum logic:
>
> https://web.archive.org/web/20061030044246/http://www.boundaryinstitute.org/articles/Dynamical_Markov.pdf
>
> Then realize that empirically true cases can occur not only in multiplicity 
> (OR), but with structure that includes the simultaneous (AND) measurement 
> dimensions of those cases.
>
> But don't tell anyone because it might obviate the risible tradition of 
> so-called "type theories" in both mathematics and programming languages 
> (including SQL and all those "fuzzy logic" kludges) and people would get 
> really pissy at you.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-Mea3f554271a532a282d58fa0
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-16 Thread Mike Archbold
Historically the AGI community has not really embraced neural networks --
and the cost has been that the AI explosion has come from the mainstream
more or less.

On Thu, May 16, 2024 at 7:01 PM Quan Tesla  wrote:

> Without neural networks, a symbolic approach wouldn't be effective. My
> view is that, depending on the definition of what "symbolic approach" means
> in the context of AGI, in the least both such operational schemas would be
> required to achieve the level of systems abstraction that would satisfy a
> scientifically-sound (transferable) form of human intelligence. By
> implication, they would have to be seamlessly integrated. Anyone here
> working on such an integration?
>
> On Thu, May 16, 2024 at 7:27 PM  wrote:
>
>> What should symbolic approach include to entirely replace neural networks
>> approach in creating true AI? Is that task even possible? What benefits and
>> drawbacks we could expect or hope for if it is possible? If it is not
>> possible, what would be the reasons?
>>
>> Thank you all for your time.
>>
>

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M91559e2546f956afaa896d8e
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-16 Thread Quan Tesla
Without neural networks, a symbolic approach wouldn't be effective. My view
is that, depending on the definition of what "symbolic approach" means in
the context of AGI, in the least both such operational schemas would be
required to achieve the level of systems abstraction that would satisfy a
scientifically-sound (transferable) form of human intelligence. By
implication, they would have to be seamlessly integrated. Anyone here
working on such an integration?

On Thu, May 16, 2024 at 7:27 PM  wrote:

> What should symbolic approach include to entirely replace neural networks
> approach in creating true AI? Is that task even possible? What benefits and
> drawbacks we could expect or hope for if it is possible? If it is not
> possible, what would be the reasons?
>
> Thank you all for your time.
>

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M2b52d7820d191a6fa0078f55
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-16 Thread James Bowery
First, fix quantum logic:

https://web.archive.org/web/20061030044246/http://www.boundaryinstitute.org/articles/Dynamical_Markov.pdf

Then realize that empirically true cases can occur not only in multiplicity
(OR), but with structure that includes the simultaneous (AND) measurement
dimensions of those cases.

But don't tell anyone because it might obviate the risible tradition of
so-called "type theories" in both mathematics and programming languages
(including SQL and all those "fuzzy logic" kludges) and people would get
*really* pissy at you.


On Thu, May 16, 2024 at 10:27 AM  wrote:

> What should symbolic approach include to entirely replace neural networks
> approach in creating true AI? Is that task even possible? What benefits and
> drawbacks we could expect or hope for if it is possible? If it is not
> possible, what would be the reasons?
>
> Thank you all for your time.
>

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M4e5f58df19d779da625ab70e
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-16 Thread Mike Archbold
It seems like most approaches, including symbolic-only, could eventually lead
to "true AI" if you mean ~ passing the Turing test, but it might take 100
years. There is a race-to-the-finish aspect to AI though.

On Thu, May 16, 2024 at 8:27 AM  wrote:

> What should symbolic approach include to entirely replace neural networks
> approach in creating true AI? Is that task even possible? What benefits and
> drawbacks we could expect or hope for if it is possible? If it is not
> possible, what would be the reasons?
>
> Thank you all for your time.
>

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-Mf21b8aa56f755a2e5104c181
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-16 Thread Basile Starynkevitch


On 5/16/24 17:26, ivan.mo...@gmail.com wrote:
What should symbolic approach include to entirely replace neural 
networks approach in creating true AI? Is that task even possible? 
What benefits and drawbacks we could expect or hope for if it is 
possible? If it is not possible, what would be the reasons?



Expert system rules.

Generation of code (in C++ or machine code) through declarative 
rules, including metarules generating rules and code.
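
(A minimal, hypothetical sketch of that idea, not RefPerSys code: a
forward-chaining rule loop in which one rule's action generates a
further rule. Rule names, facts and the engine are invented.)

facts = {"input:greeting"}
rules = {}   # name -> (condition fact, action)

def add_rule(name, cond, action):
    rules[name] = (cond, action)

# Ordinary rule: a greeting triggers a reply fact.
add_rule("greet->reply", "input:greeting", lambda: facts.add("output:reply"))

# Metarule: once a reply exists, generate a brand-new rule for questions.
def generate_question_rule():
    add_rule("question->answer", "input:question",
             lambda: facts.add("output:answer"))
add_rule("reply->new-rule", "output:reply", generate_question_rule)

def run():
    # Naive forward chaining: fire rules until no new facts or rules appear.
    changed = True
    while changed:
        before = (len(facts), len(rules))
        for cond, action in list(rules.values()):
            if cond in facts:
                action()
        changed = (len(facts), len(rules)) != before

run()
facts.add("input:question")
run()
print(sorted(facts))   # the rule created by the metarule has fired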


Runtime Reflection by inspection of call stacks (e.g. using libbacktrace).

The RefPerSys  project (see http://refpersys.org/ 
and open source code on https://github.com/RefPerSys/RefPerSys ...) is 
developed with these ideas.


Email me for details.


Regards from near Paris in France


--
Basile Starynkevitch
(only my opinions / les opinions sont miennes uniquement)
8 rue de la Faïencerie, 92340 Bourg-la-Reine, France
web page: starynkevitch.net/Basile/
See/voir: https://github.com/RefPerSys/RefPerSys

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M911b1bf07aaf1f24f0aaefc1
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-16 Thread Jim Rutt
Seems unlikely as the first approach.  ANNs help us bridge things we have
little understanding of via brute force and lots of data.

Perhaps AFTER we get to ASI the ASI can figure out how to recode itself
symbolically, at huge gain (likely) in performance.

On Thu, May 16, 2024 at 11:27 AM  wrote:

> What should symbolic approach include to entirely replace neural networks
> approach in creating true AI? Is that task even possible? What benefits and
> drawbacks we could expect or hope for if it is possible? If it is not
> possible, what would be the reasons?
>
> Thank you all for your time.
>


-- 
Jim Rutt
My podcast: https://www.jimruttshow.com/

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-Ma7dbfbf1e4ae324a5c8a9ed1
Delivery options: https://agi.topicbox.com/groups/agi/subscription


[agi] Can symbolic approach entirely replace NN approach?

2024-05-16 Thread ivan . moony
What should symbolic approach include to entirely replace neural networks 
approach in creating true AI? Is that task even possible? What benefits and 
drawbacks we could expect or hope for if it is possible? If it is not possible, 
what would be the reasons?

Thank you all for your time.
--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-Mf9d9e99b7d5517ff12239b07
Delivery options: https://agi.topicbox.com/groups/agi/subscription