Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-20 Thread Rob Freeman
"Importantly, the new entity ¢X is not a category based on the
features of the members of the category, let alone the similarity of
such features"

Oh, nice. I hadn't seen anyone else making that point. Is this paper from 2023?

That's what I was saying. Nice. A vindication. Such categories
decouple the pattern itself from the category.

But I'm astonished they don't cite Coecke as the obvious precedent for a
quantum formulation (though I noticed it for language in the '90s).

I wonder how their formulation relates to what Symbolica are doing
with their category theoretic formulations:

https://youtu.be/rie-9AEhYdY?si=9RUB3O_8WeFSU3ni

I haven't read closely enough to know whether they turn that decoupling
of category from pattern into a basis for "creativity" the way I'm
suggesting. Perhaps that's because a Hamiltonian formulation is still
too trapped in symbolism. For physics we have to stay within the
symbolism, because we have no access to an underlying reality. That's
where AI, and particularly language, has an advantage: especially for
language, the underlying reality of text is the only reality we do have
access to (though Chomsky tried to swap that around and insist we only
access our cognitive insight).

For AI, and especially for language, we have the opportunity to get
under even a quantum formalism. It will be there implicitly, but
instead of laboriously formulating it, and then collapsing it at run
time, we can simply "collapse" structure directly from observation.
But that "collapse" must be flexible, and allow different structures
to arise from different symmetries found in the data from moment to
moment. So it requires the abandonment of back-prop.

In theory it is easy though. Everything can remain much as it is for
LLMs. Only, instead of trying to "learn" stable patterns using
back-prop, we must "collapse" different symmetries in the data in
response to a different "prompt", at run time.
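
Here's a toy sketch of the kind of thing I mean (purely illustrative:
the micro-corpus and the shared-context rule are invented for the
example, not a worked proposal). Nothing is trained or stored in
advance; a category is formed on demand from whatever symmetry the
prompt picks out in the raw text:

corpus = [
    "john kissed mary", "bill kissed mary", "hal kissed mary",
    "mary kissed john", "john ran home", "bill ran home",
]

def contexts(word):
    """All (left, right) neighbour pairs of `word` across the corpus."""
    out = set()
    for sentence in corpus:
        toks = ["<s>"] + sentence.split() + ["</s>"]
        out |= {(toks[i - 1], toks[i + 1])
                for i, t in enumerate(toks) if t == word}
    return out

def collapse(prompt_word):
    """Run-time category: every word sharing a context with the prompt."""
    target = contexts(prompt_word)
    vocab = {w for s in corpus for w in s.split()}
    return {w for w in vocab if w != prompt_word and contexts(w) & target}

print(collapse("john"))   # {'bill', 'hal', 'mary'}
print(collapse("mary"))   # {'john', 'bill', 'hal'} -- a different category

Different prompts "collapse" different groupings out of the same
observations, which is exactly the flexibility a single pattern learned
by back-prop doesn't give you.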

On Tue, May 21, 2024 at 5:01 AM James Bowery  wrote:
>
> From A logical re-conception of neural networks: Hamiltonian bitwise 
> part-whole architecture
>
>> From hierarchical statistics to abduced symbols
>> It is perhaps useful to envision some of the ongoing developments that
>> are arising from enlarging and elaborating the Hamiltonian logic net
>> architecture. As yet, no large-scale training whatsoever has gone into
>> the present minimal HNet model; thus far it is solely implemented at a
>> small, introductory scale, as an experimental new approach to
>> representations. It is conjectured that with large-scale training,
>> hierarchical constructs would be accreted as in large deep network
>> systems, with the key difference that, in HNets, such constructs would
>> have relational properties beyond the “isa” (category) relation, as
>> discussed earlier.
>> Such relational representations lend themselves to abductive steps
>> (McDermott 1987) (or “retroductive” (Peirce 1883)); i.e., inferential
>> generalization steps that go beyond warranted statistical information.
>> If John kissed Mary, Bill kissed Mary, and Hal kissed Mary, etc., then
>> a novel category ¢X can be abduced such that ¢X kissed Mary.
>> Importantly, the new entity ¢X is not a category based on the features
>> of the members of the category, let alone the similarity of such
>> features. I.e., it is not a statistical cluster in any usual sense.
>> Rather, it is a “position-based category,” signifying entities that
>> stand in a fixed relation with other entities. John, Bill, Hal may not
>> resemble each other in any way, other than being entities that all
>> kissed Mary. Position-based categories (PBCs) thus fundamentally
>> differ from “isa” categories, which can be similarity-based (in
>> unsupervised systems) or outcome-based (in supervised systems). PBCs
>> share some characteristics with “embeddings” in transformer
>> architectures.
>> Abducing a category of this kind often entails overgeneralization,
>> and subsequent learning may require learned exceptions to the
>> overgeneralization. (Verb past tenses typically are formed by
>> appending “-ed”, and a language learner may initially overgeneralize
>> to “runned” and “gived,” necessitating subsequent exception learning
>> of “ran” and “gave”.)
>
>
> The abduced "category" ¢X bears some resemblance to the way Currying (as in 
> combinator calculus) binds a parameter of a symbol to define a new symbol.  
> In practice it only makes sense to bother creating this new symbol if it, in 
> concert with all other symbols, compresses the data in evidence.  (As for 
> "overgeneralization", that applies to any error in prediction encountered 
> during learning and, in the ideal compressor, increases the algorithm's 
> length even if only by appending the exceptional data in a conditional -- NOT 
> "falsifying" anything as would that rascal Popper).
>
> This is "related" to quantum-logic in the sense that Tom Etter calls out in 
> the linked 

Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-20 Thread keghnfeem
Tokens inside transformers are supervised internal symbols.
--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M102516027fd65ca8c1f90b8b
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-20 Thread James Bowery
From
*A logical re-conception of neural networks: Hamiltonian bitwise part-whole
architecture*


> *From hierarchical statistics to abduced symbols*
> It is perhaps useful to envision some of the ongoing developments that
> are arising from enlarging and elaborating the Hamiltonian logic net
> architecture. As yet, no large-scale training whatsoever has gone into
> the present minimal HNet model; thus far it is solely implemented at a
> small, introductory scale, as an experimental new approach to
> representations. It is conjectured that with large-scale training,
> hierarchical constructs would be accreted as in large deep network
> systems, with *the key difference that, in HNets, such constructs
> would have relational properties* beyond the “isa” (category)
> relation, as discussed earlier.
> Such relational representations lend themselves to abductive steps
> (McDermott 1987) (or “retroductive” (Peirce 1883)); i.e., inferential
> generalization steps that go beyond warranted statistical information.
> If John kissed Mary, Bill kissed Mary, and Hal kissed Mary, etc., then
> a novel category ¢X can be abduced such that ¢X kissed Mary.
> Importantly, the new entity ¢X is not a category based on the features
> of the members of the category, let alone the similarity of such
> features. I.e., it is not a statistical cluster in any usual sense.
> Rather, it is a “position-based category,” signifying entities that
> stand in a fixed relation with other entities. John, Bill, Hal may not
> resemble each other in any way, other than being entities that all
> kissed Mary. Position-based categories (PBCs) thus fundamentally
> differ from “isa” categories, which can be similarity-based (in
> unsupervised systems) or outcome-based (in supervised systems). PBCs
> share some characteristics with “embeddings” in transformer
> architectures.
> Abducing a category of this kind often entails overgeneralization,
> and subsequent learning may require learned exceptions to the
> overgeneralization. (Verb past tenses typically are formed by
> appending “-ed”, and a language learner may initially overgeneralize
> to “runned” and “gived,” necessitating subsequent exception learning
> of “ran” and “gave”.)


The abduced "category" ¢X bears some resemblance to the way Currying
(as in combinator calculus) binds a
parameter of a symbol to define a new symbol.  In practice it only makes
sense to bother creating this new symbol if it, in concert with all other
symbols, compresses the data in evidence.  (As for "overgeneralization",
that applies to any error in prediction encountered during learning and, in
the ideal compressor, increases the algorithm's length even if only by
appending the exceptional data in a conditional -- *NOT* "falsifying"
anything as would that rascal Popper).
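
As a minimal sketch of that compression test (a toy rendering of my
own, with a crude token count standing in for description length -- not
anyone's actual algorithm): abduce ¢X for a (verb, object) context only
if defining the new symbol shortens the total description:

from itertools import groupby

facts = [("john", "kissed", "mary"), ("bill", "kissed", "mary"),
         ("hal", "kissed", "mary"), ("sue", "saw", "tom")]

def description_length(fact_list, categories):
    """Crude proxy for code length: tokens in the facts plus tokens
    spent defining each abduced category (name + member list)."""
    defs = sum(1 + len(members) for members in categories.values())
    return sum(len(f) for f in fact_list) + defs

def abduce(fact_list):
    """Abduce ¢X for a (verb, object) context only if it compresses."""
    context = lambda f: (f[1], f[2])      # "curry out" the subject slot
    kept, categories = list(fact_list), {}
    for ctx, group in groupby(sorted(fact_list, key=context), context):
        members = [f[0] for f in group]
        symbol = "¢" + ctx[0].upper()     # e.g. ¢KISSED, a new entity
        candidate = dict(categories)
        candidate[symbol] = members
        rewritten = [f for f in kept if context(f) != ctx] + [(symbol,) + ctx]
        if (description_length(rewritten, candidate)
                < description_length(kept, categories)):
            kept, categories = rewritten, candidate
    return kept, categories

print(abduce(facts))
# ([('sue', 'saw', 'tom'), ('¢KISSED', 'kissed', 'mary')],
#  {'¢KISSED': ['john', 'bill', 'hal']})

The Currying parallel: ¢KISSED is just "kissed" with its object bound
to "mary" and its subject abstracted away. And the lone ("sue", "saw",
"tom") fact stays as literal data -- the exception appended in a
conditional, nothing "falsified".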

This is "related" to quantum-logic in the sense that Tom Etter calls out in
the linked presentation:

Digram box linking, which is based on the *mathematics of relations
> rather than of functions*, is a more general operation than the
> composition of transition matrices.
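
For concreteness, my gloss on that contrast (a sketch, not Etter's
notation): relation composition is a boolean matrix product, with
AND/OR where a transition-matrix product has multiply/add, and a
function is just the special case of exactly one True per row:

def compose(R, S):
    """(R;S)[i][k] is True iff some j links i -> j in R and j -> k in S."""
    return [[any(R[i][j] and S[j][k] for j in range(len(S)))
             for k in range(len(S[0]))]
            for i in range(len(R))]

R = [[True,  True,  False],   # row 0 relates to BOTH 0 and 1: not a function
     [False, False, True],
     [False, True,  False]]
S = [[False, True,  False],
     [True,  False, False],
     [False, False, True]]

print(compose(R, S))
# [[True, True, False], [False, False, True], [True, False, False]]

Swap the booleans for probabilities, and any/and for sum/product, and
you recover the composition of transition matrices; the relational
version keeps the multiplicity instead of collapsing each row to a
single value.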


On Thu, May 16, 2024 at 7:24 PM James Bowery  wrote:

> First, fix quantum logic:
>
>
> https://web.archive.org/web/20061030044246/http://www.boundaryinstitute.org/articles/Dynamical_Markov.pdf
>
> Then realize that empirically true cases can occur not only in
> multiplicity (OR), but with structure that includes the simultaneous (AND)
> measurement dimensions of those cases.
>
> But don't tell anyone because it might obviate the risible tradition of
> so-called "type theories" in both mathematics and programming languages
> (including SQL and all those "fuzzy logic" kludges) and people would get
> *really* pissy at you.
>
>
> On Thu, May 16, 2024 at 10:27 AM  wrote:
>
>> What should the symbolic approach include to entirely replace the neural
>> network approach in creating true AI? Is that task even possible? What
>> benefits and drawbacks could we expect or hope for if it is possible? If
>> it is not possible, what would be the reasons?
>>
>> Thank you all for your time.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-Ma9215f03be1998269e14f977
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-20 Thread James Bowery
On Mon, May 20, 2024 at 9:49 AM Rob Freeman 
wrote:

> Well, I don't know number theory well, but what axiomatization of
> maths are you basing the predictions in your series on?
>
> I have a hunch the distinction I am making is similar to a distinction
> about the choice of axiomatization. Which will be random. (The
> randomness demonstrated by Goedel's diagonalization lemma? "True" but
> not provable/predictable within the system?)
>

Here's how I tend to think about it:

Solomonoff addressed this "random" choice of axioms by introducing a random
bit string (the axioms of the theory) interpreted as an algorithm (rules of
inference) which, itself, produces another bit string (theorems).

However, this leaves undefined the "rules of inference" which, in my way of
thinking, is like leaving undefined the choice of UTM within Algorithmic
Information Theory.

I've addressed this before in terms of the axioms of arithmetic by saying
that the choice of UTM is no more "random" than is the choice of axioms of
arithmetic, which must itself incorporate the rules of inference or else
you have no theory.

Marcus Hutter has addressed this "philosophical nuisance" in terms of no
post hoc (after observing the dataset) choice of UTM being permitted by the
principles of prediction.

I've further addressed this philosophical nuisance by permitting the
sophist to examine the dataset prior to "choosing the UTM", but restricted
to NiNOR Complexity, which further reduces the argument surface available
to sophists.
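
A toy way to see the stakes (my illustration, using two off-the-shelf
compressors as stand-ins for reference machines):

import bz2
import zlib

# The same highly regular dataset measured under two stand-in
# "reference machines". The lengths differ -- by a bounded amount, per
# the invariance theorem -- and that gap is exactly the slack a post
# hoc choice of machine would hand to the sophist.
data = bytes(range(10)) * 1000

print("machine A (zlib):", len(zlib.compress(data, 9)))
print("machine B (bz2): ", len(bz2.compress(data)))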


> On Mon, May 20, 2024 at 9:09 PM James Bowery  wrote:
> >
> >
> >
> > On Sun, May 19, 2024 at 11:32 PM Rob Freeman 
> wrote:
> >>
> >> James,
> >>
> >> My working definition of "truth" is a pattern that predicts. And I'm
> >> tending away from compression for that.
> >
> >
> > 2, 4, 6, 8
> >
> > does it mean
> > 2n?
> >
> > or does it mean
> > 10?
> >
> >
> >
> >> Related to your sense of "meaning" in (Algorithmic Information)
> >> randomness. But perhaps not quite the same thing.
> >
> >
> > or does it mean a probability distribution of formulae that all produce
> > 2, 4, 6, 8 whatever they may subsequently produce?
> >
> > or does it mean a probability distribution of sequences
> > 10, 12?
> > 10, 12, 14?
> > 10, 13, 14?
> > ...

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M1ce471d20cc6a3bfdec9f397
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-20 Thread Rob Freeman
Well, I don't know number theory well, but what axiomatization of
maths are you basing the predictions in your series on?

I have a hunch the distinction I am making is similar to a distinction
about the choice of axiomatization. Which will be random. (The
randomness demonstrated by Goedel's diagonalization lemma? "True" but
not provable/predictable within the system?)

On Mon, May 20, 2024 at 9:09 PM James Bowery  wrote:
>
>
>
> On Sun, May 19, 2024 at 11:32 PM Rob Freeman  
> wrote:
>>
>> James,
>>
>> My working definition of "truth" is a pattern that predicts. And I'm
>> tending away from compression for that.
>
>
> 2, 4, 6, 8
>
> does it mean
> 2n?
>
> or does it mean
> 10?
>
>
>
>> Related to your sense of "meaning" in (Algorithmic Information)
>> randomness. But perhaps not quite the same thing.
>
>
> or does it mean a probability distribution of formulae that all produce 2, 4, 
> 6, 8 whatever they may subsequently produce?
>
> or does it mean a probability distribution of sequences
> 10, 12?
> 10, 12, 14?
> 10, 13, 14?
> ...

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M086013ed4b196bdfe9a874c8
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-20 Thread James Bowery
On Sun, May 19, 2024 at 11:32 PM Rob Freeman 
wrote:

> James,
>
> My working definition of "truth" is a pattern that predicts. And I'm
> tending away from compression for that.
>

2, 4, 6, 8

does it mean
2n?

or does it mean
10?



> Related to your sense of "meaning" in (Algorithmic Information)
> randomness. But perhaps not quite the same thing.
>

or does it mean a probability distribution of formulae that all produce 2,
4, 6, 8 whatever they may subsequently produce?

or does it mean a probability distribution of sequences
10, 12?
10, 12, 14?
10, 13, 14?
...



> I want to emphasise a sense in which "meaning" is an expansion of the
> world, not a compression. By expansion I mean more than one,
> contradictory, predictive pattern from a single set of data.
>

I hope you can see from the above questions that we are talking about
probability distributions.  What is the difference between the probability
distribution of algorithms (aka formulae) and the probability distribution
of the strings they generate?
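
To make the two objects concrete, a toy rendering (the bit lengths here
are invented stand-ins for program length on some fixed reference
machine):

from collections import defaultdict
from fractions import Fraction

# Three formulae that all produce 2, 4, 6, 8.
hypotheses = {
    "2n":                 (lambda n: 2 * n, 5),
    "repeat 2,4,6,8":     (lambda n: [2, 4, 6, 8][(n - 1) % 4], 7),
    "2n then 10 forever": (lambda n: 2 * n if n <= 4 else 10, 9),
}

weight = {name: Fraction(1, 2 ** bits)
          for name, (_, bits) in hypotheses.items()}
total = sum(weight.values())

# The distribution over FORMULAE, given the shared prefix 2, 4, 6, 8:
for name in hypotheses:
    print(name, weight[name] / total)      # 16/21, 4/21, 1/21

# The induced distribution over the NEXT STRING ELEMENT: "2n" and
# "2n then 10 forever" pool their weight on the same continuation.
next_element = defaultdict(Fraction)
for name, (formula, _) in hypotheses.items():
    next_element[formula(5)] += weight[name] / total
print(dict(next_element))                  # 10 -> 17/21, 2 -> 4/21

The string distribution is the many-to-one image of the formula
distribution: related, but not the same object.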


> Note I'm saying a predictive pattern, not a predictable pattern.
> (Perhaps as a random distribution of billiard balls might predict the
> evolution of the table, without being predictable itself?)
>
> There's randomness at the heart of that. Contradictory patterns
> require randomness. A single, predictable pattern could not have
> contradictory predictive patterns either? But I see the meaning coming
> from the prediction, not any random pattern that may be making the
> prediction.
>
> Making meaning about prediction, and not any specific pattern itself,
> opens the door to patterns which are meaningful even though new. Which
> can be a sense for creativity.
>
> Anyway, the "creative" aspect of it would explain why LLMs get so big,
> and don't show any interpretable structure.
>
> With a nod to the topic of this thread, it would also explain why
> symbolic systems would never be adequate. It would undermine the idea
> of stable symbols, anyway.
>
> So, not consensus through a single, stable, maximally compressed
> Algorithmic Information pattern, as I understand you are suggesting (the
> most compressed pattern not being knowable anyway?) Though dependent on
> randomness, and consistent with your statement that "truth" should be
> "relative to a given set of observations".
>
> On Sat, May 18, 2024 at 11:57 PM James Bowery  wrote:
> >
> > Rob, the problem I have with things like "type theory" and "category
> theory" is that they almost always elide their foundation in HOL (high
> order logic) which means they don't really admit that they are syntactic
> sugars for second-order predicate calculus.  The reason I describe this as
> "risible" is the same reason I rather insist on the Algorithmic Information
> Criterion for model selection in the natural sciences:
> >
> > Reduce the argument surface that has us all going into hysterics over
> "truth" aka "the science" aka what IS the case as opposed to what OUGHT to
> be the case.
> >
> > Note I said "reduce" rather than "eliminate" the argument surface.  All
> I'm trying to do is get people to recognize that relative to a given set of
> observations the Algorithmic Information Criterion is the best operational
> definition of the truth.
> >
> > It's really hard for people to take even this baby step toward standing
> down from killing each other in a rhyme with The Thirty Years War, given
> that social policy is so centralized that everyone must become a de facto
> theocratic supremacist as a matter of self defence.  It's really obvious
> that the trend is toward capturing us in a control system, e.g. a
> Valley-Girl flirtation friendly interface to Silicon Cthulhu that can only
> be fought at the physical level such as sniper bullets through the cooling
> systems of data centers.  This would probably take down civilization itself
> given the over-emphasis on efficiency vs resilience in civilization's
> dependence on information systems infrastructure.

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-Me2c000d7572de5b0a5769775
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Can symbolic approach entirely replace NN approach?

2024-05-20 Thread John Rose
On Saturday, May 18, 2024, at 6:53 PM, Matt Mahoney wrote:
> Surely you are aware of the 100% failure rate of symbolic AI over the last 70 
> years? It should work in theory, but we have a long history of 
> underestimating the cost, lured by the early false success of covering half 
> of the cases with just a few hundred rules.
> 

I view LLM’s as systems within symbolic systems. Why? Simply that we exist in a 
spacetime environment and ALL COMMUNICATION is symbolic. And sub-symbolic 
representation is required for computation. All bits are symbols based on 
probabilities. Then as LLM’s become more intelligent the physical power 
consumption required to produce similar results will decrease as their symbolic 
networks grow and optimize.

Could be wrong but It makes sense to me… saying everything is symbolic 
eliminates the argument. I know it's lazy but  that's often how developers look 
at things in order to code them up :) Laziness is a form of optimization... 

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M2252941b1c7cca5b59b32c1f
Delivery options: https://agi.topicbox.com/groups/agi/subscription