On Mon, May 27, 2024 at 9:34 AM Rob Freeman <chaotic.langu...@gmail.com>
wrote:

> James,
>
> I think you're saying:
>
> 1) Grammatical abstractions may not be real, but they can still be
> useful abstractions to parameterize "learning".
>

And more generally, people pragmatically adopt different fictional abstract
grammars as convenient: fictions, except perhaps the lowest-power ones like
regular/finite-state automata, which, as discrete and finite dynamical
systems, are *real*izable in physical phenomena.



> 2) Even if after that there are "rules of thumb" which actually govern
> everything.
>

Well, there are abstract grammars (see #1) and there are instances of those
grammars (#2), which may be induced given the
fictional-abstract-grammar-of-convenience.  BUT we would prefer not to have
to _learn_ these maximum-parsimony induced grammars if it can be avoided.
Some fragments of physics may be learnable from the raw data, but we may
not wish to do so depending on the task at hand -- hence "physics informed
machine learning" makes sense in many cases.

> Well, you might say why not just learn the "rules of thumb".

Hopefully you meant "one" rather than "you", since in both #1 and #2 I've
repeatedly asserted that these are rules of thumb/heuristics/ML biases
(whether biasing toward a particular abstract grammar (#1) or toward a
particular induced grammar (#2)) that we should be taking more seriously.


> But the best counter against the usefulness of the Chomsky hierarchy
> for parameterizing machine learning, might be that Chomsky himself
> dismissed the idea it might be learned.


A machine learning algorithm is like the phenotype of the genetic code
Chomsky hypothesized for the innate capacity to learn grammar: that code
"parameterizes" the ML algorithm before any learning takes place.  Both #1
and #2 are examples of such "genetic code" hardwired into the ML algorithm
-- biases meant to speed up convergence on lossless compression of the
training data (i.e. all prior observations, in the ideal case of the
Algorithmic Information Criterion for model selection, since "test" and
"validation" data arrive only as subsequent observations and you can't do
better than the best lossless compression as the basis for decision).
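To make that concrete, here is a minimal sketch of two-part-code model
selection in Python.  Everything in it is invented for illustration: zlib is
only a crude stand-in for the (uncomputable) ideal compressor, and the
candidate-model interface is my own.

    import zlib

    def description_length(model_bytes, residual_bytes):
        # Two-part code: bits to state the model, plus bits to state the
        # training data given the model.  A real Algorithmic Information
        # Criterion would use the shortest program that outputs the data;
        # zlib is just a cheap, computable proxy.
        return len(zlib.compress(model_bytes)) + len(zlib.compress(residual_bytes))

    def select_model(candidates, observations):
        # candidates: iterable of (name, model_bytes, residual_fn), where
        # residual_fn(observations) returns, as bytes, whatever the model
        # fails to predict.  The "bias" (#1 or #2) lives entirely in which
        # candidates we bother to enumerate at all.
        best_name, best_len = None, None
        for name, model_bytes, residual_fn in candidates:
            total = description_length(model_bytes, residual_fn(observations))
            if best_len is None or total < best_len:
                best_name, best_len = name, total
        return best_name, best_len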


> And his most damaging
> argument? That learned categories contradict. "Objects" behave
> differently in one context, from how they behave in another context.


I guess the best thing to do at this stage is to stop talking about
"Chomsky", and even about "grammar", and instead talk only about which level
of "automata <https://en.wikipedia.org/wiki/Automata_theory>" one wishes to
focus on in the "domain specific computer programming language" used to
achieve the shortest algorithm that outputs all prior observations.  Some
domain specific languages are not Turing complete but may nevertheless be
interpreted by a meta-interpreter that is.
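As a toy illustration of that last point (all of it invented for this email,
nothing from the DeepMind or HNet papers): a "DSL" that is only
regular-grammar powerful -- a finite-state automaton given as data --
interpreted by ordinary code in a Turing-complete host language.

    # A toy DSL at the regular-grammar level: a DFA expressed as a
    # transition table.  The DSL itself can only express finite-state
    # recognition; the interpreter below is ordinary code in a
    # Turing-complete host.
    DFA = {
        "start": "S",
        "accept": {"S"},
        # (state, symbol) -> next state; accepts strings over {a, b}
        # containing an even number of b's
        "delta": {("S", "a"): "S", ("S", "b"): "T",
                  ("T", "a"): "T", ("T", "b"): "S"},
    }

    def run_dfa(dfa, string):
        state = dfa["start"]
        for symbol in string:
            state = dfa["delta"].get((state, symbol))
            if state is None:      # symbol outside the DFA's alphabet
                return False
        return state in dfa["accept"]

    print(run_dfa(DFA, "abba"))    # True  (two b's)
    print(run_dfa(DFA, "ab"))      # False (one b)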

This gets away from anything specific to "Chomsky" and lets us focus on the
more well-grounded notion of Algorithmic Information.


> I see it a bit like our friend the Road Runner. You can figure out a
> physics for him. But sometimes that just goes haywire and contradicts
> itself - bodies make holes in rocks, fly high in the sky, or stretch
> wide.
>
> All the juice is in these weird "rules of thumb".
>
> Chomsky too failed to find consistent objects. He was supposed to push
> past the highly successful learning of phoneme "objects", and find
> "objects" for syntax. And he failed. And the most important reason
> I've found, was that even for phonemes, learned category contradicted.
>
> That hierarchy stuff, that wasn't supposed to appear in the data. That
> could only be in our heads. Innate. Why? Well for one thing, because
> the data contradicted. The "learning procedures" of the time generated
> contradictory objects. This is a forgotten result. Machine learning is
> still ignoring this old result from the '50s. (Fair to say the
> DeepMind paper ignores it?) Chomsky insisted these contradictions
> meant the "objects" must be innate. The idea cognitive objects might
> be new all the time (and particularly the idea they might contradict!)
> is completely orthogonal to his hierarchy (well, it might be
> compatible with context sensitivity, if you accept that the real juice
> is in the mechanism to implement the context sensitivity?)
>
> If categories contradict, that is represented on the Chomsky hierarchy
> how? I don't know. How would you represent contradictory categories on
> the Chomsky hierarchy? A form of context sensitivity?
>
> Actually, I think, probably, using entangled objects like quantum. Or
> relation and variance based objects as in category theory.
>
> I believe Coecke's team has been working on "learning" exactly this:
>
> From Conceptual Spaces to Quantum Concepts: Formalising and Learning
> Structured Conceptual Models
> Sean Tull, Razin A. Shaikh, Sara Sabrina Zemljiˇc and Stephen Clark
> Quantinuum
> https://browse.arxiv.org/pdf/2401.08585
>
> I'm not sure. I think the symbolica.ai people may be working on
> something similar: find some level of abstraction which applies even
> across varying objects (contradictions?)
>
> For myself, in contrast to Bob Coecke, and the category theory folks,
> I think it's pointless, and maybe unduly limiting, to learn this
> indeterminate object formalism from data, and then collapse it into
> one or other contradictory observable form, each time you observe it.
> (Or seek some way you can reason with it even in indeterminate object
> formulation, as with the category theory folks?) I think you might as
> well collapse observable objects directly from the data.
>
> I believe this collapse "rule of thumb", is the whole game, one shot,
> no real "learning" involved.
>
> All the Chomsky hierarchy limitations identified in the DeepMind paper
> would disappear too. They are all limitations of not identifying
> objects. Context coding hacks like LSTM, or "attention", introduced in
> lieu of actual objects, and grammars over those objects, stemming from
> the fact grammars of contradictory objects are not "learnable."
>
> On Sun, May 26, 2024 at 11:24 PM James Bowery <jabow...@gmail.com> wrote:
> >
> > It's also worth reiterating a point I made before about the confusion
> between abstract grammar as a prior (heuristic) for grammar induction and
> the incorporation of so-induced grammars as priors, such as in "physics
> informed machine learning".
> >
> > In the case of physics informed machine learning, the language of
> physics is incorporated into the learning algorithm.  This helps the
> machine learning algorithm learn things about the physical world without
> having to re-derive the body of physics knowledge.
> >
> > Don't confuse the two levels here:
> >
> > 1) My suspicion that natural language learning may benefit from
> prioritizing HOPDA as an abstract grammar to learn something about natural
> languages -- such as their grammars.
> >
> > 2) My suspicion (supported by "X informed machine learning" exemplified
> by the aforelinked work) that there may be prior knowledge about natural
> language more specific than the level of abstract grammar -- such as
> specific rules of thumb for, say, the English language that may greatly
> speed training time on English corpora.
> >
> > On Sun, May 26, 2024 at 9:40 AM James Bowery <jabow...@gmail.com> wrote:
> >>
> >> See the recent DeepMind paper "Neural Networks and the Chomsky
> Hierarchy" for the sense of "grammar" I'm using when talking about the HNet
> paper's connection to Granger's prior papers about "grammar", the most
> recent being "Toward the quantification of cognition".  Although the
> DeepMind paper doesn't refer to Granger's work on HOPDAs, it does at least
> illustrate a fact, long-recognized in the theory of computation:
> >>
> >> Grammar                    Computation
> >> Regular                    Finite-state automaton
> >> Context-free               Non-deterministic pushdown automaton
> >> Context-sensitive          Linear-bounded non-deterministic Turing machine
> >> Recursively enumerable     Turing machine
> >>
> >> Moreover, the DeepMind paper's empirical results support the
> corresponding hierarchy of computational power.
> >>
> >> Having said that, it is critical to recognize that everything in a
> finite universe reduces to finite-state automata in hardware -- it is only
> in our descriptive languages that the hierarchy exists.  We don't describe
> all computer programs in terms of finite-state automata aka regular grammar
> languages.  We don't describe all computer programs even in terms of Turing
> complete automata aka recursively enumerable grammar languages.
> >>
> >> And I have stated before (which I first linked to the HNet paper)
> HOPDAs are interesting as a heuristic because they may point the way to a
> prioritization if not restriction on the program search space that
> evolution has found useful in creating world models during an individual
> organism's lifetime.
> >>
> >> The choice of language, hence the level of grammar, depends on its
> utility in terms of the Algorithmic Information Criterion for model
> selection.
> >>
> >> I suppose one could assert that none of that matters so long as there
> is any portion of the "instruction set" that requires the Turing complete
> fiction, but that's a rather ham-handed critique of my nuanced point.
