1. @Can symbolic approach ... 2. @Rob Freeman LLMs, What's wrong with NLP (2009-2024), Whisper
*1.* *IMO the sharpness of the division between "neat" and "scruffy", NN and symbolic, is confused: neural networks are also symbolic:* http://artificial-mind.blogspot.com/2019/04/neural-networks-are-also-symbolic.html

NNs are a subset of the symbolic, both in implementation and in output, and "symbolic" is itself a bad term; a better one is *conceptual*. In a developing system it is about the creation of systems of *concepts* and operating with them, generalization from specifics; not about "symbols" (dereferencing, or the characters) or mindless algebra, "symbol manipulation", as every calculation can be seen as something like that. Either an NN or any other computational method is a program in a computer; the training of an NN is a kind of programming, and in a big-data-trained one the data is just the biggest part of the code. If the "data part" of the code is represented in a more succinct way, or the NN part becomes more complex so that it needs less "brute force", they will converge in another, intermediate representation. The "brute force" is also relative. Whole NNs can be concepts or "symbols" within "symbolic" systems: such a system could incorporate them and use whatever is already available, existing models, while training new ones for particular tasks.

An AGI, Mind and the Universe, cognitively, are hierarchical simulators of virtual universes, with the lowest level being "the real" or "physical" one for the current evaluator-observer virtual universe (causality-control unit). Whatever works is allowed. The terms are from the Theory of Universe and Mind, classical version 2001-2004, taught during the world-first university course in AGI (Plovdiv, 2010, 2011); the core reasoning gradually got incorporated into mainstream AI (some of it was hiding there earlier).
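The point that a trained NN is just a program whose "code" is mostly data can be made concrete with a toy sketch (my illustration, not from the post): write the network out with its weights as literal constants in the source. The weights below are hand-constructed for XOR, not actually trained; the structure is what matters.

```python
# A tiny 2-2-1 ReLU network written out as an explicit program:
# the "learned" parameters are literally constants in the source,
# i.e. the data is the biggest part of the code.
# (Illustrative toy; weights are hand-constructed, not trained.)

def relu(x):
    return max(0.0, x)

W1 = [[1.0, 1.0], [1.0, 1.0]]   # input -> hidden weights
b1 = [0.0, -1.0]                # hidden biases
W2 = [1.0, -2.0]                # hidden -> output weights
b2 = 0.0                        # output bias

def net(x1, x2):
    h = [relu(W1[i][0] * x1 + W1[i][1] * x2 + b1[i]) for i in range(2)]
    return W2[0] * h[0] + W2[1] * h[1] + b2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", net(a, b))   # matches a XOR b
```

Seen this way, "training" and "hand-writing the constants" are two routes to the same kind of artifact, which is why a more succinct "data part" and a more complex program part can converge in an intermediate representation.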
That kind of architecture or operation, providing an explicit imagination, is something that LLMs currently lack: they "have" only an implicit, "sweeping-on-the-go" one, encoded within their whole system, like the diffusion models and GANs have implicit models of 3D geometry and global illumination.

*2.* Yes, the tokenization in current LLMs is usually "wrong". It is workable for shaping and generating plausible matter for the modality, given "knowing all already" and covering all cases, but it should operate on concepts and world models: simulators of virtual universes, mapped to imagination. It should predict the *physical* future of the virtual worlds, not these tokens, which are often not morphemes, not grammatical cases; sometimes they happen to match specific "meaningful" units, and as the vocabularies are now huge, many words and MWEs get separate tokens. The models can indirectly create these *world models* in order to generate the correct words, but then the data should include wider information and intentions, as in the multimodal models.

The following early 2009 articles are still valid, while there is progress according to some of the suggestions, and now there is a longer "chain of intelligent operations" (even "chain-of-thought reasoning" as a term):

*What's wrong with Natural Language Processing? Part I:* https://artificial-mind.blogspot.com/2009/02/whats-wrong-with-natural-language.html

*What's wrong with Natural Language Processing? Part II: Static, Specific, High-level, Not-evolving...* https://artificial-mind.blogspot.com/2009/03/whats-wrong-with-natural-language.html

They include criticism of the NLP tests at the time; there were only a few back then, POS-tagging etc. Now they are plenty, but many seem as funny as back then. Once I reviewed one for predicting the next word from novels: the examples from the paper were all stereotypes and banalities, and the LLMs celebrate going up from 68.4 to 69.6%, just like in the 2000s.
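The mismatch between subword tokens and morphemes can be shown with a minimal greedy longest-match segmenter over a toy vocabulary (a sketch of BPE/WordPiece-style behavior, not any specific LLM's tokenizer; the vocabulary and the word are made up for illustration):

```python
def greedy_tokenize(word, vocab):
    """Split a word into the longest vocabulary pieces, left to right
    (a crude stand-in for BPE/WordPiece-style subword segmentation)."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):         # try the longest piece first
            if word[i:j] in vocab or j == i + 1:  # single char as fallback
                tokens.append(word[i:j])
                i = j
                break
    return tokens

# Toy vocabulary: "ing" (a real morpheme) is available, yet greedy
# longest-match still cuts across the morpheme boundary think+ing.
vocab = {"th", "thin", "ink", "ing", "king"}
print(greedy_tokenize("thinking", vocab))   # ['thin', 'king'], not ['think', 'ing']
```

Real subword vocabularies are learned from corpus frequencies, so whether a token coincides with a morpheme or a meaningful unit is incidental, which is the point above.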
A part of the conclusions of one of the works:

*"""*
*(...) Yes, mainstream NLP at the moment:*
*- Is useful.*
*- Solve[s] some abstract specific problems by heuristics.*
*- It works to some degree for "intelligent" tasks, because of course language do[es] map mind.*

*However, the mainstream still does not lead to a chain of intelligent operations; there are no loops and no cumulative development.*

*-- The length of the chain of inter-related intelligent operations in NLP today is very short. This is related to the lack of will and general goals of the systems. These systems are "push-the-button-and-fetch-the-result".* [Now: "prompts", but moving to agents]

*-- Swallowing a huge corpus of 1 billion words or so and computing statistical dependencies between tokens is not the way mind works.*

!!! Mind learns step by step, modeling simpler constructs/situations/dynamics/models before reaching more complex ones.
!!! Temporal relations of the input with different complexity are important.
!!! Mind usually uses many sensory inputs while learning. *Very important.* [Multimodal models, Vision-Text models]
!!! Mind has will, uses feedback and can actively and evolutionarily test and improve the correctness and effectiveness of its operation, including the natural-language-related one.

(...)

*I suggest:*
1. *Holistic approach* - the goal is building an operational mind with [a] long chain of intelligent operations, not the completion of a table with values 94.55% 96.5% 90.4% and a long list of citations at the end of a paper. *[LOL, this is still going on.]*
(...)
*"""*

Yes, humans seem to need less data (and a proper design should need less), but re Whisper in particular: I think the comparison is not well aligned, because it "learns" all languages.
Speech recognition for a single speaker, a single language and limited conditions doesn't need so much data (and if it's about *understanding* the content, one voice would be enough), even with the "dumb" methods, and humans are not perfect recognizers either. Average humans are poor in a non-native language, won't correctly recognize many accented words even in their own native language, and won't have high precision for rare or complex words and expressions, especially in languages with irregular writing, overlapping phonetics and many accents. The actual learning time for humans is also not only the "live" time, because the brain is replaying (which is also "data augmentation"), and the learning is more multimodal: one other modality is one's own speech-producing system, both the commands for controlling it and its output.

>The only structure is "tokens".

Yes, but only at the input level; there are implicit structures, however they are too "embedded" in the whole, and in the typical LLMs the low-level one is not well "grounded" to the actual other-modality input which it represents. Human language, for the human agents, is multimodal, multirange, multiresolution, and includes intentions (again at multiple scales, precisions, ranges; also other agents' inferred or recognized/suggested ones) etc. from the start.

*Theory of Universe and Mind* https://github.com/Twenkid/Theory-of-Universe-and-Mind

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M7f05c62c71f943e0cce5da1f
Delivery options: https://agi.topicbox.com/groups/agi/subscription