Ben and list, I had wanted to leave this alone. I'm glad I didn't.
I hadn't previously been paying attention and missed this thread. It's actually very good. Thanks to Dorian Aur for dragging it back up.

I agree with almost all of it, as with everything I've been commenting on here. There is just one small gap I would like to close, one small loose thread I would like to pull. But I think it is an important one. The gap might be summarized with one statement by Linas:

LV, Feb 9: "If you are picky, you can have several different kinds of common nouns, but however picky you are, you need fewer classes than there are words."

"...however picky you are, you need fewer classes than there are words..."

"Fewer classes"? How do we know that is true? The number of ways a set of objects can be formed into classes is far greater than the number of objects. I think an analysis of why this is so -- why we can meaningfully create more classes than we have examples, and why we need to -- will be, if not AGI, then the basis of the leap from narrow AI to AGI.

On one level it may reduce to the assertion that gauge, as observed by Linas in his symbolism and connectionism paper, is not trivial for language. Note 10, p. 52, "Neural-Net vs. Symbolic Machine Learning":

"In physics, such an ambiguity of factorization is known as a global gauge symmetry; fixing a gauge removes the ambiguity. In natural language, the ambiguity is spontaneously broken: words have only a few senses, and are often synonymous, making the left and right factors sparse. For this reason, the analogy to physics is entertaining but mostly pointless."

It's not pointless, Linas. Experimentally, that language "gauge" is not pointless was observed in the results for the distributional learning of phonemes I cited, dating back 60 years, to the time Chomsky used it to destroy distributional learning as the central paradigm for linguistics.
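The combinatorics behind "more classes than words" are easy to check. A minimal sketch (the vocabulary size N and class size R are arbitrary illustrations):

```python
from math import comb

# A "class" here is any subset of a vocabulary of N words.
# The number of possible classes is 2^N, which vastly exceeds N itself.
N = 50  # toy vocabulary size, purely illustrative

num_words = N
num_classes = 2 ** N  # all subsets of the vocabulary
print(f"{num_words} words, {num_classes} possible classes")

# Even restricted to classes of exactly R members, the count
# N!/((N-R)! R!) already dwarfs N for modest R:
R = 5
print(comb(N, R))  # 2118760 five-word classes from 50 words
```

So the interesting question is not whether fewer classes than words *can* suffice, but why the overwhelming majority of possible classes never need to be enumerated in advance.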
If Linas takes his formalism and adds just this one insight, I think we have the basis to move beyond the wall at which symbolic methods have left us for 60 years.

In concrete terms it means we don't need to "learn" 10K to 1M rules. We need to be able to "permute" the elements of (causally related) data many more ways than that. Sergio Pissanetzky comes close to this, with his analysis in terms of permutations, also resulting in a network:

"Structural Emergence in Partially Ordered Sets is the Key to Intelligence"
http://sergio.pissanetzky.com/Publications/AGI2011.pdf

Though Pissanetzky may not see the significance of gauge either. We need a mechanism to cluster these networks on the fly. Nothing exists until it is observed.

Going on. Fascinating insights into the state of OpenCog:

LV: 'as of about 8-9 months ago, the language-learning project has been handed over to a team. They are still getting up to speed; they don't have any background in linguistics or machine learning, and so have been trying to learn both at the same time, and getting tangled up and stalled, as a result. Currently they are stumbling at the "clustering" step'

Sure they will! Of course if you try to "learn" something which is subjective, you'll run your processor hot chasing your tail, and still end up breaking the wrong symmetry for an appreciable fraction of your problems. That's been the history of distributional language learning since the '90s. I interviewed for a PhD on this with David Powers and Chris Manning in 1998.

The only solution is a root-and-branch change away from "learning" and towards resolving the "gauge" by finding contextually relevant low-energy permutations or factorizations of sets at run time.

Looking up what you are doing with your "MST" parser: you start by clustering links, starting with mutual information.
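The "gauge" Linas describes in Note 10 is just the non-uniqueness of factorization, and it can be demonstrated in a few lines. A minimal sketch with arbitrary illustrative matrices -- any invertible G will do:

```python
import numpy as np

# Any matrix M that factors as M = L @ R also factors as
# M = (L @ G) @ (inv(G) @ R) for every invertible "gauge" G.
# Nothing in M itself picks out one factorization over another.
L = np.array([[1.0, 2.0],
              [3.0, 4.0]])
R = np.array([[5.0, 6.0],
              [7.0, 8.0]])
M = L @ R

G = np.array([[2.0, 1.0],
              [1.0, 1.0]])   # an arbitrary invertible gauge transformation
G_inv = np.linalg.inv(G)

M2 = (L @ G) @ (G_inv @ R)   # a different pair of factors, same product
print(np.allclose(M, M2))    # True: the factorization is not unique
```

Only something outside M -- a context -- can fix the gauge, which is exactly the point about breaking the symmetry at run time.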
One way to see the change which needs to take place might be to move away from thinking about the problem as the character of links, and focus on the problem as permutations of links. I think Pissanetzky's causal sets can be thought of as links. The thing is, he focuses on permuting them.

Think about it. Another way to find a "minimum spanning tree" is to set your network oscillating. And that can be done in real time (on parallel hardware). Look at this paper:

"A Network of Integrate and Fire Neurons for Community Detection in Complex Networks"
Marcos G. Quiles, Liang Zhao, Fabricio A. Breve, Roseli A. F. Romero
http://www.sbmac.org.br/dincon/trabalhos/PDF/invited/69194.pdf

You have all the pieces, Linas. Put them together:

1) We don't know there are fewer meaningful classes of words than words; in theory the number of combinations, sets, of words is far greater than the number of words.

2) Even in theory -- your own theory, by the properties of the formalism -- language networks also have gauge.

3) In practice, as insisted on by Chomsky 60 years ago, they are observed to have gauge.

4) Only a context of use will break the symmetry and determine the correct factorization. We need to factorize, permute, the links at run time, in context.

5) A good way to cluster links at run time, efficient on parallel hardware, is to set a network oscillating and observe which groupings of links synchronize. Or if you have no power...

Ben, stop spending money trying to "learn" all the N!/((N-R)!R!) possible combinations of R-member sets made from N words. The system has a gauge. And many factorizations contradict, so they can't be "learned" simultaneously anyway. You won't be able to break the symmetry and know which of the many contradictory sets you need until run time.

OK, you may be right that BERT models have an advantage that Linas doesn't see, because deep nets do allow some recombination of parts, within a network layer, at run time.
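Point 5) can be sketched in simulation. The code below uses Kuramoto phase oscillators as a simplified stand-in for the integrate-and-fire units of Quiles et al. -- the network layout, coupling constants, and frequencies are all illustrative assumptions, not their model -- to show that oscillators inside a dense cluster of links synchronize while the two clusters stay apart:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy word-link network: two dense 5-node clusters joined by one weak
# link, standing in for two latent word classes.
n = 10
A = np.zeros((n, n))
for block in (range(0, 5), range(5, 10)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0      # strong intra-community coupling
A[4, 5] = A[5, 4] = 0.1            # weak bridge between the communities

omega = np.array([0.0] * 5 + [0.5] * 5)  # per-community natural frequency
theta = rng.uniform(0, 2 * np.pi, n)     # random initial phases
K, dt = 2.0, 0.05

def step(theta):
    # Kuramoto update: each node is pulled toward its neighbours' phases.
    diff = np.sin(theta[None, :] - theta[:, None])  # diff[i,j] = sin(th_j - th_i)
    return theta + dt * (omega + K * (A * diff).sum(axis=1))

for _ in range(400):               # warm-up: each community synchronizes
    theta = step(theta)

# Time-averaged pairwise coherence: near 1 within a community, low across.
acc_within = acc_between = 0.0
T = 1000
for _ in range(T):
    theta = step(theta)
    acc_within += np.exp(1j * (theta[0] - theta[1]))   # same community
    acc_between += np.exp(1j * (theta[0] - theta[5]))  # across the bridge

print(abs(acc_within) / T > 0.95)  # nodes 0 and 1 oscillate together
print(abs(acc_between) / T < 0.5)  # the two communities stay unlocked
```

The groupings of links that synchronize are exactly the communities, read off in real time rather than "learned" in advance.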
But the key to carrying us past the wall is forming sets at run time. It will be easy to do that just by setting the network oscillating about the words in a given context.

-Rob

On Sun, Feb 10, 2019 at 8:58 AM Linas Vepstas <linasveps...@gmail.com> wrote:
> On Sat, Feb 9, 2019 at 4:22 AM Ben Goertzel <b...@goertzel.org> wrote:
>> We are now playing with hybridizing these symbolic-ish grammar
>> induction methods with neural net language models, basically using the
>> predictive models produced by models in the BERT lineage (but more
>> sophisticated than vanilla BERT) in place of simple mutual information
>> values to produce more broadly-context-sensitive parse choices in
>> Linas's MST parser...
>
> This last sentence suggests that the near-total confusion about MST
> continues to persist in the team. I keep telling them to collect the
> statistics, and then discard the MST parse **immediately**. Trying to
> "improve" MST is a total waste of time.
>
> Seriously: Instead, try skipping the MST step entirely. Just do not even
> do it, AT ALL. Rip it out. It is NOT a step that the algorithm even needs.
> I'll bet you that if you skip the MST step completely, the quality of your
> results will be more-or-less unchanged. The results might even get better!
>
> If your results don't change, by skipping MST, or if your results get
> better, by skipping MST, then that should be a clear indicator that trying
> to "improve" MST is a waste of time!
> -- Linas
>
> --
> cassette tapes - analog TV - film cameras - you

------------------------------------------
Artificial General Intelligence List: AGI