(proposed R&D project for fall 2016 - 2017) We are now pretty close (a month away, perhaps?) to having an initial, reasonably reliable version of an OpenCog-controlled Hanson robot head, carrying out basic verbal and nonverbal interactions. This will be able to serve as a platform for Hanson Robotics product development, and also for ongoing OpenCog R&D aimed at increasing levels of embodied intelligence.
This email makes a suggestion regarding the thrust of the R&D side of the ongoing work, to be done once the initial version is ready. This R&D could start around the beginning of September, and is expected to take 9-12 months...

GENERAL IDEA: Initial experiment on using OpenCog for learning language from experience, using the Hanson robot heads and associated tools

In other words, the idea is to use simple conversational English regarding small groups of people observed by a robot head, as a context in which to experiment with our already-written-down ideas about experience-based language learning.

BASIC PERCEPTION

I think we can do some interesting language-learning work without dramatic extensions of our current perception framework. Extending the perception framework is valuable, but can be done in parallel with using the current framework to drive language learning work.

What I think we need to drive language learning work initially is that the robot can tell, at each point in time:

— where people’s faces are (and assign a persistent label to each person’s face)
— which people are talking
— whether an utterance is happy or unhappy (and maybe some additional sentiment)
— if person A’s face is pointed at person B’s face (so that if A is talking, A is likely talking to B) [not yet implemented, but can be done soon]
— the volume of a person’s voice
— via speech-to-text, what people are saying
— where a person’s hand is pointing [not yet implemented, but can be done soon]
— when a person is moving, leaving or arriving [not yet implemented, but can be done soon]
— when a person sits down or stands up [not yet implemented, but can be done soon]
— gender recognition (woman/man), maybe age recognition

(A rough sketch of how such percepts might be represented is given after the SECOND STAGE OF PERCEPTIONS list below.)

EXAMPLES OF LANGUAGE ABOUT THESE BASIC PERCEPTIONS

While simple, this set of initial basic perceptions lets a wide variety of linguistic constructs get uttered, e.g.

Bob is looking at Ben
Bob is telling Jane some bad news
Bob looked at Jane before walking away
Bob said he was tired and then sat down
People more often talk to the people they are next to
Men are generally taller than women
Jane is a woman
Do you think women tend to talk more quietly than men?
Do you think women are quieter than men?

etc. etc. It seems clear that this limited domain nevertheless supports a large amount of linguistic and communicative complexity.

SECOND STAGE OF PERCEPTIONS

A second stage of perceptual sophistication, beyond the basic perceptions, would be to have recognition of a closed class of objects, events and properties, e.g.:

Objects:

— Feet, hands, hair, arms, legs (we should be able to get a lot of this from the skeleton tracker)
— Beard
— Glasses
— Head
— Bottle (e.g. water bottle), cup (e.g. coffee cup)
— Phone
— Tablet

Properties:

— Colors: a list of color values can be recognized, I guess
— Tall, short, fat, thin, bald — for people
— Big, small — for a person
— Big, small — for a bottle, phone or tablet

Events:

— Handshake (between people)
— Kick (person A kicks person B)
— Punch
— Pat on the head
— Jump up and down
— Fall down
— Get up
— Drop (object)
— Pick up (object)
— Give (A gives object X to B)
— Put down (object) on table or floor
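To pin down the granularity of information being assumed here, below is a minimal, purely illustrative sketch, in plain Python, of the kind of timestamped percept records the language-learning process would consume. The predicate names, the Percept/Scene classes and the toy data are all made up for this example; in the actual system these percepts would presumably be written into the AtomSpace (e.g. as time-annotated EvaluationLinks) rather than held in Python objects.

# Illustrative only: a minimal, hypothetical encoding of Basic (and Second
# Stage) percepts as timestamped predicate records.  All predicate names and
# the toy data below are invented for the sake of the example.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Percept:
    predicate: str           # e.g. "talking", "face_toward", "sentiment", "handshake"
    args: List[str]          # persistent person/object labels, e.g. ["Bob", "Jane"]
    t_start: float           # seconds from the start of the session
    t_end: float
    confidence: float = 1.0  # perception is noisy, so keep a confidence value

@dataclass
class Scene:
    """Everything the robot perceived during one session or video."""
    percepts: List[Percept] = field(default_factory=list)

    def at(self, t: float) -> List[Percept]:
        """The percepts active at time t."""
        return [p for p in self.percepts if p.t_start <= t <= p.t_end]

# A toy scene: Bob looks at Jane and talks to her with unhappy sentiment, then
# leaves -- roughly the situation behind "Bob is telling Jane some bad news".
scene = Scene(percepts=[
    Percept("face_present",   ["Bob"],                        0.0, 30.0),
    Percept("face_present",   ["Jane"],                       0.0, 30.0),
    Percept("face_toward",    ["Bob", "Jane"],                2.0, 12.0),
    Percept("talking",        ["Bob"],                        3.0, 11.0),
    Percept("utterance_text", ["Bob", "I lost my job today"], 3.0, 11.0),
    Percept("sentiment",      ["Bob", "unhappy"],             3.0, 11.0, confidence=0.8),
    Percept("leaving",        ["Bob"],                        14.0, 16.0),
])

print([p.predicate for p in scene.at(5.0)])

The point is just that every percept is a predicate over persistent person/object labels, with a time interval and a confidence; the Second Stage percepts (handshake, pick up, give, etc.) fit the same mold with more arguments.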
CORPUS PREPARATION

While the crux of the proposed project is learning via real-time interaction between the robot and humans, in the early stages it will also be useful to experiment with “batch learning” from recorded videos of human interactions, recorded from the robot’s point of view.

As one part of supporting this effort, I’d suggest that we

1) create a corpus of videos of 1-5 people interacting in front of the robot, from the robot’s cameras

2) create a corpus of sentences describing the people, objects and events in the videos, associating each sentence with a particular time-interval in one of the videos

3) translate the sentences to Lojban and add them to our parallel Lojban corpus, so we can be sure we have good logical mappings of all the sentences in the corpus

Obviously, including the Stage Two perceptions along with the Basic Perceptions allows a much wider range of descriptions, e.g.

A tall man with a hat is next to a short woman with long brown hair
The tall man is holding a briefcase in his left hand
The girl who just walked in is a midget with only one leg
Fred is bald
Vytas fell down, then Ruiting picked him up
Jim is pointing at her hat.
Jim pointing at her hat and smiling made her blush.

However, for initial work, I would say it’s best if at least 50% of the descriptive sentences involve only Basic Perceptions... so we can get language learning experimentation rolling right away, without waiting for extended perception...

LANGUAGE LEARNING

What I then suggest is that we

1) Use the ideas from Linas & Ben’s “unsupervised language learning” paper to learn a small “link grammar dictionary” from the corpus mentioned above. Critically, the features associated with each word should include features from non-linguistic PERCEPTION, not just features from language. (The algorithms in the paper support this, even though non-linguistic features are only very briefly mentioned in the paper.) There are various ways to use PLN inference chaining and Shujing’s information-theoretic Pattern Miner (both within OpenCog) in the implementation of these ideas. (A rough illustrative sketch of this combined-features idea is given at the end of this note.)

2) Once (1) is done, we then have a parallel corpus of quintuples of the form

[audiovisual scene, English sentence, parse of sentence via link grammar with learned dictionary, Lojban sentence, PLN-Atomese interpretation of Lojban sentence]

We can take the pairs

[parse of sentence via link grammar with learned dictionary, PLN-Atomese interpretation of Lojban sentence]

from this corpus and use them as the input to a pattern mining process (maybe a suitably restricted version of the OpenCog Pattern Miner, maybe a specialized implementation), which will mine ImplicationLinks serving the function of the current RelEx2Logic rules.

The above can be done for sentences about Basic Perceptions only, and also for sentences about Second Stage Perceptions.

NEXT STEPS FOR LANGUAGE LEARNING

The link grammar dictionary learned as described above will have limited scope. However, it can potentially be used as the SEED for a larger link grammar dictionary to be learned from unsupervised analysis of a larger text corpus, for which nonlinguistic correlates of the linguistic constructs are not available. This will be a next step of experimentation.

NEXT STEPS FOR INTEGRATION

Obviously, what can be done with simple perceptions can be done with more complex perceptions as well... the assumption of simple perceptions is because that’s what we have working or almost-working right now... but Hanson Robotics will put significant effort into making better visual perception for their robots, and as this becomes a reality we will be able to use it within the above process...
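Here is the rough sketch promised under LANGUAGE LEARNING item 1. To be clear, this is NOT the algorithm from the unsupervised language learning paper (there is no mutual-information-based parsing, no disjunct clustering, and no PLN or Pattern Miner here), and the corpus and names are invented; it is only a toy illustration of the one point that each word’s feature vector should combine linguistic context with the percepts active when the word was uttered, so that words landing in the same link grammar category are those used similarly both linguistically and perceptually.

# Toy sketch only: combining linguistic and perceptual context into a single
# feature vector per word, then grouping words whose combined contexts look
# similar.  The corpus and percept names are invented; the grouping step is a
# naive stand-in for the category learning described in the paper.

from collections import defaultdict
from itertools import combinations
import math

# Toy aligned corpus: (descriptive sentence, percepts active in its time interval)
corpus = [
    ("Bob is looking at Jane", {"face_toward", "face_present"}),
    ("Bob is talking to Jane", {"talking", "face_toward"}),
    ("Jane sat down",          {"sit_down", "face_present"}),
    ("Ben sat down",           {"sit_down", "face_present"}),
    ("Ben is looking at Bob",  {"face_toward", "face_present"}),
]

# Each word accumulates counts of (i) the other words in its sentences and
# (ii) the percepts co-occurring with those sentences.
features = defaultdict(lambda: defaultdict(float))
for sentence, percepts in corpus:
    words = sentence.lower().split()
    for i, w in enumerate(words):
        for j, other in enumerate(words):
            if i != j:
                features[w]["word:" + other] += 1.0
        for p in percepts:
            features[w]["percept:" + p] += 1.0

def cosine(a, b):
    """Cosine similarity between two sparse feature vectors (dicts)."""
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Words with similar combined (linguistic + perceptual) contexts are candidates
# for the same link grammar word category, e.g. the person names here.
for w1, w2 in combinations(sorted(features), 2):
    sim = cosine(features[w1], features[w2])
    if sim > 0.7:
        print(f"{w1!r} / {w2!r} look categorially similar (cos = {sim:.2f})")

In a real implementation the naive cosine-similarity grouping at the end would of course be replaced by the information-theoretic machinery described in the paper, with the Pattern Miner and PLN used where appropriate; only the shape of the combined feature vectors is the point being illustrated.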
--
Ben Goertzel, PhD
http://goertzel.org

Super-benevolent super-intelligence is the thought the Global Brain is currently struggling to form...