On Mon, May 21, 2018 at 6:11 PM, Alexey Potapov <pota...@aideus.com> wrote:
> 2018-05-22 0:11 GMT+03:00 Linas Vepstas <linasveps...@gmail.com>:
>
>> How many are we talking about, here? Dozens, hundreds of objects?
>> Hundreds of predicates per object? That is 100x100 = 10K and, currently,
>> you can create and add maybe 100K atoms/sec to the atomspace (via C++,
>> less by scheme or python, due to wrapper overhead). So this seems
>> manageable.
>
> Thousands or even millions of objects. I can ask you a question about a
> speck of dust sparkling in the sunlight, a hot pixel on your screen, a
> tiny birthmark on a face, a hole in a button with a thread passing
> through it, etc. Each pixel belongs to tens of "objects"...

I want to keep this conversation realistic. Sophia, today, struggles to
see human faces. That's, like, 1 or 2 of them. On a good demo day, she can
see facial expressions, sort of, distinguishing between at most half a
dozen of them. With low accuracy. If you get lucky. That's pretty much it.

OK, so using stuff like RealSense and custom software she can kind-of-ish
see hands and arms, *if* they are dead ahead, well-posed, with good
lighting and no background movements (e.g. 5 other people crowding
around). But if you tilt your head, or have bad lighting (viz. what we
consider "normal" lighting), she's blind.

If she could become consistently aware of even just one object, besides a
face, in her field of view, without cheats like coloring it some bright
color, that would be great. If she could see ten things, that would be
mind-blowingly awesome. She could go on TV interviews and answer questions
like "what do you see?".

Realistic compute power -- let's say several laptops' worth of compute,
and a GPU card that doesn't have some insanely whirry fan. This is what
you can get on-site, at the location where the vision is happening. Of
course, you can also stream data to the cloud and do lots of processing
there, but then there are bandwidth issues, and latency issues.
So the 100K atoms/sec number is on a one-or-two-core, 2014-2016 vintage,
average desktop-type CPU.

>>> The question is how to combine OpenCog and neural networks on the
>>> algorithmic level. Let us return to the considered request for VQA. We
>>> can imagine a grounded schema node, which detects all bounding boxes
>>> with a given class label, and inserts them into the Atomspace.
>>
>> For example, one creates a ConceptNode "dress". One also creates a
>> PredicateNode "*-bounding-box-*". Then one writes C++ code to implement
>> the TensorFlowBBValue object. One then associates all three:
>>
>>     (cog-set-value! (Concept "dress") (Predicate "*-bounding-box-*")
>>         (TensorFlowBBValue "obj-id-42"))
>>
>> What is the current bounding box for that dress? I don't know, but I
>> can find out:
>>
>>     (cog-value->list (cog-value (Concept "dress")
>>         (Predicate "*-bounding-box-*")))
>>
>> returns 2 or 4 floating point numbers, as a list. Is Susan wearing that
>> dress?
>>
>>     (cog-set-value! (Concept "Face-of-Susan")
>>         (Predicate "*-bounding-box-*") (TensorFlowBBValue "obj-id-66"))
>>
>>     (define (is-near? A B)
>>         (> 0.1 (distance
>>             (cog-value A (Predicate "*-bounding-box-*"))
>>             (cog-value B (Predicate "*-bounding-box-*")))))
>>
>> returns true if there is less than 0.1 meters distance between the
>> bounding boxes on A and B. The actual location of the bounding boxes is
>> never stored, and never accessed, unless the is-near? predicate runs.
>
> Well... the problem is that our system should learn most of this somehow,

I agree!

> and it cannot learn C++ or Scheme code.

Of course! The whole point of Atomese is that it is a kind of programming
language that can be machine-manipulated, machine-learned.

> We would like to hardcode as little as possible. We can (and likely
> should) code TensorFlowValue,

I think that would be a good experiment to conduct. While Ben and others
enjoy designing systems top-down, I like to pursue a bottom-up approach --
build something, see how well it works.
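To make the lazy-evaluation idea concrete outside of Atomese, here is a
rough plain-Python sketch (all names and data here are made up, not
OpenCog API): each value is a callable that is only invoked when the key
is actually read, the way a TensorFlowBBValue would poll the network on
demand rather than storing box coordinates in the atomspace.

```python
import math

# Hypothetical stand-in for the value store: each "atom" maps keys to
# callables, so bounding boxes are computed only when actually read.
atoms = {
    "dress":         {"*-bounding-box-*": lambda: (0.0, 0.0, 0.4, 0.9)},
    "Face-of-Susan": {"*-bounding-box-*": lambda: (0.1, 0.95, 0.3, 1.2)},
}

def get_value(atom, key):
    """Fetch a value lazily, like cog-value on a streaming value."""
    return atoms[atom][key]()

def box_center(bbox):
    """Center of a 2D box given as (x-min, y-min, x-max, y-max)."""
    x0, y0, x1, y1 = bbox
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def is_near(a, b, threshold=0.1):
    """True if the box centers are closer than `threshold` (meters)."""
    ax, ay = box_center(get_value(a, "*-bounding-box-*"))
    bx, by = box_center(get_value(b, "*-bounding-box-*"))
    return math.hypot(ax - bx, ay - by) < threshold

print(is_near("dress", "Face-of-Susan"))  # these boxes are far apart: False
```

Nothing is cached here; every call re-reads the "live" value, which is the
point of the design.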
If it works poorly, make sure that we understood *why* it failed, and what
parts were good, and then try again. So, for me, a TensorFlowValue object
would highlight what's good and what's bad in the current design.
Engineering hill-climbing.

> but we would like to avoid hardcoding (is-near? A B).

I agree, sort-of-ish. English-language prepositions are a "closed class":
it's a finite list, and a fairly small list -- a few dozen that are truly
practical, a few hundred if you start listing archaic, obsolete, rare
ones, ones inapplicable to images...

https://en.wikipedia.org/wiki/List_of_English_prepositions

So, for now, I find it acceptable to hard-code a certain subset. A
discussion about "how can we learn prepositions from nothing?" would have
to be a distinct conversation.

>>> These nodes correspond not just to neural layers, but also to
>>> operations over them. One can imagine TensorNode nodes connected by
>>> PlusLink, TimesLink, etc.
>>
>> Yes. However, we might also need PlusValue or TimesValue. I do not know
>> why, yet, but these are potentially useful, as well.
>
> This is exactly my question, whether we need them or not :)

Whether they are needed or not depends a lot on what kind of data is
exposed by TensorFlowValue, and how that data is then routed up into the
natural-language and reasoning layers. There are multiple possible
designs for this; there is no particular historical precedent (in the
atomspace) for this.

> Nil also proposed to use GetValueLink...

I didn't really understand that proposal. He seemed to be talking about
truth values, not values in general.

>> Perhaps we need an IsLeftOfLink that knows automatically to obtain the
>> "*-centroid-*" value on two atoms, and then return true/false depending
>> on the result (or throw an exception if there is no *-centroid-* value).
>
> Sorry, I did not precisely get this. What is a centroid, and how is it
> connected to IsLeftOf?
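In plain-Python terms (names and conventions made up, not OpenCog API),
the IsLeftOfLink idea, generalized to a small hard-coded table covering
part of the closed class of spatial prepositions, might look roughly like
this: each preposition is a geometric test over two (x, y, z) centroids.

```python
# Hypothetical sketch: a small hard-coded table mapping spatial
# prepositions to geometric tests over object centroids (x, y, z).
PREPOSITION_TESTS = {
    "left-of":  lambda a, b: a[0] < b[0],   # compare x coordinates
    "right-of": lambda a, b: a[0] > b[0],
    "above":    lambda a, b: a[1] > b[1],   # compare y coordinates
    "below":    lambda a, b: a[1] < b[1],
}

def holds(preposition, centroid_a, centroid_b):
    """Evaluate one spatial preposition over two centroids.

    Raises KeyError if no test is hard-coded for the preposition --
    loosely like the proposed throw-if-no-centroid behavior."""
    return PREPOSITION_TESTS[preposition](centroid_a, centroid_b)

cup   = (0.2, 0.5, 1.0)   # made-up centroids, in meters
plate = (0.6, 0.3, 1.0)

print(holds("left-of", cup, plate))  # True: cup's x is smaller
print(holds("below",   cup, plate))  # False: cup's y is larger
```

The table stays small precisely because prepositions are a closed class;
learning the entries, rather than hard-coding them, is the separate
conversation mentioned above.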
https://en.wikipedia.org/wiki/Centroid

It avoids some of the complexity of bounding boxes (which might be
touching, overlapping, or inside of one another). Following bottom-up
design principles, I would rather have a simple, well-thought-out, fast,
clear, working proof of concept before adding many dozens of complex
spatial and temporal relationships.

The *-someword-* is just a common way of naming quasi-global variables in
Scheme. It's an "eyecatcher": ASCII-art visual bling.

So I imagine that there could be a (PredicateNode "*-centroid-*") that
acts as a key, and the value for it would be x,y,z floating point values.
Meanwhile, a (PredicateNode "*-bounding-box-*") would be associated with
6 floats -- two opposed corners of a cuboid -- and a (PredicateNode
"*-ellipsoid-*") might return 15 floating-point numbers giving an
ellipsoid.

Linas.

--
cassette tapes - analog TV - film cameras - you
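P.S. To make the cuboid/centroid bookkeeping concrete, here is a minimal
plain-Python sketch of deriving a "*-centroid-*" value from a
"*-bounding-box-*" value. The flat (x0, y0, z0, x1, y1, z1) layout of the
6 floats is my assumption for illustration, not an existing API.

```python
# Sketch only: the centroid of an axis-aligned cuboid given as two
# opposed corners is just the midpoint of those corners.
def cuboid_centroid(bbox6):
    """Midpoint of the two opposed corners (x0, y0, z0, x1, y1, z1)."""
    x0, y0, z0, x1, y1, z1 = bbox6
    return ((x0 + x1) / 2, (y0 + y1) / 2, (z0 + z1) / 2)

print(cuboid_centroid((0.0, 0.0, 0.0, 1.0, 2.0, 4.0)))  # (0.5, 1.0, 2.0)
```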