On Mon, May 21, 2018 at 6:11 PM, Alexey Potapov <pota...@aideus.com> wrote:

>
>
> 2018-05-22 0:11 GMT+03:00 Linas Vepstas <linasveps...@gmail.com>:
>
>>
>> How many are we talking about, here? Dozens, hundreds of objects?
>> Hundreds of predicates per object? That is 100x100 = 10K and, currently,
>> you can create and add maybe 100K atoms/sec to the atomspace (via C++; less
>> via scheme or python, due to wrapper overhead). So this seems manageable.
>>
>
> Thousands or even millions of objects. I can ask you a question about
> a speck of dust sparkling in the sunlight, hot pixel on your screen, tiny
> birthmark on a face, a hole in a button with a thread passing through it,
> etc. Each pixel belongs to tens of "objects"...
>

I want to keep this conversation realistic.  Sophia, today, struggles to
see human faces. That's like, 1 or 2 of them. On a good demo day, she can
see facial expressions, sort-of, distinguishing between at most
half-a-dozen of them. With low accuracy. If you get lucky. That's pretty
much it.  OK, so using stuff like RealSense and custom software she can
kind-of-ish see hands and arms, *if* they are dead-ahead, well-posed, in
good lighting, with no background movement (e.g. 5 other people crowding
around). But if you tilt your head, or have bad lighting (viz. what we
consider "normal" lighting), she's blind.  If she could become consistently
aware of even just one object, besides a face, in her field of view,
without cheats like coloring it some bright color, that would be great. If
she could see ten things, that would be mind-blowingly awesome. She could
go on TV interviews and answer questions like "what do you see?".

Realistic compute power -- let's say several laptops' worth of compute, and
a GPU card that doesn't have some insanely whirry fan.  This is what you
can get on-site, at the location where the vision is happening.

Of course, you can also stream data to the cloud, and do lots of processing
there, but then there are bandwidth issues, and latency issues.

So the 100K Atoms/sec number is for one or two cores of an average
2014-2016 vintage desktop CPU.




>
>> The question is how to combine OpenCog and neural networks at the
>> algorithmic level. Let us return to the VQA request considered earlier. We
>> can imagine a grounded schema node which detects all bounding boxes with a
>> given class label, and inserts them into the Atomspace,
>>
>
> For example, one creates a ConceptNode "dress".  One also creates a
> PredicateNode "*-bounding-box-*".  Then one writes C++ code to implement the
> TensorFlowBBValue object.  One then associates all three:
>
> (cog-set-value! (Concept "dress") (Predicate "*-bounding-box-*")
> (TensorFlowBBValue "obj-id-42"))
>
> What is the current bounding box for that dress?  I don't know, but I can
> find out:
>
> (cog-value->list (cog-value (Concept "dress") (Predicate
> "*-bounding-box-*")))
>
> returns 2 or 4 floating point numbers, as a list.    Is Susan wearing that
> dress?
>
> (cog-set-value! (Concept "Face-of-Susan") (Predicate "*-bounding-box-*")
> (TensorFlowBBValue "obj-id-66"))
>
> (define (is-near? A B)
>    (> 0.1 (distance
>       (cog-value A (Predicate "*-bounding-box-*"))
>       (cog-value B (Predicate "*-bounding-box-*")))))
>
> returns true if there is less than 0.1 meters distance between the
> bounding boxes on A and B.
>
> The actual location of the bounding boxes is never stored, and never
> accessed, unless the is-near? predicate runs.
>
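
For concreteness, here is the sort of distance helper that the is-near?
snippet above assumes. This is a minimal sketch: the flat 6-float box
layout (x1 y1 z1 x2 y2 z2) and the helper names are illustrative guesses
of mine, not an existing API.

   ; Hypothetical helper: read a value as a flat list of six floats,
   ; take the midpoint of each box, and return the Euclidean distance
   ; between the two midpoints.
   (define (box->center v)
      (let ((f (cog-value->list v)))
         (list (/ (+ (list-ref f 0) (list-ref f 3)) 2)
               (/ (+ (list-ref f 1) (list-ref f 4)) 2)
               (/ (+ (list-ref f 2) (list-ref f 5)) 2))))

   (define (distance va vb)
      (sqrt (apply + (map (lambda (a b) (expt (- a b) 2))
                          (box->center va) (box->center vb)))))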

> Well... the problem is that our system should learn most of this somehow,

I agree!

> and it cannot learn C++ or Scheme code

Of course! The whole point of Atomese is that it is a kind-of programming
language that can be machine-manipulated, machine-learned.
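
To make that concrete: an Atomese "program" is itself just atoms in the
atomspace, so other programs can inspect and rewrite it. A minimal sketch,
using stock Atomese (nothing hypothetical here except the toy example
itself):

   ; This "program" is not opaque compiled code; it is a graph of atoms:
   (define add-one
      (Lambda (Variable "$x")
         (Plus (Variable "$x") (Number 1))))

   ; Arithmetic links really do execute:
   (cog-execute! (Plus (Number 2) (Number 1)))   ; -> (Number 3)

   ; And because add-one is itself atoms, other Atomese can pattern-match
   ; and rewrite it -- that is the sense in which it is machine-learnable:
   ; a learner can mutate the graph, not just call it.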

> We would like to hardcode as little as possible. We can (and likely should)
> code TensorFlowValue

I think that would be a good experiment to conduct.  While Ben and others
enjoy designing systems top-down, I like to pursue a bottom-up approach --
build something, see how well it works. If it works poorly, make sure that
we understand *why* it failed, and which parts were good, and then try
again.  So, for me, a TensorFlowValue object would highlight what's good
and what's bad in the current design.  Engineering hill-climbing.

> but we would like to avoid hardcoding (is-near? A B).

I agree, sort-of-ish.  English-language prepositions are a "closed class" --
it's a finite list, and a fairly small one: a few dozen that are truly
practical, a few hundred if you start listing archaic, obsolete, or rare
ones, or ones inapplicable to images ...
https://en.wikipedia.org/wiki/List_of_English_prepositions   So for now, I
find it acceptable to hard-code a certain subset.
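
As an illustration of what such a hard-coded subset might look like (this
table and the point-based tests are just a sketch of mine, not an agreed
design; "position" is assumed to return an (x y z) list for an atom,
however that ends up being obtained):

   ; Hypothetical closed-class table: each preposition maps to a scheme
   ; predicate comparing the positions of two atoms.
   (define (make-preposition-table position)
      (list
         (cons "left-of"  (lambda (a b) (< (car (position a)) (car (position b)))))
         (cons "right-of" (lambda (a b) (> (car (position a)) (car (position b)))))
         (cons "above"    (lambda (a b) (> (cadr (position a)) (cadr (position b)))))
         (cons "below"    (lambda (a b) (< (cadr (position a)) (cadr (position b)))))))

A few dozen entries like these would cover the practical core of the list.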

A discussion about "how can we learn prepositions from nothing?" would have
to be a distinct conversation.



>
>> These nodes correspond not just to neural layers, but also to operations
>> over them. One can imagine TensorNodes connected by PlusLink,
>> TimesLink, etc.
>>
>
> Yes.  However, we might also need PlusValue or TimesValue.  I do not know
> why, yet, but these are potentially useful, as well.
>

> This is exactly my question -- whether we need them or not :)

Whether they are needed or not depends a lot on what kind of data is
exposed by TensorFlowValue, and how that data is then routed up into the
natural-language and reasoning layers. There are multiple possible designs
here, and no particular historical precedent (in the atomspace) to follow.
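
To make the two alternatives concrete (TensorNode, TensorFlowValue and
friends do not exist today; this is only the shape such a design might
take):

   ; Alternative 1: the dataflow graph lives in the atomspace as atoms,
   ; so the reasoning layers can see inside it; the actual tensors stay
   ; on the TensorFlow side and surface only when the graph is executed.
   (define net
      (PlusLink
         (TensorNode "conv-features")
         (TensorNode "attention-map")))

   ; Alternative 2: the combination happens down in the value layer,
   ; opaque to reasoning -- a hypothetical PlusValue combining streams:
   ; (PlusValue (TensorFlowValue "obj-id-42") (TensorFlowValue "obj-id-66"))

Which of these is right depends, as above, on whether the reasoning layers
ever need to see inside the arithmetic.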

> Nil also proposed to use GetValueLink...

I didn't really understand that proposal. He seemed to be talking about
truth values, not values in general.




>
> Perhaps we need an IsLeftOfLink that knows how to automatically obtain the
> "*-centroid-*" value on two atoms, and then return true/false depending on
> the result (or throw an exception if there is no *-centroid-* value.)
>

> Sorry, I didn't quite get this. What is a centroid, and how is it
> connected to IsLeftOf?

https://en.wikipedia.org/wiki/Centroid

It avoids some of the complexity of bounding boxes (which might be
touching, overlapping, or inside of one another).

Following bottom-up design principles, I would rather have a simple,
well-thought-out, fast, clear, working proof-of-concept before adding many
dozens of complex spatial and temporal relationships.

The *-someword-* form is just a common way of naming quasi-global
variables in scheme. It's an "eyecatcher" -- ASCII-art visual bling.  So I
imagine that there could be a (PredicateNode "*-centroid-*") that acts as
a key, and the value for it would be x,y,z floating-point values.
Meanwhile, (PredicateNode "*-bounding-box-*") would be associated with 6
floats -- two opposed corners of a cuboid -- while a (PredicateNode
"*-ellipsoid-*") might return 15 floating-point numbers specifying an
ellipsoid.
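
In code, setting and reading these might look like the following (the
numbers are made up; FloatValue is the stock value type for lists of
floats):

   ; Attach a centroid -- x, y, z -- to an object:
   (cog-set-value! (Concept "dress") (Predicate "*-centroid-*")
      (FloatValue 0.5 1.2 2.0))

   ; Attach a bounding box as two opposed corners of a cuboid:
   (cog-set-value! (Concept "dress") (Predicate "*-bounding-box-*")
      (FloatValue 0.4 1.0 1.9  0.6 1.4 2.1))

   ; An IsLeftOfLink could then just compare the x components:
   (cog-value->list (cog-value (Concept "dress") (Predicate "*-centroid-*")))
   ; -> (0.5 1.2 2.0)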

Linas.

-- 
cassette tapes - analog TV - film cameras - you
