[agi] It's Visual Scene Recognition, Stupid

Mike Tintner Fri, 05 Apr 2013 04:07:26 -0700

Here's what I was reaching for in my last post to Matt - it represents a profound change in vision philosophy (and helps explain why current AI/scientific approaches to vision, and to everything else, are so mind-blowingly fragmented and simplistic).

Vision is always "visual scene recognition" and never just "visual object recognition".


To realise this, notice how you look at these four pictures:

http://media.oregonlive.com/portland_impact/photo/barber-shootingjpg-fca82e981d021b3b.jpg

http://www.forgotmylines.com/wp-content/uploads/2010/07/ReasonstobePrettyFIghtScene.jpg

http://2.bp.blogspot.com/_ASZ1J20yTgQ/TLuZJT-TgXI/AAAAAAAAAA0/ps7cifXlBWo/s1600/DSC01098.JPG

http://simonhalliday.files.wordpress.com/2009/03/library8003.jpg

What you see - what "visual scene recognition" means first of all - is **objects-in-a-field**.

You see a policeman putting up a tape on a street, kids fighting on a playground (or similar), a guy playing tennis on a tennis court, books and candles on a table in some kind of library.

You don't just see isolated objects - you automatically see objects-doing-things-in-a-field. You aren't just vaguely aware of the "gestalt" or the "context", which is the vague way some vision philosophy has thought about these things. You immediately look to see what the principal objects are doing in that whole field. Your life in the real world depends on quickly working that out - otherwise that car/animal/falling masonry may hit you. Every scene is potentially dangerous.

But current AI and scientific thinking never - characteristically - "sees the big picture". AI and science metacognitively only see isolated objects - they see the world like this:


http://www.freevector.com/site_media/preview_images/FreeVector-People-Vector-Art.jpg

They don't see or look for the scene - the field surrounding those objects, and what those objects are doing in (and how they're relating to) that field.

And this applies to everything AI touches. Its vision is always fragmented. It sees only fragmented words and sentences:


http://img2.etsystatic.com/000/0/5464342/il_fullxfull.130745770.jpg

It never sees the text as a whole. And human readers always read sentences-in-a-text (and talk sentences-in-a-conversation), never just sentences.


But even "visual scene recognition" is too simplistic!

Actually, it's always "visual movie theatre recognition".

Human vision is never vision of just a scene out there. It's simultaneously:

"vision of an observer (oneself) viewing the scene" [or the objects in a field]: the acting observer seeing, as well as the scene seen.

We always see ourselves watching the scene - we are always aware of the point of view on the scene, and of its distance from us (as I have detailed before on this forum). In real life, of course, we are normally acting in that field and moving through it - and the position of objects in relation to us is vital information.

And then "visual movie theatre recognition" entails not just visual-scene, but visual-scene-in-a-MOVIE recognition!! (in a whole story!)

We don't just see a scene in an isolated moment in time, but are aware of it as a scene-in-a-stream-of-scenes - a movie.

We don't just see objects in a field in a timeless moment - we place them as moving in time as well as space. When we look at those kids fighting, we don't just see them occupying isolated postures - we see them as moving in time, having started a fight beforehand and still to finish it. When we look at the policeman we are aware that he has come up to that tape in the past, and will move away from it in the future.

And similarly, we don't just see objects-in-a-field, but objects-in-a-field-in-a-WORLD: objects not just in a single "theatre of operations" but in a whole world-of-operations. We are aware of the fields that lie beyond the immediate field - what lies beyond the movie theatre - and this may be crucial to understanding what is happening. We couldn't understand this picture if our vision were confined to the immediate wing/field shown, and not the fields beyond the picture:


http://www.google.co.uk/imgres?um=1&hl=en&biw=1645&bih=767&tbm=isch&tbnid=lBHESwvnTIYWOM:&imgrefurl=http://www.advrider.com/forums/showthread.php%3Ft%3D499369%26page%3D26&docid=Uoxo7hdI1Bj4MM&imgurl=http://rookery2.viary.com/storagev12/1189500/1189682_2b3b_625x625.jpg&w=399&h=300&ei=m6deUbWqHeOm0QWjtYDIBw&zoom=1&ved=1t:3588,r:22,s:0,i:153&iact=rc&dur=2325&page=1&tbnh=184&tbnw=251&start=0&ndsp=28&tx=130&ty=89

And actually - well, you knew this was coming, right? - we don't just see "objects" - lifeless things. We see BODIES. We always see bodies. We're always aware of how those bodies move - whether they're alive or dead - and how they're moving.

And "visual body recognition" is always EMBODIED. You can't understand how other bodies - especially other human bodies - move if you haven't got a body yourself, and can't use that body to simulate how other bodies do and will move.


So let's see [!] what we have here.

"Visual recognition" - human vision - is a helluva lot more than "visual object recognition". "Visual scene recognition" is a bare minimum; "visual movie theatre recognition" is more like it.

And that means embodied-vision-of-a-viewer-viewing-objects/bodies-moving-at-points-in-a-stream-of-movement-in-a-field-within-a-world-of-fields.

And that's the easy-peasy part. Then we get on to understanding language - and general conceptualisation - which deals with whole classes of object, in whole classes of fields etc., in whole classes of world.

Vision then entails not just the small, scientific, fragmented, narrow, narrow-minded, totally blinkered picture of objects, but the big, artistic, integrated, broad-minded, panoramic picture of what lies beyond that object in space and time - in a field in a world of fields, in a scene in a stream of scenes.


How many quadrillions you got, Matt?
