Kingma, D.P. wrote:
On Mon, Mar 3, 2008 at 6:39 PM, Richard Loosemore <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:

    The problems with bolting together NN and GA are so numerous it is hard
    to know where to begin.  For one thing, you cannot represent structured
    information with NNs unless you go to some trouble to add extra
    architecture.  Most NNs can only cope with single concepts learned in
    isolation, so if you show a visual field containing 5,000 copies of the
    letter 'A', all that happens is that the 'A' neuron fires.

    If you do find some way to get around this problem, your solution will
    end up being the tail that wags the dog:  the NN itself will fade into
    relative insignificance compared to your solution.


Well, you could achieve that (5,000 registrations of the letter 'A', each with its corresponding position in the image) by running a sliding window over multiple rescaled (and perhaps otherwise transformed) copies of the input image. This way, you get an image patch for each window position and scale (and any other transformations), and each patch can be given a corresponding position in a multidimensional space (e.g., an image patch with position X and Y and scale S is a point in 3-dimensional space). For each of the resulting points (patches) in the space, run the neural net to produce a lower-dimensional code and a corresponding energy (= reconstruction quality). Now filter this space by letting the points fight local battles for salience using some heuristic (e.g., lower energy means higher salience) and discarding the low-salience points. This produces a filtered space with fewer points than the previous one, each point carrying a lower-dimensional code.
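The pipeline above can be sketched in a few lines of numpy. This is only a toy illustration of the idea, not a working recognizer: `encode`/`decode` stand in for an assumed pre-trained autoencoder (Hinton-style), downscaling is crude nearest-neighbour subsampling, and the "local battle" is reduced to a global keep-the-lowest-energy-fraction filter.

```python
import numpy as np

def extract_patches(image, patch_size, stride, scales):
    """Slide a window over rescaled copies of the image.

    Returns (x, y, scale, patch) tuples: each patch is a point in the
    3-dimensional (x, y, scale) space described above.
    """
    points = []
    for s in scales:
        # Crude nearest-neighbour downscaling (a stand-in for real resampling).
        scaled = image[::s, ::s]
        h, w = scaled.shape
        for y in range(0, h - patch_size + 1, stride):
            for x in range(0, w - patch_size + 1, stride):
                patch = scaled[y:y + patch_size, x:x + patch_size]
                # Store the position in original-image coordinates.
                points.append((x * s, y * s, s, patch))
    return points

def energy(patch, encode, decode):
    """Reconstruction error of a patch under the (assumed) autoencoder:
    lower energy = the net 'knows' this patch better."""
    flat = patch.ravel()
    return float(np.sum((flat - decode(encode(flat))) ** 2))

def filter_by_salience(points, encode, decode, keep_fraction=0.1):
    """Battle for salience: lower energy (better reconstruction) wins.

    Survivors keep their position and are reduced to their code.
    """
    scored = sorted(points, key=lambda p: energy(p[3], encode, decode))
    keep = max(1, int(len(scored) * keep_fraction))
    return [(x, y, s, encode(patch.ravel())) for x, y, s, patch in scored[:keep]]
```

A usage sketch with a random linear "autoencoder" (again, purely hypothetical weights):

```python
rng = np.random.default_rng(0)
img = rng.random((32, 32))
W = rng.standard_normal((4, 64)) * 0.1   # 64-dim patch -> 4-dim code
enc = lambda v: W @ v
dec = lambda c: W.T @ c
pts = extract_patches(img, patch_size=8, stride=4, scales=[1, 2])
salient = filter_by_salience(pts, enc, dec)  # filtered space of coded points
```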

In the example of the letter 'A', the above method would recognize all 5,000 instances while remembering their individual positions in the input. This presumes the neural net is properly trained on the letter 'A' and can properly reconstruct it (using Hinton's method). The result would be 5,000 registrations of the letter 'A', with unimportant information filtered out.

But you could take it a step further. For each input image, the above method creates a filtered, 3-dimensional space of points containing low-dimensional codes. This space can in turn be harvested by taking patches, each patch containing *n* points and each point carrying an *m*-dimensional code, so each patch is an (*m* x *n*)-dimensional vector. A neural net can be trained to reduce the dimensionality of these patches from *m* x *n* to something lower. This process is quite similar to the one in the previous paragraph.
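This second, stacked stage can be sketched the same way. Everything here is again an assumption for illustration: `encode2` stands in for a hypothetical second-stage encoder mapping m*n dimensions down to something smaller, and grouping is done by a naive spatial sort rather than a proper neighbourhood structure.

```python
import numpy as np

def stack_codes(points, group_size, encode2):
    """Second stage: group n neighbouring points (each with an m-dim code)
    into (m*n)-dimensional vectors and compress them again.

    `points` are (x, y, scale, code) tuples from the first stage;
    `encode2` is the assumed second-stage encoder.
    """
    # Naive spatial grouping: sort by position, take consecutive runs.
    pts = sorted(points, key=lambda p: (p[0], p[1]))
    higher = []
    for i in range(0, len(pts) - group_size + 1, group_size):
        group = pts[i:i + group_size]
        concat = np.concatenate([code for _, _, _, code in group])  # (m*n,)
        # Keep a position for the new, higher-level point: the group centroid.
        cx = sum(p[0] for p in group) / group_size
        cy = sum(p[1] for p in group) / group_size
        higher.append((cx, cy, encode2(concat)))
    return higher
```

The output has the same shape as the input (positioned points carrying codes), which is what lets the process repeat, layer upon layer.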

What could *possibly* go wrong? :)

Regards,
Durk Kingma

Excellent!  Sounds like a perfect solution ;-).

Oh, wait!

What about......... if the scene is structured in such a way that the 5,000 copies of the letter 'A' were actually scattered around in such a way that most (but not all) of them were arranged to form a huge letter 'A'?

Would it then count 5,001 copies?

Oh, and one more thing I forgot to mention that is in the same scene (how could I forget this one?): there are also a couple of women standing side by side, leaning against each other with their shoulders touching and keeping their bodies stiff and straight, forming the two sides of a letter 'A', and holding a model of a horizontally reclining woman between them at waist height, to form the crossbar of a letter 'A'.

Could we get the NN to recognize, in the context of the overall scene, that here were actually 5,002 copies of the letter 'A'......?

And if the scene had one single, rather small letter B over in the corner, would the NN find this funny?

You have 30 minutes to devise an algorithm, Durk... :-).



Richard Loosemore


-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
http://www.listbox.com/member/?member_id=8660244&id_secret=95818715-a78a9b
Powered by Listbox: http://www.listbox.com
