On 17/02/2008, Kingma, D.P. <[EMAIL PROTECTED]> wrote:
> I'm wondering how much further one could extend such architecture to
> coding of spatiotemporal (video) patterns, multimodal patterns (video +
> audio) and eventually coding of 3D objects. They are all 'just'
> extensions of such a model, 'just' about finding efficient ways of
> learning the joint probability distributions :) however I imagine that
> finding efficient ways of training such models (e.g. finding compact
> representations) should become increasingly hard.
This is true. In principle, reconstructing a 3D model from observations of one or two images over time is just the inverse of the ray-tracing problem. By finding correspondences, whether via structure from motion or stereo matching (the two are essentially the same problem), you can probabilistically model the ray of light that travelled from the object to the image pixel. There's no doubt this is a hard problem, but I think it's a solvable one.

The next logical step in that fellow's research, as you say, is to extend the approach to matching features over time in video sequences. This involves not only detecting the features themselves but also making a forward prediction of where each feature will appear next, iteratively modelling its position uncertainty and local surface orientation. Andrew Davison's group is doing just this kind of thing, applying information theory to vision with some success.

-------------------------------------------
agi Archives: http://www.listbox.com/member/archive/303/=now
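To make the "forward prediction plus position uncertainty" idea concrete, here is a minimal sketch of the standard predict/update cycle for tracking one image feature, using a constant-velocity Kalman filter. This is my own toy illustration of the general technique, not Davison's actual formulation; all names, noise values, and the 2D constant-velocity model are assumptions for the example.

```python
import numpy as np

# Toy single-feature tracker (illustrative only).
# State: [u, v, du, dv] -- pixel position and per-frame velocity.
dt = 1.0
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)   # constant-velocity motion model
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # we only observe position (u, v)
Q = np.eye(4) * 0.5                         # process noise (assumed value)
R = np.eye(2) * 2.0                         # detector noise (assumed value)

def predict(x, P):
    """Forward-predict where the feature will be; uncertainty grows."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Fuse a new detection z = [u, v]; uncertainty shrinks."""
    S = H @ P @ H.T + R                     # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

x = np.array([100.0, 50.0, 2.0, 0.0])       # feature at (100, 50), drifting right
P = np.eye(4)
x, P = predict(x, P)                        # predicted position is (102, 50)
x, P = update(x, P, np.array([102.5, 50.3]))
```

The predicted position (plus the grown covariance P) tells you where, and over how large a search region, to look for the feature in the next frame; restricting matching to that region is where the information-theoretic efficiency gains come from.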