On Feb 17, 2008 11:56 PM, Bob Mottram <[EMAIL PROTECTED]> wrote:
> On 17/02/2008, Kingma, D.P. <[EMAIL PROTECTED]> wrote:
> > I'm wondering how
> > much further one could extend such architecture to coding of
> > spatiotemporal (video) patterns, multimodal patterns (video + audio)
> > and eventually coding of 3D objects. They are all 'just' extensions of
> > such a model, 'just' about finding efficient ways of learning the
> > joint probability distributions :) however I imagine that finding
> > efficient ways of training such models (e.g. finding compact
> > representations) should become increasingly hard.
>
>
> This is true.  In principle reconstructing a 3D model based upon
> observations from one or two images over time is just the reverse of
> the ray tracing problem.  By finding correspondences, either in
> structure from motion or by stereo correspondence (basically these two
> things are the same problem), you can then try to probabilistically
> model the ray of light which traveled to the image pixel from the
> object.  There's no doubt that this is a hard problem, but I think
> it's one which is solvable.
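
To make the depth-from-correspondence step concrete: for a rectified
stereo pair, a matched pixel pair directly determines depth along the
back-projected ray. A minimal sketch in Python (the focal length,
baseline and principal point below are made-up illustrative values,
not anyone's calibration):

    # Depth from a single correspondence in a rectified stereo pair.
    f = 700.0   # focal length [pixels] (assumed)
    B = 0.12    # baseline between the cameras [metres] (assumed)

    def triangulate(xl, xr, y, cx=320.0, cy=240.0):
        """Back-project the matched pair (xl, y) / (xr, y) to a 3D point.
        For rectified cameras: disparity d = xl - xr, depth Z = f*B/d."""
        d = xl - xr
        if d <= 0:
            raise ValueError("non-positive disparity: at or beyond infinity")
        Z = f * B / d
        X = (xl - cx) * Z / f   # scale the pixel ray by the depth
        Y = (y - cy) * Z / f
        return (X, Y, Z)

Note that depth error grows quadratically with distance: since
Z = f*B/d, a fixed matching error in d gives dZ ~ Z^2/(f*B), which is
part of what makes the problem hard in practice.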

>
> The next logical step in that fellow's research is, as you say, to
> extend the approach to matching features over time in video sequences.
>  This involves not only detecting the features themselves but also
> making a forward prediction about where the feature will be next and
> iteratively modeling the position uncertainty and local surface
> orientation.  Andrew Davison's group is doing just this kind of thing,
> applying information theory to vision with some success.
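
For the forward-prediction step, a constant-velocity model is the
usual starting point (Davison's active search is in this spirit,
though the sketch below is generic, not his method): predict where the
feature will land, and let the propagated covariance define the
elliptical search region in the next frame.

    import numpy as np

    # State of one tracked feature: [x, y, vx, vy] in image coordinates.
    dt = 1.0 / 30.0                        # frame interval [s], assumed 30 fps
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]], dtype=float)
    Q = np.diag([1e-4, 1e-4, 1e-2, 1e-2])  # process noise (made-up values)

    def predict(state, P):
        """One Kalman prediction step: propagate mean and covariance.
        P_pred[:2, :2] is the predicted position covariance, i.e. the
        uncertainty ellipse in which to search for the feature."""
        state_pred = F @ state
        P_pred = F @ P @ F.T + Q
        return state_pred, P_pred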

Yes, stereo correspondence and structure from motion are very similar,
although in the second case (as in SLAM) there's the extra task of
determining the relative camera position. Anyway, I think SLAM is a
very useful technique for determining camera location and motion, and
I imagine it being one of the first steps when analyzing video (as you
know better than I), for example to establish the epipolar geometry
needed for disparity analysis. However, as you will agree, the
features used (SIFT-like, I believe) are very sparse and not directly
useful for actual scene reconstruction. Stereo correspondence becomes
more robust when abstract and invariant features are used (like taking
the max over Gabor-like responses, as you describe on your web page),
so I imagine that some features higher up in Michael Lewicki's
architecture would be quite useful for disparity matching. I have no
idea which abstraction levels our brain uses to guide stereopsis,
though. It could be that the distance of far-away objects is estimated
from the disparity of whole objects rather than of their individual
pixels.
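
To sketch what I mean by matching on pooled Gabor-like responses (a
toy version only; the filter parameters, window sizes and SSD cost are
my own illustrative assumptions, not Lewicki's architecture):

    import numpy as np
    from scipy.signal import convolve2d

    def gabor_kernel(theta, freq=0.2, sigma=3.0, size=15):
        """Real part of a Gabor filter at orientation theta (radians)."""
        r = np.arange(size) - size // 2
        x, y = np.meshgrid(r, r)
        xr = x * np.cos(theta) + y * np.sin(theta)
        return (np.exp(-(x**2 + y**2) / (2 * sigma**2))
                * np.cos(2 * np.pi * freq * xr))

    def pooled_response(img, n_orient=4):
        """Max over Gabor orientations at each pixel -- a crude
        orientation-invariant feature map ('max' pooling)."""
        thetas = np.linspace(0, np.pi, n_orient, endpoint=False)
        stack = [np.abs(convolve2d(img, gabor_kernel(t), mode='same'))
                 for t in thetas]
        return np.max(stack, axis=0)

    def match_disparity(fl, fr, x, y, max_d=32, win=5):
        """SSD-match a pooled-feature patch at (x, y) in the left map
        against candidates along the same row of the right map (the
        pair is assumed rectified, (x, y) far enough from the border)."""
        patch = fl[y-win:y+win+1, x-win:x+win+1]
        costs = [np.sum((patch - fr[y-win:y+win+1, x-d-win:x-d+win+1])**2)
                 for d in range(max_d)]
        return int(np.argmin(costs))

Matching in the pooled feature space rather than on raw intensities is
where the invariance buys robustness; the price is that the pooled
maps are smoother, so localization (and hence disparity resolution)
gets coarser.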
