Based on how they describe it, they're using patching and representing videos
as grids of diffused images. There's definitely some clever attention work
going on too, since objects stay pretty stable through occlusion and even
across camera cuts. The model is speculated to be not that large either
(rumor is 3-5 billion parameters, which would definitely be runnable on a
consumer GPU). The real magic seems to be in the synthetic training data and
their captioning recipe. IMO, many other labs are well behind them when it
comes to labeling quality. (If anyone remembers, the big reason GPT-3 and
ChatGPT (GPT-3.5) were so good at the time was the sheer volume of
high-quality labeled data they got from Scale AI. It's a solid formula that
hasn't really been tapped outside their org.)
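
To make the patching idea concrete, here is a rough sketch of how a video
tensor can be cut into flattened spacetime patches (tokens). This is only my
illustration, not OpenAI's code: the function name patchify, the patch sizes,
and the use of raw pixels are all assumptions made for the example.

```python
import numpy as np

def patchify(video, pt, ph, pw):
    """Split a (T, H, W, C) video into flattened spacetime patches.

    pt, ph, pw are hypothetical patch sizes along time, height, and width.
    Returns an array of shape (num_patches, pt * ph * pw * C).
    """
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0, "dims must divide evenly"
    # Expose the patch grid and the within-patch axes as separate dimensions.
    x = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Put the grid axes first, then the within-patch axes.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    # One row per patch, i.e. one token per spacetime block.
    return x.reshape(-1, pt * ph * pw * C)

# Toy input: 8 frames of 32x32 RGB noise (made-up sizes).
video = np.random.rand(8, 32, 32, 3)
tokens = patchify(video, pt=2, ph=8, pw=8)
print(tokens.shape)  # (64, 384)
```

Each row is one token that a transformer could attend over, so attention
across rows spans both space and time.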

On Fri, Feb 16, 2024, 8:30 AM stefan.reich.maker.of.eye via AGI <
agi@agi.topicbox.com> wrote:

> I'm officially speechless. Check out the waves example at
> https://openai.com/sora. This is photorealism. Are they now simulating
> physics too? I wouldn't know how you could distinguish this from a real
> video. How did they advance the field by this much with a single release?
> OpenAI has shown their superior inspiration once again.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T4ad5d8c386d0e116-Mfeb74075e425525d673e35a7
Delivery options: https://agi.topicbox.com/groups/agi/subscription