Yeah, you could think of each of those tasks as being a word, like color, shape, move, scale, duplicate, and the more often you see one, the more often you predict it. So if you saw "move left" many times and then see a new cube, your new cube will predict "move left" next: either the cube decodes/transforms itself, or the next item predicted is "move left". Naturally, a video playing can show all the tasks above. The difference in the paper is that the change is *sudden*: it doesn't show you a cube falling down or being rotated, just the final frame! And that's just activating the videos in your brain. So upon seeing the new cube, you imagine it rotating, and you only draw the final frame, i.e. once it has transformed fully, or once the Byte Pair Encoding merge ends (you predict the next word or phrase, but only write out the final word of it).
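As a toy illustration of that "seen it enough, so predict it" idea, here's a minimal sketch (my own example, not from the paper): a frequency-based next-task predictor where each task is treated as a word, and the most frequently observed continuation after a given context wins. The context string "new cube" and the class name are just assumptions for illustration.

```python
from collections import Counter, defaultdict

class NextTaskPredictor:
    """Toy predictor: counts how often each task follows a context."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def observe(self, context, next_task):
        # Record that `next_task` followed `context` once more.
        self.counts[context][next_task] += 1

    def predict(self, context):
        # Return the most frequently seen continuation, or None if unseen.
        if not self.counts[context]:
            return None
        return self.counts[context].most_common(1)[0][0]

p = NextTaskPredictor()
for _ in range(5):
    p.observe("new cube", "move left")   # seen "move left" many times
p.observe("new cube", "rotate")          # seen "rotate" once
print(p.predict("new cube"))             # prints "move left"
```

The point is only the frequency effect: whichever continuation was seen most after a context is what gets predicted next, just like the cube that "saw" many "move left" examples.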
This massively clarifies it all to me now, if I'm correct.
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T604ad04bc1ba220c-Md715ccc4bb4a435e3ae5c725
Delivery options: https://agi.topicbox.com/groups/agi/subscription