If you would like to host a 100 GB text benchmark, feel free. You'll soon find out that smaller is a lot more practical. I use 1 GB because that is enough for human-level AI: that is about how much language you hear, read, write, and speak in a lifetime. I realize that current AI like Google, Alexa, and GPT-2 are far beyond human level.
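As a rough back-of-envelope on that lifetime figure (every rate below is an assumed round number for illustration, not a measurement), the total lands within a small factor of 1 GB:

# Back-of-envelope for lifetime language exposure, in bytes.
# Every rate here is an assumed round number used only for illustration.
words_per_minute = 150   # assumed speaking/reading rate
hours_per_day = 2        # assumed hours of language heard, read, spoken, written per day
years = 70               # assumed years of language use
bytes_per_word = 6       # roughly five letters plus a space

total_bytes = words_per_minute * 60 * hours_per_day * 365 * years * bytes_per_word
print(total_bytes)       # about 2.8e9 bytes, the same order of magnitude as enwik9's 1 GB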
On Fri, May 28, 2021, 6:05 PM <immortal.discover...@gmail.com> wrote:
> So the Hutter Prize contest rules state only CPU usage, not GPU. I assume you can use CPU cores for parallelization.
>
> Matt's contest (LTCB) allows GPU usage and puts no limit on core count, memory, or time: "Timing information, when available, may vary widely depending on the test machine used."
>
> Both are reasonable, but I think I found a small issue. An algorithm can always train on more data or use more cores; the contests restrict the data to enwik9, fixed at 1 GB, because we don't need to benchmark AI on more data to see whose is better, nor on more cores. That isn't to say one core is all we need and we can simply multiply the usage over 100,000 cores later; we should make sure the algorithm CAN be parallelized, so the rules allow a few cores. The Hutter Prize does this: you can use, say, 4 CPU cores and only enwik9, everything limited in amount. Matt's contest goes further (which is really a good thing, except that it doesn't go all the way, keep reading): there is no core limit. That is not good; it is like using more data. My AI can get a better ratio if it trains on 100 GB of text, since it does better on bigger data, and the same goes for using more cores. Neither really tells us whose AI is better, only who has more cash at home to buy more cores or to train on more data for more time. Matt's allows unlimited cores but a limited data size, so why can't I show my ratio from training on 100 GB?
>
> Now, I DO agree the Hutter Prize should limit cores and data amounts to see whose AI is better, and I also agree we should have an unlimited contest like Matt's that shows how good a predictor can be. But Matt's needs to start allowing unlimited dataset sizes; currently it only allows unlimited core usage. If you have more cash you can use more cores and get a better predictor from having more compute. That is not a level public playing field: only the rich can get the best score, so the contest is no longer really public; it becomes a test of what is possible on Earth. Hence Matt's contest should also allow 100 GB+ of data, so we can show how good a predictor can be. Why unlimited cores but not unlimited data?
>
> We /can/ compare ratios. Notice how Fabrice Bellard scores about 15 MB on 100 MB and about 110 MB on 1 GB. The way I read it, that means that averaged over all the prompts it saw, it predicted the actual answer blind with about 89% accuracy per prompt, and it would be more accurate still on 1 TB of text.
>
> So I'm going to add to my Guide that we should use one contest for finding better AI (the Hutter Prize) and another contest for finding the best implementation of AI (Matt's only half matches this criterion). Simply start adding 10 GB+ benchmarks, Matt; it is easy to take the top algorithms and get some stats right away. Beowulf clusters and supercomputers should also be allowed, with intense parallelization; Korrelan seems to do this, and so do supercomputers and, I think, OpenAI.
>
> My job is obviously the Hutter Prize contest. I could, sooner rather than later, try large GPU usage, but that really isn't my game; it is a rich man's job. I can get richer by doing the Hutter Prize contest (showing my AI is smarter, not that I am rich or have more cores). In that case my AI may appear worse than Bellard's score for now.
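As a minimal sketch of the arithmetic behind the ratios quoted above: the sizes are the ones cited in the message, reading accuracy as one minus the compression ratio is the poster's own interpretation, and bits per input byte is the more conventional way to compare scores across dataset sizes.

# Arithmetic behind the quoted figures. Sizes are the ones cited above;
# "1 - ratio" is the poster's accuracy reading, not a standard metric.
def summarize(compressed_bytes, original_bytes):
    ratio = compressed_bytes / original_bytes
    bits_per_byte = 8 * ratio        # conventional cross-size comparison
    return ratio, bits_per_byte, 1 - ratio

print(summarize(15e6, 100e6))  # 100 MB case: ratio 0.15, 1.2 bits/byte
print(summarize(110e6, 1e9))   # 1 GB case: ratio 0.11, 0.88 bits/byte, 1 - ratio = 0.89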
> Doing any elaborate core usage, or even using a GPU, would mostly be a waste of time for finding smarter AI, just as using more data is.
>
> Then again, Matt's contest is about scaling with more cores, a rich man's job, a test of what we can do on Earth, so maybe it can keep the limited 1 GB dataset size. If his test is really about who has more money, that is already clear at 1 GB, so why use 10 GB? The outcome would only change if you had more cash for more memory and compute; scoring well on 10 GB doesn't change the speed picture any more than scoring on 1 GB does (except as a contest of who has more time to train AI), and the same goes for memory. So it is a contest of who has more cash and who spent longer training their AI, but the latter requires an unlimited dataset size; otherwise Matt's contest is fine as it is. Still, since it is not a "whose AI is smarter" contest but a "who is richer" contest, why would it be any stranger to also see it as a "who spent longer training" contest (and hence use 1 TB of text)?
>
> And if you had a trillion cores and huge RAM or cache but used enwik3, i.e. 1 KB of data, you couldn't really use all your cores, assuming you can look far ahead into the future and are no longer using compression (evaluation) for training. So perhaps using more data, e.g. 1 TB of text, is another way of showing how rich you are.
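On the trillion-cores-with-1-KB point, here is a toy sketch under the simplifying assumption that parallelism comes from splitting the input into independently processed chunks (the chunk size is an arbitrary assumption, not any contest's rule); it only illustrates that the usable core count is capped by the amount of data.

# Toy model: the cores you can keep busy are bounded by the number of chunks.
# chunk_bytes is an arbitrary assumed work-unit size, not a real rule.
def usable_cores(data_bytes, cores, chunk_bytes=64):
    chunks = max(1, data_bytes // chunk_bytes)
    return min(cores, chunks)

print(usable_cores(1_000, 10**12))    # enwik3-scale input: only about 15 cores do useful work
print(usable_cores(10**12, 10**12))   # 1 TB of text: orders of magnitude more cores can help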