> For you or anyone wishing to put together the scripts for the current
> set.mm using your formula(s), I uploaded 3 files from the 2018 run for
> reference:
> ...
> The emails describe how we did this as well as some problems we ran into.
> Some jobs took much longer than others, and we ended up with 72 logs.
With that information it is possible to compare the schedule you used in 2018 with the schedules built by other approaches. Your schedule apparently had the form:

[41040, 57642, 44711, 45151, 50352, 45021, 49597, 46507, 42826, 43233, 44058, 43347, 49606, 48702, 45067, 39088, 45075, 44259, 44467, 50376, 48313, 45099, 47391, 225349]

with a maximum job time of 225349. Meanwhile, if one were to randomly shuffle the theorems and distribute them evenly among 24 CPUs, the maximum job time would have been ~80521, i.e. ~36% of the 2018 run. The maximum job time resulting from the LPT algorithm with the 'size' heuristic is ~67790, which is ~30% of the 2018 run. If we could predict minimization run times with absolute precision, the LPT algorithm would have managed a makespan of ~52994, which is ~24% of the 2018 run; that seems to be out of reach for now :-)

> If I understand correctly, we should first add theorems with largest
> expected run time, right? Otherwise if there are 999 fast tiny ones added
> first, they would become about evenly distributed, and the 1000th giant one
> added at the end would skew one job significantly.

The Longest Processing Time (LPT) scheduling algorithm works like this: you sort the jobs (theorems) in non-increasing order of the time they take (sizes in our case), and then assign each job, one by one, to the CPU with the smallest load (a CPU's load being the sum of the run times of the jobs assigned to it so far). It is essentially the same as if each CPU pulled a new job out of the sorted queue as soon as it completed the previous one. The best way would of course be to implement the algorithm "online", so that different copies of the metamath program pull theorems to minimize from a common sorted queue, but I don't know how to do that without some sort of threads library.

> It will be interesting to see how this works out for the current set.mm,
> which has had massive modifications in the past 2 years.
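For concreteness, here is a minimal sketch of the LPT heuristic described above, written in Python. The job sizes and CPU count are made-up illustration values, not data from the 2018 run; a real run would feed in the per-theorem 'size' estimates instead.

```python
import heapq

def lpt_schedule(sizes, num_cpus):
    """Assign jobs to CPUs with the LPT heuristic; return per-CPU loads."""
    # Sort jobs in non-increasing order of size.
    jobs = sorted(sizes, reverse=True)
    # Min-heap of (current load, cpu index): pop always yields the
    # least-loaded CPU, which is where the next job goes.
    heap = [(0, cpu) for cpu in range(num_cpus)]
    heapq.heapify(heap)
    loads = [0] * num_cpus
    for job in jobs:
        load, cpu = heapq.heappop(heap)
        loads[cpu] = load + job
        heapq.heappush(heap, (load + job, cpu))
    return loads

# Toy example: 8 jobs on 3 CPUs.
loads = lpt_schedule([7, 7, 6, 6, 5, 4, 4, 3], 3)
print(loads, max(loads))  # makespan is max(loads) → 15
```

The makespan is the largest per-CPU load; the ~67790 figure above is exactly this quantity when the heap is fed the 'size' estimates for all theorems with num_cpus = 24.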
I made a bash script to generate job1xx.cmd files automatically:

https://gist.github.com/savask/e44ee6086a1321b2bbddbce027975bae

To run it, modify the metamath_exe variable inside the script to point at your local metamath installation and run it as "./preparejobs.sh N filename", where N is the number of CPUs and filename is the database name. For example, "./preparejobs.sh 24 set.mm". It will generate N files job1xx.cmd in the same directory, in the same format as in the min2018-jobs.zip archive. My bash skills are far from perfect, so feel free to report any bugs in the script. I'm very intrigued to see whether it improves the full minimization run, so I hope it works well :-)

--
You received this message because you are subscribed to the Google Groups "Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/metamath/6742bd11-6cfb-4245-84fd-508a5ac3b5f9%40googlegroups.com.
