Also, when multiple pipelines are running at about the same time, there are dependencies [with job scheduling] that result in them all taking longer. [Improving this requires redundancy in every type of resource - linux, osx, bsd - and scheduling features to prioritize stage-1 over stage-2 over stage-3 jobs, etc. - i.e. scheduling so that stage-3 jobs don't block stage-2 jobs - but then the preempted pipelines get delayed.]
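One partial workaround for the cross-stage blocking described above is to dedicate tagged runners to the fast early stages. A minimal `.gitlab-ci.yml` sketch, assuming hypothetical runner tags `stage1-pool` and `shared-pool` (GitLab's scheduler has no built-in cross-pipeline job priority, so tag-based partitioning is a stand-in):

```yaml
# Hypothetical sketch: reserve runners for stage-1 jobs via tags, so
# long stage-2/3 jobs from other pipelines cannot starve early feedback.
stages: [stage1, stage2, stage3]

quick-checks:
  stage: stage1
  tags: [stage1-pool]    # runners reserved for fast, early-failure jobs
  script: ./stage1-tests.sh

full-builds:
  stage: stage2
  tags: [shared-pool]    # longer jobs compete only with each other
  script: ./stage2-tests.sh
```

The script names and tags are illustrative only; the trade-off is that reserved stage-1 runners sit idle when no new pipelines arrive.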
Satish

On Sun, 11 Oct 2020, Satish Balay via petsc-dev wrote:

> A few other related things to do:
>
> - add more stage-1 jobs (and resources) that can catch failures early -
>   but don't increase stage-1 time.
>
> - improve stage-2 job times - this requires gitlab scheduler
>   features [higher-priority jobs over others]
>
>   and reduce the work currently done by some of the longer (stage-2)
>   jobs [this cost depends on whether pkgs need rebuilding or not,
>   etc.]
>
> Satish
>
> On Sun, 11 Oct 2020, Satish Balay via petsc-dev wrote:
>
> > Well, I don't think the download time is significant [for all the
> > builds at ANL] - as compared to the build times.
> >
> > For example: most of the time, the petsc-pkg-hash gets reused [and
> > this saves on both downloads and builds] - such builds take about 2h.
> > But when packages have to be rebuilt, it can take 2:45 to 3h [so the
> > download part must be pretty small].
> >
> > But yeah - it's wasted bandwidth, and not tolerant to network
> > disruptions.
> >
> > And the other issue: it might help with CI in low-bandwidth locations
> > [say, running a CI instance at my house on a spare laptop].
> >
> > But yes - this requires infrastructure. The way I look at it, we need
> > "local mirror" or "cache" infrastructure.
> >
> > i.e. keep the cache part separate from the build part [and not
> > intertwine them].
> >
> > Spack does stuff in this direction [and also has a remote cache - any
> > one of the 100 remote sites from where the packages can be downloaded
> > can be down - but it's not tolerant to certain changes, so I have to
> > periodically clean it to have confidence in my build].
> >
> > Note: if there is a git repo locally cached (and mirrored), we don't
> > have to deal with shallow clones.
> >
> > It might have a bigger impact if we can improve the petsc-pkg-hash
> > infrastructure to avoid rebuilds in more cases.
> > [i.e. make it more tolerant to configure changes - but it's not clear
> > to me which changes won't require rebuilds]
> >
> > Satish
> >
> > On Sun, 11 Oct 2020, Barry Smith wrote:
> >
> > >   Satish,
> > >
> > >   Do you think the time to download all the external packages for
> > > each job is significant?
> > >
> > >   Would using super-shallow clones of the external packages help
> > > much in time? Maybe we should do them anyway to stop wasting
> > > bandwidth? Currently we do full clones, but we don't need the huge
> > > histories.
> > >
> > >   A much more elaborate way to save more time:
> > >
> > >   On each test machine, keep repositories of all the external
> > > packages.
> > >
> > >   For each job,
> > >
> > >     do a pull in all the repositories that the job depends on
> > > (usually this will fetch nothing, so it takes no time)
> > >
> > >     For each package, either
> > >
> > >       - build in a unique build directory of the repository
> > > directly (for CMake and packages that support out-of-source builds)
> > >
> > >       - or, for the rest, make a local shallow clone of the local
> > > copy of the repository into externalpackages and do those builds
> > > there
> > >
> > >   The average cost of this will be just some shallow local clones
> > > instead of copying over from remote machines.
> > >   The PETSc test directories can still be completely cleaned out
> > > for each job, so Satish need not worry about testing with dirty
> > > directories.
> > >
> > >   This requires a bit of infrastructure; if it saves a minute it is
> > > not worth it, but if it cuts the pipeline time from 180 minutes to
> > > 150, maybe? Probably not worth it. Could also be done just for a
> > > couple of the most external-package-intensive jobs.
> > >
> > >   Barry
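The per-machine-cache plus shallow-local-clone scheme discussed in the thread can be sketched in a few git commands. This is a minimal, self-contained illustration: the paths and package names (`/tmp/pkg-cache-demo`, `pkg-a`, `pkg-b`) are hypothetical stand-ins, not PETSc's actual layout.

```shell
#!/bin/sh
set -e
PKGCACHE=/tmp/pkg-cache-demo    # persistent per-machine repositories
BUILDROOT=/tmp/job-demo         # unique, throwaway build area per job
rm -rf "$PKGCACHE" "$BUILDROOT"

# Stand-in for the per-machine cache; a real setup would "git clone"
# each external package's upstream here once, then "git fetch" per job
# (usually a no-op, so it costs almost nothing).
for pkg in pkg-a pkg-b; do
  git init -q "$PKGCACHE/$pkg"
  git -C "$PKGCACHE/$pkg" -c user.name=ci -c user.email=ci@example.com \
      commit -q --allow-empty -m "initial"
done

# Per job: a depth-1 clone from the local cache into the job's
# externalpackages directory - no network traffic, no full history
# copied into the build area.
for pkg in pkg-a pkg-b; do
  git clone -q --depth 1 "file://$PKGCACHE/$pkg" \
      "$BUILDROOT/externalpackages/$pkg"
done

ls "$BUILDROOT/externalpackages"
# ...build each package in its job-local copy, then: rm -rf "$BUILDROOT"
```

Since `$BUILDROOT` is recreated and removed per job, the test directories stay clean exactly as the thread requires, while the only persistent state is the read-only cache.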