On Wed, Apr 17, 2019 at 12:55 PM E. Madison Bray <[email protected]> wrote:
>
> On Wed, Apr 17, 2019 at 12:32 PM E. Madison Bray <[email protected]> wrote:
> >
> > On Tue, Apr 16, 2019 at 2:21 PM E. Madison Bray <[email protected]> wrote:
> > >
> > > On Tue, Apr 16, 2019 at 2:11 PM Dima Pasechnik <[email protected]> wrote:
> > > >
> > > >
> > > >
> > > > On Tue, 16 Apr 2019 14:04 E. Madison Bray, <[email protected]> wrote:
> > > >>
> > > >> Hi Daniel,
> > > >>
> > > >> GitLab work is still going and any help would be appreciated.  The
> > > > >> biggest hurdle at the moment remains infrastructure.  We lack both the
> > > > >> physical infrastructure needed to keep builds going and the human
> > > > >> infrastructure to regularly monitor the builds and address problems.
> > > >>
> > > >> In fact, since you brought it up, I just realized that the gitlab
> > > >> runner I'm administering has been broken for a couple weeks.
> > > >
> > > >
> > > >
> > > > It could be because it was using VMs from my Google grant, which
> > > > expired, so I had to shut them down.
> > >
> > > How much was the grant for?  I thought we set that up like 2 months
> > > ago at the most.  Disconcerting that it expired that quickly...
> > >
> > >
> > > I don't know what happened with my openstack-based runner, but I
> > > kicked it off again and it's doing a build-from-clean of 8.8.beta2
> > > now: https://gitlab.com/sagemath/sage/-/jobs/197508827
> > >
> > > I'll need to look into what I can do to improve monitoring so that I
> > > become aware of any problems sooner...
>
> > I don't know what's going on.  Meanwhile the VM that ran this build is
> > still hanging around, and my gitlab-runner master server isn't
> > spawning a new one...
>
> Oops this is just me forgetting / misunderstanding how gitlab-runner
> works again.  I'm using the docker+machine executor [1]
> which runs docker-machine from the gitlab-runner master host to spawn
> VMs to host the actual builds in Docker containers.  It doesn't spawn
> a VM per-build; just a container.  This allows you to spawn build VMs
> as-needed, and destroy them when they go idle.
>
> Currently I have just IdleCount=1 so it only spawns one VM at a time
> anyways, and IdleTime=600 (10 minutes) which is never reached since
> currently my build runner is like, the only one dedicated to Sage
> that's actually working so it's always busy.  And indeed it's busy
> right now on another job.
>
> Unfortunately that still doesn't explain why the last build-from-clean
> job failed randomly.

Well it was hard to tell from the logs, but the main problem turned
out to be running out of disk space.  I don't know why--the worker
machines had 20GB each which you'd think should be enough, and that it
wasn't is something worth investigating in its own right.

I've bumped them all to a different VM flavor that has more disk
space--in the past I've had trouble provisioning machines with that
flavor but it seems there are a few available now so I've grabbed
them.
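
For what it's worth, a pre-build check along these lines might flag a
nearly full worker before a multi-hour build starts.  This is only a
sketch: the function name and the threshold in the usage note are made
up for illustration, and nothing like it exists in the current CI
scripts as far as I know.

```python
import shutil


def has_enough_space(path, min_free_gb):
    """Return True if the filesystem holding `path` has at least
    `min_free_gb` gibibytes free."""
    usage = shutil.disk_usage(path)      # named tuple: total, used, free (bytes)
    free_gb = usage.free / 1024 ** 3     # convert bytes to GiB
    return free_gb >= min_free_gb
```

A job could call has_enough_space("/", 25) first and fail early with a
clear message instead of dying two hours in; the 25 is a hypothetical
value, not a measured requirement.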

==========

As for the logs there are two problems:

* GitLab has a limit on how large the standard I/O log from a build
can get; on gitlab.com I think it's like 3MB or so.  So Julian's CI
scripts have a little hack that shows the beginning and end of each
log, but not the middle.  The full, unfiltered log can be downloaded
as a build artifact upon build failure.

However, it seems stderr is not being saved to the build artifact log;
it only shows up in the raw log on GitLab, often at some random point
among the progress dots printed during the build.  So it's difficult
to place the message in any kind of context.
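
To illustrate the two points above, here is a minimal sketch of the
head-and-tail idea combined with folding stderr into the same stream,
so that error messages land in the saved log next to the output around
them.  It is not Julian's actual script; the names run_and_log and
abbreviate and the default line counts are hypothetical.

```python
import subprocess
import sys


def abbreviate(lines, head, tail):
    """Keep the first `head` and last `tail` lines, marking the gap."""
    if len(lines) <= head + tail:
        return list(lines)
    skipped = len(lines) - head - tail
    marker = f"[... {skipped} lines omitted ...]"
    # Slice from an explicit index so that tail=0 yields an empty tail.
    return lines[:head] + [marker] + lines[len(lines) - tail:]


def run_and_log(cmd, log_path, head=50, tail=50):
    """Run `cmd`, write the full combined stdout+stderr to `log_path`,
    and return (exit code, abbreviated list of lines)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the stdout stream
        text=True,
    )
    with open(log_path, "w") as f:
        f.write(proc.stdout)       # the full log, e.g. for a build artifact
    return proc.returncode, abbreviate(proc.stdout.splitlines(), head, tail)
```

This buffers the whole output in memory, which is fine for a sketch
but not for a build that runs for hours; a real version would stream
the output line by line instead.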

The message in this case is odd too:

Error processing tar file(exit status 1): write
/home/sage/sage/src/build/lib.linux-x86_64-2.7/sage/matrix/matrix_integer_sparse.so:
no space left on device
Command exited with non-zero status 1
real 2h 8m 02s
user 0m 15.61s
sys 0m 5.60s


I don't understand why it's (apparently) trying to write a tar file
from the sagelib build. Normally, when installing into SAGE_LOCAL, it
just does a direct copy.  And indeed when I look at the full log in
the build artifacts it looks like sagelib finishes installing
successfully.  So I have no idea where this "processing tar file"
stuff is coming from...

Any ideas?
