Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Nicholas Krause




On 2/28/20 4:22 PM, Daniel Vetter wrote:
> On Fri, Feb 28, 2020 at 9:31 PM Dave Airlie  wrote:
> >
> > On Sat, 29 Feb 2020 at 05:34, Eric Anholt  wrote:
> > >
> > > On Fri, Feb 28, 2020 at 12:48 AM Dave Airlie  wrote:
> > > >
> > > > On Fri, 28 Feb 2020 at 18:18, Daniel Stone  wrote:
> > > > >
> > > > > On Fri, 28 Feb 2020 at 03:38, Dave Airlie  wrote:
> > > > > > b) we probably need to take a large step back here.
> > > > > >
> > > > > > Look at this from a sponsor POV, why would I give X.org/fd.o
> > > > > > sponsorship money that they are just giving straight to google to pay
> > > > > > for hosting credits? Google are profiting in some minor way from these
> > > > > > hosting credits being bought by us, and I assume we aren't getting any
> > > > > > sort of discounts here. Having google sponsor the credits costs google
> > > > > > substantially less than having any other company give us money to do
> > > > > > it.
> > > > >
> > > > > The last I looked, Google GCP / Amazon AWS / Azure were all pretty
> > > > > comparable in terms of what you get and what you pay for them.
> > > > > Obviously providers like Packet and Digital Ocean who offer bare-metal
> > > > > services are cheaper, but then you need to find someone who is going
> > > > > to properly administer the various machines, install decent
> > > > > monitoring, make sure that more storage is provisioned when we need
> > > > > more storage (which is basically all the time), make sure that the
> > > > > hardware is maintained in decent shape (pretty sure one of the fd.o
> > > > > machines has had a drive in imminent-failure state for the last few
> > > > > months), etc.
> > > > >
> > > > > Given the size of our service, that's a much better plan (IMO) than
> > > > > relying on someone who a) isn't an admin by trade, b) has a million
> > > > > other things to do, and c) hasn't wanted to do it for the past several
> > > > > years. But as long as that's the resources we have, then we're paying
> > > > > the cloud tradeoff, where we pay more money in exchange for fewer
> > > > > problems.
> > > >
> > > > Admin for gitlab and CI is a full time role anyways. The system is
> > > > definitely not self sustaining without time being put in by you and
> > > > anholt still. If we have $75k to burn on credits, and it was diverted
> > > > to just pay an admin to admin the real hw + gitlab/CI would that not
> > > > be a better use of the money? I didn't know if we can afford $75k for
> > > > an admin, but suddenly we can afford it for gitlab credits?
> > >
> > > As I think about the time that I've spent at google in less than a
> > > year on trying to keep the lights on for CI and optimize our
> > > infrastructure in the current cloud environment, that's more than the
> > > entire yearly budget you're talking about here.  Saying "let's just
> > > pay for people to do more work instead of paying for full-service
> > > cloud" is not a cost optimization.
> > >
> > > > > Yes, we could federate everything back out so everyone runs their own
> > > > > builds and executes those. Tinderbox did something really similar to
> > > > > that IIRC; not sure if Buildbot does as well. Probably rules out
> > > > > pre-merge testing, mind.
> > > >
> > > > Why? does gitlab not support the model? having builds done in parallel
> > > > on runners closer to the test runners seems like it should be a thing.
> > > > I guess artifact transfer would cost less then as a result.
> > >
> > > Let's do some napkin math.  The biggest artifacts cost we have in Mesa
> > > is probably meson-arm64/meson-arm (60MB zipped from meson-arm64,
> > > downloaded by 4 freedreno and 6ish lava, about 100 pipelines/day,
> > > makes ~1.8TB/month ($180 or so).  We could build a local storage next
> > > to the lava dispatcher so that the artifacts didn't have to contain
> > > the rootfs that came from the container (~2/3 of the insides of the
> > > zip file), but that's another service to build and maintain.  Building
> > > the drivers once locally and storing it would save downloading the
> > > other ~1/3 of the inside of the zip file, but that requires a big
> > > enough system to do builds in time.
> > >
> > > I'm planning on doing a local filestore for google's lava lab, since I
> > > need to be able to move our xml files off of the lava DUTs to get the
> > > xml results we've become accustomed to, but this would not bubble up
> > > to being a priority for my time if I wasn't doing it anyway.  If it
> > > takes me a single day to set all this up (I estimate a couple of
> > > weeks), that costs my employer a lot more than sponsoring the costs of
> > > the inefficiencies of the system that has accumulated.
> >
> > I'm not trying to knock the engineering works the CI contributors have
> > done at all, but I've never seen a real discussion about costs until
> > now. Engineers aren't accountants.
> >
> > The thing we seem to be missing here is fiscal responsibility. I know
> > this email is us being fiscally responsible, but it's kinda after the
> > fact.
> >
> > I cannot commit my employer to spending a large amount of money (> 0
> > actually) without a long and lengthy process with checks and bounds.
> > Can you?
> >
> > The X.org board has budgets and procedures as well. I as a developer
> > of Mesa should not be able to commit the X.org foundation to spending
> > large amounts of money without checks and bounds.
> >
> > The CI infrastructure lacks any checks and bounds. There is no link
> > between editing .gitlab-ci/* and cashflow. There is no link to me
> > adding support for a new feature to llvmpipe that blows out test times
> > (granted it won't affect CI budget but just an example).



Re: [Mesa-dev] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Matt Turner
On Fri, Feb 28, 2020 at 12:00 AM Daniel Stone  wrote:
>
> Hi Matt,
>
> On Thu, 27 Feb 2020 at 23:45, Matt Turner  wrote:
> > We're paying 75K USD for the bandwidth to transfer data from the
> > GitLab cloud instance. i.e., for viewing the https site, for
> > cloning/updating git repos, and for downloading CI artifacts/images to
> > the testing machines (AFAIU).
>
> I believe that in January, we had $2082 of network cost (almost
> entirely egress; ingress is basically free) and $1750 of cloud-storage
> cost (almost all of which was download). That's based on 16TB of
> cloud-storage (CI artifacts, container images, file uploads, Git LFS)
> egress and 17.9TB of other egress (the web service itself, repo
> activity). Projecting that out gives us roughly $45k of network
> activity alone, so it looks like this figure is based on a projected
> increase of ~50%.
>
> The actual compute capacity is closer to $1150/month.

Could we have the full GCP bill posted?
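For reference, Daniel's January figures project out as follows (simple arithmetic on the numbers quoted above; no unit prices assumed):

```python
network_cost = 2082   # January network cost (USD), almost entirely egress
storage_cost = 1750   # January cloud-storage cost (USD), almost all download

monthly = network_cost + storage_cost
annual = monthly * 12
print(monthly, annual)   # 3832 45984 -> "roughly $45k" of network activity/year
```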
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Dave Airlie
On Sat, 29 Feb 2020 at 05:34, Eric Anholt  wrote:
>
> On Fri, Feb 28, 2020 at 12:48 AM Dave Airlie  wrote:
> >
> > On Fri, 28 Feb 2020 at 18:18, Daniel Stone  wrote:
> > >
> > > On Fri, 28 Feb 2020 at 03:38, Dave Airlie  wrote:
> > > > b) we probably need to take a large step back here.
> > > >
> > > > Look at this from a sponsor POV, why would I give X.org/fd.o
> > > > sponsorship money that they are just giving straight to google to pay
> > > > for hosting credits? Google are profiting in some minor way from these
> > > > hosting credits being bought by us, and I assume we aren't getting any
> > > > sort of discounts here. Having google sponsor the credits costs google
> > > > substantially less than having any other company give us money to do
> > > > it.
> > >
> > > The last I looked, Google GCP / Amazon AWS / Azure were all pretty
> > > comparable in terms of what you get and what you pay for them.
> > > Obviously providers like Packet and Digital Ocean who offer bare-metal
> > > services are cheaper, but then you need to find someone who is going
> > > to properly administer the various machines, install decent
> > > monitoring, make sure that more storage is provisioned when we need
> > > more storage (which is basically all the time), make sure that the
> > > hardware is maintained in decent shape (pretty sure one of the fd.o
> > > machines has had a drive in imminent-failure state for the last few
> > > months), etc.
> > >
> > > Given the size of our service, that's a much better plan (IMO) than
> > > relying on someone who a) isn't an admin by trade, b) has a million
> > > other things to do, and c) hasn't wanted to do it for the past several
> > > years. But as long as that's the resources we have, then we're paying
> > > the cloud tradeoff, where we pay more money in exchange for fewer
> > > problems.
> >
> > Admin for gitlab and CI is a full time role anyways. The system is
> > definitely not self sustaining without time being put in by you and
> > anholt still. If we have $75k to burn on credits, and it was diverted
> > to just pay an admin to admin the real hw + gitlab/CI would that not
> > be a better use of the money? I didn't know if we can afford $75k for
> > an admin, but suddenly we can afford it for gitlab credits?
>
> As I think about the time that I've spent at google in less than a
> year on trying to keep the lights on for CI and optimize our
> infrastructure in the current cloud environment, that's more than the
> entire yearly budget you're talking about here.  Saying "let's just
> pay for people to do more work instead of paying for full-service
> cloud" is not a cost optimization.
>
>
> > > Yes, we could federate everything back out so everyone runs their own
> > > builds and executes those. Tinderbox did something really similar to
> > > that IIRC; not sure if Buildbot does as well. Probably rules out
> > > pre-merge testing, mind.
> >
> > Why? does gitlab not support the model? having builds done in parallel
> > on runners closer to the test runners seems like it should be a thing.
> > I guess artifact transfer would cost less then as a result.
>
> Let's do some napkin math.  The biggest artifacts cost we have in Mesa
> is probably meson-arm64/meson-arm (60MB zipped from meson-arm64,
> downloaded by 4 freedreno and 6ish lava, about 100 pipelines/day,
> makes ~1.8TB/month ($180 or so).  We could build a local storage next
> to the lava dispatcher so that the artifacts didn't have to contain
> the rootfs that came from the container (~2/3 of the insides of the
> zip file), but that's another service to build and maintain.  Building
> the drivers once locally and storing it would save downloading the
> other ~1/3 of the inside of the zip file, but that requires a big
> enough system to do builds in time.
>
> I'm planning on doing a local filestore for google's lava lab, since I
> need to be able to move our xml files off of the lava DUTs to get the
> xml results we've become accustomed to, but this would not bubble up
> to being a priority for my time if I wasn't doing it anyway.  If it
> takes me a single day to set all this up (I estimate a couple of
> weeks), that costs my employer a lot more than sponsoring the costs of
> the inefficiencies of the system that has accumulated.

I'm not trying to knock the engineering works the CI contributors have
done at all, but I've never seen a real discussion about costs until
now. Engineers aren't accountants.

The thing we seem to be missing here is fiscal responsibility. I know
this email is us being fiscally responsible, but it's kinda after the
fact.

I cannot commit my employer to spending a large amount of money (> 0
actually) without a long and lengthy process with checks and bounds.
Can you?

The X.org board has budgets and procedures as well. I as a developer
of Mesa should not be able to commit the X.org foundation to spending
large amounts of money without checks and bounds.

The CI infrastructure lacks any checks and bounds. There is no link
between editing .gitlab-ci/* and cashflow. There is no link to me
adding support for a new feature to llvmpipe that blows out test times
(granted it won't affect CI budget but just an example).

Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Eric Anholt
On Fri, Feb 28, 2020 at 1:20 AM Eero Tamminen  wrote:
>
> Hi,
>
> On 28.2.2020 10.48, Dave Airlie wrote:
> >> Yes, we could federate everything back out so everyone runs their own
> >> builds and executes those. Tinderbox did something really similar to
> >> that IIRC; not sure if Buildbot does as well. Probably rules out
> >> pre-merge testing, mind.
> >
> > Why? does gitlab not support the model? having builds done in parallel
> > on runners closer to the test runners seems like it should be a thing.
> > I guess artifact transfer would cost less then as a result.
>
> Do container images being tested currently include sources, build
> dependencies or intermediate build results in addition to the installed
> binaries?
>
>
> Note: with e.g. Docker, removing those after the build step doesn't
> reduce image size.
>
> You need either to do the removal in the same step, like this:
> 
> RUN git clone --branch ${TAG_LIBDRM} --depth 1 \
>  https://gitlab.freedesktop.org/mesa/drm.git && \
>  cd drm  &&  mkdir build  &&  cd build  && \
>  meson --prefix=${INSTALL_DIR} --libdir ${LIB_DIR} \
>-Dcairo-tests=false ../  && \
>  ninja  &&  ninja install  && \
>  cd ../..  &&  rm -r drm
> 
>
> Or use multi-stage container where the final stage installs just
> the run-time dependencies, and copies the final (potentially stripped)
> installed binaries from the build stage.

Our container builds already contain all the cleanup and don't do
multi-stage builds.  If you have concerns about container size, I
recommend picking a job using a container, pulling that container, and
going in and taking a look around instead of speculating.
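For comparison, the multi-stage pattern Eero describes would look roughly like this (a sketch; base images, package names, and paths are illustrative, not Mesa's actual CI setup):

```dockerfile
# Build stage: sources, compilers, and intermediate objects live only here
# and never reach the shipped image.
FROM debian:buster AS build
RUN apt-get update && apt-get install -y \
    git ca-certificates meson ninja-build gcc pkg-config
RUN git clone --depth 1 https://gitlab.freedesktop.org/mesa/drm.git /src/drm \
 && meson /src/drm /src/build --prefix=/opt/drm \
 && ninja -C /src/build install

# Final stage: copy only the installed (optionally stripped) binaries,
# so the image stays small without any explicit cleanup step.
FROM debian:buster-slim
COPY --from=build /opt/drm /opt/drm
```

Unlike cleanup appended to the same RUN line, nothing from the build stage contributes layers to the final image at all.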


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Eric Anholt
On Fri, Feb 28, 2020 at 12:48 AM Dave Airlie  wrote:
>
> On Fri, 28 Feb 2020 at 18:18, Daniel Stone  wrote:
> >
> > On Fri, 28 Feb 2020 at 03:38, Dave Airlie  wrote:
> > > b) we probably need to take a large step back here.
> > >
> > > Look at this from a sponsor POV, why would I give X.org/fd.o
> > > sponsorship money that they are just giving straight to google to pay
> > > for hosting credits? Google are profiting in some minor way from these
> > > hosting credits being bought by us, and I assume we aren't getting any
> > > sort of discounts here. Having google sponsor the credits costs google
> > > substantially less than having any other company give us money to do
> > > it.
> >
> > The last I looked, Google GCP / Amazon AWS / Azure were all pretty
> > comparable in terms of what you get and what you pay for them.
> > Obviously providers like Packet and Digital Ocean who offer bare-metal
> > services are cheaper, but then you need to find someone who is going
> > to properly administer the various machines, install decent
> > monitoring, make sure that more storage is provisioned when we need
> > more storage (which is basically all the time), make sure that the
> > hardware is maintained in decent shape (pretty sure one of the fd.o
> > machines has had a drive in imminent-failure state for the last few
> > months), etc.
> >
> > Given the size of our service, that's a much better plan (IMO) than
> > relying on someone who a) isn't an admin by trade, b) has a million
> > other things to do, and c) hasn't wanted to do it for the past several
> > years. But as long as that's the resources we have, then we're paying
> > the cloud tradeoff, where we pay more money in exchange for fewer
> > problems.
>
> Admin for gitlab and CI is a full time role anyways. The system is
> definitely not self sustaining without time being put in by you and
> anholt still. If we have $75k to burn on credits, and it was diverted
> to just pay an admin to admin the real hw + gitlab/CI would that not
> be a better use of the money? I didn't know if we can afford $75k for
> an admin, but suddenly we can afford it for gitlab credits?

As I think about the time that I've spent at google in less than a
year on trying to keep the lights on for CI and optimize our
infrastructure in the current cloud environment, that's more than the
entire yearly budget you're talking about here.  Saying "let's just
pay for people to do more work instead of paying for full-service
cloud" is not a cost optimization.


> > Yes, we could federate everything back out so everyone runs their own
> > builds and executes those. Tinderbox did something really similar to
> > that IIRC; not sure if Buildbot does as well. Probably rules out
> > pre-merge testing, mind.
>
> Why? does gitlab not support the model? having builds done in parallel
> on runners closer to the test runners seems like it should be a thing.
> I guess artifact transfer would cost less then as a result.

Let's do some napkin math.  The biggest artifacts cost we have in Mesa
is probably meson-arm64/meson-arm (60MB zipped from meson-arm64,
downloaded by 4 freedreno and 6ish lava, about 100 pipelines/day,
makes ~1.8TB/month ($180 or so).  We could build a local storage next
to the lava dispatcher so that the artifacts didn't have to contain
the rootfs that came from the container (~2/3 of the insides of the
zip file), but that's another service to build and maintain.  Building
the drivers once locally and storing it would save downloading the
other ~1/3 of the inside of the zip file, but that requires a big
enough system to do builds in time.
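Eric's napkin math can be reproduced directly (a sketch; the ~$0.10/GB egress rate is an assumption inferred from the ~$180 figure, not a price quoted in the thread):

```python
artifact_mb = 60              # zipped meson-arm64 artifact size
downloads_per_pipeline = 10   # ~4 freedreno + ~6 lava test runners
pipelines_per_day = 100
days_per_month = 30

gb_per_month = (artifact_mb * downloads_per_pipeline
                * pipelines_per_day * days_per_month) / 1000
print(gb_per_month / 1000)    # ~1.8 TB/month
print(gb_per_month * 0.10)    # ~$180 at an assumed $0.10/GB egress
```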

I'm planning on doing a local filestore for google's lava lab, since I
need to be able to move our xml files off of the lava DUTs to get the
xml results we've become accustomed to, but this would not bubble up
to being a priority for my time if I wasn't doing it anyway.  If it
takes me a single day to set all this up (I estimate a couple of
weeks), that costs my employer a lot more than sponsoring the costs of
the inefficiencies of the system that has accumulated.


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Kristian Høgsberg
On Thu, Feb 27, 2020 at 7:38 PM Dave Airlie  wrote:
>
> On Fri, 28 Feb 2020 at 07:27, Daniel Vetter  wrote:
> >
> > Hi all,
> >
> > You might have read the short take in the X.org board meeting minutes
> > already, here's the long version.
> >
> > The good news: gitlab.fd.o has become very popular with our
> > communities, and is used extensively. This especially includes all the
> > CI integration. Modern development process and tooling, yay!
> >
> > The bad news: The cost in growth has also been tremendous, and it's
> > breaking our bank account. With reasonable estimates for continued
> > growth we're expecting hosting expenses totalling 75k USD this year,
> > and 90k USD next year. With the current sponsors we've set up we can't
> > sustain that. We estimate that hosting expenses for gitlab.fd.o
> > without any of the CI features enabled would total 30k USD, which is
> > within X.org's ability to support through various sponsorships, mostly
> > through XDC.
> >
> > Note that X.org no longer sponsors any CI runners itself;
> > we've stopped that. The huge additional expenses are all just in
> > storing and serving build artifacts and images to outside CI runners
> > sponsored by various companies. A related topic is that with the
> > growth in fd.o it's becoming infeasible to maintain it all on
> > volunteer admin time. X.org is therefore also looking for admin
> > sponsorship, at least medium term.
> >
> > Assuming that we want cash flow reserves for one year of gitlab.fd.o
> > (without CI support) and a trimmed XDC and assuming no sponsor payment
> > meanwhile, we'd have to cut CI services somewhere between May and June
> > this year. The board is of course working on acquiring sponsors, but
> > filling a shortfall of this magnitude is neither easy nor quick work,
> > and we therefore decided to give an early warning as soon as possible.
> > Any help in finding sponsors for fd.o is very much appreciated.
>
> a) Ouch.
>
> b) we probably need to take a large step back here.

If we're taking a step back here, I also want to recognize what a
tremendous success this has been so far and thank everybody involved
for building something so useful. Between gitlab and the CI, our
workflow has improved and code quality has gone up.  I don't have
anything useful to add to the technical discussion, except that it
seems pretty standard engineering practice to build a system,
observe it and identify and eliminate bottlenecks. Planning never
hurts, of course, but I don't think anybody could have realistically
modeled and projected the cost of this infrastructure as it's grown
organically and fast.

Kristian

> Look at this from a sponsor POV, why would I give X.org/fd.o
> sponsorship money that they are just giving straight to google to pay
> for hosting credits? Google are profiting in some minor way from these
> hosting credits being bought by us, and I assume we aren't getting any
> sort of discounts here. Having google sponsor the credits costs google
> substantially less than having any other company give us money to do
> it.
>
> If our current CI architecture is going to burn this amount of money a
> year and we hadn't worked this out in advance of deploying it then I
> suggest the system should be taken offline until we work out what a
> sustainable system would look like within the budget we have, whether
> that be never transferring containers and build artifacts from the
> google network, just having local runner/build combos etc.
>
> Dave.
> ___
> Intel-gfx mailing list
> intel-...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Rob Clark
On Fri, Feb 28, 2020 at 3:43 AM Michel Dänzer  wrote:
>
> On 2020-02-28 10:28 a.m., Erik Faye-Lund wrote:
> >
> > We could also do stuff like reducing the amount of tests we run on each
> > commit, and punt some testing to a per-weekend test-run or someting
> > like that. We don't *need* to know about every problem up front, just
> > the stuff that's about to be released, really. The other stuff is just
> > nice to have. If it's too expensive, I would say drop it.
>
> I don't agree that pre-merge testing is just nice to have. A problem
> which is only caught after it lands in mainline has a much bigger impact
> than one which is already caught earlier.
>

one thought.. since with mesa+margebot we effectively get at least
two(ish) CI runs per MR, ie. one when it is initially pushed, and one
when margebot rebases and tries to merge, could we leverage this to
have trimmed down pre-margebot CI which tries to just target affected
drivers, with margebot doing a full CI run (when it is potentially
batching together multiple MRs)?

Seems like a way to reduce our CI runs with a good safety net to
prevent things from slipping through the cracks.

(Not sure how much that would help reduce bandwidth costs, but I guess
it should help a bit.)

BR,
-R
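Rob's trimmed pre-Marge / full merge-time split could be sketched in .gitlab-ci.yml roughly like this (the job name, script, and the "marge-bot" login are illustrative assumptions, not taken from Mesa's actual config):

```yaml
# Hypothetical job: the full test matrix runs only in pipelines started
# by Marge-bot (i.e. at merge time, possibly batching several MRs);
# ordinary MR pushes would run a trimmed, per-driver job set instead.
full-test-matrix:
  stage: test
  script:
    - ./run-full-tests.sh
  rules:
    - if: '$GITLAB_USER_LOGIN == "marge-bot"'
      when: on_success
    - when: never
```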


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Lionel Landwerlin

On 28/02/2020 13:46, Michel Dänzer wrote:
> On 2020-02-28 12:02 p.m., Erik Faye-Lund wrote:
>> On Fri, 2020-02-28 at 10:43 +, Daniel Stone wrote:
>>> On Fri, 28 Feb 2020 at 10:06, Erik Faye-Lund
>>>  wrote:
>>>> On Fri, 2020-02-28 at 11:40 +0200, Lionel Landwerlin wrote:
>>>>> Yeah, changes on vulkan drivers or backend compilers should be
>>>>> fairly sandboxed.
>>>>>
>>>>> We also have tools that only work for intel stuff, that should
>>>>> never trigger anything on other people's HW.
>>>>>
>>>>> Could something be worked out using the tags?
>>>> I think so! We have the pre-defined environment variable
>>>> CI_MERGE_REQUEST_LABELS, and we can do variable conditions:
>>>>
>>>> https://docs.gitlab.com/ee/ci/yaml/#onlyvariablesexceptvariables
>>>>
>>>> That sounds like a pretty neat middle-ground to me. I just hope
>>>> that new pipelines are triggered if new labels are added, because
>>>> not everyone is allowed to set labels, and sometimes people forget...
>>> There's also this which is somewhat more robust:
>>> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2569
>> I'm not sure it's more robust, but yeah that's a useful tool too.
>>
>> The reason I'm skeptical about the robustness is that we'll miss
>> testing if this misses a path.
> Surely missing a path will be less likely / often to happen compared to
> an MR missing a label. (Users which aren't members of the project can't
> even set labels for an MR)

Sounds like a good alternative to tags.

-Lionel



Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Michel Dänzer
On 2020-02-28 12:02 p.m., Erik Faye-Lund wrote:
> On Fri, 2020-02-28 at 10:43 +, Daniel Stone wrote:
>> On Fri, 28 Feb 2020 at 10:06, Erik Faye-Lund
>>  wrote:
>>> On Fri, 2020-02-28 at 11:40 +0200, Lionel Landwerlin wrote:
>>>> Yeah, changes on vulkan drivers or backend compilers should be
>>>> fairly sandboxed.
>>>>
>>>> We also have tools that only work for intel stuff, that should
>>>> never trigger anything on other people's HW.
>>>>
>>>> Could something be worked out using the tags?
>>>
>>> I think so! We have the pre-defined environment variable
>>> CI_MERGE_REQUEST_LABELS, and we can do variable conditions:
>>>
>>> https://docs.gitlab.com/ee/ci/yaml/#onlyvariablesexceptvariables
>>>
>>> That sounds like a pretty neat middle-ground to me. I just hope
>>> that
>>> new pipelines are triggered if new labels are added, because not
>>> everyone is allowed to set labels, and sometimes people forget...
>>
>> There's also this which is somewhat more robust:
>> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2569
> 
> I'm not sure it's more robust, but yeah that's a useful tool too.
> 
> The reason I'm skeptical about the robustness is that we'll miss
> testing if this misses a path.

Surely missing a path will be less likely / often to happen compared to
an MR missing a label. (Users which aren't members of the project can't
even set labels for an MR)


-- 
Earthling Michel Dänzer   |   https://redhat.com
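The CI_MERGE_REQUEST_LABELS approach quoted above could look roughly like this (a sketch; the job name, "intel" label, and script are illustrative, and the expression syntax follows the only:variables documentation linked in the thread):

```yaml
# Hypothetical job: run Intel-specific tests only when the merge request
# carries an "intel" label. CI_MERGE_REQUEST_LABELS is only populated in
# merge-request pipelines, hence the refs restriction.
test-intel:
  stage: test
  script:
    - ./run-intel-tests.sh
  only:
    refs:
      - merge_requests
    variables:
      - $CI_MERGE_REQUEST_LABELS =~ /intel/
```

As noted above, this only helps if adding a label after the fact retriggers a pipeline.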
Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Michel Dänzer
On 2020-02-28 10:28 a.m., Erik Faye-Lund wrote:
> 
> We could also do stuff like reducing the amount of tests we run on each
> commit, and punt some testing to a per-weekend test-run or someting
> like that. We don't *need* to know about every problem up front, just
> the stuff that's about to be released, really. The other stuff is just
> nice to have. If it's too expensive, I would say drop it.

I don't agree that pre-merge testing is just nice to have. A problem
which is only caught after it lands in mainline has a much bigger impact
than one which is already caught earlier.


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Daniel Stone
Hi Jan,

On Fri, 28 Feb 2020 at 10:09, Jan Engelhardt  wrote:
> On Friday 2020-02-28 08:59, Daniel Stone wrote:
> >I believe that in January, we had $2082 of network cost (almost
> >entirely egress; ingress is basically free) and $1750 of
> >cloud-storage cost (almost all of which was download). That's based
> >on 16TB of cloud-storage (CI artifacts, container images, file
> >uploads, Git LFS) egress and 17.9TB of other egress (the web service
> >itself, repo activity). Projecting that out [×12 for a year] gives
> >us roughly $45k of network activity alone,
>
> I had come to a similar conclusion a few years back: it is not very
> economical to run ephemeral buildroots (and anything like it) between
> two (or more) "significant locations" of which one end is located in
> a Large Cloud datacenter like EC2/AWS/etc.
>
> For such use cases, my peers and I have used (other)
> offerings where there is 50 TB free network/month, and yes that may
> have entailed doing more adminning than elsewhere - but an admin
> appreciates $2000 a lot more than a corporation, too.

Yes, absolutely. For context, our storage & network costs have
increased >10x in the past 12 months (~$320 Jan 2019), >3x in the past
6 months (~$1350 July 2019), and ~2x in the past 3 months (~$2000 Oct
2019).
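For reference, the projection quoted above can be checked with simple arithmetic (all figures are the ones stated in this thread):

```python
# Back-of-the-envelope check of the figures quoted in this thread (USD).
network_jan = 2082    # network egress for January
storage_jan = 1750    # cloud-storage download for January
monthly = network_jan + storage_jan
yearly = monthly * 12
print(monthly)               # 3832
print(yearly)                # 45984 -- "roughly $45k" of network activity/year
print(round(monthly / 320, 1))  # 12.0 -- i.e. ~12x the ~$320/month of Jan 2019
```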

I do now (personally) think that it's crossed the point at which it
would be worthwhile paying an admin to solve the problems that cloud
services currently solve for us - which wasn't true before. Such an
admin could also deal with things like our SMTP delivery failure rate,
which in the past year has spiked to over 50% (see previous email), and
with demand for new services such as Discourse, which would enable user
support without either a) users having to subscribe to a mailing list,
or b) bug trackers being cluttered up with user requests and other
non-bugs.

Cheers,
Daniel


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Erik Faye-Lund
On Fri, 2020-02-28 at 10:43 +, Daniel Stone wrote:
> On Fri, 28 Feb 2020 at 10:06, Erik Faye-Lund
>  wrote:
> > On Fri, 2020-02-28 at 11:40 +0200, Lionel Landwerlin wrote:
> > > Yeah, changes on vulkan drivers or backend compilers should be
> > > fairly
> > > sandboxed.
> > > 
> > > We also have tools that only work for intel stuff, that should
> > > never
> > > trigger anything on other people's HW.
> > > 
> > > Could something be worked out using the tags?
> > 
> > I think so! We have the pre-defined environment variable
> > CI_MERGE_REQUEST_LABELS, and we can do variable conditions:
> > 
> > https://docs.gitlab.com/ee/ci/yaml/#onlyvariablesexceptvariables
> > 
> > That sounds like a pretty neat middle-ground to me. I just hope
> > that
> > new pipelines are triggered if new labels are added, because not
> > everyone is allowed to set labels, and sometimes people forget...
> 
> There's also this which is somewhat more robust:
> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2569
> 
> 

I'm not sure it's more robust, but yeah, that's a useful tool too.

The reason I'm skeptical about the robustness is that we'll miss
testing if this misses a path. That can of course be fixed by testing
everything once things are in master, and fixing up that list when
something breaks on master.

The person who wrote a change knows more about its intricacies than a
computer ever will. But humans are also good at making mistakes, so I'm
not sure which one is better. Maybe the union of both?

As long as we have both rigorous testing after something has landed in
master (it doesn't necessarily need to happen right away, but for now
that's probably fine), as well as a reasonable heuristic for what
testing is needed pre-merge, I think we're good.
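For reference, the path-list approach can be expressed directly in .gitlab-ci.yml via `only:changes`. A minimal hypothetical sketch (job name, script, and paths are illustrative, not Mesa's actual configuration):

```yaml
# Hypothetical example: run the Panfrost hardware tests only when files
# that can affect Panfrost change. Job name and paths are illustrative.
test-panfrost:
  stage: test
  script:
    - ./ci/run-deqp.sh panfrost
  only:
    changes:
      - src/panfrost/**/*
      - src/gallium/drivers/panfrost/**/*
      - .gitlab-ci.yml
```

The risk discussed above is exactly that such a path list can silently miss a dependency, which is why post-merge full runs remain the backstop.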



Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Lucas Stach
On Fr, 2020-02-28 at 10:47 +0100, Daniel Vetter wrote:
> On Fri, Feb 28, 2020 at 10:29 AM Erik Faye-Lund
>  wrote:
> > On Fri, 2020-02-28 at 13:37 +1000, Dave Airlie wrote:
> > > On Fri, 28 Feb 2020 at 07:27, Daniel Vetter 
> > > wrote:
> > > > Hi all,
> > > > 
> > > > You might have read the short take in the X.org board meeting
> > > > minutes
> > > > already, here's the long version.
> > > > 
> > > > The good news: gitlab.fd.o has become very popular with our
> > > > communities, and is used extensively. This especially includes all
> > > > the
> > > > CI integration. Modern development process and tooling, yay!
> > > > 
> > > > The bad news: The cost in growth has also been tremendous, and it's
> > > > breaking our bank account. With reasonable estimates for continued
> > > > growth we're expecting hosting expenses totalling 75k USD this
> > > > year,
> > > > and 90k USD next year. With the current sponsors we've set up we
> > > > can't
> > > > sustain that. We estimate that hosting expenses for gitlab.fd.o
> > > > without any of the CI features enabled would total 30k USD, which
> > > > is
> > > > within X.org's ability to support through various sponsorships,
> > > > mostly
> > > > through XDC.
> > > > 
> > > > Note that X.org no longer sponsors any CI runners itself,
> > > > we've stopped that. The huge additional expenses are all just in
> > > > storing and serving build artifacts and images to outside CI
> > > > runners
> > > > sponsored by various companies. A related topic is that with the
> > > > growth in fd.o it's becoming infeasible to maintain it all on
> > > > volunteer admin time. X.org is therefore also looking for admin
> > > > sponsorship, at least medium term.
> > > > 
> > > > Assuming that we want cash flow reserves for one year of
> > > > gitlab.fd.o
> > > > (without CI support) and a trimmed XDC and assuming no sponsor
> > > > payment
> > > > meanwhile, we'd have to cut CI services somewhere between May and
> > > > June
> > > > this year. The board is of course working on acquiring sponsors,
> > > > but
> > > > filling a shortfall of this magnitude is neither easy nor quick
> > > > work,
> > > > and we therefore decided to give an early warning as soon as
> > > > possible.
> > > > Any help in finding sponsors for fd.o is very much appreciated.
> > > 
> > > a) Ouch.
> > > 
> > > b) we probably need to take a large step back here.
> > > 
> > 
> > I kinda agree, but maybe the step doesn't have to be *too* large?
> > 
> > I wonder if we could solve this by restructuring the project a bit. I'm
> > talking purely from a Mesa point of view here, so it might not solve
> > the full problem, but:
> > 
> > 1. It feels silly that we need to test changes to e.g. the i965 driver
> > on dragonboards. We only have a big "do not run CI at all" escape-
> > hatch.
> > 
> > 2. A lot of us are working for a company that can probably pay for
> > their own needs in terms of CI. Perhaps moving some costs "up front" to
> > the company that needs it could secure the future of CI for those
> > who can't do this.
> > 
> > 3. I think we need a much more detailed break-down of the cost to make
> > educated changes. For instance, how expensive is Docker image
> > uploads/downloads (e.g. intermediary artifacts) compared to build logs
> > and final test-results? What kind of artifacts?
> 
> We have logs somewhere, but no one yet got around to analyzing that.
> Which will be quite a bit of work to do since the cloud storage is
> totally disconnected from the gitlab front-end, making the connection
> to which project or CI job caused something is going to require
> scripting. Volunteers definitely very much welcome I think.

It's very surprising to me that this kind of cost monitoring is treated
as an afterthought, especially since one of the main jobs of the X.Org
board is to keep spending under control and transparent.

Also, from all the conversations it's still unclear to me whether the
google hosting costs are already over the sponsored credits (and so are
burning a hole in the X.org bank account right now) or whether this is
only going to happen at a later point in time.

Even with CI disabled it seems that the board estimates a cost of 30k
annually for the plain gitlab hosting. Is this covered by the credits
sponsored by google? If not, why wasn't there a board vote on this
spending? All other spending seems to require pre-approval by the
board. Why wasn't the gitlab hosting cost discussed much earlier in the
public board meetings, especially if it's going to be such a big chunk
of the overall spending of the X.Org foundation?

Regards,
Lucas



Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Daniel Stone
On Fri, 28 Feb 2020 at 10:06, Erik Faye-Lund
 wrote:
> On Fri, 2020-02-28 at 11:40 +0200, Lionel Landwerlin wrote:
> > Yeah, changes on vulkan drivers or backend compilers should be
> > fairly
> > sandboxed.
> >
> > We also have tools that only work for intel stuff, that should never
> > trigger anything on other people's HW.
> >
> > Could something be worked out using the tags?
>
> I think so! We have the pre-defined environment variable
> CI_MERGE_REQUEST_LABELS, and we can do variable conditions:
>
> https://docs.gitlab.com/ee/ci/yaml/#onlyvariablesexceptvariables
>
> That sounds like a pretty neat middle-ground to me. I just hope that
> new pipelines are triggered if new labels are added, because not
> everyone is allowed to set labels, and sometimes people forget...

There's also this which is somewhat more robust:
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2569

Cheers,
Daniel


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Erik Faye-Lund
On Fri, 2020-02-28 at 10:47 +0100, Daniel Vetter wrote:
> On Fri, Feb 28, 2020 at 10:29 AM Erik Faye-Lund
>  wrote:
> > On Fri, 2020-02-28 at 13:37 +1000, Dave Airlie wrote:
> > > On Fri, 28 Feb 2020 at 07:27, Daniel Vetter <
> > > daniel.vet...@ffwll.ch>
> > > wrote:
> > > > Hi all,
> > > > 
> > > > You might have read the short take in the X.org board meeting
> > > > minutes
> > > > already, here's the long version.
> > > > 
> > > > The good news: gitlab.fd.o has become very popular with our
> > > > communities, and is used extensively. This especially includes
> > > > all
> > > > the
> > > > CI integration. Modern development process and tooling, yay!
> > > > 
> > > > The bad news: The cost in growth has also been tremendous, and
> > > > it's
> > > > breaking our bank account. With reasonable estimates for
> > > > continued
> > > > growth we're expecting hosting expenses totalling 75k USD this
> > > > year,
> > > > and 90k USD next year. With the current sponsors we've set up
> > > > we
> > > > can't
> > > > sustain that. We estimate that hosting expenses for gitlab.fd.o
> > > > without any of the CI features enabled would total 30k USD,
> > > > which
> > > > is
> > > > within X.org's ability to support through various sponsorships,
> > > > mostly
> > > > through XDC.
> > > > 
> > > > Note that X.org no longer sponsors any CI runners
> > > > itself,
> > > > we've stopped that. The huge additional expenses are all just
> > > > in
> > > > storing and serving build artifacts and images to outside CI
> > > > runners
> > > > sponsored by various companies. A related topic is that with
> > > > the
> > > > growth in fd.o it's becoming infeasible to maintain it all on
> > > > volunteer admin time. X.org is therefore also looking for admin
> > > > sponsorship, at least medium term.
> > > > 
> > > > Assuming that we want cash flow reserves for one year of
> > > > gitlab.fd.o
> > > > (without CI support) and a trimmed XDC and assuming no sponsor
> > > > payment
> > > > meanwhile, we'd have to cut CI services somewhere between May
> > > > and
> > > > June
> > > > this year. The board is of course working on acquiring
> > > > sponsors,
> > > > but
> > > > filling a shortfall of this magnitude is neither easy nor quick
> > > > work,
> > > > and we therefore decided to give an early warning as soon as
> > > > possible.
> > > > Any help in finding sponsors for fd.o is very much appreciated.
> > > 
> > > a) Ouch.
> > > 
> > > b) we probably need to take a large step back here.
> > > 
> > 
> > I kinda agree, but maybe the step doesn't have to be *too* large?
> > 
> > I wonder if we could solve this by restructuring the project a bit.
> > I'm
> > talking purely from a Mesa point of view here, so it might not
> > solve
> > the full problem, but:
> > 
> > 1. It feels silly that we need to test changes to e.g. the i965
> > driver
> > on dragonboards. We only have a big "do not run CI at all" escape-
> > hatch.
> > 
> > 2. A lot of us are working for a company that can probably pay for
> > their own needs in terms of CI. Perhaps moving some costs "up
> > front" to
> > the company that needs it could secure the future of CI for those
> > who can't do this.
> > 
> > 3. I think we need a much more detailed break-down of the cost to
> > make
> > educated changes. For instance, how expensive is Docker image
> > uploads/downloads (e.g. intermediary artifacts) compared to build
> > logs
> > and final test-results? What kind of artifacts?
> 
> We have logs somewhere, but no one yet got around to analyzing that.
> Which will be quite a bit of work to do since the cloud storage is
> totally disconnected from the gitlab front-end, making the connection
> to which project or CI job caused something is going to require
> scripting. Volunteers definitely very much welcome I think.
> 

Fair enough, but just keep in mind that the same principle as in
software optimization applies here: working blindly reduces the impact.
So if we want to fix the current situation, this is going to have to be
a priority, I think.

> > One suggestion would be to do something more similar to what the
> > kernel
> > does, and separate into different repos for different subsystems.
> > This
> > could allow us to have separate testing-pipelines for these repos,
> > which would mean that for instance a change to RADV didn't trigger
> > a
> > full Panfrost test-run.
> 
> Uh as someone who lives the kernel multi-tree model daily, there's a
> _lot_ of pain involved.

Could you please elaborate a bit? We're not the kernel, so I'm not sure
all of the kernel-pains apply to us. But we probably have other pains
as well ;-)

But yeah, it might be better to take smaller steps first, and see if
that helps.

> I think much better to look at filtering out
> CI targets for when nothing relevant happened. But that gets somewhat
> tricky, since "nothing relevant" is always only relative to some
> baseline, so a bit of scripting is involved to make sure you don't
> run stuff too often or (probably worse) not often enough.

Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Erik Faye-Lund
On Fri, 2020-02-28 at 11:40 +0200, Lionel Landwerlin wrote:
> On 28/02/2020 11:28, Erik Faye-Lund wrote:
> > On Fri, 2020-02-28 at 13:37 +1000, Dave Airlie wrote:
> > > On Fri, 28 Feb 2020 at 07:27, Daniel Vetter <
> > > daniel.vet...@ffwll.ch>
> > > wrote:
> > > > Hi all,
> > > > 
> > > > You might have read the short take in the X.org board meeting
> > > > minutes
> > > > already, here's the long version.
> > > > 
> > > > The good news: gitlab.fd.o has become very popular with our
> > > > communities, and is used extensively. This especially includes
> > > > all
> > > > the
> > > > CI integration. Modern development process and tooling, yay!
> > > > 
> > > > The bad news: The cost in growth has also been tremendous, and
> > > > it's
> > > > breaking our bank account. With reasonable estimates for
> > > > continued
> > > > growth we're expecting hosting expenses totalling 75k USD this
> > > > year,
> > > > and 90k USD next year. With the current sponsors we've set up
> > > > we
> > > > can't
> > > > sustain that. We estimate that hosting expenses for gitlab.fd.o
> > > > without any of the CI features enabled would total 30k USD,
> > > > which
> > > > is
> > > > within X.org's ability to support through various sponsorships,
> > > > mostly
> > > > through XDC.
> > > > 
> > > > Note that X.org no longer sponsors any CI runners
> > > > itself,
> > > > we've stopped that. The huge additional expenses are all just
> > > > in
> > > > storing and serving build artifacts and images to outside CI
> > > > runners
> > > > sponsored by various companies. A related topic is that with
> > > > the
> > > > growth in fd.o it's becoming infeasible to maintain it all on
> > > > volunteer admin time. X.org is therefore also looking for admin
> > > > sponsorship, at least medium term.
> > > > 
> > > > Assuming that we want cash flow reserves for one year of
> > > > gitlab.fd.o
> > > > (without CI support) and a trimmed XDC and assuming no sponsor
> > > > payment
> > > > meanwhile, we'd have to cut CI services somewhere between May
> > > > and
> > > > June
> > > > this year. The board is of course working on acquiring
> > > > sponsors,
> > > > but
> > > > filling a shortfall of this magnitude is neither easy nor quick
> > > > work,
> > > > and we therefore decided to give an early warning as soon as
> > > > possible.
> > > > Any help in finding sponsors for fd.o is very much appreciated.
> > > a) Ouch.
> > > 
> > > b) we probably need to take a large step back here.
> > > 
> > I kinda agree, but maybe the step doesn't have to be *too* large?
> > 
> > I wonder if we could solve this by restructuring the project a bit.
> > I'm
> > talking purely from a Mesa point of view here, so it might not
> > solve
> > the full problem, but:
> > 
> > 1. It feels silly that we need to test changes to e.g. the i965
> > driver
> > on dragonboards. We only have a big "do not run CI at all" escape-
> > hatch.
> 
> Yeah, changes on vulkan drivers or backend compilers should be
> fairly 
> sandboxed.
> 
> We also have tools that only work for intel stuff, that should never 
> trigger anything on other people's HW.
> 
> Could something be worked out using the tags?
> 

I think so! We have the pre-defined environment variable
CI_MERGE_REQUEST_LABELS, and we can do variable conditions:

https://docs.gitlab.com/ee/ci/yaml/#onlyvariablesexceptvariables

That sounds like a pretty neat middle-ground to me. I just hope that
new pipelines are triggered if new labels are added, because not
everyone is allowed to set labels, and sometimes people forget...
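As a sketch of what that could look like (job name, script, and label are hypothetical, not actual Mesa config), an `only:variables` condition can match against the predefined `CI_MERGE_REQUEST_LABELS` variable:

```yaml
# Hypothetical example: only run Intel-specific jobs when the merge
# request carries an "intel" label. Job name and label are illustrative.
test-i965:
  stage: test
  script:
    - ./ci/run-deqp.sh i965
  only:
    variables:
      - $CI_MERGE_REQUEST_LABELS =~ /intel/
```

The caveat above still applies: a job gated this way only runs if the label is present when the pipeline is created.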



Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Daniel Vetter
On Fri, Feb 28, 2020 at 10:29 AM Erik Faye-Lund
 wrote:
>
> On Fri, 2020-02-28 at 13:37 +1000, Dave Airlie wrote:
> > On Fri, 28 Feb 2020 at 07:27, Daniel Vetter 
> > wrote:
> > > Hi all,
> > >
> > > You might have read the short take in the X.org board meeting
> > > minutes
> > > already, here's the long version.
> > >
> > > The good news: gitlab.fd.o has become very popular with our
> > > communities, and is used extensively. This especially includes all
> > > the
> > > CI integration. Modern development process and tooling, yay!
> > >
> > > The bad news: The cost in growth has also been tremendous, and it's
> > > breaking our bank account. With reasonable estimates for continued
> > > growth we're expecting hosting expenses totalling 75k USD this
> > > year,
> > > and 90k USD next year. With the current sponsors we've set up we
> > > can't
> > > sustain that. We estimate that hosting expenses for gitlab.fd.o
> > > without any of the CI features enabled would total 30k USD, which
> > > is
> > > within X.org's ability to support through various sponsorships,
> > > mostly
> > > through XDC.
> > >
> > > Note that X.org no longer sponsors any CI runners itself,
> > > we've stopped that. The huge additional expenses are all just in
> > > storing and serving build artifacts and images to outside CI
> > > runners
> > > sponsored by various companies. A related topic is that with the
> > > growth in fd.o it's becoming infeasible to maintain it all on
> > > volunteer admin time. X.org is therefore also looking for admin
> > > sponsorship, at least medium term.
> > >
> > > Assuming that we want cash flow reserves for one year of
> > > gitlab.fd.o
> > > (without CI support) and a trimmed XDC and assuming no sponsor
> > > payment
> > > meanwhile, we'd have to cut CI services somewhere between May and
> > > June
> > > this year. The board is of course working on acquiring sponsors,
> > > but
> > > filling a shortfall of this magnitude is neither easy nor quick
> > > work,
> > > and we therefore decided to give an early warning as soon as
> > > possible.
> > > Any help in finding sponsors for fd.o is very much appreciated.
> >
> > a) Ouch.
> >
> > b) we probably need to take a large step back here.
> >
>
> I kinda agree, but maybe the step doesn't have to be *too* large?
>
> I wonder if we could solve this by restructuring the project a bit. I'm
> talking purely from a Mesa point of view here, so it might not solve
> the full problem, but:
>
> 1. It feels silly that we need to test changes to e.g. the i965 driver
> on dragonboards. We only have a big "do not run CI at all" escape-
> hatch.
>
> 2. A lot of us are working for a company that can probably pay for
> their own needs in terms of CI. Perhaps moving some costs "up front" to
> the company that needs it could secure the future of CI for those who
> can't do this.
>
> 3. I think we need a much more detailed break-down of the cost to make
> educated changes. For instance, how expensive is Docker image
> uploads/downloads (e.g. intermediary artifacts) compared to build logs
> and final test-results? What kind of artifacts?

We have logs somewhere, but no one yet got around to analyzing that.
Which will be quite a bit of work to do since the cloud storage is
totally disconnected from the gitlab front-end, making the connection
to which project or CI job caused something is going to require
scripting. Volunteers definitely very much welcome I think.

> One suggestion would be to do something more similar to what the kernel
> does, and separate into different repos for different subsystems. This
> could allow us to have separate testing-pipelines for these repos,
> which would mean that for instance a change to RADV didn't trigger a
> full Panfrost test-run.

Uh as someone who lives the kernel multi-tree model daily, there's a
_lot_ of pain involved. I think much better to look at filtering out
CI targets for when nothing relevant happened. But that gets somewhat
tricky, since "nothing relevant" is always only relative to some
baseline, so a bit of scripting is involved to make sure you don't
run stuff too often or (probably worse) not often enough.
-Daniel

> This would probably require us to accept using a more branch-heavy
> work-flow. I don't personally think that would be a bad thing.
>
> But this is all kinda based on an assumption that running hardware-
> testing is the expensive part. I think that's quite possibly the case,
> but some more numbers would be helpful. I mean, it might turn out that
> just throwing up a Docker cache inside the organizations that host
> runners might be sufficient for all I know...
>
> We could also do stuff like reducing the amount of tests we run on each
> commit, and punt some testing to a per-weekend test-run or someting
> like that. We don't *need* to know about every problem up front, just
> the stuff that's about to be released, really. The other stuff is just
> nice to have. If it's too expensive, I would say drop it.
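The filtering Daniel describes above boils down to mapping CI jobs to the paths that affect them and intersecting that with the files changed since the baseline. A minimal sketch (the job-to-path mapping is hypothetical, not Mesa's actual layout):

```python
# Hypothetical sketch: map CI jobs to the path prefixes that affect them,
# then keep only the jobs whose prefixes match any of the changed files.
JOB_PATHS = {
    "test-panfrost": ("src/panfrost/", "src/gallium/drivers/panfrost/"),
    "test-i965": ("src/intel/", "src/mesa/drivers/dri/i965/"),
    "build-meson": ("",),  # empty prefix: this job always runs
}

def jobs_to_run(changed_files):
    """Return the sorted list of jobs affected by the given file list."""
    return sorted(
        job for job, prefixes in JOB_PATHS.items()
        if any(f.startswith(p) for p in prefixes for f in changed_files)
    )

print(jobs_to_run(["src/panfrost/midgard/compiler.c"]))
# -> ['build-meson', 'test-panfrost']
```

The hard part, as noted above, is computing `changed_files` against the right baseline (e.g. the merge base with the target branch), not the filtering itself.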

Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Lionel Landwerlin

On 28/02/2020 11:28, Erik Faye-Lund wrote:

> On Fri, 2020-02-28 at 13:37 +1000, Dave Airlie wrote:
> > On Fri, 28 Feb 2020 at 07:27, Daniel Vetter 
> > wrote:
> > > Hi all,
> > > 
> > > You might have read the short take in the X.org board meeting
> > > minutes already, here's the long version.
> > > 
> > > The good news: gitlab.fd.o has become very popular with our
> > > communities, and is used extensively. This especially includes all
> > > the CI integration. Modern development process and tooling, yay!
> > > 
> > > The bad news: The cost in growth has also been tremendous, and it's
> > > breaking our bank account. With reasonable estimates for continued
> > > growth we're expecting hosting expenses totalling 75k USD this
> > > year, and 90k USD next year. With the current sponsors we've set up
> > > we can't sustain that. We estimate that hosting expenses for
> > > gitlab.fd.o without any of the CI features enabled would total 30k
> > > USD, which is within X.org's ability to support through various
> > > sponsorships, mostly through XDC.
> > > 
> > > Note that X.org no longer sponsors any CI runners itself, we've
> > > stopped that. The huge additional expenses are all just in storing
> > > and serving build artifacts and images to outside CI runners
> > > sponsored by various companies. A related topic is that with the
> > > growth in fd.o it's becoming infeasible to maintain it all on
> > > volunteer admin time. X.org is therefore also looking for admin
> > > sponsorship, at least medium term.
> > > 
> > > Assuming that we want cash flow reserves for one year of
> > > gitlab.fd.o (without CI support) and a trimmed XDC and assuming no
> > > sponsor payment meanwhile, we'd have to cut CI services somewhere
> > > between May and June this year. The board is of course working on
> > > acquiring sponsors, but filling a shortfall of this magnitude is
> > > neither easy nor quick work, and we therefore decided to give an
> > > early warning as soon as possible.
> > > Any help in finding sponsors for fd.o is very much appreciated.
> > 
> > a) Ouch.
> > 
> > b) we probably need to take a large step back here.
> 
> I kinda agree, but maybe the step doesn't have to be *too* large?
> 
> I wonder if we could solve this by restructuring the project a bit. I'm
> talking purely from a Mesa point of view here, so it might not solve
> the full problem, but:
> 
> 1. It feels silly that we need to test changes to e.g. the i965 driver
> on dragonboards. We only have a big "do not run CI at all" escape-
> hatch.

Yeah, changes on vulkan drivers or backend compilers should be fairly 
sandboxed.


We also have tools that only work for intel stuff, that should never 
trigger anything on other people's HW.


Could something be worked out using the tags?


-Lionel



Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Erik Faye-Lund
On Fri, 2020-02-28 at 13:37 +1000, Dave Airlie wrote:
> On Fri, 28 Feb 2020 at 07:27, Daniel Vetter 
> wrote:
> > Hi all,
> > 
> > You might have read the short take in the X.org board meeting
> > minutes
> > already, here's the long version.
> > 
> > The good news: gitlab.fd.o has become very popular with our
> > communities, and is used extensively. This especially includes all
> > the
> > CI integration. Modern development process and tooling, yay!
> > 
> > The bad news: The cost in growth has also been tremendous, and it's
> > breaking our bank account. With reasonable estimates for continued
> > growth we're expecting hosting expenses totalling 75k USD this
> > year,
> > and 90k USD next year. With the current sponsors we've set up we
> > can't
> > sustain that. We estimate that hosting expenses for gitlab.fd.o
> > without any of the CI features enabled would total 30k USD, which
> > is
> > within X.org's ability to support through various sponsorships,
> > mostly
> > through XDC.
> > 
> > Note that X.org no longer sponsors any CI runners itself,
> > we've stopped that. The huge additional expenses are all just in
> > storing and serving build artifacts and images to outside CI
> > runners
> > sponsored by various companies. A related topic is that with the
> > growth in fd.o it's becoming infeasible to maintain it all on
> > volunteer admin time. X.org is therefore also looking for admin
> > sponsorship, at least medium term.
> > 
> > Assuming that we want cash flow reserves for one year of
> > gitlab.fd.o
> > (without CI support) and a trimmed XDC and assuming no sponsor
> > payment
> > meanwhile, we'd have to cut CI services somewhere between May and
> > June
> > this year. The board is of course working on acquiring sponsors,
> > but
> > filling a shortfall of this magnitude is neither easy nor quick
> > work,
> > and we therefore decided to give an early warning as soon as
> > possible.
> > Any help in finding sponsors for fd.o is very much appreciated.
> 
> a) Ouch.
> 
> b) we probably need to take a large step back here.
> 

I kinda agree, but maybe the step doesn't have to be *too* large?

I wonder if we could solve this by restructuring the project a bit. I'm
talking purely from a Mesa point of view here, so it might not solve
the full problem, but:

1. It feels silly that we need to test changes to e.g. the i965 driver
on dragonboards. We only have a big "do not run CI at all" escape-
hatch.

2. A lot of us are working for a company that can probably pay for
their own needs in terms of CI. Perhaps moving some costs "up front" to
the company that needs it could secure the future of CI for those who
can't do this.

3. I think we need a much more detailed break-down of the cost to make
educated changes. For instance, how expensive is Docker image
uploads/downloads (e.g. intermediary artifacts) compared to build logs
and final test-results? What kind of artifacts?

One suggestion would be to do something more similar to what the kernel
does, and separate into different repos for different subsystems. This
could allow us to have separate testing-pipelines for these repos,
which would mean that for instance a change to RADV didn't trigger a
full Panfrost test-run.

This would probably require us to accept using a more branch-heavy
work-flow. I don't personally think that would be a bad thing.

But this is all kinda based on an assumption that running hardware-
testing is the expensive part. I think that's quite possibly the case,
but some more numbers would be helpful. I mean, it might turn out that
just throwing up a Docker cache inside the organizations that host
runners might be sufficient for all I know...

We could also do stuff like reducing the amount of tests we run on each
commit, and punt some testing to a per-weekend test-run or something
like that. We don't *need* to know about every problem up front, just
the stuff that's about to be released, really. The other stuff is just
nice to have. If it's too expensive, I would say drop it.

I would really hope that we can consider approaches like this before we
throw out the baby with the bathwater...



Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Daniel Stone
On Fri, 28 Feb 2020 at 08:48, Dave Airlie  wrote:
> On Fri, 28 Feb 2020 at 18:18, Daniel Stone  wrote:
> > The last I looked, Google GCP / Amazon AWS / Azure were all pretty
> > comparable in terms of what you get and what you pay for them.
> > Obviously providers like Packet and Digital Ocean who offer bare-metal
> > services are cheaper, but then you need to find someone who is going
> > to properly administer the various machines, install decent
> > monitoring, make sure that more storage is provisioned when we need
> > more storage (which is basically all the time), make sure that the
> > hardware is maintained in decent shape (pretty sure one of the fd.o
> > machines has had a drive in imminent-failure state for the last few
> > months), etc.
> >
> > Given the size of our service, that's a much better plan (IMO) than
> > relying on someone who a) isn't an admin by trade, b) has a million
> > other things to do, and c) hasn't wanted to do it for the past several
> > years. But as long as that's the resources we have, then we're paying
> > the cloud tradeoff, where we pay more money in exchange for fewer
> > problems.
>
> Admin for gitlab and CI is a full time role anyways. The system is
> definitely not self sustaining without time being put in by you and
> anholt still. If we have $75k to burn on credits, and it was diverted
> to just pay an admin to admin the real hw + gitlab/CI, would that not
> be a better use of the money? I didn't know we could afford $75k for
> an admin, but suddenly we can afford it for gitlab credits?

s/gitlab credits/GCP credits/

I took a quick look at HPE, which we previously used for bare metal,
and it looks like we'd be spending $25-50k (depending on how much
storage you want to provision, how much room you want to leave to
provision more storage later, and how much you care about backups) to
run a similar level of service, so that'd put a bit of a dent in your
year-one budget.

The bare-metal hosting providers also add up to more expensive than
you might think, again especially if you want either redundancy or
just backups.

> > Yes, we could federate everything back out so everyone runs their own
> > builds and executes those. Tinderbox did something really similar to
> > that IIRC; not sure if Buildbot does as well. Probably rules out
> > pre-merge testing, mind.
>
> Why? Does gitlab not support the model? Having builds done in parallel
> on runners closer to the test runners seems like it should be a thing.
> I guess artifact transfer would cost less as a result.

It does support the model, but if every single build executor is also
compiling Mesa from scratch locally, how long do you think that's
going to take?

> > Again, if you want everything to be centrally
> > designed/approved/monitored/controlled, that's a fine enough idea, and
> > I'd be happy to support whoever it was who was doing that for all of
> > fd.o.
>
> I don't think we have any choice but to have someone centrally
> controlling it. You can't have a system in place that lets CI users
> burn large sums of money without authorisation, and that is what we
> have now.

OK, but I'm not sure who's going to be approving every update to
every .gitlab-ci.yml in the repository; or maybe we just have zero
shared runners and anyone who wants to do builds can BYO.


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Eero Tamminen

Hi,

On 28.2.2020 10.48, Dave Airlie wrote:

Yes, we could federate everything back out so everyone runs their own
builds and executes those. Tinderbox did something really similar to
that IIRC; not sure if Buildbot does as well. Probably rules out
pre-merge testing, mind.


Why? Does gitlab not support the model? Having builds done in parallel
on runners closer to the test runners seems like it should be a thing.
I guess artifact transfer would cost less as a result.


Do container images being tested currently include sources, build 
dependencies or intermediate build results in addition to the installed 
binaries?



Note: with e.g. Docker, removing those after the build step doesn't
reduce the image size, because each step creates a new layer, and files
deleted in a later step still exist in the earlier layers.


You either need to do the removal in the same RUN step, like this:

RUN git clone --branch ${TAG_LIBDRM} --depth 1 \
https://gitlab.freedesktop.org/mesa/drm.git && \
cd drm  &&  mkdir build  &&  cd build  && \
meson --prefix=${INSTALL_DIR} --libdir ${LIB_DIR} \
  -Dcairo-tests=false ../  && \
ninja  &&  ninja install  && \
cd ../..  &&  rm -r drm


Or use a multi-stage container build, where the final stage installs
just the run-time dependencies and copies the final (potentially
stripped) installed binaries from the build stage.
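
A minimal sketch of that multi-stage approach for the same libdrm
build (the base image, package names and install prefix are
illustrative assumptions, not the actual CI configuration):

```dockerfile
# --- Build stage: sources, toolchain and -dev packages live only here ---
FROM debian:buster AS build
RUN apt-get update && \
    apt-get install -y git meson ninja-build pkg-config libpciaccess-dev
RUN git clone --depth 1 https://gitlab.freedesktop.org/mesa/drm.git && \
    meson drm/build drm --prefix=/usr/local && \
    ninja -C drm/build install

# --- Final stage: runtime dependencies plus the installed files only ---
# Nothing from the build stage's layers (sources, compilers) ends up in
# the final image; only what COPY brings across.
FROM debian:buster
RUN apt-get update && \
    apt-get install -y libpciaccess0 && \
    rm -rf /var/lib/apt/lists/*
COPY --from=build /usr/local /usr/local
```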


- Eero


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Dave Airlie
On Fri, 28 Feb 2020 at 18:18, Daniel Stone  wrote:
>
> On Fri, 28 Feb 2020 at 03:38, Dave Airlie  wrote:
> > b) we probably need to take a large step back here.
> >
> > Look at this from a sponsor POV, why would I give X.org/fd.o
> > sponsorship money that they are just giving straight to google to pay
> > for hosting credits? Google are profiting in some minor way from these
> > hosting credits being bought by us, and I assume we aren't getting any
> > sort of discounts here. Having google sponsor the credits costs google
> > substantially less than having any other company give us money to do
> > it.
>
> The last I looked, Google GCP / Amazon AWS / Azure were all pretty
> comparable in terms of what you get and what you pay for them.
> Obviously providers like Packet and Digital Ocean who offer bare-metal
> services are cheaper, but then you need to find someone who is going
> to properly administer the various machines, install decent
> monitoring, make sure that more storage is provisioned when we need
> more storage (which is basically all the time), make sure that the
> hardware is maintained in decent shape (pretty sure one of the fd.o
> machines has had a drive in imminent-failure state for the last few
> months), etc.
>
> Given the size of our service, that's a much better plan (IMO) than
> relying on someone who a) isn't an admin by trade, b) has a million
> other things to do, and c) hasn't wanted to do it for the past several
> years. But as long as that's the resources we have, then we're paying
> the cloud tradeoff, where we pay more money in exchange for fewer
> problems.

Admin for gitlab and CI is a full time role anyways. The system is
definitely not self sustaining without time being put in by you and
anholt still. If we have $75k to burn on credits, and it was diverted
to just pay an admin to admin the real hw + gitlab/CI, would that not
be a better use of the money? I didn't know we could afford $75k for
an admin, but suddenly we can afford it for gitlab credits?

> Yes, we could federate everything back out so everyone runs their own
> builds and executes those. Tinderbox did something really similar to
> that IIRC; not sure if Buildbot does as well. Probably rules out
> pre-merge testing, mind.

Why? Does gitlab not support the model? Having builds done in parallel
on runners closer to the test runners seems like it should be a thing.
I guess artifact transfer would cost less as a result.

> The reason we hadn't worked everything out in advance of deploying is
> because Mesa has had 3993 MRs in the little over a year since
> moving, and a similar number in GStreamer, just taking the two biggest
> users. At the start it was 'maybe let's use MRs if you want to but
> make sure everything still goes through the list', and now it's
> something different. Similarly the CI architecture hasn't been
> 'designed', so much as that people want to run dEQP and Piglit on
> their hardware pre-merge in an open fashion that's actually accessible
> to people, and have just done it.
>
> Again, if you want everything to be centrally
> designed/approved/monitored/controlled, that's a fine enough idea, and
> I'd be happy to support whoever it was who was doing that for all of
> fd.o.

I don't think we have any choice but to have someone centrally
controlling it. You can't have a system in place that lets CI users
burn large sums of money without authorisation, and that is what we
have now.

Dave.


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Daniel Stone
On Fri, 28 Feb 2020 at 03:38, Dave Airlie  wrote:
> b) we probably need to take a large step back here.
>
> Look at this from a sponsor POV, why would I give X.org/fd.o
> sponsorship money that they are just giving straight to google to pay
> for hosting credits? Google are profiting in some minor way from these
> hosting credits being bought by us, and I assume we aren't getting any
> sort of discounts here. Having google sponsor the credits costs google
> substantially less than having any other company give us money to do
> it.

The last I looked, Google GCP / Amazon AWS / Azure were all pretty
comparable in terms of what you get and what you pay for them.
Obviously providers like Packet and Digital Ocean who offer bare-metal
services are cheaper, but then you need to find someone who is going
to properly administer the various machines, install decent
monitoring, make sure that more storage is provisioned when we need
more storage (which is basically all the time), make sure that the
hardware is maintained in decent shape (pretty sure one of the fd.o
machines has had a drive in imminent-failure state for the last few
months), etc.

Given the size of our service, that's a much better plan (IMO) than
relying on someone who a) isn't an admin by trade, b) has a million
other things to do, and c) hasn't wanted to do it for the past several
years. But as long as that's the resources we have, then we're paying
the cloud tradeoff, where we pay more money in exchange for fewer
problems.

> If our current CI architecture is going to burn this amount of money a
> year and we hadn't worked this out in advance of deploying it then I
> suggest the system should be taken offline until we work out what a
> sustainable system would look like within the budget we have, whether
> that be never transferring containers and build artifacts from the
> google network, just having local runner/build combos etc.

Yes, we could federate everything back out so everyone runs their own
builds and executes those. Tinderbox did something really similar to
that IIRC; not sure if Buildbot does as well. Probably rules out
pre-merge testing, mind.

The reason we hadn't worked everything out in advance of deploying is
because Mesa has had 3993 MRs in the little over a year since
moving, and a similar number in GStreamer, just taking the two biggest
users. At the start it was 'maybe let's use MRs if you want to but
make sure everything still goes through the list', and now it's
something different. Similarly the CI architecture hasn't been
'designed', so much as that people want to run dEQP and Piglit on
their hardware pre-merge in an open fashion that's actually accessible
to people, and have just done it.

Again, if you want everything to be centrally
designed/approved/monitored/controlled, that's a fine enough idea, and
I'd be happy to support whoever it was who was doing that for all of
fd.o.

Cheers,
Daniel


Re: [Mesa-dev] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Daniel Stone
Hi Matt,

On Thu, 27 Feb 2020 at 23:45, Matt Turner  wrote:
> We're paying 75K USD for the bandwidth to transfer data from the
> GitLab cloud instance. i.e., for viewing the https site, for
> cloning/updating git repos, and for downloading CI artifacts/images to
> the testing machines (AFAIU).

I believe that in January, we had $2082 of network cost (almost
entirely egress; ingress is basically free) and $1750 of cloud-storage
cost (almost all of which was download). That's based on 16TB of
cloud-storage (CI artifacts, container images, file uploads, Git LFS)
egress and 17.9TB of other egress (the web service itself, repo
activity). Projecting that out gives us roughly $45k of network
activity alone, so it looks like this figure is based on a projected
increase of ~50%.

The actual compute capacity is closer to $1150/month.
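
As a quick back-of-the-envelope check, the January numbers quoted
above project out like this (figures are just the ones in this mail;
the split between network and storage is as stated, not re-derived):

```python
# January 2020 figures quoted above, in USD.
network_cost = 2082   # egress; ingress is basically free
storage_cost = 1750   # cloud storage, almost all of it download

monthly = network_cost + storage_cost
annual = monthly * 12

print(monthly)  # 3832
print(annual)   # 45984, i.e. "roughly $45k" a year in network/storage
```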

> I was not aware that we were being charged for anything wrt GitLab
> hosting yet (and neither was anyone on my team at Intel that I've
> asked). This... kind of needs to be communicated.
>
> A consistent concern put forth when we were discussing switching to
> GitLab and building CI was... how do we pay for it. It felt like that
> concern was always handwaved away. I heard many times that if we
> needed more runners that we could just ask Google to spin up a few
> more. If we needed testing machines they'd be donated. No one
> mentioned that all the while we were paying for bandwidth... Perhaps
> people building the CI would make different decisions about its
> structure if they knew it was going to wipe out the bank account.

The original answer is that GitLab themselves offered to sponsor
enough credit on Google Cloud to get us started. They used GCP
themselves so they could assist us (me) in getting bootstrapped, which
was invaluable. After that, Google's open-source program office
offered to sponsor us for $30k/year, which was I believe last April.
Since then the service usage has increased roughly by a factor of 10,
so our 12-month sponsorship is no longer enough to cover 12 months.

> What percentage of the bandwidth is consumed by transferring CI
> images, etc? Wouldn't 75K USD would be enough to buy all the testing
> machines we need and host them within Google or wherever so we don't
> need to pay for huge amounts of bandwidth?

Unless the Google Cloud Platform starts offering DragonBoards, it
wouldn't reduce our bandwidth usage as the corporate network is
treated separately for egress.

Cheers,
Daniel