Re: [CI] What are the troubles projects face with CI and Infra

Chesnay Schepler Tue, 04 Feb 2020 01:10:00 -0800

I believe the write permission is used by CI services mostly to attach a GitHub Check for the build to the commit.

From what I know there's no dedicated permission for that.

The Flink project is actively pursuing having a separate repository for running CI that is not owned by Apache. The core motivation was that we were using too many ASF Travis resources and wanted to offload some of that to a sponsored account, and the (seemingly) _only_ way to do that (at least with Travis) was to have a separate org+repo. Pull Requests (and in the future, branches) are mirrored by a bot into this repository, triggering builds, the results of which are written as a comment into the PR / sent to the ML.

Should we rethink our approach?

On 04/02/2020 06:07, Kenneth Knowles wrote:

(Top-posting a question that rewinds this thread a bit. Feel free to continue
other discussion on the latest inline email.)

Why do so many tools require write access? It seems like there's at least
*some* part of this that is a technical limitation... dare I say "error"?

My years-stale understanding (from reviewable.io and codecov.io IIRC, both
of which I would have loved to use but couldn't, and not just on ASF repos)
was that the limitation was that GitHub's ACLs were too coarse-grained. Is this
still true? Do they know this is a big problem? Are they leaving things
as-is deliberately or through lack of funding? OTOH my understanding of
other tools (prow? Beam's defunct mergebot?) is that the tool itself really
wants to manage the repo for you, queuing up merges and doing them, etc. I
don't really know buildkite. It might be helpful to have a table on a wiki
of where these tools fail the policies.

Technical opinion: in normal git workflow as I see it, any person or *tool*
that wishes to create a branch can do so in its own fork. Wanting to write
to a branch in some other person's or org's fork is like wanting to write
to their hard drive: there are reasons, but doing so has to be inextricable
from your core functionality, or you are probably doing it wrong.

Over the years, I've felt this pain of CI tools not being able to be used,
but I have almost universally considered the *other* party to be the source
of the pain, not ASF's very reasonable policies. Is ASF able to influence
their roadmaps, or at least keep in touch about them? A combination of best
practices amongst projects and tools that understand the whole point of git
would go a long way.

(I welcome opinions that I am just wrong and these CI tools are doing
exactly the best thing they should be doing - that would be new and useful
info for me)

Kenn

On Mon, Feb 3, 2020 at 7:50 PM David Nalley <[email protected]> wrote:

On Tue, Feb 4, 2020 at 4:29 AM Alex Harui <[email protected]> wrote:

Hopefully last set of questions for now...

Just wait, the rabbit hole gets deeper :)

1) It sounds like there is a risk that as the ASF grows, GH may not be
able to grow with us. Did I understand that correctly?

GH CI may not be willing to continue giving us free usage. The current
free usage we have is limited, but they are willing to augment it - to
what degree we aren't sure yet. We're talking with Github.
Github the VCS will always be free (at least for all versions of the
future that I can foresee, short of Github being shuttered).

2) If we have money to offer GH, why can't we offer money to the CI
vendors so we aren't really abusing their free tiers?

We currently pay one CI vendor (Travis, the only one aside from GH
that doesn't need write access). We pay them 12k a year, and are
planning on increasing that spend in next year's budget.
We've discussed paying or getting cloud credits from both Azure and
AWS - but ran into the write access problem.
We're currently discussing with GH getting credits or paying them for
more Github Actions capacity.

3) Does GH track my activity in the ASF GH repos as part of the API
usage for Apache? IOW, am I adding to the ASF API count by closing an
issue on github.com? Or if I ran a script on my computer that closed the
issue by using their API?

No, it's tied to our user/IP address. Your actions likely won't come
close to our complex usage.

I think builds.a.o is a great free service, but AIUI, the
no-third-party-write-access rule is independent of whether CI is free or
not. I cannot pay money and get write-access to the ASF repos. So I think
I'm trying to see if there is a solution even if it did cost money.

I should have been more explicit - we aren't opposed to spending money
on this, and do already spend some money. I'm worried that there is no
limit to the money that could be spent - particularly when people
don't have good insight into what their builds might cost the
Foundation. For instance, there was a project at the ASF that
consumed $900/month of our $1,000/month spend with Travis. They
didn't realize that they were consuming so much. They also didn't
realize that other projects were feeling the pain - they had optimized
their CI builds to execute really fast in Travis, essentially
consuming every builder concurrently. But the reality is that some
projects need more resources than others, and allocating resources
appropriately becomes quite the challenge.

Thanks in advance,
-Alex

On 2/3/20, 7:03 PM, "David Nalley" <[email protected]> wrote:

     On Tue, Feb 4, 2020 at 3:56 AM Alex Harui <[email protected]> wrote:

     >
     > Some questions inline. Apologies in advance for not really
     > understanding this stuff. I'm primarily a client-side developer. My
     > projects do not have automated PR testing at this point in time. I'm
     > mainly exploring in case we become popular enough some day to need it.
     >
     > My line of thinking is that MS has, at least for now, generously
     > provided free Azure VMs to ASF committers. If N committers from a project
     > each get a VM, run CI on it, and figure out some way to distribute PRs to
     > those VMs, is there a viable workflow?
     >
     > On 2/3/20, 6:38 PM, "David Nalley" <[email protected]> wrote:
     >
     >     Hi Alex,
     >
     >     So this was explored. It creates some problems - first, it doubles the
     >     administration overhead - most of that is automated, but it means that
     >     our API usage doubles, and we're already hitting limits from
     >     Github.
     >
     > Is that a max-traffic limit or a limit on traffic before we have
     > to start paying for usage?

     Max number of calls - and we've tried offering up money, they don't
     offer a product with more API calls. Greg has even raised this issue
     all the way to the CEO of Github.

     >
     >     Second - at least one CI vendor thanked us for not doing exactly that,
     >     because the 'best' way to do it is to create an org per project or
     >     an org per repo, and then the free tier is dedicated to that org. Except
     >     that's essentially abusing their free tier.
     >
     > Is "best" defined as lowest cost to the CI vendor or something
     > else? What would the "second-best" scenario look like if there is one?

     Best - well it's the cheapest for us, and it gives the most control to
     the projects. So great from that perspective, but likely a bit
     unethical and abusive. It's essentially abusing all of the CI vendors'
     generosity by horizontally scaling our consumption of their freebies
     and using them per-repo or per-project instead of per-organization.

     >
     >     Finally - from a practical perspective, if everyone submits PRs and
     >     does testing against this apacheci org, that has become the de facto
     >     repo - it's where everyone is doing their work, and it makes
     >     provenance tracking difficult.
     >
     > Didn't the ASF have read-only mirrors of repos? I think it led to
     > some confusion, but I think folks still figured it out.
     >

     Not anymore.
     We have an active-active copy of the repositories. People can actively
     commit against either our repos or the GH repos, and we magically move
     commits between the two. (There's an upcoming blog post on how all of
     this magic works.)

     >     As an aside - the mandate for no write access is not an infrastructure
     >     policy, it's a legal affairs requirement - we're merely implementing
     >     it.
     >
     >     --David
     >
     >     On Tue, Feb 4, 2020 at 3:24 AM Alex Harui <[email protected]> wrote:
     >     >
     >     > Moving board@ to BCC. Attempting to move discussion to builds@
     >     >
     >     > I’m fine with the ASF maintaining its position on stricter
     >     > provenance and therefore disallowing third-party write-access to repos.
     >     >
     >     > A suggestion was made, if I understood it correctly, to
     >     > create a whole other set of repos that could be written to by
     >     > third parties. Would such a thing work? Then a committer would have to
     >     > manually bring commits back from that other set to the canonical repo.
     >     > That seems viable to me.
     >     >
     >     > A concern was raised that the project might cut its release
     >     > from the “other set”, but IMO, that would be ok if the release artifacts
     >     > could be verified, which should be possible by comparing the canonical repo
     >     > against the “other repo”, at least for the source package, and if there are
     >     > reproducible binaries, for the binary artifacts as well.
     >     >
     >     > Thoughts?
     >     > -Alex
     >     >
     >     > From: Greg Stein <[email protected]>
     >     > Reply-To: "[email protected]" <[email protected]>
     >     > Date: Monday, February 3, 2020 at 5:17 PM
     >     > To: "[email protected]" <[email protected]>
     >     > Subject: Re: [CI] What are the troubles projects face with CI and Infra
     >     >
     >     > On Mon, Feb 3, 2020 at 6:48 PM Alex Harui <[email protected]<mailto:[email protected]>> wrote:
     >     > >...
     >     > How does Google or other non-ASF open source projects manage
     >     > the provenance tracking?
     >     >
     >     > Note that most F/OSS projects don't worry about provenance
     >     > to the level the Foundation does. That affords them some flexibility
     >     > that our choices do not allow. Those projects may also choose to trust
     >     > tools with write access to their repositories, hoping they will not Do
     >     > Something Bad(tm). We have chosen not to provide that trust.
     >     >
     >     > IMO, I do not think the Foundation should relax its stance
     >     > on provenance, nor trust in third parties ... but that is one of the key
     >     > considerations [for the Board] at the heart of being able to leverage some
     >     > third-party CI/CD services.
     >     >
     >     > Cheers,
     >     > -g
     >     >
     >
     >
