On Wednesday, 28 January 2015 10:08:54 CEST, Ben Cooksley wrote:
1) Most applications integrate extremely poorly with LDAP. They
basically take the details once on first login and don't sync the
details again after that (this is what both Chiliproject and
Reviewboard do). How does Gerrit perform here?

Data are fetched from LDAP as needed. There's a local cache for speedup (with configurable TTL and support for explicit flushes).
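For illustration, the cache TTL can be tuned per cache in gerrit.config; the cache names below are the ones Gerrit documents for LDAP, the values are made-up examples:

```ini
[cache "ldap_usernames"]
        maxAge = 1 hour
[cache "ldap_groups"]
        maxAge = 30 minutes
```

An explicit flush is a one-liner over the admin SSH interface, e.g. `ssh -p 29418 admin@gerrit-host gerrit flush-caches --cache ldap_groups`.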

2) For this trivial scratch repository script, will it offer its own
user interface or will developers be required to pass arguments to it
through some other means? The code you've presented thus far makes it
appear some other means will be required.

I might not fully understand this question -- I thought we had already discussed this. The simplest method of invocation can be as easy as `ssh user@somehost create-personal-project foobar`, with SSH keys verified by OpenSSH. This is the same UI as our current setup. There are other options, some of them with fancy, web-based UIs.

3) We've used cGit in the past, and found it suffered from performance
problems with our level of scale. Note that just because a solution
scales with the size of repositories does not necessarily mean it
scales with the number of repositories, which is what bites cGit. In
light of this, what do you propose?

An option which is suggested in the document is to use our current quickgit setup, i.e. GitPHP. If it works, there's no need to change it, IMHO, and sticking with it looks like a safe thing to me. But there are many additional choices (including gitiles).

4) Has Gerrit's replication been demonstrated to handle 2000 Git
repositories which consume 30gb of disk space? How is metadata such as
repository descriptions (for repository browsers) replicated?

Yes, Gerrit scales far beyond that. See e.g. the thread at https://groups.google.com/forum/#!topic/repo-discuss/5JHwzednYkc for real users' feedback about large-scale deployments.
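As a sketch of how this would look for the anongit mirrors, the replication plugin is driven by a replication.config along these lines (hostnames and paths are placeholders):

```ini
[remote "anongit"]
        url = git@anongit.example.org:/srv/git/${name}.git
        # mirror all branches and tags, force-updating on rewinds
        push = +refs/heads/*:refs/heads/*
        push = +refs/tags/*:refs/tags/*
        # delete refs on the mirror that no longer exist upstream
        mirror = true
        # number of parallel pushes towards this mirror
        threads = 4
```

Repository descriptions live in each repo's description file and in the project config, so they travel with the replicated Git data.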

5) If Gerrit or its hosting system were to develop problems how would
the replication system handle this? From what I see it seems highly
automated and bugs or glitches within Gerrit could rapidly be
inflicted upon the anongit nodes with no apparent safeguards (which
our present custom system has). Bear in mind that failure scenarios
can always occur in the most unexpected ways, and the integrity of our
repositories is of paramount importance.

I agree that one needs proper, offline and off-site backups for critical data, and that any online Git replication is not a proper substitute for this. The plan for disaster recovery therefore is "restore from backup".

In terms of Gerrit, this means backing up all of the Git repositories and dumping the PostgreSQL database, and storing both in a location which cannot be wiped out or modified by an attacker who has root on the main Git server, or by a software bug in our Git hosting. One cannot get that with just Git replication, of course.

What are the safeguard mechanisms that you mentioned? What threats do they mitigate? I'm asking because, for example, the need for frequent branch deletion is minimized by Gerrit's code review process, which tracks proposed changes on internal refs rather than on short-lived branches. What risks do you expect to see here?

6) Notifications: Does it support running various checks that our
hooks do at the moment for license validity and the like? When these
rules are tripped the author is emailed back on their own commits.

Yes, the proposed setup supports these. The best place to implement them is in the CI, invoked through the ref-updated hook. My personal preference would be a ref-updated event handler in Zuul, to ensure proper scalability, but there are other options.
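To make this concrete, here is a minimal sketch (function name is hypothetical) of the filtering such a handler would do on Gerrit's documented ref-updated stream-events payload before launching the license checks:

```python
import json

def wants_license_check(event_line):
    """Return (project, old_rev, new_rev) if this stream-events line
    should trigger the license/commit checks, else None."""
    event = json.loads(event_line)
    if event.get("type") != "ref-updated":
        return None
    update = event["refUpdate"]
    # Only scan pushes to real branches; skip Gerrit's internal
    # code-review refs (refs/changes/...) and tags.
    if not update["refName"].startswith("refs/heads/"):
        return None
    return (update["project"], update["oldRev"], update["newRev"])
```

The handler would then walk the oldRev..newRev range and e-mail the authors of offending commits, much as the current hooks do.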

7) Storing information such as tree metadata location within
individual Git repositories is a recipe for delivering a system that
will eventually fail to scale, and will abuse resources. Due to the
time it takes to fork out to Git,

Gerrit uses JGit, a Java implementation of Git. There are no forks.

plus the disk access necessary for
it to retrieve the information in question, I suspect your generation
script will take several load intensive minutes to complete even if it
only covers mainline repositories. This is comparable to the
performance of Chiliproject in terms of generation at the moment.

Gerrit 2.10, released yesterday, adds a REST API for fetching arbitrary data from files stored in Git, with aggressive caching. I would like to use that for generating the kde_projects.xml file.
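As a sketch of what the generator would do (host, project and file names are placeholders; the endpoint is Gerrit's "get file content" REST call, which returns the file base64-encoded):

```python
import base64
import urllib.parse

def content_url(host, project, branch, path):
    # Gerrit expects project and file names to be URL-encoded,
    # including any slashes they contain.
    quote = lambda s: urllib.parse.quote(s, safe="")
    return ("https://%s/projects/%s/branches/%s/files/%s/content"
            % (host, quote(project), quote(branch), quote(path)))

def decode_content(body):
    # The endpoint returns the raw file content base64-encoded.
    return base64.b64decode(body).decode("utf-8")
```

The generator would fetch the metadata files this way and emit kde_projects.xml from them, with Gerrit's caches absorbing the load.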

The original generation of our Git hooks invoked Git several times per
commit, which meant the amount of time taken to process 1000 commits
easily reached 10 minutes. I rebuilt them to invoke git only a handful
of times per push - which is what we have now.

Gerrit has a different architecture with no forks and aggressive caching. I'm all for benchmarking, though. Do you want a test repository to run your benchmarks against?

8) Shifting information such as branch assignments in the same manner
will necessitate that someone have access to a copy of the Git
repository to determine the branch to use. This is something the CI
system cannot ensure, as it needs to determine this information for
dependencies, and a given node may not have a workspace for the
repository in question. It also makes it difficult to update rules
which are common among a set of repositories such as those for
Frameworks and Plasma (Workspace). I've no idea if it would cause
problems for kdesrc-build, but that is also a possibility.

The kde_projects.xml file which stores a copy of these data will remain unchanged, and it should remain the place consulted by e.g. the CI scripts or kdesrc-build. These tools will need no change.

What the proposal says is to generate that file from data stored in Git rather than from a custom webapp.

9) You've essentially said you are going to eliminate our existing
hooks.

The proposal said that it might be possible to replace a large part of the functionality with Gerrit's native features at zero maintenance cost. If the remaining functionality (CRLF line-ending checks and human author names for direct pushes) is important enough to warrant ongoing maintenance of the custom hooks, they can be run without a problem.

Does Gerrit support:
    a) line ending checks, with exceptions for certain file types and
repositories?

The proposal says to handle this via the CI setup. This means that pushing CRLF data to our repos would be allowed, with a follow-up e-mail saying "hey, you're doing a bad thing". That's a trade-off for not having to maintain these scripts.

Alternative options for this include:
- preserving this part of the hooks and running them from Gerrit,
- extending an existing Git validation plugin to do this.
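A CI-side check could be as small as the following sketch (the exemption list is a made-up example of the per-file-type exceptions; the real list would mirror what the current hooks allow):

```python
# Hypothetical per-file-type exemptions from the CRLF rule.
CRLF_EXEMPT_SUFFIXES = (".bat", ".vcproj", ".sln")

def has_forbidden_crlf(path, blob):
    """Return True if this blob contains CRLF line endings that the
    rules forbid for the given path."""
    if path.endswith(CRLF_EXEMPT_SUFFIXES):
        return False
    try:
        text = blob.decode("utf-8")
    except UnicodeDecodeError:
        # Undecodable content is treated as binary and skipped.
        return False
    return "\r\n" in text
```

The CI job would run this over the blobs touched by a push and mail the author, or downvote the change in the pre-merge case.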

    b) Convenient deactivation of these checks if necessary.

Yes, this is configurable.

    c) Checks on the author name to ensure it bears a resemblance to
something actual.

No, the author's name is not checked at the moment. If we decide to change this, it's going to be a couple-line patch, or a custom hook.

However, I do not think that checking names the way the hooks do it now is actually a good thing. Please read http://wookware.org/name.html for an example of a real person from the UK who cannot commit to KDE.

The potential for mistakes is largely mitigated by checks for e-mail validity. In order for this to be a problem, one would have to push a commit with a valid e-mail address, but wrong name ("jkt <j...@kde.org>"). We should evaluate whether risking this is worth the reduced maintenance.

Also, this only affects direct pushes and KDE developers. Patch proposals from third parties can be easily and immediately downvoted by the CI, with a helpful message on what to fix.

    d) Prohibiting anyone from pushing certain types of email address
such as *@localhost.localdomain?

Yes:

A similar check applies to e-mail validation. An ACL verifies whether an e-mail matches one of the user's registered addresses. These addresses are either read from LDAP, or validated by a mail probe to make sure that they actually exist and belong to the user in question. This validation can be configured on an LDAP-group basis, so it is possible to allow KDE developers to push commits on behalf of third-party contributors while preventing regular users from faking their identity.
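In Gerrit's access control model this is expressed through the forge-identity permissions in project.config; a hypothetical fragment (the group name is a placeholder):

```ini
[access "refs/heads/*"]
        # KDE developers may push commits authored by third-party
        # contributors under the contributor's own identity
        forgeAuthor = group KDE Developers
```

Without this permission, Gerrit rejects pushes whose author e-mail does not belong to the uploader.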

10) You were aware of the DSL work Scarlett is doing and the fact this
is Jenkins specific (as it generates Jenkins configuration). How can
this work remain relevant?
Additionally, Scarlett's work will introduce declarative configuration
for jobs to KDE.

My understanding of Scarlett's work is that it aims at cleaning up our current configuration, making it work on Windows and OS X, and introducing a declarative language for preparing job descriptions. AFAIK, the only part which might be Jenkins-specific is the last bit, and I fully expect that a declarative generator will be able to generate job descriptions for another system just by adding a proper output format. Moving to a declarative approach is the big change here; adding another output format is much less work.

11) We actually do use some of Jenkins advanced features, and it
offers quite a lot more than just a visual view of the last failure.
As a quick overview:
    a) Tracked history for tests (you can determine if a single test
is flaky and view a graph of its pass/fail history).

Please see section 3.3.2, which discusses possible ways to deal with flaky tests. IMHO, the key feature and our ultimate goal is "let's handle flaky tests efficiently", not "let's have a graph of failing tests" (how would that work with the non-linear history of pre-merge CI?).

    b) Log parsing to track the history of compiler warnings and other
matters of significance (this is fully configurable based on regexes)

That's in section 3.3.3. One way to use this is to make the build warning-free on one well-known platform and enforce -Werror there.

    c) Integrated cppcheck and code coverage reports, actively used by
some projects within KDE.

The Zuul-based CI setup launches KDE's existing build scripts and delivers their output. I chose to disable cppcheck for simplicity, and because none of the projects currently in Gerrit are covered by Jenkins' cppcheck on build.kde.org at this time. There is no reason for not enabling cppcheck runs again, of course. When I last looked at it, however, the include paths did not seem to be passed properly and the data I got back were clearly bogus, so I decided to skip it for now. The same applies to coverage reports. Both will be provided, of course.

    d) Intelligent dashboards which allow you to get an overview of a
number of jobs easily.

Bear in mind that these intelligent dashboards can be set up by anyone
and are able to filter on a number of conditions. They can also
provide RSS feeds and update automatically when a build completes.

How would Zuul offer any of this? And how custom would this all have
to be? Custom == maintenance cost.

The report explicitly acknowledges the need for future work on this status matrix, and proposes how to get there (section 3.3.4).

Regarding the maintenance costs, let's wait until it is ready and evaluate the maintenance burden at that point.

Addendum: the variations, etc. offered by the Zuul instance which
already exists in the Gerrit clone are made possible by the hardware
resources Jan has made available to that system. Jenkins is fully
capable of offering such builds as well with the appropriate setup,
some of which are already used - see the Multi Configuration jobs such
as the ones used by Trojita and Plasma Framework.

I believe that this is not about HW resources, but about how the services are configured. Does KDE's Jenkins, as-is, support building against a systemwide version of Qt, for example?

You've lost me i'm afraid with the third party integration - please
clarify what you're intending here.

I am pointing out that it is easy to plug a third-party testing system into Gerrit/Zuul, mainly due to the open APIs and the system's architecture. If e.g. one of the FreeBSD guys wanted to help, they would have a way of getting involved without an explicit action from sysadmins. To me, that lowers the barrier to entry a bit, and it also frees up some sysadmin time for more important tasks, so I think that it's a benefit of such a setup.

12) The tone of the way the event stream feature is mentioned makes it
sound like sysadmin actively prevents people from receiving the
information they need. We have never in the past prevented people from
receiving notifications they've requested - you yourself have one that
triggers builds on the OBS for Trojita.

It was never my intention to imply anything like that; sorry about that. That section says that the current approach requires manual effort from sysadmins and custom code. In contrast, the proposed setup enables anyone to listen for events in a machine-readable way, without any prior effort from sysadmins.

13) You've used the terminology "we" throughout your document. Who are
the other author(s)?

I think this is similar to the previous report. I received feedback about this paper from several developers. Due to the rather heated nature of the previous rounds of the discussion and some personal attacks, they preferred to not be credited as authors. The actual wording is mine, I wrote the text, so I'm listed as the only author.

Anyway, I hope that we'll be able to judge the merits of the individual proposals, and that this won't deteriorate into a popularity contest.

Cheers,
Jan

--
Trojitá, a fast Qt IMAP e-mail client -- http://trojita.flaska.net/
