Re: Commit loss prevention

Kevin Fleming (BLOOMBERG/ 731 LEXIN) Tue, 12 Nov 2013 06:41:30 -0800

When you say 'canonical' in this proposal, do you mean the repositories used 
for making releases, or the repositories where development (and especially, 
pull requests) would be handled?


If it's the former, I could see that being worthwhile, especially if *nobody* 
has permissions to push to the canonical repositories; if a developer pushes 
code to the master branch of their repo on GitHub, they'd have to wait a short 
time for that update to be mirrored to the release repo before they could make 
a release. Of course, this would put extra pressure on the people who are 
maintaining the project infrastructure, to be sure that this mirroring process 
is working reliably all the time.

----- Original Message -----
From: jenkinsci-dev@googlegroups.com
To: jenkinsci-dev@googlegroups.com
At: Nov 12 2013 05:17:15

I think part of the issue is that our canonical repositories are on github...

I would favour jenkins-ci.org being master of its own destiny... hence I would 
recommend hosting canonical repos on project-owned hardware and using GitHub as a 
mirror of those canonical repositories... much like the way the ASF uses GitHub. 
That would allow us to implement policies such as preventing forced pushes to 
specific branches, etc...

Of course that would mean another pom.xml <scm> update, namely the 
<developerConnection> would point to the canonical repo while the <connection> 
would point to the github repo... (with some use of 
http://developer.github.com/v3/users/keys/#list-public-keys-for-a-user we 
should be able to let users just register their keys in github)

e.g. the <scm> details would look like:

  <scm>
    <connection>scm:git:git://github.com/jenkinsci/[plugin name]-plugin.git</connection>
    <developerConnection>scm:git:git.jenkins-ci.org:jenkinsci/[plugin name]-plugin.git</developerConnection>
    <url>http://github.com/jenkinsci/[plugin name]-plugin</url>
  </scm>

Maven will then do the "right thing" for pushing releases *even if you check out 
from github*... and we just have the canonical repos force push to github and 
put proper permission sets on the canonical repos... most developers will thus 
see no effective difference :-)


On 12 November 2013 06:25, Kohsuke Kawaguchi <k...@kohsuke.org> wrote:


Now that the commits have been recovered and things are almost back to normal, 
I think it's time to think about how to prevent this kind of incident in the 
future.

Our open commit access policy was partly made possible by the idea that any bad 
commits can always be rolled back. What I failed to think through was that 
changes to refs aren't themselves version controlled, and so it is possible to 
lose commits by incorrect ref manipulation, such as "git push -f", or by 
deleting a branch.
 
I still feel strongly that we should maintain the open commit access policy. 
This is how we've been operating for the longest time, and otherwise 
adding/removing developers to repositories would be prohibitively tedious.
 
So my proposal is to write a little program that uses the GitHub events API to 
keep track of push activities in our repositories. For every update to a ref in 
the repository, we can record the timestamp, the SHA1 before and after, and the 
user ID. We can maintain a text file for every ref in every repository, and the 
program can append lines to it. In other words, we effectively recreate a 
server-side reflog outside GitHub.
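
A minimal Python sketch of the per-ref log file described above. It only covers the "append a line" step and assumes the push details have already been pulled from GitHub. The function names, the directory layout, and the space-separated line format are illustrative assumptions, not part of the proposal.

```python
import time
from pathlib import Path
from urllib.parse import quote


def reflog_path(root, repo, ref):
    """Return the log file for one ref of one repository.

    The ref name is percent-encoded into a single file name so that
    "refs/heads/foo" and "refs/heads/foo/bar" never collide on disk.
    (Layout is an assumption made for this sketch.)
    """
    return Path(root) / repo / quote(ref, safe="")


def record_ref_update(root, repo, ref, before, after, user, timestamp=None):
    """Append one reflog-style line: timestamp, old SHA1, new SHA1, user ID."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    path = reflog_path(root, repo, ref)
    path.parent.mkdir(parents=True, exist_ok=True)
    line = f"{ts} {before} {after} {user}\n"
    # Append-only: earlier entries are never rewritten.
    with open(path, "a", encoding="utf-8") as f:
        f.write(line)
    return line
```

Because the file is only ever appended to, the previous value of a ref can always be read back from the last line, even after a forced push or a branch deletion.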
 
The program should also fetch commits, so that it has a local copy of every 
commit that ever landed on our repositories. Doing this also allows the program 
to detect non-fast-forward updates. It should warn us in that situation, and it 
should create a local ref on the old commit to prevent it from getting lost.
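
The non-fast-forward check can be sketched as follows, in Python. An update is a fast-forward exactly when the old commit is an ancestor of the new one; against a real clone, "git merge-base --is-ancestor" answers the same question. To keep the sketch self-contained, the commit graph is passed in as a plain dict mapping each SHA1 to its parents. The all-zero SHA1 for "created" and "deleted", the "refs/rescued/" prefix, and the function names are assumptions made for illustration.

```python
from collections import deque

# Assumed convention: an all-zero SHA1 means the ref did not exist / was deleted.
ZERO = "0" * 40


def is_ancestor(parents, ancestor, descendant):
    """Return True if `ancestor` is reachable from `descendant` via parent links."""
    seen = set()
    queue = deque([descendant])
    while queue:
        sha = queue.popleft()
        if sha == ancestor:
            return True
        if sha in seen:
            continue
        seen.add(sha)
        queue.extend(parents.get(sha, ()))
    return False


def classify_update(parents, before, after):
    """Classify a ref update as created, deleted, fast-forward or non-fast-forward."""
    if before == ZERO:
        return "created"
    if after == ZERO:
        return "deleted"
    if is_ancestor(parents, before, after):
        return "fast-forward"
    return "non-fast-forward"


def refs_to_keep(parents, before, after):
    """Return the local rescue refs to create so the old commit stays reachable."""
    if classify_update(parents, before, after) in ("deleted", "non-fast-forward"):
        return {f"refs/rescued/{before}": before}
    return {}
```

For example, with parents = {"c": ["b"], "b": ["a"], "x": ["a"]}, moving a ref from "b" to "c" is a fast-forward, while moving it from "c" to "x" is not, and refs_to_keep would pin "c" under a rescue ref.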
 
We can then make these repositories accessible via rsync to encourage people to 
mirror them for backup, or we can make them publicly accessible by hosting them 
on GitHub as well, although the latter could be confusing.
 
With a scheme like this, pushes can be safely recorded within a minute or so 
(and this number can go down even further if we use webhooks). If a data loss 
occurs before the program gets to record newly pushed commits, we should still 
be able to record who pushed afterward, to identify who has the commits that 
were lost. With such a small time window between the push and the record, the 
number of such lost commits should be low enough that we can recover them 
manually.
 
-- 
Kohsuke Kawaguchi

-- 
You received this message because you are subscribed to the Google Groups 
"Jenkins Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to jenkinsci-dev+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

