Re: git interface to snapshot.debian.org
Joachim Breitner writes ("Re: git interface to snapshot.debian.org"):
> On Tuesday, 25.08.2015, at 21:47 +0100, Ian Jackson wrote:
> > (Although if a .dsc migrates between suites, the git history
> > is updated.)
>
> I don’t understand that. Is the git history really changed? Or are just
> the branches updated?

The branch appears to update (when viewed with dgit fetch, although it
isn't updated on the server).  However, a commit may have to be made so
that the new tip of that branch is a fast-forward from the old tip of
the same branch.

(As I write this I have second thoughts about whether this works
properly in all cases; /me makes a note to check.)

Ian.
Re: git interface to snapshot.debian.org
Hi,

On Tuesday, 25.08.2015, at 21:47 +0100, Ian Jackson wrote:
> (Although if a .dsc migrates between suites, the git history
> is updated.)

I don’t understand that. Is the git history really changed? Or are just
the branches updated?

Greetings,
Joachim

-- 
Joachim "nomeata" Breitner
  Debian Developer
  nome...@debian.org | ICQ# 74513189 | GPG-Keyid: F0FBF51F
  JID: nome...@joachim-breitner.de | http://people.debian.org/~nomeata
Re: git interface to snapshot.debian.org
Joachim Breitner writes ("Re: git interface to snapshot.debian.org"):
> On Tuesday, 25.08.2015, at 13:59 +0100, Ian Jackson wrote:
> > > If the answer is “Nothing is stopping us, it’s just that someone has
> > > to do it”, then I’m volunteering, as long as I can do most of it
> > > during DebConf.
> >
> > There are two problems that are stopping us doing this right away:
> >
> >  - Maybe the amount of data is too big to suddenly dump in the dgit
> >    git server (we should talk to DSA)
>
> As mentioned, I created a proof-of-concept bash script, and for example
> the (git gc’ed) repository of all history of ghc is 137MB.
> screen-message, as an example of a small package, amounts to 572KB. Not
> sure how best to extrapolate from that, though.

Right.  I can't see how to do it without actually trying it on the
whole archive.

I guess we could run a program that did this for each package, noted
the size, and then threw the resulting git branch away.  That would
use up some computer time and elapsed time but wouldn't require an
enormous scratch area.

> > > >- Parents: This is the interesting bit
> > > >  The set of parents should be the commits corresponding to any
> > > >  version mentioned in debian/changelog, pruned by those that
> > > >  are transitively reachable.
> > >
> > > Nice idea.
>
> At least for GHC, which had independently running branches in unstable
> and experimental for a while, with occasional merges from unstable to
> experimental, this worked fine.
>
> I guess dgit by itself does not do anything like that, but rather
> expects the right ancestry to come out of the “normal” git use of the
> maintainer.

Indeed.  (Although if a .dsc migrates between suites, the git history
is updated.)

> Anyway, I have postponed this project for now; too many other things
> are going on. I might get back to it in the future. In that case, I
> would probably first try to get nice git repositories from all of
> snapshot.d.o, independent of dgit. Once we have that, one can see if
> and how that is best integrated with dgit.

OK.

> (If you or someone else beats me to it: Even better :-))

Heh.

Thanks,
Ian.
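A minimal Python sketch of the measure-and-discard survey suggested in
this message. It assumes a hypothetical import_package(name, directory)
callable that builds the git repository for one package inside the given
directory; no such importer is specified in the thread, so that name and
signature are illustrative only.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable


def directory_size(root: Path) -> int:
    """Total size in bytes of all regular files below root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def survey_sizes(
    packages: Iterable[str],
    import_package: Callable[[str, Path], None],  # hypothetical importer
) -> Dict[str, int]:
    """Import each package into a throwaway directory and note its size."""
    sizes: Dict[str, int] = {}
    for name in packages:
        scratch = Path(tempfile.mkdtemp(prefix="snapshot-import-"))
        try:
            import_package(name, scratch)
            sizes[name] = directory_size(scratch)
        finally:
            # Discard the result right away so that only one package's
            # history occupies the scratch area at any time.
            shutil.rmtree(scratch)
    return sizes
```

Summing the returned values would give the archive-wide estimate; the
per-package numbers also show which packages dominate.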
Re: git interface to snapshot.debian.org
Hi,

On Tuesday, 25.08.2015, at 13:59 +0100, Ian Jackson wrote:
> > If the answer is “Nothing is stopping us, it’s just that someone has
> > to do it”, then I’m volunteering, as long as I can do most of it
> > during DebConf.
>
> There are two problems that are stopping us doing this right away:
>
>  - Maybe the amount of data is too big to suddenly dump in the dgit
>    git server (we should talk to DSA)

As mentioned, I created a proof-of-concept bash script, and for example
the (git gc’ed) repository of all history of ghc is 137MB.
screen-message, as an example of a small package, amounts to 572KB. Not
sure how best to extrapolate from that, though.

> > >- Parents: This is the interesting bit
> > >  The set of parents should be the commits corresponding to any
> > >  version mentioned in debian/changelog, pruned by those that
> > >  are transitively reachable.
> >
> > Nice idea.

At least for GHC, which had independently running branches in unstable
and experimental for a while, with occasional merges from unstable to
experimental, this worked fine.

I guess dgit by itself does not do anything like that, but rather
expects the right ancestry to come out of the “normal” git use of the
maintainer.

Anyway, I have postponed this project for now; too many other things
are going on. I might get back to it in the future. In that case, I
would probably first try to get nice git repositories from all of
snapshot.d.o, independent of dgit. Once we have that, one can see if
and how that is best integrated with dgit.

(If you or someone else beats me to it: Even better :-))

Greetings,
Joachim
Re: git interface to snapshot.debian.org
[ Added d-a@ldo for the dsa parts. ]

On Tue, 25 Aug 2015, Ian Jackson wrote:

> > If the answer is “Nothing is stopping us, it’s just that someone has
> > to do it”, then I’m volunteering, as long as I can do most of it
> > during DebConf.
>
> There are two problems that are stopping us doing this right away:
>
>  - Maybe the amount of data is too big to suddenly dump in the dgit
>    git server (we should talk to DSA)

Do you have an estimate of the resources required?  I'm sure we can
figure something out.

Cheers,
-- 
                            |  .''`.       ** Debian **
      Peter Palfrader       | : :' :      The  universal
 https://www.palfrader.org/ | `. `'      Operating System
                            |   `-    https://www.debian.org/
Re: git interface to snapshot.debian.org
Joachim Breitner writes ("git interface to snapshot.debian.org"):
> this is a follow-up to my question after the dgit talk today: It would
> be great to have a git view of a package’s history in Debian. There
> is some possible overlap with dgit in the sense that if everyone had
> been using dgit from the start, then we would have that, but dgit’s
> objectives are slightly different, so maybe my question could be posed
> and answered separately.

Hi.  I'm sorry that we didn't manage to talk about this idea of yours
properly at DC.  I think it is a good idea.

> If the answer is “Nothing is stopping us, it’s just that someone has to
> do it”, then I’m volunteering, as long as I can do most of it during
> DebConf.

There are two problems that are stopping us doing this right away:

 - Maybe the amount of data is too big to suddenly dump in the dgit
   git server (we should talk to DSA)

 - There is nothing which currently automatically updates the
   server-side dgit history when non-dgit uploads are made.  Such a
   thing could be produced, but I think it is essential to have it
   before we embark on a historical import.

There are some things that you would need my help (as admin of
dgit.d.o) with:

 - your histories would have to be stitched into the dgit git history
   for packages with existing dgit history

 - your ref updates can't be done by DDs in general, because the dgit
   git server only accepts updates made with dgit push.  So you would
   need me with my admin hat on to do the updates.

Your details seem largely sound:

>  * Every source package from snapshot.d.o, extracted with
>    dpkg-source -x as usual, produces a git tree object.
>    I’d probably simply ignore empty directories.

Right.

>    - Parents: This is the interesting bit
>      The set of parents should be the commits corresponding to any
>      version mentioned in debian/changelog, pruned by those that
>      are transitively reachable.

Nice idea.

>  * Every suite (unstable, jessie...) becomes a branch, pointing to the
>    corresponding commit

s/unstable/sid/, but yes.

>  * Optionally: One tag per version pointing to the corresponding
>    commit, for each version. Although maybe that would produce too
>    many tags...

We definitely want this.  The tag should be in DEP-14 format, which
makes it identical to existing dgit git tags.

Thanks,
Ian.
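For illustration, a small Python sketch of how a Debian version might be
turned into a DEP-14 style tag name. It covers only the substitutions I
am reasonably sure of (":" becomes "%", "~" becomes "_", and a "#" is
inserted between adjacent dots, since git rejects ".." in ref names);
the remaining DEP-14 corner cases are left out. The function name
dep14_tag and the "debian/" default prefix are assumptions made for this
example, not something stated in the thread.

```python
import re


def dep14_tag(version: str, vendor: str = "debian") -> str:
    """Map a Debian version string to a git tag name, DEP-14 style."""
    # Characters that git does not allow in ref names get substitutes.
    mangled = version.replace(":", "%").replace("~", "_")
    # ".." is also invalid in a ref name: put '#' between adjacent dots.
    mangled = re.sub(r"\.(?=\.)", ".#", mangled)
    return f"{vendor}/{mangled}"


print(dep14_tag("1:2.0~rc1-1"))
```

With the rules above, an epoch and a tilde both survive in a form that
can be reversed, so the original version can be recovered from the tag.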
Re: git interface to snapshot.debian.org
Hi,

On Thursday, 20.08.2015, at 11:03 +0200, Marco d'Itri wrote:
> > I have not thought of recreating the history of the upstream versions
> > correctly. I mostly care about the “what was in Debian” aspect of
> > history, but it should not hurt to have the upstream branches as well.
> I am not sure then that I see how these snapshots would be more useful
> than just downloading the source packages.
> If you are just looking at adding a git interface to snapshot.d.o. then
> maybe git-annex would be more appropriate while using far fewer
> resources?

You can do things like "git blame", or easily run "git diff
0.1..21.0.1-1~bpo", or whatever nice things you can do with your git
foo. And the git data store means you can fetch the whole history more
efficiently than running "debsnap" on all versions.

Greetings,
Joachim
Re: git interface to snapshot.debian.org
On Thu, Aug 20, 2015 at 11:03:45AM +0200, Marco d'Itri wrote:
> If you are just looking at adding a git interface to snapshot.d.o. then
> maybe git-annex would be more appropriate while using much less
> resources?

This is primarily targeted at serving as a common starting point for
packages that enter into the dgit workflow, which aims at having
something git-cloneable that is equivalent to the source package.

Unless there is a view of the package's history available, dgit would
(and currently does) start off with whichever state the package is in in
the archive at the point in time when dgit is first used with it.

chrysn
Re: git interface to snapshot.debian.org
On Aug 20, Joachim Breitner wrote:

> I have not thought of recreating the history of the upstream versions
> correctly. I mostly care about the “what was in Debian” aspect of
> history, but it should not hurt to have the upstream branches as well.

I am not sure then that I see how these snapshots would be more useful
than just downloading the source packages.
If you are just looking at adding a git interface to snapshot.d.o. then
maybe git-annex would be more appropriate while using far fewer
resources?

-- 
ciao,
Marco
Re: git interface to snapshot.debian.org
Hi,

On Thursday, 20.08.2015, at 05:40 +0200, Marco d'Itri wrote:
> On Aug 18, Joachim Breitner wrote:
>
> > this is a follow-up to my question after the dgit talk today: It would
> > be great to have a git view of a package’s history in Debian. There
>
> I have spent quite a lot of time in 2014 figuring out how to
> automatically import into git repositories over 15 years of the history
> of my packages, with upstream sources and Debian diffs properly merged
> and tags for everything. Have a look at:
>
> http://www.linux.it/~md/software/import-inn2.sh
> http://anonscm.debian.org/gitweb/?p=users/md/inn2.git

I have not thought of recreating the history of the upstream versions
correctly. I mostly care about the “what was in Debian” aspect of
history, but it should not hurt to have the upstream branches as well.
There might be tricky corner cases when trying to figure out what the
parent(s) of an upstream commit should be.

Anyway, although I have a bash prototype that has worked fine so far,
I’m moving this project further down my queue of nice things I want to
do, as there are more pressing tasks on my plate.

> Also: is there any point in continuing to automatically update these
> repositories for the packages which nowadays have useful git
> repositories?

The benefit of having git repos on snapshot.debian.org is that they
would be uniform, i.e. not dependent on the maintainer doing anything
in any particular way, which can be useful when you build tools on top
of that.

Greetings,
Joachim
Re: git interface to snapshot.debian.org
On Aug 18, Joachim Breitner wrote:

> this is a follow-up to my question after the dgit talk today: It would
> be great to have a git view of a package’s history in Debian. There

I have spent quite a lot of time in 2014 figuring out how to
automatically import into git repositories over 15 years of the history
of my packages, with upstream sources and Debian diffs properly merged
and tags for everything. Have a look at:

http://www.linux.it/~md/software/import-inn2.sh
http://anonscm.debian.org/gitweb/?p=users/md/inn2.git

A significant issue, which I decided not to solve, is how to handle the
upstream tree of old packages which use DBS.
(And while I have been able to recover some releases which are not in
snapshot.d.o, sadly some very old ones have been lost forever.)

Also: is there any point in continuing to automatically update these
repositories for the packages which nowadays have useful git
repositories?

-- 
ciao,
Marco
Re: git interface to snapshot.debian.org
Hi,

On Tuesday, 18.08.2015, at 15:36 +0200, Joachim Breitner wrote:
> I really like to have a DAG that represents the history of the
> package, and I don’t think it is a big complication. The bug tracker
> also does it this way, I believe, to produce these nice graphs in the
> corner.

Attached is the result of my prototype. Not too complicated after all.

Greetings,
Joachim
Re: git interface to snapshot.debian.org
[Dropping Peter from CC]

Hi,

On Tuesday, 18.08.2015, at 15:22 +0200, Thomas Koch wrote:

> >  * Every source package from snapshot.d.o, extracted with
> >    dpkg-source -x as usual, produces a git tree object.
> >    I’d probably simply ignore empty directories.
> Please add a trailer line in the commit message that can be used as an
> argument to mkdir -p to recreate the directories.

My current code creates a tree object tagged with the version number,
and defers the creation of the commit objects till later, without
access to the unpacked source, so this feature requires a way to carry
over that information.

I could of course force an empty directory into the tree object,
following
http://stackoverflow.com/questions/11600871/git-repo-contains-an-empty-directory-what-happens/11600882#11600882
but unfortunately git does not cope well with that.

I’ll think about it.

> >  * Every source package also produces a git commit, with
> >    - Tree: the above
> >    - Author: top changelog entry
> >    - Date: also top changelog entry
> >    - Description summary: The version number
> >    - Description text: The top changelog entry.
> >    - Parents: This is the interesting bit
> >      The set of parents should be the commits corresponding to any
> >      version mentioned in debian/changelog, pruned by those that
> >      are transitively reachable.
> >
> >      This ensures that we get a nice git DAG for things like packages
> >      that have been experimental for a while, merging from unstable
> >      repeatedly.
> >
> >      The order of parents could correspond to the order in
> >      debian/changelog, so that the second changelog entry becomes
> >      the first parent.
>
> Since you see the complication: Why not have no parents at all? Just
> have tags that point to orphan commits. One can still use diff. If you
> should need a branch for some reason (I doubt it), then have a tool to
> order the tags with dpkg --compare-versions and write new commits that
> form a branch.

I really like to have a DAG that represents the history of the package,
and I don’t think it is a big complication. The bug tracker also does it
this way, I believe, to produce these nice graphs in the corner. Also,
"git merge-base --independent" does the heavy lifting of this logic for
us.

> Advantages:
> - No need to think about the right ordering

I’m willing to think :-)

> - Problematic versions can be removed any time

I’m willing to have the commit ids change in this case.

> - Users can fetch just one specific tag without downloading other
>   versions. This might make a difference with packages with lots of
>   binary data.

You can still use a shallow clone ("git clone --depth 1"), can’t you?

> >  * Every suite (unstable, jessie...) becomes a branch, pointing to the
> >    corresponding commit
> Just encode the suite in the tag name: PACKAGE/SUITE/VERSION:
> libcgi-application-plugin-authorization-perl/experimental/3:5.4_g424242+5-2bpo

That does not look right: Either I want the version in experimental, or
I want a specific version. Also, how do you get backports versions into
experimental? ;-)

Greetings,
Joachim
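To make the pruning rule concrete, here is a rough in-memory Python
analogue of what "git merge-base --independent" computes: from a list of
candidate parents, drop every one that is reachable from another
candidate. The dictionary-based commit graph and the version strings
used as commit names are assumptions for this sketch; a real importer
would presumably ask git itself.

```python
from typing import Dict, List, Set


def ancestors(commit: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All commits reachable from commit, not counting commit itself."""
    seen: Set[str] = set()
    stack = list(parents.get(commit, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def independent(candidates: List[str],
                parents: Dict[str, List[str]]) -> List[str]:
    """Keep only candidates not reachable from another candidate."""
    reachable: Set[str] = set()
    for candidate in candidates:
        reachable |= ancestors(candidate, parents)
    # Order is preserved, so changelog order decides the first parent.
    return [c for c in candidates if c not in reachable]


# Hypothetical history: unstable has 1.0-1 .. 1.0-3, experimental 1.1-1
# was based on 1.0-2. A new upload mentions all four in its changelog.
history = {"1.0-2": ["1.0-1"], "1.0-3": ["1.0-2"], "1.1-1": ["1.0-2"]}
print(independent(["1.1-1", "1.0-3", "1.0-2", "1.0-1"], history))
```

In this example only 1.1-1 and 1.0-3 remain as parents, which gives the
merge commit one would expect for an experimental upload that merged the
current unstable version.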
Re: git interface to snapshot.debian.org
On Tuesday 18 August 2015 11:15:17 Joachim Breitner wrote:
> Hi,
>
> this is a follow-up to my question after the dgit talk today: It would
> be great to have a git view of a package’s history in Debian. There
> is some possible overlap with dgit in the sense that if everyone had
> been using dgit from the start, then we would have that, but dgit’s
> objectives are slightly different, so maybe my question could be posed
> and answered separately.

I had the same thought:
- It's simple, easy to understand
- No need for a separate tool
- It's enough for many use cases

> SNIP
>  * Every source package from snapshot.d.o, extracted with
>    dpkg-source -x as usual, produces a git tree object.
>    I’d probably simply ignore empty directories.

Please add a trailer line in the commit message that can be used as an
argument to mkdir -p to recreate the directories.

>  * Every source package also produces a git commit, with
>    - Tree: the above
>    - Author: top changelog entry
>    - Date: also top changelog entry
>    - Description summary: The version number
>    - Description text: The top changelog entry.
>    - Parents: This is the interesting bit
>      The set of parents should be the commits corresponding to any
>      version mentioned in debian/changelog, pruned by those that
>      are transitively reachable.
>
>      This ensures that we get a nice git DAG for things like packages
>      that have been experimental for a while, merging from unstable
>      repeatedly.
>
>      The order of parents could correspond to the order in
>      debian/changelog, so that the second changelog entry becomes
>      the first parent.

Since you see the complication: Why not have no parents at all? Just
have tags that point to orphan commits. One can still use diff. If you
should need a branch for some reason (I doubt it), then have a tool to
order the tags with dpkg --compare-versions and write new commits that
form a branch.
The tag name should be namespaced with the package name to allow
different packages to coexist in one repo.

Advantages:
- No need to think about the right ordering
- Problematic versions can be removed any time
- Users can fetch just one specific tag without downloading other
  versions. This might make a difference with packages with lots of
  binary data.

>    These rules should, unless suddenly new historic packages appear,
>    ensure that we get identical git hashes if we re-run this tool,
>    which is good.

This is not an issue with my proposal above.

>  * Every suite (unstable, jessie...) becomes a branch, pointing to the
>    corresponding commit

Just encode the suite in the tag name: PACKAGE/SUITE/VERSION:
libcgi-application-plugin-authorization-perl/experimental/3:5.4_g424242+5-2bpo

>  * Optionally: One tag per version pointing to the corresponding
>    commit, for each version. Although maybe that would produce too
>    many tags...

:-)
git interface to snapshot.debian.org
Hi,

this is a follow-up to my question after the dgit talk today: It would
be great to have a git view of a package’s history in Debian. There
is some possible overlap with dgit in the sense that if everyone had
been using dgit from the start, then we would have that, but dgit’s
objectives are slightly different, so maybe my question could be posed
and answered separately.

There is precedent for what I want: http://hdiff.luite.com/ is a service
that imports every Haskell package upload into a git repository, and
provides a cgit interface to it. This has been very useful to me as a
tool to investigate what has happened when, and to easily view diffs.

Now snapshot.debian.org already contains all the data that should go
into these git repositories. What would stop us from importing all of
the source packages into per-package git repositories? Given that it’s
only source and there is compression, I would expect the resource usage
to be acceptable.

If the answer is “Nothing is stopping us, it’s just that someone has to
do it”, then I’m volunteering, as long as I can do most of it during
DebConf. Peter, what do you think? I probably do not need more than
access to snapshot.debian.org and a directory there to work on.

Technically, this is how I would do it. I phrase it in terms of the git
data model, and not in terms of the git commands that produce it, as
that gives a cleaner specification.

 * Every source package from snapshot.d.o, extracted with
   dpkg-source -x as usual, produces a git tree object.
   I’d probably simply ignore empty directories.
 * Every source package also produces a git commit, with
   - Tree: the above
   - Author: top changelog entry
   - Date: also top changelog entry
   - Description summary: The version number
   - Description text: The top changelog entry.
   - Parents: This is the interesting bit
     The set of parents should be the commits corresponding to any
     version mentioned in debian/changelog, pruned by those that
     are transitively reachable.

     This ensures that we get a nice git DAG for things like packages
     that have been experimental for a while, merging from unstable
     repeatedly.

     The order of parents could correspond to the order in
     debian/changelog, so that the second changelog entry becomes
     the first parent.

   These rules should, unless suddenly new historic packages appear,
   ensure that we get identical git hashes if we re-run this tool,
   which is good.
 * Every suite (unstable, jessie...) becomes a branch, pointing to the
   corresponding commit
 * Optionally: One tag per version pointing to the corresponding
   commit, for each version. Although maybe that would produce too
   many tags...

Greetings,
Joachim
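As a rough illustration of the commit metadata rules in this proposal,
the following Python sketch pulls the version, author, date and text out
of the top debian/changelog entry. It assumes the usual changelog layout
(a "package (version) suite; ..." header and a " -- Name <mail>  date"
trailer); the function name top_entry and the returned dictionary keys
are made up for this example, and the sample entry is invented.

```python
import re
from email.utils import parsedate_to_datetime

HEADER = re.compile(r"^(?P<source>\S+) \((?P<version>[^)]+)\) [^;]+;")
# Author and date are separated by two spaces in the trailer line.
TRAILER = re.compile(r"^ -- (?P<author>.+?)  (?P<date>.+)$")


def top_entry(changelog: str) -> dict:
    """Return commit metadata taken from the first changelog entry."""
    lines = changelog.splitlines()
    header = HEADER.match(lines[0]) if lines else None
    if header is None:
        raise ValueError("no changelog header on the first line")
    body = []
    for line in lines[1:]:
        trailer = TRAILER.match(line)
        if trailer:
            return {
                "summary": header["version"],   # description summary
                "author": trailer["author"],    # commit author
                "date": parsedate_to_datetime(trailer["date"]),
                "text": "\n".join(body).strip(),  # description text
            }
        body.append(line)
    raise ValueError("no trailer line found for the top entry")


sample = (
    "hello (2.10-1) unstable; urgency=medium\n"
    "\n"
    "  * New upstream release.\n"
    "\n"
    " -- Jane Doe <jane@example.org>  Tue, 18 Aug 2015 11:15:17 +0200\n"
)
print(top_entry(sample)["summary"])
```

Taking every field from the changelog, rather than from the import time
or the importing user, is what would keep the commit hashes stable
across re-runs of such a tool.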