Re: git interface to snapshot.debian.org

2015-08-26 Thread Ian Jackson
Joachim Breitner writes ("Re: git interface to snapshot.debian.org"):
> On Tuesday, 25.08.2015, at 21:47 +0100, Ian Jackson wrote:
> > (Although if a .dsc migrates between suites, the git history
> > is updated.)
> 
> I don’t understand that. Is the git history really changed? Or are
> just the branches updated?

The branch appears to update (when viewed with dgit fetch, although it
isn't updated on the server).  However, a commit may have to be made so
that the new tip of that branch is a fast-forward from the old tip of
the same branch.

(As I write this I have second thoughts about whether this works
properly in all cases; /me makes a note to check.)
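
The fast-forward requirement described above can be illustrated with a small
sketch. This is not dgit's implementation; it only models history as a
dictionary from commit name to parent list, and the function names and the
`merge_name` argument are hypothetical. The assumption is that, when the new
tip does not descend from the old one, a synthetic merge commit with both as
parents restores the fast-forward property.

```python
def is_ancestor(parents, ancestor, commit):
    """Return True if `ancestor` is reachable from `commit` via parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return False


def advance_branch(parents, old_tip, new_tip, merge_name):
    """Return a tip that is a fast-forward from `old_tip`.

    If `new_tip` already descends from `old_tip`, it is used directly.
    Otherwise a synthetic merge commit (hypothetical name `merge_name`)
    with parents [new_tip, old_tip] is recorded and returned.
    """
    if is_ancestor(parents, old_tip, new_tip):
        return new_tip
    parents[merge_name] = [new_tip, old_tip]
    return merge_name
```

In a real repository the synthetic commit would carry the tree of the new
tip, so that the branch content matches the .dsc that migrated.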

Ian.



Re: git interface to snapshot.debian.org

2015-08-25 Thread Joachim Breitner
Hi,

On Tuesday, 25.08.2015, at 21:47 +0100, Ian Jackson wrote:
> (Although if a .dsc migrates between suites, the git history
> is updated.)

I don’t understand that. Is the git history really changed? Or are just
the branches updated?



Greetings,
Joachim
-- 
Joachim "nomeata" Breitner
Debian Developer
  nome...@debian.org | ICQ# 74513189 | GPG-Keyid: F0FBF51F
  JID: nome...@joachim-breitner.de | http://people.debian.org/~nomeata




Re: git interface to snapshot.debian.org

2015-08-25 Thread Ian Jackson
Joachim Breitner writes ("Re: git interface to snapshot.debian.org"):
> On Tuesday, 25.08.2015, at 13:59 +0100, Ian Jackson wrote:
> > > If the answer is „Nothing is stopping, just that someone has to do it“,
> > > then I’m volunteering, as long as I can do most of it during DebConf.
> > 
> > There are two problems that are stopping us doing this right away:
> > 
> >   - Maybe the amount of data is too big to suddenly dump in the dgit
> > git server (we should talk to DSA)
> 
> As mentioned, I created a proof-of-concept bash script; for example,
> the (git gc’ed) repository with the whole history of ghc is 137 MB.
> screen-message, as an example of a small package, amounts to 572 KB.
> I'm not sure how best to extrapolate from that, though.

Right.  I can't see how to do it without actually trying it on the
whole archive.

I guess we could run a program that did this for each package, noted
the size, and then threw the resulting git branch away.  That would
use up some computer time and elapsed time but wouldn't require an
enormous scratch area.
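
A minimal sketch of such a measuring run, assuming a hypothetical callback
`import_package(name, workdir)` that stands in for the real importer and
leaves a git repository in `workdir`. Only one package is on disk at any
time, which is the point of the approach described above.

```python
import os
import shutil
import tempfile


def directory_size(path):
    """Total size in bytes of all regular files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def measure_packages(packages, import_package):
    """Import each package into a scratch directory, note its size, discard it.

    `import_package` is a hypothetical stand-in for the actual import tool.
    """
    sizes = {}
    for name in packages:
        workdir = tempfile.mkdtemp(prefix="snapshot-import-")
        try:
            import_package(name, workdir)
            sizes[name] = directory_size(workdir)
        finally:
            shutil.rmtree(workdir)  # throw the result away again
    return sizes
```

Summing the returned values gives the archive-wide estimate; the per-package
numbers also show which packages dominate.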

> > > >- Parents: This is the interesting bit
> > > >  The set of parents should be the commits corresponding to any
> > > >  version mentioned in debian/changelog, pruned by those that
> > > >  are transitively reachable.
> > > 
> > > Nice idea.
> 
> At least for GHC, which had independently running branches in unstable
> and experimental for a while, with occasional merges from unstable to
> experimental, this worked fine.
> 
> I guess dgit by itself does not do anything like that, but rather
> expects the right ancestry to come out of the „normal“ git use of the
> maintainer.

Indeed.  (Although if a .dsc migrates between suites, the git history
is updated.)

> Anyway, I postponed this project for now; too many other things going
> on. I might get back to it in the future. In that case, I would
> probably first try to get nice git repositories from all of
> snapshot.d.o, independent of dgit. Once we have that, one can see if
> and how that is best integrated with dgit.

OK.

> (If you or someone else beats me to it: Even better :-))

Heh.

Thanks,
Ian.



Re: git interface to snapshot.debian.org

2015-08-25 Thread Joachim Breitner
Hi,

On Tuesday, 25.08.2015, at 13:59 +0100, Ian Jackson wrote:
> > If the answer is „Nothing is stopping, just that someone has to do it“,
> > then I’m volunteering, as long as I can do most of it during DebConf.
> 
> There are two problems that are stopping us doing this right away:
> 
>   - Maybe the amount of data is too big to suddenly dump in the dgit
> git server (we should talk to DSA)


As mentioned, I created a proof-of-concept bash script; for example,
the (git gc’ed) repository with the whole history of ghc is 137 MB.
screen-message, as an example of a small package, amounts to 572 KB.
I'm not sure how best to extrapolate from that, though.


> > >- Parents: This is the interesting bit
> > >  The set of parents should be the commits corresponding to any
> > >  version mentioned in debian/changelog, pruned by those that
> > >  are transitively reachable.
> > 
> > Nice idea.

At least for GHC, which had independently running branches in unstable
and experimental for a while, with occasional merges from unstable to
experimental, this worked fine.

I guess dgit by itself does not do anything like that, but rather
expects the right ancestry to come out of the „normal“ git use of the
maintainer.



Anyway, I postponed this project for now; too many other things going
on. I might get back to it in the future. In that case, I would
probably first try to get nice git repositories from all of
snapshot.d.o, independent of dgit. Once we have that, one can see if
and how that is best integrated with dgit.

(If you or someone else beats me to it: Even better :-))


Greetings,
Joachim




Re: git interface to snapshot.debian.org

2015-08-25 Thread Peter Palfrader
[ Added d-a@ldo for the dsa parts. ]

On Tue, 25 Aug 2015, Ian Jackson wrote:

> > If the answer is „Nothing is stopping, just that someone has to do it“,
> > then I’m volunteering, as long as I can do most of it during DebConf.
> 
> There are two problems that are stopping us doing this right away:
> 
>   - Maybe the amount of data is too big to suddenly dump in the dgit
> git server (we should talk to DSA)

Do you have an estimate on the resources required?  I'm sure we can
figure something out.

Cheers,
-- 
                            |  .''`.       ** Debian **
      Peter Palfrader       | : :' :      The  universal
 https://www.palfrader.org/ | `. `'      Operating System
                            |   `-    https://www.debian.org/



Re: git interface to snapshot.debian.org

2015-08-25 Thread Ian Jackson
Joachim Breitner writes ("git interface to snapshot.debian.org"):
> this is a follow-up to my question after the dgit talk today: It would
> be great to have a git view of a package’s history in Debian. There
> is some possible overlap with dgit in the sense that if everyone had
> been using dgit from the start, then we would have that, but dgit’s
> objectives are slightly different, so maybe my question could be posed
> and answered separately.

Hi.  I'm sorry that we didn't manage to talk about this idea of yours
properly at DC.  I think it is a good idea.

> If the answer is „Nothing is stopping, just that someone has to do it“,
> then I’m volunteering, as long as I can do most of it during DebConf.

There are two problems that are stopping us doing this right away:

  - Maybe the amount of data is too big to suddenly dump in the dgit
git server (we should talk to DSA)

  - There is nothing which currently automatically updates the
server-side dgit history when non-dgit uploads are made.  Such a
thing could be produced, but I think it is essential to have it
before we embark on a historical import.

There are some things that you would need my help (as admin of
dgit.d.o) with:

  - your histories would have to be stitched into the dgit git history
for packages with existing dgit history

  - your ref updates can't be done by DDs in general, because the dgit
git server only accepts updates made with dgit push.  So you would
need me with my admin hat on to do the updates.


Your details seem largely sound:

>  * Every source package from snapshot.d.o, extracted with
>    dpkg-source -x as usual, produces a git tree object.
>    I’d probably simply ignore empty directories.

Right.

>- Parents: This is the interesting bit
>  The set of parents should be the commits corresponding to any 
>  version mentioned in debian/changelog, pruned by those that
>  are transitively reachable.

Nice idea.
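
As an illustration of where the candidate parents come from, here is a
minimal sketch that reads the entry header lines of a debian/changelog. The
regular expression, the function names and the sample changelog are
assumptions made for this example, not part of the prototype; a real
importer would more likely use an existing changelog parser.

```python
import re

# Header line of a changelog entry: "package (version) suite...; key=value"
HEADER = re.compile(r"^[A-Za-z0-9][A-Za-z0-9+.-]* \(([^()\s]+)\)(?: [^\s;]+)+;")


def changelog_versions(text):
    """Return the versions of all changelog entries, newest first."""
    versions = []
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            versions.append(match.group(1))
    return versions


def candidate_parents(text):
    """Every version mentioned below the top entry, in changelog order."""
    return changelog_versions(text)[1:]


SAMPLE = """\
hello (2.10-2) experimental; urgency=medium

  * Merge 2.10-1 from unstable.

 -- Jane Doe <jane@example.org>  Tue, 18 Aug 2015 11:15:17 +0200

hello (2.10-1) unstable; urgency=medium

  * New upstream release.

 -- Jane Doe <jane@example.org>  Mon, 10 Aug 2015 09:00:00 +0200

hello (2.9-3) unstable; urgency=low

  * Fix typo in description.

 -- Jane Doe <jane@example.org>  Wed, 01 Jul 2015 12:00:00 +0200
"""

print(candidate_parents(SAMPLE))  # ['2.10-1', '2.9-3']
```

The resulting list still contains versions that are ancestors of other
listed versions; pruning those is a separate step.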

>  * Every suite (unstable, jessie...) becomes a branch, pointing to the
>corresponding commit

s/unstable/sid/, but yes.

>  * Optionally: One tag per version pointing to the corresponding
>commit, for each version. Although maybe that would produce too
>many tags...

We definitely want this.  The tag should be in DEP-14 format, which
makes it identical to existing dgit git tags.
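
For reference, a minimal sketch of the DEP-14 version mangling as I
understand it: colons become `%`, tildes become `_`, and a `#` is inserted
between adjacent dots and after a trailing dot or before a trailing `lock`.
The function names are hypothetical, and the `debian/` vendor prefix is an
assumption taken from DEP-14 rather than from this thread.

```python
import re


def dep14_mangle(version):
    """Mangle a Debian version so that it is usable inside a git ref name."""
    mangled = version.replace(":", "%").replace("~", "_")
    # git refuses "..", a trailing "." and a trailing ".lock" in ref names.
    return re.sub(r"\.(?=\.|$|lock$)", ".#", mangled)


def dep14_tag(version, vendor="debian"):
    """Tag name for a package version, e.g. debian/1%2.0_rc1-1."""
    return f"{vendor}/{dep14_mangle(version)}"


print(dep14_tag("1:2.0~rc1-1"))  # debian/1%2.0_rc1-1
```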

Thanks,
Ian.



Re: git interface to snapshot.debian.org

2015-08-20 Thread Joachim Breitner
Hi,

On Thursday, 20.08.2015, at 11:03 +0200, Marco d'Itri wrote:
> > I have not thought of recreating the history of the upstream versions
> > correctly. I mostly care about the “what was in Debian” aspect of
> > history, but it should not hurt to have the upstream branches as well.
> I am not sure then that I see how these snapshots would be more useful
> than just downloading the source packages.
> If you are just looking at adding a git interface to snapshot.d.o. then
> maybe git-annex would be more appropriate while using far fewer
> resources?

You can do things like "git blame", or easily run
"git diff 0.1..21.0.1-1~bpo", or whatever nice things you can do with
your git foo. And the git data store means you can fetch the whole
history more efficiently than running "debsnap" on all versions.

Greetings,
Joachim



Re: git interface to snapshot.debian.org

2015-08-20 Thread chrysn
On Thu, Aug 20, 2015 at 11:03:45AM +0200, Marco d'Itri wrote:
> If you are just looking at adding a git interface to snapshot.d.o. then
> maybe git-annex would be more appropriate while using far fewer
> resources?

This is primarily targeted at serving as a common starting point for
packages that enter into the dgit workflow, which aims at having
something git-cloneable that is equivalent to the source package.

Unless there is a view of the package's history available, dgit would
(and currently does) start off with whichever state the package is in in
the archive at the point in time when dgit is first used with it.

chrysn




Re: git interface to snapshot.debian.org

2015-08-20 Thread Marco d'Itri
On Aug 20, Joachim Breitner  wrote:

> I have not thought of recreating the history of the upstream versions
> correctly. I mostly care about the “what was in Debian” aspect of
> history, but it should not hurt to have the upstream branches as well.
I am not sure then that I see how these snapshots would be more useful
than just downloading the source packages.
If you are just looking at adding a git interface to snapshot.d.o. then
maybe git-annex would be more appropriate while using far fewer
resources?

-- 
ciao,
Marco




Re: git interface to snapshot.debian.org

2015-08-20 Thread Joachim Breitner
Hi,

On Thursday, 20.08.2015, at 05:40 +0200, Marco d'Itri wrote:
> On Aug 18, Joachim Breitner  wrote:
> 
> > this is a follow-up to my question after the dgit talk today: It would
> > be great to have a git view of a package’s history in Debian. There
>
> I spent quite a lot of time in 2014 figuring out how to automatically
> import over 15 years of my packages' history into git repositories,
> with upstream sources and Debian diffs properly merged and tags for
> everything. Have a look at:
> 
> http://www.linux.it/~md/software/import-inn2.sh
> http://anonscm.debian.org/gitweb/?p=users/md/inn2.git

I have not thought of recreating the history of the upstream versions
correctly. I mostly care about the “what was in Debian” aspect of
history, but it should not hurt to have the upstream branches as well.
There might be tricky corner cases when trying to figure out what the
parent(s) of an upstream commit should be.

Anyways, although I have a bash prototype that worked fine so far, I’m
moving this project further down my queue of nice things I want to do,
as there are more pressing tasks on my plate.

> Also: is there any point in continuing to automatically update these 
> repositories for the packages which nowadays have useful git 
> repositories?

The benefit of having git repos on snapshot.debian.org is that they
would be uniform, i.e. not dependent on the maintainer doing anything
in any particular way, which can be useful when you build tools on top
of that.

Greetings,
Joachim


Re: git interface to snapshot.debian.org

2015-08-19 Thread Marco d'Itri
On Aug 18, Joachim Breitner  wrote:

> this is a follow-up to my question after the dgit talk today: It would
> be great to have a git view of a package’s history in Debian. There

I spent quite a lot of time in 2014 figuring out how to automatically
import over 15 years of my packages' history into git repositories,
with upstream sources and Debian diffs properly merged and tags for
everything. Have a look at:

http://www.linux.it/~md/software/import-inn2.sh
http://anonscm.debian.org/gitweb/?p=users/md/inn2.git

A significant issue, which I decided to not solve, is how to handle the 
upstream tree of old packages which use DBS.
(And while I have been able to recover some releases which are not in
snapshot.d.o, sadly some very old ones have been lost forever.)

Also: is there any point in continuing to automatically update these 
repositories for the packages which nowadays have useful git 
repositories?

-- 
ciao,
Marco




Re: git interface to snapshot.debian.org

2015-08-18 Thread Joachim Breitner
Hi,


On Tuesday, 18.08.2015, at 15:36 +0200, Joachim Breitner wrote:
> I would really like to have a DAG that represents the history of the
> package, and I don’t think it is a big complication. The bug tracker
> also does it this way, I believe, to produce these nice graphs in the
> corner.

Attached is the result of my prototype. Not too complicated after all.

Greetings,
Joachim


Re: git interface to snapshot.debian.org

2015-08-18 Thread Joachim Breitner
[Dropping Peter from CC]

Hi,


On Tuesday, 18.08.2015, at 15:22 +0200, Thomas Koch wrote:
> 
> >  * Every source package from snapshot.d.o, extracted with
> >    dpkg-source -x as usual, produces a git tree object.
> >    I’d probably simply ignore empty directories.
> Please add a trailer line in the commit message that can be used as an
> argument to mkdir -p to recreate the directories.

My current code creates a tree object tagged with the version number,
and defers the creation of the commit objects till later, without
access to the unpacked source, so this feature requires a way to carry
over that information.

I could of course force an empty directory into the tree object,
following
http://stackoverflow.com/questions/11600871/git-repo-contains-an-empty-directory-what-happens/11600882#11600882
but unfortunately git does not cope well with that.

I’ll think about it.

> >  * Every source package also produces a git commit, with
> >- Tree: the above
> >- Author: top changelog entry
> >- Date: also top changelog entry
> >- Description summary: The version number
> >- Description text: The top changelog entry.
> >- Parents: This is the interesting bit
> >  The set of parents should be the commits corresponding to any
> >  version mentioned in debian/changelog, pruned by those that
> >  are transitively reachable.
> > 
> >  This ensures that we get a nice git DAG for things like packages
> >  that have been experimental for a while, merging from unstable
> >  repeatedly.
> > 
> >  The order of parents could correspond to the order in
> >  debian/changelog, so that the second changelog entry becomes
> >  the first parent.
> 
> Since you see the complication: Why not have no parents at all? Just have
> tags that point to orphan commits. One can still use diff. If you should
> need a branch for some reason (I doubt it), then have a tool to order the
> tags with dpkg --compare-versions and write new commits that form a branch.

I would really like to have a DAG that represents the history of the
package, and I don’t think it is a big complication. The bug tracker
also does it this way, I believe, to produce these nice graphs in the
corner.


Also, "git merge-base --independent" does the heavy lifting of this logic for 
us.
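
To make the pruning rule concrete without needing a repository, here is a
minimal sketch of the same idea on a plain parent map: a candidate is dropped
if it is reachable from any other candidate. The function names are
hypothetical, and this is a model of the rule, not a reimplementation of
git's own algorithm.

```python
def ancestors(parents, commit):
    """All commits reachable from `commit` through parent links, excluding it."""
    seen, stack = set(), list(parents.get(commit, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def independent(parents, candidates):
    """Drop every candidate that is reachable from another candidate.

    The input order (here: changelog order) is preserved.
    """
    reachable = set()
    for commit in candidates:
        reachable |= ancestors(parents, commit)
    return [commit for commit in candidates if commit not in reachable]


history = {
    "1.0-1": [],
    "1.0-2": ["1.0-1"],
    "1.1-1~exp1": ["1.0-1"],
}
# A new upload mentions all three versions in its changelog.
print(independent(history, ["1.1-1~exp1", "1.0-2", "1.0-1"]))
# ['1.1-1~exp1', '1.0-2']
```

With the experimental and unstable lines both kept as parents, the new
commit records exactly the merge that the changelog implies.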


> Advantages:
> - No need to think about the right ordering

I’m willing to think :-)

> - Problematic versions can be removed any time

I’m willing to have the commit ids change in that case.

> - Users can fetch just one specific tag without downloading other versions. 
> This might make a difference with packages with lots of binary data.

You can still use "git clone --depth 1", can’t you?

> >  * Every suite (unstable, jessie...) becomes a branch, pointing to the
> >corresponding commit
> Just encode the suite in the tag name: PACKAGE/SUITE/VERSION:
> libcgi-application-plugin-authorization-perl/experimental/3:5.4_g424242+5-2bpo

That does not look right: Either I want the version in experimental, or
I want a specific version.

Also, how do you get backports versions into experimental? ;-)

Greetings,
Joachim


Re: git interface to snapshot.debian.org

2015-08-18 Thread Thomas Koch
On Tuesday 18 August 2015 11:15:17 Joachim Breitner wrote:
> Hi,
> 
> this is a follow-up to my question after the dgit talk today: It would
> be great to have a git view of a package’s history in Debian. There
> is some possible overlap with dgit in the sense that if everyone had
> been using dgit from the start, then we would have that, but dgit’s
> objectives are slightly different, so maybe my question could be posed
> and answered separately.
I had the same thought:
- It's simple, easy to understand
- No need for a separate tool
- It's enough for many use cases

>  SNIP

>  * Every source package from snapshot.d.o, extracted with
>    dpkg-source -x as usual, produces a git tree object.
>    I’d probably simply ignore empty directories.
Please add a trailer line in the commit message that can be used as an argument
to mkdir -p to recreate the directories.
>  * Every source package also produces a git commit, with
>- Tree: the above
>- Author: top changelog entry
>- Date: also top changelog entry
>- Description summary: The version number
>- Description text: The top changelog entry.
>- Parents: This is the interesting bit
>  The set of parents should be the commits corresponding to any
>  version mentioned in debian/changelog, pruned by those that
>  are transitively reachable.
> 
>  This ensures that we get a nice git DAG for things like packages
>  that have been experimental for a while, merging from unstable
>  repeatedly.
> 
>  The order of parents could correspond to the order in
>  debian/changelog, so that the second changelog entry becomes
>  the first parent.
Since you see the complication: Why not have no parents at all? Just have tags
that point to orphan commits. One can still use diff. If you should need a
branch for some reason (I doubt it), then have a tool to order the tags with
dpkg --compare-versions and write new commits that form a branch.
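
A sketch of such an ordering tool in Python, for readers without dpkg at
hand. It follows the Debian version comparison rules as I understand them
(epoch, then upstream version, then revision; `~` sorts before everything,
letters before other characters, digit runs numerically). All function names
are hypothetical, and the real `dpkg --compare-versions` remains the
authority.

```python
import functools


def _order(ch):
    """Sort weight of one character in a non-digit stretch."""
    if ch == "~":
        return -1
    if ch == "" or ch.isdigit():
        return 0
    if ch.isalpha():
        return ord(ch)
    return ord(ch) + 256


def _compare_part(a, b):
    """Compare two upstream versions or two revisions; returns -1, 0 or 1."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit stretch: compare character by character.
        while (i < len(a) and not a[i].isdigit()) or (j < len(b) and not b[j].isdigit()):
            ca = _order(a[i]) if i < len(a) else 0
            cb = _order(b[j]) if j < len(b) else 0
            if ca != cb:
                return -1 if ca < cb else 1
            i += 1
            j += 1
        # Digit stretch: compare numerically (an empty run counts as 0).
        start_i, start_j = i, j
        while i < len(a) and a[i].isdigit():
            i += 1
        while j < len(b) and b[j].isdigit():
            j += 1
        na, nb = int(a[start_i:i] or "0"), int(b[start_j:j] or "0")
        if na != nb:
            return -1 if na < nb else 1
    return 0


def _split(version):
    """Split into (epoch, upstream, revision); missing epoch counts as 0."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a, b):
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _compare_part(ua, ub) or _compare_part(ra, rb)


def sort_versions(versions):
    """Oldest first, suitable for writing a linear branch of commits."""
    return sorted(versions, key=functools.cmp_to_key(compare_versions))
```

A linear branch built this way orders uploads by version only, so it cannot
express that an experimental line and an unstable line existed in parallel.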

The tag name should be namespaced with the package name to allow different 
packages to coexist in one repo.

Advantages:
- No need to think about the right ordering
- Problematic versions can be removed any time
- Users can fetch just one specific tag without downloading other versions. 
This might make a difference with packages with lots of binary data.

>    These rules should, unless suddenly new historic packages appear,
>    ensure that we get identical git hashes if we re-run this tool,
>    which is good.
This is not an issue with my proposal above.

>  * Every suite (unstable, jessie...) becomes a branch, pointing to the
>corresponding commit
Just encode the suite in the tag name: PACKAGE/SUITE/VERSION:
libcgi-application-plugin-authorization-perl/experimental/3:5.4_g424242+5-2bpo

>  * Optionally: One tag per version pointing to the corresponding
>commit, for each version. Although maybe that would produce too
>many tags...
:-)



git interface to snapshot.debian.org

2015-08-18 Thread Joachim Breitner
Hi,

this is a follow-up to my question after the dgit talk today: It would
be great to have a git view of a package’s history in Debian. There
is some possible overlap with dgit in the sense that if everyone had
been using dgit from the start, then we would have that, but dgit’s
objectives are slightly different, so maybe my question could be posed
and answered separately.

There is precedent for what I want: http://hdiff.luite.com/ is a service
that imports every Haskell package upload into a git repository, and
provides a cgit interface to it. This has been very useful to me as a
tool to investigate what has happened when, and to easily view diffs.

Now snapshot.debian.org already contains all the data that should go
into these git repositories. What would stop us from importing all of
the source packages into per-package git repositories?
Given that it’s only source and there is compression, I would expect
the resource usage to be acceptable.

If the answer is „Nothing is stopping, just that someone has to do it“,
then I’m volunteering, as long as I can do most of it during DebConf.
Peter, what do you think? I probably do not need more than access to
snapshot.debian.org and a directory there to work on.


Technically, this is how I would do it:
I phrase it in terms of the git data model, and not in terms of the git
commands that achieve it, as that gives a cleaner specification.

 * Every source package from snapshot.d.o, extracted with
   dpkg-source -x as usual, produces a git tree object.
   I’d probably simply ignore empty directories.
 * Every source package also produces a git commit, with
   - Tree: the above
   - Author: top changelog entry
   - Date: also top changelog entry
   - Description summary: The version number
   - Description text: The top changelog entry.
   - Parents: This is the interesting bit
     The set of parents should be the commits corresponding to any
     version mentioned in debian/changelog, pruned by those that
     are transitively reachable.

     This ensures that we get a nice git DAG for things like packages
     that have been experimental for a while, merging from unstable
     repeatedly.

     The order of parents could correspond to the order in
     debian/changelog, so that the second changelog entry becomes
     the first parent.

   These rules should, unless suddenly new historic packages appear,
   ensure that we get identical git hashes if we re-run this tool,
   which is good.
 * Every suite (unstable, jessie...) becomes a branch, pointing to the
   corresponding commit
 * Optionally: One tag per version pointing to the corresponding
   commit, for each version. Although maybe that would produce too
   many tags...
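
The claim about identical hashes follows from how git names objects: a commit
id is the SHA-1 of the commit's serialized fields. A minimal sketch of that
computation is below. The function name is hypothetical, and reusing the
changelog author and date as the committer is an assumption of this example;
the rules above only fix the author and the date.

```python
import hashlib


def commit_id(tree, parents, author, timestamp, tz, message):
    """SHA-1 object id git would assign to a commit with exactly these fields.

    `author` is "Name <email>", `timestamp` is seconds since the epoch and
    `tz` is an offset such as "+0200".
    """
    ident = f"{author} {timestamp} {tz}"
    lines = [f"tree {tree}"]
    lines += [f"parent {parent}" for parent in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    body = "\n".join(lines).encode("utf-8")
    header = b"commit %d\x00" % len(body)
    return hashlib.sha1(header + body).hexdigest()
```

Since nothing but the tree, the parents, the changelog identity and date, and
the message enters the hash, re-running the import over the same uploads
yields the same ids. A newly discovered historic upload changes the parent
list of later commits and therefore their ids.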


Greetings,
Joachim

