On 2019-08-11 01:55, Drew Parsons wrote:
Upstreams are starting to use git lfs in their git repos.  In some
cases the git-lfs reference files are retained in the source tarball
rather than being replaced with the actual files.  This happens for
instance with github repos (I gather it happens because the tarball is
generated with 'git archive' [1]).  An example is the mesh files [2]
in pygalmesh 0.4.0 [3].

This means gbp import-orig, used in the normal way (or "old" way) with
orig tarballs, will import lfs references, and dpkg-buildpackage will
proceed to attempt the build with those reference files, not with the
actual files.  So the build fails.

Actually 'gbp import-orig --pristine-tar' also fails, e.g.,

  pygalmesh$ gbp import-orig --uscan --pristine-tar
...
  Error downloading object: test/meshes/elephant.vtu (a11aa57): Smudge
error: Error downloading test/meshes/elephant.vtu
(a11aa572d612abfacace9a31c0e20c8a628bf4ffd50b1661e14790ae02c93b7b):
    [a11aa572d612abfacace9a31c0e20c8a628bf4ffd50b1661e14790ae02c93b7b]
Object does not exist on the server or you don't have permissions to
access it
...

So what is currently the best way to handle lfs files in upstream
tarballs?  Is git-buildpackage the only automated solution and those
not using it just need to learn how?  Or can debian/watch be written
to include a git-lfs pull when repacking a source tarball?  Any other
solutions apart from manual repacking?


I can answer part of my own question now, but it reinforces the other part: how should we handle LFS files?

As far as the source tarball and pristine-tar go, that can be fixed by changing debian/watch to track upstream git tags rather than upstream release tarballs, e.g.

  version=4
  opts="mode=git, gitmode=full, pgpmode=none" \
  https://github.com/nschloe/pygalmesh.git \
  refs/tags/v([\d\.]+) debian uupdate

The tarball generated this way by 'gbp import-orig --uscan --pristine-tar' contains the actual LFS files instead of their references.
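
One way to confirm that an unpacked tarball really contains the data files rather than leftover references is to scan it for LFS pointer files. The following is only a minimal sketch, assuming the standard git-lfs pointer format (a small text file whose first line is "version https://git-lfs.github.com/spec/v1", followed by an "oid sha256:..." line). The helper names is_lfs_pointer and find_lfs_pointers are made up for illustration; they are not part of gbp, uscan or git-lfs.

```python
import os

# First line of a pointer file in the standard git-lfs pointer format.
LFS_SPEC_LINE = b"version https://git-lfs.github.com/spec/v1"
# Pointer files are small text stubs; the pointer spec keeps them under 1024 bytes.
MAX_POINTER_SIZE = 1024


def is_lfs_pointer(path):
    """Return True if the file at `path` looks like a git-lfs pointer file."""
    try:
        if os.path.getsize(path) >= MAX_POINTER_SIZE:
            return False
        with open(path, "rb") as fh:
            lines = fh.read(MAX_POINTER_SIZE).splitlines()
    except OSError:
        # Unreadable or vanished file: treat it as "not a pointer".
        return False
    if not lines or lines[0].strip() != LFS_SPEC_LINE:
        return False
    return any(line.startswith(b"oid sha256:") for line in lines[1:])


def find_lfs_pointers(root):
    """Yield paths (relative to `root`) of files that look like LFS pointers."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git metadata if the tree is a checkout.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue
            if is_lfs_pointer(full):
                yield os.path.relpath(full, root)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pass the unpacked source directory as the argument.
    leftovers = list(find_lfs_pointers(sys.argv[1] if len(sys.argv) > 1 else "."))
    for rel in leftovers:
        print("LFS pointer, not real data:", rel)
    sys.exit(1 if leftovers else 0)
```

Run against the unpacked source tree, the script exits non-zero if any pointer files remain, so it could serve as a quick sanity check after import.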


But this brings us back to the reason upstream started using LFS in the first place. With this import method, the Debian git repos on salsa will carry the large data files that upstream is trying to manage with git-lfs. In the case of pygalmesh, for instance, the mesh file test/meshes/liver.inr is 49MB in size.

Do we really want to be carrying large upstream data files in salsa, when upstream has judged they should be handled using git-lfs infrastructure?

We have a policy of not pulling files from the internet at build time, which is probably a good policy to maintain. Does that mean we want to configure salsa or some other part of Debian infrastructure to provide a git-lfs service? Maybe salsa can already do it, in which case a HOWTO on wiki.debian.org would be useful.

Drew
