Hi,
On Thu, Dec 15, 2022 at 12:38:11PM +0700, Arnaud Rebillout wrote:
> Package: git-buildpackage
> Version: 0.9.30
> Severity: normal
> User: de...@kali.org
> Usertags: origin-kali
> 
> Dear Maintainer,
> 
> In Kali Linux, we package an upstream that uses Git LFS to store a big
> file (a GeoIP database). The upstream is at:
> https://github.com/rsmusllp/king-phisher
> 
> In a previous version, upstream versioned the database "as is": it
> was a regular file in the Git repo. Then in a subsequent version, they
> switched to Git LFS to store this file.
> 
> Gbp doesn't handle this transition well. Apparently this is due to the
> combination of:
> * "gbp clone" disabling Git attributes (hence git lfs)
> * however "gbp import-orig" does no such thing
> 
> I'm the person who updated this package, so my local copy of
> king-phisher doesn't have the git attributes disabled, and everything
> works fine for me. However, other folks who clone the repo complain, as
> it leads to an unclean git checkout, and I don't know what the way
> forward is.
> 
> For a longer (and hopefully crystal-clear) explanation of the issue, I
> prepared a Git repo and a walkthrough to reproduce it. Here we
> go :)
> 
> Let's first clone the king-phisher package *before* upstream switched to
> Git LFS:
> 
>   $ gbp clone https://gitlab.com/arnaudr/king-phisher.git
>   $ cd king-phisher
>   $ cat .gitattributes
>   cat: .gitattributes: No such file or directory
>   $ ls -l data/server/king_phisher/GeoLite2-City.mmdb 
>   -rw-r--r-- 1 arno arno 61615395 Dec 15 11:53 data/server/king_phisher/GeoLite2-City.mmdb
> 
> So at this point, the file GeoLite2-City.mmdb is versioned "as is", it
> is a regular file.
> 
> Now let's update the package to latest Git snapshot:
> 
>   $ gbp import-orig --uscan
>   gbp:info: Launching uscan...
>   Downloading data/server/king_phisher/GeoLite2-City.mmdb (62 MB)
>   gbp:info: Using uscan downloaded tarball ../king-phisher_1.15.0+git20221107.orig.tar.xz
>   What is the upstream version? [1.15.0+git20221107] 
>   gbp:info: Importing '../king-phisher_1.15.0+git20221107.orig.tar.xz' to branch 'upstream'...
>   gbp:info: Source package is king-phisher
>   gbp:info: Upstream version is 1.15.0+git20221107
>   gbp:info: Replacing upstream source on 'kali/master'
>   gbp:info: Successfully imported version 1.15.0+git20221107 of ../king-phisher_1.15.0+git20221107.orig.tar.xz
> 
> The line "Downloading data/server/king_phisher/GeoLite2-City.mmdb (62
> MB)" comes from git lfs, which is downloading the file. And here's the
> situation now:
> 
>   $ cat .gitattributes 
>   *.mmdb filter=lfs diff=lfs merge=lfs -text
>   $ cat .git/info/attributes
>   cat: .git/info/attributes: No such file or directory
>   $ ls -l data/server/king_phisher/GeoLite2-City.mmdb
>   -rw-r--r-- 1 arno arno 61615395 Dec 15 11:56 data/server/king_phisher/GeoLite2-City.mmdb
> 
> So we can see the git lfs filter in .gitattributes, and we can see that
> .git/info/attributes doesn't exist (more on that below).
> 
> Let's push that work (I prepared a fork to push changes):
> 
>   $ git remote add arnaudr2 g...@gitlab.com:arnaudr/king-phisher2.git
>   $ git push arnaudr2 : --follow-tags
>   Locking support detected on remote "arnaudr2". Consider enabling it with:
>     $ git config lfs.https://gitlab.com/arnaudr/king-phisher2.git/info/lfs.locksverify true
>   Locking support detected on remote "arnaudr2". Consider enabling it with:
>     $ git config lfs.https://gitlab.com/arnaudr/king-phisher2.git/info/lfs.locksverify true
>   Locking support detected on remote "arnaudr2". Consider enabling it with:
>     $ git config lfs.https://gitlab.com/arnaudr/king-phisher2.git/info/lfs.locksverify true
>   Locking support detected on remote "arnaudr2". Consider enabling it with:
>     $ git config lfs.https://gitlab.com/arnaudr/king-phisher2.git/info/lfs.locksverify true
>   Uploading LFS objects: 100% (1/1), 62 MB | 3.4 MB/s, done.
>   Enumerating objects: 112, done.
>   Counting objects: 100% (82/82), done.
>   Delta compression using up to 8 threads
>   Compressing objects: 100% (46/46), done.
>   Writing objects: 100% (49/49), 19.06 KiB | 19.06 MiB/s, done.
>   Total 49 (delta 29), reused 5 (delta 0), pack-reused 0
>   remote:
>   remote: To create a merge request for pristine-tar, visit:
>   remote:   https://gitlab.com/arnaudr/king-phisher2/-/merge_requests/new?merge_request%5Bsource_branch%5D=pristine-tar
>   remote:
>   remote:
>   remote: To create a merge request for upstream, visit:
>   remote:   https://gitlab.com/arnaudr/king-phisher2/-/merge_requests/new?merge_request%5Bsource_branch%5D=upstream
>   remote:
>   To gitlab.com:arnaudr/king-phisher2.git
>      c5db68b..dbf4ce7  kali/master -> kali/master
>      d9ec6a5..e4e9390  pristine-tar -> pristine-tar
>      be63910..f4f0fae  upstream -> upstream
>    * [new tag]         upstream/1.15.0+git20221107 -> upstream/1.15.0+git20221107
> 
> And now, the issue: when we clone this repo with gbp, the resulting repo
> is not clean. Let's try:
> 
>   $ gbp clone -v g...@gitlab.com:arnaudr/king-phisher2.git
>   gbp:debug: ['git', 'rev-parse', '--show-cdup']
>   gbp:info: Cloning from 'g...@gitlab.com:arnaudr/king-phisher2.git'
>   gbp:debug: ['git', 'clone', '--quiet', 'g...@gitlab.com:arnaudr/king-phisher2.git']
>   gbp:debug: ['git', 'rev-parse', '--show-cdup']
>   gbp:debug: ['git', 'rev-parse', '--is-bare-repository']
>   gbp:debug: ['git', 'rev-parse', '--git-dir']
>   gbp:debug: ['git', 'rev-parse', '--show-cdup']
>   gbp:debug: ['git', 'rev-parse', '--is-bare-repository']
>   gbp:debug: ['git', 'rev-parse', '--git-dir']
>   gbp:debug: Will track branches: ['kali/master', 'upstream', 'pristine-tar']
>   gbp:debug: ['git', 'show-ref', '--verify', 'refs/remotes/origin/kali/master']
>   gbp:debug: ['git', 'show-ref', '--verify', 'refs/heads/kali/master']
>   gbp:debug: ['git', 'show-ref', '--verify', 'refs/remotes/origin/upstream']
>   gbp:debug: ['git', 'show-ref', '--verify', 'refs/heads/upstream']
>   gbp:debug: ['git', 'branch', 'upstream', 'origin/upstream']
>   gbp:debug: ['git', 'show-ref', '--verify', 'refs/remotes/origin/pristine-tar']
>   gbp:debug: ['git', 'show-ref', '--verify', 'refs/heads/pristine-tar']
>   gbp:debug: ['git', 'branch', 'pristine-tar', 'origin/pristine-tar']
>   gbp:debug: ['git', 'show-ref', '--verify', 'refs/remotes/kali/master']
>   gbp:debug: ['git', 'config', 'user.name', 'Arnaud Rebillout']
>   gbp:debug: ['git', 'config', 'user.email', 'arna...@kali.org']
>   gbp:debug: ['git', 'ls-tree', '-z', '-r', '-l', 'HEAD', '--']
>   gbp:debug: Found non-empty .gitattributes: b'.gitattributes'
>   gbp:debug: Configuring Git attributes
>   
>   $ cd king-phisher2
>   
>   $ git status
>   On branch kali/master
>   Your branch is up to date with 'origin/kali/master'.
>   
>   Changes not staged for commit:
>     (use "git add <file>..." to update what will be committed)
>     (use "git restore <file>..." to discard changes in working directory)
>       modified:   data/server/king_phisher/GeoLite2-City.mmdb
>   
>   no changes added to commit (use "git add" and/or "git commit -a")
>   
>   $ cat .gitattributes 
>   *.mmdb filter=lfs diff=lfs merge=lfs -text
>   $ cat .git/info/attributes 
>   # Added by git-buildpackage to disable .gitattributes found in the upstream tree
>   [attr]dgit-defuse-attrs  -text -eol -crlf -ident -filter -working-tree-encoding
>   * -export-ignore
>   * dgit-defuse-attrs
>   $ ls -l data/server/king_phisher/GeoLite2-City.mmdb 
>   -rw-r--r-- 1 arno arno 61615395 Dec 15 12:12 data/server/king_phisher/GeoLite2-City.mmdb
>   
> As we can see above (my interpretation):
> * during the 'gbp clone' step, the 'git clone' command will actually
>   trigger git lfs, and download the GeoLite2 database (assuming you have
>   the package git-lfs installed on your machine).
> * then at the end of the gbp clone operation, we can see "Configuring
>   Git attributes", and this is when gbp creates the file
>   .git/info/attributes
> * as a result, the git repo is in an unclean state
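
The precedence at work here can be reproduced without gbp or git-lfs installed. A minimal sketch, assuming only that git is available: the scratch repository and the file name example.mmdb are made up for illustration, and the attribute lines are copied from the .git/info/attributes shown above.

```shell
#!/bin/sh
# Scratch repo showing that .git/info/attributes overrides .gitattributes.
# "example.mmdb" is a hypothetical name; the file does not need to exist.
repo=$(mktemp -d)
git -C "$repo" init -q

# What upstream ships: route *.mmdb through the lfs filter.
printf '%s\n' '*.mmdb filter=lfs diff=lfs merge=lfs -text' > "$repo/.gitattributes"
before=$(git -C "$repo" check-attr filter -- example.mmdb)

# What gbp clone writes afterwards (lines copied from the report).
mkdir -p "$repo/.git/info"
cat > "$repo/.git/info/attributes" <<'EOF'
[attr]dgit-defuse-attrs  -text -eol -crlf -ident -filter -working-tree-encoding
* -export-ignore
* dgit-defuse-attrs
EOF
after=$(git -C "$repo" check-attr filter -- example.mmdb)

echo "before: $before"
echo "after:  $after"
rm -rf "$repo"
```

The first query should report the filter as lfs and the second as unset, which matches the sequence described above: the checkout runs with the filter active, and only afterwards is it switched off.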
> 
> To get the Git repo back in shape, we can either:
> 
> 1) Undo what gbp just did:
> 
>     $ rm -f .git/info/attributes
> 
> 2) Undo what git lfs did:
> 
>     $ git checkout data/server/king_phisher/GeoLite2-City.mmdb
>     Updated 1 path from the index
>     $ cat data/server/king_phisher/GeoLite2-City.mmdb
>     version https://git-lfs.github.com/spec/v1
>     oid sha256:a253d9cd68fe17b00087da24375f31f07cd4bb3852dc5fe3afe37b8f59e5abd0
>     size 61615395

Or use `gbp clone --defuse-gitattributes=off ...`?
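
To tell whether an existing clone already had its attributes defused, something like the following might do. It is only a sketch: the function name is made up, it assumes a plain non-bare checkout with a .git directory, and it relies on the dgit-defuse-attrs marker seen in the report being what gbp writes.

```shell
# Hypothetical helper: succeeds if the checkout at $1 carries the defuse
# macro in .git/info/attributes (marker string taken from the report).
# Assumes a regular working tree where .git is a directory.
attrs_defused() {
    attrs_file="$1/.git/info/attributes"
    [ -f "$attrs_file" ] && grep -q 'dgit-defuse-attrs' "$attrs_file"
}
```

For example, `attrs_defused king-phisher2 && echo "attributes are defused"`.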

Cheers,
 -- Guido

> 
> As we can see with option 2), the LFS file becomes a short metadata
> file, because that's what's really in the Git repo, before "git lfs"
> replaces it with the "real file" that it fetches from somewhere else.
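
Whether a working-tree file is the real content or just the pointer can be checked from its first line. A rough sketch, assuming the pointer always starts with the version line shown above; the function name is made up for illustration.

```shell
# Hypothetical helper: succeeds if $1 looks like a Git LFS pointer file,
# i.e. its first line is exactly the spec version line from the report.
is_lfs_pointer() {
    [ -f "$1" ] || return 1
    head -n 1 -- "$1" | grep -qxF 'version https://git-lfs.github.com/spec/v1'
}
```

For example, `is_lfs_pointer data/server/king_phisher/GeoLite2-City.mmdb && echo "pointer only"`.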
> 
>   == Questions
> 
> How should Git LFS files be handled? When "gbp clone" disables
> the gitattributes, it disables Git LFS in turn: is that intended or not?
> Does gbp have an opinion on that? In any case, it seems that disabling
> the gitattributes after 'git clone' has run is too late, because the Git
> LFS objects were already fetched.
> 
> Thanks for reading, and please help me understand how we should handle
> those LFS files.
> 
> Arnaud
> 
