The "packref" support uses symlinks to "piggy-back" use of packs into
the pack namespace of the consumer from the originator (reference repo).

We use symlinks, so that mirror tarballs won't duplicate the referenced
data, as hardlinks would do.  However, that means we also assume that
the pack namespace of the originator remains constant.

What if one uses a reference from a mirror tarball, then disables
mirrors, does a "cleanall" on the reference for some obscure reason, and
then re-clones it directly from a server?  Will the pack links from the
consumer into the reference still point to a pack or be left dangling?

For simplicity, we temporarily ignored this assumption so that we could
deal with it (and this corner case related to it) separately - which we
do now.  To proceed, we first need to know a bit about pack names.

Most importantly, as you can see, pack names don't really matter at all:

   paul@hackbox:~$ git clone --bare git://git.yoctoproject.org/poky
   Cloning into bare repository 'poky.git'...
     [...]
   Resolving deltas: 100% (381097/381097), done.
   paul@hackbox:~$ cd poky.git/objects/pack/
   paul@hackbox:~/poky.git/objects/pack$ ls -l
   total 191100
   -r--r--r-- 1 paul paul  14250608 Mar 28 12:09 pack-db459c1574c5410a1002d8098745daee3a59599f.idx
   -r--r--r-- 1 paul paul 181429684 Mar 28 12:09 pack-db459c1574c5410a1002d8098745daee3a59599f.pack
   paul@hackbox:~/poky.git/objects/pack$ mv pack-db459c1574c5410a1002d8098745daee3a59599f.idx woot-fred-was-here-31337.idx
   paul@hackbox:~/poky.git/objects/pack$ mv pack-db459c1574c5410a1002d8098745daee3a59599f.pack woot-fred-was-here-31337.pack
   paul@hackbox:~/poky.git/objects/pack$ cd ../../
   paul@hackbox:~/poky.git$ git fsck
   Checking object directories: 100% (256/256), done.
   Checking objects: 100% (508912/508912), done.
   Checking connectivity: 508912, done.
   paul@hackbox:~/poky.git$ git show HEAD | head -n3
   commit a7e1bbaf6d7c5d1cf44069419860dabd78c02eec
   Author: Khem Raj <raj.k...@gmail.com>
   Date:   Thu Mar 25 18:11:07 2021 -0700
   paul@hackbox:~/poky.git$ echo $?
   0

This works because, in the normal flow, a clone/fetch feeds the pack to
index-pack on stdin, and the "--stdin" part of its man page says:

   If <pack-file> is not specified, the pack is written to objects/pack/
   directory of the current Git repository with a default name determined
   from the pack content.

Two things to note: 1) we have always been free to choose the pack name
even if the average user probably always takes the default, and 2) the
man page is intentionally kind of vague on how that default is created.

We need to look at #2 in detail, in order to answer our question about
whether the pack names remain constant for the same object content.

The executive summary is "yes" for git v1.7 and older and "no" for git
v1.8+.  So we need to look at both in order to decide what to do.

For the "old" git, the default pack name was simply "pack-" + the SHA1
of the sorted SHA1 object names themselves.  If you have an old pack
around you can confirm as I have done below:

   $ ls -l
   total 792
   -r--r--r-- 1 paul paul   1632 Feb 13  2013 pack-a9f793e9f46c138a27abb326ac68cb6e6397d0f0.idx
   -r--r--r-- 1 paul paul 803171 Feb 13  2013 pack-a9f793e9f46c138a27abb326ac68cb6e6397d0f0.pack
   $ git show-index < pack-a9f793e9f46c138a27abb326ac68cb6e6397d0f0.idx | cut -d " " -f 2 | xxd -r -p | sha1sum
   a9f793e9f46c138a27abb326ac68cb6e6397d0f0  -
   $ git verify-pack -v pack-a9f793e9f46c138a27abb326ac68cb6e6397d0f0.pack | grep '^[0-9a-f]\{40\}' | cut -d " " -f1 | sort | xxd -r -p | sha1sum
   a9f793e9f46c138a27abb326ac68cb6e6397d0f0  -

In both cases, we recover the pack name directly from the objects. There
is no extraneous information, like date, servername, etc. used in the
pack name.  And so it is invariant upon reclone/repack, even if the
compression level was changed - since the final result from "git
unpack-objects" is the same.
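
To make the old naming rule concrete, here is a minimal Python sketch
of it, assuming the object names are already available as lowercase
40-character hex strings (for example from "git show-index").  The
function name legacy_pack_name is hypothetical and is not part of git
or bitbake:

```python
import hashlib


def legacy_pack_name(object_ids):
    """Return "pack-<sha1>" in the pre-v1.8 style described above.

    object_ids: iterable of lowercase 40-character hex object names
    (an assumption of this sketch).
    """
    h = hashlib.sha1()
    # Sorting the lowercase hex strings gives the same order as sorting
    # the binary names, so sort first and then hash the raw 20-byte
    # values, which is what "xxd -r -p | sha1sum" does in the pipeline.
    for oid in sorted(object_ids):
        h.update(bytes.fromhex(oid))
    return "pack-" + h.hexdigest()
```

Because the names are sorted before hashing, the order in which the
objects are listed has no effect on the result.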

For newer git, which contains commit 1190a1ac[1], the choice was made to
use the SHA1 of the data as stored in the pack itself - the "trailer"
SHA1.  If we look at our since-renamed pack friend Fred above:

   paul@hackbox:~/poky.git/objects/pack$ hexdump -C woot-fred-was-here-31337.pack | tail -n3
   0ad065a0  db 45 9c 15 74 c5 41 0a  10 02 d8 09 87 45 da ee
   0ad065b0  3a 59 59 9f

...we can see we really didn't "lose" his original pack name, as the
trailer SHA1 db459c1574c5410a1002d8098745daee3a59599f is right there.
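
The same trailer can be read without hexdump.  A rough sketch, assuming
a SHA-1 repository where the trailer is the final 20 bytes of the .pack
file and is the checksum of everything before it; both helper names are
hypothetical:

```python
import hashlib
import os


def pack_trailer_sha1(pack_path):
    """Return the last 20 bytes of a pack file as a hex string."""
    with open(pack_path, "rb") as f:
        f.seek(-20, os.SEEK_END)
        return f.read(20).hex()


def trailer_matches_content(pack_path):
    """Check that the trailer is the SHA1 of all preceding pack bytes.

    Reads the whole file into memory, which is fine for a sketch but
    not something to do with very large packs.
    """
    with open(pack_path, "rb") as f:
        data = f.read()
    return hashlib.sha1(data[:-20]).digest() == data[-20:]
```

Since the trailer covers the compressed bytes as stored, repacking the
same objects with different settings would be expected to change it.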

But this means that the pack name is based on transient data like
compression artifacts, and that two clones are unlikely to ever generate
the same pack name.  If this were a letter in an envelope, the SHA1 went
from covering only the contents of the letter to also covering how many
times the paper was folded, and the size and colour of the envelope.

This change was made to remove any possible confusion that pack name
equivalence meant binary byte equivalence across two packs, by virtually
ensuring we never got the same pack name twice.

However, in our case, a pack name that is essentially a random number is
unhelpful, and the namespace collision concerns of commit 1190a1ac are
even less of a concern for static content repositories.

So our static repos use pre-v1.8 style pack names, which reflect
functional equivalence between packs from clone to clone to mirror
tarball snapshot of a clone.  As demonstrated earlier, we are free to
choose the pack name(s) completely at will, so long as we keep each pack
paired with its index file.

This turns out to be much more robust than trying to preserve the
now random "new" pack names through self-updating symlinks based on
dependencies or anon python or anything similarly complex.

As a final optimization/convenience, we can simply drop the "xxd" from
the pipeline - and avoid introducing new native dependencies, or adding
it to the ASSUME_PROVIDED command list and hoping it is everywhere.

We can do this because the ASCII representation of the .idx hex values
contains the exact same pack object content information as the xxd
binary-converted version, but just in a less dense format.  So the SHA1
of that less packed info stream still gives us a unique signature of the
objects contained in the pack - just not the exact one the older git did.
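
As an illustration of what the xxd-less pipeline hashes, the following
sketch reproduces "cut -d' ' -f2 | sort | sha1sum" in Python, assuming
lowercase hex object names.  The helper ascii_pack_name is hypothetical
and is not what the patch below calls:

```python
import hashlib


def ascii_pack_name(object_ids):
    """Return "pack-<sha1>" computed over the sorted ASCII hex names.

    Each name is hashed as text followed by a newline, matching the
    line-oriented stream that sha1sum sees at the end of the pipeline.
    """
    h = hashlib.sha1()
    for oid in sorted(object_ids):
        h.update(oid.encode("ascii") + b"\n")
    return "pack-" + h.hexdigest()
```

The resulting name differs from the v1.7 one for the same objects, but
it is still determined by the object names alone.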

[1] https://github.com/git/git/commit/1190a1ac

Signed-off-by: Paul Gortmaker <paul.gortma...@windriver.com>
---
 bitbake/lib/bb/fetch2/git.py | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/bitbake/lib/bb/fetch2/git.py b/bitbake/lib/bb/fetch2/git.py
index 8aec12df2bf8..2d693083cac4 100644
--- a/bitbake/lib/bb/fetch2/git.py
+++ b/bitbake/lib/bb/fetch2/git.py
@@ -386,6 +386,21 @@ class Git(FetchMethod):
         static = self.get_git_config(ud, d, repo, "bitbake.static")
         return (static == "true")
 
+    def rename_packs(self, ud, d, repo):
+        # Use git pack naming similar to pre v1.8 - where the name only
+        # depends on the objects within, not objs + compression artifacts.
+        pkdir = os.path.join(repo, 'objects', 'pack')
+
+        for idx in fnmatch.filter(os.listdir(pkdir), "*.idx"):
+            pk = os.path.join(pkdir, idx[:-3] + "pack")
+            idx = os.path.join(pkdir, idx)
+            # insert "xxd -p -r" before the sha1sum to get exactly v1.7 names.
+            cmd = "%s show-index < %s | cut -d' ' -f2 | sort | sha1sum" % (ud.basecmd, idx)
+            output = runfetchcmd(cmd, d, workdir=repo)
+            newname = os.path.join(pkdir, "pack-" + output[:40])
+            os.rename(pk, newname + ".pack")
+            os.rename(idx, newname + ".idx")
+
     def create_pack_links(self, refname, refdir, dstdir):
         dstpkdir = os.path.join(dstdir, 'objects', 'pack')
         refpkdir = os.path.join(refdir, 'objects', 'pack')
@@ -441,6 +456,7 @@ class Git(FetchMethod):
 
             if ud.static:
                 runfetchcmd("%s config --bool --add bitbake.static 1" % ud.basecmd, d, workdir=ud.clonedir)
+                self.rename_packs(ud, d, ud.clonedir)
 
             if ud.packref:
                 if not self.repo_is_static(ud, d, refdir):
-- 
2.25.1
