The "packref" support uses symlinks to "piggy-back" packs from the
originator (reference repo) into the pack namespace of the consumer.

We use symlinks so that mirror tarballs won't duplicate the referenced
data, as hardlinks would. However, that means we also assume that the
pack namespace of the originator remains constant.

What if one uses a reference from a mirror tarball, then disables
mirrors, does a "cleanall" on the reference for some obscure reason,
and then re-clones it from a server directly? Will the pack links from
the consumer into the reference still point to a pack, or be left
dangling?

For simplicity, we temporarily ignored this assumption so that we could
deal with it (and this corner case related to it) separately - which we
do now.

To proceed, we first need to know a bit about pack names. Most
importantly, as you can see, pack names don't really matter at all:

  paul@hackbox:~$ git clone --bare git://git.yoctoproject.org/poky
  Cloning into bare repository 'poky.git'...
  [...]
  Resolving deltas: 100% (381097/381097), done.
  paul@hackbox:~$ cd poky.git/objects/pack/
  paul@hackbox:~/poky.git/objects/pack$ ls -l
  total 191100
  -r--r--r-- 1 paul paul  14250608 Mar 28 12:09 pack-db459c1574c5410a1002d8098745daee3a59599f.idx
  -r--r--r-- 1 paul paul 181429684 Mar 28 12:09 pack-db459c1574c5410a1002d8098745daee3a59599f.pack
  paul@hackbox:~/poky.git/objects/pack$ mv pack-db459c1574c5410a1002d8098745daee3a59599f.idx woot-fred-was-here-31337.idx
  paul@hackbox:~/poky.git/objects/pack$ mv pack-db459c1574c5410a1002d8098745daee3a59599f.pack woot-fred-was-here-31337.pack
  paul@hackbox:~/poky.git/objects/pack$ cd ../../
  paul@hackbox:~/poky.git$ git fsck
  Checking object directories: 100% (256/256), done.
  Checking objects: 100% (508912/508912), done.
  Checking connectivity: 508912, done.
  paul@hackbox:~/poky.git$ git show HEAD | head -n3
  commit a7e1bbaf6d7c5d1cf44069419860dabd78c02eec
  Author: Khem Raj <raj.k...@gmail.com>
  Date:   Thu Mar 25 18:11:07 2021 -0700
  paul@hackbox:~/poky.git$ echo $?
  0

This is because the normal flow has a clone/fetch feed the stdin of
index-pack, and the "--stdin" part of its manpage says:

    If <pack-file> is not specified, the pack is written to
    objects/pack/ directory of the current Git repository with a
    default name determined from the pack content.

Two things to note: 1) we have always been free to choose the pack name,
even if the average user probably always takes the default, and 2) the
man page is intentionally kind of vague on how that default is created.

We need to look at #2 in detail in order to answer our question about
whether the pack names remain constant for the same object content. The
executive summary is "yes" for git v1.7 and older and "no" for git
v1.8+. So we need to look at both in order to decide what to do.

For the "old" git, the default pack name was simply "pack-" + the SHA1
of the sorted SHA1 object names themselves. If you have an old pack
around you can confirm this, as I have done below:

  $ ls -l
  total 792
  -r--r--r-- 1 paul paul   1632 Feb 13  2013 pack-a9f793e9f46c138a27abb326ac68cb6e6397d0f0.idx
  -r--r--r-- 1 paul paul 803171 Feb 13  2013 pack-a9f793e9f46c138a27abb326ac68cb6e6397d0f0.pack

  $ git show-index < pack-a9f793e9f46c138a27abb326ac68cb6e6397d0f0.idx | cut -d " " -f 2 | xxd -r -p | sha1sum
  a9f793e9f46c138a27abb326ac68cb6e6397d0f0  -

  $ git verify-pack -v pack-a9f793e9f46c138a27abb326ac68cb6e6397d0f0.pack | grep '^[0-9a-f]\{40\}' | cut -d " " -f1 | sort | xxd -r -p | sha1sum
  a9f793e9f46c138a27abb326ac68cb6e6397d0f0  -

In both cases, we recover the pack name directly from the objects. No
extraneous information (date, server name, etc.) is used in the pack
name. And so it is invariant upon reclone/repack, even if the
compression level was changed - since the final result from
"git unpack-objects" is the same.

For newer git, which contains 1190a1ac[1], the choice was made to use
the SHA1 of the data as stored in the pack itself - the "trailer" SHA1.
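
To make the two naming schemes concrete, here is a minimal Python sketch
(not part of this patch). The function names and the sample data are
made up for illustration. It assumes only what is described above, plus
the pack format detail that the last 20 bytes of a pack are a SHA1
checksum of everything before them.

```python
import hashlib


def legacy_pack_name(hex_ids):
    """Old-style name: SHA1 over the sorted, binary object names."""
    raw_ids = sorted(bytes.fromhex(h) for h in hex_ids)
    return "pack-" + hashlib.sha1(b"".join(raw_ids)).hexdigest()


def trailer_pack_name(pack_bytes):
    """New-style name: the 20-byte SHA1 trailer at the end of the pack."""
    body, trailer = pack_bytes[:-20], pack_bytes[-20:]
    # The trailer is a checksum of all pack bytes that precede it.
    if hashlib.sha1(body).digest() != trailer:
        raise ValueError("pack trailer does not match pack contents")
    return "pack-" + trailer.hex()


if __name__ == "__main__":
    # Hypothetical object names; the order they arrive in does not matter.
    ids = ["b" * 40, "a" * 40]
    print(legacy_pack_name(ids))
    print(legacy_pack_name(list(reversed(ids))))  # same name as above

    # Two fake "packs" standing in for the same objects stored with
    # different compression; each gets its own trailer-based name.
    for body in (b"PACK fake data, level 1", b"PACK fake data, level 9"):
        print(trailer_pack_name(body + hashlib.sha1(body).digest()))
```

The first function depends only on which objects are present, while the
second depends on every byte of the pack as stored.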
If we look at our since-renamed pack friend Fred above:

  paul@hackbox:~/poky.git/objects/pack$ hexdump -C woot-fred-was-here-31337.pack | tail -n3
  0ad065a0  db 45 9c 15 74 c5 41 0a  10 02 d8 09 87 45 da ee
  0ad065b0  3a 59 59 9f

...we can see we really didn't "lose" his original pack name, as the
trailer SHA1 db459c1574c5410a1002d8098745daee3a59599f is right there.

But this means that the pack name is based on transient data like
compression artifacts, and that no two clones are ever likely to
generate the same pack name. If this were a letter in an envelope, the
SHA1 went from covering the contents of the letter to also covering how
many times the paper was folded, and the size and colour of the
envelope.

This change was made to remove any possible confusion that pack name
equivalence meant binary byte equivalence across two packs, by virtually
ensuring we never got the same pack name twice.

However, in our case, a pack name that is essentially a random number is
unhelpful, and the namespace collision concerns of commit 1190a1ac are
even less of a concern for static content repositories.

So our static repos use pre v1.8 pack names, which reflect functional
equivalence between packs from clone to clone to mirror tarball snapshot
of a clone. As demonstrated earlier, we are free to choose the pack
name(s) completely at will, so long as we keep each pack paired with its
index file.

This turns out to be much more robust than trying to preserve the now
random "new" pack names through self-updating symlinks based on
dependencies or anon python or anything similarly complex.

As a final optimization/convenience, we can simply drop the "xxd" from
the pipeline - and avoid introducing new native dependencies, or adding
it to the ASSUME_PROVIDED command list and hoping it is everywhere.

We can do this because the ASCII representation of the .idx hex values
contains the exact same pack object content information as the xxd
binary-converted version, but just in a less dense format.
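
As a rough illustration of the xxd-less variant, the sketch below mimics
in Python what a "cut | sort | sha1sum" pipeline would hash: one
lower-case hex object name per line, each followed by a newline. The
function name and the sample object names are hypothetical, and it
assumes the shell "sort" orders hex strings the same way Python does
(true for the C locale).

```python
import hashlib


def ascii_pack_name(hex_ids):
    """Name a pack from the SHA1 of its sorted, ASCII-hex object names."""
    lines = sorted(h.lower() for h in hex_ids)
    # sha1sum sees one object name per line, each ending in a newline.
    stream = "".join(line + "\n" for line in lines).encode("ascii")
    return "pack-" + hashlib.sha1(stream).hexdigest()


if __name__ == "__main__":
    ids = ["b" * 40, "a" * 40]  # made-up object names
    print(ascii_pack_name(ids))
    print(ascii_pack_name(list(reversed(ids))))  # same name as above
```

The result still depends only on the set of objects, but it hashes 41
bytes per object rather than 20, so it will not match the name an old
git would have picked.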
So the SHA1 of that less packed info stream still gives us a unique
signature of the objects contained in the pack - just not the exact one
the older git did.

[1] https://github.com/git/git/commit/1190a1ac

Signed-off-by: Paul Gortmaker <paul.gortma...@windriver.com>
---
 bitbake/lib/bb/fetch2/git.py | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/bitbake/lib/bb/fetch2/git.py b/bitbake/lib/bb/fetch2/git.py
index 8aec12df2bf8..2d693083cac4 100644
--- a/bitbake/lib/bb/fetch2/git.py
+++ b/bitbake/lib/bb/fetch2/git.py
@@ -386,6 +386,21 @@ class Git(FetchMethod):
         static = self.get_git_config(ud, d, repo, "bitbake.static")
         return (static == "true")
 
+    def rename_packs(self, ud, d, repo):
+        # Use git pack naming similar to pre v1.8 - where the name only
+        # depends on the objects within, not objs + compression artifacts.
+        pkdir = os.path.join(repo, 'objects', 'pack')
+
+        for idx in fnmatch.filter(os.listdir(pkdir), "*.idx"):
+            pk = os.path.join(pkdir, idx[:-3] + "pack")
+            idx = os.path.join(pkdir, idx)
+            # insert "xxd -p -r" before the sha1sum to get exactly v1.7 names.
+            cmd = "%s show-index < %s | cut -d' ' -f2 | sort | sha1sum" % (ud.basecmd, idx)
+            output = runfetchcmd(cmd, d, workdir=repo)
+            newname = os.path.join(pkdir, "pack-" + output[:40])
+            os.rename(pk, newname + ".pack")
+            os.rename(idx, newname + ".idx")
+
     def create_pack_links(self, refname, refdir, dstdir):
         dstpkdir = os.path.join(dstdir, 'objects', 'pack')
         refpkdir = os.path.join(refdir, 'objects', 'pack')
@@ -441,6 +456,7 @@ class Git(FetchMethod):
 
         if ud.static:
             runfetchcmd("%s config --bool --add bitbake.static 1" % ud.basecmd, d, workdir=ud.clonedir)
+            self.rename_packs(ud, d, ud.clonedir)
 
         if ud.packref:
             if not self.repo_is_static(ud, d, refdir):
-- 
2.25.1