This is an advanced proposal for integrating deltas into the archive. It has been designed after some feedback and some more research.
Criteria ======== * We don't want to have Deltas indexes like Packages files -> we are unlikely to need most of the deltas -> about same size as Packages file (when both compressed) * We don't want to try to fetch deltas for each upgrade. We'll only be generating deltas for about 2/3rd of all upgrades (as far as current results tell us. * We don't want to have untrusted deltas/files * We don't want to generate/verify too many signatures Terminology =========== * upgrade: A tuple of two .deb; an old one, and a new one. * delta: A deb containing only ddelta files or similar instead of full files * bloom filter: A bitarray of $n$ bytes indexed by $k$ hash functions. Has no false negatives, but might have false positives. Layout ====== This approach tries to solve the problem by introducing per-source-package Delta indices in the pool. This solves the problem of having to download meta info about all deltas in the archive. * pool/$component/$prefix/$sourcename/Deltas An inline signed index containing, for each delta for that source package, the following information: * Package - the name of the package * Old-ID - the id of the package being upgraded from * New-ID - the id of the version being upgraded to * Size - the size of the delta * SHA256 - the hash of the delta * Algorithm - algorithm used in deltas, currently ddelta Including the algorithm is essential, so we can switch algorithms at a later point and have old dpkg versions download a delta they understand or no delta. Deltas are stored with a fixed pathname to avoid duplicating information: pool/$component/$prefix/$sourcename/$binaryname_$oldid_$newid_$algorith.deltadeb The concrete layout of the Deltas file is unspecified so far. A minimal solution is: SHA256: 8a09ba1ca92fe5405de7bd90fb06a1ce6cc502917df88f2990d3701a6cc75fa1 590012 apt_1e9bfd18a899677e70fda42f5192fd169c4a778d7c7426016d1bf0a0d9b5dbc7_1955901c3d7dd72c451274fa72f42539e573a9e883c4e05ad283e2a30a62f13b_ddelta.deltadeb this encodes all the information in a fairly minimal way (thus we don't need any compression), and is compatible with the existing Release file format. Note that we will know the name(s) of the file's we want to download beforehand, we just need to figure out if they exist and get signed hashes and sizes for them. The package name is in fact optional, but it might be worth including it to be able to figure out which package the delta belongs to. It might be worth including the distribution in the filename, and call it e.g. Deltas.bookworm-security instead of Deltas. That said, there might not be enough releases at a time for it to be worth it. NEEDSINVESTIGATION However, if we only produce deltas for 2/3 of the upgrades, we still have 1/3 of false positives for which we will needlessly download Deltas indices. Depending on the connection's latency, this might be bad. To solve this, we can introduce a new file: * dists/$dist/$component/binary-$arch/Deltas.bloom A bloom filter of size $n$ bits. For each delta from $old_id to $new_id, construct a delta id $old_id | $new_id (the concatenation). Split up the delta_id into chunks of $b$ bits. Use each $chunk \mod n$ as an index into the bloom filter. It seems reasonable to expect us to achieve a bloom filter with a false positive rate of about 1%. Experimental results for xenial-updates/main/amd64 have shown a 0.75% false positive rate for a bloom filter of $n=98317$ bits and $b=32$ bit chunks for 9132 accepted out of 13500 total deltas (that's about 12 KB for a Packages.xz of 800 KB). This does not include the delta algorithm, hence it's probably best not to have deltas with different algorithms for the same upgrade. The bloom filter is signed by the InRelease file. Delta upgrade process ===================== update ------ Acquire Deltas.bloom for each Packages file upgrade/install - download -------------------------- for each upgrade in changes: if upgrade in Deltas.bloom and applicable(upgrade): queue upgrade's Deltas index else: queue full deb for each deltas index: if downloaded and delta in index: queue delta else: queue deb for each delta: if downloaded failed: queue deb where applicable(upgrade): """Verify that an upgrade is applicable. For an upgrade to be applicable, the files on the file system must match the files shipped in the package. A reasonable approximation is to ensure that the file has not been modified later than the install time and that the size is the same.""" for each file in installed_version: if file.mtime >= installed_version.install_time or file.mtime > file.ctime or file.size != expected_size: return False return True Other decisions =============== * ddelta should grow a CRC for data blocks it copies/diffs so it does not silently succeed given a broken "old" file More TODO ========= Gotta update the wiki page with this stuff. -- debian developer - deb.li/jak | jak-linux.org - free software dev ubuntu core developer i speak de, en