On Wed, Apr 10, 2002 at 01:26:17AM -0700, Robert Tiberius Johnson wrote:
> - I tend to update every day. For people who update every day, the
>   diff-based scheme only needs to transfer about 8K, but the
>   checksum-based scheme needs to transfer 45K. So for me, diffs are
>   better. :)
I think you'll find you're also unfairly weighting this against people
who do daily updates. If you do an update once a month, it's not as much
of a bother waiting a while to download the Packages files -- you're
going to have to wait _much_ longer to download the packages themselves.

I'd suggest your formula would be better off being:

	bandwidthcost = sum( x = 1..30, prob(x) * cost(x) / x )

(If you update every day for a month, your cost isn't just one download,
it's 30 downloads. If you update once a week for a month, your cost
isn't that of a single download, it's four times that. The /x takes
that into account.)

Bandwidth cost, then, is something like "the average amount downloaded
by a testing/unstable user per day to update main". My results are
something like:

	 0 days of diffs: 843.7 KiB   (the current situation)
	 1 day  of diffs: 335.7 KiB
	 2 days of diffs: 167.7 KiB
	 3 days of diffs:  93.7 KiB
	 4 days of diffs:  56.9 KiB
	 5 days of diffs:  37.5 KiB
	 6 days of diffs:  26.8 KiB
	 7 days of diffs:  20.7 KiB
	 8 days of diffs:  17.2 KiB
	 9 days of diffs:  15.1 KiB
	10 days of diffs:  13.9 KiB
	11 days of diffs:  13.2 KiB
	12 days of diffs:  12.7 KiB
	13 days of diffs:  12.4 KiB
	14 days of diffs:  12.3 KiB
	15 days of diffs:  12.2 KiB

...which pretty much matches what I'd expect: at the moment, just to
update main, people download around 1.2MB per day; if we let them just
download the diff against yesterday, the average would plunge to only a
couple of hundred k, and you rapidly reach the point of diminishing
returns.

I used figures of 1.5MiB for the standard gzipped Packages file you
download if you can't use diffs, and 12KiB for the size of each daily
diff -- if you're three days out of date, you download three diffs and
apply them in order to get up to date. 12KiB is the average size of the
daily bzip2'ed --ed diffs over the last month for sid/main/i386.
The script I used for the above was (roughly):

	#!/usr/bin/python

	def cost_diff(day, ndiffs):
	    if day <= ndiffs:
	        return 12 * 1024 * day
	    else:
	        return 1.5 * 1024 * 1024

	def prob(d):
	    return (2.0 / 3.0) ** d / 2.0

	def summate(f, p):
	    cost = 0.0
	    for d in range(1, 31):
	        cost += f(d) * p(d) / d
	    return cost

	for x in range(0, 16):
	    print "%s day/s of diffs: %.1f KiB" % \
	        (x, summate(lambda y: cost_diff(y, x), prob) / 1024)

I'd be interested in seeing what the rsync stats look like with the
"/ days" factor added in.

Cheers,
aj

-- 
Anthony Towns <[EMAIL PROTECTED]> <http://azure.humbug.org.au/~aj/>
I don't speak for anyone save myself. GPG signed mail preferred.

 ``BAM! Science triumphs again!'' 
    -- http://www.angryflower.com/vegeta.gif
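[For anyone rerunning the numbers on a modern system, here's a hedged
Python 3 restatement of the same weighted-average model -- a sketch,
not aj's actual script. The per-scheme transfer-size function is a
parameter, so measured rsync sizes could be plugged in later; the only
figures used are the ones quoted above (1.5 MiB full file, 12 KiB per
daily diff), and it reproduces the table's values:]

```python
# Hedged sketch of the cost model above, in Python 3. The transfer-size
# function is a parameter so other schemes (e.g. measured rsync sizes)
# can be compared; only the figures quoted in the mail are used here.

FULL_FILE = 1.5 * 1024 * 1024   # gzipped Packages file, in bytes
DAILY_DIFF = 12 * 1024          # average bzip2'ed daily diff, in bytes

def prob(d):
    # Update-frequency model from the script above:
    # P(a user last updated d days ago)
    return (2.0 / 3.0) ** d / 2.0

def avg_daily_cost(xfer):
    # sum over d = 1..30 of xfer(d) * prob(d) / d; the "/ d" spreads one
    # download over the d days it keeps the user up to date.
    return sum(xfer(d) * prob(d) / d for d in range(1, 31))

def cost_diff(ndiffs):
    # With ndiffs days of diffs on the mirror: a user d days out of date
    # fetches d diffs, or the whole file if the diffs have expired.
    return lambda d: DAILY_DIFF * d if d <= ndiffs else FULL_FILE

for x in range(0, 16):
    print("%2d day/s of diffs: %.1f KiB"
          % (x, avg_daily_cost(cost_diff(x)) / 1024))
```

[Given a measured per-day rsync transfer size, say a hypothetical
`rsync_xfer(d)`, `avg_daily_cost(rsync_xfer)` would give the directly
comparable "/ days"-weighted figure aj asks about.]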