On Wed, 2002-04-10 at 02:28, Anthony Towns wrote:
> I think you'll find you're also unfairly weighting this against people
> who do daily updates. If you do an update once a month, it's not as much
> of a bother waiting a while to download the Packages files -- you're
> going to have to wait _much_ longer to download the packages themselves.
>
> I'd suggest your formula would be better off being:
>
>     bandwidthcost = sum( x = 1..30, prob(x) * cost(x) / x )
>
> (If you update every day for a month, your cost isn't just one download,
> it's 30 downloads. If you update once a week for a month, your cost
> isn't that of a single download, it's four times that. The /x takes that
> into account)

I think it depends on what you're measuring. I can think of two ways to
measure the "goodness" of these schemes (there are certainly others):

1. What is the average bandwidth required at the server?
2. What is the average bandwidth required at the client?

The two questions are related: If users update after i days with
prob1(i), then the probability that a connection arriving at a server
is from a user updating after i days is

    prob2(i) = (prob1(i)/i) * norm

where norm is a normalization factor so the probabilities sum to 1.

I've been looking at question 2, and you're suggesting that I look at
question 1, except you forgot the normalization factor. I think this
is what you mean. Please correct me if I've misunderstood.

Anyway, here are the results you asked for. I'm NOT including the
normalization factor for easier comparison with your numbers. My diff
numbers are a little different from yours mainly because I charge 1K
of overhead for each file request.

Diff scheme:

days    dspace    ebwidth
-------------------------------
1       12.000K   342.00K
2       24.000K   171.20K
3       36.000K   95.900K
4       48.000K   58.500K
5       60.000K   38.800K
6       72.000K   27.900K
7       84.000K   21.800K
8       96.000K   18.200K
9       108.00K   16.100K
10      120.00K   14.900K
11      132.00K   14.100K
12      144.00K   13.700K
13      156.00K   13.400K
14      168.00K   13.300K
15      180.00K   13.100K

Checksum file scheme with 4 byte checksums:

bsize   dspace    ebwidth
-------------------------------
20      312.50K   173.70K
40      156.30K   89.300K
60      104.20K   62.200K
80      78.100K   49.300K
100     62.500K   42.200K
120     52.100K   37.900K
140     44.600K   35.300K
160     39.100K   33.600K
180     34.700K   32.700K
200     31.300K   32.200K
220     28.400K   32.100K
240     26.000K   32.200K
260     24.000K   32.500K
280     22.300K   33.000K
300     20.800K   33.600K
320     19.500K   34.300K
340     18.400K   35.100K
360     17.400K   35.900K
380     16.400K   36.800K
400     15.600K   37.700K

I'm probably underestimating the bandwidth of the checksum file
scheme. I'm pretty confident about the diff scheme estimates, though.
I think the performance of the two schemes is pretty close.

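To make the relationship between prob1 and prob2 concrete, here is a minimal Python sketch of the conversion and of the unnormalized sum from the quoted formula. The function names, the two-point distribution, and the linear cost function are all made up for illustration; they are not the distribution or cost model behind the tables above.

```python
from fractions import Fraction

def per_connection_probs(prob1):
    """Convert per-user probabilities into per-connection probabilities.

    prob1 maps an update interval i (in days) to the probability that a
    user updates after i days. The result is
    prob2(i) = (prob1(i) / i) * norm, with norm chosen so prob2 sums to 1.
    """
    weights = {i: Fraction(p) / i for i, p in prob1.items()}
    norm = 1 / sum(weights.values())
    return {i: w * norm for i, w in weights.items()}

def unnormalized_cost(prob1, cost):
    """sum over x of prob(x) * cost(x) / x, i.e. the sum without norm."""
    return sum(Fraction(p) * cost(i) / i for i, p in prob1.items())

# Hypothetical distribution: half the users update daily, half every 2 days.
prob1 = {1: Fraction(1, 2), 2: Fraction(1, 2)}
prob2 = per_connection_probs(prob1)

# Hypothetical cost model: 10K per day since the last update.
def cost(i):
    return 10 * i

server_side = unnormalized_cost(prob1, cost)
per_connection = sum(prob2[i] * cost(i) for i in prob2)
```

With these made-up inputs, daily updaters account for two thirds of the connections even though they are only half of the users, and the per-connection average equals the unnormalized sum multiplied by norm.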
Even though this looks pretty good for the checksum file scheme, I'm
still partial to the diff scheme because

- The checksum file scheme bottoms out at 32K, but the diff scheme
  can reduce transfers to 13K (using more disk space).

- I trust my estimates of the diff scheme more. The rsync scheme
  will definitely take more bandwidth than my estimates predict.

- As Debian gets larger, the checksum files will get larger, and so
  the bandwidth will get larger. So over time, any advantage of the
  checksum file scheme will disappear.

- The diff scheme is more flexible and easier to tune. The checksum
  file scheme has a "sweet spot" at 220 byte blocks. Predicting the
  actual value of this sweet spot may be hard in the real world.

Best,
Rob