Re: Debian's problems, Debian's future
On Wed, 2002-04-10 at 02:28, Anthony Towns wrote:
> I think you'll find you're also unfairly weighting this against people
> who do daily updates. If you do an update once a month, it's not as
> much of a bother waiting a while to download the Packages files --
> you're going to have to wait _much_ longer to download the packages
> themselves. I'd suggest your formula would be better off being:
>
>     bandwidthcost = sum( x = 1..30, prob(x) * cost(x) / x )
>
> (If you update every day for a month, your cost isn't just one
> download, it's 30 downloads. If you update once a week for a month,
> your cost isn't that of a single download, it's four times that. The
> /x takes that into account)

I think it depends on what you're measuring. I can think of two ways to
measure the goodness of these schemes (there are certainly others):

1. What is the average bandwidth required at the server?
2. What is the average bandwidth required at the client?

The two questions are related: if users update after i days with
prob1(i), then the probability that a connection arriving at a server
is from a user updating after i days is

    prob2(i) = (prob1(i)/i) * norm,

where norm is a normalization factor so the probabilities sum to 1.
I've been looking at question 2, and you're suggesting that I look at
question 1, except you forgot the normalization factor. I think this is
what you mean. Please correct me if I've misunderstood.

Anyway, here are the results you asked for. I'm NOT including the
normalization factor for easier comparison with your numbers. My diff
numbers are a little different from yours mainly because I charge 1K of
overhead for each file request.
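The two measures can be sketched in Python (my own illustration, not
code from the thread; prob1 and cost stand for whatever user model and
cost model you plug in):

```python
# A sketch of the two measures. prob1(i) is the probability a user
# goes i days between updates; cost(i) is the bandwidth for an update
# after i days.
def avg_client(cost, prob1, days=30):
    """Question 2: average bandwidth per update, weighted by users."""
    return sum(cost(i) * prob1(i) for i in range(1, days + 1))

def avg_server(cost, prob1, days=30):
    """Question 1: average bandwidth per connection seen by the server.

    A user who updates every i days connects 1/i times per day, so
    connections are weighted by prob1(i)/i, renormalized to sum to 1
    (the "norm" factor above).
    """
    weights = [prob1(i) / i for i in range(1, days + 1)]
    norm = sum(weights)
    return sum(cost(i) * (w / norm)
               for i, w in zip(range(1, days + 1), weights))
```

The server-side average weights frequent updaters more heavily, which
is why the two questions give different rankings of the schemes.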
Diff scheme:

days  dspace    ebwidth
---
  1   12.000K   342.00K
  2   24.000K   171.20K
  3   36.000K    95.900K
  4   48.000K    58.500K
  5   60.000K    38.800K
  6   72.000K    27.900K
  7   84.000K    21.800K
  8   96.000K    18.200K
  9   108.00K    16.100K
 10   120.00K    14.900K
 11   132.00K    14.100K
 12   144.00K    13.700K
 13   156.00K    13.400K
 14   168.00K    13.300K
 15   180.00K    13.100K

Checksum file scheme with 4 byte checksums:

bsize  dspace    ebwidth
---
  20   312.50K   173.70K
  40   156.30K    89.300K
  60   104.20K    62.200K
  80    78.100K   49.300K
 100    62.500K   42.200K
 120    52.100K   37.900K
 140    44.600K   35.300K
 160    39.100K   33.600K
 180    34.700K   32.700K
 200    31.300K   32.200K
 220    28.400K   32.100K
 240    26.000K   32.200K
 260    24.000K   32.500K
 280    22.300K   33.000K
 300    20.800K   33.600K
 320    19.500K   34.300K
 340    18.400K   35.100K
 360    17.400K   35.900K
 380    16.400K   36.800K
 400    15.600K   37.700K

I'm probably underestimating the bandwidth of the checksum file scheme.
I'm pretty confident about the diff scheme estimates, though. I think
the performance of the two schemes is pretty close. Even though this
looks pretty good for the checksum file scheme, I'm still partial to
the diff scheme because

- The checksum file scheme bottoms out at 32K, but the diff scheme can
  reduce transfers to 13K (using more disk space).
- I trust my estimates of the diff scheme more. The rsync scheme will
  definitely take more bandwidth than my estimates predict.
- As debian gets larger, the checksum files will get larger, and so the
  bandwidth will get larger. So over time, any advantage of the
  checksum file scheme will disappear.
- The diff scheme is more flexible and easier to tune. The checksum
  file scheme has a sweet spot at 220 byte blocks. Predicting the
  actual value of this sweet spot may be hard in the real world.

Best,
Rob

--
To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of
unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Re: Debian's problems, Debian's future
On Wed, 2002-04-10 at 04:35, Michael Bramer wrote:
> > Scheme                      Disk space  Bandwidth
> > ---
> > Checksums (bwidth optimal)     26K         81K
> > diffs (4 days)                 32K        331K
> > diffs (9 days)                 71K         66K
> > diffs (20 days)               159K         27K
>
> can you explain your counts?

Sure. At the end of this message is a script that you can use with the
program gp to recreate my numbers. Here's a quick description:

Anthony Towns said that the average size of a diff between two Packages
files is 12K (after compression with bzip2). So if the server keeps d
days of diffs, this will take about d*12K of disk space. If I go for i
days without updating, then when I do update, if i <= d, then I will
need to download i diff files, using about i*(12K + 1K) bandwidth. (The
1K is for each GET request, since I'm downloading i files.) If i > d,
then I need to get the whole Packages.gz file, which I estimate as
about 1.6M. So let bwidth(d,i) = the amount of bandwidth used doing an
update after i days, where the server has kept d days of diffs.

So how much bandwidth is used _on average_? Well, it depends on how
often everybody updates. If everybody updates every day, then everybody
would just need to download 1 diff, using 13K. If everybody updates
every week, then the average bandwidth is 7*13K=91K. In reality, we
don't know how often people update, but my guess is that people tend to
update often. So I just guessed that the probability that someone
updates after i days is prob(i)=((2/3)^i)/2. Why this formula? It
seemed good at the time. So then the average bandwidth used is

    average_bwidth(d) = sum for i = 1,...,infinity of bwidth(d,i) * prob(i)

That's it for the diff stuff. For the checksum scheme, the disk space
required is the number of checksums times the size of each checksum.
The number of checksums is the size of Packages.gz divided by the block
size. Since the checksum file has to be transferred to every client,
the size of the checksum file contributes to the bandwidth estimate, as
well.
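The estimate above can be sketched in Python (my own illustration; the
sizes are the thread's figures -- 12K per daily diff, 1K of request
overhead, roughly 1.6M for a full Packages.gz -- and the infinite sum
is truncated at a large horizon):

```python
# Estimated bandwidth per update under the diff scheme.
DIFF = 12 * 1024
OVERHEAD = 1024
FULL = int(1.6 * 1024 * 1024)

def bwidth(d, i):
    # Updating after i days when the server keeps d days of diffs:
    # fetch i diffs if they are all still there, else the whole file.
    if i <= d:
        return i * (DIFF + OVERHEAD)
    return FULL + OVERHEAD

def prob(i):
    # The guessed user model: people tend to update often.
    return ((2.0 / 3.0) ** i) / 2.0

def average_bwidth(d, horizon=100, server_view=True):
    # The posted ebwidth tables weight each term by prob(i)/i -- the
    # (unnormalized) server's-eye view discussed earlier in the thread.
    total = 0.0
    for i in range(1, horizon + 1):
        w = prob(i) / i if server_view else prob(i)
        total += bwidth(d, i) * w
    return total
```

With these figures, average_bwidth(15) comes out near the 13K floor in
the diff table, and average_bwidth(1) in the mid-300K range.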
Additionally, I estimate that 75 packages change in debian every day
(derived by looking at debian-devel-changes in Feb. and March). Using a
little probability, I computed the average number of blocks in
Packages.gz that will change in i days. I then estimate that each of
these blocks will have to be transferred during an update, and use that
to estimate the amount of bandwidth required for an update. Then I
average, as with the diff scheme. Let me know if you think there are
any problems with this.

I've been playing around recently with a more realistic (in my opinion)
user model. I now predict that the probability that a user will update
every i days is n/(i+1)^3 (n is a normalization factor). I like this
model because it predicts that if a user hasn't updated in a long time,
it'll probably be a long time before they update. This seems
intuitively correct to me.

So here are some new numbers comparing the diff scheme and the rsync
scheme in this new user model. In my opinion, diff still wins. These
numbers use prob(i)=(n/(i+1)^3)/i, so these numbers are the average
bandwidth, averaged over what the server sees. For an explanation of
this, see
http://lists.debian.org/debian-devel/2002/debian-devel-200204/msg01076.html.
Diffs:

days  dspace    ebwidth
---
  1   12.000K   296.80K
  2   24.000K   110.70K
  3   36.000K    58.800K
  4   48.000K    39.100K
  5   60.000K    30.000K
  6   72.000K    25.300K
  7   84.000K    22.600K
  8   96.000K    21.000K
  9   108.00K    19.900K
 10   120.00K    19.200K
 11   132.00K    18.700K
 12   144.00K    18.400K
 13   156.00K    18.100K
 14   168.00K    17.900K
 15   180.00K    17.800K

Checksum files:

bsize  dspace    ebwidth
---
  20   312.50K   315.40K
  40   156.30K   161.10K
  60   104.20K   111.00K
  80    78.100K   86.800K
 100    62.500K   73.100K
 120    52.100K   64.600K
 140    44.600K   59.100K
 160    39.100K   55.500K
 180    34.700K   53.000K
 200    31.300K   51.500K
 220    28.400K   50.500K
 240    26.000K   50.100K
 260    24.000K   50.000K
 280    22.300K   50.100K
 300    20.800K   50.500K
 320    19.500K   51.100K
 340    18.400K   51.800K
 360    17.400K   52.700K
 380    16.400K   53.700K
 400    15.600K   54.700K

Best,
Rob

/***
 * Info about debian
 ***/

/* The number of packages in debian */
npkgs=8000.0

/* How big is Packages[.gz] as a function of the number of packages */
compressed_bytes_per_pkg=200.0
uncompressed_bytes_per_pkg=800.0
Re: Debian's problems, Debian's future
On Wed, 2002-04-10 at 09:46, Erich Schubert wrote:
> What diff options do you use? As the diffs are expected to be applied
> to the correct version, they probably shouldn't contain the old data,
> but the new data only.

Good point. I used diff -ed, so I think this is not including
unnecessary context info, as you suggest.

Best,
Rob
Re: Debian's problems, Debian's future
On Thu, Apr 11, 2002 at 10:40:31PM -0700, Robert Tiberius Johnson wrote:
> On Wed, 2002-04-10 at 02:28, Anthony Towns wrote:
> > I'd suggest your formula would be better off being:
> > bandwidthcost = sum( x = 1..30, prob(x) * cost(x) / x )
>
> I think it depends on what you're measuring. I can think of two ways
> to measure the goodness of these schemes (there are certainly others):
> 1. What is the average bandwidth required at the server?
> 2. What is the average bandwidth required at the client?

I don't think the bandwidth at the server is a major issue to anyone,
although obviously improvements there are a Good Thing. Personally, I
think the amount of time spent waiting for apt-get update to finish is
the important measure (well, apt-get update; apt-get dist-upgrade is
important too, but I don't think we've seen any feasible ideas at
improving the latter).

> prob2(i)=(prob1(i)/i)*norm, where norm is a normalization factor so
> the probabilities sum to 1. I've been looking at question 2, and
> you're suggesting that I look at question 1, except you forgot the
> normalization factor. I think this is what you mean. Please correct
> me if I've misunderstood.

No, I'm not. I'm saying that the amount of time spent waiting for
apt-get update needs to count every apt-get update you run, not just
the first. So, if over a period of a week, I run it seven times, and
you run it once, I wait seven times as long as you do, so it's seven
times more important to speed things up for me, than for you.

> Anyway, here are the results you asked for. I'm NOT including the
> normalization factor for easier comparison with your numbers. My diff
> numbers are a little different from yours mainly because I charge 1K
> of overhead for each file request.

Merging, and reordering by decreasing estimated bandwidth. The ones
marked with *'s aren't worth considering because there's a method that
both requires less bandwidth and takes up less diskspace.
The ones without stars are thus ordered by increasing diskspace and
decreasing bandwidth. Having the ebwidth of the current situation
(everyone downloads the entire Packages file) for comparison would be
helpful.

days/bsize  dspace     ebwidth
---
  1    12.000K    342.00K  [diff]
 20    312.50K *  173.70K  [cksum/rsync]
  2    24.000K *  171.20K  [diff]
  3    36.000K *   95.900K [diff]
 40    156.30K *   89.300K [cksum/rsync]
 60    104.20K *   62.200K [cksum/rsync]
  4    48.000K *   58.500K [diff]
 80    78.100K *   49.300K [cksum/rsync]
100    62.500K *   42.200K [cksum/rsync]
  5    60.000K *   38.800K [diff]
120    52.100K *   37.900K [cksum/rsync]
400    15.600K     37.700K [cksum/rsync]
380    16.400K     36.800K [cksum/rsync]
360    17.400K     35.900K [cksum/rsync]
140    44.600K *   35.300K [cksum/rsync]
340    18.400K     35.100K [cksum/rsync]
320    19.500K     34.300K [cksum/rsync]
300    20.800K *   33.600K [cksum/rsync]
160    39.100K *   33.600K [cksum/rsync]
280    22.300K     33.000K [cksum/rsync]
180    34.700K *   32.700K [cksum/rsync]
260    24.000K     32.500K [cksum/rsync]
240    26.000K     32.200K [cksum/rsync]
200    31.300K *   32.200K [cksum/rsync]
220    28.400K     32.100K [cksum/rsync]
  6    72.000K     27.900K [diff]
  7    84.000K     21.800K [diff]
  8    96.000K     18.200K [diff]
  9    108.00K     16.100K [diff]
 10    120.00K     14.900K [diff]
 11    132.00K     14.100K [diff]
 12    144.00K     13.700K [diff]
 13    156.00K     13.400K [diff]
 14    168.00K     13.300K [diff]
 15    180.00K     13.100K [diff]

180k is roughly 10% of the size of the corresponding Packages.gz, so is
relatively trivial. Since we'll probably do it at the same time as
dropping the uncompressed Packages file (sid/main/i386 alone is 6MB),
this is pretty negligible.

Cheers,
aj

--
Anthony Towns [EMAIL PROTECTED] http://azure.humbug.org.au/~aj/
I don't speak for anyone save myself. GPG signed mail preferred.
``BAM! Science triumphs again!'' -- http://www.angryflower.com/vegeta.gif
Re: Debian's problems, Debian's future
On Fri, 2002-04-12 at 00:14, Anthony Towns wrote:
> No, I'm not. I'm saying that the amount of time spent waiting for
> apt-get update needs to count every apt-get update you run, not just
> the first. So, if over a period of a week, I run it seven times, and
> you run it once, I wait seven times as long as you do, so it's seven
> times more important to speed things up for me, than for you.

Got it. Thanks for clearing that up.

> Having the ebwidth of the current situation (everyone downloads the
> entire Packages file) for comparison would be helpful.

You're right. Here it is: old_ebwidth = 879K.

Best,
Rob
Re: Debian's problems, Debian's future
On Wed, Apr 10, 2002 at 10:25:22AM +1000, Martijn van Oosterhout wrote:
> On Tue, Apr 09, 2002 at 05:02:34PM +0200, Michael Bramer wrote:
> > you propose to add 'some' diff files for all files on
> > ftp-master.d.o? With rsync we need only one rsync-checksum file per
> > normal file and all apts need only download the needed parts. You
> > get the point?
>
> With the standard rsync algorithm, the rsync checksum files would
> actually be 8 times larger than the original file (you need to store
> the checksum for each possible block in the file).

I don't see that the checksum file is larger than the original file. If
the checksum file is larger, we will have more bytes to download...
This was not the goal.

> What you are suggesting is that the server store checksums for
> precalculated blocks on the server. This would be 4 bytes per 1k in
> the original file or so. The transaction proceeds as follows:
> 1. Client asks for checksum list off server
> 2. Client calculates checksums for local file
> 3. Client compares list of server with list of client
> 4. Client downloads changed regions.

Yes, this is the way..

> Note, this is not the rsync algorithm, but the one that is possibly
> patented.

maybe I don't understand the rsync algorithm... IMHO the rsync
algorithm is:

1.) Computer B splits file B in blocks.
2.) calculate two checksums
    a.) weak ``rolling'' 32-bit checksum
    b.) md5sum
3.) Computer B send this to computer A.
4.) Computer A search in file A for parts with the same checksums from
    file B
5.) Computer A request unmatched blocks from computer B and build the
    file B.

I get this from /usr/share/doc/rsync/tech_report.tex.gz

right? The _only_ difference is: precalculate the checksums on computer
B. Or maybe store the calculated checksums in a /var/cache/rsync/ cache
dir.

sorry, I know that patents don't have any logic, but this is the same
algorithm, only with some cache.

Comments?
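The four-step transaction quoted above can be sketched in a few lines
(a toy illustration of the idea, not code from the thread; real
implementations would use stronger checksums than adler32, and block
size is a tuning knob):

```python
# Toy version of the precalculated-block-checksum scheme: the server
# publishes one checksum per fixed-size block; the client compares
# them against its local copy and fetches only blocks that differ.
import zlib

BLOCK = 1024

def block_sums(data):
    # One weak checksum per BLOCK-sized chunk (the published file).
    return [zlib.adler32(data[i:i + BLOCK])
            for i in range(0, len(data), BLOCK)]

def changed_blocks(local, remote_sums):
    # Indices of remote blocks the client still needs to download.
    local_sums = block_sums(local)
    return [i for i, s in enumerate(remote_sums)
            if i >= len(local_sums) or local_sums[i] != s]
```

The checksum list here is 4 bytes per 1K block, i.e. about 0.4% of the
file, which is where the "4 bytes per 1k in the original file or so"
figure comes from.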
Gruss Grisu
--
Michael Bramer - a Debian Linux Developer http://www.debsupport.de
PGP: finger [EMAIL PROTECTED] -- Linux Sysadmin -- Use Debian Linux
Nicht geschehene Taten ziehen oft einen erstaunlichen Mangel an Folgen
nach sich. -- S.J. Lec
Re: Debian's problems, Debian's future
On Tue, 2002-04-09 at 17:25, Martijn van Oosterhout wrote:
> What you are suggesting is that the server store checksums for
> precalculated blocks on the server. This would be 4 bytes per 1k in
> the original file or so. The transaction proceeds as follows:
> 1. Client asks for checksum list off server
> 2. Client calculates checksums for local file
> 3. Client compares list of server with list of client
> 4. Client downloads changed regions.
> Note, this is not the rsync algorithm, but the one that is possibly
> patented.

This looks like an interesting algorithm, so I decided to compare it to
the diff scheme analyzed in
http://lists.debian.org/debian-devel/2002/debian-devel-200204/msg00502.html
The above message also gives my analysis methodology. The results:

- The following table summarizes the performance of the checksum-based
  scheme and the diff-based scheme under the assumption that users tend
  to perform apt-get update often. I think disk space is cheap and
  bandwidth is expensive, so 20 days of diffs is the best choice.

  Scheme                      Disk space  Bandwidth
  ---
  Checksums (bwidth optimal)     26K         81K
  diffs (4 days)                 32K        331K
  diffs (9 days)                 71K         66K
  diffs (20 days)               159K         27K

- The analysis is unfairly favorable to the checksum scheme, because I
  do not count the bandwidth required to request all the changed
  blocks, only the bandwidth used to transmit the changed blocks.
- For the user model in the message above, the optimal block size for
  this algorithm is around 245 bytes.
- In the diff-based scheme, each mirror can decide on a
  diskspace/bandwidth tradeoff by simply keeping more old diffs or
  deleting some old diffs. The checksum-based scheme doesn't really
  support tweaking at the mirror.
- I tend to update every day. For people who update every day, the
  diff-based scheme only needs to transfer about 8K, but the
  checksum-based scheme needs to transfer 45K. So for me, diffs are
  better. :)

Best,
Rob
Re: Debian's problems, Debian's future
On Wed, Apr 10, 2002 at 01:26:17AM -0700, Robert Tiberius Johnson wrote:
> - I tend to update every day. For people who update every day, the
>   diff-based scheme only needs to transfer about 8K, but the
>   checksum-based scheme needs to transfer 45K. So for me, diffs are
>   better. :)

I think you'll find you're also unfairly weighting this against people
who do daily updates. If you do an update once a month, it's not as
much of a bother waiting a while to download the Packages files --
you're going to have to wait _much_ longer to download the packages
themselves. I'd suggest your formula would be better off being:

    bandwidthcost = sum( x = 1..30, prob(x) * cost(x) / x )

(If you update every day for a month, your cost isn't just one
download, it's 30 downloads. If you update once a week for a month,
your cost isn't that of a single download, it's four times that. The /x
takes that into account)

Bandwidth cost, then, is something like the average amount downloaded
by a testing/unstable user per day to update main. My results are
something like:

 0 days of diffs: 843.7 KiB (the current situation)
 1 day of diffs: 335.7 KiB
 2 days of diffs: 167.7 KiB
 3 days of diffs: 93.7 KiB
 4 days of diffs: 56.9 KiB
 5 days of diffs: 37.5 KiB
 6 days of diffs: 26.8 KiB
 7 days of diffs: 20.7 KiB
 8 days of diffs: 17.2 KiB
 9 days of diffs: 15.1 KiB
10 days of diffs: 13.9 KiB
11 days of diffs: 13.2 KiB
12 days of diffs: 12.7 KiB
13 days of diffs: 12.4 KiB
14 days of diffs: 12.3 KiB
15 days of diffs: 12.2 KiB

...which pretty much matches what I'd expect: at the moment, just to
update main, people download around 1.2MB per day; if we let them just
download the diff against yesterday, the average would plunge to only a
couple of hundred k, and you rapidly reach the point of diminishing
returns.
I used figures of 1.5MiB for the standard gzipped Packages file you
download if you can't use diffs, and 12KiB for the size of each daily
diff -- if you're three days out of date, you download three diffs and
apply them in order to get up to date. 12KiB is the average size of
daily bzip2'ed --ed diffs over the last month for sid/main/i386.

The script I used for the above was (roughly):

#!/usr/bin/python
def cost_diff(day, ndiffs):
    if day <= ndiffs:
        return 12 * 1024 * day
    else:
        return 1.5 * 1024 * 1024

def prob(d):
    return (2.0 / 3.0) ** d / 2.0

def summate(f, p):
    cost = 0.0
    for d in range(1, 31):
        cost += f(d) * p(d) / d
    return cost

for x in range(0, 16):
    print "%s day/s of diffs: %.1f KiB" % \
        (x, summate(lambda y: cost_diff(y, x), prob) / 1024)

I'd be interested in seeing what the rsync stats look like with the
/days factor added in.

Cheers,
aj

--
Anthony Towns [EMAIL PROTECTED] http://azure.humbug.org.au/~aj/
I don't speak for anyone save myself. GPG signed mail preferred.
``BAM! Science triumphs again!'' -- http://www.angryflower.com/vegeta.gif
Re: Debian's problems, Debian's future
On Wed, Apr 10, 2002 at 07:28:42PM +1000, Anthony Towns wrote:
> 0 days of diffs: 843.7 KiB (the current situation)
> ...which pretty much matches what I'd expect: at the moment, just to
> update main, people download around 1.2MB per day;

Uh, obviously this should be 843KiB. (I'd been playing with other
probabilities when I was writing the latter part. Tsktsk.)

Cheers,
aj

--
Anthony Towns [EMAIL PROTECTED] http://azure.humbug.org.au/~aj/
I don't speak for anyone save myself. GPG signed mail preferred.
``BAM! Science triumphs again!'' -- http://www.angryflower.com/vegeta.gif
Re: Debian's problems, Debian's future
On Tue, Apr 09, 2002 at 10:58:24AM +0200, Michael Bramer wrote:
> On Tue, Apr 09, 2002 at 06:39:19PM +1000, Martijn van Oosterhout wrote:
> > I believe this method is patented by somebody, [snip]
>
> has someone a pointer?

Here's some stuff from my mail archives - I haven't checked whether the
links still work. The following one probably doesn't, but it looks like
the patent number is 6167407:

http://164.195.100.11/netacgi/nph-Parser?Sect1=PTO1Sect2=HITOFFd=PALLp=1u=/netahtml/srchnum.htmr=1f=Gl=50s1='6167407'.WKU.OS=PN/6167407RS=PN/6167407

Cheers,
Richard

--
Richard Atterer | CS student at the Technische | GnuPG key:
http://atterer.net | Universität München, Germany | 0x888354F7

----- Forwarded message from Clifford Heath [EMAIL PROTECTED] -----

Date: 29 Jan 2001 10:05:11 +1100
From: Clifford Heath [EMAIL PROTECTED]
Sender: [EMAIL PROTECTED]
To: Goswin Brederlow [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: reverse checksumming [legal]

> What came first, rsync or the patent?

OSA refers to its issued patent US6006034 as the SmartPull patent. This
isn't the patent that threatens rsync, though it might be relevant to
some of the preceding discussion. I haven't considered whether there's
any overlap with rsync itself - in any case I doubt OSA would attempt
to block use of rsync. We think that rsync is wonderful!

The patent that may overlap with rsync is US5446888 and its followup
US5721907, with precedence date Jan 14 1994. I am not a lawyer, but it
seems to directly conflict with rsync. I've had correspondence with the
rights holders, as OSA wished to implement something similar in a
product but held off until licensing concerns were addressed. They are
Travelling Software Inc. (TSI), and refer to the technology as
SpeedSync, using it in their LapLink product line. At the time of my
last contact (August 1999), Travelling Software had not made a
determination if rsync infringes on any intellectual property right of
TSI or not.
Read the patent and decide for yourself. I'm not qualified to hold a
legal opinion.

The patent clearly identifies which operations are performed on the
host sending the file, and which on the host receiving it. We
discovered a method which reversed many of the operations with
substantial benefit (as Tim Adam has told you), and filed a patent to
this effect, with a defensive intent. This latest patent has not issued
(it's pending). So we have no rights (yet!) to ask you to cease and
desist from implementing and using it. Be aware that this might change
in the future. I personally believe (and think OSA agrees) that it
would be counter-productive to the industry as a whole, but it's not my
decision. Who knows, OSA itself might be sold to some sharks who think
differently...

> This is just the rsync algorithm and that's probably way older than
> the patent, so the patent might not hold.

I don't believe that rsync is older, but in any case it's difficult and
expensive to challenge an issued patent over prior art, and I don't
think that Tridge is likely to do that. If you fear a suit from TSI and
would choose a prior art defense, you will need Tridge's help, as only
he could establish priority.

> Can the text of the Patent be found anywhere online?

http://www.delphion.com/details?pn=US05721907__

--
Clifford Heath, Open Software Associates,
mailto:[EMAIL PROTECTED], Ph +613 9895 2194, Fax 9895 2020,
http://www.osa.com.au/~cjh, 56-60 Rutland Rd, Box Hill 3128, Melbourne,
Victoria, Australia.

----- End forwarded message -----
Re: Debian's problems, Debian's future
On Wed, Apr 10, 2002 at 09:22:50AM +0200, Michael Bramer wrote:
> On Wed, Apr 10, 2002 at 10:25:22AM +1000, Martijn van Oosterhout wrote:
> > With the standard rsync algorithm, the rsync checksum files would
> > actually be 8 times larger than the original file (you need to
> > store the checksum for each possible block in the file).
>
> I don't see that the checksum file is larger than the original file.
> If the checksum file is larger, we will have more bytes to
> download... This was not the goal.

That's because the client doesn't download the checksums. Look below.

> maybe I don't understand the rsync algorithm... IMHO the rsync
> algorithm is:
> 1.) Computer B splits file B in blocks.
> 2.) calculate two checksums
>     a.) weak ``rolling'' 32-bit checksum
>     b.) md5sum
> 3.) Computer B send this to computer A.
> 4.) Computer A search in file A for parts with the same checksums
>     from file B
> 5.) Computer A request unmatched blocks from computer B and build the
>     file B.
> I get this from /usr/share/doc/rsync/tech_report.tex.gz

Computer A wants to download a file F from computer B.

1. Computer A splits its version into blocks, calculates the checksum
   for each block.
2. Computer A sends this list to computer B. This should be 1% the size
   of the original file. Depends on the block size.
3. Computer B takes this list and does the rolling checksum over the
   file. Basically, it calculates the checksum for bytes 0-1023, checks
   for it in the list from the client. If it's a match, send back a
   string indicating which block it is, else send byte 0. Calculate the
   checksum of 1-1024 and do the same. The rolling checksum is just an
   optimisation.
4. Computer A receives a list of tokens which are either bytes of data
   or indications of which block to copy from the original file.

Notice that:

a. The server (computer B) does *all* the work.
b. The data forms a stream. The client can split itself into two and
   can be analysing the next file while the server is still processing
   the current one.
   Your above algorithm requires two requests for each file. The
   streaming helps performance over high latency links.
c. Precalculating checksums on the client is useless.
d. Precalculating checksums on the server is also useless because the
   storage would be more (remember, checksum for bytes 0-1023, then for
   1-1024, 2-1025, etc). It's faster to calculate them than to load
   them off disk.

So, the main difference between what you are proposing is 1 versus 2
requests per file. And rsync definitely only has one.

Besides, look at the other posts on this thread. Diff requires less
download than rsync.

--
Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
Ignorance continues to thrive when intelligent people choose to do
nothing. Speaking out against censorship and ignorance is the
imperative of all intelligent people.
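The "rolling" optimisation in step 3 can be sketched as follows (my own
illustration, loosely following the rsync technical report; real rsync
pairs this weak checksum with a strong one to rule out collisions):

```python
# Weak rsync-style checksum: two 16-bit sums packed into 32 bits.
# Sliding the window one byte costs O(1), which is why the server can
# afford to test every byte offset, not just block boundaries.
M = 1 << 16

def weak(block):
    # a: plain byte sum; b: position-weighted sum (first byte weighted
    # len(block), last byte weighted 1).
    a = sum(block) % M
    b = sum((len(block) - i) * c for i, c in enumerate(block)) % M
    return (b << 16) | a

def roll(old_sum, out_byte, in_byte, blen):
    # Update the checksum when the window slides one byte: drop
    # out_byte on the left, take in in_byte on the right.
    a = old_sum & 0xFFFF
    b = old_sum >> 16
    a = (a - out_byte + in_byte) % M
    b = (b - blen * out_byte + a) % M
    return (b << 16) | a
```

Computing the checksum from scratch at every offset would be O(n) per
position; rolling makes the whole scan linear in the file size.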
Re: Debian's problems, Debian's future
On Wed, Apr 10, 2002 at 01:26:17AM -0700, Robert Tiberius Johnson wrote:
> This looks like an interesting algorithm, so I decided to compare it
> to the diff scheme analyzed in
> http://lists.debian.org/debian-devel/2002/debian-devel-200204/msg00502.html
> The above message also gives my analysis methodology. The results:
>
> - The following table summarizes the performance of the
>   checksum-based scheme and the diff-based scheme under the
>   assumption that users tend to perform apt-get update often. I
>   think disk space is cheap and bandwidth is expensive, so 20 days
>   of diffs is the best choice.
>
>   Scheme                      Disk space  Bandwidth
>   ---
>   Checksums (bwidth optimal)     26K         81K
>   diffs (4 days)                 32K        331K
>   diffs (9 days)                 71K         66K
>   diffs (20 days)               159K         27K

can you explain your counts?

Gruss Grisu
--
Michael Bramer - a Debian Linux Developer http://www.debsupport.de
PGP: finger [EMAIL PROTECTED] -- Linux Sysadmin -- Use Debian Linux
Nicht geschehene Taten ziehen oft einen erstaunlichen Mangel an Folgen
nach sich. -- S.J. Lec
Re: Debian's problems, Debian's future
On Wed, Apr 10, 2002 at 08:29:49PM +1000, Martijn van Oosterhout wrote:
> On Wed, Apr 10, 2002 at 09:22:50AM +0200, Michael Bramer wrote:
> > On Wed, Apr 10, 2002 at 10:25:22AM +1000, Martijn van Oosterhout wrote:
> > > With the standard rsync algorithm, the rsync checksum files would
> > > actually be 8 times larger than the original file (you need to
> > > store the checksum for each possible block in the file).
> >
> > I don't see that the checksum file is larger than the original
> > file. If the checksum file is larger, we will have more bytes to
> > download... This was not the goal.
>
> That's because the client doesn't download the checksums. Look below.
>
> > maybe I don't understand the rsync algorithm... IMHO the rsync
> > algorithm is:
> > 1.) Computer B splits file B in blocks.
> > 2.) calculate two checksums
> >     a.) weak ``rolling'' 32-bit checksum
> >     b.) md5sum
> > 3.) Computer B send this to computer A.
> > 4.) Computer A search in file A for parts with the same checksums
> >     from file B
> > 5.) Computer A request unmatched blocks from computer B and build
> >     the file B.
> > I get this from /usr/share/doc/rsync/tech_report.tex.gz
>
> Computer A wants to download a file F from computer B.
> 1. Computer A splits its version into blocks, calculates the checksum
>    for each block.
> 2. Computer A sends this list to computer B. This should be 1% the
>    size of the original file. Depends on the block size.
> 3. Computer B takes this list and does the rolling checksum over the
>    file. Basically, it calculates the checksum for bytes 0-1023,
>    checks for it in the list from the client. If it's a match, send
>    back a string indicating which block it is, else send byte 0.
>    Calculate the checksum of 1-1024 and do the same. The rolling
>    checksum is just an optimisation.
> 4. Computer A receives a list of tokens which are either bytes of
>    data or indications of which block to copy from the original file.

all ok. I write the same above, except point '4', and you switch A and
B...

> Notice that:
> a. The server (computer B) does *all* the work.
If you use A as the server, the client does all the work.

> c. Precalculating checksums on the client is useless.
> d. Precalculating checksums on the server is also useless because the
>    storage would be more (remember, checksum for bytes 0-1023, then
>    for 1-1024, 2-1025, etc). It's faster to calculate them than to
>    load them off disk.

Precalculating the _block_ checksums is _not_ useless. These checksums
are only 1% the size of the original file (depends on the block size).

> So, the main difference between what you are proposing is 1 versus 2
> requests per file. And rsync definitely only has one.

The main difference is: the client, and not the server, does all the
work!

> Besides, look at the other posts on this thread. Diff requires less
> download than rsync.

I read it, but I don't understand it. But this is not the problem. IMHO
the diff is a kind of a hack and a cached rsync is a nice framework.
But this is only my taste...

Maybe I should read the rsync source code... Done. Ok, with the normal
rsync program the client makes the block checksums and the server
searches in the file...

Thanks for your help.

Gruss Grisu
--
Michael Bramer - a Debian Linux Developer http://www.debsupport.de
PGP: finger [EMAIL PROTECTED] -- Linux Sysadmin -- Use Debian Linux
Hummeln koennen wirklich stechen, tun das aber nur in extremen
Ausnahme-Situationen. NT tut in solchen Situationen nichts mehr.
aus d.a.s.r
Re: Debian's problems, Debian's future
> Scheme                      Disk space  Bandwidth
> ---
> Checksums (bwidth optimal)     26K         81K
> diffs (4 days)                 32K        331K
> diffs (9 days)                 71K         66K
> diffs (20 days)               159K         27K

What diff options do you use? As the diffs are expected to be applied
to the correct version, they probably shouldn't contain the old data,
but the new data only.

Greetings,
Erich
Re: Debian's problems, Debian's future
Hello, we should stop this and start again after woody... On Thu, Mar 28, 2002 at 08:17:46PM +0100, Jeroen Dekkers wrote: On Thu, Mar 28, 2002 at 04:55:17PM +0100, Otto Wyss wrote: I'd suggest using diffs, as this brings the best results and is the http://lists.debian.org/debian-devel/2001/debian-devel-200111/msg01303.html (I use apt-pupdate all the time now, it works for me (tm)) Sorry, diffs are simply silly! Use rsync with the uncompressed Packages file and diffs aren't necessary. Or use a packer which doesn't hinder rsync from saving (gzip --rsyncable). This isn't server friendly. No, sorry. I must say this: we can use rsync on the client side: - get a rsync-checksum file (using a fixed block size) - do the check on the client side - download the needed parts of the file per ftp/http - build the new file from the old and downloaded parts With this the server only needs the extra rsync-checksum files. Gruss Grisu -- Michael Bramer - a Debian Linux Developer http://www.debsupport.de PGP: finger [EMAIL PROTECTED] -- Linux Sysadmin -- Use Debian Linux Once you take a closer look at Linux, you will never try to pin the label "stable" on WinNT! ([EMAIL PROTECTED])
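The "download the file partly per ftp/http" step needs no special software on the mirror beyond ordinary HTTP Range requests. A minimal Python sketch of that piece (the commented URL is a placeholder, not a real path):

```python
import urllib.request

def fetch_range(url, start, end):
    """Fetch bytes start..end (inclusive) of a remote file via HTTP Range.

    Any plain HTTP server that honours Range requests can serve this;
    no rsync daemon is needed on the mirror."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content means the server honoured the range
        if resp.status != 206:
            raise RuntimeError("server ignored the Range header")
        return resp.read()

# Hypothetical usage, once the checksum comparison says block 1 changed:
# part = fetch_range("http://mirror.example/debian/Packages", 4096, 8191)
```

FTP offers the same thing via the REST command; either way the server stays a dumb file server.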
Re: Debian's problems, Debian's future
On Sat, Mar 30, 2002 at 04:49:25AM +0900, Junichi Uekawa wrote: [EMAIL PROTECTED] (Otto Wyss) cum veritate scripsit: Packages.0 from 28-March is probably the newest, and the smallest upgrade is probably the diff for one day (209k uncompressed, 50k gzipped). On the 28th rsync's download was 130k, today it was less than 100k. I don't know why your uncompressed diff is bigger than what rsync says. Also note that this is a one-time thing, and can be served through the normal http protocol, or ftp, or whatever. rsync requires handholding from the server side, which is unlikely to happen for every single server serving a Debian mirror. No. Technically you can move all of this to the client and use ftp/http to download parts of the files... Gruss Grisu -- Michael Bramer - a Debian Linux Developer http://www.debsupport.de PGP: finger [EMAIL PROTECTED] -- Linux Sysadmin -- Use Debian Linux How hard can it be, it's just an operating system? -- Linus Torvalds
Re: Debian's problems, Debian's future
On Sat, Mar 30, 2002 at 02:11:00AM +0100, Wichert Akkerman wrote: - I would like to have templates with substitution fields. Already exists. Any references? How about the debconf manual? But sorry, we have some outdated translations in debconf templates files. No translator knows if someone changes the english template. Please, can we use gettext or something else that avoids outdated translations? Joey? Gruss Grisu -- Michael Bramer - a Debian Linux Developer http://www.debsupport.de PGP: finger [EMAIL PROTECTED] -- Linux Sysadmin -- Use Debian Linux it was hard to write, so it should be hard to read
Re: Debian's problems, Debian's future
On Thu, Mar 28, 2002 at 04:55:17PM +0100, Otto Wyss wrote: I'd suggest using diffs, as this brings the best results and is the [diffs for Packages files that is] wooo!!! http://people.debian.org/~dancer/Packages-for-main-i386/ # Time for suggesting is up, please implement. Indeed, it appears it has been implemented more than once. http://lists.debian.org/debian-devel/2001/debian-devel-200111/msg01303.html (I use apt-pupdate all the time now, it works for me (tm)) Sorry, diffs are simply silly! Use rsync with the uncompressed Packages file and diffs aren't necessary. Or use a packer which doesn't hinder rsync from saving (gzip --rsyncable). Right. I have now searched the lists and found the old mails... Maybe someone would like to read the mails and reply: http://lists.debian.org/debian-devel/2001/debian-devel-200111/msg00757.html Gruss Grisu -- Michael Bramer - a Debian Linux Developer http://www.debsupport.de PGP: finger [EMAIL PROTECTED] -- Linux Sysadmin -- Use Debian Linux it was hard to write, so it should be hard to read
Re: Debian's problems, Debian's future
On Fri, Mar 29, 2002 at 11:16:44AM +0100, Eduard Bloch wrote: #include hallo.h Joey Hess wrote on Wed Mar 27, 2002 um 02:21:49PM: That is a rather misleading summary of the situation, which, as a subscriber to debian-boot, you should understand better. Have you done any testing of the proposed base-config patch? Sure. Peter's patches are AFAIK not ready and I have a bad feeling about his dbootstrap modifications. I have a testing installation image with a hacked base-config (my patches), but I was disappointed, since many debconf templates in the packages called in the first base-config steps were not translated. It is too late to change them all, so I can only keep calling it a pity and hope that people mastering customised CD sets will contact me or Peter. Can you put these files online? Gruss Grisu -- Michael Bramer - a Debian Linux Developer http://www.debsupport.de PGP: finger [EMAIL PROTECTED] -- Linux Sysadmin -- Use Debian Linux Now let me explain why this makes intuitive sense. --- Prof. Larry Wasserman
Re: Debian's problems, Debian's future
On Tue, Apr 09, 2002 at 09:09:39AM +0200, Michael Bramer wrote: Hello, we should stop this and start again after woody... On Thu, Mar 28, 2002 at 08:17:46PM +0100, Jeroen Dekkers wrote: On Thu, Mar 28, 2002 at 04:55:17PM +0100, Otto Wyss wrote: I'd suggest using diffs, as this brings the best results and is the http://lists.debian.org/debian-devel/2001/debian-devel-200111/msg01303.html (I use apt-pupdate all the time now, it works for me (tm)) Sorry, diffs are simply silly! Use rsync with the uncompressed Packages file and diffs aren't necessary. Or use a packer which doesn't hinder rsync from saving (gzip --rsyncable). This isn't server friendly. No, sorry. I must say this: we can use rsync on the client side: - get a rsync-checksum file (using a fixed block size) - do the check on the client side - download the needed parts of the file per ftp/http - build the new file from the old and downloaded parts With this the server only needs the extra rsync-checksum files. I believe this method is patented by somebody, which is why it's not in use/supported. Other than that, it's a very nice idea. I believe there may be some semi-implementations around somewhere. The concept is no different from normal rsync. -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Ignorance continues to thrive when intelligent people choose to do nothing. Speaking out against censorship and ignorance is the imperative of all intelligent people.
Re: Debian's problems, Debian's future
On Tue, Apr 09, 2002 at 06:39:19PM +1000, Martijn van Oosterhout wrote: On Tue, Apr 09, 2002 at 09:09:39AM +0200, Michael Bramer wrote: This isn't server friendly. No, sorry. I must say this: we can use rsync on the client side: - get a rsync-checksum file (using a fixed block size) - do the check on the client side - download the needed parts of the file per ftp/http - build the new file from the old and downloaded parts With this the server only needs the extra rsync-checksum files. I believe this method is patented by somebody, which is why it's not in use/supported. Other than that, it's a very nice idea. I believe there may be some semi-implementations around somewhere. The concept is no different from normal rsync. Does anyone have a pointer? This is rsync, only the server is the client and the client works as the server... Gruss Grisu -- Michael Bramer - a Debian Linux Developer http://www.debsupport.de PGP: finger [EMAIL PROTECTED] -- Linux Sysadmin -- Use Debian Linux There we have it: emacs is a religion, not an editor. I am not willing to sacrifice my memory to an idol. -- Werner Olschewski
Re: Debian's problems, Debian's future
On Tue, Apr 09, 2002 at 10:58:24AM +0200, Michael Bramer wrote: On Tue, Apr 09, 2002 at 06:39:19PM +1000, Martijn van Oosterhout wrote: I believe this method is patented by somebody, which is why it's not in use/supported. Other than that, it's a very nice idea. I believe there may be some semi-implementations around somewhere. The concept is no different from normal rsync. Does anyone have a pointer? This is rsync, only the server is the client and the client works as the server... Unfortunately no. I just remember it as a passing comment while talking with Andrew Tridgell (creator of rsync). A google search turns up oblique references at: http://rproxy.samba.org/doc/notes/server-generated-signatures.txt http://www.sharemation.com/~milele/public/rsync-specification.htm (near bottom) http://pserver.samba.org/cgi-bin/cvsweb/rproxy/doc/calu_paper/calu_paper.tex?annotate=1.1 http://olstrans.sourceforge.net/release/OLS2000-rsync/OLS2000-rsync.html Someone on debianplanet suggests it may be a rumour. I don't know; I can't find any precise patent numbers. HTH, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Ignorance continues to thrive when intelligent people choose to do nothing. Speaking out against censorship and ignorance is the imperative of all intelligent people.
Re: Debian's problems, Debian's future
On Tue, Apr 09, 2002 at 09:53:44AM +0200, Michael Bramer wrote: On Sat, Mar 30, 2002 at 02:11:00AM +0100, Wichert Akkerman wrote: - I would like to have templates with substitution fields. Already exists. Any references? How about the debconf manual? But sorry, we have some outdated translations in debconf templates files. No translator knows if someone changes the english template. Please, can we use gettext or something else that avoids outdated translations? Joey? If you are concerned that translators receive automatic notification when a source debconf template has changed, that's an infrastructure problem. Neither debconf nor gettext has automatic translator notifications built-in, and debconf's templates are not an inferior solution for not providing this. Debconf, if used correctly, does correctly handle merging of outdated translations. See debconf-mergetemplate(1). Steve Langasek postmodern programmer
Re: Debian's problems, Debian's future
On Tue, 9 Apr 2002, Martijn van Oosterhout wrote: On Tue, Apr 09, 2002 at 10:58:24AM +0200, Michael Bramer wrote: On Tue, Apr 09, 2002 at 06:39:19PM +1000, Martijn van Oosterhout wrote: I believe this method is patented by somebody, which is why it's not in use/supported. Possibly it was only patented in the non-free united companies of america. So it might well go into non-free (the inversion of the meaning comes straight out of 1984). *t Tomas Pospisek SourcePole - Linux Open Source Solutions http://sourcepole.ch Elestastrasse 18, 7310 Bad Ragaz, Switzerland Tel: +41 (81) 330 77 11
Re: Debian's problems, Debian's future
On Tue, Apr 09, 2002 at 03:24:42PM +0200, Tomas Pospisek's Mailing Lists wrote: On Tue, 9 Apr 2002, Martijn van Oosterhout wrote: On Tue, Apr 09, 2002 at 10:58:24AM +0200, Michael Bramer wrote: On Tue, Apr 09, 2002 at 06:39:19PM +1000, Martijn van Oosterhout wrote: I believe this method is patented by somebody, which is why it's not in use/supported. Possibly it was only patented in the non-free united companies of america. So it might well go into non-free (the inversion of the meaning comes straight out of 1984). Well, a lot of patents are recognised across borders. And someone could write it in a country that doesn't recognise software patents, but the DeCSS stuff showed that that's not safe either. Software patents are just plain irritating. -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Ignorance continues to thrive when intelligent people choose to do nothing. Speaking out against censorship and ignorance is the imperative of all intelligent people.
Re: Debian's problems, Debian's future
On Tue, Apr 09, 2002 at 09:09:39AM +0200, Michael Bramer wrote: Hello, we should stop this and start again after woody... On Thu, Mar 28, 2002 at 08:17:46PM +0100, Jeroen Dekkers wrote: On Thu, Mar 28, 2002 at 04:55:17PM +0100, Otto Wyss wrote: Sorry, diffs are simply silly! Use rsync with the uncompressed Packages file and diffs aren't necessary. Or use a packer which doesn't hinder rsync from saving (gzip --rsyncable). This isn't server friendly. No, sorry. I must say this: we can use rsync on the client side: - get a rsync-checksum file (using a fixed block size) - do the check on the client side - download the needed parts of the file per ftp/http - build the new file from the old and downloaded parts With this the server only needs the extra rsync-checksum files. IMHO it's better to just make diffs instead of extra rsync-checksum files and then having to download all the parts of those files. Jeroen Dekkers -- Jabber supporter - http://www.jabber.org Jabber ID: [EMAIL PROTECTED] Debian GNU supporter - http://www.debian.org http://www.gnu.org IRC: [EMAIL PROTECTED]
Re: Debian's problems, Debian's future
On Tue, Apr 09, 2002 at 08:02:14AM -0500, Steve Langasek wrote: On Tue, Apr 09, 2002 at 09:53:44AM +0200, Michael Bramer wrote: On Sat, Mar 30, 2002 at 02:11:00AM +0100, Wichert Akkerman wrote: - I would like to have templates with substitution fields. Already exists. Any references? How about the debconf manual? But sorry, we have some outdated translations in debconf templates files. No translator knows if someone changes the english template. Please, can we use gettext or something else that avoids outdated translations? Joey? If you are concerned that translators receive automatic notification when a source debconf template has changed, that's an infrastructure problem. Neither debconf nor gettext has automatic translator notifications built-in, and debconf's templates are not an inferior solution for not providing this. I know this. And as infrastructure we can use the ddtp. I have already worked on this, but in the last weeks I haven't had much time, so I paused this sub-project. Debconf, if used correctly, does correctly handle merging of outdated translations. See debconf-mergetemplate(1). OK, thanks. I didn't know this. Maybe I must RTFM... Gruss Grisu -- Michael Bramer - a Debian Linux Developer http://www.debsupport.de PGP: finger [EMAIL PROTECTED] -- Linux Sysadmin -- Use Debian Linux Yes, but the boot process is so pretty, with the clouds and all. In my opinion that doesn't bother at all. (Martin Heinz on rebooting MS Windows)
Re: Debian's problems, Debian's future
On Tue, Apr 09, 2002 at 04:34:43PM +0200, Jeroen Dekkers wrote: On Tue, Apr 09, 2002 at 09:09:39AM +0200, Michael Bramer wrote: No, sorry. I must say this: we can use rsync on the client side: - get a rsync-checksum file (using a fixed block size) - do the check on the client side - download the needed parts of the file per ftp/http - build the new file from the old and downloaded parts With this the server only needs the extra rsync-checksum files. IMHO it's better to just make diffs instead of extra rsync-checksum files and then having to download all the parts of those files. You propose to add 'some' diff files for every file on ftp-master.d.o? With rsync we need only one rsync-checksum file per normal file, and all apts need only download the needed parts. You get the point? Gruss Grisu -- Michael Bramer - a Debian Linux Developer http://www.debsupport.de PGP: finger [EMAIL PROTECTED] -- Linux Sysadmin -- Use Debian Linux Like sex in high school, everyone's talking about Linux, but is anyone doing it? -- Computer Currents
Re: Debian's problems, Debian's future
On Tue, Apr 09, 2002 at 10:25:04PM +1000, Martijn van Oosterhout wrote: On Tue, Apr 09, 2002 at 10:58:24AM +0200, Michael Bramer wrote: On Tue, Apr 09, 2002 at 06:39:19PM +1000, Martijn van Oosterhout wrote: I believe this method is patented by somebody, which is why it's not in use/supported. Other than that, it's a very nice idea. I believe there may be some semi-implementations around somewhere. The concept is no different from normal rsync. Does anyone have a pointer? This is rsync, only the server is the client and the client works as the server... Unfortunately no. I just remember it as a passing comment while talking with Andrew Tridgell (creator of rsync). A google search turns up oblique references at: http://rproxy.samba.org/doc/notes/server-generated-signatures.txt 'The current RProxy specifications at sourceforge.net do not have the client calculating the signature. Instead, the client gets the signature from the server when it first downloads the file, and saves this signature (just like an ETag) for use when re-loading the file. This mechanism was chosen only because of possible patent problems with client calculation of signature. These patent problems may need to be investigated.' Read the mails. The checksum file is _downloaded_ from the server and _not_ calculated by the client! http://www.sharemation.com/~milele/public/rsync-specification.htm (near bottom) the same... http://pserver.samba.org/cgi-bin/cvsweb/rproxy/doc/calu_paper/calu_paper.tex?annotate=1.1 the same... http://olstrans.sourceforge.net/release/OLS2000-rsync/OLS2000-rsync.html the opposite point. Maybe I don't understand some points... Three times server-generated checksums are said to be forbidden by patent, and once client-generated checksums are forbidden... Gruss Grisu -- Michael Bramer - a Debian Linux Developer http://www.debsupport.de PGP: finger [EMAIL PROTECTED] -- Linux Sysadmin -- Use Debian Linux GNU does not eliminate all the world's problems, only some of them.
- Richard Stallman - The GNU Manifesto, 1985
Re: Debian's problems, Debian's future
On Tue, 9 Apr 2002, Michael Bramer wrote: - do the check on the client side - download the file partly per ftp/http - build the new file from the old and downloaded parts With this the server only needs the extra rsync-checksum files. Rumor around rsync circles is that this is patented. Jason
Re: Debian's problems, Debian's future
On Tue, 09 Apr 2002 at 20:13, Jason Gunthorpe wrote: - do the check on the client side - download the file partly per ftp/http - build the new file from the old and downloaded parts With this the server only needs the extra rsync-checksum files. Rumor around rsync circles is that this is patented. Then it is still possible to implement this on the mirrors outside the US. That would already save a lot of bandwidth... -- .''`. Josselin Mouette /\./\ : :' : [EMAIL PROTECTED] `. `' `- Debian GNU/Linux -- The power of freedom
Re: Debian's problems, Debian's future
http://lists.debian.org/debian-devel/2001/debian-devel-200111/msg00757.html Thanks for this pointer. My debiansynch script never runs into problem 1 (rsync -r), since it always does single-file transfers. And for problem 2 (rsync of near-identical files), it's not astonishing to use a high CPU load for a short period; an ftp transfer simply distributes its CPU load over a longer period. O. Wyss -- Author of a Debian partial mirror synch script (http://dpartialmirror.sourceforge.net/)
Re: Debian's problems, Debian's future
On Tue, Apr 09, 2002 at 05:02:34PM +0200, Michael Bramer wrote: You propose to add 'some' diff files for every file on ftp-master.d.o? With rsync we need only one rsync-checksum file per normal file, and all apts need only download the needed parts. You get the point? With the standard rsync algorithm, the rsync checksum files would actually be 8 times larger than the original file (you need to store the checksum for each possible block offset in the file). What you are suggesting is that the server store checksums for precalculated blocks. This would be about 4 bytes per 1k of the original file. The transaction proceeds as follows: 1. Client asks for the checksum list from the server. 2. Client calculates checksums for its local file. 3. Client compares the server's list with its own. 4. Client downloads the changed regions. Note, this is not the rsync algorithm, but the one that is possibly patented. -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Ignorance continues to thrive when intelligent people choose to do nothing. Speaking out against censorship and ignorance is the imperative of all intelligent people.
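Steps 1-3 of that transaction can be sketched in Python as follows, under the assumptions stated in the mail: a fixed block size and per-block checksums precalculated by the server (MD5 stands in here for whatever short checksum the server would actually publish; a truncated digest would shrink the companion file further).

```python
import hashlib

BLOCK = 1024  # fixed block size, as the proposal requires

def checksums(data, size=BLOCK):
    """Per-block digests; the server would publish this list as a small
    companion file next to Packages (16 bytes of MD5 per 1K block here)."""
    return [hashlib.md5(data[i:i + size]).digest()
            for i in range(0, len(data), size)]

def changed_blocks(local_sums, server_sums):
    """Step 3: indices of blocks the client must re-download.

    Because the block size is fixed, block i of the old file is compared
    with block i of the new file; an insertion early in the file shifts
    everything after it, which is why this scheme transfers more than
    true rsync for insert-heavy changes."""
    return [i for i, s in enumerate(server_sums)
            if i >= len(local_sums) or local_sums[i] != s]
```

Step 4 is then an ordinary HTTP/FTP range fetch of bytes [i*BLOCK, (i+1)*BLOCK) for each changed index, so the mirror only ever serves static files.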
Re: Debian's problems, Debian's future
* Jeroen Dekkers | It does also other things, like making distribution creation more | flexible. I'm thinking of having some kind of package file for every | source package. That would include the current information and maybe a | lot more things like URL of upstream, license, etc. This file would be | stored in every package pool directory | (i.e. pool/main/f/foobar/Packages). | | Then we create a lot of bigger Packages files, only including the | package name, version number and some other things which might be | useful (but not too much). Those bigger Packages files can be a lot | more flexible; for example we could have a different Packages file for | different licenses, different upstream projects (gnome, kde, gnu, X, | etc), different use of machines (server, desktop), etc. (I know, old mail, but I am catching up) It seems like you want to put the control file outside the deb package and add more information to it. (And have apt-ftparchive not include all the information from the control file in the Packages file.) Is this about correct? -- Tollef Fog Heen Unix _IS_ user friendly... It's just selective about who its friends are.
Re: Debian's problems, Debian's future
On Sun, Apr 07, 2002 at 10:28:12PM +0200, Tollef Fog Heen wrote: * Jeroen Dekkers | It does also other things, like making distribution creation more | flexible. I'm thinking of having some kind of package file for every | source package. That would include the current information and maybe a | lot more things like URL of upstream, license, etc. This file would be | stored in every package pool directory | (i.e. pool/main/f/foobar/Packages). | | Then we create a lot of bigger Packages files, only including the | package name, version number and some other things which might be | useful (but not too much). Those bigger Packages files can be a lot | more flexible; for example we could have a different Packages file for | different licenses, different upstream projects (gnome, kde, gnu, X, | etc), different use of machines (server, desktop), etc. (I know, old mail, but I am catching up) It seems like you want to put the control file outside the deb package and add more information to it. (And have apt-ftparchive not include all the information from the control file in the Packages file.) Is this about correct? Yes, at least adding everything which is now in the normal Packages file. The normal Packages file would then just be an index. I think this is the best way to do the things I want. Jeroen Dekkers -- Jabber supporter - http://www.jabber.org Jabber ID: [EMAIL PROTECTED] Debian GNU supporter - http://www.debian.org http://www.gnu.org IRC: [EMAIL PROTECTED]
Re: Debian's problems, Debian's future
Adam Majer wrote: On Wed, Mar 27, 2002 at 01:53:00PM +0100, Eduard Bloch wrote: 1) Large packages files [... 3 level idea ...] I would suggest a solution that is much easier to manage. That is, packages should be sorted according to the date the package was last modified. This could be accomplished by adding a Last-Update field to Packages that indicates when the package was last updated. This way, we could implement a partial update for Packages, with the server simply skimming the cream from the top of the milk :) This would make fetching Packages a lot faster. It would require a small CGI on the server to support this type of fetch, but it could save a lot of bandwidth for the server and for the user. Here's how to make it possible without a CGI script, using just support for fetching the last part of a file. If you don't store the last-update times in the Packages file itself, you can download just the last-update info, which should be a lot smaller than the Packages file. Once you have this info, you know which part of the Packages file is the same as the one on the server. Then you fetch from that point to the end of the server's file. You can make the dates file small by storing the dates only to day accuracy, maybe as 32-bit ints instead of text, or something. It should be pretty small after gzipping. (High-accuracy dates aren't needed, because not many packages are updated in a day, and downloading a few extra package descriptions is no problem.) I think this all works :) The only hard part is finding the right offset in a gzipped file, given that you know how much of the beginning of the two uncompressed files match. --- #define X(x,y) x##y Peter Cordes ; e-mail: X([EMAIL PROTECTED] , ns.ca) The gods confound the man who first found out how to distinguish the hours! Confound him, too, who in this place set up a sundial, to cut and hack my day so wretchedly into small pieces!
-- Plautus, 200 BCE
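The scheme Peter describes can be sketched as follows, assuming a hypothetical companion file of (last-modified-time, byte-offset) pairs for a Packages file sorted so the least recently changed packages come first; the gzip-offset complication he mentions at the end is ignored here.

```python
import bisect

def resume_offset(entries, last_sync):
    """entries: (mtime, byte_offset) pairs, one per package, sorted by mtime
    ascending (hypothetical layout from the mail). Returns the offset from
    which the client must re-download: everything before it was last
    modified at or before the client's previous sync, so it is unchanged."""
    mtimes = [m for m, _ in entries]
    i = bisect.bisect_right(mtimes, last_sync)
    if i == len(entries):
        return None            # local copy is already up to date
    return entries[i][1]       # fetch bytes [offset, EOF] via HTTP Range
```

The client then issues a single range request from the returned offset to the end of the server's Packages file and splices the result onto its unchanged prefix.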