Re: A success story with apt and rsync
> From time to time the question arises on different forums whether it is possible to efficiently use rsync with apt-get. Recently there has been a thread here on debian-devel and it was also mentioned in Debian Weekly News, June 24th, 2003. However, I only saw different small parts of a huge and complex problem set discussed at different places; I haven't found an overview of the whole situation anywhere.

Sorry that I write so late, but I don't read debian-devel regularly. I started a solution for distributing Debian mirrors via rsync about 2 years ago. The only "impact" (if impact is the right word) of my solution on Debian is the use of the rsync patch for gzip. Everything else is solved by my Perl script, so you might find ideas for your apt solution there. See http://dpartialmirror.sourceforge.net/.

O. Wyss

-- 
See http://wxguide.sourceforge.net/ for ideas how to design your app.
Re: A success story with apt and rsync
Michael Karcher <[EMAIL PROTECTED]> writes:
> On Sun, Jul 06, 2003 at 01:29:06AM +0200, Andrew Suffield wrote:
> > It should put them in the package in the order they came from readdir(), which will depend on the filesystem. This is normally the order in which they were created,
> As long as the file system uses an inefficient approach for directories, like the ext2/ext3 linked lists. If directories are hash tables (as on reiserfs), even creating another file in the same directory may totally mess up the order.
> 
> Michael Karcher

ext2/ext3 has hashed directories too, if you configure it.

MfG Goswin
Re: A success story with apt and rsync
On Sun, Jul 06, 2003 at 01:29:06AM +0200, Andrew Suffield wrote:
> It should put them in the package in the order they came from readdir(), which will depend on the filesystem. This is normally the order in which they were created,

As long as the file system uses an inefficient approach for directories, like the ext2/ext3 linked lists. If directories are hash tables (as on reiserfs), even creating another file in the same directory may totally mess up the order.

Michael Karcher
Re: A success story with apt and rsync
On Mon, Jul 07, 2003 at 01:01:34AM +0100, Andrew Suffield wrote:
> > > I believe htree == dir_index, so tune2fs(8) and mke2fs(8) have the answer.
> My /home has that enabled and readdir() returns files in creation order.

Then you don't have an htree-capable kernel, or the directory isn't indexed. Directories that fit in a single block are not indexed; neither are directories larger than a block that were created before directory indexing was enabled, or that have since been modified by a non-htree-capable kernel. You can use the lsattr command to see whether the indexed (I) flag is set on a particular directory:

    % lsattr -d /home/tytso
    --I-- /home/tytso

- Ted
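(lsattr reads the per-inode attribute flags, so the same test can be done from inside a program -- which is essentially the runtime check Andrew asks about later in this thread. The following is only a sketch, not code from this thread: it assumes Linux's FS_IOC_GETFLAGS ioctl, known as EXT2_IOC_GETFLAGS in 2003-era headers, and the ext2 flag bit EXT2_INDEX_FL = 0x00001000.)

    /* htree-check.c -- report whether a directory carries the htree/dir_index
     * ("I") flag.  Illustrative sketch only, not from this thread. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>             /* FS_IOC_GETFLAGS */

    #define EXT2_INDEX_FL 0x00001000  /* shown as 'I' by lsattr */

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s directory\n", argv[0]);
            return 2;
        }
        int fd = open(argv[1], O_RDONLY | O_DIRECTORY);
        if (fd < 0) { perror(argv[1]); return 2; }

        int flags = 0;                /* the kernel copies out an int here */
        if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0) {
            perror("ioctl");          /* e.g. not an ext2/ext3 filesystem */
            close(fd);
            return 2;
        }
        close(fd);
        printf("%s is %sindexed\n", argv[1],
               (flags & EXT2_INDEX_FL) ? "" : "not ");
        return (flags & EXT2_INDEX_FL) ? 0 : 1;
    }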
Re: A success story with apt and rsync
On Sun, Jul 06, 2003 at 11:36:34PM +0100, Andrew Suffield wrote:
> I can only presume this is new or obscure, since everything I tried had the traditional behaviour. Can't see how to turn it on, either.

It's new for 2.5. Backports to 2.4 are available here: http://thunk.org/tytso/linux/extfs-2.4-update/extfs-update-2.4.21

For those who are interested, the broken-out patches can be found here: http://thunk.org/tytso/linux/extfs-2.4-update/broken-out-2.4.21/to-apply

Once you have an htree-enabled kernel, you enable a filesystem to use the feature with the following command:

    tune2fs -O dir_index /dev/hdXX

Optionally, you can reorganize all of the directories to use btrees by using the command "e2fsck -fD /dev/hdXX". Otherwise, only directories that are expanded beyond a single block after you set the dir_index flag will use htrees.

dir_index is a fully compatible extension, so it's perfectly safe to mount a filesystem with htrees on a non-htree kernel. A non-htree kernel will just ignore the b-tree information, and if it attempts to modify a hash-tree directory, it will just invalidate the htree interior node information, so that the directory becomes unindexed until "e2fsck -fD" is run over the filesystem, which optimizes all of the directories by reindexing them.

Why would you want to use htrees? Because they speed up large directories. A lot. Try creating 400,000 zero-length files in a single directory. It will take under 30 seconds with htree enabled, and well over an hour without.

> > The good news is that this particular optimization of sorting by inode number should work for all filesystems, and should speed up xfs as well as ext2/3 with HTREE.
> What about ext[23] without htree? Mucking with the order returned by readdir() has historically caused problems there...

It'll be fine; in fact, in some cases you'll see a slight speed-up. The key is that you'll get the best performance by reading/modifying the inode data structures in sorted order by inode number. This way, you make a single sweep through the inode table, without needing any extraneous seeks. Using the natural sort order of readdir() on non-htree ext2/3 systems mostly approximated this --- although if files are deleted from and added to the directory, this is not guaranteed. So sorting by inode number will never hurt, and may help.

- Ted
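(Ted's 400,000-file timing claim is easy to try yourself. Here is a throwaway sketch of the sort of program one might use -- not from this thread, and the file-name pattern is arbitrary. Run it under time(1) in an empty scratch directory, once on a filesystem with dir_index and once without.)

    /* mkmany.c -- create 400,000 zero-length files in the current
     * directory, to compare htree vs. non-htree timings.  Sketch only. */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        char name[32];
        for (int i = 0; i < 400000; i++) {
            snprintf(name, sizeof name, "f%06d", i);
            int fd = open(name, O_CREAT | O_EXCL | O_WRONLY, 0644);
            if (fd < 0) { perror(name); return 1; }
            close(fd);
        }
        return 0;
    }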
Re: A success story with apt and rsync
On Sun, Jul 06, 2003 at 07:28:09PM -0400, Matt Zimmerman wrote:
> On Sun, Jul 06, 2003 at 11:36:34PM +0100, Andrew Suffield wrote:
> > On Sun, Jul 06, 2003 at 05:48:24PM -0400, Theodore Ts'o wrote:
> > > Err, no. If the htree (hash tree) indexing feature is turned on for ext2 or ext3 filesystems, they will be returned sorted by the hash of the filename --- effectively a random order. (The hash also includes a random per-filesystem secret in order to avoid denial of service attacks by malicious users who might otherwise try to create huge numbers of files containing hash collisions.)
> > I can only presume this is new or obscure, since everything I tried had the traditional behaviour. Can't see how to turn it on, either.
> I believe htree == dir_index, so tune2fs(8) and mke2fs(8) have the answer.

My /home has that enabled and readdir() returns files in creation order.

-- 
  .''`.  ** Debian GNU/Linux ** | Andrew Suffield
 : :' :  http://www.debian.org/ | Dept. of Computing,
 `. `'                          | Imperial College,
   `-     -><-                  | London, UK
Re: A success story with apt and rsync
On Sun, Jul 06, 2003 at 11:36:34PM +0100, Andrew Suffield wrote:
> On Sun, Jul 06, 2003 at 05:48:24PM -0400, Theodore Ts'o wrote:
> > Err, no. If the htree (hash tree) indexing feature is turned on for ext2 or ext3 filesystems, they will be returned sorted by the hash of the filename --- effectively a random order. (The hash also includes a random per-filesystem secret in order to avoid denial of service attacks by malicious users who might otherwise try to create huge numbers of files containing hash collisions.)
> I can only presume this is new or obscure, since everything I tried had the traditional behaviour. Can't see how to turn it on, either.

I believe htree == dir_index, so tune2fs(8) and mke2fs(8) have the answer.

-- 
 - mdz
Re: A success story with apt and rsync
On Sun, Jul 06, 2003 at 05:48:24PM -0400, Theodore Ts'o wrote:
> On Sun, Jul 06, 2003 at 10:12:03PM +0100, Andrew Suffield wrote:
> > On Sun, Jul 06, 2003 at 10:28:07PM +0200, Koblinger Egmont wrote:
> > > Yes, when saying "random order" I obviously meant "in the order readdir() returns them". It's random for me. :-)))
> > > It can easily be different on different filesystems, or even on the same type of filesystem with different parameters (e.g. blocksize).
> > I can't think of any reason why changing the blocksize would affect this. Most filesystems return files in the sequence in which they were added to the directory. ext2, ext3, and reiser all do this; xfs is the only one likely to be used on a Debian system which doesn't.
> Err, no. If the htree (hash tree) indexing feature is turned on for ext2 or ext3 filesystems, they will be returned sorted by the hash of the filename --- effectively a random order. (The hash also includes a random per-filesystem secret in order to avoid denial of service attacks by malicious users who might otherwise try to create huge numbers of files containing hash collisions.)

I can only presume this is new or obscure, since everything I tried had the traditional behaviour. Can't see how to turn it on, either.

> I would be very, very surprised if reiserfs returned files in creation order.

Some trivial testing indicates that it does. Heck if I know how or why.

> It is a really, really bad assumption to assume that files will be returned in the same order as they were created.

However, there's no real need to - that was just an example. As long as the sequence is more or less stable (which it should be, for btrees; don't know about htree) then rsync won't be perturbed.

> > On ext2, as an example, stat()ting or open()ing a large directory of files in the order returned by readdir() will be vastly quicker than in some other sequence (like, say, bytewise lexicographic) due to the way in which the filesystem looks up inodes. This has caused significant performance issues for bugs.debian.org in the past.
> If you are using HTREE, and want to do a readdir() scan followed by something which opens or stat's all of the files, you will very badly want to sort the returned directory entries by inode number (de->d_inode). Otherwise, the order returned by readdir() will be effectively random, with the resulting loss of performance which you alluded to, because the filesystem needs to randomly seek and read all around the inode table.

Hmm, that's going to cause some trouble if htree becomes common. Is there any way to test for this at runtime?

> The good news is that this particular optimization of sorting by inode number should work for all filesystems, and should speed up xfs as well as ext2/3 with HTREE.

What about ext[23] without htree? Mucking with the order returned by readdir() has historically caused problems there...

-- 
  .''`.  ** Debian GNU/Linux ** | Andrew Suffield
 : :' :  http://www.debian.org/ | Dept. of Computing,
 `. `'                          | Imperial College,
   `-     -><-                  | London, UK
Re: A success story with apt and rsync
On Sun, 6 Jul 2003, Andrew Suffield wrote:
> On ext2, as an example, stat()ting or open()ing a large directory of files in the order returned by readdir() will be vastly quicker than in some other sequence (like, say, bytewise lexicographic) due to the way in which the filesystem looks up inodes. This has caused significant performance issues for bugs.debian.org in the past.

You're right, I didn't get this point in the story when I simply ran find using the sortdir wrapper, but now I understand the problem. However, I'm still unsure whether it is good to keep files unsorted, especially if we consider effective syncing of packages.

On my home computer I've never heard the sound of my disk during the package-creating phase (even though we've been using sortdir for more than half a year, and I've compiled hundreds of packages), but I do hear it when e.g. the source is decompressed. At the 'dpkg-deb --build' phase only the processor is the bottleneck. This might vary under different circumstances. I'm unaware of them in the case of Debian, e.g. I have no information about what hardware your packages are created on, whether there are any other cpu-intensive or disk-intensive applications running on these machines, etc. I can easily imagine that using sortdir can drastically decrease performance if another disk-intensive process is running. However, my experience didn't show a noticeable performance decrease when this was the only process accessing the disk...

But hey, let's stop for a minute :-) Building the package only uses the memory cache for most of the packages, doesn't it? The files it packs together have just recently been created, and there are not so many packages whose uncompressed size is close to or bigger than the amount of RAM in today's machines... And for the large packages the build itself might take thousands of times as much time as reading the files in sorted order.

Does anyone know what RPM does? I know that listing the contents of a package always produces alphabetical order, but I don't know whether the file list is sorted on the fly or the files really appear alphabetically in the cpio archive.

So I guess we've already seen the pros and cons of sorting the files. (One thing is missing: we still don't know how efficient rsync is if two rsyncable tar.gz files contain the same files but in a different order.) The decision is clearly not mine but the Debian developers'. However, if you ask me, I still vote for sorting the files :-))

bye, Egmont
Re: A success story with apt and rsync
On Sun, Jul 06, 2003 at 10:12:03PM +0100, Andrew Suffield wrote:
> On Sun, Jul 06, 2003 at 10:28:07PM +0200, Koblinger Egmont wrote:
> > Yes, when saying "random order" I obviously meant "in the order readdir() returns them". It's random for me. :-)))
> > It can easily be different on different filesystems, or even on the same type of filesystem with different parameters (e.g. blocksize).
> I can't think of any reason why changing the blocksize would affect this. Most filesystems return files in the sequence in which they were added to the directory. ext2, ext3, and reiser all do this; xfs is the only one likely to be used on a Debian system which doesn't.

Err, no. If the htree (hash tree) indexing feature is turned on for ext2 or ext3 filesystems, they will be returned sorted by the hash of the filename --- effectively a random order. (The hash also includes a random per-filesystem secret in order to avoid denial of service attacks by malicious users who might otherwise try to create huge numbers of files containing hash collisions.) I would be very, very surprised if reiserfs returned files in creation order.

The fundamental problem is that the readdir()/telldir()/seekdir() API is fundamentally busted. Yes, Dennis Ritchie and Ken Thompson do make mistakes, and have made many; in this particular case, they made a whopper. Seekdir()/telldir() assumes a linear directory structure which you can seek into, such that the results of readdir() are repeatable. Posix only allows files which are created or deleted in the interval to be undefined; all other files must be returned in the same order as the original readdir() stream, even if days or weeks elapse between the readdir(), telldir(), and seekdir() calls. Any filesystem which tries to use a B-tree-like scheme, where leaf nodes can be split, is going to have extreme problems trying to keep these guarantees. For this reason, most filesystem designers choose to return files in b-tree order, and *not* in the order in which files were added to the directory. It is a really, really bad assumption to assume that files will be returned in the same order as they were created.

> On ext2, as an example, stat()ting or open()ing a large directory of files in the order returned by readdir() will be vastly quicker than in some other sequence (like, say, bytewise lexicographic) due to the way in which the filesystem looks up inodes. This has caused significant performance issues for bugs.debian.org in the past.

If you are using HTREE, and want to do a readdir() scan followed by something which opens or stat's all of the files, you will very badly want to sort the returned directory entries by inode number (de->d_inode). Otherwise, the order returned by readdir() will be effectively random, with the resulting loss of performance which you alluded to, because the filesystem needs to randomly seek and read all around the inode table.

Why can't this be done in the kernel? Because if the directory is 200 megabytes, the kernel would need to allocate and hold on to 200 megabytes until userspace called closedir(). There is simply no lightweight way to work around the problems caused by the broken API which Ken Thompson and Dennis Ritchie designed.

The good news is that this particular optimization of sorting by inode number should work for all filesystems, and should speed up xfs as well as ext2/3 with HTREE.

- Ted
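(To make the suggested optimization concrete, here is a minimal sketch of the scan-then-sort-then-stat pattern Ted describes: slurp the directory with readdir(), sort the entries by d_ino, then stat them in that order so the walk through the inode table is mostly sequential. This is an illustration only, not code from this thread.)

    /* scan-sorted.c -- stat every entry of a directory in ascending
     * inode-number order.  Illustrative sketch. */
    #include <dirent.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/stat.h>

    struct ent { ino_t ino; char name[256]; };

    static int by_ino(const void *a, const void *b)
    {
        ino_t x = ((const struct ent *)a)->ino;
        ino_t y = ((const struct ent *)b)->ino;
        return (x > y) - (x < y);
    }

    int main(int argc, char **argv)
    {
        const char *path = argc > 1 ? argv[1] : ".";
        DIR *d = opendir(path);
        if (!d) { perror(path); return 1; }

        /* pass 1: collect names and inode numbers from readdir() */
        struct ent *v = NULL;
        size_t n = 0, cap = 0;
        struct dirent *de;
        while ((de = readdir(d)) != NULL) {
            if (n == cap) {
                cap = cap ? cap * 2 : 1024;
                v = realloc(v, cap * sizeof *v);
                if (!v) { perror("realloc"); return 1; }
            }
            v[n].ino = de->d_ino;
            snprintf(v[n].name, sizeof v[n].name, "%s", de->d_name);
            n++;
        }
        closedir(d);

        /* pass 2: sort by inode number, then stat in that order,
         * giving one mostly-sequential sweep over the inode table */
        qsort(v, n, sizeof *v, by_ino);
        if (chdir(path) < 0) { perror("chdir"); return 1; }
        struct stat st;
        for (size_t i = 0; i < n; i++)
            if (stat(v[i].name, &st) < 0)
                perror(v[i].name);
        free(v);
        return 0;
    }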
Re: A success story with apt and rsync
Hi,

On 6 Jul 2003, Goswin Brederlow wrote:
> 2. most of the time you have no old file to rsync against. Only mirrors will have an old file and they already use rsync.

This is definitely true if you install your system from CDs and then upgrade it. However, if you keep on upgrading from testing/unstable then you'll have more and more packages under /var/cache/apt/archives, so there will be a better and better chance that an older version is found there. Or, alternatively, if you are sitting behind a slow modem and "apt-get upgrade" says it will upgrade "extremely-huge-package", then you can still easily insert your CD, copy the old version of "extremely-huge-package" to /var/cache/apt/archives and hit ENTER to apt-get afterwards.

> 3. rsyncing against the previous version is only possible via some dirty hack as an apt module. apt would have to be changed to provide modules access to its cache structure or at least pass any previous version as an argument. Some mirror scripts already use older versions as templates for new versions.

Yes, this is what I've hacked together based on other people's great work. It is (as I've said too) a dirty hack. If a more experienced apt coder can replace my hard-coded path with a mechanism that tells this path to the module, then this hack won't even be dirty.

> 4. (and this is the knockout) rsync support for apt-get is NOT WANTED. rsync uses too much resources (cpu and, more relevantly, IO) on the server side and widespread use of rsync for apt-get would choke the rsync mirrors and do more harm than good.

It might not be wanted by administrators; however, I guess it is wanted by many of the users (at least by me :-)). I don't see the huge load on the server (since I'm the only one rsyncing from it), but I see the huge difference in the download time. If my download weren't faster because of an overloaded server, I would switch back to FTP or anything that works better for me as an end user.

I understand that rsync causes a high load on the server when several users are connected, and so it is not suitable as a general replacement for ftp; however, I think it is suitable as an alternative. I also don't expect the Debian team itself to set up a public rsync server for the packages. However, some mirrors might want to set up an rsync server either for the public or, for example, a university for its students. A similar hack could simply be used by people who have an account on a machine with high bandwidth. For example, if I used Debian and Debian had rsyncable packages, but no public rsync server was available, I'd personally mirror Debian to a machine at the university using FTP and would use rsync from that server to my home machine to save traffic where the bandwidth is a bottleneck.

So I don't think it's a bad idea to set up some public rsync servers worldwide. The maximum number of connections can be capped so that cpu usage is limited somehow. It's obvious that if a user often gets his connection refused then he will switch back to ftp or http. Hence I guess that the capacity of the public rsync servers and the number of users using rsync would somehow be automatically balanced; it doesn't have to be coordinated centrally. So IMHO let anybody set up an rsync server if he wants to, and let the users use rsync if they want to (but don't put an rsync:// line in the default sources.list).

> All together I think an extended bittorrent module for apt-get is by far the better solution but it will take some more time and designing before it can be implemented.
It is very promising and I really hope that it will be a good protocol with a good implementation and integration into apt. But until this is realized, we could still have rsync as an alternative, if Debian packages were packed in a slightly different way.

bye, Egmont
Re: A success story with apt and rsync
On Sun, Jul 06, 2003 at 10:28:07PM +0200, Koblinger Egmont wrote:
> On Sun, 6 Jul 2003, Andrew Suffield wrote:
> > It should put them in the package in the order they came from readdir(), which will depend on the filesystem. This is normally the order in which they were created, and should not vary when rebuilding. As such, sorting the list probably doesn't change the network traffic, but will slow dpkg-deb down on packages with large directories in them.
> Yes, when saying "random order" I obviously meant "in the order readdir() returns them". It's random for me. :-)))
> It can easily be different on different filesystems, or even on the same type of filesystem with different parameters (e.g. blocksize).

I can't think of any reason why changing the blocksize would affect this. Most filesystems return files in the sequence in which they were added to the directory. ext2, ext3, and reiser all do this; xfs is the only one likely to be used on a Debian system which doesn't.

> I even think it can be different after a simple rebuild in exactly the same environment. For example, configure and libtool like to create files with the PID in their name, which can take from 3 to 5 digits. If you create file X and then Y, remove X and then create Z, then it is most likely that if Z's name is shorter than or equal to the length of filename X, it will be returned first by readdir(), while if its name is longer, then Y will be returned first and Z afterwards. So I can imagine situations where the order of the files depends on the PIDs of the build processes.

This lengthy bit of handwaving has no connection with reality.

> However, I think sorting the files costs really nothing. My system is not a very new one, 375MHz Celeron, IDE disks, 384MB RAM etc... However:
> 
> /usr/lib$ du -s .
> 1,1G    .
> /usr/lib$ find . -type f | wc -l    # okay, it's now in memory cache
> 18598
> /usr/lib$ time find . >/dev/null 2>&1
> real    0m0.285s
> user    0m0.100s
> sys     0m0.150s
> [EMAIL PROTECTED]:/usr/lib$ time sortdir find . >/dev/null 2>&1
> real    0m1.683s
> user    0m1.390s
> sys     0m0.250s
> 
> IMHO a step which takes one and a half seconds before compressing 18000 files totalling more than 1 gigabyte shouldn't be a problem.

This test only shows that you don't understand what is going on; it has no relation to the problems that can occur. On ext2, as an example, stat()ting or open()ing a large directory of files in the order returned by readdir() will be vastly quicker than in some other sequence (like, say, bytewise lexicographic) due to the way in which the filesystem looks up inodes. This has caused significant performance issues for bugs.debian.org in the past.

-- 
  .''`.  ** Debian GNU/Linux ** | Andrew Suffield
 : :' :  http://www.debian.org/ | Dept. of Computing,
 `. `'                          | Imperial College,
   `-     -><-                  | London, UK
Re: A success story with apt and rsync
On Sun, 6 Jul 2003, Andrew Suffield wrote:
> It should put them in the package in the order they came from readdir(), which will depend on the filesystem. This is normally the order in which they were created, and should not vary when rebuilding. As such, sorting the list probably doesn't change the network traffic, but will slow dpkg-deb down on packages with large directories in them.

Yes, when saying "random order" I obviously meant "in the order readdir() returns them". It's random for me. :-)))

It can easily be different on different filesystems, or even on the same type of filesystem with different parameters (e.g. blocksize). I even think it can be different after a simple rebuild in exactly the same environment. For example, configure and libtool like to create files with the PID in their name, which can take from 3 to 5 digits. If you create file X and then Y, remove X and then create Z, then it is most likely that if Z's name is shorter than or equal to the length of filename X, it will be returned first by readdir(), while if its name is longer, then Y will be returned first and Z afterwards. So I can imagine situations where the order of the files depends on the PIDs of the build processes.

However, I guess our goal is not only to produce similar packages from exactly the same source. It's quite important to produce a similar package even after a version upgrade. For example, you have a foobar-0.9 package and now upgrade to foobar-1.0. The author may have completely rewritten the Makefile, which yields nearly the same executables and the same data files, but a completely different "random" order.

However, I think sorting the files costs really nothing. My system is not a very new one, 375MHz Celeron, IDE disks, 384MB RAM etc... However:

    /usr/lib$ du -s .
    1,1G    .
    /usr/lib$ find . -type f | wc -l    # okay, it's now in memory cache
    18598
    /usr/lib$ time find . >/dev/null 2>&1
    real    0m0.285s
    user    0m0.100s
    sys     0m0.150s
    [EMAIL PROTECTED]:/usr/lib$ time sortdir find . >/dev/null 2>&1
    real    0m1.683s
    user    0m1.390s
    sys     0m0.250s

IMHO a step which takes one and a half seconds before compressing 18000 files totalling more than 1 gigabyte shouldn't be a problem.

cheers, Egmont
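(For readers who haven't met it, the sortdir trick is an LD_PRELOAD shim that interposes readdir() and hands the entries back in alphabetical order, so an unmodified dpkg-deb or find sees a sorted directory. What follows is only a minimal reconstruction of the idea, not the actual sortdir library referenced in this thread; it caches a single DIR* stream at a time, and on an LFS build you would also need to interpose readdir64().)

    /* sortdir-lite.c -- toy LD_PRELOAD shim making readdir() return
     * entries in alphabetical order.  A sketch of the idea only, NOT
     * the real sortdir library.
     * Build: gcc -shared -fPIC -o sortdir-lite.so sortdir-lite.c -ldl
     * Use:   LD_PRELOAD=./sortdir-lite.so find . */
    #define _GNU_SOURCE
    #include <dirent.h>
    #include <dlfcn.h>
    #include <stdlib.h>
    #include <string.h>

    static struct dirent *(*real_readdir)(DIR *);
    static DIR *cached;              /* stream the cache belongs to */
    static struct dirent **ents;     /* sorted copies of its entries */
    static size_t count, next_idx;

    static int by_name(const void *a, const void *b)
    {
        return strcmp((*(struct dirent *const *)a)->d_name,
                      (*(struct dirent *const *)b)->d_name);
    }

    struct dirent *readdir(DIR *dirp)
    {
        if (!real_readdir)
            real_readdir = (struct dirent *(*)(DIR *))
                               dlsym(RTLD_NEXT, "readdir");

        if (dirp != cached) {        /* new stream: slurp, sort, cache */
            for (size_t i = 0; i < count; i++)
                free(ents[i]);
            free(ents);
            ents = NULL;
            count = next_idx = 0;

            struct dirent *e;
            while ((e = real_readdir(dirp)) != NULL) {
                ents = realloc(ents, (count + 1) * sizeof *ents);
                ents[count] = malloc(sizeof *e);
                memcpy(ents[count], e, sizeof *e);
                count++;
            }
            qsort(ents, count, sizeof *ents, by_name);
            cached = dirp;
        }
        return next_idx < count ? ents[next_idx++] : NULL;
    }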
Re: A success story with apt and rsync
On Sun, Jul 06, 2003 at 12:37:00PM +1200, Corrin Lakeland wrote:
> > 4. (and this is the knockout) rsync support for apt-get is NOT WANTED. rsync uses too much resources (cpu and, more relevantly, IO) on the server side and widespread use of rsync for apt-get would choke the rsync mirrors and do more harm than good.
> When I was looking into this I heard about some work on caching the rolling checksums to eliminate server load. I didn't find any code.

That would be because the checksums would take at least 8 times the space of the original files. You need backward-rsync, which was patented, last I heard.

-- 
Martijn van Oosterhout   http://svana.org/kleptog/
> "the West won the world not by the superiority of its ideas or values or religion but rather by its superiority in applying organized violence. Westerners often forget this fact, non-Westerners never do."
> - Samuel P. Huntington
Re: A success story with apt and rsync
On Sun, 2003-07-06 at 09:27, Goswin Brederlow wrote:
> 4. (and this is the knockout) rsync support for apt-get is NOT WANTED. rsync uses too much resources (cpu and, more relevantly, IO) on the server side and widespread use of rsync for apt-get would choke the rsync mirrors and do more harm than good.

One way to alleviate this would be to generate the deltas only once on the server side when first requested, then cache them on disk to be served out like any other static file, so the client can reconstruct the new package using rsync. I've been thinking for a while about trying to build this into Apt-cacher.

Jonathan
Re: A success story with apt and rsync
On Sunday 06 July 2003 11:27, Goswin Brederlow wrote:
> Koblinger Egmont <[EMAIL PROTECTED]> writes:
> > Hi,
> > From time to time the question arises on different forums whether it is possible to efficiently use rsync with apt-get. Recently there has been a thread here on debian-devel and it was also mentioned in Debian Weekly News, June 24th, 2003. However, I only saw different small parts of a huge and complex problem set discussed at different places, I haven't found an overview of the whole situation anywhere.
> ...
> Let's summarize what I still remember:
> 2. most of the time you have no old file to rsync against. Only mirrors will have an old file and they already use rsync.

/var/cache/apt/ ?

> 4. (and this is the knockout) rsync support for apt-get is NOT WANTED. rsync uses too much resources (cpu and, more relevantly, IO) on the server side and widespread use of rsync for apt-get would choke the rsync mirrors and do more harm than good.

When I was looking into this I heard about some work on caching the rolling checksums to eliminate server load. I didn't find any code.

> Doogie is thinking about extending the Bittorrent protocol for use as an apt-get method. I talked with him on irc about some design ideas and so far it looks really good if he can get some mirrors to host it.

Sounds interesting. bittorrent allocates people to peer off in a round-robin fashion, which is really stupid. If two people have similar IPs they should make better peers.

> Via another small extension rolling checksums for each block could be included in the protocol and a client-side rsync can be done. (I heard this variant of rsync would be patented in the US but never saw real proof of it.)

Likewise on both counts.

Corrin
Re: A success story with apt and rsync
On 6 Jul 2003, Goswin Brederlow wrote:
> Doogie is thinking about extending the Bittorrent protocol for use as an apt-get method. I talked with him on irc about some design ideas and so far it looks really good if he can get some mirrors to host it.

My plan is to require no additional software to be installed on any server. This means all files will be pre-generated, and mirrored. This also means that a tracker won't be available on that particular mirror, but the block checksums will still be available.

> The bittorrent protocol organises multiple downloaders so that they also upload to each other and thereby reduces the traffic on the main server. The extension of the protocol should also utilise http/ftp mirrors as sources for the files, thereby spreading the load over multiple servers evenly.

What this means is that clients will be able to fetch blocks from normal http and ftp mirrors. This will be used to start fetching data before connections have been opened with peers.

> Bittorrent calculates a hash for each block of a file, very similar to what rsync needs to work. Via another small extension rolling checksums for each block could be included in the protocol and a client-side rsync can be done. (I heard this variant of rsync would be patented in the US but never saw real proof of it.)
> All together I think an extended bittorrent module for apt-get is by far the better solution but it will take some more time and designing before it can be implemented.

Also, for better sharing, users will have the option of leaving a running server on their machines. Additionally, part of my work will include extensions to the tracker to support tracker peers and tracker clusters.

Another extension concerns which tracker to use. When fetching the .torrent meta-data, my client will attempt to contact a tracker on the server the .torrent resides on. If none is found, it'll fall back to the one encoded in the .torrent. This provides for localization of connections, and better latency.
Re: A success story with apt and rsync
Koblinger Egmont <[EMAIL PROTECTED]> writes:
> Hi,
> From time to time the question arises on different forums whether it is possible to efficiently use rsync with apt-get. Recently there has been a thread here on debian-devel and it was also mentioned in Debian Weekly News, June 24th, 2003. However, I only saw different small parts of a huge and complex problem set discussed at different places, I haven't found an overview of the whole situation anywhere.
> ...

I worked on an rsync patch for apt-get some years ago and raised some design questions, some the same as you did in the deleted parts. Let's summarize what I still remember:

1. debs are gzipped, so any change (even a change in timestamps) results in a different gzip. The rsyncable patch for gzip helps a lot there. So let's consider that fixed.

2. most of the time you have no old file to rsync against. Only mirrors will have an old file and they already use rsync.

3. rsyncing against the previous version is only possible via some dirty hack as an apt module. apt would have to be changed to give modules access to its cache structure or at least pass any previous version as an argument. Some mirror scripts already use older versions as templates for new versions.

4. (and this is the knockout) rsync support for apt-get is NOT WANTED. rsync uses too much resources (cpu and, more relevantly, IO) on the server side, and widespread use of rsync for apt-get would choke the rsync mirrors and do more harm than good.

> conclusion
> --
> The good news is that it is working perfectly.
> The bad news is that you can't hack it on your home computer as long as your distribution doesn't provide rsync-friendly packages. Maybe one could set up a public rsync server with high bandwidth that keeps syncing the official packages and repacks them with rsync-friendly gzip/zlib and sorting the files.

There is a growing lobby to use gzip --rsyncable for debian packages by default. It's coming.

So what can be done? Doogie is thinking about extending the Bittorrent protocol for use as an apt-get method. I talked with him on irc about some design ideas and so far it looks really good if he can get some mirrors to host it.

The bittorrent protocol organises multiple downloaders so that they also upload to each other, thereby reducing the traffic on the main server. The extension of the protocol should also utilise http/ftp mirrors as sources for the files, thereby spreading the load over multiple servers evenly.

Bittorrent calculates a hash for each block of a file, very similar to what rsync needs to work. Via another small extension, rolling checksums for each block could be included in the protocol and a client-side rsync can be done. (I heard this variant of rsync would be patented in the US but never saw real proof of it.)

Altogether I think an extended bittorrent module for apt-get is by far the better solution, but it will take some more time and design before it can be implemented.

MfG Goswin
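(To make "rolling checksums for each block" concrete: rsync's weak checksum is a pair of running sums over a window of bytes, and sliding the window one byte costs O(1), which is what lets a client find matching blocks at any offset in a file. Below is an illustrative sketch of that checksum, following the scheme described in the rsync technical report; it is not code from this thread, and the window size is arbitrary.)

    /* rolling.c -- rsync-style weak rolling checksum, illustrative sketch.
     * a = plain sum of the window's bytes, b = position-weighted sum;
     * the 32-bit digest packs the low 16 bits of each. */
    #include <stdint.h>
    #include <stdio.h>

    struct rollsum { uint32_t a, b; size_t len; };

    /* checksum of buf[0..len-1], computed from scratch */
    static void rs_init(struct rollsum *s, const uint8_t *buf, size_t len)
    {
        s->a = s->b = 0;
        s->len = len;
        for (size_t i = 0; i < len; i++) {
            s->a += buf[i];
            s->b += (uint32_t)(len - i) * buf[i];
        }
    }

    /* slide the window right by one byte: drop `out`, take in `in` */
    static void rs_roll(struct rollsum *s, uint8_t out, uint8_t in)
    {
        s->a += in - out;                      /* new plain sum    */
        s->b += s->a - (uint32_t)s->len * out; /* new weighted sum */
    }

    static uint32_t rs_digest(const struct rollsum *s)
    {
        return (s->a & 0xffff) | (s->b << 16);
    }

    int main(void)
    {
        const uint8_t data[] = "the quick brown fox jumps over the lazy dog";
        const size_t W = 16;                   /* block/window size */
        struct rollsum s, check;

        rs_init(&s, data, W);
        printf("window 0: %08x\n", rs_digest(&s));

        /* rolling must agree with recomputing each window from scratch */
        for (size_t i = 1; i + W <= sizeof data - 1; i++) {
            rs_roll(&s, data[i - 1], data[i + W - 1]);
            rs_init(&check, data + i, W);
            printf("window %zu: %08x %s\n", i, rs_digest(&s),
                   rs_digest(&s) == rs_digest(&check) ? "(ok)" : "(MISMATCH)");
        }
        return 0;
    }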
Re: A success story with apt and rsync
On Sat, Jul 05, 2003 at 11:56:41PM +0200, Koblinger Egmont wrote:
> order of files
> 
> dpkg-deb puts the files in the .deb package in random order. I hate this misfeature since it's hard to eye-grep anything from ``dpkg -L'' or F3 in mc. We run ``dpkg-deb --build'' using the sortdir library ([4a], [4b]), which makes the files appear in the package in alphabetical order. I don't know how efficient rsync is if you split a file into some dozens or even hundreds of parts and shuffle them, and then synchronize this with the original version. Anyway, I'm sure that sorting the files cannot hurt rsync, it can only help. I only guess that it really does help a lot.

It should put them in the package in the order they came from readdir(), which will depend on the filesystem. This is normally the order in which they were created, and should not vary when rebuilding. As such, sorting the list probably doesn't change the network traffic, but will slow dpkg-deb down on packages with large directories in them.

-- 
  .''`.  ** Debian GNU/Linux ** | Andrew Suffield
 : :' :  http://www.debian.org/ | Dept. of Computing,
 `. `'                          | Imperial College,
   `-     -><-                  | London, UK