> | > On Fri, 2003-05-23 at 10:07, Distribution Lists wrote:
> | > > with some help I have CPIO backing up a system to a remote
> | > > tape drive across a 100MB switch. Using the following command
> |
> | You can generally pick up some speed in these circumstances by
> | not competing with yourself on disk access. Split this into
> | two phases.
> |
> | 1. - Generate a list of files to be dumped.
> |
> |    find / -depth ! -fstype proc > /ramdisk1/backup_list
>
> Another thing worth trying, which is rather dependent upon your
> I/O patterns, is to put a buffer between the cpio and the rsh.
>
> Suppose the cpio is writing nice big data chunks to the pipe - thus
> it fills the pipe on every write (a pipe, internally, has a fixed
> size, small, buffer).

I have not seen the Linux pipe implementation source (yet), but based
on System V, pipes are usually implemented using the mbufs in the
kernel - the same buffers used for TCP/IP packet handling - and the
maximum buffering on a pipe is usually greater than 32K in total. The
limit is usually a kernel parameter. The whole process runs at CPU
speed, and the sender only blocks when the pipe gets full. Based on
this premise and past experience, the following comments are made...

> On the premise that data comes off the disc drive faster than it
> goes across the network (generally true), the activity goes:
> cpio writes
> the pipe fills
> cpio blocks
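As an aside, the actual buffering a pipe provides on a given kernel is easy to check empirically. A minimal sketch in Python (Unix-only; `pipe_capacity` is just an illustrative name, not a system call) - it puts the write end in non-blocking mode and writes until the kernel refuses more data:

```python
import fcntl
import os

def pipe_capacity():
    # Measure how many bytes a pipe buffers before a writer would
    # block: switch the write end to non-blocking mode and write
    # until the kernel raises EAGAIN (BlockingIOError in Python).
    r, w = os.pipe()
    flags = fcntl.fcntl(w, fcntl.F_GETFL)
    fcntl.fcntl(w, fcntl.F_SETFL, flags | os.O_NONBLOCK)
    total = 0
    chunk = b"x" * 4096
    try:
        while True:
            # os.write returns the number of bytes actually written,
            # which may be a partial chunk as the pipe nears full.
            total += os.write(w, chunk)
    except BlockingIOError:
        pass  # pipe is full - this is the capacity we wanted
    finally:
        os.close(r)
        os.close(w)
    return total

if __name__ == "__main__":
    print(pipe_capacity())
```

On a modern Linux kernel this typically reports 64K by default, consistent with the ">32K" figure above, though the exact number is kernel-dependent.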
Not exactly - cpio blocks on its next read from the disk drive. rsh
may or may not be run by the scheduler during this period and empty
out all or part of the pipe, but in general there is a great deal of
overlap, and the pipeline normally never fills unless the CPU
resources are being strained. cpio does not stall if the pipe's
maximum size is larger than the read size cpio is using. Remember,
you are not running just these two processes alone; other programs
can kick in at any time for CPU or I/O access.

> rsh reads the pipe, draining it
> cpio unblocks, gathers more data
> cpio writes to the pipe again and blocks on filling it
> rsh writes data to the network
> rsh reads more data
> cpio unblocks
> and so on. This means that cpio stalls a lot of the time.
>
> This:
>
>     cpio .... | cat | rsh ...
>
> puts a little extra buffering in the process, reducing the stalls.
> There's actually a program called "buffer" around to let you do this
> more effectively (and efficiently - it forks and shares the buffer
> across the two instances), used thus:

This just adds an unnecessary middleman that consumes more pipe and
CPU resources and does not buy you much in this case, especially
since the real bottleneck here is the network and its 1500-byte
packets. That's where things slow down, because of the fragmentation
of the original 5120-byte or larger blocks. I am actually running
this type of backup between two Suns (Solaris) and a 4mm DDS-3 tape
drive. It's this fragmentation at the network layer that is really
slowing things down: the larger data block has to be reconstructed
from the smaller TCP/IP packets at the rate and size they actually
arrive. Fragmentation kills...

>     cpio .... | buffer -m 1M | rsh ...
>
> which uses a 1 megabyte buffer. Very effective for getting closer
> to streaming behaviour.

This is great if the tape drive is on the same system. volcopy (AT&T
System V Release 4) also implemented double-buffered I/O for disk
dumps to locally attached tape drives.
This just slows it down if pipes are implemented correctly in the
kernel.

> I can send you the buffer program if you like - it's extremely
> useful for this particular purpose.

What he needs is a version of the double-buffered I/O program running
on the slave side of the link, where the tape drive is. It has two
cooperating processes that switch roles. Both sides can read from the
TCP/IP socket and write directly to the tape drive (replacing dd) in
a specified block size. While one reads from the socket, the other is
writing to the tape drive, and then they switch roles. The switchover
communication is performed over a local two-way pipe between the
"twin" processes. The performance is gained because while one process
is blocked on the tape write, the other continues to read from the
socket. The other major thing that helps is if this is all running on
a multiprocessor system, since you don't get CPU bound. So he should
potentially upgrade to multi-CPU servers, if that is not the current
case, and implement a double-buffered I/O program to replace the call
to dd and handle the tape drive directly.

> Cheers,
> --
> Cameron Simpson, DoD#743 [EMAIL PROTECTED]
> http://www.zip.com.au/~cs/

--
redhat-list mailing list
unsubscribe mailto:[EMAIL PROTECTED]
https://www.redhat.com/mailman/listinfo/redhat-list