> | >  On Fri, 2003-05-23 at 10:07, Distribution Lists wrote:
> | >  > with some help I have CPIO backing up a system to a remote
> | >  tape drive
> | >  > across a 100MB switch. Using the following command
> | 
> | You can generally pick up some speed in these circumstances by
> | not competing with yourself on disk access. Split this into 
> two phases.
> | 
> | 1. - Generate a list of files to be dumped.
> | 
> | find / -depth ! -fstype proc > /ramdisk1/backup_list
> 
> Another thing worth trying, which is rather dependent upon your
> I/O patterns, is to put a buffer between the cpio and the rsh.
> 
> Suppose the cpio is writing nice big data chunks to the pipe 
> - thus it fills
> the pipe on every write (a pipe, internally, has a fixed 
> size, small, buffer).
>
I have not seen the Linux pipe implementation source (yet),
but based on System V, pipes are usually implemented using
the mbufs in the kernel, the same buffers used for TCP/IP packet handling,
and the maximum amount of data buffered in a pipe is usually
greater than 32K in total. The limit is usually a kernel parameter.
The whole process runs at CPU speed, and the sender blocks
only when the pipe gets full.
Based on this premise and past experience,
the following comments are made......
 
> On the premise that data comes of the disc drive faster than
> it goes across the network (generally true), so the activity goes:
>       cpio writes
>       the pipe fills
>       cpio blocks

Not exactly: cpio blocks on its next read from the disk drive.
rsh may or may not be run by the scheduler during this period
and empty out all or part of the pipe,
but in general there is a great deal of overlap, and the pipeline
normally never fills unless CPU resources are being strained.
cpio does not stall if the pipe's maximum size is larger than the read
size cpio is using. Remember, you are not running just these
two processes alone; other programs can kick in at any time
for CPU or I/O access.

>       rsh reads the pipe, draining it
>       cpio unblocks, gathers more data
>       cpio writes to the pipe again and blocks on filling it
>       rsh writes data to the network
>       rsh reads more data
>       cpio unblocks
> and so on. This means that cpio stalls a lot of the time.
> 
> This:
> 
>       cpio .... | cat | rsh ...
> 
> puts a little extra buffering in the process, reducing the 
> stalls. There's
> actually a program called "buffer" around to let you do this more
> effectively (and efficiently - it forks and shares the buffer across
> the two instances), used thus:

This just adds an unnecessary middleman that consumes more
pipe and CPU resources and does not buy you much in this case,
especially since the real bottleneck here is the network
and its 1500-byte packets. That's where things slow down,
because of fragmentation of the original 5120-byte (or larger) blocks.
I am actually running this type of backup between two Suns (Solaris)
and a 4mm DDS-3 tape drive. It's this fragmentation at the network
layer that is really slowing things down: we have to reconstruct
the larger data block from the smaller TCP/IP packets
at the rate and size they actually arrive. Fragmentation kills....

> 
>       cpio .... | buffer -m 1M | rsh ...
> 
> which used a 1 megabyte buffer. Very effective for getting closer to
> streaming behaviour.

This is great if the tape drive is on the same system. volcopy
(AT&T System V Release 4) also implemented double-buffered I/O
for disk dumps to locally attached tape drives. But this just slows
things down if pipes are implemented correctly in the kernel.

> 
> I can send you the buffer program if you like - it's 
> extremely useful for
> this particular purpose.

What he needs is a version of the double-buffered I/O program running on
the slave side of the link, where the tape drive is.
It has two cooperating processes that switch roles.
Both sides can read from the TCP/IP socket and write directly
to the tape drive (replacing dd) in a specified block size.
While one reads from the socket, the other is writing to the
tape drive, and then they switch roles. The "switchover"
communication is performed using a local two-way pipe
between the "twin" processes. The performance is gained
because while one process is blocked on the tape write,
the other continues to read from the socket. The other
major thing that helps is if this is all running
on a multiprocessor system, since you don't
get CPU bound. So he should potentially upgrade
to multiple-CPU servers, if that is not the
current case, and implement a double-buffered I/O program
to replace the call to dd and handle the tape drive directly.

> 
> Cheers,
> -- 
> Cameron Simpson, DoD#743        [EMAIL PROTECTED]    
> http://www.zip.com.au/~cs/
> 


-- 
redhat-list mailing list
unsubscribe mailto:[EMAIL PROTECTED]
https://www.redhat.com/mailman/listinfo/redhat-list