For tuning large transfers over gsissh, please see the documentation for
the HPN functionality (http://www.psc.edu/networking/projects/hpn-ssh/)
which is included in recent gsi_openssh releases.
Steve White wrote:
Gabriel,
Thanks, this seems to work.
Also, I asked around here, and one of our colleagues simply does this:
alias gsync="time rsync --progress -ave gsissh"
gsync file <ip-address>:
The file transfer protocol becomes important when the data set is really
huge. Now, if the files have already been copied using gridftp, this trick
with rsync will usually be fine, as long as not too much has changed.
However, when the whole data set changes, you really want those multiple
streams.
So...maybe I'll make a feature request anyway.
On 28.04.08, Gabriel Mateescu wrote:
Hi again,
One of our users has expressed a need for functionality in job submission
(and/or file transfer) for which we have found no neat solution.
He wants something like "rsync", to keep local data on one grid resource
up to date *efficiently* with a "source" directory on another grid
resource.
It should transfer files if and only if the files have changed in the
source directory.
Can this be done now? Should we submit a feature request?
rsync can be combined with gsi-ssh as the mechanism to create
a connection between the local machine and the remote machine:
rsync --rsh="$GLOBUS_LOCATION/bin/gsissh -x" ...
Rsync handles both the task of creating the list of
files to update, and that of copying the files.
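As a minimal sketch of a complete invocation, the command above can be wrapped in a small shell function. The function name gsi_rsync, the GSI_RSYNC_DRY_RUN variable, and the host and paths in the usage comment are made up for illustration. It assumes GLOBUS_LOCATION is set and that rsync and gsissh are installed on both ends.

```shell
# Sketch only: gsi_rsync and GSI_RSYNC_DRY_RUN are hypothetical names.
# -a recurses and preserves file attributes; -v lists the files sent.
# With GSI_RSYNC_DRY_RUN=1 the command is printed instead of executed.
gsi_rsync() {
    src=$1
    dest=$2
    rsh="$GLOBUS_LOCATION/bin/gsissh -x"
    if [ "${GSI_RSYNC_DRY_RUN:-0}" = "1" ]; then
        printf 'rsync -av --rsh="%s" %s %s\n' "$rsh" "$src" "$dest"
    else
        rsync -av --rsh="$rsh" "$src" "$dest"
    fi
}

# Example (hypothetical host and paths):
#   gsi_rsync /data/source/ remote.example.org:/data/mirror/
```

Printing the command first makes it easy to check the quoting of the --rsh argument before any data is moved.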
The challenge is how to factor out the task of file
copying, so that other data copying mechanisms such as
GridFTP can be used. Rsync does not seem to provide
native support for plugging in an external data transport
tool.
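
One possible workaround, given here as an untested sketch and not as a supported rsync feature, is to split the work in two. An rsync dry run (-n with --out-format='%n') only lists the files that would be updated, and a separate tool then copies them. The helper name to_url_pairs and all hosts and paths below are hypothetical. The globus-url-copy options in the usage comment (-p for parallel streams, -f for a file of URL pairs) should be checked against the installed version. This approach gives up rsync's delta transfer and does not handle deletions, missing destination directories, or file names containing spaces.

```shell
# Sketch only: to_url_pairs is a hypothetical helper.
# Reads relative file names (one per line) on stdin and writes
# "source-URL destination-URL" pairs to stdout. Directory entries,
# which rsync prints with a trailing slash, and empty lines are skipped.
to_url_pairs() {
    src_base=$1    # e.g. gsiftp://source.example.org/data/source/
    dest_base=$2   # e.g. gsiftp://dest.example.org/data/mirror/
    while IFS= read -r name; do
        case $name in
            ''|*/) continue ;;
        esac
        printf '%s%s %s%s\n' "$src_base" "$name" "$dest_base" "$name"
    done
}

# Intended use (hypothetical hosts and paths; not run here):
#   rsync -an --out-format='%n' --rsh="$GLOBUS_LOCATION/bin/gsissh -x" \
#       /data/source/ dest.example.org:/data/mirror/ |
#     to_url_pairs gsiftp://source.example.org/data/source/ \
#                  gsiftp://dest.example.org/data/mirror/ > pairs.txt
#   globus-url-copy -p 4 -f pairs.txt
```

Rsync still decides what has changed, while the bulk copy can use multiple streams.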
Regards,
Gabriel