Did some testing and I see some strange results: rsync is considerably slower than cp or tee.

time rsync /Network/sata3/samsara.2011.720p-50fps.m4v /Network/sata4/2632
real    6m10.834s
user    0m8.299s
sys     0m12.368s

time cp /Network/sata3/samsara.2011.720p-50fps.m4v /Network/sata4/2632
real    3m43.190s
user    0m0.005s
sys     0m5.349s

time tee < /Network/sata3/samsara.2011.720p-50fps.m4v > /Network/sata4/2632 /Network/sata2/2632
real    3m46.949s
user    0m0.176s
sys     0m20.299s

And another rsync run, to show it's not a caching effect. In the Network pane of the OSX Activity Monitor I clearly see that with rsync the data sent is about the same as the data read, while with cp and tee the write rate is about double the read rate.

rm -f /Network/sata4/2632 /Network/sata2/2632; time rsync /Network/sata3/samsara.2011.720p-50fps.m4v /Network/sata4/2632
real    6m4.532s
user    0m8.138s
sys     0m11.395s

time rsync /Network/sata3/samsara.2011.720p-50fps.m4v /Network/sata2/2632
real    6m2.038s
user    0m8.142s
sys     0m11.478s

The source is on a 100Mb/s link; the other three systems are on 1Gb/s. The OSX system in the middle reads at 12MB/s and writes at 24MB/s with cp and tee, but with rsync it only writes at ~11MB/s. Maybe the OSX version of rsync is at fault, but it is very strange.
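To make the runs directly comparable, something like the minimal sketch below could be used: it removes the targets and flushes the OSX disk cache before each copy, so every run starts cold. The paths are the ones from the timings above; purge may require sudo depending on the OSX release.

#!/bin/sh
# Cold-cache benchmark sketch; paths as in the timings above.
# `purge` flushes the OSX disk cache (may need sudo on some releases).
SRC=/Network/sata3/samsara.2011.720p-50fps.m4v
D1=/Network/sata4/2632
D2=/Network/sata2/2632

rm -f "$D1" "$D2"; purge
time rsync "$SRC" "$D1"

rm -f "$D1" "$D2"; purge
time cp "$SRC" "$D1"

rm -f "$D1" "$D2"; purge
time tee "$D2" < "$SRC" > "$D1"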
Henk

On Sep 11, 2013, at 1:45 AM, Henk D. Schoneveld <[email protected]> wrote:

> That should work, but as I understand it there will be 4 threads running,
> and the bandwidth of the source server has to be shared between these
> threads. What I'm dreaming of is some kind of broadcast/multicast to 4 IP
> addresses to get maximum throughput. Maybe it's impossible, but it would
> be very efficient, wouldn't it?
>
> Henk
>
> On Sep 10, 2013, at 9:09 PM, James Burton <[email protected]> wrote:
>
>> Henk,
>>
>> rsync will mostly do what you want it to do, but rsync doesn't support
>> remote->remote copies.
>>
>> The way I do multi-node copies is with a recursive copy algorithm that
>> uses ssh to run rsync on the remote machines. On each pass, every node
>> that already has the source copies to a node that doesn't, which quickly
>> spreads the source to all the nodes in the list.
>>
>> Here is the pseudocode. Of course, rsync and ssh need to be set up
>> correctly on all the nodes and you have to be sure you are using the
>> right syntax for your application, but this is the basic idea:
>>
>> copyAll( nodes[] ):
>>
>>     # assume the source is at nodes[0]
>>     len = nodes.length()
>>     if len == 1: return
>>
>>     # copy the source to the node in the middle of the list
>>     ssh user@nodes[0] "rsync -a /path/to/files user@nodes[len/2]:/path/to/files"
>>
>>     # partition the list and recurse on separate threads;
>>     # the next pass copies nodes[0] -> nodes[len/4]
>>     thread(copyAll(nodes[0:len/2]))
>>     # and nodes[len/2] -> nodes[3*len/4]
>>     thread(copyAll(nodes[len/2:len]))
>>
>> Hope that helps.
>>
>> Jim
>>
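For reference, a runnable bash version of that sketch could look like the one below. It assumes passwordless ssh from each node to the others and rsync on every node; user@, the node names, and /path/to/files are placeholders.

#!/bin/bash
# Recursive fan-out after the pseudocode above: on each pass every node
# that already has the data sends it to one that doesn't, so N nodes are
# filled in roughly log2(N) passes.
copy_all() {
    local nodes=("$@")
    local len=${#nodes[@]}
    (( len <= 1 )) && return
    local mid=$(( len / 2 ))

    # nodes[0] has the data; copy it to the node in the middle of the list
    # (-n keeps the backgrounded ssh from reading our stdin)
    ssh -n "user@${nodes[0]}" \
        "rsync -a /path/to/files/ user@${nodes[mid]}:/path/to/files/"

    # each half now has the data at its first node; recurse in parallel
    copy_all "${nodes[@]:0:mid}" &
    copy_all "${nodes[@]:mid}" &
    wait
}

copy_all node01 node02 node03 node04 node05 node06 node07 node08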
>> On Tue, Sep 10, 2013 at 12:36 PM, Henk D. Schoneveld <[email protected]> wrote:
>>
>> On Sep 10, 2013, at 4:56 PM, James Burton <[email protected]> wrote:
>>
>>> Henk,
>>>
>>> I'm not sure what you are trying to do.
>>>
>>> Are you looking to copy data from one server to a series of servers?
>>
>> Yes
>>
>>> Is this a one-time copy for setup, or will this be part of an ongoing system?
>>
>> It will be part of an ongoing system.
>>
>>> Thanks,
>>>
>>> Jim
>>>
>>> On Mon, Sep 9, 2013 at 4:25 PM, Henk D. Schoneveld <[email protected]> wrote:
>>>
>>> Hi everybody,
>>>
>>> I'm thinking about installing 5 groups of 30 pvfs2-systems in a 100Mb/s
>>> WAN. The reason for this setup is that if one group fails, the remaining
>>> 4 groups can still serve the originally intended number of clients; the
>>> IO load would then be 5/4 of the original setup.
>>>
>>> All groups share one 5Gb/s connection to the internet.
>>>
>>> To minimize the data transferred from the server somewhere on the
>>> internet, I'm thinking about the following scenario: copy a file at
>>> 30x100Mb/s = 3Gb/s onto 1 group, then redistribute it to the remaining
>>> groups in parallel.
>>>
>>> Any ideas on how to do this most efficiently? I know
>>> tee < source > dest0 dest1 dest2 dest3 dest4 would do it, but tee isn't
>>> recursive and doesn't accept wildcards. rsync handles wildcards and
>>> recursion, but how do I run it in parallel in a way that keeps the load
>>> on the source group minimal?
>>>
>>> Suggestions very welcome
>>>
>>> Henk

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
