I am not an expert in this space either. My thinking was that the initial
rsync during launch is really just a straight copy that does not need the
tree diff. So it seemed like having the slaves do the copying among each
other would be better than having the master copy to every slave directly.
That made me think of BitTorrent, though there may well be other systems
that do this.
From the launches I did today, it seems to take around 1 minute per slave
to launch a cluster, which can be a problem for clusters with tens or
hundreds of slaves, particularly since on EC2 that time has to be paid for.
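
A rough way to see why peer-to-peer copying could help: below is a minimal
sketch, assuming every copy of the spark directory takes the same fixed time
(one "round") and that any node that already has the directory can send it
to one new node per round. It compares the master copying to each slave in
turn against a simple doubling (binomial-tree) broadcast, where every holder
passes the directory on. This is not BitTorrent itself, and the function
names are hypothetical, purely for illustration.

```python
def sequential_rounds(num_slaves: int) -> int:
    # Master copies to one slave at a time: one round per slave.
    return num_slaves


def doubling_rounds(num_slaves: int) -> int:
    # Every node that already holds the directory (master included) copies
    # it to one new node per round, so the number of holders doubles each
    # round until master + all slaves have it.
    holders, rounds = 1, 0
    while holders < num_slaves + 1:
        holders *= 2
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n in (10, 100):
        print(n, sequential_rounds(n), doubling_rounds(n))
```

Under these assumptions, 100 slaves need 100 rounds when the master does all
the copying but only 7 rounds with doubling. Real gains would depend on
per-node bandwidth and how much of the launch time is actually spent in the
copy.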


On Sun, May 18, 2014 at 11:54 PM, Aaron Davidson <ilike...@gmail.com> wrote:

> Out of curiosity, do you have a library in mind that would make it easy to
> set up a BitTorrent network and distribute files in an rsync (i.e., apply a
> diff to a tree, ideally) fashion? I'm not familiar with this space, but we
> do want to minimize the complexity of our standard EC2 launch scripts to
> reduce the chance of something breaking.
>
>
> On Sun, May 18, 2014 at 9:22 PM, Daniel Mahler <dmah...@gmail.com> wrote:
>
>> I am launching a rather large cluster on ec2.
>> It seems like the launch is taking forever on
>> ....
>> Setting up spark
>> RSYNC'ing /root/spark to slaves...
>> ...
>>
>> It seems that BitTorrent might be a faster way to replicate
>> the sizeable spark directory to the slaves,
>> particularly if there are a lot of not-very-powerful slaves.
>>
>> Just a thought ...
>>
>> cheers
>> Daniel
>>
>>
>
