On Mon, May 19, 2014 at 2:04 AM, Daniel Mahler <dmah...@gmail.com> wrote:

> I agree that for updating rsync is probably preferable, and it seems like
> for that purpose it would also parallelize well since most of the time is
> spent computing checksums so the process is not constrained by the total
> i/o capacity of the master. However it is a problem for the initial
> replication of the master to the slaves. If you are running on ec2 then the
> dollar overhead of launching is quadratic in the number of slaves. If you
> launch a 100 machine cluster you will wait 100 minutes, but you will pay
> for 10000 machine minutes, or about 167 hours,
> before anything useful starts to happen.
>

Launch time does *not* increase linearly with the number of slaves as I
thought I was seeing.
It would still be nice to have a faster launch though.

cheers
Daniel



> On Mon, May 19, 2014 at 1:32 AM, Aaron Davidson <ilike...@gmail.com> wrote:
>
>> One issue with using Spark itself is that this rsync is required to get
>> Spark to work...
>>
>>  Also note that a similar strategy is used for *updating* the spark
>> cluster on ec2, where the "diff" aspect is much more important, as you
>> might only make a small change on the driver node (recompile or
>> reconfigure) and can get a fast sync.
>>
>>
>> On Sun, May 18, 2014 at 11:22 PM, Mosharaf Chowdhury <
>> mosharafka...@gmail.com> wrote:
>>
>>> What twitter calls murder, unless it has changed since then, is just a
>>> BitTornado wrapper. In 2011, we did some comparison on the performance of
>>> murder and the TorrentBroadcast we have right now for Spark's own broadcast
>>> (Section 7.1 in
>>> http://www.mosharaf.com/wp-content/uploads/orchestra-sigcomm11.pdf).
>>> Spark's implementation was 4.5X faster than murder.
>>>
>>> The only issue with using TorrentBroadcast to deploy code/VM is writing
>>> a wrapper around it to read from disk, but it shouldn't be too complicated.
>>> If someone picks it up, I can give some pointers on how to proceed (I've
>>> thought about doing it myself forever, but never ended up actually taking
>>> the time; right now I don't have enough free cycles either)
>>>
>>> Otherwise, murder/BitTornado would be better than the current strategy
>>> we have.
>>>
>>> A third option would be to use rsync; but instead of rsync-ing to every
>>> slave from the master, one can simply rsync from the master first to one
>>> slave; then use the two sources (master and the first slave) to rsync to
>>> two more; then four and so on. Might be a simpler solution without many
>>> changes.
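
The doubling idea in the paragraph above can be sketched as a plan of copy rounds. This is only an illustration under stated assumptions: the host names are hypothetical, every host that already holds the files is assumed able to rsync to any other host, and the code only plans the copies; it does not invoke rsync.

```python
from typing import List, Tuple


def doubling_schedule(master: str, slaves: List[str]) -> List[List[Tuple[str, str]]]:
    """Plan copy rounds in which every host that already has the files
    sends them to one host that does not have them yet."""
    sources = [master]
    pending = list(slaves)
    rounds = []
    while pending:
        # Pair each current source with one pending target.
        targets = pending[:len(sources)]
        pending = pending[len(targets):]
        rounds.append(list(zip(sources, targets)))
        # Hosts that just received the files can act as sources next round.
        sources = sources + targets
    return rounds


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    hosts = [f"slave{i}" for i in range(1, 8)]
    for n, pairs in enumerate(doubling_schedule("master", hosts), start=1):
        for src, dst in pairs:
            print(f"round {n}: {src} -> {dst}")
```

With 100 slaves this plan needs 7 rounds, since the number of hosts holding the files doubles each round. Whether that is faster in practice depends on how long each copy takes and on the network, which the sketch does not model.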
>>>
>>> --
>>> Mosharaf Chowdhury
>>> http://www.mosharaf.com/
>>>
>>>
>>> On Sun, May 18, 2014 at 11:07 PM, Andrew Ash <and...@andrewash.com> wrote:
>>>
>>>> My first thought would be to use libtorrent for this setup, and it
>>>> turns out that both Twitter and Facebook do code deploys with a bittorrent
>>>> setup.  Twitter even released their code as open source:
>>>>
>>>>
>>>> https://blog.twitter.com/2010/murder-fast-datacenter-code-deploys-using-bittorrent
>>>>
>>>>
>>>> http://arstechnica.com/business/2012/04/exclusive-a-behind-the-scenes-look-at-facebook-release-engineering/
>>>>
>>>>
>>>> On Sun, May 18, 2014 at 10:44 PM, Daniel Mahler <dmah...@gmail.com> wrote:
>>>>
>>>>> I am not an expert in this space either. I thought the initial rsync
>>>>> during launch is really just a straight copy that did not need the tree
>>>>> diff. So it seemed like having the slaves do the copying among each
>>>>> other would be better than having the master copy to everyone directly.
>>>>> That made me think of bittorrent, though there may well be other systems
>>>>> that do this.
>>>>> From the launches I did today it seems that it is taking around 1
>>>>> minute per slave to launch a cluster, which can be a problem for clusters
>>>>> with 10s or 100s of slaves, particularly since on ec2 that time has to be
>>>>> paid for.
>>>>>
>>>>>
>>>>> On Sun, May 18, 2014 at 11:54 PM, Aaron Davidson <ilike...@gmail.com> wrote:
>>>>>
>>>>>> Out of curiosity, do you have a library in mind that would make it
>>>>>> easy to set up a BitTorrent network and distribute files in an rsync
>>>>>> (i.e., apply a diff to a tree, ideally) fashion? I'm not familiar with
>>>>>> this space, but we do want to minimize the complexity of our standard
>>>>>> ec2 launch scripts to reduce the chance of something breaking.
>>>>>>
>>>>>>
>>>>>> On Sun, May 18, 2014 at 9:22 PM, Daniel Mahler <dmah...@gmail.com> wrote:
>>>>>>
>>>>>>> I am launching a rather large cluster on ec2.
>>>>>>> It seems like the launch is taking forever on
>>>>>>> ....
>>>>>>> Setting up spark
>>>>>>> RSYNC'ing /root/spark to slaves...
>>>>>>> ...
>>>>>>>
>>>>>>> It seems that bittorrent might be a faster way to replicate
>>>>>>> the sizeable spark directory to the slaves
>>>>>>> particularly if there are a lot of not very powerful slaves.
>>>>>>>
>>>>>>> Just a thought ...
>>>>>>>
>>>>>>> cheers
>>>>>>> Daniel
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
