Re: sync master with slaves with bittorrent?

2014-05-19 Thread Daniel Mahler
Btw, is there a command or script to update the slaves from the master? Thanks, Daniel On Mon, May 19, 2014 at 1:48 AM, Andrew Ash and...@andrewash.com wrote: If the codebase for Spark's broadcast is pretty self-contained, you could consider creating a small bootstrap sent out via the doubling

Re: sync master with slaves with bittorrent?

2014-05-19 Thread Daniel Mahler
On Mon, May 19, 2014 at 2:04 AM, Daniel Mahler dmah...@gmail.com wrote: I agree that for updating, rsync is probably preferable, and it seems like for that purpose it would also parallelize well, since most of the time is spent computing checksums, so the process is not constrained by the total

Re: sync master with slaves with bittorrent?

2014-05-19 Thread Aaron Davidson
On the ec2 machines, you can update the slaves from the master using something like ~/spark-ec2/copy-dir ~/spark. Spark's TorrentBroadcast relies on the Block Manager to distribute blocks, making it relatively hard to extract. On Mon, May 19, 2014 at 12:36 AM, Daniel Mahler dmah...@gmail.com

Re: sync master with slaves with bittorrent?

2014-05-19 Thread Mosharaf Chowdhury
Good catch. In that case, using BitTornado/murder would be better. -- Mosharaf Chowdhury http://www.mosharaf.com/ On Mon, May 19, 2014 at 11:17 AM, Aaron Davidson ilike...@gmail.com wrote: On the ec2 machines, you can update the slaves from the master using something like

sync master with slaves with bittorrent?

2014-05-18 Thread Daniel Mahler
I am launching a rather large cluster on ec2. It seems like the launch is taking forever at "Setting up spark", "RSYNC'ing /root/spark to slaves...". It seems that BitTorrent might be a faster way to replicate the sizeable spark directory to the slaves, particularly if there is a lot of not

Re: sync master with slaves with bittorrent?

2014-05-18 Thread Aaron Davidson
Out of curiosity, do you have a library in mind that would make it easy to set up a BitTorrent network and distribute files in an rsync (i.e., apply a diff to a tree, ideally) fashion? I'm not familiar with this space, but we do want to minimize the complexity of our standard ec2 launch scripts to

Re: sync master with slaves with bittorrent?

2014-05-18 Thread Daniel Mahler
I am not an expert in this space either. I thought the initial rsync during launch is really just a straight copy that did not need the tree diff. So it seemed like having the slaves do the copying among each other would be better than having the master copy to everyone directly. That made me
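
Editor's illustration: the idea in this thread of letting slaves copy among each other, rather than having the master copy to everyone, can be sketched as a "doubling" schedule in which every host that already holds the directory sends it to one host that does not. This is only a sketch under assumptions; the function name `doubling_schedule` and the host names are hypothetical and are not part of spark-ec2. It only plans who copies to whom and performs no transfers.

```python
from typing import List, Tuple


def doubling_schedule(master: str, slaves: List[str]) -> List[List[Tuple[str, str]]]:
    """Plan copy rounds in which every host that has the data feeds one new host.

    Returns a list of rounds; each round is a list of (source, destination)
    pairs that could run in parallel. Hypothetical helper for illustration.
    """
    have = [master]          # hosts that already hold the directory
    pending = list(slaves)   # hosts still waiting for it
    rounds: List[List[Tuple[str, str]]] = []
    while pending:
        # Each host in `have` can serve at most one new host per round.
        batch = pending[:len(have)]
        pending = pending[len(batch):]
        rounds.append(list(zip(have, batch)))
        have.extend(batch)
    return rounds


if __name__ == "__main__":
    plan = doubling_schedule("master", [f"slave{i}" for i in range(100)])
    for number, pairs in enumerate(plan, start=1):
        print(f"round {number}: {len(pairs)} parallel copies")
```

With this plan the number of hosts holding the data roughly doubles each round, so 100 slaves are covered in 7 rounds instead of 100 sequential copies from the master. Each (source, destination) pair would then be handed to whatever copy tool is in use, for example an rsync run on the source host.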