Which strategy is used for broadcast variables?

2015-03-11 Thread Tom
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Which-strategy-is-used-for-broadcast-variables-tp22004.html

RE: Which strategy is used for broadcast variables?

2015-03-11 Thread Mosharaf Chowdhury
In "Performance and Scalability of Broadcast in Spark" by Mosharaf Chowdhury, I read that Spark uses HDFS for its broadcast variables. This seems highly inefficient. In the same paper, alternatives are proposed, among which BitTorrent ...

Re: Which strategy is used for broadcast variables?

2015-03-11 Thread Mosharaf Chowdhury
... Spark currently uses a BitTorrent-like mechanism that's been tuned for datacenter environments. Mosharaf
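
To make the idea of a BitTorrent-like broadcast concrete, here is a minimal toy simulation in plain Python. It is not Spark's implementation and uses no Spark APIs. Every name in it (`Node`, `split_into_blocks`, `simulate_broadcast`, `BLOCK_SIZE`) is hypothetical, and the peer-selection policy (pick a random holder of a random missing block) is an assumption chosen for simplicity. The point it illustrates is only that once a value is split into blocks, executors can fetch blocks from each other instead of pulling everything from one source.

```python
import random

BLOCK_SIZE = 4  # bytes per block; deliberately tiny for the illustration


def split_into_blocks(payload, block_size=BLOCK_SIZE):
    """Cut the serialized broadcast value into fixed-size blocks."""
    return [payload[i:i + block_size] for i in range(0, len(payload), block_size)]


class Node:
    """A driver or executor holding some subset of the blocks."""

    def __init__(self, name, num_blocks):
        self.name = name
        self.num_blocks = num_blocks
        self.blocks = {}  # block index -> block bytes

    def missing(self):
        return [i for i in range(self.num_blocks) if i not in self.blocks]

    def is_complete(self):
        return not self.missing()

    def assemble(self):
        return b"".join(self.blocks[i] for i in range(self.num_blocks))


def simulate_broadcast(payload, num_executors, seed=0):
    """Run rounds until every executor holds every block.

    Returns the executors and a count of blocks served per source node.
    """
    rng = random.Random(seed)
    blocks = split_into_blocks(payload)

    driver = Node("driver", len(blocks))
    driver.blocks = dict(enumerate(blocks))  # the driver starts with everything
    executors = [Node(f"executor-{i}", len(blocks)) for i in range(num_executors)]

    served_by = {}
    while not all(e.is_complete() for e in executors):
        for ex in executors:
            missing = ex.missing()
            if not missing:
                continue
            idx = rng.choice(missing)
            # Any node that already has the block can serve it, not just the driver.
            holders = [n for n in [driver] + executors if idx in n.blocks]
            source = rng.choice(holders)
            ex.blocks[idx] = source.blocks[idx]
            served_by[source.name] = served_by.get(source.name, 0) + 1
    return executors, served_by


if __name__ == "__main__":
    value = b"some broadcast value bytes"
    nodes, served = simulate_broadcast(value, num_executors=4)
    print(all(n.assemble() == value for n in nodes))
    print(served)
```

In this sketch the load of serving blocks is spread across whichever nodes already hold them, which is the property that distinguishes a peer-to-peer scheme from having every executor read the full value from a single shared store.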

Re: Which strategy is used for broadcast variables?

2015-03-11 Thread Tom Hubregtsen

Re: Which strategy is used for broadcast variables?

2015-03-11 Thread Tom Hubregtsen