this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Which-strategy-is-used-for-broadcast-variables-tp22004.html
Subject: Which strategy is used for broadcast variables?
In "Performance and Scalability of Broadcast in Spark" by Mosharaf Chowdhury
I read that Spark uses HDFS for its broadcast variables. This seems highly
inefficient. In the same paper alternatives are proposed, among which
BitTorrent.
Spark currently uses a BitTorrent-like mechanism that's been tuned for
datacenter environments.
Mosharaf
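
To illustrate why a BitTorrent-like scheme scales better than a single shared source, here is a toy model. It is not Spark's code. It assumes, purely for illustration, that each node holding the data can serve one other node per round; the function names are hypothetical.

```python
def rounds_single_source(num_workers: int) -> int:
    """Rounds needed when only the driver serves the data, one worker per round."""
    return num_workers


def rounds_peer_to_peer(num_workers: int) -> int:
    """Rounds needed when every node that already holds the data
    serves one new node per round (toy BitTorrent-like model)."""
    total_nodes = num_workers + 1  # workers plus the driver
    holders = 1                    # initially only the driver has the data
    rounds = 0
    while holders < total_nodes:
        # Each current holder hands the data to one node that lacks it.
        holders = min(total_nodes, holders * 2)
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, rounds_single_source(n), rounds_peer_to_peer(n))
```

Under these assumptions the single-source count grows linearly with the number of workers, while the peer-to-peer count grows roughly logarithmically. A real implementation would also split the data into blocks so that transfers overlap, which this sketch does not model.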
--
From: Tom thubregt...@gmail.com
Sent: 3/11/2015 4:58 PM
To: user@spark.apache.org
Subject: Which strategy is used for broadcast variables?