In "Performance and Scalability of Broadcast in Spark" by Mosharaf Chowdhury
I read that Spark uses HDFS for its broadcast variables. This seems highly
inefficient. In the same paper alternatives are proposed, among which
"Bittorent Broadcast (BTB)". While studying "Learning Spark," page 105,
second paragraph about Broadcast Variables, I read " The value is sent to
each node only once, using an efficient, BitTorrent-like communication
mechanism." 

- Is the book talking about the proposed BTB from the paper? 

- Is this currently the default? 

- If not, what is?

Thanks,

Tom



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Which-strategy-is-used-for-broadcast-variables-tp22004.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Reply via email to