I'm looking for information about how broadcast variables scale in Spark and which algorithm they use.
I found http://www.cs.berkeley.edu/~agearh/cs267.sp10/files/mosharaf-spark-bc-report-spring10.pdf, but I don't know whether it describes the current version (1.2.1), since the document was written in 2010. I looked at the documentation and the API and saw that there is a TorrentBroadcastFactory for broadcast variables. Is that the one Spark uses now? The article says Spark uses a different one (Centralized HDFS Broadcast).

How does the current algorithm scale on a large cluster (about 300 nodes)? Is it linear? Are there options to choose other algorithms?
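To make the "is it linear?" question concrete, here is a toy model in Python. It is a hypothetical sketch, not Spark's code: it assumes the value is sent whole, that every transfer takes one equal "round", and that each node uploads to at most one peer per round. Under those assumptions, a centralized sender needs one round per worker, while a peer-to-peer scheme where every node that already has the value forwards it doubles the number of holders each round, so the round count grows roughly with log2 of the cluster size.

```python
def centralized_rounds(n_workers: int) -> int:
    # Toy model (assumption): only the driver sends, one full copy per round,
    # so the number of rounds equals the number of workers (linear growth).
    return n_workers


def peer_to_peer_rounds(n_workers: int) -> int:
    # Toy model (assumption): every node that already holds the value sends one
    # copy per round, so the number of holders (driver included) doubles.
    holders, rounds = 1, 0
    while holders < n_workers + 1:
        holders *= 2
        rounds += 1
    return rounds
```

For the 300-node case this model gives 300 rounds for the centralized sender and 9 rounds for the peer-to-peer scheme. Real behavior also depends on block size, network topology, and per-node bandwidth, which this sketch ignores.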