Xuefu Zhang created SPARK-3621:
----------------------------------

             Summary: Provide a way to broadcast an RDD (instead of just a 
variable made of the RDD) so that a job can access
                 Key: SPARK-3621
                 URL: https://issues.apache.org/jira/browse/SPARK-3621
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 1.1.0, 1.0.0
            Reporter: Xuefu Zhang


In some cases, such as Hive's map-side join, it would be beneficial 
to allow a client program to broadcast RDDs rather than just variables built 
from these RDDs. Broadcasting a variable built from an RDD requires that all 
of the RDD's data be collected to the driver and that the variable be shipped 
to the cluster after being built. It would perform better if the driver could 
just broadcast the RDDs and jobs could use the corresponding data directly 
(for example, building hash maps at the executors).
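
To make the current workflow concrete, here is a minimal plain-Python sketch 
of the collect-then-broadcast pattern described above. It does not use Spark: 
the lists of partitions only stand in for RDDs, and the names 
build_hash_table and map_side_join are hypothetical, chosen for illustration. 
In Spark, steps 1 and 2 would roughly correspond to calling collect() on the 
small RDD and passing the result to SparkContext.broadcast().

```python
from collections import defaultdict


def build_hash_table(small_rows):
    """Build key -> list of values from the small (broadcast) side."""
    table = defaultdict(list)
    for key, value in small_rows:
        table[key].append(value)
    return dict(table)


def map_side_join(partition, table):
    """Probe the hash table for each row of one partition of the large side."""
    for key, value in partition:
        for other in table.get(key, []):
            yield (key, (value, other))


# Step 1 (driver): gather every partition of the small dataset in one place.
# This is the driver-side collection the issue wants to avoid.
small_partitions = [[(1, "a"), (2, "b")], [(3, "c")]]
collected = [row for part in small_partitions for row in part]

# Step 2 (driver): build the variable that would then be broadcast.
table = build_hash_table(collected)

# Step 3 (executors): each partition of the large dataset probes its copy
# of the table, so the large side is never shuffled.
large_partitions = [[(1, "x"), (4, "y")], [(3, "z"), (1, "w")]]
joined = [list(map_side_join(part, table)) for part in large_partitions]
```

Under the proposal, steps 1 and 2 would no longer run on the driver: the 
small dataset would be shipped to the executors as is, and each executor 
would build its own hash table.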

Tez has a broadcast edge that can ship data from the previous stage to the 
next stage without requiring driver-side processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

