StephenZou created SPARK-31395:
----------------------------------

             Summary: [SPARK][Core]preferred location causing single node be a 
hot spot
                 Key: SPARK-31395
                 URL: https://issues.apache.org/jira/browse/SPARK-31395
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.4.5, 2.3.4
            Reporter: StephenZou


My job runs as follows:
 # Build the model; it is saved in HDFS as files named prob1, prob2, ..., probN.
 # Load those files into an RDD through a custom ProbInputformat.
 # Do some calculation.

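The driver node, which builds the model, is scheduled more frequently than the
other nodes, because each HDFS block is first written to the node that writes it.
That node is therefore a preferred location for every task. This scheduling hot
spot is unnecessary, and the load should be spread more evenly across nodes.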
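
The skew described above can be illustrated with a small stand-alone sketch. It
is not Spark or HDFS code: the host names, the block count, the replication
factor of 3, and the round-robin choice of the remaining replicas are all
assumptions made for illustration (real HDFS chooses the remote replicas
differently). It only counts how often each host shows up as a preferred
location when the first replica of every block lands on the writing node.

```python
from collections import Counter
from itertools import cycle


def place_replicas(num_blocks, writer, other_nodes, replication=3):
    """Return one replica-host list per block, first replica on the writer.

    Simplified model: the remaining replicas are handed out round-robin
    over the other nodes (an assumption, not the real HDFS policy).
    """
    others = cycle(other_nodes)
    placements = []
    for _ in range(num_blocks):
        remote = [next(others) for _ in range(replication - 1)]
        placements.append([writer] + remote)
    return placements


def preferred_location_counts(placements):
    """Count how many tasks list each host as a preferred location."""
    return Counter(host for hosts in placements for host in hosts)


# Hypothetical cluster: the node that wrote the model plus eight workers.
nodes = ["driver-node"] + [f"worker-{i}" for i in range(1, 9)]
placements = place_replicas(num_blocks=40, writer=nodes[0], other_nodes=nodes[1:])
counts = preferred_location_counts(placements)

for host in nodes:
    print(host, counts[host])
```

Under these assumptions the writing node is a preferred location for all 40
tasks, while each worker is preferred for only 10, so a locality-aware scheduler
has far more reasons to place tasks on the writing node.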

 



