Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1452#discussion_r15137404
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala ---
    @@ -89,62 +31,47 @@ private[spark] object ShuffleMapTask {
      * See [[org.apache.spark.scheduler.Task]] for more information.
      *
      * @param stageId id of the stage this task belongs to
    - * @param rdd the final RDD in this stage
    + * @param rddBinary broadcast version of the serialized RDD
      * @param dep the ShuffleDependency
    - * @param _partitionId index of the number in the RDD
    + * @param partition partition of the RDD this task is associated with
      * @param locs preferred task execution locations for locality scheduling
      */
     private[spark] class ShuffleMapTask(
         stageId: Int,
    -    var rdd: RDD[_],
    +    var rddBinary: Broadcast[Array[Byte]],
         var dep: ShuffleDependency[_, _, _],
    -    _partitionId: Int,
    +    partition: Partition,
         @transient private var locs: Seq[TaskLocation])
    -  extends Task[MapStatus](stageId, _partitionId)
    -  with Externalizable
    -  with Logging {
    -
    -  protected def this() = this(0, null, null, 0, null)
    +  extends Task[MapStatus](stageId, partition.index) with Logging {
    +
    +  // TODO: Should we also broadcast the ShuffleDependency? For that we would need a place to
    --- End diff ---
    
    Perhaps JIRA-ize this one too
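The diff above swaps the task's direct `RDD[_]` reference for a `Broadcast[Array[Byte]]`: the RDD is serialized once on the driver, and each task deserializes it from the broadcast bytes instead of shipping a full copy per task. A minimal sketch of that serialize-once / deserialize-per-task pattern using plain Java serialization (the `SerializeOnceSketch` object and its `serialize`/`deserialize` helpers are illustrative names, not Spark's actual `ShuffleMapTask` code):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object SerializeOnceSketch {
  // Serialize a value to bytes once, as the driver would do before
  // wrapping the result in a Broadcast[Array[Byte]].
  def serialize(value: AnyRef): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bos)
    out.writeObject(value)
    out.close()
    bos.toByteArray
  }

  // Deserialize on the "executor" side, as each task would do from the
  // broadcast bytes it receives.
  def deserialize[T](bytes: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[T]
    finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val original = List("a", "b", "c")
    val bytes = serialize(original)          // done once on the driver
    val restored = deserialize[List[String]](bytes) // done per task on executors
    println(restored == original) // prints "true"
  }
}
```

The payoff in Spark is that a broadcast value is fetched once per executor rather than re-serialized into every task closure, which is what motivates the TODO about treating the `ShuffleDependency` the same way.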

