Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/12157#discussion_r58572140 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -53,14 +53,22 @@ import org.apache.spark.util.{NextIterator, SerializableConfiguration, ShutdownH /** * A Spark split class that wraps around a Hadoop InputSplit. */ -private[spark] class HadoopPartition(rddId: Int, idx: Int, s: InputSplit) +private[spark] class HadoopPartition(rddId: Int, override val index: Int, s: InputSplit) extends Partition { val inputSplit = new SerializableWritable[InputSplit](s) - override def hashCode(): Int = 41 * (41 + rddId) + idx + override def hashCode(): Int = 41 * (41 + rddId) + index - override val index: Int = idx + def canEqual(other: Any): Boolean = other.isInstanceOf[HadoopPartition] + + override def equals(other: Any): Boolean = other match { + case that: HadoopPartition => + super.equals(that) && + (that canEqual this) && + index == that.index --- End diff -- It becomes a field if you use it outside the constructor, or should. It should stay private yes. This caused me to think about the partition changes a bit more. Defining `hashCode` without `equals` is technically correct. By default no two distinct objects are equal, so they can't violate the contract that equal objects have the same hash code no matter what the hash code is. Here it's not clear that two partitions with the same index are semantically equivalent, since they can be from different RDDs. So the RDD's ID matters, but, maybe it's not right to implement a notion of equality here either. It does raise the question -- when do partitions get hashed and why is a non-default implementation important then? it could be vestigial. Maybe best not to add equals though, and unless we know for sure hash code isn't important, leave that. So maybe we end up leaving the partition classes alone and weakening the condition to require hashCode if equals exists but not vice versa?
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org