[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...

maryannxue Thu, 06 Dec 2018 20:00:27 -0800

Github user maryannxue commented on a diff in the pull request:

https://github.com/apache/spark/pull/23249#discussion_r239689874

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala ---
@@ -22,13 +22,12 @@ import org.apache.spark.sql.types.{DataType, IntegerType}

/**
* Specifies how tuples that share common expressions will be distributed when a query is executed
- * in parallel on many machines. Distribution can be used to refer to two distinct physical
- * properties:
- * - Inter-node partitioning of data: In this case the distribution describes how tuples are
- * partitioned across physical machines in a cluster. Knowing this property allows some
- * operators (e.g., Aggregate) to perform partition local operations instead of global ones.
- * - Intra-partition ordering of data: In this case the distribution describes guarantees made
- * about how tuples are distributed within a single partition.
+ * in parallel on many machines.
+ *
+ * Distribution here refers to inter-node partitioning of data:
+ * The distribution describes how tuples are partitioned across physical machines in a cluster.
+ * Knowing this property allows some operators (e.g., Aggregate) to perform partition local
+ * operations instead of global ones.
*/
--- End diff --

Yes, I understand that partitioning has nothing to do with intra-partition
ordering at all, and that it was wrong to list intra-partition ordering among
the distribution properties. But I was thinking that mentioning ordering as a
side note would probably help people better understand how some operators
work. Or maybe this is not the best place to put it.
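
To make the distinction concrete, here is a minimal sketch in plain Scala. It is a toy model over ordinary collections, not Spark's actual `Distribution` or `Partitioning` classes, and the names `DistributionSketch`, `hashPartition`, `localSum`, and `sortWithin` are hypothetical. It assumes rows are placed by hashing the grouping key, which is one way to get all rows for a key into a single partition.

```scala
// Toy model only: plain Scala collections, not Spark's Catalyst classes.
object DistributionSketch {
  type Row = (String, Int) // (grouping key, value)

  // Hypothetical helper: place each row by hashing its key, so that all
  // rows sharing a key end up in the same partition.
  def hashPartition(rows: Seq[Row], numPartitions: Int): Vector[Seq[Row]] = {
    val byPartition =
      rows.groupBy { case (key, _) => Math.floorMod(key.hashCode, numPartitions) }
    Vector.tabulate(numPartitions)(i => byPartition.getOrElse(i, Seq.empty))
  }

  // Partition-local aggregate: sum per key inside each partition, then
  // concatenate the results. No cross-partition merge is needed because
  // each key lives in exactly one partition.
  def localSum(partitions: Seq[Seq[Row]]): Map[String, Int] =
    partitions.flatMap { partition =>
      partition.groupBy(_._1).map { case (key, rs) => key -> rs.map(_._2).sum }
    }.toMap

  // Ordering is a separate, per-partition property: sorting rows inside a
  // partition does not change which partition holds which keys.
  def sortWithin(partitions: Seq[Seq[Row]]): Seq[Seq[Row]] =
    partitions.map(_.sortBy(_._1))
}
```

In this sketch `localSum` is only correct because of how `hashPartition` placed the rows, which is the partitioning property the doc comment describes. `sortWithin` changes nothing about that placement, which is why ordering reads as a side note rather than as part of the distribution.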



