[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...

cloud-fan Thu, 06 Dec 2018 19:09:48 -0800

GitHub user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23249#discussion_r239684697

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala ---
@@ -22,13 +22,12 @@ import org.apache.spark.sql.types.{DataType, IntegerType}

 /**
  * Specifies how tuples that share common expressions will be distributed when a query is executed
- * in parallel on many machines. Distribution can be used to refer to two distinct physical
- * properties:
- *  - Inter-node partitioning of data: In this case the distribution describes how tuples are
- *    partitioned across physical machines in a cluster. Knowing this property allows some
- *    operators (e.g., Aggregate) to perform partition local operations instead of global ones.
- *  - Intra-partition ordering of data: In this case the distribution describes guarantees made
- *    about how tuples are distributed within a single partition.
+ * in parallel on many machines.
+ *
+ * Distribution here refers to inter-node partitioning of data:
+ *   The distribution describes how tuples are partitioned across physical machines in a cluster.
+ *   Knowing this property allows some operators (e.g., Aggregate) to perform partition local
+ *   operations instead of global ones.
  */
--- End diff --

I intentionally removed everything about intra-partition ordering, as we never
leverage it and no partitioning provides this property. Did I miss something?
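
For context, the inter-node property that the doc keeps can be illustrated with a minimal sketch in plain Scala. It uses no Spark APIs, and every name in it (`PartitionLocalAggregate`, `hashPartition`, `localSum`) is hypothetical and chosen only for illustration. The assumption is that tuples are hash-partitioned by key, so all tuples sharing a key land in the same partition and an aggregate can be computed per partition without a global step.

```scala
object PartitionLocalAggregate {
  // Hypothetical helper: assign each (key, value) tuple to a partition by key hash.
  // floorMod keeps the partition index non-negative even for negative hash codes.
  def hashPartition(rows: Seq[(String, Int)], numPartitions: Int): Vector[Seq[(String, Int)]] = {
    val grouped = rows.groupBy { case (k, _) => Math.floorMod(k.hashCode, numPartitions) }
    Vector.tabulate(numPartitions)(i => grouped.getOrElse(i, Seq.empty))
  }

  // Sum values per key, looking only at the tuples of a single partition.
  def localSum(partition: Seq[(String, Int)]): Map[String, Int] =
    partition.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    val rows = Seq("a" -> 1, "b" -> 2, "a" -> 3, "c" -> 4, "b" -> 5)
    val partitions = hashPartition(rows, 3)

    // Each partition is aggregated independently (a "partition local" operation).
    val perPartition = partitions.map(localSum)

    // Keys are disjoint across partitions, so the final result is a plain union
    // of the per-partition maps; no cross-partition merge of sums is needed.
    val result = perPartition.flatten.toMap
    println(result)
  }
}
```

If the data were not partitioned by the grouping key, the per-partition sums would be partial, and a second, global merge would be required to combine them.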


