[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...

cloud-fan Thu, 06 Dec 2018 20:04:23 -0800

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23249#discussion_r239690226
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala ---
    @@ -22,13 +22,12 @@ import org.apache.spark.sql.types.{DataType, IntegerType}
     
     /**
      * Specifies how tuples that share common expressions will be distributed when a query is executed
    - * in parallel on many machines.  Distribution can be used to refer to two distinct physical
    - * properties:
    - *  - Inter-node partitioning of data: In this case the distribution describes how tuples are
    - *    partitioned across physical machines in a cluster.  Knowing this property allows some
    - *    operators (e.g., Aggregate) to perform partition local operations instead of global ones.
    - *  - Intra-partition ordering of data: In this case the distribution describes guarantees made
    - *    about how tuples are distributed within a single partition.
    + * in parallel on many machines.
    + *
    + * Distribution here refers to inter-node partitioning of data:
    + *   The distribution describes how tuples are partitioned across physical machines in a cluster.
    + *   Knowing this property allows some operators (e.g., Aggregate) to perform partition local
    + *   operations instead of global ones.
      */
    --- End diff --
    
    For ordering, I think people can look at `OrderedDistribution`?
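
To illustrate the doc comment's point about partition-local operations, here is a minimal sketch in plain Scala with no Spark dependency. The names `PartitionLocalAggregation`, `hashPartition`, and `sumByKey` are hypothetical and are not Catalyst APIs. The sketch assumes rows are assigned to partitions by hashing the grouping key, so each key lands in exactly one partition and a per-partition aggregate is already final.

```scala
object PartitionLocalAggregation {
  // A row is a (grouping key, value) pair; this shape is an assumption for illustration.
  type Row = (String, Int)

  // Hypothetical helper: assign each row to a partition by hashing its key,
  // so all rows sharing a key end up in the same partition.
  def hashPartition(rows: Seq[Row], numPartitions: Int): Seq[Seq[Row]] = {
    val byPartition = rows.groupBy { case (key, _) =>
      Math.floorMod(key.hashCode, numPartitions)
    }
    (0 until numPartitions).map(i => byPartition.getOrElse(i, Seq.empty))
  }

  // Sum the values for each key within one collection of rows.
  def sumByKey(rows: Seq[Row]): Map[String, Int] =
    rows.groupBy(_._1).map { case (key, group) => key -> group.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    val rows: Seq[Row] = Seq("a" -> 1, "b" -> 2, "a" -> 3, "c" -> 4, "b" -> 5)
    val partitions = hashPartition(rows, numPartitions = 3)

    // Aggregate each partition independently; no data moves between partitions.
    val localResults = partitions.map(sumByKey)

    // Key sets are disjoint across partitions, so combining is a plain union.
    val combined = localResults.foldLeft(Map.empty[String, Int])(_ ++ _)

    // The partition-local result matches a single global aggregation.
    assert(combined == sumByKey(rows))
    println(combined)
  }
}
```

If the rows were not partitioned on the grouping key, the same key could appear in several partitions and the per-partition sums would need a further merge step, which is the global operation the doc comment refers to.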


