[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...

cloud-fan Sat, 08 Dec 2018 23:26:02 -0800

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23249#discussion_r240026485
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala ---
    @@ -118,10 +115,12 @@ case class HashClusteredDistribution(
     
     /**
      * Represents data where tuples have been ordered according to the `ordering`
    - * [[Expression Expressions]].  This is a strictly stronger guarantee than
    - * [[ClusteredDistribution]] as an ordering will ensure that tuples that share the
    - * same value for the ordering expressions are contiguous and will never be split across
    - * partitions.
    + * [[Expression Expressions]]. Its requirement is defined as the following:
    + *   - Given any 2 adjacent partitions, all the rows of the second partition must be larger than or
    + *     equal to any row in the first partition, according to the `ordering` expressions.
    --- End diff --
    
    Note that only sort requires `OrderedDistribution`, and a global sort doesn't care whether equal rows are split across partitions.
    
    This is a definition of the requirement. When designing protocols, it's important to make requirements as weak as possible and guarantees as strong as possible.
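
    To make the difference concrete, here is a minimal sketch in plain Scala (no Spark dependency). `OrderedDistributionSketch` and both helper names are hypothetical and exist only for illustration; partitions are modeled as in-memory sequences, and a pair that includes an empty partition is assumed to satisfy the requirement trivially.

```scala
// Hypothetical illustration only: none of these names exist in Spark.
object OrderedDistributionSketch {

  // The requirement as worded in the new doc: for any 2 adjacent partitions,
  // every row of the second is >= every row of the first.
  // Assumption: a pair containing an empty partition passes trivially.
  def satisfiesOrderedRequirement[T](partitions: Seq[Seq[T]])(implicit ord: Ordering[T]): Boolean =
    partitions.zip(partitions.drop(1)).forall { case (first, second) =>
      first.isEmpty || second.isEmpty || ord.lteq(first.max, second.min)
    }

  // The stronger property described by the old doc: rows sharing the same
  // value are never split across partitions.
  def equalRowsNeverSplit[T](partitions: Seq[Seq[T]]): Boolean = {
    val perPartitionValues = partitions.flatMap(_.distinct)
    perPartitionValues.size == perPartitionValues.distinct.size
  }
}
```

    For example, `Seq(Seq(1, 2, 2), Seq(2, 3), Seq(5))` passes `satisfiesOrderedRequirement` even though the value `2` appears in two partitions, so it fails `equalRowsNeverSplit`. A global sort only needs the first check.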



