[GitHub] spark pull request #18861: [SPARK-19426][SQL] Custom coalescer for Dataset

gatorsmile Mon, 07 Aug 2017 14:21:06 -0700

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18861#discussion_r131766593
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
    @@ -746,8 +746,20 @@ abstract class RepartitionOperation extends UnaryNode {
      * [[RepartitionByExpression]] as this method is called directly by DataFrame's, because the user
      * asked for `coalesce` or `repartition`. [[RepartitionByExpression]] is used when the consumer
      * of the output requires some specific ordering or distribution of the data.
    + *
    + * If `shuffle` = false (`coalesce` cases), this logical plan can have an user-specified strategy
    + * to coalesce input partitions.
    + *
    + * @param numPartitions How many partitions to use in the output RDD
    + * @param shuffle Whether to shuffle when repartitioning
    + * @param child the LogicalPlan
    + * @param coalescer Optional coalescer that an user specifies
      */
    -case class Repartition(numPartitions: Int, shuffle: Boolean, child: LogicalPlan)
    +case class Repartition(
    +    numPartitions: Int,
    +    shuffle: Boolean,
    +    child: LogicalPlan,
    +    coalescer: Option[PartitionCoalescer] = None)
       extends RepartitionOperation {
       require(numPartitions > 0, s"Number of partitions ($numPartitions) must be positive.")
    --- End diff --
    
    Add a new `require` here, so that a custom coalescer is rejected when `shuffle` is true?
    ```
    require(!shuffle || coalescer.isEmpty,
      "Custom coalescer is not allowed for repartition(shuffle=true)")
    ```
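
A minimal sketch of how the suggested check might behave, written against stand-in types so that it runs without Spark on the classpath. `PartitionCoalescer`, `LogicalPlan`, and `LeafPlan` here are hypothetical placeholders, not Spark's real classes, and `RepartitionCheckDemo` exists only for illustration.

```scala
import scala.util.Try

// Hypothetical stand-ins for Spark's types, so the sketch is self-contained.
trait PartitionCoalescer
trait LogicalPlan
case object LeafPlan extends LogicalPlan

// Mirrors the shape of the case class in the diff, plus the proposed require.
case class Repartition(
    numPartitions: Int,
    shuffle: Boolean,
    child: LogicalPlan,
    coalescer: Option[PartitionCoalescer] = None) {
  require(numPartitions > 0, s"Number of partitions ($numPartitions) must be positive.")
  // Proposed check: a custom coalescer only makes sense when shuffle = false.
  require(!shuffle || coalescer.isEmpty,
    "Custom coalescer is not allowed for repartition(shuffle=true)")
}

object RepartitionCheckDemo {
  def main(args: Array[String]): Unit = {
    val custom: PartitionCoalescer = new PartitionCoalescer {}

    // coalesce-style plan (shuffle = false) with a custom coalescer: accepted.
    println(Try(Repartition(4, shuffle = false, LeafPlan, Some(custom))).isSuccess)

    // repartition-style plan (shuffle = true) with a custom coalescer:
    // require throws IllegalArgumentException, so the Try is a Failure.
    println(Try(Repartition(4, shuffle = true, LeafPlan, Some(custom))).isFailure)

    // repartition-style plan without a coalescer: accepted.
    println(Try(Repartition(4, shuffle = true, LeafPlan)).isSuccess)
  }
}
```

Putting the check in the constructor body means an invalid combination fails when the logical plan is built, rather than later during planning or execution.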


