[jira] [Commented] (SPARK-34844) JDBCRelation columnPartition function includes the first stride in the lower partition

Jason Yarbrough (Jira) Wed, 24 Mar 2021 17:19:05 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-34844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308249#comment-17308249
 ]


Jason Yarbrough commented on SPARK-34844:
-----------------------------------------

As I coded up the unit tests and checked the effect this change has on other 
unit tests, I took a step back and reconsidered which behavior is really the 
expected one. My guess is that many people do not treat the lower bound as a 
breakpoint; instead they set it to the bottom of their data (or close to it) 
rather than to a low percentile. I've also looked through some people's 
questions on Stack Overflow to get some form of confirmation for this.

I'm going to set this to "Not A Problem" for now and will re-open if it makes 
sense after more testing.

> JDBCRelation columnPartition function includes the first stride in the lower 
> partition
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-34844
>                 URL: https://issues.apache.org/jira/browse/SPARK-34844
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Jason Yarbrough
>            Priority: Minor
>
> Currently, columnPartition in JDBCRelation contains logic that adds the first 
> stride into the lower partition. Because of this, the lower bound isn't used 
> as the ceiling for the lower partition.
> For example, say we have data 0-10, 10 partitions, and the lowerBound is set 
> to 1. The lower/first partition should contain anything < 1. However, in the 
> current implementation, it would include anything < 2.
> A possible easy fix would be changing the following code on line 132:
>     currentValue += stride
> To:
>     if (i != 0) currentValue += stride
> Or include currentValue += stride within the if statement on line 131, 
> although this creates a pretty bad-looking side effect.
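
The difference between the two behaviors described above can be sketched in a
few lines of Scala. This is a simplified illustration, not the Spark source:
the object and method names and the skipFirstStride flag are hypothetical, the
column name "id" and an upperBound of 11 are assumptions (the issue does not
state them), and the stride formula is assumed to use integer division.

```scala
// Simplified sketch of the stride loop discussed in the issue; NOT the actual
// JDBCRelation.columnPartition implementation.
object StrideSketch {
  // Returns one WHERE-clause string per partition.
  // skipFirstStride = true models the change proposed in the issue
  // (the hypothetical "if (i != 0) currentValue += stride" variant).
  def whereClauses(
      column: String,
      lowerBound: Long,
      upperBound: Long,
      numPartitions: Int,
      skipFirstStride: Boolean): Seq[String] = {
    require(numPartitions > 1, "this sketch only covers two or more partitions")
    // Assumed stride formula, using integer division.
    val stride = upperBound / numPartitions - lowerBound / numPartitions
    var currentValue = lowerBound
    val clauses = scala.collection.mutable.ArrayBuffer.empty[String]
    var i = 0
    while (i < numPartitions) {
      // The first partition has no lower predicate.
      val lBound = if (i != 0) s"$column >= $currentValue" else null
      // Current behavior always advances; the proposal skips the first advance.
      if (!(skipFirstStride && i == 0)) currentValue += stride
      // The last partition has no upper predicate.
      val uBound = if (i != numPartitions - 1) s"$column < $currentValue" else null
      val clause =
        if (uBound == null) lBound
        else if (lBound == null) s"$uBound or $column is null"
        else s"$lBound AND $uBound"
      clauses += clause
      i += 1
    }
    clauses.toSeq
  }
}
```

With column "id", lowerBound 1, upperBound 11 and 10 partitions, the first
clause is "id < 2 or id is null" under the current behavior and
"id < 1 or id is null" with skipFirstStride = true. In the second case the
last partition starts at 9 instead of 10, so the extra stride moves from the
bottom partition to the top one.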


