[ 
https://issues.apache.org/jira/browse/SPARK-32632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Dinghua updated SPARK-32632:
--------------------------------
    Description: 
When I use the jdbc method
{code:java}
def jdbc( url: String, table: String, columnName: String, lowerBound: Long, 
upperBound: Long, numPartitions: Int, connectionProperties: Properties)
{code}
 
  I am confused by the partitions generated by this method: the rows of the 
first partition are not limited by lowerBound, and the rows of the last 
partition are not limited by upperBound. 
  
 For example, I use the method as follows:
  
{code:java}
val data = spark.read.jdbc(url, table, "id", 2, 5, 3,buildProperties()) 
.selectExpr("id","appkey","funnel_name")
data.show(100, false)  
{code}
 

The resulting partition info is:

 20/08/05 16:58:59 INFO JDBCRelation: Number of partitions: 3, WHERE clauses of 
these partitions: `id` < 3 or `id` is null, `id` >= 3 AND `id` < 4, `id` >= 4
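
The clauses in this log line are consistent with Spark's documented behavior that lowerBound and upperBound only decide the partition stride and do not filter rows. A minimal sketch of that stride logic follows. It assumes a simplified model rather than Spark's actual source: the names PartitionClauseSketch and whereClauses are hypothetical, and any adjustment Spark makes to the partition count is ignored.

```scala
// Simplified sketch of stride-based JDBC partitioning; NOT Spark's actual source.
// PartitionClauseSketch and whereClauses are hypothetical names for illustration.
object PartitionClauseSketch {
  def whereClauses(
      column: String,
      lowerBound: Long,
      upperBound: Long,
      numPartitions: Int): Seq[String] = {
    require(numPartitions >= 2, "this sketch only covers two or more partitions")
    require(lowerBound < upperBound, "lowerBound must be smaller than upperBound")

    // The bounds are used only to compute the step between partition boundaries.
    val stride = upperBound / numPartitions - lowerBound / numPartitions

    (0 until numPartitions).map { i =>
      val start = lowerBound + i * stride
      val end = start + stride
      if (i == 0) {
        // First partition: no lower limit, and NULL keys are included.
        s"$column < $end or $column is null"
      } else if (i == numPartitions - 1) {
        // Last partition: no upper limit.
        s"$column >= $start"
      } else {
        s"$column >= $start AND $column < $end"
      }
    }
  }
}
```

For the call above, whereClauses("`id`", 2, 5, 3) yields the same three clauses as the log line. The first and last partitions are open-ended, so rows with id below 2 or at or above 5 are still read.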

The returned data is:
||id||appkey||funnel_name||
|0|yanshi|test001|
|1|yanshi|test002|
|2|yanshi|test003|
|3|xingkong|test_funnel|
|4|xingkong|test_funnel2|
|5|xingkong|test_funnel3|
|6|donews|test_funnel4|
|7|donews|test_funnel|
|8|donews|test_funnel2|
|9|dami|test_funnel3|
|13|dami|test_funnel4|
|15|xiaoai|test_funnel6|

 

I would expect the clause of the first partition to be " `id` >= 2 and `id` < 3 " 
because lowerBound is 2, and the clause of the last partition to be 
" `id` >= 4 and `id` < 5 " because upperBound is 5, but that is not what happens.
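
If bounded partitions like these are what is wanted, one option may be the jdbc overload that takes an explicit array of predicates, one per partition. The sketch below is hedged: BoundedPredicates and build are hypothetical names, and the even split of the range is an assumption made for illustration.

```scala
// Hypothetical helper, not part of Spark: builds one closed-open range
// predicate per partition so that rows outside [lowerBound, upperBound)
// are never selected.
object BoundedPredicates {
  def build(
      column: String,
      lowerBound: Long,
      upperBound: Long,
      numPartitions: Int): Array[String] = {
    require(numPartitions >= 1, "numPartitions must be positive")
    require(lowerBound < upperBound, "lowerBound must be smaller than upperBound")

    val span = upperBound - lowerBound
    (0 until numPartitions).map { i =>
      // Split the range as evenly as integer arithmetic allows.
      val start = lowerBound + span * i / numPartitions
      val end = lowerBound + span * (i + 1) / numPartitions
      s"$column >= $start AND $column < $end"
    }.toArray
  }
}

// Possible usage with the predicates overload of DataFrameReader.jdbc
// (url, table and buildProperties() are taken from the report above):
//   val preds = BoundedPredicates.build("`id`", 2, 5, 3)
//   val data = spark.read.jdbc(url, table, preds, buildProperties())
```

With these predicates, rows whose id is NULL or falls outside the range 2 to 4 would not be read at all, which differs from the stride-only behavior of the lowerBound/upperBound overload.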

 

 
  



> Bad partitioning in spark jdbc method with parameter lowerBound and upperBound
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-32632
>                 URL: https://issues.apache.org/jira/browse/SPARK-32632
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Liu Dinghua
>            Priority: Major
>
> When I use the jdbc method
> {code:java}
> def jdbc( url: String, table: String, columnName: String, lowerBound: Long, 
> upperBound: Long, numPartitions: Int, connectionProperties: Properties)
> {code}
>  
>   I am confused by the partitions generated by this method: the rows of the 
> first partition are not limited by lowerBound, and the rows of the last 
> partition are not limited by upperBound. 
>   
>  For example, I use the method as follows:
>   
> {code:java}
> val data = spark.read.jdbc(url, table, "id", 2, 5, 3,buildProperties()) 
> .selectExpr("id","appkey","funnel_name")
> data.show(100, false)  
> {code}
>  
> The resulting partition info is:
>  20/08/05 16:58:59 INFO JDBCRelation: Number of partitions: 3, WHERE clauses of these partitions: `id` < 3 or `id` is null, `id` >= 3 AND `id` < 4, `id` >= 4
> The returned data is:
> ||id||appkey||funnel_name||
> |0|yanshi|test001|
> |1|yanshi|test002|
> |2|yanshi|test003|
> |3|xingkong|test_funnel|
> |4|xingkong|test_funnel2|
> |5|xingkong|test_funnel3|
> |6|donews|test_funnel4|
> |7|donews|test_funnel|
> |8|donews|test_funnel2|
> |9|dami|test_funnel3|
> |13|dami|test_funnel4|
> |15|xiaoai|test_funnel6|
>  
> I would expect the clause of the first partition to be " `id` >= 2 and `id` < 3 " 
> because lowerBound is 2, and the clause of the last partition to be 
> " `id` >= 4 and `id` < 5 " because upperBound is 5, but that is not what happens.
>  
>  
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
