I also thought this was a bug, but after investigating further I found
this: https://issues.apache.org/jira/browse/SPARK-6800
Here is the updated documentation of the
*org.apache.spark.sql.jdbc.JDBCRelation#columnPartition* method:
Notice that lowerBound and upperBound are just used to decide the
partition stride, not for filtering the rows in the table. So all rows in
the table will be partitioned and returned.
So the filter has to be added manually in the query:
val jdbcDF = sqlContext.jdbc(
  url = "jdbc:postgresql://localhost:5430/dbname?user=user&password=111",
  table = "(select * from se_staging.exp_table3 where cs_id >= 1 and cs_id <= 1) as exp_table3new",
  columnName = "cs_id",
  lowerBound = 1,
  upperBound = 1,
  numPartitions = 12)
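To see why the bounds don't filter anything, here is a minimal, self-contained sketch, modeled on (but not copied from) the Spark 1.3 JDBCRelation#columnPartition logic, of how the per-partition WHERE clauses are built; the object and method names are mine, for illustration only:

```scala
// Sketch (not Spark's actual source): lowerBound/upperBound only position
// the partition boundaries; they never restrict which rows are read.
object PartitionSketch {
  def whereClauses(column: String, lowerBound: Long, upperBound: Long,
                   numPartitions: Int): Seq[String] = {
    // Stride between consecutive partition boundaries (integer arithmetic).
    val stride = upperBound / numPartitions - lowerBound / numPartitions
    (0 until numPartitions).map { i =>
      val current = lowerBound + i * stride
      // The first partition gets no lower clause and the last gets no upper
      // clause, so both ends are open-ended and every row lands somewhere.
      val lower = if (i == 0) None else Some(s"$column >= $current")
      val upper =
        if (i == numPartitions - 1) None
        else Some(s"$column < ${current + stride}")
      (lower ++ upper).mkString(" AND ")
    }
  }

  def main(args: Array[String]): Unit = {
    // The parameters from the query in this thread.
    whereClauses("cs_id", 1L, 1L, 12).zipWithIndex.foreach {
      case (clause, i) => println(s"partition $i: WHERE $clause")
    }
  }
}
```

With lowerBound = upperBound = 1 and 12 partitions the stride is 0: the first partition scans `cs_id < 1`, the last scans `cs_id >= 1`, and between them they cover every row in the table, which matches the behavior reported in this thread.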
Best Regards,
Sujeevan. N
On Mon, Mar 23, 2015 at 4:02 AM, Ted Yu yuzhih...@gmail.com wrote:
I went over JDBCRelation#columnPartition() but didn't find an obvious clue
(you can add more logging to confirm that the partitions were generated
correctly).
Looks like the issue may be somewhere else.
Cheers
On Sun, Mar 22, 2015 at 12:47 PM, Marek Wiewiorka
marek.wiewio...@gmail.com wrote:
...I even tried setting the upper and lower bounds to the same value, like
1 or 10, with the same result.
cs_id is a column with cardinality ~5*10^6,
so that is not the case here.
Regards,
Marek
2015-03-22 20:30 GMT+01:00 Ted Yu yuzhih...@gmail.com:
From the javadoc of JDBCRelation#columnPartition():

  * Given a partitioning schematic (a column of integral type, a number of
  * partitions, and upper and lower bounds on the column's value), generate

In your example, 1 and 1 refer to values of the cs_id column.
Looks like all the values in that column fall within the range of 1 and
1000.
Cheers
On Sun, Mar 22, 2015 at 8:44 AM, Marek Wiewiorka
marek.wiewio...@gmail.com wrote:
Hi All - I'm trying to use the new SQLContext API to populate a DataFrame
from a JDBC data source,
like this:
val jdbcDF = sqlContext.jdbc(
  url = "jdbc:postgresql://localhost:5430/dbname?user=user&password=111",
  table = "se_staging.exp_table3",
  columnName = "cs_id",
  lowerBound = 1,
  upperBound = 1,
  numPartitions = 12)
No matter how I set the lower and upper bounds, I always get all the rows
from my table.
The API is marked as experimental, so I assume there might be some bugs
in it, but
has anybody come across a similar issue?
Thanks!