Re: Spark SQL Parallelism - While reading from Oracle

2016-08-10 Thread Sanjiv Singh
You can use the JDBC partitioning options for this. Set all of the
properties (driver, partitionColumn, lowerBound, upperBound, numPartitions);
you should query the maximum id of the partition column first.
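
For example (a minimal sketch of that first step; the connection details,
table name, and numeric id column are the same placeholders used in the
snippet below):

# Run a single small query that returns only the maximum id.
max_df = sqlContext.read.format('jdbc').options(
    url="jdbc:mysql://ip-address:3306/sometable?user=username&password=password",
    dbtable="(SELECT MAX(id) AS maxid FROM sometable) tmp",
    driver="com.mysql.jdbc.Driver"
).load()

# One row, one column; pull the value back to the driver.
maxId = max_df.first()[0]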

Now you have the maximum id, so you can use it for the upperBound parameter.
Choose numPartitions based on the table's size and the capacity of the
cluster you actually run on. With this snippet you read a database table
into a DataFrame with Spark:

df = sqlContext.read.format('jdbc').options(
    url="jdbc:mysql://ip-address:3306/sometable?user=username&password=password",
    dbtable="sometable",
    driver="com.mysql.jdbc.Driver",
    partitionColumn="id",
    lowerBound="1",
    upperBound=str(maxId),
    numPartitions="100"
).load()
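
Since the thread is about Oracle, the same options apply with the Oracle
thin driver; a sketch assuming the ojdbc jar is on the classpath (host,
service name, table, and credentials are placeholders):

df = sqlContext.read.format('jdbc').options(
    url="jdbc:oracle:thin:username/password@//ip-address:1521/servicename",
    dbtable="SOMETABLE",
    driver="oracle.jdbc.OracleDriver",
    partitionColumn="ID",
    lowerBound="1",
    upperBound=str(maxId),
    numPartitions="100"
).load()

Spark turns this into numPartitions range queries on the partition column
(roughly WHERE ID >= ... AND ID < ...), so the read, and anything computed
from the cached DataFrame afterwards, runs with that many tasks instead of
one.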



Regards
Sanjiv Singh
Mob :  +091 9990-447-339

On Wed, Aug 10, 2016 at 6:35 AM, Siva A  wrote:

> Hi Team,
>
> How do we increase the parallelism in Spark SQL?
> In Spark Core, we can repartition or pass extra arguments as part of the
> transformation.
>
> I am trying the below example,
>
> val df1 = sqlContext.read.format("jdbc").options(Map(...)).load
> val df2 = df1.cache
> df2.count
>
> Here the count operation uses only one task, and I couldn't increase the
> parallelism.
> Thanks in advance
>
> Thanks
> Siva
>

