Re: Spark 1.6.1 DataFrame write to JDBC

2016-04-21 Thread Mich Talebzadeh
I would be surprised if Oracle could not handle million-row calculations, unless you are also using other data in Spark. HTH Dr Mich Talebzadeh LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Spark 1.6.1 DataFrame write to JDBC

2016-04-21 Thread Jonathan Gray
I think I now understand what the problem is, and it is, in some ways, to do with partitions and, in other ways, to do with memory. I now think that the database write was not the source of the problem (the problem being end-to-end performance). The application reads rows from a database, does

Re: Spark 1.6.1 DataFrame write to JDBC

2016-04-21 Thread Jonathan Gray
I tried increasing the batch size (1000 to 10,000 to 100,000) but it didn't appear to make any appreciable difference in my test case. In addition, I had read in the Oracle JDBC documentation that batch sizes should be set between 10 and 100 and that anything outside that range was not advisable. However, I
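The batching being discussed works the same way in any client: rows are buffered and flushed to the database in groups, so each network round trip carries many rows instead of one. Below is a minimal, hedged sketch of that principle in Python, using the stdlib `sqlite3` module as a stand-in for the Oracle JDBC target (the table name `t`, schema, and `write_rows` helper are all illustrative, not anything from the thread):

```python
import sqlite3

def write_rows(rows, batch_size=1000):
    """Insert rows in batches, analogous to a JDBC driver's
    addBatch/executeBatch cycle; sqlite3 stands in for the real target."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            # one round trip flushes the whole batch
            conn.executemany("INSERT INTO t VALUES (?, ?)", batch)
            batch = []
    if batch:  # flush the final partial batch
        conn.executemany("INSERT INTO t VALUES (?, ?)", batch)
    conn.commit()
    return conn

conn = write_rows([(i, "x") for i in range(2500)], batch_size=1000)
print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 2500
```

The shape matches the poster's observation: once batches are large enough that per-statement overhead no longer dominates, raising the batch size further buys little.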

Re: Spark 1.6.1 DataFrame write to JDBC

2016-04-21 Thread Michael Segel
How many partitions are in your data set? Per the Spark DataFrameWriter Java doc: "Saves the content of the DataFrame to an external database table via JDBC. In the case the table already exists in the external

Re: Spark 1.6.1 DataFrame write to JDBC

2016-04-21 Thread Mich Talebzadeh
What is the end database? Have you checked the performance of your query at the target?

Re: Spark 1.6.1 DataFrame write to JDBC

2016-04-20 Thread Jörn Franke
Well, it could also depend on the receiving database. You should also check the executors. Updating to the latest version of the JDBC driver and to JDK 8, if supported by the JDBC driver, could help. > On 20 Apr 2016, at 00:14, Jonathan Gray wrote: > > Hi, > > I'm trying to

Re: Spark 1.6.1 DataFrame write to JDBC

2016-04-20 Thread Takeshi Yamamuro
Sorry for sending that message mid-composition. How about trying to increase the `batchsize` JDBC option to improve performance? // maropu On Thu, Apr 21, 2016 at 2:15 PM, Takeshi Yamamuro wrote: > Hi, > > How about trying to increase the `batchsize` > > On Wed, Apr 20, 2016 at 7:14
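A quick back-of-envelope shows why `batchsize` is the first knob to reach for, and also why it has diminishing returns. Assuming (as a simplification) one network round trip per batch and ignoring driver- and server-side costs, the round-trip count for the ~60 million rows from the original post falls linearly with batch size:

```python
import math

def round_trips(total_rows, batch_size):
    """Round trips needed to push total_rows, assuming one trip per batch."""
    return math.ceil(total_rows / batch_size)

total = 60_000_000  # approximate row count from the original post
for bs in (1_000, 10_000, 100_000):
    print(bs, round_trips(total, bs))
# 1000 -> 60000, 10000 -> 6000, 100000 -> 600
```

Once per-batch latency is a small fraction of total write time, shrinking it further is invisible end to end, which is consistent with the earlier report that raising the batch size beyond 1000 made no appreciable difference.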

Spark 1.6.1 DataFrame write to JDBC

2016-04-19 Thread Jonathan Gray
Hi, I'm trying to write ~60 million rows from a DataFrame to a database over JDBC using Spark 1.6.1, something similar to df.write().jdbc(...). The write does not seem to be performing well. Profiling the application with a master of local[*], it appears there is not much socket write activity and