What is the target database? Have you checked the performance of your query at the target?
Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com


On 19 April 2016 at 23:14, Jonathan Gray <jonny.g...@gmail.com> wrote:

> Hi,
>
> I'm trying to write ~60 million rows from a DataFrame to a database using
> JDBC with Spark 1.6.1, something similar to df.write().jdbc(...)
>
> The write does not seem to be performing well. Profiling the application
> with a master of local[*], it appears there is not much socket write
> activity and also not much CPU.
>
> I would expect there to be an almost continuous block of socket write
> activity showing up somewhere in the profile.
>
> I can see that the top hot method involves
> org.apache.spark.unsafe.Platform.copyMemory, all from calls within
> JdbcUtils.savePartition(...). However, the CPU doesn't seem particularly
> stressed, so I'm guessing this isn't the cause of the problem.
>
> Are there any best practices, or has anyone come across a case like this
> before where a write to a database seems to perform poorly?
>
> Thanks,
> Jon
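A common cause of slow bulk inserts, whatever the target, is sending one statement per row instead of batching them. Spark and the JDBC driver aside, the batching principle can be illustrated with a small self-contained sketch using Python's sqlite3 module as a stand-in for the database; the table name, schema, and row count here are invented for illustration:

```python
import sqlite3
import time

def write_rows(conn, rows, batch_size=None):
    """Insert rows one-by-one, or in batches via executemany."""
    cur = conn.cursor()
    if batch_size is None:
        # One round trip per row: this is the slow path.
        for r in rows:
            cur.execute("INSERT INTO t (id, val) VALUES (?, ?)", r)
    else:
        # Group rows into batches so the driver can send them together.
        for i in range(0, len(rows), batch_size):
            cur.executemany("INSERT INTO t (id, val) VALUES (?, ?)",
                            rows[i:i + batch_size])
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")
rows = [(i, "row-%d" % i) for i in range(100_000)]

start = time.perf_counter()
write_rows(conn, rows, batch_size=1000)
elapsed = time.perf_counter() - start
count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print("inserted %d rows in %.2fs" % (count, elapsed))
```

In Spark the analogous levers would be repartitioning the DataFrame before the write (so more connections insert in parallel) and enabling batched statements on the driver side (for example MySQL Connector/J's rewriteBatchedStatements=true); which of these applies depends on the driver and the Spark version in use.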