Reading the Spark docs 
(https://spark.apache.org/docs/latest/sql-data-sources-parquet.html), I don't 
see how to read a Parquet file in parallel with SparkSession. Would 
--num-executors just work, or do any additional parameters need to be set on 
the SparkSession as well?

Also, if I want to write data to a database in parallel, are the options 
'numPartitions' and 'batchsize' enough to improve write performance? For 
example,

                 mydf.write.format("jdbc")
                     .option("driver", "org.postgresql.Driver")
                     .option("url", url)
                     .option("dbtable", table_name)
                     .option("user", username)
                     .option("password", password)
                     .option("numPartitions", N)
                     .option("batchsize", M)
                     .save()

From the Spark website 
(https://spark.apache.org/docs/2.2.0/sql-programming-guide.html#jdbc-to-other-databases),
 these are the only two options I can find that would affect DB write 
performance.

I appreciate any suggestions.
