I'm reading the Spark Parquet docs (https://spark.apache.org/docs/latest/sql-data-sources-parquet.html), and they don't mention how to read a Parquet file in parallel with SparkSession. Would --num-executors alone do it, or do additional parameters need to be set on the SparkSession as well?
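For reference, here is a minimal sketch of my current read path; the file path, app name, and the "64m" value are placeholders I made up for illustration. From what I've gathered elsewhere, Spark splits Parquet input into partitions on its own, and spark.sql.files.maxPartitionBytes controls the split size, so I'm really asking whether tuning that plus --num-executors is the whole story:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("parquet-read")  # placeholder app name
    # Max bytes packed into one input partition (default 128 MB);
    # smaller values should produce more, smaller read tasks.
    .config("spark.sql.files.maxPartitionBytes", "64m")
    .getOrCreate()
)

df = spark.read.parquet("/data/events.parquet")  # placeholder path
# Each input partition becomes one read task, spread across the
# executors granted by --num-executors (or dynamic allocation).
print(df.rdd.getNumPartitions())
```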
Also, if I want to write data to a database in parallel, are the options 'numPartitions' and 'batchsize' enough to improve write performance? For example (with my snippet's syntax fixed up):

```python
(mydf.write
    .format("jdbc")
    .option("driver", "org.postgresql.Driver")
    .option("url", url)
    .option("dbtable", table_name)
    .option("user", username)
    .option("password", password)
    .option("numPartitions", N)
    .option("batchsize", M)
    .save())
```

From the Spark website (https://spark.apache.org/docs/2.2.0/sql-programming-guide.html#jdbc-to-other-databases), these are the only two parameters I could find that affect write performance to the database. I appreciate any suggestions.
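In case it helps frame the question, this is my understanding of how those two options interact (N, M, url, table_name, username, and password are placeholders): the write parallelism comes from the DataFrame's partition count, and numPartitions only coalesces down to that many JDBC connections when the DataFrame has more partitions, while batchsize sets how many rows go into each batched INSERT (default 1000). A sketch of what I mean:

```python
# Sketch under my assumptions above: repartition() forces exactly N
# write tasks, i.e. N concurrent JDBC connections to Postgres.
(mydf
    .repartition(N)
    .write
    .format("jdbc")
    .option("driver", "org.postgresql.Driver")
    .option("url", url)
    .option("dbtable", table_name)
    .option("user", username)
    .option("password", password)
    .option("numPartitions", N)   # upper bound on parallel connections
    .option("batchsize", M)       # rows per batched INSERT
    .mode("append")               # default mode errors if the table already exists
    .save())
```

Separately, I've seen the PostgreSQL JDBC driver's reWriteBatchedInserts=true URL parameter recommended for turning batched INSERTs into multi-row inserts on the driver side, but I haven't verified how much it helps, so treat that as a pointer rather than a confirmed answer.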