subject:"Parallel read parquet file, write to postgresql"

Re: Parallel read parquet file, write to postgresql

2018-12-03 Thread Shahab Yunus

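Hi James. --num-executors controls how many executors are launched for your application; together with the number of cores per executor, that caps how many tasks can run in parallel. It does not by itself decide how the data is split up. For reading and writing data in parallel, data partitioning is employed: each partition is processed by one task. You can look here for a quick intro to how data partitioning works: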
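As a rough illustration of the partitioning point (this is not taken from the linked intro), here is a minimal PySpark sketch. The helper names jdbc_options and copy_parquet_to_postgres, the default batch size, and the partition count are all hypothetical choices for this example; only the Spark calls themselves (read.parquet, repartition, the jdbc data source and its url/dbtable/user/password/driver/batchsize options) are standard. It assumes you already have a SparkSession and the PostgreSQL JDBC driver on the classpath.

```python
def jdbc_options(url, table, user, password, batch_size=10000):
    """Build the option map for Spark's JDBC data source (hypothetical helper)."""
    return {
        "url": url,                         # e.g. jdbc:postgresql://host:5432/dbname
        "dbtable": table,                   # target table in PostgreSQL
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",  # PostgreSQL JDBC driver class
        "batchsize": str(batch_size),       # rows per JDBC insert batch
    }


def copy_parquet_to_postgres(spark, parquet_path, options, num_partitions=8):
    """Read a parquet dataset and append it to a PostgreSQL table (a sketch).

    `spark` is an existing SparkSession. Spark splits the parquet input into
    partitions on its own when reading; no extra SparkSession setting is
    needed for the read to run in parallel.
    """
    df = spark.read.parquet(parquet_path)

    # Each partition is written by one task over its own JDBC connection,
    # so the partition count bounds the number of concurrent writers.
    # repartition() shuffles the data; 8 is an assumed value, not a recommendation.
    df = df.repartition(num_partitions)

    df.write.format("jdbc").options(**options).mode("append").save()
```

With a session in hand, a call might look like copy_parquet_to_postgres(spark, "/path/to/data.parquet", jdbc_options("jdbc:postgresql://localhost:5432/mydb", "public.events", "user", "secret")), where the path, database, and credentials are placeholders. The driver jar can be supplied with spark-submit --jars. Keeping num_partitions modest avoids opening more connections than the database can comfortably handle.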

Parallel read parquet file, write to postgresql

2018-12-03 Thread James Starks

I have been reading the Spark doc (https://spark.apache.org/docs/latest/sql-data-sources-parquet.html), but it does not mention how to read a parquet file in parallel with SparkSession. Would --num-executors just work? Do any additional parameters need to be added to SparkSession as well? Also if I want to parallel