How big are your Avro files? We collapse many small files into a single partition to eliminate scheduler overhead. If you need explicit parallelism, you can also repartition.
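For illustration, a minimal sketch of the repartition approach (the S3 path and partition count are placeholders, not from your setup):

```scala
// Read the Avro files via the Databricks connector; small files may be
// collapsed into few partitions at this point.
val df = spark.read
  .format("com.databricks.spark.avro")
  .load("s3://your-bucket/avro-folder")   // placeholder path

// Force the parallelism you want for downstream work, e.g. 200 tasks.
val repartitioned = df.repartition(200)
```

Pick the partition count based on your executor core count rather than the 200 used here.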
On Thu, Oct 27, 2016 at 5:19 AM, Prithish <prith...@gmail.com> wrote:
> I am trying to read a bunch of Avro files from an S3 folder using Spark
> 2.0. No matter how many executors I use or what configuration changes I
> make, the cluster doesn't seem to use all the executors. I am using the
> com.databricks.spark.avro library from Databricks to read the Avro files.
>
> However, if I try the same on CSV files (same S3 folder, same
> configuration and cluster), it does use all executors.
>
> Is there something that I need to do to enable parallelism when using the
> Databricks Avro library?
>
> Thanks for your help.