Re: Reading AVRO from S3 - No parallelism

2016-10-27 Thread prithish
The Avro files were 500-600 KB each, and the folder contained around 1,200 files. The total folder size was around 600 MB. I will try repartitioning. Thank you.
> > On Oct 28, 2016 at 2:24 AM, (mailto:mich...@databricks.com) wrote:
> > > > How big are your

Re: Reading AVRO from S3 - No parallelism

2016-10-27 Thread Michael Armbrust
How big are your Avro files? We collapse many small files into a single partition to eliminate scheduler overhead. If you need explicit parallelism, you can also repartition.
On Thu, Oct 27, 2016 at 5:19 AM, Prithish wrote:
> I am trying to read a bunch of AVRO files from a
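
To make the small-file collapsing concrete, here is a rough sketch in plain Python (no Spark required) of how file-based sources in Spark 2.0 appear to pack files into read partitions. It is an approximation based on my reading of the planner, not the actual implementation. The function name `estimate_partitions` is made up for illustration, and the two defaults are assumed to mirror `spark.sql.files.maxPartitionBytes` (128 MB) and `spark.sql.files.openCostInBytes` (4 MB). The parallelism of 16 in the usage lines is an arbitrary example value, not something reported in this thread.

```python
MB = 1024 * 1024


def estimate_partitions(file_sizes, default_parallelism,
                        max_partition_bytes=128 * MB,
                        open_cost_bytes=4 * MB):
    """Estimate how many read partitions a set of files is packed into.

    Hypothetical helper: approximates the packing heuristic only.
    Defaults are assumed to match spark.sql.files.maxPartitionBytes
    and spark.sql.files.openCostInBytes.
    """
    # Every file is charged a fixed "open cost" on top of its size.
    total_bytes = sum(size + open_cost_bytes for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism
    max_split_bytes = min(max_partition_bytes,
                          max(open_cost_bytes, bytes_per_core))

    # Files larger than the split size are cut into chunks.
    splits = []
    for size in file_sizes:
        offset = 0
        while offset < size:
            chunk = min(max_split_bytes, size - offset)
            splits.append(chunk)
            offset += chunk

    # Largest chunks first, then fill each partition until it would overflow.
    splits.sort(reverse=True)
    partitions = 0
    current_size = 0
    current_count = 0
    for length in splits:
        if current_count > 0 and current_size + length > max_split_bytes:
            partitions += 1
            current_size = 0
            current_count = 0
        current_size += length + open_cost_bytes
        current_count += 1
    if current_count > 0:
        partitions += 1
    return partitions


# Roughly the workload described in this thread: 1,200 files of about 512 KB.
# The parallelism value (16) is an assumed example.
sizes = [512 * 1024] * 1200
estimated = estimate_partitions(sizes, default_parallelism=16)
print(f"{len(sizes)} files -> about {estimated} read partitions")
```

Under these assumptions the estimate comes out far below the file count, which would explain why only some executors receive tasks. If more parallelism is needed, calling `repartition(n)` on the loaded DataFrame forces a shuffle into `n` partitions, at the cost of moving the data once.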

Reading AVRO from S3 - No parallelism

2016-10-27 Thread Prithish
I am trying to read a bunch of AVRO files from an S3 folder using Spark 2.0. No matter how many executors I use or what configuration changes I make, the cluster doesn't seem to use all the executors. I am using the com.databricks.spark.avro library from Databricks to read the AVRO files. However, if I