How big are your Avro files? We collapse many small files into a single partition to eliminate scheduler overhead. If you need explicit parallelism, you can also repartition.
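For illustration, a minimal sketch of the repartition approach (the S3 path and partition count are placeholders, not from your setup):

```scala
// Read the Avro files via the Databricks connector; small files may be
// collapsed into few partitions at this point.
val df = spark.read
  .format("com.databricks.spark.avro")
  .load("s3://your-bucket/avro-folder")   // placeholder path

// Force the parallelism you want for downstream work, e.g. 200 tasks.
val repartitioned = df.repartition(200)
```

Pick the partition count based on your executor core count rather than the 200 used here.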
On Thu, Oct 27, 2016 at 5:19 AM, Prithish <prith...@gmail.com> wrote:
> I am trying to read a bunch of Avro files from an S3 folder using Spark
> 2.0. No matter how many executors I use or what configuration changes I
> make, the cluster doesn't seem to use all the executors. I am using the
> com.databricks.spark.avro library from Databricks to read the Avro files.
>
> However, if I try the same on CSV files (same S3 folder, same
> configuration and cluster), it does use all executors.
>
> Is there something that I need to do to enable parallelism when using the
> Databricks Avro library?
>
> Thanks for your help.