Hi Eric.

Q1:
When reading Parquet files, I have observed that Spark generates as many
partitions as there are Parquet files in the path.
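
For example, a quick way to check this from the spark-shell (a minimal
sketch using the 1.3-era sqlContext.parquetFile API from your snippet; the
path is just a placeholder):

// Read the Parquet path and inspect how many partitions the DataFrame's RDD has.
val df = sqlContext.parquetFile("/path/to/parquet")
println(df.rdd.partitions.length)  // typically one partition per Parquet file/split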

Q2:
To reduce the number of partitions you can use rdd.repartition(x), where x
is the desired number of partitions. Depending on your case, repartition can
be a heavy task, since it shuffles all the data.
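
A minimal sketch from the spark-shell, continuing from the df above
(repartition triggers a full shuffle; coalesce without shuffle only merges
existing partitions and is usually cheaper):

val reduced = df.rdd.repartition(200)   // full shuffle down to 200 partitions
// val merged = df.rdd.coalesce(200)    // merges partitions, no shuffle
println(reduced.partitions.length)      // 200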


Regards.
Miguel.

On Tue, May 5, 2015 at 3:56 PM, Eric Eijkelenboom <
eric.eijkelenb...@gmail.com> wrote:

> Hello guys
>
> Q1: How does Spark determine the number of partitions when reading a
> Parquet file?
>
> val df = sqlContext.parquetFile(path)
>
> Is it some way related to the number of Parquet row groups in my input?
>
> Q2: How can I reduce this number of partitions? Doing this:
>
> df.rdd.coalesce(200).count
>
> from the spark-shell causes job execution to hang…
>
> Any ideas? Thank you in advance.
>
> Eric


-- 
Regards.
Miguel Ángel
