Hello guys,

Q1: How does Spark determine the number of partitions when reading a Parquet file?
    val df = sqlContext.parquetFile(path)

Is it in some way related to the number of Parquet row groups in my input?

Q2: How can I reduce this number of partitions? Doing the following from the spark-shell causes job execution to hang:

    df.rdd.coalesce(200).count

Any ideas? Thank you in advance.

Eric
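
P.S. For reference, a minimal sketch of the alternative I have been experimenting with: reducing the partition count on the DataFrame itself via repartition rather than on the underlying RDD. This assumes Spark 1.3+ (where DataFrame.repartition is available), and the path below is a placeholder, not my real input:

    // Run from the spark-shell, where sqlContext is already provided.
    // Placeholder path; substitute the real Parquet location.
    val df = sqlContext.parquetFile("/path/to/input.parquet")

    // Inspect how many partitions the Parquet scan produced.
    println(df.rdd.partitions.length)

    // Reduce the partition count on the DataFrame itself instead of
    // dropping to the RDD; repartition shuffles to the requested count.
    val reduced = df.repartition(200)
    println(reduced.rdd.partitions.length)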