Hi all,

Suppose I have a 100 MB Parquet file in HDFS and my HDFS block size is 64 MB, so the file spans two blocks.
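For concreteness, this is roughly what I'm running (a minimal sketch; the path is a placeholder):

    // Read the Parquet file and run an action; Spark launches one task per HDFS block.
    val df = sqlContext.parquetFile("hdfs:///data/file.parquet")
    println(df.count())  // 2 tasks, one per 64 MB block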
When I call *sqlContext.parquetFile("path")* followed by an action, two tasks are started, one per partition. My intent is to read these two blocks into more partitions so I can fully utilize my cluster resources and increase parallelism. Is there a way to do this, like the *numberOfPartitions* argument of sc.textFile("path", *numberOfPartitions*)? Please note, I don't want to use *repartition*, as that would cause a lot of shuffle.

Thanks in advance.

Regards,
Sam
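P.S. One workaround I'm considering is shrinking the maximum input split size before reading, so each HDFS block might be read as several smaller splits. I'm not sure this property is honored by the Parquet code path, so treat the following as an untested guess on my part:

    // Hypothetical workaround: cap splits at 16 MB before reading, hoping each
    // 64 MB block is then read as multiple smaller partitions.
    sc.hadoopConfiguration.setLong(
      "mapreduce.input.fileinputformat.split.maxsize", 16L * 1024 * 1024)
    val df = sqlContext.parquetFile("hdfs:///data/file.parquet")
    println(df.rdd.partitions.length)  // hoping for more than 2 partitions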