After repartitioning a DataFrame in Spark 1.3.0 I get a .parquet exception
when saving to Amazon's S3. The data I am trying to write is 10 GB.
logsForDate
  .repartition(10)
  .saveAsParquetFile(destination) // <-- Exception here
The exception I receive is:
java.io.IOException: The file being
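One workaround that is sometimes suggested for S3 parquet writes on Spark
1.3.x (a sketch under assumptions, not a confirmed fix from this thread) is
to materialize the repartitioned DataFrame before saving, so the save stage
is not recomputing the shuffle while the S3 output streams are open:

import org.apache.spark.storage.StorageLevel

// logsForDate and destination are the identifiers from the snippet above.
val repartitioned = logsForDate.repartition(10)
repartitioned.persist(StorageLevel.MEMORY_AND_DISK)
repartitioned.count() // force materialization before the S3 write
repartitioned.saveAsParquetFile(destination)
repartitioned.unpersist()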
I am trying to read a few hundred .parquet files from S3 into an EMR
cluster. The .parquet files are organized in folders by date, and each
folder contains a /_common_metadata/ file (as well as a /_metadata/ file).
The *sqlContext.parquetFile* operation takes a very long time, opening each
of the files for reading.

How can one disable *Partition discovery* in Spark 1.3.0 when using
*sqlContext.parquetFile*?

Alternatively, is there a way to load /.parquet/ files without *Partition
discovery*?
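One way to sidestep discovery on 1.3.x (a sketch; the bucket and folder
layout below are hypothetical, not taken from the post) is to point
*sqlContext.parquetFile* at the leaf date folders instead of the root, so
Spark never walks the directory tree looking for partition columns. The
trade-off is that the date column is no longer inferred from the path:

// Hypothetical layout: s3n://bucket/logs/date=2015-04-05/part-r-*.parquet
val oneDay = sqlContext.parquetFile("s3n://bucket/logs/date=2015-04-05")

// If memory serves, parquetFile accepts multiple paths in 1.3.x, so
// several leaf folders can be combined into one DataFrame; the date
// column must then be re-added manually if it is needed.
val twoDays = sqlContext.parquetFile(
  "s3n://bucket/logs/date=2015-04-05",
  "s3n://bucket/logs/date=2015-04-06")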
--
https://www.linkedin.com/in/cosmincatalinsanda