Error when saving as parquet to S3

2015-04-30 Thread cosmincatalin
After repartitioning a DataFrame in Spark 1.3.0, I get a .parquet exception when saving to Amazon's S3. The data that I try to write is 10G.

logsForDate
  .repartition(10)
  .saveAsParquetFile(destination) // <-- Exception here

The exception I receive is: java.io.IOException: The file being
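The pattern the thread describes can be sketched as follows. This is a minimal reconstruction, not the poster's actual job: the bucket paths and the `logsForDate` source are placeholder assumptions, while `SQLContext.parquetFile` and `DataFrame.saveAsParquetFile` are the Spark 1.3.x API calls named in the thread.

```scala
// Sketch of the failing pattern (Spark 1.3.0 DataFrame API).
// Paths are placeholders; the IOException in the thread is thrown
// during the saveAsParquetFile call to S3.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object RepartitionAndSave {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("repartition-save"))
    val sqlContext = new SQLContext(sc)

    // Load one day's logs (placeholder path).
    val logsForDate = sqlContext.parquetFile("s3n://bucket/logs/date=2015-04-30")

    // Repartition to fewer, larger files before writing back to S3.
    logsForDate
      .repartition(10)
      .saveAsParquetFile("s3n://bucket/output/") // <-- exception reported here
  }
}
```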

Disable partition discovery

2015-04-24 Thread cosmincatalin
How can one disable *partition discovery* in *Spark 1.3.0* when using *sqlContext.parquetFile*? Alternatively, is there a way to load /.parquet/ files without *partition discovery*?
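Spark 1.3.0 does not expose an obvious switch for this, but one workaround sometimes suggested is a sketch like the one below: point `parquetFile` at the leaf directories or files directly instead of the partitioned root, so there is no directory tree for Spark to interpret as partitions. The bucket and paths are placeholders; the varargs form of `parquetFile` is from the Spark 1.3.x `SQLContext` API.

```scala
// Sketch: sidestep partition discovery by listing leaf paths explicitly
// rather than the partitioned root directory. Paths are placeholders.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object LoadWithoutDiscovery {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("no-discovery"))
    val sqlContext = new SQLContext(sc)

    // parquetFile(paths: String*) accepts several paths in one call;
    // each path here is a leaf folder, not the date-partitioned root.
    val df = sqlContext.parquetFile(
      "s3n://bucket/logs/date=2015-04-23",
      "s3n://bucket/logs/date=2015-04-24"
    )
    df.printSchema()
  }
}
```

Note the trade-off: loaded this way, the `date` partition column is not synthesized from the directory names, so it must already exist inside the files if the query needs it.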

Loading lots of .parquet files in Spark 1.3.1 (Hadoop 2.4)

2015-04-22 Thread cosmincatalin
I am trying to read a few hundred .parquet files from S3 into an EMR cluster. The .parquet files are structured by date and have /_common_metadata/ in each of the folders (as well as /_metadata/). The *sqlContext.parquetFile* operation takes a very long time, opening for reading each of the
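A sketch of one mitigation, under assumptions not confirmed by the thread: enumerate the date folders up front and hand them all to `parquetFile` in a single call, rather than pointing it at the root and letting it walk the whole S3 tree. The bucket, dates, and folder layout below are placeholders; the varargs `parquetFile(paths: String*)` is the Spark 1.3.x API.

```scala
// Sketch: build the list of date folders explicitly and load them in
// one parquetFile call. Bucket name and dates are placeholders.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object LoadManyParquetFolders {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("load-many"))
    val sqlContext = new SQLContext(sc)

    // Placeholder date range; in practice this might be generated
    // from job parameters.
    val dates = Seq("2015-04-20", "2015-04-21", "2015-04-22")
    val paths = dates.map(d => s"s3n://bucket/logs/date=$d")

    val df = sqlContext.parquetFile(paths: _*)
    println(df.count())
  }
}
```

Listing paths explicitly keeps the S3 metadata round-trips proportional to the folders actually needed, instead of every object under the root.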