Re: Sanely updating parquet partitions.

2016-04-29 Thread Philip Weaver
Hello, I think I may have jumped to the wrong conclusion about symlinks, and I was able to get what I want working perfectly. I added these two settings in my importer application:

    sc.hadoopConfiguration.set(
      "mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")
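The archive truncates the message after the first setting, so the second one is not shown. A minimal sketch of how such flags are applied in an importer follows; it assumes an existing SparkContext named sc, as in the message. The second flag, parquet.enable.summary-metadata, is my assumption about the elided setting (it is commonly disabled together with the _SUCCESS marker when rewriting partitions in place) and is not confirmed by the truncated message.

    // Sketch of the importer configuration (Scala). Only the first
    // setting is confirmed by the truncated message above.
    sc.hadoopConfiguration.set(
      "mapreduce.fileoutputcommitter.marksuccessfuljobs", "false") // don't write _SUCCESS marker files
    sc.hadoopConfiguration.set(
      "parquet.enable.summary-metadata", "false") // assumed: don't write _metadata/_common_metadata summaries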

Sanely updating parquet partitions.

2016-04-29 Thread Philip Weaver
Hello, I have a parquet dataset partitioned by a column 'a'. I want to take advantage of Spark SQL's ability to prune down to the matching partitions when you filter on 'a'. I also want to periodically update individual partitions without disrupting any jobs that are querying the data. The obvious solution [message truncated in archive]
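For concreteness, here is a minimal sketch of the layout being described: writing a dataset partitioned by 'a' so that each value of 'a' gets its own directory, then reading it back with a filter that Spark can prune to those directories. The path /data/events and the app name are hypothetical, and this uses the SparkSession API for brevity (the thread predates Spark 2.0, where the equivalent entry point was sqlContext).

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("PartitionPruningSketch").getOrCreate()
    import spark.implicits._

    // Toy data with a partition column 'a'.
    val df = Seq((1, "x"), (2, "y"), (2, "z")).toDF("a", "value")

    // Writing with partitionBy lays out one directory per value:
    //   /data/events/a=1/..., /data/events/a=2/...
    df.write.mode("overwrite").partitionBy("a").parquet("/data/events")

    // A filter on 'a' lets Spark read only the a=1 directory
    // instead of scanning the whole dataset.
    val subset = spark.read.parquet("/data/events").where("a = 1")
    subset.show()

Because each partition is just a directory, "updating a partition" amounts to replacing the files under one a=... directory, which is what makes doing so without disrupting concurrent readers the crux of the question.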