Hello, I think I may have jumped to the wrong conclusion about symlinks,
and I was able to get what I want working perfectly.
I added these two settings in my importer application:
sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")
Hello,
I have a Parquet dataset, partitioned by a column 'a'. I want to take advantage of Spark SQL's ability to filter to the partition when you filter on 'a'. I also want to periodically update individual partitions without disrupting any jobs that are querying the data.
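For context, the mechanism being relied on here is Hive-style partition directories (`a=<value>`): a filter on the partition column lets the planner skip every directory whose value fails the filter. The sketch below is not Spark itself, just a standard-library illustration of that layout and of why pruning only ever touches one directory per partition value (the file and directory names are made up for the example):

```python
import tempfile
from pathlib import Path

# Build a toy Hive-style layout:
#   dataset/a=1/part-0.parquet, dataset/a=2/part-0.parquet, dataset/a=3/part-0.parquet
root = Path(tempfile.mkdtemp()) / "dataset"
for value in (1, 2, 3):
    part_dir = root / f"a={value}"
    part_dir.mkdir(parents=True)
    (part_dir / "part-0.parquet").write_bytes(b"")  # empty placeholder file

def pruned_files(root: Path, a_value: int) -> list[Path]:
    """Return only the files under the partition directory matching the
    filter, mimicking how a planner prunes on the partition column 'a'."""
    return sorted((root / f"a={a_value}").glob("*.parquet"))

# A filter like `a = 2` only ever reads from the a=2 directory.
files = pruned_files(root, 2)
print([f.parent.name for f in files])  # ['a=2']
```

Seen this way, updating one partition amounts to replacing the contents of a single `a=<value>` directory, which is why readers of the other partitions can keep running undisturbed.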
The obvious solution