I got it working, and it's much faster. If anyone else wants to try it, here is what I did:

1) I was already using the code from the Presto S3 Hadoop FileSystem implementation, modified to sever it from the rest of the Presto codebase.
2) I extended it and overrode the method "keyFromPath" so that any time the Path referred to a "_temporary" parquet "part" file, it returned a key for the file's final location instead.
3) I registered the filesystem through sparkContext.hadoopConfiguration by setting fs.s3.impl, fs.s3n.impl, and fs.s3a.impl.
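For anyone who wants a starting point, here is a rough sketch of the kind of key rewriting step 2 describes. It is not my actual code, just the string logic an overridden "keyFromPath" could delegate to. The class name TemporaryKeyRemapper and the method finalKey are made up for illustration, and it assumes the usual task-attempt layout, where each "_temporary" directory is followed by exactly one attempt-id directory (for example out/_temporary/0/_temporary/attempt_.../part-...). Other layouts would need different handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: maps the S3 key of a "_temporary" part file to the key
// of its assumed final location. Names and layout are illustrative assumptions.
final class TemporaryKeyRemapper {
    private static final String TEMPORARY = "_temporary";

    private TemporaryKeyRemapper() {}

    static String finalKey(String key) {
        String[] segments = key.split("/");
        if (segments.length == 0) {
            return key; // e.g. "/" splits into no segments
        }
        // Only rewrite data files; leave markers such as _SUCCESS untouched.
        String fileName = segments[segments.length - 1];
        if (!fileName.startsWith("part-")) {
            return key;
        }
        List<String> kept = new ArrayList<>();
        int i = 0;
        while (i < segments.length) {
            // Drop "_temporary" plus the attempt-id directory that follows it,
            // but never drop the file name itself (the last segment).
            if (segments[i].equals(TEMPORARY) && i + 1 < segments.length - 1) {
                i += 2;
            } else {
                kept.add(segments[i]);
                i++;
            }
        }
        return String.join("/", kept);
    }
}
```

With this sketch, a key like out/table/_temporary/0/_temporary/attempt_1/part-00000.parquet comes back as out/table/part-00000.parquet, and partition directories such as year=2018 that sit below the attempt directory are kept. Because the file then lands in its final location before the job commits, a failed or retried task can leave partial or overwritten output behind.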
I realize I'm risking file corruption, but it's WAAAAY faster than it was.