If you're using the DataFrame API, you can achieve that by simply using (or not using) the partitionBy method on the DataFrameWriter:
val originalDf = ….
val df1 = originalDf….
val df2 = originalDf…

df1.write.partitionBy("col1").save(…)
df2.write.save(…)

From: Amir Gershman <am...@fb.com>
Date: Tuesday, May 24, 2016 at 7:01 AM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Using HiveContext.set in multiple threads

Hi,

I have a DataFrame I compute from a long chain of transformations. I cache it, and then perform two additional transformations on it. I use two Futures - each Future will insert the content of one of the above DataFrames into a different Hive table. One Future must SET hive.exec.dynamic.partition=true and the other must set it to false. How can I run both INSERT commands in parallel, but guarantee that each runs with its own settings?

If I don't use the same HiveContext, then the initial long chain of transformations which I cache is not reusable between HiveContexts. If I use the same HiveContext, race conditions between threads may cause one INSERT to execute with the wrong config.
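A slightly fuller sketch of the suggestion above (the session setup, table name, transformations, and output paths are all hypothetical placeholders; shown with Spark 2.x's SparkSession, though the same DataFrameWriter calls work from a Spark 1.x HiveContext). The point is that partitionBy is per-writer state, not shared session config, so the two writes can safely run from separate Futures against the same cached DataFrame:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

// Hypothetical session and source; adjust to your environment.
val spark = SparkSession.builder()
  .appName("parallel-writes")
  .enableHiveSupport()
  .getOrCreate()

// The long chain of transformations, cached once and reused by both branches.
val originalDf: DataFrame = spark.table("source_table").cache()

// Two placeholder follow-up transformations.
val df1 = originalDf.filter("col2 > 0")
val df2 = originalDf.select("col1", "col3")

// Each write gets its own DataFrameWriter, so partitioning is configured
// per write instead of via a shared hive.exec.dynamic.partition setting.
df1.write.mode(SaveMode.Overwrite).partitionBy("col1").save("/out/partitioned")
df2.write.mode(SaveMode.Overwrite).save("/out/unpartitioned")
```

Because the writer objects are independent, neither write can observe the other's partitioning options, which sidesteps the race condition entirely.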