Right now I know of three different ways to pass configuration properties to the Spark context. They are:
- A) Inside a SparkConf object, just before creating the SparkContext
- B) During job submission (e.g. --conf spark.driver.memory=2g)
- C) By using a specific properties file during job submission (e.g. --properties-file somefile.conf)

My question is: if you specify the same config parameter in more than one place, *which one will actually be used?* *Is the priority order the same for every property, or is it property dependent?*

*I am mostly interested in the config parameter spark.sql.shuffle.partitions, which I need to modify on the fly, depending on the size of my input, for group by clauses.*

Thanks
--
Cesar Flores
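P.S. For concreteness, here is a minimal sketch of what I mean by setting the property up front (option A) and then changing it on the fly. The app name and partition counts are just placeholders; this assumes the Spark 1.x SQLContext API:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ShufflePartitionsSketch {
  def main(args: Array[String]): Unit = {
    // A) Set the property programmatically, before creating the SparkContext.
    val conf = new SparkConf()
      .setAppName("shuffle-partitions-sketch") // placeholder app name
      .set("spark.sql.shuffle.partitions", "200")

    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // spark.sql.shuffle.partitions is a SQL property, so my understanding is
    // it can also be changed between queries, without restarting the context:
    sqlContext.setConf("spark.sql.shuffle.partitions", "50")

    // ... run a group by query here, which would shuffle into 50 partitions ...

    sc.stop()
  }
}
```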