Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21299#discussion_r188665161

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala ---
@@ -90,13 +92,42 @@ object SQLExecution {
    * thread from the original one, this method can be used to connect the Spark jobs in this action
    * with the known executionId, e.g., `BroadcastExchangeExec.relationFuture`.
    */
-  def withExecutionId[T](sc: SparkContext, executionId: String)(body: => T): T = {
+  def withExecutionId[T](sparkSession: SparkSession, executionId: String)(body: => T): T = {
+    val sc = sparkSession.sparkContext
     val oldExecutionId = sc.getLocalProperty(SQLExecution.EXECUTION_ID_KEY)
+    withSQLConfPropagated(sparkSession) {
+      try {
+        sc.setLocalProperty(SQLExecution.EXECUTION_ID_KEY, executionId)
+        body
+      } finally {
+        sc.setLocalProperty(SQLExecution.EXECUTION_ID_KEY, oldExecutionId)
+      }
+    }
+  }
+
+  def withSQLConfPropagated[T](sparkSession: SparkSession)(body: => T): T = {
+    // Set all the specified SQL configs to local properties, so that they can be available at
+    // the executor side.
--- End diff --

Technically, a broadcast is faster than local properties when there are a lot of properties, but the problem is that you need to carry the broadcast handle everywhere, which I don't think is workable for `SQLConf.get`.

BTW, we currently have hundreds of SQL configs; even if a user sets all of them for a job, the overhead is low. I tried
```
// Baseline job with no extra local properties.
sc.makeRDD(Seq(1,2,3)).collect
// Set 100 local properties, then run the same job again.
1.to(100).foreach(i => sc.setLocalProperty(i.toString * 10, i.toString * 10))
sc.makeRDD(Seq(1,2,3)).collect
```
and didn't observe any performance difference.
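
To make the point about local properties concrete, here is a minimal sketch of how a value set on the driver thread becomes readable inside tasks without passing any handle around. It is only an illustration, not code from the pull request: the object `LocalPropertyDemo`, the helper `readOnExecutors`, and the key `example.conf.key` are hypothetical names, and it assumes a local Spark session.

```scala
import org.apache.spark.TaskContext
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; not part of the pull request.
object LocalPropertyDemo {

  // Sets `key` as a local property on the calling (driver) thread, runs a small job,
  // and returns the value each task observed through its TaskContext.
  def readOnExecutors(spark: SparkSession, key: String, value: String): Array[String] = {
    val sc = spark.sparkContext
    val oldValue = sc.getLocalProperty(key)
    try {
      sc.setLocalProperty(key, value)
      sc.makeRDD(Seq(1, 2, 3))
        .map(_ => TaskContext.get().getLocalProperty(key))
        .collect()
    } finally {
      // Restore the previous value; passing null removes the property.
      sc.setLocalProperty(key, oldValue)
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("local-property-demo")
      .getOrCreate()
    try {
      // "example.conf.key" is an illustrative key, not a real SQL config.
      val seen = readOnExecutors(spark, "example.conf.key", "42")
      println(seen.mkString(","))
    } finally {
      spark.stop()
    }
  }
}
```

The set/restore pattern in `readOnExecutors` mirrors the `try`/`finally` shape used for `EXECUTION_ID_KEY` in the diff above. The task side only needs the key, which is why this approach suits a global accessor like `SQLConf.get`.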