Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3121#discussion_r20400579

    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1417,6 +1427,97 @@ class SparkContext(config: SparkConf) extends SparkStatusAPI with Logging {
          */
     object SparkContext extends Logging {
     
    +  /**
    +   * Lock that guards access to global variables that track SparkContext construction.
    +   */
    +  private[spark] val SPARK_CONTEXT_CONSTRUCTOR_LOCK = new Object()
    +
    +  /**
    +   * Records the creation site of the active, fully-constructed SparkContext. If no SparkContext
    +   * is active, then this is `None`.
    +   *
    +   * Access to this field is guarded by SPARK_CONTEXT_CONSTRUCTOR_LOCK
    +   */
    +  private[spark] var activeContextCreationSite: Option[CallSite] = None
    +
    +  /**
    +   * Points to a partially-constructed SparkContext if some thread is in the SparkContext
    +   * constructor, or `None` if no SparkContext is being constructed.
    +   *
    +   * Access to this field is guarded by SPARK_CONTEXT_CONSTRUCTOR_LOCK
    +   */
    +  private[spark] var contextBeingConstructed: Option[SparkContext] = None
    +
    +  /**
    +   * Called to ensure that no other SparkContext is running in this JVM.
    +   *
    +   * Throws an exception if a running context is detected and logs a warning if another thread is
    +   * constructing a SparkContext. This warning is necessary because the current locking scheme
    +   * prevents us from reliably distinguishing between cases where another context is being
    +   * constructed and cases where another constructor threw an exception.
    +   */
    +  private def assertNoOtherContextIsRunning(sc: SparkContext, conf: SparkConf) {
    +    SPARK_CONTEXT_CONSTRUCTOR_LOCK.synchronized {
    +      contextBeingConstructed.foreach { otherContext =>
    +        if (otherContext ne sc) {
    +          val warnMsg = "Another SparkContext is being constructed (or threw an exception in its" +
    +            " constructor). This may indicate an error, since only one SparkContext may be" +
    +            " running in this JVM (see SPARK-2243)."
    +          logWarning(warnMsg)
    +        }
    +
    +        activeContextCreationSite.foreach { creationSite =>
    +          val errMsg = "Only one SparkContext may be running in this JVM (see SPARK-2243)." +
    +            " To ignore this error, set spark.driver.allowMultipleContexts = true. " +
    +            s"The currently running SparkContext was created at:\n${creationSite.longForm}"
    +          val exception = new SparkException(errMsg)
    +          if (conf.getBoolean("spark.driver.allowMultipleContexts", false)) {
    +            logWarning("Multiple running SparkContexts detected in the same JVM!", exception)
    +          } else {
    +            throw exception
    +          }
    +        }
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Called at the beginning of the SparkContext constructor to ensure that no SparkContext is
    +   * running. Throws an exception if a running context is detected and logs a warning if another
    +   * thread is constructing a SparkContext. This warning is necessary because the current locking
    +   * scheme prevents us from reliably distinguishing between cases where another context is being
    +   * constructed and cases where another constructor threw an exception.
    +   */
    +  private[spark] def markPartiallyConstructed(sc: SparkContext, conf: SparkConf) {
    --- End diff --

Regarding the `conf` parameter, take a look at how `markPartiallyConstructed` is called at the very top of SparkContext: at that point, `sc.conf` hasn't been set yet and we only have the `config` that's passed via the SparkContext constructor. I could make that constructor `config` a `private val`, but that seems to have a high potential for confusion. I could also move the call to `markPartiallyConstructed` further down in the SparkContext constructor, but that creates confusing implicit ordering dependencies. Maybe it's clearer to change the constructor to accept an `allowMultipleContexts` boolean and to populate it at the call site from a SparkConf.
I suppose that this has the disadvantage of requiring each caller to implement the logic for picking the default configuration value, but we could just extract that into a private local variable in SparkContext.
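To make the trade-off concrete, here's a rough sketch of what I mean. This is illustrative only, not the actual SparkContext API: `ContextGuard` is a hypothetical stand-in for the companion-object guard state, a plain `Map[String, String]` stands in for SparkConf, and a `String` stands in for CallSite. The point is that the guard method takes an explicit boolean, while the default-picking logic for the config key lives in one shared helper so individual call sites don't duplicate it:

```scala
// Hypothetical sketch (names and types are illustrative, not Spark's API):
// the running-context check takes an explicit allowMultiple flag, and the
// default for the config key is encoded in exactly one place.
object ContextGuard {
  // Creation-site string stands in for CallSite; guarded by this object's lock.
  private var activeContextCreationSite: Option[String] = None

  // Single place that picks the default for the config key, so callers
  // constructing the boolean don't each repeat "false".
  def allowMultipleContexts(conf: Map[String, String]): Boolean =
    conf.getOrElse("spark.driver.allowMultipleContexts", "false").toBoolean

  def assertNoOtherContextIsRunning(allowMultiple: Boolean): Unit = synchronized {
    activeContextCreationSite.foreach { site =>
      val errMsg = s"Only one context may be running in this JVM; created at: $site"
      if (allowMultiple) Console.err.println(s"WARNING: $errMsg")
      else throw new IllegalStateException(errMsg)
    }
  }

  def setActiveContext(site: String): Unit = synchronized {
    activeContextCreationSite = Some(site)
  }
}
```

A call site would then do `ContextGuard.assertNoOtherContextIsRunning(ContextGuard.allowMultipleContexts(conf))`, so the only thing the constructor signature needs is the boolean itself.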