Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]
HyukjinKwon closed pull request #46571: [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir URL: https://github.com/apache/spark/pull/46571 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]
HyukjinKwon commented on PR #46571: URL: https://github.com/apache/spark/pull/46571#issuecomment-2116375998 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]
mridulm commented on code in PR #46571: URL: https://github.com/apache/spark/pull/46571#discussion_r1603972872 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -1317,6 +1317,16 @@ package object config { s" be less than or equal to ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}.") .createWithDefault(64 * 1024 * 1024) + private[spark] val CHECKPOINT_DIR = +ConfigBuilder("spark.checkpoint.dir") + .doc( +"Equivalent with SparkContext.setCheckpointDir. If set, the path becomes" + + "the default directory for checkpointing. It can be overwritten by" + + "SparkContext.setCheckpointDir.") Review Comment: nit: ```suggestion "Set the default directory for checkpointing. It can be overwritten by " + "SparkContext.setCheckpointDir.") ``` (I missed this in my earlier pass, my bad) ## docs/configuration.md: ## @@ -1795,6 +1795,16 @@ Apart from these, the following properties are also available, and may be useful 0.6.0 + + spark.checkpoint.dir + (none) + +Equivalent with SparkContext.setCheckpointDir. If set, the path becomes +the default directory for checkpointing. It can be overwritten by +SparkContext.setCheckpointDir. Review Comment: Same as above. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]
HyukjinKwon commented on code in PR #46571: URL: https://github.com/apache/spark/pull/46571#discussion_r1602450604 ## core/src/main/scala/org/apache/spark/SparkContext.scala: ## @@ -374,6 +374,7 @@ class SparkContext(config: SparkConf) extends Logging { private[spark] def cleaner: Option[ContextCleaner] = _cleaner private[spark] var checkpointDir: Option[String] = None + config.getOption(CHECKPOINT_DIR.key).foreach(setCheckpointDir) Review Comment: and adding a test too -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]
HyukjinKwon commented on code in PR #46571: URL: https://github.com/apache/spark/pull/46571#discussion_r1602449365 ## core/src/main/scala/org/apache/spark/SparkContext.scala: ## @@ -374,6 +374,7 @@ class SparkContext(config: SparkConf) extends Logging { private[spark] def cleaner: Option[ContextCleaner] = _cleaner private[spark] var checkpointDir: Option[String] = None + config.getOption(CHECKPOINT_DIR.key).foreach(setCheckpointDir) Review Comment: Let me just put it there togehter. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]
zhengruifeng commented on code in PR #46571: URL: https://github.com/apache/spark/pull/46571#discussion_r1602443672 ## core/src/main/scala/org/apache/spark/SparkContext.scala: ## @@ -374,6 +374,7 @@ class SparkContext(config: SparkConf) extends Logging { private[spark] def cleaner: Option[ContextCleaner] = _cleaner private[spark] var checkpointDir: Option[String] = None + config.getOption(CHECKPOINT_DIR.key).foreach(setCheckpointDir) Review Comment: or probably make `checkpointDir` lazy? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]
mridulm commented on code in PR #46571: URL: https://github.com/apache/spark/pull/46571#discussion_r1602415826 ## core/src/main/scala/org/apache/spark/SparkContext.scala: ## @@ -374,6 +374,7 @@ class SparkContext(config: SparkConf) extends Logging { private[spark] def cleaner: Option[ContextCleaner] = _cleaner private[spark] var checkpointDir: Option[String] = None + config.getOption(CHECKPOINT_DIR.key).foreach(setCheckpointDir) Review Comment: We should add a test - this NPE would have surfaced there. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]
mridulm commented on code in PR #46571: URL: https://github.com/apache/spark/pull/46571#discussion_r1602414776 ## core/src/main/scala/org/apache/spark/SparkContext.scala: ## @@ -374,6 +374,7 @@ class SparkContext(config: SparkConf) extends Logging { private[spark] def cleaner: Option[ContextCleaner] = _cleaner private[spark] var checkpointDir: Option[String] = None + config.getOption(CHECKPOINT_DIR.key).foreach(setCheckpointDir) Review Comment: We should do this only after `_hadoopConfiguration` has been initialized. Move it to after line [535 here](https://github.com/apache/spark/blob/f40ffe0f97a2e753878af8f8f47038272a1e1657/core/src/main/scala/org/apache/spark/SparkContext.scala#L535) ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]
mridulm commented on code in PR #46571: URL: https://github.com/apache/spark/pull/46571#discussion_r1602414644 ## core/src/main/scala/org/apache/spark/SparkContext.scala: ## @@ -374,6 +374,7 @@ class SparkContext(config: SparkConf) extends Logging { private[spark] def cleaner: Option[ContextCleaner] = _cleaner private[spark] var checkpointDir: Option[String] = None + config.getOption(CHECKPOINT_DIR.key).foreach(setCheckpointDir) Review Comment: We should do this only after `_hadoopConfiguration` has been initialized. Move it to after line [535 here](https://github.com/apache/spark/blob/f40ffe0f97a2e753878af8f8f47038272a1e1657/core/src/main/scala/org/apache/spark/SparkContext.scala#L535) ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]
mridulm commented on code in PR #46571: URL: https://github.com/apache/spark/pull/46571#discussion_r1602414644 ## core/src/main/scala/org/apache/spark/SparkContext.scala: ## @@ -374,6 +374,7 @@ class SparkContext(config: SparkConf) extends Logging { private[spark] def cleaner: Option[ContextCleaner] = _cleaner private[spark] var checkpointDir: Option[String] = None + config.getOption(CHECKPOINT_DIR.key).foreach(setCheckpointDir) Review Comment: We should do this only after `_hadoopConfiguration` has been initialized. Move it to after line [535 here](https://github.com/apache/spark/blob/f40ffe0f97a2e753878af8f8f47038272a1e1657/core/src/main/scala/org/apache/spark/SparkContext.scala#L535) ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]
HyukjinKwon commented on PR #46571: URL: https://github.com/apache/spark/pull/46571#issuecomment-2113647596 > Also, I'm wondering if we want to show a proper warning or even to raise exceptions because this is a critical mistake. This one, I think it's fine. We already have similar configurations such as `spark.log.level` vs `setLogLevel`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]
dongjoon-hyun commented on code in PR #46571: URL: https://github.com/apache/spark/pull/46571#discussion_r1601964897 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -1317,6 +1317,14 @@ package object config { s" be less than or equal to ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}.") .createWithDefault(64 * 1024 * 1024) + private[spark] val CHECKPOINT_DIR = +ConfigBuilder("spark.checkpoint.dir") + .doc("Equivalent with SparkContext.setCheckpointDir. If set, the path becomes " + +"the directory for checkpointing.") Review Comment: Could you add this to `docs/configuration.md`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org