Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]

2024-05-16 Thread via GitHub


HyukjinKwon closed pull request #46571: [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir
URL: https://github.com/apache/spark/pull/46571





Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]

2024-05-16 Thread via GitHub


HyukjinKwon commented on PR #46571:
URL: https://github.com/apache/spark/pull/46571#issuecomment-2116375998

   Merged to master.
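
   For reference, a minimal usage sketch of the merged behavior as described in this PR (object name, master, and paths are illustrative assumptions, not code from the PR):

   ```scala
   import java.nio.file.Files

   import org.apache.spark.{SparkConf, SparkContext}

   object CheckpointDirConfExample {
     def main(args: Array[String]): Unit = {
       // Temporary directory to use as the checkpoint root (illustrative).
       val checkpointPath = Files.createTempDirectory("spark-checkpoint-").toString
       val conf = new SparkConf()
         .setMaster("local[1]")
         .setAppName("spark.checkpoint.dir example")
         // The new configuration: equivalent to calling sc.setCheckpointDir(checkpointPath).
         .set("spark.checkpoint.dir", checkpointPath)
       val sc = new SparkContext(conf)
       try {
         val rdd = sc.parallelize(1 to 100).map(_ * 2)
         rdd.checkpoint()   // no explicit setCheckpointDir call needed anymore
         rdd.count()        // running an action materializes the checkpoint
         assert(rdd.isCheckpointed)
       } finally {
         sc.stop()
       }
     }
   }
   ```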





Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]

2024-05-16 Thread via GitHub


mridulm commented on code in PR #46571:
URL: https://github.com/apache/spark/pull/46571#discussion_r1603972872


##
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##
@@ -1317,6 +1317,16 @@ package object config {
         s" be less than or equal to ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}.")
       .createWithDefault(64 * 1024 * 1024)
 
+  private[spark] val CHECKPOINT_DIR =
+    ConfigBuilder("spark.checkpoint.dir")
+      .doc(
+        "Equivalent with SparkContext.setCheckpointDir. If set, the path becomes" +
+          "the default directory for checkpointing. It can be overwritten by" +
+          "SparkContext.setCheckpointDir.")

Review Comment:
   nit:
   ```suggestion
        "Set the default directory for checkpointing. It can be overwritten by " +
          "SparkContext.setCheckpointDir.")
   ```
   
   (I missed this in my earlier pass, my bad)
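   
   For reference, with the suggested wording applied the entry would look roughly like this; the trailing builder calls are assumptions, since the quoted diff is truncated:
   
   ```scala
   // Sketch inside org.apache.spark.internal.config's package object (not the verbatim merged code).
   private[spark] val CHECKPOINT_DIR =
     ConfigBuilder("spark.checkpoint.dir")
       .doc("Set the default directory for checkpointing. It can be overwritten by " +
         "SparkContext.setCheckpointDir.")
       .version("4.0.0")   // assumed; the version call is not shown in the quoted diff
       .stringConf
       .createOptional     // assumed; an optional string with no default, matching "(none)" in the docs diff
   ```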



##
docs/configuration.md:
##
@@ -1795,6 +1795,16 @@ Apart from these, the following properties are also available, and may be useful
   
   0.6.0
 
+
+  spark.checkpoint.dir
+  (none)
+  
+Equivalent with SparkContext.setCheckpointDir. If set, the path becomes
+the default directory for checkpointing. It can be overwritten by
+SparkContext.setCheckpointDir.

Review Comment:
   Same as above.






Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]

2024-05-15 Thread via GitHub


HyukjinKwon commented on code in PR #46571:
URL: https://github.com/apache/spark/pull/46571#discussion_r1602450604


##
core/src/main/scala/org/apache/spark/SparkContext.scala:
##
@@ -374,6 +374,7 @@ class SparkContext(config: SparkConf) extends Logging {
   private[spark] def cleaner: Option[ContextCleaner] = _cleaner
 
   private[spark] var checkpointDir: Option[String] = None
+  config.getOption(CHECKPOINT_DIR.key).foreach(setCheckpointDir)

Review Comment:
   and adding a test too
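   
   A minimal sketch of what such a test could look like, assuming the usual SparkFunSuite and LocalSparkContext test helpers (suite name and placement are illustrative, not the PR's actual test):
   
   ```scala
   package org.apache.spark

   class CheckpointDirConfSuite extends SparkFunSuite with LocalSparkContext {
     test("spark.checkpoint.dir sets the checkpoint directory at construction") {
       withTempDir { dir =>
         val conf = new SparkConf()
           .setMaster("local")
           .setAppName("checkpoint-dir-conf")
           .set("spark.checkpoint.dir", dir.getAbsolutePath)
         sc = new SparkContext(conf)
         // The constructor is expected to pick up the conf value and call setCheckpointDir,
         // so checkpointDir is defined without any explicit call from the test.
         assert(sc.checkpointDir.isDefined)
       }
     }
   }
   ```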






Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]

2024-05-15 Thread via GitHub


HyukjinKwon commented on code in PR #46571:
URL: https://github.com/apache/spark/pull/46571#discussion_r1602449365


##
core/src/main/scala/org/apache/spark/SparkContext.scala:
##
@@ -374,6 +374,7 @@ class SparkContext(config: SparkConf) extends Logging {
   private[spark] def cleaner: Option[ContextCleaner] = _cleaner
 
   private[spark] var checkpointDir: Option[String] = None
+  config.getOption(CHECKPOINT_DIR.key).foreach(setCheckpointDir)

Review Comment:
   Let me just put it there together.






Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]

2024-05-15 Thread via GitHub


zhengruifeng commented on code in PR #46571:
URL: https://github.com/apache/spark/pull/46571#discussion_r1602443672


##
core/src/main/scala/org/apache/spark/SparkContext.scala:
##
@@ -374,6 +374,7 @@ class SparkContext(config: SparkConf) extends Logging {
   private[spark] def cleaner: Option[ContextCleaner] = _cleaner
 
   private[spark] var checkpointDir: Option[String] = None
+  config.getOption(CHECKPOINT_DIR.key).foreach(setCheckpointDir)

Review Comment:
   or probably make `checkpointDir` lazy?






Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]

2024-05-15 Thread via GitHub


mridulm commented on code in PR #46571:
URL: https://github.com/apache/spark/pull/46571#discussion_r1602415826


##
core/src/main/scala/org/apache/spark/SparkContext.scala:
##
@@ -374,6 +374,7 @@ class SparkContext(config: SparkConf) extends Logging {
   private[spark] def cleaner: Option[ContextCleaner] = _cleaner
 
   private[spark] var checkpointDir: Option[String] = None
+  config.getOption(CHECKPOINT_DIR.key).foreach(setCheckpointDir)

Review Comment:
   We should add a test - this NPE would have surfaced there.






Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]

2024-05-15 Thread via GitHub


mridulm commented on code in PR #46571:
URL: https://github.com/apache/spark/pull/46571#discussion_r1602414776


##
core/src/main/scala/org/apache/spark/SparkContext.scala:
##
@@ -374,6 +374,7 @@ class SparkContext(config: SparkConf) extends Logging {
   private[spark] def cleaner: Option[ContextCleaner] = _cleaner
 
   private[spark] var checkpointDir: Option[String] = None
+  config.getOption(CHECKPOINT_DIR.key).foreach(setCheckpointDir)

Review Comment:
   We should do this only after `_hadoopConfiguration` has been initialized.
   Move it to after line [535 here](https://github.com/apache/spark/blob/f40ffe0f97a2e753878af8f8f47038272a1e1657/core/src/main/scala/org/apache/spark/SparkContext.scala#L535)?






Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]

2024-05-15 Thread via GitHub


mridulm commented on code in PR #46571:
URL: https://github.com/apache/spark/pull/46571#discussion_r1602414644


##
core/src/main/scala/org/apache/spark/SparkContext.scala:
##
@@ -374,6 +374,7 @@ class SparkContext(config: SparkConf) extends Logging {
   private[spark] def cleaner: Option[ContextCleaner] = _cleaner
 
   private[spark] var checkpointDir: Option[String] = None
+  config.getOption(CHECKPOINT_DIR.key).foreach(setCheckpointDir)

Review Comment:
   We should do this only after `_hadoopConfiguration` has been initialized.
   Move it to after line [535 here](https://github.com/apache/spark/blob/f40ffe0f97a2e753878af8f8f47038272a1e1657/core/src/main/scala/org/apache/spark/SparkContext.scala#L535)?








Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]

2024-05-15 Thread via GitHub


HyukjinKwon commented on PR #46571:
URL: https://github.com/apache/spark/pull/46571#issuecomment-2113647596

   > Also, I'm wondering if we want to show a proper warning or even to raise exceptions because this is a critical mistake.
   
   This one, I think it's fine. We already have similar configurations such as `spark.log.level` vs `setLogLevel`.
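   
   Sketched out, the analogy reads roughly as follows (paths and log levels are illustrative; precedence follows the doc text in the diff, i.e. the explicit setter overrides the conf-provided default):
   
   ```scala
   import org.apache.spark.{SparkConf, SparkContext}

   object ConfVsSetterSketch {
     def main(args: Array[String]): Unit = {
       val conf = new SparkConf()
         .setMaster("local[1]")
         .setAppName("conf-vs-setter")
         .set("spark.log.level", "WARN")           // existing precedent: conf counterpart of setLogLevel
         .set("spark.checkpoint.dir", "/tmp/chk")  // this PR: conf counterpart of setCheckpointDir
       val sc = new SparkContext(conf)
       // Both still expose an imperative call that overrides the conf-provided default afterwards.
       sc.setLogLevel("INFO")
       sc.setCheckpointDir("/tmp/chk-override")
       sc.stop()
     }
   }
   ```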
   





Re: [PR] [SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]

2024-05-15 Thread via GitHub


dongjoon-hyun commented on code in PR #46571:
URL: https://github.com/apache/spark/pull/46571#discussion_r1601964897


##
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##
@@ -1317,6 +1317,14 @@ package object config {
         s" be less than or equal to ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}.")
       .createWithDefault(64 * 1024 * 1024)
 
+  private[spark] val CHECKPOINT_DIR =
+    ConfigBuilder("spark.checkpoint.dir")
+      .doc("Equivalent with SparkContext.setCheckpointDir. If set, the path becomes " +
+        "the directory for checkpointing.")

Review Comment:
   Could you add this to `docs/configuration.md`?


