[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19942

---

- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user caneGuy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19942#discussion_r156697267

--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,14 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = getTimeAsSeconds("spark.network.timeout", "120s")
+val executorHeartbeatInterval = getTimeAsSeconds("spark.executor.heartbeatInterval", "10s")
+// If spark.executor.heartbeatInterval bigger than spark.network.timeout,
+// it will almost always cause ExecutorLostFailure. See SPARK-22754.
+require(executorTimeoutThreshold > executorHeartbeatInterval, "The value of " +
+ s"'spark.network.timeout=${executorTimeoutThreshold}' must be no less than the value of " +
--- End diff --

Agree with you @srowen. Does the new message make sense?
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19942#discussion_r156685543

--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,14 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = getTimeAsSeconds("spark.network.timeout", "120s")
+val executorHeartbeatInterval = getTimeAsSeconds("spark.executor.heartbeatInterval", "10s")
+// If spark.executor.heartbeatInterval bigger than spark.network.timeout,
+// it will almost always cause ExecutorLostFailure. See SPARK-22754.
+require(executorTimeoutThreshold > executorHeartbeatInterval, "The value of " +
+ s"'spark.network.timeout=${executorTimeoutThreshold}' must be no less than the value of " +
--- End diff --

OK. Not sure about the purpose of the single quotes, and the message doesn't say it's in seconds. These aren't big deals, but if the point of this PR is a clear helpful message, this still needs to be better.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19942#discussion_r156664176

--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,14 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = getTimeAsSeconds("spark.network.timeout", "120s")
+val executorHeartbeatInterval = getTimeAsSeconds("spark.executor.heartbeatInterval", "10s")
+// If spark.executor.heartbeatInterval bigger than spark.network.timeout,
+// it will almost always cause ExecutorLostFailure. See SPARK-22754.
+require(executorTimeoutThreshold > executorHeartbeatInterval, s"The value of " +
--- End diff --

No need for 's', and you're missing an open quote in the next line. You could print the value of these params inline too.
Github user caneGuy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19942#discussion_r156627509

--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,15 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
--- End diff --

Done
Github user caneGuy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19942#discussion_r156627486

--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,15 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+ get("spark.executor.heartbeatInterval", "10s"))
+// If spark.executor.heartbeatInterval bigger than spark.network.timeout,
+// it will almost always cause ExecutorLostFailure.See SPARK-22754.
--- End diff --

Done
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19942#discussion_r156466283

--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,15 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+ get("spark.executor.heartbeatInterval", "10s"))
+// If spark.executor.heartbeatInterval bigger than spark.network.timeout,
+// it will almost always cause ExecutorLostFailure.See SPARK-22754.
--- End diff --

Add space after `.`.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19942#discussion_r156466106

--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,15 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
--- End diff --

`getTimeAsSeconds`, also below.
Github user caneGuy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19942#discussion_r156272847

--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,13 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+ get("spark.executor.heartbeatInterval", "10s"))
+require(executorHeartbeatInterval > executorTimeoutThreshold, s"The value of" +
--- End diff --

Done
Github user caneGuy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19942#discussion_r156272829

--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,13 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+ get("spark.executor.heartbeatInterval", "10s"))
+require(executorHeartbeatInterval > executorTimeoutThreshold, s"The value of" +
+ s"spark.network.timeout' must be no less than the value of" +
--- End diff --

Done
Github user caneGuy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19942#discussion_r156272812

--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,13 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+ get("spark.executor.heartbeatInterval", "10s"))
+require(executorHeartbeatInterval > executorTimeoutThreshold, s"The value of" +
+ s"spark.network.timeout' must be no less than the value of" +
+ s" 'spark.executor.heartbeatInterval'.")
--- End diff --

Done
Github user caneGuy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19942#discussion_r156270767

--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,13 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+ get("spark.executor.heartbeatInterval", "10s"))
+require(executorHeartbeatInterval > executorTimeoutThreshold, s"The value of" +
+ s"spark.network.timeout' must be no less than the value of" +
+ s" 'spark.executor.heartbeatInterval'.")
--- End diff --

Sorry for that, I will check more carefully!
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19942#discussion_r156268681

--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,13 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+ get("spark.executor.heartbeatInterval", "10s"))
+require(executorHeartbeatInterval > executorTimeoutThreshold, s"The value of" +
--- End diff --

Should also add a simple and explicit comment to explain the reason why this check is needed.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19942#discussion_r156268623

--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,13 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+ get("spark.executor.heartbeatInterval", "10s"))
+require(executorHeartbeatInterval > executorTimeoutThreshold, s"The value of" +
+ s"spark.network.timeout' must be no less than the value of" +
+ s" 'spark.executor.heartbeatInterval'.")
--- End diff --

nit: The whitespace should always go to the end of the previous line instead of the beginning.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19942#discussion_r156268550

--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,13 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+ get("spark.executor.heartbeatInterval", "10s"))
+require(executorHeartbeatInterval > executorTimeoutThreshold, s"The value of" +
+ s"spark.network.timeout' must be no less than the value of" +
--- End diff --

nit: remove the 's' in this line and the following line.
Github user caneGuy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19942#discussion_r156268382

--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,18 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+ get("spark.executor.heartbeatInterval", "10s"))
+if (executorHeartbeatInterval > executorTimeoutThreshold) {
--- End diff --

Done @jiangxb1987 Thanks
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19942#discussion_r156267891

--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,18 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+ get("spark.executor.heartbeatInterval", "10s"))
+if (executorHeartbeatInterval > executorTimeoutThreshold) {
--- End diff --

```
require(executorHeartbeatInterval > executorTimeoutThreshold, s"The value of 'spark.network.timeout' must be no less than the value of 'spark.executor.heartbeatInterval'.")
```
Github user caneGuy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19942#discussion_r156264833

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -291,6 +291,18 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
 if (proxyUser != null && principal != null) {
   SparkSubmit.printErrorAndExit("Only one of --proxy-user or --principal can be provided.")
 }
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(
+ sparkProperties.getOrElse("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+ sparkProperties.getOrElse("spark.executor.heartbeatInterval", "10s"))
--- End diff --

Done @vanzin Thanks
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19942#discussion_r156251839

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -291,6 +291,18 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
 if (proxyUser != null && principal != null) {
   SparkSubmit.printErrorAndExit("Only one of --proxy-user or --principal can be provided.")
 }
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(
+ sparkProperties.getOrElse("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+ sparkProperties.getOrElse("spark.executor.heartbeatInterval", "10s"))
--- End diff --

This is the wrong place for the check, see discussion.
Github user caneGuy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19942#discussion_r156250974

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -291,6 +291,18 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
 if (proxyUser != null && principal != null) {
   SparkSubmit.printErrorAndExit("Only one of --proxy-user or --principal can be provided.")
 }
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(
+ sparkProperties.getOrElse("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+ sparkProperties.getOrElse("spark.executor.heartbeatInterval", "10s"))
--- End diff --

I thought about adding a config constant, but it would affect a lot of other code, so I simply changed it here. @srowen @vanzin
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19942#discussion_r156177369

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -291,6 +291,18 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
 if (proxyUser != null && principal != null) {
   SparkSubmit.printErrorAndExit("Only one of --proxy-user or --principal can be provided.")
 }
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(
+ sparkProperties.getOrElse("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+ sparkProperties.getOrElse("spark.executor.heartbeatInterval", "10s"))
--- End diff --

The best place is probably `SparkConf.validateSettings`. I don't see a way to not duplicate the defaults, though - they're already duplicated in a bunch of places. The right way would be to create a config constant, but that would be a much noisier change.
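For context, the config-constant approach vanzin mentions would mean defining the key once with Spark's internal `ConfigBuilder` API, so that the key name and default live in a single place. This is only an illustrative sketch of that idea, not code from the PR; the `doc` text here is made up for the example.

```scala
// Illustrative sketch: a typed config constant for spark.network.timeout,
// using Spark's internal ConfigBuilder (org.apache.spark.internal.config).
// Defining it once avoids duplicating the "120s" default at every call site.
import java.util.concurrent.TimeUnit
import org.apache.spark.internal.config.ConfigBuilder

private[spark] val NETWORK_TIMEOUT = ConfigBuilder("spark.network.timeout")
  .doc("Default timeout for all network interactions, in seconds.")
  .timeConf(TimeUnit.SECONDS)
  .createWithDefaultString("120s")

// Call sites would then read it as: conf.get(NETWORK_TIMEOUT)
```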
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19942#discussion_r156112470

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -291,6 +291,18 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
 if (proxyUser != null && principal != null) {
   SparkSubmit.printErrorAndExit("Only one of --proxy-user or --principal can be provided.")
 }
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(
+ sparkProperties.getOrElse("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+ sparkProperties.getOrElse("spark.executor.heartbeatInterval", "10s"))
--- End diff --

OK true. This still seems like the wrong place to validate, as they're not arguments to spark-submit, and this requires duplicating default values. @vanzin where would you put this check?
Github user caneGuy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19942#discussion_r156105135

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -291,6 +291,18 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
 if (proxyUser != null && principal != null) {
   SparkSubmit.printErrorAndExit("Only one of --proxy-user or --principal can be provided.")
 }
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(
+ sparkProperties.getOrElse("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+ sparkProperties.getOrElse("spark.executor.heartbeatInterval", "10s"))
--- End diff --

Since someone may only set spark.executor.heartbeatInterval without setting spark.network.timeout, I just get both with their default values. Does this make sense? We may not need to check when both are unset. Thanks!
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19942#discussion_r156069510

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -291,6 +291,18 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
 if (proxyUser != null && principal != null) {
   SparkSubmit.printErrorAndExit("Only one of --proxy-user or --principal can be provided.")
 }
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(
+ sparkProperties.getOrElse("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+ sparkProperties.getOrElse("spark.executor.heartbeatInterval", "10s"))
--- End diff --

Rather than repeat a default, just perform the check if both are set.
GitHub user caneGuy opened a pull request:

    https://github.com/apache/spark/pull/19942

[SPARK-22754][DEPLOY] Check whether spark.executor.heartbeatInterval is bigger than spark.network.timeout or not

## What changes were proposed in this pull request?

If spark.executor.heartbeatInterval is bigger than spark.network.timeout, it will almost always cause an exception like the one below.

`Job aborted due to stage failure: Task 4763 in stage 3.0 failed 4 times, most recent failure: Lost task 4763.3 in stage 3.0 (TID 22383, executor id: 4761, host: c3-hadoop-prc-st2966.bj): ExecutorLostFailure (executor 4761 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 154022 ms`

Since many users do not understand this point, they may set spark.executor.heartbeatInterval incorrectly. This patch checks for this case when submitting applications.

## How was this patch tested?

Tested in a cluster.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/caneGuy/spark zhoukang/check-heartbeat

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19942.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19942

commit 891a092ac3a95f32cb2f9e1c215b1c8324c98971
Author: zhoukang
Date: 2017-12-11T12:48:32Z

    [SPARK][DEPLOY] Check whether spark.executor.heartbeatInterval bigger than spark.network.timeout or not
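The check this PR ends up adding to `SparkConf.validateSettings` can be sketched in isolation as below. This is a standalone illustration, not the actual Spark code: `timeStringAsSeconds` here is a deliberately simplified stand-in for Spark's `Utils.timeStringAsSeconds` (it only understands plain seconds and the `ms`/`s`/`min` suffixes), and `HeartbeatCheck`/`validate` are names invented for the example.

```scala
// Standalone sketch of the SPARK-22754 validation; assumptions noted above.
object HeartbeatCheck {
  // Simplified stand-in for org.apache.spark.util.Utils.timeStringAsSeconds:
  // handles "120", "120s", "500ms" and "2min" forms only.
  def timeStringAsSeconds(s: String): Long = {
    val t = s.trim.toLowerCase
    if (t.endsWith("ms")) t.dropRight(2).trim.toLong / 1000
    else if (t.endsWith("min")) t.dropRight(3).trim.toLong * 60
    else if (t.endsWith("s")) t.dropRight(1).trim.toLong
    else t.toLong
  }

  // Mirrors the require() the PR adds: fail fast at submit time instead of
  // letting every executor die later with ExecutorLostFailure.
  def validate(conf: Map[String, String]): Unit = {
    val networkTimeout =
      timeStringAsSeconds(conf.getOrElse("spark.network.timeout", "120s"))
    val heartbeatInterval =
      timeStringAsSeconds(conf.getOrElse("spark.executor.heartbeatInterval", "10s"))
    // A heartbeat interval at or above the network timeout makes executors
    // look dead before they can report in (SPARK-22754).
    require(networkTimeout > heartbeatInterval,
      s"The value of spark.network.timeout=${networkTimeout}s must be greater " +
        s"than the value of spark.executor.heartbeatInterval=${heartbeatInterval}s.")
  }
}
```

With the defaults (120s vs 10s) `validate` passes silently; setting `spark.executor.heartbeatInterval` to, say, `200s` without raising `spark.network.timeout` throws an `IllegalArgumentException` carrying the message, which is exactly the early, explicit failure the PR is after.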