[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19942


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-13 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156697267
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,14 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = getTimeAsSeconds("spark.network.timeout", "120s")
+val executorHeartbeatInterval = getTimeAsSeconds("spark.executor.heartbeatInterval", "10s")
+// If spark.executor.heartbeatInterval bigger than spark.network.timeout,
+// it will almost always cause ExecutorLostFailure. See SPARK-22754.
+require(executorTimeoutThreshold > executorHeartbeatInterval, "The value of " +
+  s"'spark.network.timeout=${executorTimeoutThreshold}' must be no less than the value of " +
--- End diff --

Agree with you @srowen. Does the new message make sense?
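The check being reviewed here can be sketched outside Spark as follows. This is a simplified, hypothetical stand-in: `parseSeconds` mimics only a few suffixes of Spark's `Utils.timeStringAsSeconds`, and `validate` works over a plain `Map` instead of a `SparkConf`.

```scala
object HeartbeatCheck {
  // Simplified stand-in for Spark's Utils.timeStringAsSeconds: handles only
  // the "ms", "s", "m" suffixes and bare numbers (assumed to be seconds).
  def parseSeconds(s: String): Long = s.trim match {
    case t if t.endsWith("ms") => t.dropRight(2).toLong / 1000
    case t if t.endsWith("s")  => t.dropRight(1).toLong
    case t if t.endsWith("m")  => t.dropRight(1).toLong * 60
    case t                     => t.toLong
  }

  // The validation the PR proposes, sketched over a plain property map.
  def validate(props: Map[String, String]): Unit = {
    val timeout  = parseSeconds(props.getOrElse("spark.network.timeout", "120s"))
    val interval = parseSeconds(props.getOrElse("spark.executor.heartbeatInterval", "10s"))
    // A heartbeat interval at or above the network timeout makes
    // ExecutorLostFailure almost certain (SPARK-22754).
    require(timeout > interval,
      s"The value of 'spark.network.timeout=${timeout}s' must be greater than " +
        s"the value of 'spark.executor.heartbeatInterval=${interval}s'.")
  }
}
```

With the defaults (120s vs 10s) the check passes; setting the heartbeat interval above the timeout raises an `IllegalArgumentException` from `require`.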


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-13 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156685543
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,14 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = getTimeAsSeconds("spark.network.timeout", "120s")
+val executorHeartbeatInterval = getTimeAsSeconds("spark.executor.heartbeatInterval", "10s")
+// If spark.executor.heartbeatInterval bigger than spark.network.timeout,
+// it will almost always cause ExecutorLostFailure. See SPARK-22754.
+require(executorTimeoutThreshold > executorHeartbeatInterval, "The value of " +
+  s"'spark.network.timeout=${executorTimeoutThreshold}' must be no less than the value of " +
--- End diff --

OK. Not sure about the purpose of the single quotes, and the message 
doesn't say it's in seconds. These aren't big deals, but if the point of this 
PR is a clear helpful message, this still needs to be better.


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-13 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156664176
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,14 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = getTimeAsSeconds("spark.network.timeout", "120s")
+val executorHeartbeatInterval = getTimeAsSeconds("spark.executor.heartbeatInterval", "10s")
+// If spark.executor.heartbeatInterval bigger than spark.network.timeout,
+// it will almost always cause ExecutorLostFailure. See SPARK-22754.
+require(executorTimeoutThreshold > executorHeartbeatInterval, s"The value of " +
--- End diff --

No need for 's', and you're missing an open quote in the next line. You 
could print the value of these params inline too.


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-13 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156627509
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,15 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
--- End diff --

Done


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-13 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156627486
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,15 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+  get("spark.executor.heartbeatInterval", "10s"))
+// If spark.executor.heartbeatInterval bigger than spark.network.timeout,
+// it will almost always cause ExecutorLostFailure.See SPARK-22754.
--- End diff --

Done


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-12 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156466283
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,15 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+  get("spark.executor.heartbeatInterval", "10s"))
+// If spark.executor.heartbeatInterval bigger than spark.network.timeout,
+// it will almost always cause ExecutorLostFailure.See SPARK-22754.
--- End diff --

Add space after `.`.


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-12 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156466106
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,15 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
--- End diff --

`getTimeAsSeconds`, also below.


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-11 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156272847
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,13 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+  get("spark.executor.heartbeatInterval", "10s"))
+require(executorHeartbeatInterval > executorTimeoutThreshold, s"The value of" +
--- End diff --

Done


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-11 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156272829
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,13 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+  get("spark.executor.heartbeatInterval", "10s"))
+require(executorHeartbeatInterval > executorTimeoutThreshold, s"The value of" +
+  s"spark.network.timeout' must be no less than the value of" +
--- End diff --

Done


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-11 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156272812
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,13 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+  get("spark.executor.heartbeatInterval", "10s"))
+require(executorHeartbeatInterval > executorTimeoutThreshold, s"The value of" +
+  s"spark.network.timeout' must be no less than the value of" +
+  s" 'spark.executor.heartbeatInterval'.")
--- End diff --

Done


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-11 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156270767
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,13 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+  get("spark.executor.heartbeatInterval", "10s"))
+require(executorHeartbeatInterval > executorTimeoutThreshold, s"The value of" +
+  s"spark.network.timeout' must be no less than the value of" +
+  s" 'spark.executor.heartbeatInterval'.")
--- End diff --

Sorry for that, I will check more carefully!


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-11 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156268681
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,13 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+  get("spark.executor.heartbeatInterval", "10s"))
+require(executorHeartbeatInterval > executorTimeoutThreshold, s"The value of" +
--- End diff --

Should also add a simple and explicit comment to explain the reason why 
this check is needed.


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-11 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156268623
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,13 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+  get("spark.executor.heartbeatInterval", "10s"))
+require(executorHeartbeatInterval > executorTimeoutThreshold, s"The value of" +
+  s"spark.network.timeout' must be no less than the value of" +
+  s" 'spark.executor.heartbeatInterval'.")
--- End diff --

nit: The whitespace should always go to the end of the previous line 
instead of the beginning.


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-11 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156268550
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,13 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+  get("spark.executor.heartbeatInterval", "10s"))
+require(executorHeartbeatInterval > executorTimeoutThreshold, s"The value of" +
+  s"spark.network.timeout' must be no less than the value of" +
--- End diff --

nit: remove the 's' in this line and the following line.


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-11 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156268382
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,18 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+  get("spark.executor.heartbeatInterval", "10s"))
+if (executorHeartbeatInterval > executorTimeoutThreshold) {
--- End diff --

Done @jiangxb1987, thanks!


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-11 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156267891
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -564,6 +564,18 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 val encryptionEnabled = get(NETWORK_ENCRYPTION_ENABLED) || get(SASL_ENCRYPTION_ENABLED)
 require(!encryptionEnabled || get(NETWORK_AUTH_ENABLED),
   s"${NETWORK_AUTH_ENABLED.key} must be enabled when enabling encryption.")
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(get("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+  get("spark.executor.heartbeatInterval", "10s"))
+if (executorHeartbeatInterval > executorTimeoutThreshold) {
--- End diff --

```require(executorHeartbeatInterval > executorTimeoutThreshold, s"The value of 'spark.network.timeout' must be no less than the value of 'spark.executor.heartbeatInterval'.")```


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-11 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156264833
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -291,6 +291,18 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
 if (proxyUser != null && principal != null) {
   SparkSubmit.printErrorAndExit("Only one of --proxy-user or --principal can be provided.")
 }
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(
+  sparkProperties.getOrElse("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+  sparkProperties.getOrElse("spark.executor.heartbeatInterval", "10s"))
--- End diff --

Done @vanzin, thanks!


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-11 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156251839
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -291,6 +291,18 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
 if (proxyUser != null && principal != null) {
   SparkSubmit.printErrorAndExit("Only one of --proxy-user or --principal can be provided.")
 }
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(
+  sparkProperties.getOrElse("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+  sparkProperties.getOrElse("spark.executor.heartbeatInterval", "10s"))
--- End diff --

This is the wrong place for the check, see discussion.


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-11 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156250974
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -291,6 +291,18 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
 if (proxyUser != null && principal != null) {
   SparkSubmit.printErrorAndExit("Only one of --proxy-user or --principal can be provided.")
 }
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(
+  sparkProperties.getOrElse("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+  sparkProperties.getOrElse("spark.executor.heartbeatInterval", "10s"))
--- End diff --

I thought about adding a config constant, but that would affect a lot of other code, so I simply changed it here. @srowen @vanzin


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-11 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156177369
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -291,6 +291,18 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
 if (proxyUser != null && principal != null) {
   SparkSubmit.printErrorAndExit("Only one of --proxy-user or --principal can be provided.")
 }
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(
+  sparkProperties.getOrElse("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+  sparkProperties.getOrElse("spark.executor.heartbeatInterval", "10s"))
--- End diff --

The best place is probably `SparkConf.validateSettings`. I don't see a way 
to not duplicate the defaults, though - they're already duplicated in a bunch 
of places. The right way would be to create a config constant, but that would 
be a much noisier change.


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-11 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156112470
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -291,6 +291,18 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
 if (proxyUser != null && principal != null) {
   SparkSubmit.printErrorAndExit("Only one of --proxy-user or --principal can be provided.")
 }
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(
+  sparkProperties.getOrElse("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+  sparkProperties.getOrElse("spark.executor.heartbeatInterval", "10s"))
--- End diff --

OK true. This still seems like the wrong place to validate, as they're not 
arguments to spark-submit, and this requires duplicating default values. 
@vanzin where would you put this check?


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-11 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156105135
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -291,6 +291,18 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
 if (proxyUser != null && principal != null) {
   SparkSubmit.printErrorAndExit("Only one of --proxy-user or --principal can be provided.")
 }
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(
+  sparkProperties.getOrElse("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+  sparkProperties.getOrElse("spark.executor.heartbeatInterval", "10s"))
--- End diff --

Someone may set only spark.executor.heartbeatInterval without setting spark.network.timeout, so I just read each value with its default. Does this make sense? If we only ran the check when both are set, we would miss that case. Thanks!


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-11 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19942#discussion_r156069510
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -291,6 +291,18 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
 if (proxyUser != null && principal != null) {
   SparkSubmit.printErrorAndExit("Only one of --proxy-user or --principal can be provided.")
 }
+
+val executorTimeoutThreshold = Utils.timeStringAsSeconds(
+  sparkProperties.getOrElse("spark.network.timeout", "120s"))
+val executorHeartbeatInterval = Utils.timeStringAsSeconds(
+  sparkProperties.getOrElse("spark.executor.heartbeatInterval", "10s"))
--- End diff --

Rather than repeat a default, just perform the check if both are set.
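This "check only if both are set" alternative can be sketched as below. It is a hypothetical illustration over a plain property map, returning an error message only when both keys are present and inconsistent; the time parsing is simplified to the bare `"Ns"` form.

```scala
// Hypothetical sketch of validating only when the user supplied both
// properties, so no default value has to be duplicated in the check.
def checkIfBothSet(props: Map[String, String]): Option[String] = {
  // Simplified time parsing: only a bare "Ns" suffix is handled here.
  def seconds(v: String): Long = v.stripSuffix("s").toLong
  for {
    timeout  <- props.get("spark.network.timeout")
    interval <- props.get("spark.executor.heartbeatInterval")
    if seconds(interval) >= seconds(timeout) // only then is there a problem
  } yield s"'spark.executor.heartbeatInterval' ($interval) must be less than " +
    s"'spark.network.timeout' ($timeout)"
}
```

If either key is missing, the for-comprehension short-circuits to `None` and no check is performed, which is exactly the trade-off discussed here: the misconfiguration is not caught when only the heartbeat interval is set.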


---




[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-11 Thread caneGuy
GitHub user caneGuy opened a pull request:

https://github.com/apache/spark/pull/19942

[SPARK-22754][DEPLOY] Check whether spark.executor.heartbeatInterval 
bigger…

… than spark.network.timeout or not

## What changes were proposed in this pull request?

If spark.executor.heartbeatInterval is bigger than spark.network.timeout, it will almost always cause the exception below.
`Job aborted due to stage failure: Task 4763 in stage 3.0 failed 4 times, most recent failure: Lost task 4763.3 in stage 3.0 (TID 22383, executor id: 4761, host: c3-hadoop-prc-st2966.bj): ExecutorLostFailure (executor 4761 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 154022 ms`
Since many users do not get that point, they may set spark.executor.heartbeatInterval incorrectly. This patch checks this case when applications are submitted.

## How was this patch tested?
Test in cluster


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/caneGuy/spark zhoukang/check-heartbeat

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19942.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19942


commit 891a092ac3a95f32cb2f9e1c215b1c8324c98971
Author: zhoukang 
Date:   2017-12-11T12:48:32Z

[SPARK][DEPLOY] Check whether spark.executor.heartbeatInterval bigger than 
spark.network.timeout or not




---
